Re: [HACKERS] Exclude pg_internal.init from base backup

2017-11-07 Thread Haribabu Kommi
On Wed, Nov 8, 2017 at 11:11 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Wed, Nov 8, 2017 at 9:04 AM, Haribabu Kommi <kommi.harib...@gmail.com>
> wrote:
> > The commit 98267e missed to check the empty SGML tag, attached patch
> > fixes the same.
>
> 
>  
> - <filename>pg_internal.init</> (found in multiple directories)
> + <filename>pg_internal.init (found in multiple
> directories)</filename>
>  
> 
> What has been committed in 98267ee and what is proposed here both look
> incorrect to me. The markup filename ought to be used only with file
> names, so "(found in multiple directories)" should not be within it.
> Simon's commit is not wrong with the markup usage by the way.
>

Thanks for the correction. I was not very familiar with SGML markup usage.
While building the documentation, it raised a warning about an "empty
end-tag", so I just added the end tag. Attached is the updated patch with
the suggested correction.

Regards,
Hari Babu
Fujitsu Australia


sgml_empty_tag_fix_v2.patch
Description: Binary data



Re: [HACKERS] Exclude pg_internal.init from base backup

2017-11-07 Thread Haribabu Kommi
On Wed, Nov 8, 2017 at 3:03 AM, Simon Riggs  wrote:

> On 5 November 2017 at 11:55, Magnus Hagander  wrote:
> > On Sat, Nov 4, 2017 at 4:04 AM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >>
> >> On Fri, Nov 3, 2017 at 4:04 PM, Petr Jelinek
> >>  wrote:
> >> > Not specific problem to this patch, but I wonder if it should be made
> >> > more clear that those files (there are couple above of what you added)
> >> > are skipped no matter which directory they reside in.
> >>
> >> Agreed, it is a good idea to tell in the docs how this behaves. We
> >> could always change things so as the comparison is based on the full
> >> path like what is done for pg_control, but that does not seem worth
> >> complicating the code.
> >
> >
> > pg_internal.init can, and do, appear in multiple different directories.
> > pg_control is always in the same place. So they're not the same thing.
> >
> > So +1 for documenting the difference in how these are handled, as this is
> > important to know for somebody writing an external tool for it.
>
> Changes made, moving to commit the attached patch.
>

Commit 98267e missed checking for an empty SGML tag; the attached patch
fixes it.

Regards,
Hari Babu
Fujitsu Australia


sgml_empty_tag_fix.patch
Description: Binary data



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-11-07 Thread Haribabu Kommi
On Wed, Nov 8, 2017 at 8:48 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Nov 7, 2017 at 4:35 AM, Haribabu Kommi <kommi.harib...@gmail.com>
> wrote:
> > The newly added option is not recommended to be used in normal cases and
> > it is used only for upgrade utilities.
>
> I don't know why it couldn't be used in normal cases.  That seems like
> a totally legitimate thing for somebody to want.  Maybe nobody does,
> but I see no reason to worry if they do.


OK. I removed the documentation wording saying that it cannot be used in
normal scenarios, and also added a Note section explaining the problem of
using the dump with the pg_restore command's --clean and --create options.

Regards,
Hari Babu
Fujitsu Australia


pg_dump-and-pg_dumpall-database-handling-refactoring_v10.patch
Description: Binary data



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-11-07 Thread Haribabu Kommi
On Thu, Oct 26, 2017 at 10:01 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, Oct 23, 2017 at 7:36 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > Apologies for not providing much details.
> >
> > pg_dumpall is used to produce the following statements for database,
> >
> > "Create database" (other than default database) or
> > "Alter database set tablespace" for default database (if required)
> >
> > ACL queries related to database
> > Alter database config
> > Alter database role config
> >
> > whereas, pg_dump used to produce only "create database statement".
>
> How about adding a new flag --set-db-properties that doesn't produce
> CREATE DATABASE but does dump the other stuff?  -C would dump both
> CREATE DATABASE *and* the other stuff.  Then you could dump built-in
> databases with --set-db-properties and others with -C.


Thanks for the idea. Attached is a patch that implements it.

The newly added option is not recommended for normal use; it is intended
only for upgrade utilities.

If a user issues pg_dump with the --set-db-properties option along with
the --create or --clean options, an error is raised. Currently there is no
way to throw an error when a dump generated with --set-db-properties is
restored with the --clean option. To catch that case, we may need to add
additional details to the archive handler, but is it really needed?

Regards,
Hari Babu
Fujitsu Australia


pg_dump-and-pg_dumpall-database-handling-refactoring_v9.patch
Description: Binary data



Re: [HACKERS] pg_stat_wal_write statistics view

2017-11-07 Thread Haribabu Kommi
On Wed, Sep 27, 2017 at 6:58 PM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

>
> Updated patch attached.
>

Patch rebased.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_walwrites-statistics-view_v10.patch
Description: Binary data



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-10-22 Thread Haribabu Kommi
On Sun, Oct 22, 2017 at 3:08 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Sat, Oct 21, 2017 at 1:30 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > Before refactoring, pg_dumpall doesn't print "create database" commands
> > for both tempalte1 and postgres database, but on the other hand pg_dump
> > dump the create database commands with --create option.
> >
> > To keep the behavior of all the database attributes in the dump of both
> > pg_dump and pg_dumpall, the code is unified and moved into pg_dump.
> > But to retain the pg_dumpall behavior of not dumping the "create
> database"
> > commands, a new option is added to pg_dump to skip dumping the
> > create database commands.
> >
> > The new option name is now "--skip-create-default-db", this can be used
> > normal user also when try to dump the postgres database to not let create
> > the database commands in the dump.
>
> I don't get this at all.  If you don't want to create the database,
> just don't pass the -C argument.  It doesn't make sense to have a -C
> argument which makes it create the database and then a
> --skip-create-default-db argument which makes it sometimes not create
> the database after all.


Apologies for not providing enough details.

pg_dumpall used to produce the following statements for a database:

"CREATE DATABASE" (for databases other than the default ones) or
"ALTER DATABASE ... SET TABLESPACE" for a default database (if required)

ACL statements related to the database
ALTER DATABASE ... SET (database configuration)
ALTER ROLE IN DATABASE ... SET (per-database role configuration)

whereas pg_dump used to produce only the "CREATE DATABASE" statement.

With the refactoring, all the pg_dumpall database statements are moved
into pg_dump. The -C/--create option of pg_dump produces all the
statements pg_dumpall used to produce. The --skip-create-default-db
option makes sure that it does not produce a "CREATE DATABASE" statement
and instead may produce "ALTER DATABASE ... SET TABLESPACE" for the
default databases (postgres and template1).

The -C/--create option controls the database statements as a whole;
--skip-create-default-db controls whether the "CREATE" or "ALTER"
database statement is emitted for the default databases.

When restoring the dump, -C/--create restores all the database
statements.

Comments? Or is there a better approach?

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-10-20 Thread Haribabu Kommi
On Fri, Oct 6, 2017 at 12:29 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Wed, Oct 4, 2017 at 3:40 AM, Haribabu Kommi <kommi.harib...@gmail.com>
> wrote:
> > There are some differences in handling database objects
> > between pg_dump and pg_dumpall, To retain both pg_dump
> > and pg_dumpall behavior even after refactoring, this option
> > is added. Currently this option is used mainly for the three
> > purposes.
> >
> > 1. Don't print unnecessary CREATE DATABASE options like
> > ENCODING, LC_COLLATE and LC_CTYPE options if the
> > default encoding is same with the above values.
> >
> > The above behavior is as per the pg_dumpall, but it can be
> > changed to print irrespective of the default encoding.
> >
> > 2. Do not dump postgres and template0 databases.
> >
> > 3. Set default_transaction_read_only = off.
>
> To me it seems that a refactoring which requires pg_dump to behave
> differently in small ways like this based on whether it is being
> called by pg_dumpall or not is probably not a good refactoring.  And I
> don't see why the proposal from Tom that started this thread would
> require such a thing to be true.
>

Before the refactoring, pg_dumpall doesn't print "CREATE DATABASE"
commands for the template1 and postgres databases, while pg_dump dumps
the CREATE DATABASE commands with the --create option.

To keep the handling of all the database attributes in the dumps of both
pg_dump and pg_dumpall, the code is unified and moved into pg_dump.
But to retain the pg_dumpall behavior of not dumping the "CREATE DATABASE"
commands, a new option is added to pg_dump to skip dumping them.

The new option name is now "--skip-create-default-db"; a normal user can
also use it when dumping the postgres database to keep the CREATE DATABASE
commands out of the dump.

> From your list, I would say that (1) and (3) seem like behaviors that
> we either want or do not want.  Whether pg_dump or pg_dumpall is
> involved seems irrelevant.  (2) seems like it might need some special
> handling, but that could be handled in pg_dumpall by just not calling
> pg_dump at all for those database.



I didn't find any better way than creating a new option to keep the
default databases' CREATE DATABASE commands out of the dump, so I renamed
the old option to a better name and changed the behavior so that normal
users can use it as well.

Updated patch attached.

Regards,
Hari Babu
Fujitsu Australia


pg_dump-and-pg_dumpall-database-handling-refactoring_v8.patch
Description: Binary data



Re: [HACKERS] Pluggable storage

2017-10-13 Thread Haribabu Kommi
On Fri, Oct 13, 2017 at 11:55 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Oct 12, 2017 at 8:00 PM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > Currently I added a snapshot_satisfies API to find out whether the tuple
> > satisfies the visibility or not with different types of visibility
> routines.
> > I feel these
> > are some how enough to develop a different storage methods like UNDO.
> > The storage methods can decide internally how to provide the visibility.
> >
> > + amroutine->snapshot_satisfies[MVCC_VISIBILITY] = HeapTupleSatisfiesMVCC;
> > + amroutine->snapshot_satisfies[SELF_VISIBILITY] = HeapTupleSatisfiesSelf;
> > + amroutine->snapshot_satisfies[ANY_VISIBILITY] = HeapTupleSatisfiesAny;
> > + amroutine->snapshot_satisfies[TOAST_VISIBILITY] = HeapTupleSatisfiesToast;
> > + amroutine->snapshot_satisfies[DIRTY_VISIBILITY] = HeapTupleSatisfiesDirty;
> > + amroutine->snapshot_satisfies[HISTORIC_MVCC_VISIBILITY] = HeapTupleSatisfiesHistoricMVCC;
> > + amroutine->snapshot_satisfies[NON_VACUUMABLE_VISIBILTY] = HeapTupleSatisfiesNonVacuumable;
> > +
> > + amroutine->snapshot_satisfiesUpdate = HeapTupleSatisfiesUpdate;
> > + amroutine->snapshot_satisfiesVacuum = HeapTupleSatisfiesVacuum;
> >
> > Currently no changes are carried out in snapshot logic as that is kept
> > seperate
> > from storage API.
>
> That seems like a strange choice of API.  I think it should more
> integrated with the scan logic.  For example, if I'm doing an index
> scan, and I get a TID, then I should be able to just say "here's a
> TID, give me any tuples associated with that TID that are visible to
> the scan snapshot".  Then for the current heap it will do
> heap_hot_search_buffer, and for zheap it will walk the undo chain and
> return the relevant tuple from the chain.


OK, understood.
I will work along these lines and come up with the storage APIs.
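
As I understand the suggestion, the storage API could expose something
shaped roughly like the callback below. This is only my illustration of
the idea (the typedef name and signature are hypothetical, not from the
current patch): the access method resolves a TID into the version visible
under the given snapshot, so the heap implementation could wrap
heap_hot_search_buffer(), while an UNDO-based AM would walk its undo chain.

/*
 * Hypothetical callback shape (illustration only, not from the patch):
 * fetch the tuple version at "tid" that is visible to "snapshot" and
 * store it in "slot"; return true if a visible version was found.
 */
typedef bool (*TupleFetchVisible_function) (Relation relation,
                                            ItemPointer tid,
                                            Snapshot snapshot,
                                            TupleTableSlot *slot);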

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Pluggable storage

2017-10-12 Thread Haribabu Kommi
On Fri, Oct 13, 2017 at 8:23 AM, Robert Haas  wrote:

> On Thu, Oct 12, 2017 at 4:38 PM, Alexander Korotkov
>  wrote:
> > It's probably that we imply different meaning to "MVCC implementation".
> > While writing "MVCC implementation" I meant that, for instance,
> alternative
> > storage
> > may implement UNDO chains to store versions of same row.
> Correspondingly,
> > it may not have any analogue of our HOT.
>
> Yes, the zheap project on which EnterpriseDB is working has precisely
> this characteristic.
>
> > However I imply that alternative storage would share our "MVCC model".
> So,
> > it
> > should share our transactional model including transactions,
> > subtransactions, snapshots etc.
> > Therefore, if alternative storage is transactional, then in particular it
> > should be able to fetch tuple with
> > given TID according to given snapshot.  However, how it's implemented
> > internally is
> > a black box for us.  Thus, we don't insist that tuple should have
> different
> > TID after update;
> > we don't insist there is any analogue of HOT; we don't insist alternative
> > storage needs vacuum
> > (or if even it needs vacuum, it might be performed in completely
> different
> > way) and so on.
>
> Fully agreed.


Currently I added a snapshot_satisfies API to find out whether a tuple
satisfies the visibility checks, with the different types of visibility
routines. I feel these are somehow enough to develop different storage
methods, such as UNDO-based ones. The storage methods can decide
internally how to provide the visibility.

+ amroutine->snapshot_satisfies[MVCC_VISIBILITY] = HeapTupleSatisfiesMVCC;
+ amroutine->snapshot_satisfies[SELF_VISIBILITY] = HeapTupleSatisfiesSelf;
+ amroutine->snapshot_satisfies[ANY_VISIBILITY] = HeapTupleSatisfiesAny;
+ amroutine->snapshot_satisfies[TOAST_VISIBILITY] = HeapTupleSatisfiesToast;
+ amroutine->snapshot_satisfies[DIRTY_VISIBILITY] = HeapTupleSatisfiesDirty;
+ amroutine->snapshot_satisfies[HISTORIC_MVCC_VISIBILITY] = HeapTupleSatisfiesHistoricMVCC;
+ amroutine->snapshot_satisfies[NON_VACUUMABLE_VISIBILTY] = HeapTupleSatisfiesNonVacuumable;
+
+ amroutine->snapshot_satisfiesUpdate = HeapTupleSatisfiesUpdate;
+ amroutine->snapshot_satisfiesVacuum = HeapTupleSatisfiesVacuum;

Currently no changes are made to the snapshot logic, as that is kept
separate from the storage API.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] utility commands benefiting from parallel plan

2017-10-11 Thread Haribabu Kommi
On Fri, Oct 6, 2017 at 2:43 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Sep 15, 2017 at 2:22 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > Thanks for the review.
>
> I committed this patch with some cosmetic changes.  I think the fact
> that several people have asked for this indicates that, even without
> making some of the more complicated cases work, this has some value.
> I am not convinced it is safe in any case other than when the DML
> command is both creating and populating the table, so I removed
> REFRESH MATERIALIZED VIEW support from the patch and worked over the
> documentation and comments to a degree.
>
> The problem with a case like REFRESH MATERIALIZED VIEW is that there's
> nothing to prevent something that gets run in the course of the query
> from trying to access the view (and the heavyweight lock won't prevent
> that, due to group locking).  That's probably a stupid thing to do,
> but it can't be allowed to break the world.  The other cases are safe
> from that particular problem because the table doesn't exist yet.
>

Thanks for committing the patch.
I understand the problem with the REFRESH MATERIALIZED VIEW case.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-10-04 Thread Haribabu Kommi
On Sat, Sep 30, 2017 at 3:31 AM, Robert Haas  wrote:

> On Fri, Sep 29, 2017 at 12:44 AM, Vaishnavi Prabakaran
>  wrote:
> > Option name "--enable-pgdumpall-behaviour"  is very generic
>
> Yeah, that's a terrible name, at least in my opinion.
>

OK. I will use a new name based on the discussion.


> and it is better
> > to rename it to something that reflects its functionality like
> > --skip-default-db-create/--no-default-db-create
>
> But I wonder why this patch needs a new option at all?
>

There are some differences in how pg_dump and pg_dumpall handle database
objects. To retain both pg_dump's and pg_dumpall's behavior even after
the refactoring, this option was added. Currently this option is used
mainly for three purposes.

1. Don't print unnecessary CREATE DATABASE options such as ENCODING,
LC_COLLATE and LC_CTYPE if they match the defaults.

This behavior follows pg_dumpall, but it could be changed to print them
irrespective of the defaults.

2. Do not dump the postgres and template0 databases.

3. Set default_transaction_read_only = off.

The GUC is set based on that flag, as per the following comment in
pg_dumpall, to retain the same behavior even after this refactoring:

/*
* Restore will need to write to the target cluster.  This connection
* setting is emitted for pg_dumpall rather than in the code also used
* by pg_dump, so that a cluster with databases or users which have
* this flag turned on can still be replicated through pg_dumpall
* without editing the file or stream.  With pg_dump there are many
* other ways to allow the file to be used, and leaving it out allows
 * users to protect databases from being accidental restore targets.
*/
fprintf(OPF, "SET default_transaction_read_only = off;\n\n");

We can remove usage 1 and retain usage 2 with a modified option name such
as --no-default-database or similar.

Any opinions about usage 3? In case we need to retain that change, what
would be the best name for the option?

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Commitfest 201709 is now closed

2017-10-02 Thread Haribabu Kommi
On Tue, Oct 3, 2017 at 3:12 AM, Robert Haas  wrote:

> On Mon, Oct 2, 2017 at 11:57 AM, Tom Lane  wrote:
> > Daniel Gustafsson  writes:
> >> Thanks to everyone who participated, and to everyone who have responded
> to my
> >> nagging via the CF app email function. This is clearly an awesome
> community.
> >
> > And thanks to you for your hard work as CFM!  That's tedious and
> > largely thankless work, but it's needed to keep things moving.
>
> +1.
>
> I think we need to figure out some plan for tackling the backlog of
> Ready for Committer patches, though.  There are currently 42 of them,
> 10 of which are listed (maybe not all correctly) as bug fixes.  That's
> not great.


How about having another extra week (a "commit week") after the
commitfest closes, perhaps as a new state like "commit", for the Ready
for Committer patches, to get feedback from committers? In case some
rework is required, it can then be ready by the time the next commitfest
begins.

I feel something along these lines may get earlier feedback for the
patches.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] pg_stat_wal_write statistics view

2017-09-27 Thread Haribabu Kommi
On Fri, Sep 22, 2017 at 5:46 PM, Julien Rouhaud <rjuju...@gmail.com> wrote:

> Hello,
>
> On Wed, Sep 13, 2017 at 3:01 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > I ran the latest performance tests with and without IO times, there is an
> > overhead involved with IO time calculation and didn't observe any
> > performance
> > overhead with normal stats. May be we can enable the IO stats only in the
> > development environment to find out the IO stats?
> >
>
> -1 for me to have these columns depending on a GUC *and* wether it's a
> debug or assert build (unless this behaviour already exists for other
> functions, but AFAIK that's not the case).
>
> > I added the other background stats to find out how much WAL write is
> > carried out by the other background processes. Now I am able to collect
> > the stats for the other background processes also after the pgbench test.
> > So I feel now the separate background stats may be useful.
> >
> > Attached latest patch, performance test results and stats details with
> > separate background stats and also combine them with backend including
> > the IO stats also.
> >
>
> The results seem to vary too much to have a strong opinion, but it
> looks like the timing instrumentation can be too expensive, so I'd be
> -1 on adding this overhead to track_io_timing.
>

Thanks for the review.
I removed the time-related columns from the view, as these columns add
overhead and GUC-controlled behavior would differ from the other views.

Apart from the time columns, I removed walwriter_dirty_writes, as the
walwriter writes cannot be treated as dirty writes.

I have some minor comments on the last patch:
>
> +
> +  backend_writes
> +  bigint
> +  Number of WAL writes that are carried out by the backend
> process
>
> it should be backend processes
>

Changed.


> +#define NUM_PG_STAT_WALWRITE_ATTS 16
> +
> +Datum
> +pg_stat_get_walwrites(PG_FUNCTION_ARGS)
> +{
> +   TupleDesc   tupdesc;
> +   Datum   values[NUM_PG_STAT_WALWRITE_ATTS];
> +   boolnulls[NUM_PG_STAT_WALWRITE_ATTS];
>
> For consistency, the #define should be just before the tupdesc
> declaration, and be named PG_STAT_GET_WALWRITE_COLS
>

Changed.


> +   PgStat_Counter backend_writes;  /* No of writes by backend */
>
> +   PgStat_Counter backend_dirty_writes;/* No of dirty writes by
> +* backend when WAL buffers
> +* full */
>
> +   PgStat_Counter backend_write_blocks;/* Total no of pages
> written by backend */
>
> these comments should all be say "backends" for consistency
>

Changed.


> +-- There will surely and maximum one record
> +select count(*) > 0 as ok from pg_stat_walwrites;
>
> The test should be count(*) = 1
>

Changed.

Updated patch attached.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_walwrites-statistics-view_v9.patch
Description: Binary data



Re: [HACKERS] visual studio 2017 build support

2017-09-26 Thread Haribabu Kommi
On Mon, Sep 25, 2017 at 10:12 PM, Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
> On 09/25/2017 12:25 AM, Haribabu Kommi wrote:
> >
> > Thanks for pointing it out, I missed to check the Build tools support
> > section.
> > Here I attached the updated patch with the change in documentation to
> > include the 2008 R2 SP1 operating system also.
> >
>
> Thanks, committed and backpatched to 9.6 It would be nice to get
> buildfarm members going supporting  VS2015 and VS2017


Thanks.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] SERIALIZABLE with parallel query

2017-09-26 Thread Haribabu Kommi
On Mon, Sep 25, 2017 at 6:57 PM, Thomas Munro <thomas.mu...@enterprisedb.com
> wrote:

> On Mon, Sep 25, 2017 at 8:37 PM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > After I tune the GUC to go with sequence scan, still I am not getting the
> > error
> > in the session-2 for update operation like it used to generate an error
> for
> > parallel
> > sequential scan, and also it even takes some many commands until unless
> the
> > S1
> > commits.
>
> Hmm.  Then this requires more explanation because I don't expect a
> difference.  I did some digging and realised that the error detail
> message "Reason code: Canceled on identification as a pivot, during
> write." was reached in a code path that requires
> SxactIsPrepared(writer) and also MySerializableXact == writer, which
> means that the process believes it is committing.  Clearly something
> is wrong.  After some more digging I realised that
> ParallelWorkerMain() calls EndParallelWorkerTransaction() which calls
> CommitTransaction() which calls
> PreCommit_CheckForSerializationFailure().  Since the worker is
> connected to the leader's SERIALIZABLEXACT, that finishes up being
> marked as preparing to commit (not true!), and then the leader get
> confused during that write, causing a serialization failure to be
> raised sooner (though I can't explain why it should be raised then
> anyway, but that's another topic).  Oops.  I think the fix here is
> just not to do that in a worker (the worker's CommitTransaction()
> doesn't really mean what it says).
>
> Here's a version with a change that makes that conditional.  This way
> your test case behaves the same as non-parallel mode.
>

With the latest patch, I didn't find any problems.
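
If I read the fix correctly, it essentially makes the SSI pre-commit
check conditional on not running inside a parallel worker, something like
the sketch below (this is only my understanding of the described change,
not the exact patch hunk):

/*
 * Sketch of the described fix: a parallel worker's CommitTransaction()
 * is not a real commit, so skip the SSI pre-commit check there.
 */
if (!IsParallelWorker())
    PreCommit_CheckForSerializationFailure();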



> > I will continue my review on the latest patch and share any updates.
>
> Thanks!


The patch looks good, and I don't have any comments on the code.
The test that the patch adds does not generate a true parallelism
scenario; I feel it is better to change it to a test that generates a
parallel sequential/index/bitmap scan.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] SERIALIZABLE with parallel query

2017-09-25 Thread Haribabu Kommi
On Thu, Sep 21, 2017 at 4:13 PM, Thomas Munro <thomas.mu...@enterprisedb.com
> wrote:

> On Tue, Sep 19, 2017 at 1:47 PM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > During testing of this patch, I found some behavior difference
> > with the support of parallel query, while experimenting with the provided
> > test case in the patch.
> >
> > But I tested the V6 patch, and I don't think that this version contains
> > any fixes other than rebase.
> >
> > Test steps:
> >
> > CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT
> NULL);
> > INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
> >
> > Session -1:
> >
> > BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> > SELECT balance FROM bank_account WHERE id = 'Y';
> >
> > Session -2:
> >
> > BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> > SET max_parallel_workers_per_gather = 2;
> > SET force_parallel_mode = on;
> > set parallel_setup_cost = 0;
> > set parallel_tuple_cost = 0;
> > set min_parallel_table_scan_size = 0;
> > set enable_indexscan = off;
> > set enable_bitmapscan = off;
> >
> > SELECT balance FROM bank_account WHERE id = 'X';
> >
> > Session -1:
> >
> > update bank_account set balance = 10 where id = 'X';
> >
> > Session -2:
> >
> > update bank_account set balance = 10 where id = 'Y';
> > ERROR:  could not serialize access due to read/write dependencies among
> > transactions
> > DETAIL:  Reason code: Canceled on identification as a pivot, during
> write.
> > HINT:  The transaction might succeed if retried.
> >
> > Without the parallel query of select statement in session-2,
> > the update statement in session-2 is passed.
>

Hi Thomas,


> Yeah.  The difference seems to be that session 2 chooses a Parallel
> Seq Scan instead of an Index Scan when you flip all those GUCs into
> parallelism-is-free mode.  Seq Scan takes a table-level predicate lock
> (see heap_beginscan_internal()).  But if you continue your example in
> non-parallel mode (patched or unpatched), you'll find that only one of
> those transactions can commit successfully.
>

Yes, that's correct. Only one commit can be successful.


> Using the fancy notation in the papers about this stuff where w1[x=42]
> means "write by transaction 1 on object x with value 42", let's see if
> there is an apparent sequential order of these transactions that makes
> sense:
>
> Actual order: r1[Y=0] r2[X=0] w1[X=10] w2[Y=10] ... some commit order ...
> Apparent order A: r2[X=0] w2[Y=10] c2 r1[Y=0*] w1[X=10] c1 (*nonsense)
> Apparent order B: r1[Y=0] w1[X=10] c1 r2[X=0*] w2[Y=10] c2 (*nonsense)
>
> Both potential commit orders are nonsensical.  I think what happened
> in your example was that a Seq Scan allowed the SSI algorithm to
> reject a transaction sooner.  Instead of r2[X=0], the executor sees
> r2[X=0,Y=0] (we scanned the whole table, as if we read all objects, in
> this case X and Y, even though we only asked to read X).  Then the SSI
> algorithm is able to detect a "dangerous structure" at w2[Y=10],
> instead of later at commit time.
>

Thanks for explaining in more detail; now I understand serialization a
bit better.

After I tuned the GUCs to use a sequential scan, I still don't get the
error in session 2 for the update operation, the way it generated an
error for the parallel sequential scan, and it even takes many more
commands until S1 commits.

I am just thinking that with a parallel sequential scan under
serializable isolation, the user has lost control over which session gets
to commit. This may be a rare scenario that never happens in practice.

I will continue my review on the latest patch and share any updates.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-09-24 Thread Haribabu Kommi
On Mon, Sep 25, 2017 at 4:39 AM, Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:

>
>
> On 09/06/2017 09:45 AM, Haribabu Kommi wrote:
> >
> >
> > On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra
> > <tomas.von...@2ndquadrant.com <mailto:tomas.von...@2ndquadrant.com>>
> wrote:
> >
> > On 7/25/17 12:55 AM, Tom Lane wrote:
> >
> > Tomas Vondra <tomas.von...@2ndquadrant.com
> > <mailto:tomas.von...@2ndquadrant.com>> writes:
> >
> > It seems to me that VACUUM and ANALYZE somewhat disagree on
> what
> > exactly reltuples means. VACUUM seems to be thinking that
> > reltuples
> > = live + dead while ANALYZE apparently believes that
> reltuples =
> > live
> >
> >
> > The question is - which of the reltuples definitions is the
> > right
> > one? I've always assumed that "reltuples = live + dead" but
> > perhaps
> > not?
> >
> >
> > I think the planner basically assumes that reltuples is the live
> > tuple count, so maybe we'd better change VACUUM to get in step.
> >
> >
> > Attached is a patch that (I think) does just that. The disagreement
> > was caused by VACUUM treating recently dead tuples as live, while
> > ANALYZE treats both of those as dead.
> >
> > At first I was worried that this will negatively affect plans in the
> > long-running transaction, as it will get underestimates (due to
> > reltuples not including rows it can see). But that's a problem we
> > already have anyway, you just need to run ANALYZE in the other
> session.
> >
> >
> > Thanks for the patch.
> > From the mail, I understand that this patch tries to improve the
> > reltuples value update in the catalog table by the vacuum command
> > to consider the proper visible tuples similar like analyze command.
> >
> > -num_tuples);
> > +num_tuples - nkeep);
> >
> > With the above correction, there is a problem in reporting the number
> > of live tuples to the stats.
> >
> > postgres=# select reltuples, n_live_tup, n_dead_tup
> >   from pg_stat_user_tables join pg_class using (relname)
> >  where relname = 't';
> >  reltuples | n_live_tup | n_dead_tup
> > -----------+------------+------------
> > 899818 | 799636 | 100182
> > (1 row)
> >
> >
> > The live tuples data value is again decremented with dead tuples
> > value before sending them to stats in function lazy_vacuum_rel(),
> >
> > /* report results to the stats collector, too */
> > new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;
> >
> > The fix needs a correction here also. Or change the correction in
> > lazy_vacuum_rel() function itself before updating catalog table similar
> > like stats.
> >
>
> Ah, haven't noticed that for some reason - you're right, we estimate the
> reltuples based on (num_tuples - nkeep), so it doesn't make much sense
> to subtract nkeep again. Attached is v2 of the fix.
>
> I've removed the subtraction from lazy_vacuum_rel(), leaving just
>
> new_live_tuples = new_rel_tuples;
>
> and now it behaves as expected (no second subtraction). That means we
> can get rid of new_live_tuples altogether (and the protection against
> negative values), and use new_rel_tuples directly.
>
> Which pretty much means that in this case
>
> (pg_class.reltuples == pg_stat_user_tables.n_live_tup)
>
> but I guess that's fine, based on the initial discussion in this thread.


The changes are fine, and now it reports proper live tuples in both
pg_class and the stats. The other issue, of repeated vacuum operations
leading to a decrease in the number of live tuples, is not related to
this patch and can be handled separately.

I changed the patch status to Ready for Committer.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] visual studio 2017 build support

2017-09-24 Thread Haribabu Kommi
On Fri, Sep 22, 2017 at 10:40 PM, Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
>
> On 09/21/2017 08:16 PM, Haribabu Kommi wrote:
> >
> >
> > I was about to commit this after a good bit of testing when I
> > noticed this:
> >
> > +   Building with Visual Studio 2017
> is
> > supported
> > +   down to Windows 7 SP1 and
> Windows
> > Server 2012 R2.
> >
> > I was able to build on Windows Server 2008 without a problem, so I'm
> > curious why we are saying it's not supported.
> >
> >
> > Thanks for the review.
> >
> > From the visual studio system requirements [1], in the section of
> > supported
> > operating systems, it is mentioned as windows 7 SP1 and windows server
> > 2012 R2 and didn't mentioned anything about 2008, because of this reason,
> > I mentioned as that it supported till the above operating systems. As
> > I don't
> > have windows server 2008 system availability, so I didn't verify the
> same.
> >
> > The visual studio 2017 product itself is not mentioned as that it
> supports
> > windows server 2008, can we go ahead and mention it in our documentation?
> >
> > [1] -
> > https://www.visualstudio.com/en-us/productinfo/vs2017-
> system-requirements-vs
> >
> >
>
> That page also says:
>
>
> Microsoft Visual Studio Build Tools 2017
>
> Also installs on Windows Server 2008 R2 SP1
>
>
> So I'm inclined to adjust the documentation accordingly.


Thanks for pointing it out; I missed checking the Build Tools support
section. Attached is the updated patch with the documentation changed to
include the Windows Server 2008 R2 SP1 operating system as well.

Regards,
Hari Babu
Fujitsu Australia


0001-Support-of-PostgreSQL-build-with-visual-studio-2017_v3.patch
Description: Binary data



Re: [HACKERS] visual studio 2017 build support

2017-09-21 Thread Haribabu Kommi
On Fri, Sep 22, 2017 at 7:03 AM, Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
>
> On 08/25/2017 11:29 PM, Haribabu Kommi wrote:
> >
> >
> > On Fri, Aug 25, 2017 at 11:27 PM, Christian Ullrich
> > <ch...@chrullrich.net <mailto:ch...@chrullrich.net>> wrote:
> >
> > * On 2017-06-21 02:06, Haribabu Kommi wrote:
> >
> > Thanks for the review. Here I attached an updated patch with
> > README update.
> >
> >
> > Hello,
> >
> >
> > Thanks for the review.
> >
> >
> > the most recent update to VS 2017, version 15.3, now identifies as
> > "14.11" rather than "14.10" in the output of nmake /?. Simply
> > adding this value to the two places that check for 14.10 in your
> > patch appears to work for me.
> >
> >
> > VS 2017 doesn't change the nmake version to 15, and it is updating
> > with every minor version, so I changed the check to accept
> > everything that is greater than 14.10 and eq 15, in case in future if
> > VS 2017 changes the version number.
> >
> >
> > In a newly created project, PlatformToolset is still "v141".
> > ToolsVersion is "15.0" whereas your patch uses "14.1".
> >
> > ISTM that the ToolsVersion has been like this in all versions of
> > VS 2017; in my collection of .vcxproj files the auto-generated
> > PostgreSQL projects are the only ones using "14.1".
> >
> >
> > Updated the Tools version to 15.0 and kept the platform toolset as
> > V141, this because the toolset is version is still points
> > to V141, when I create a sample project with VS 2017 and the version
> > number is inline with nmake version also.
> >
> >
> >
>
>
> I was about to commit this after a good bit of testing when I noticed this:
>
> +   Building with Visual Studio 2017 is
> supported
> +   down to Windows 7 SP1 and Windows
> Server 2012 R2.
>
> I was able to build on Windows Server 2008 without a problem, so I'm
> curious why we are saying it's not supported.
>

Thanks for the review.

From the Visual Studio system requirements [1], in the section on
supported operating systems, Windows 7 SP1 and Windows Server 2012 R2 are
mentioned and nothing is said about 2008; for this reason I documented it
as supported down to those operating systems. As I don't have a Windows
Server 2008 system available, I couldn't verify it.

The Visual Studio 2017 product itself is not documented as supporting
Windows Server 2008; can we go ahead and mention it in our documentation?

[1] -
https://www.visualstudio.com/en-us/productinfo/vs2017-system-requirements-vs

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Windows warnings from VS 2017

2017-09-21 Thread Haribabu Kommi
On Thu, Sep 21, 2017 at 12:26 PM, Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
>
> On 09/20/2017 08:18 PM, Andrew Dunstan wrote:
> >
> > On 09/20/2017 07:54 PM, Tom Lane wrote:
> >> Andrew Dunstan  writes:
> >>> It's also warning that it will copy 16 bytes to a 13 byte structure at
> >>> lines 518, 1293 and 1294 of src/backend/commands/dbcommands.c. I
> haven't
> >>> seen any ill effects of this so far, but it seems to indicate that
> >>> something is possibly amiss on this compiler with the MemSet macros.
> >> That's weird.  Is it too stupid to figure out that the if() inside
> >> MemSet evaluates to constant false in these calls?  It seems hard to
> >> see how it would realize that the loop will write 16 bytes if it doesn't
> >> propagate the constant value forward.
> >>
> >> However ... on some other compilers, I've noticed that the compiler
> seems
> >> more likely to make "obvious" deductions of that sort if the variables
> in
> >> question are marked const.  Does it help if you do
> >>
> >> -void   *_vstart = (void *) (start); \
> >> -int _val = (val); \
> >> -Size_len = (len); \
> >> +void   * const _vstart = (void *) (start); \
> >> +const int   _val = (val); \
> >> +const Size  _len = (len); \
> >>
> >>
> >> I don't think there's any strong reason not to just do s/MemSet/memset/
> >> in these calls and nearby ones, but it would be good to understand just
> >> what's wrong here.  And why it's only showing up in that file; seems
> >> nearly certain that we have similar coding elsewhere.
> >>
> >>
> >
> > I'll test it.
> >
>
>
> Doesn't make a difference. I agree it would be good to understand what's
> going on.


These warnings are not produced when the build is run in DEBUG mode;
because of this, I didn't see these warnings when I was working with VS 2017.

This may be an issue with the VS 2017 compiler.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] SERIALIZABLE with parallel query

2017-09-18 Thread Haribabu Kommi
On Tue, Sep 19, 2017 at 11:42 AM, Thomas Munro <
thomas.mu...@enterprisedb.com> wrote:

> On Fri, Sep 1, 2017 at 5:11 PM, Thomas Munro
>  wrote:
> > On Wed, Jun 28, 2017 at 11:21 AM, Thomas Munro
> >  wrote:
> >> [ssi-parallel-v5.patch]
> >
> > Rebased.
>
> Rebased again.
>

During testing of this patch, I found a behavior difference with parallel
query support while experimenting with the test case provided in the
patch.

I tested the v6 patch, but I don't think this version contains any fixes
other than the rebase.
Test steps:

CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);

Session -1:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM bank_account WHERE id = 'Y';

Session -2:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET max_parallel_workers_per_gather = 2;
SET force_parallel_mode = on;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set enable_indexscan = off;
set enable_bitmapscan = off;

SELECT balance FROM bank_account WHERE id = 'X';

Session -1:

update bank_account set balance = 10 where id = 'X';

Session -2:

update bank_account set balance = 10 where id = 'Y';
ERROR:  could not serialize access due to read/write dependencies among
transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried.

Without the parallel query for the SELECT statement in session 2,
the UPDATE statement in session 2 succeeds.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] utility commands benefiting from parallel plan

2017-09-15 Thread Haribabu Kommi
On Thu, Sep 14, 2017 at 2:42 PM, Rafia Sabih <rafia.sa...@enterprisedb.com>
wrote:

> On Wed, Sep 13, 2017 at 2:29 PM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> >
> > On Wed, Sep 13, 2017 at 4:17 PM, Rafia Sabih <
> rafia.sa...@enterprisedb.com>
> > wrote:
> >>
> >> On Fri, Sep 1, 2017 at 12:31 PM, Haribabu Kommi
> >> <kommi.harib...@gmail.com> wrote:
> >> >
> >> > Hi All,
> >> >
> >> > Attached a rebased patch that supports parallelism for the queries
> >> > that are underneath of some utility commands such as CREATE TABLE AS
> >> > and CREATE MATERIALIZED VIEW.
> >> >
> >> > Note: This patch doesn't make the utility statement (insert operation)
> >> > to run in parallel. It only allows the select query to be parallel if
> >> > the
> >> > query
> >> > is eligible for parallel.
> >> >
> >>
> >> Here is my feedback fro this patch,
> >>
> >> - The patch is working as expected, all regression tests are passing
> >
> >
> > Thanks for the review.
> >
> >>
> >> - I agree with Dilip that having similar mechanism for 'insert into
> >> select...' statements would add more value to the patch, but even then
> >> this looks like a good idea to extend parallelism for atleast a few of
> >> the write operations
> >
> >
> > Yes, I also agree that supporting of 'insert into select' will provide
> more
> > benefit. I already tried to support the same in [1], but it have many
> > drawbacks especially with triggers. To support a proper parallel support
> > for DML queries, I feel the logic of ParalleMode needs an update to
> > avoid the errors from PreventCommandIfParallelMode() function to
> > identify whether it is nested query operation and that should execute
> > only in backend and etc.
> >
> > As the current patch falls into DDL category that gets benefited from
> > parallel query, because of this reason, I didn't add the 'insert into
> > select'
> > support into this patch. Without support of it also, it provides the
> > benefit.
> > I work on supporting the DML write support with parallel query as a
> > separate patch.
> >
> Sounds sensible. In that case, I'll be marking this patch ready for
> committer.


Thanks for the review.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] utility commands benefiting from parallel plan

2017-09-13 Thread Haribabu Kommi
On Wed, Sep 13, 2017 at 4:17 PM, Rafia Sabih <rafia.sa...@enterprisedb.com>
wrote:

> On Fri, Sep 1, 2017 at 12:31 PM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> > Hi All,
> >
> > Attached a rebased patch that supports parallelism for the queries
> > that are underneath of some utility commands such as CREATE TABLE AS
> > and CREATE MATERIALIZED VIEW.
> >
> > Note: This patch doesn't make the utility statement (insert operation)
> > to run in parallel. It only allows the select query to be parallel if the
> > query
> > is eligible for parallel.
> >
>
> Here is my feedback fro this patch,
>
> - The patch is working as expected, all regression tests are passing
>

Thanks for the review.


> - I agree with Dilip that having similar mechanism for 'insert into
> select...' statements would add more value to the patch, but even then
> this looks like a good idea to extend parallelism for atleast a few of
> the write operations
>

Yes, I also agree that supporting 'INSERT INTO ... SELECT' would provide
more benefit. I already tried to support it in [1], but it has many
drawbacks, especially with triggers. To provide proper parallel support
for DML queries, I feel the ParallelMode logic needs an update to avoid
the errors from the PreventCommandIfParallelMode() function, by
identifying whether it is a nested query operation that should execute
only in the backend, and so on.

As the current patch falls into the DDL category that benefits from
parallel query, I didn't add the 'INSERT INTO ... SELECT' support to this
patch; even without it, the patch provides a benefit. I will work on DML
write support with parallel query as a separate patch.


[1] - https://www.postgresql.org/message-id/CAJrrPGfo58TrYxnqwnFAo4%2BtYr8wUH-oC0dJ7V9x7gAOZeaz%2BQ%40mail.gmail.com


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] pg_stat_wal_write statistics view

2017-09-12 Thread Haribabu Kommi
On Tue, Sep 12, 2017 at 3:14 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
wrote:

> On Tue, Sep 12, 2017 at 9:06 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> >
> > On Tue, Sep 12, 2017 at 2:04 AM, Kuntal Ghosh <
> kuntalghosh.2...@gmail.com>
> > wrote:
> >
> Thanks for the patch.
> + * Check whether the current process is a normal backend or not.
> + * This function checks for the background processes that does
> + * some WAL write activity only and other background processes
> + * are not considered. It considers all the background workers
> + * as WAL write activity workers.
> + *
> + * Returns false - when the current process is a normal backend
> + *true - when the current process a background process/worker
> + */
> +static bool
> +am_background_process()
> +{
> +   /* check whether current process is a background process/worker? */
> +   if (!AmBackgroundWriterProcess() ||
> +   !AmCheckpointerProcess() ||
> +   !AmStartupProcess() ||
> +   !IsBackgroundWorker ||
> +   !am_walsender ||
> +   !am_autovacuum_worker)
> +   {
> +   return false;
> +   }
> +
> +   return true;
> +}
> I think you've to do AND operation here instead of OR. Isn't it?
> Another point is that, the function description could start with
> 'Check whether the current process is a background process/worker.'
>

Yes, it should be AND; while addressing a review comment, I messed up
that function.
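
For clarity, with the operators corrected the check should read roughly
as follows (same identifiers as in the quoted patch code, only || changed
to &&):

/*
 * The process is a normal backend only when it is none of the listed
 * background processes, so the negated checks are combined with &&.
 */
static bool
am_background_process(void)
{
    if (!AmBackgroundWriterProcess() &&
        !AmCheckpointerProcess() &&
        !AmStartupProcess() &&
        !IsBackgroundWorker &&
        !am_walsender &&
        !am_autovacuum_worker)
        return false;

    return true;
}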

> There is an overhead with IO time calculation. Is the view is good
> > enough without IO columns?
> I'm not sure how IO columns are useful for tuning the WAL write/fsync
> performance from an user's perspective. But, it's definitely useful
> for developing/improving stuffs in XLogWrite.
>

I ran the latest performance tests with and without IO times; there is an
overhead involved with the IO time calculation, but I didn't observe any
performance overhead with the normal stats. Maybe we can enable the IO
stats only in a development environment to find them out?


> >
> > And also during my tests, I didn't observe any other background
> > processes performing the xlogwrite operation, the values are always
> > zero. Is it fine to merge them with backend columns?
> >
> Apart from wal writer process, I don't see any reason why you should
> track other background processes separately from normal backends.
> However, I may be missing some important point.


I added the other background stats to find out how much WAL writing is
carried out by the other background processes. Now I am able to collect
the stats for the other background processes as well after the pgbench
test, so I feel the separate background stats may be useful.

Attached are the latest patch, the performance test results, and the
stats details both with separate background stats and with them combined
into the backend columns, including the IO stats.


Regards,
Hari Babu
Fujitsu Australia
stats with separate background process stats info:

 writes                 | 256004
 walwriter_writes       | 14223300
 backend_writes         | 439408129
 dirty_writes           | 0
 walwriter_dirty_writes | 0
 backend_dirty_writes   | 65933
 write_blocks           | 3018749
 walwriter_write_blocks | 287733552
 backend_write_blocks   | 1756612506
 write_time             | 0
 walwriter_write_time   | 0
 backend_write_time     | 0
 sync_time              | 0
 walwriter_sync_time    | 0
 backend_sync_time      | 0
 stats_reset            | 2017-09-12 19:21:03.103784+10
(1 row)


stats with background info with IO time:

 writes                 | 458362
 walwriter_writes       | 27245324
 backend_writes         | 881576768
 dirty_writes           | 0
 walwriter_dirty_writes | 0
 backend_dirty_writes   | 65933
 write_blocks           | 3551641
 walwriter_write_blocks | 304509489
 backend_write_blocks   | 2767649450
 write_time             | 0
 walwriter_write_time   | 0
 backend_write_time     | 0

Re: [HACKERS] Pluggable storage

2017-09-11 Thread Haribabu Kommi
On Sat, Sep 9, 2017 at 1:23 PM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

>
> I rebased the patch to the latest master and also fixed the duplicate OID
> and some slot fixes. Updated patches are attached.
>


While analyzing the removal of HeapScanDesc usage outside the heap
modules, I found that the most heavily used member is "rs_cbuf", used for
locking the buffer, visibility checks, and marking the buffer dirty. The
buffer is tightly integrated with visibility. A buffer may not be
required for some storage routines, for example where the data always
lives in memory.

I found references to its structure members in the following places. I
feel that, other than the following four members, the rest need to be
moved into their corresponding storage routines.

Relation rs_rd; /* heap relation descriptor */
Snapshot rs_snapshot; /* snapshot to see */
int rs_nkeys; /* number of scan keys */
ScanKey rs_key; /* array of scan key descriptors */

But currently I am treating "rs_cbuf" as a needed member too, and I
expect all storage routines to provide it. Or we may need another
approach to mark the buffer as dirty.

Suggestions?
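
For illustration only, if just those four members (plus rs_cbuf) stayed
in the common part, the shape could be something like the struct below;
the type name is hypothetical and not from the patch, and everything else
would move into each storage routine's private scan state.

/*
 * Hypothetical sketch of a "common" scan descriptor holding only the
 * members listed above.
 */
typedef struct CommonScanDescData
{
    Relation    rs_rd;          /* relation descriptor */
    Snapshot    rs_snapshot;    /* snapshot to see */
    int         rs_nkeys;       /* number of scan keys */
    ScanKey     rs_key;         /* array of scan key descriptors */
    Buffer      rs_cbuf;        /* current buffer, if the AM uses buffers */
} CommonScanDescData;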


The following are the rest of the members that are used outside the heap.


BlockNumber rs_nblocks; /* total number of blocks in rel */

pgstattuple.c, tsm_system_rows.c, tsm_system_time.c, system.c,
nodeBitmapHeapscan.c, nodesamplescan.c
Mostly for checking the number of blocks in a rel.

BufferAccessStrategy rs_strategy; /* access strategy for reads */

pgstattuple.c

bool rs_pageatatime; /* verify visibility page-at-a-time? */
BlockNumber rs_startblock; /* block # to start at */
bool rs_syncscan; /* report location to syncscan logic? */
bool rs_inited; /* false = scan not init'd yet */

nodesamplescan.c

HeapTupleData rs_ctup; /* current tuple in scan, if any */

genam.c, nodeBitmapHeapscan.c, nodesamplescan.c
Used for retrieving the last scanned tuple.

BlockNumber rs_cblock; /* current block # in scan, if any */

index.c, nodesamplescan.c

Buffer rs_cbuf; /* current buffer in scan, if any */

pgrowlocks.c, pgstattuple.c, genam.c, index.c, cluster.c,
tablecmds.c, nodeBitmapHeapscan.c, nodesamplescan.c
Mostly used for locking the buffer.

ParallelHeapScanDesc rs_parallel; /* parallel scan information */

nodeseqscan.c

int rs_cindex; /* current tuple's index in vistuples */

nodeBitmapHeapScan.c

int rs_ntuples; /* number of visible tuples on page */
OffsetNumber rs_vistuples[MaxHeapTuplesPerPage]; /* their offsets */

tsm_system_rows.c, nodeBitmapHeapscan.c, nodesamplescan.c
Used to retrieve the offsets, mainly in Bitmap and sample scans.

I think the usage of the rest of the above members outside the heap can
be changed once the Bitmap and Sample scans are modified to use the
storage routines for returning tuples instead of their own
implementations. I feel these scans are the major users of the rest of
the members. This approach may need some more APIs to get rid of the
Bitmap and sample scans' own implementations.

Suggestions?


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] pg_stat_wal_write statistics view

2017-09-11 Thread Haribabu Kommi
On Tue, Sep 12, 2017 at 2:04 AM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
wrote:

> On Wed, Sep 6, 2017 at 9:16 AM, Haribabu Kommi <kommi.harib...@gmail.com>
> wrote:
> >
> > Attached the latest patch and performance report.
> >
> While looking into the patch, I realized that a normal backend has to
> check almost 10 if conditions at worst case inside XLogWrite(6 in
> am_background_process method, 1 for write, 1 for blocks, 2 for
> fsyncs), just to decide whether it is a normal backend or not. The
> count is more for walwriter process. Although I've not done any
> performance tests, IMHO, it might add further overhead on the
> XLogWrite Lock.
>
> I was thinking whether it is possible to collect all the stats in
> XLogWrite() irrespective of the type of backend and update the shared
> counters at once at the end of the function. Thoughts?
>

Thanks for the review.
Yes, I agree with you that the stats update can be done at the end
of the function to avoid some overhead. Updated patch is attached.
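
To sketch what I mean (illustration only, not code from the patch; the
structure and names below are hypothetical): XLogWrite() would count
locally while it runs and publish to the shared counters once at the end,
instead of branching on the process type for every write.

typedef struct WalWriteCounters
{
    slock_t     mutex;
    uint64      backend_writes;
    uint64      background_writes;
} WalWriteCounters;

extern WalWriteCounters *WalWriteCountersShared;    /* hypothetical shared area */

static void
publish_walwrite_counts(uint64 nwrites, bool am_background)
{
    WalWriteCounters *counters = WalWriteCountersShared;

    SpinLockAcquire(&counters->mutex);
    if (am_background)
        counters->background_writes += nwrites;
    else
        counters->backend_writes += nwrites;
    SpinLockRelease(&counters->mutex);
}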

There is an overhead with the IO time calculation. Is the view good
enough without the IO columns?

Also, during my tests I didn't observe any other background processes
performing the XLogWrite operation; the values are always zero. Is it
fine to merge them with the backend columns?

Regards,
Hari Babu
Fujitsu Australia


0001-pg_stat_walwrites-statistics-view_v7.patch
Description: Binary data



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-09-08 Thread Haribabu Kommi
On Fri, Sep 8, 2017 at 10:24 AM, Thomas Munro <thomas.mu...@enterprisedb.com
> wrote:

> On Mon, Aug 21, 2017 at 4:35 PM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > On Tue, Aug 15, 2017 at 7:29 AM, Peter Eisentraut
> > <peter.eisentr...@2ndquadrant.com> wrote:
> >> On 4/4/17 01:06, Haribabu Kommi wrote:
> >> > Both pg_dump and pg_upgrade tests are passed. Updated patch attached
> >> > I will add this patch to the next commitfest.
> >>
> >> This patch needs to be rebased for the upcoming commit fest.
> >
> > Thanks for checking. Rebased patch is attached.
>
> Hi Haribabu,
>
> This patch breaks the documentation build, possibly because of these empty
> tags:
>
> +--create option to dump <>ALTER ROLE IN
> DATABASE ... SET
>

Thanks for checking the patch.
Fixed patch is attached.


Regards,
Hari Babu
Fujitsu Australia


0001-pg_dump-and-pg_dumpall-database-handling-refactoring_v7.patch
Description: Binary data



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-09-06 Thread Haribabu Kommi
On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra 
wrote:

> On 7/25/17 12:55 AM, Tom Lane wrote:
>
>> Tomas Vondra  writes:
>>
>>> It seems to me that VACUUM and ANALYZE somewhat disagree on what
>>> exactly reltuples means. VACUUM seems to be thinking that reltuples
>>> = live + dead while ANALYZE apparently believes that reltuples =
>>> live
>>>
>>
>> The question is - which of the reltuples definitions is the right
>>> one? I've always assumed that "reltuples = live + dead" but perhaps
>>> not?
>>>
>>
>> I think the planner basically assumes that reltuples is the live
>> tuple count, so maybe we'd better change VACUUM to get in step.
>>
>>
> Attached is a patch that (I think) does just that. The disagreement was
> caused by VACUUM treating recently dead tuples as live, while ANALYZE
> treats both of those as dead.
>
> At first I was worried that this will negatively affect plans in the
> long-running transaction, as it will get underestimates (due to reltuples
> not including rows it can see). But that's a problem we already have
> anyway, you just need to run ANALYZE in the other session.


Thanks for the patch.
From the mail, I understand that this patch improves how the vacuum command
updates the reltuples value in the catalog table, so that it counts only the
properly visible tuples, similar to the analyze command.

- num_tuples);
+ num_tuples - nkeep);

With the above correction, there is a problem in reporting the number
of live tuples to the stats.

postgres=# select reltuples, n_live_tup, n_dead_tup
  from pg_stat_user_tables join pg_class using (relname)
 where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899818 |     799636 |     100182
(1 row)


The live tuples value is decremented by the dead tuples value once more
before it is sent to the stats collector in lazy_vacuum_rel(),

/* report results to the stats collector, too */
new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;

The fix needs a corresponding correction here as well. Alternatively, make
the adjustment in lazy_vacuum_rel() itself before updating the catalog table,
similar to what is done for the stats.
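
Roughly, the additional correction I have in mind looks like this (just a
sketch to show the idea, not a tested change):

/* in lazy_vacuum_rel(): new_rel_tuples now already excludes dead tuples,
 * so don't subtract them a second time before reporting to the stats
 * collector.
 */
new_live_tuples = new_rel_tuples;
if (new_live_tuples < 0)
    new_live_tuples = 0;

pgstat_report_vacuum(RelationGetRelid(onerel),
                     onerel->rd_rel->relisshared,
                     new_live_tuples,
                     vacrelstats->new_dead_tuples);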


While testing this patch, I found another problem that is not related to it:
when the vacuum command is executed multiple times on a table with no dead
rows, the reltuples value slowly decreases.

postgres=# select reltuples, n_live_tup, n_dead_tup
  from pg_stat_user_tables join pg_class using (relname)
 where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899674 |     899674 |          0
(1 row)

postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
  from pg_stat_user_tables join pg_class using (relname)
 where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899622 |     899622 |          0
(1 row)

postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
  from pg_stat_user_tables join pg_class using (relname)
 where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899570 |     899570 |          0
(1 row)


In lazy_scan_heap(), we force a scan of the last page of the relation, to
avoid the access exclusive lock in lazy_truncate_heap when there are tuples
in the last page. Because of this, the scanned_pages value will never be 0,
so the vac_estimate_reltuples function estimates the tuple count based on
the number of tuples in that last page alone. This estimation keeps reducing
the reltuples value.

I am wondering whether this problem really happens in real-world scenarios
often enough to warrant a fix?

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] pg_stat_wal_write statistics view

2017-09-05 Thread Haribabu Kommi
On Tue, Aug 15, 2017 at 7:39 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/29/17 22:10, Haribabu Kommi wrote:
> > Updated patch to use shared counter instead of adding pg_stat_ calls to
> send
> > the statistics from each background process/worker.
>
> Your patch needs to be rebased and the OID assignments updated.
>

A rebased patch is attached. It also takes care of the comments provided by
Andres and Fujii Masao upthread.

I separated the walwriter statistics from those of the other background
processes, to check how much the other processes really contribute to WAL
writing.

Following are the results from my performance test.

-[ RECORD 1 ]----------+------------------------------
writes                 | 0
walwriter_writes       | 17748394
backend_writes         | 1383789657
dirty_writes           | 0
walwriter_dirty_writes | 0
backend_dirty_writes   | 0
write_blocks           | 0
walwriter_write_blocks | 21153194
backend_write_blocks   | 3039806652
write_time             | 0
walwriter_write_time   | 0
backend_write_time     | 0
sync_time              | 0
walwriter_sync_time    | 259250230
backend_sync_time      | 17262560725
stats_reset            | 2017-09-05 18:22:41.087955+10

I didn't find any contribution to the WAL writing activity from other
background processes. Maybe we can combine them with the backend stats?

I ran a pgbench performance test on this patch to find out the impact of
these changes. Because of my low-end machine the results vary too much; I
will try to get performance figures from a high-end machine in the meantime.

Attached the latest patch and performance report.

Regards,
Hari Babu
Fujitsu Australia


0001-pg_stat_walwrites-statistics-view_v6.patch
Description: Binary data


pg_stat_walwrites_perf.xlsx
Description: MS-Excel 2007 spreadsheet



Re: [HACKERS] Allow INSTEAD OF DELETE triggers to modify the tuple for RETURNING

2017-09-05 Thread Haribabu Kommi
On Mon, Aug 14, 2017 at 6:48 AM, Marko Tiikkaja  wrote:

> On Fri, Jul 1, 2016 at 2:12 AM, I wrote:
>
>> Currently the tuple returned by INSTEAD OF triggers on DELETEs is only
>> used to determine whether to pretend that the DELETE happened or not, which
>> is often not helpful enough; for example, the actual tuple might have been
>> concurrently UPDATEd by another transaction and one or more of the columns
>> now hold values different from those in the planSlot tuple. Attached is an
>> example case which is impossible to properly implement under the current
>> behavior.  For what it's worth, the current behavior seems to be an
>> accident; before INSTEAD OF triggers either the tuple was already locked
>> (in case of BEFORE triggers) or the actual pre-DELETE version of the tuple
>> was fetched from the heap.
>>
>> So I'm suggesting to change this behavior and allow INSTEAD OF DELETE
>> triggers to modify the OLD tuple and use that for RETURNING instead of
>> returning the tuple in planSlot.  Attached is a WIP patch implementing that.
>>
>> Is there any reason why we wouldn't want to change the current behavior?
>
>
> Since nobody seems to have came up with a reason, here's a patch for that
> with test cases and some documentation changes.  I'll also be adding this
> to the open commit fest, as is customary.
>

Thanks for the patch. It makes the DELETE RETURNING clause return the actual
row as modified by the trigger.

Compilation and tests pass. I have some review comments.

! that was provided.  Likewise, for DELETE operations the
! OLD variable can be modified before returning it, and
! the changes will be reflected in the output data.

The above explanation is not very clear; how about the following?

Likewise, for DELETE operations the trigger may
modify the OLD row before returning it, and the
change will be reflected in the output data of DELETE RETURNING.


! TupleTableSlot *
  ExecIRDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
! HeapTuple trigtuple, TupleTableSlot *slot)

! oldtuple = ExecMaterializeSlot(slot); --nodeModifyTable.c

The trigtuple is part of the slot anyway, so I feel there is no need to pass
the trigtuple separately. The tuple can be formed internally by materializing
the slot.

Or

Don't materialize the slot before the ExecIRDeleteTriggers function
call.
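
For the first of those two options, the signature could become roughly the
following (only a sketch of the suggestion, not code from the patch):

TupleTableSlot *
ExecIRDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
                     TupleTableSlot *slot);

/* inside the function, the tuple can be obtained with
 *     HeapTuple trigtuple = ExecMaterializeSlot(slot);
 * so the caller in nodeModifyTable.c no longer has to materialize the slot
 * and pass the tuple separately.
 */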

! /*
! * Return the modified tuple using the es_trig_tuple_slot.  We assume
! * the tuple was allocated in per-tuple memory context, and therefore
! * will go away by itself. The tuple table slot should not try to
! * clear it.
! */
! TupleTableSlot *newslot = estate->es_trig_tuple_slot;

The input slot that is passed to ExecIRDeleteTriggers is the same as
estate->es_trig_tuple_slot, and the tuple descriptor is the same as well.
Instead of using newslot, it is fine to use the slot directly.


+ /* trigger might have changed tuple */
+ oldtuple = ExecMaterializeSlot(slot);


+ if (resultRelInfo->ri_TrigDesc &&
+ resultRelInfo->ri_TrigDesc->trig_delete_instead_row)
+ return ExecProcessReturning(resultRelInfo, slot, planSlot);


Views cannot have before/after row triggers. Once the call enters the
INSTEAD OF trigger flow, the oldtuple is used to build the slot if the
RETURNING clause is present. But in the INSTEAD OF trigger case the call
returns early, as shown above, so the framed old tuple is never used.

Either change the RETURNING logic for INSTEAD OF triggers, or remove the
generation of oldtuple after the INSTEAD OF trigger call.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Visual Studio 2017 Build Support

2017-09-01 Thread Haribabu Kommi
On Fri, Sep 1, 2017 at 11:06 AM, Tanay Varma <tanay.va...@microsoft.com>
wrote:

>
>
> Hello,
>
>
>
> This is with respect to the original thread on “visual studio 2017 build
> support” created by Haribabu Kommi (kommi.harib...@gmail.com).
>
>
> <https://www.postgresql.org/message-id/CAJrrPGcZpraBCe6fJ963kVzKdM7AWPTYmXJ=8neap87wed9...@mail.gmail.com>
>
> https://www.postgresql.org/message-id/CAJrrPGcZpraBCe6fJ963kVzKdM7AW
> PTYmXJ=8neap87wed9...@mail.gmail.com
>
>
>
> Firstly, I would like to thank Haribabu Kommi for authoring the patch.
>
>
>
> I am posting a small update to the final patch submitted by Haribabu Kommi
> to also support the recent* v15.3 Release of Visual Stuido 2017 which
> upgrades the VC tools to version 14.11.*
>
>
>
> It would be great if this patch could be accepted so that Postgres could
> be built using the latest VS tools.
>
>
Hi,

Thanks for reviewing and updating the patch to support the latest VS 2017.
In [1], I already posted an updated patch that supports the latest VS 2017;
maybe you missed it. That patch also includes a fix for any future version
number updates from VS 2017.

[1] -
https://www.postgresql.org/message-id/CAJrrPGe%3D4jMkREOffnaDU93OerJwkVborPGE5O1Z1h1Jj-hUrg%40mail.gmail.com

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] utility commands benefiting from parallel plan

2017-09-01 Thread Haribabu Kommi
Hi All,

Attached a rebased patch that supports parallelism for the queries
that are underneath of some utility commands such as CREATE TABLE AS
and CREATE MATERIALIZED VIEW.

Note: This patch doesn't make the utility statement (insert operation)
to run in parallel. It only allows the select query to be parallel if the
query
is eligible for parallel.

Regards,
Hari Babu
Fujitsu Australia


0001-Make-parallel-eligible-for-utility-commands-undernea_V3.patch
Description: Binary data



Re: [HACKERS] Pluggable storage

2017-08-25 Thread Haribabu Kommi
On Wed, Aug 23, 2017 at 11:59 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Wed, Aug 23, 2017 at 11:05 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> >
> > On Mon, Aug 21, 2017 at 7:25 PM, Amit Kapila <amit.kapil...@gmail.com>
> > wrote:
> >>
> >> On Mon, Aug 21, 2017 at 12:58 PM, Haribabu Kommi
> >> <kommi.harib...@gmail.com> wrote:
> >> >
> >> > On Sun, Aug 13, 2017 at 5:17 PM, Amit Kapila <amit.kapil...@gmail.com
> >
> >> > wrote:
> >> >>
> >> >>
> >> >> Also, it is quite possible that some of the storage Am's don't even
> >> >> want to return bool as a parameter from HeapTupleSatisfies* API's.  I
> >> >> guess what we need here is to provide a way so that different storage
> >> >> am's can register their function pointer for an equivalent to
> >> >> satisfies function.  So, we need to change
> >> >> SnapshotData.SnapshotSatisfiesFunc in some way so that different
> >> >> handlers can register their function instead of using that directly.
> >> >> I think that should address the problem you are planning to solve by
> >> >> omitting buffer parameter.
> >> >
> >> >
> >> > Thanks for your suggestion. Yes, it is better to go in the direction
> of
> >> > SnapshotSatisfiesFunc.
> >> >
> >> > I verified the above idea of implementing the Tuple visibility
> functions
> >> > and assign them into the snapshotData structure based on the snapshot.
> >> >
> >> > The Tuple visibility functions that are specific to the relation are
> >> > available
> >> > with the RelationData structure and this structure may not be
> available,
> >> >
> >>
> >> Which functions are you referring here?  I don't see anything in
> >> tqual.h that uses RelationData.
> >
> >
> >
> > With storage API's, the tuple visibility functions are available with
> > RelationData
> > and those are needs used to update the SnapshotData structure
> > SnapshotSatisfiesFunc member.
> >
> > But the RelationData is not available everywhere, where the snapshot is
> > created,
> > but it is available every place where the tuple visibility is checked.
> So I
> > just changed
> > the way of checking the tuple visibility with the information of
> snapshot by
> > calling
> > the corresponding tuple visibility function from RelationData.
> >
> > If SnapshotData provides MVCC, then the MVCC specific tuple visibility
> > function from
> > RelationData is called. The SnapshotSatisfiesFunc member is changed to a
> > enum
> > that holds the tuple visibility type such as MVCC, DIRTY, SELF and etc.
> > Whenever
> > the visibility check is needed, the corresponding function is called.
> >
>
> It will be easy to understand and see if there is some better
> alternative once you have something in the form of a patch.


Sorry for the delay.

I will submit the new patch series, addressing all the comments given
upthread, to the upcoming commitfest.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] visual studio 2017 build support

2017-08-25 Thread Haribabu Kommi
On Fri, Aug 25, 2017 at 11:27 PM, Christian Ullrich <ch...@chrullrich.net>
wrote:

> * On 2017-06-21 02:06, Haribabu Kommi wrote:
>
> Thanks for the review. Here I attached an updated patch with README update.
>>
>
> Hello,
>

Thanks for the review.


> the most recent update to VS 2017, version 15.3, now identifies as "14.11"
> rather than "14.10" in the output of nmake /?. Simply adding this value to
> the two places that check for 14.10 in your patch appears to work for me.
>

VS 2017 doesn't change the nmake version to 15, and the version changes with
every minor update, so I changed the check to accept everything greater than
14.10, as well as 15, in case a future VS 2017 update changes the version
number.


> In a newly created project, PlatformToolset is still "v141". ToolsVersion
> is "15.0" whereas your patch uses "14.1".
>
> ISTM that the ToolsVersion has been like this in all versions of VS 2017;
> in my collection of .vcxproj files the auto-generated PostgreSQL projects
> are the only ones using "14.1".
>

I updated the ToolsVersion to 15.0 and kept the platform toolset as v141,
because the toolset version still points to v141 when I create a sample
project with VS 2017, and the version number is then in line with the nmake
version as well.


Regards,
Hari Babu
Fujitsu Australia


0001-vs-2017-support_v2.patch
Description: Binary data



Re: [HACKERS] Pluggable storage

2017-08-22 Thread Haribabu Kommi
On Mon, Aug 21, 2017 at 7:25 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Mon, Aug 21, 2017 at 12:58 PM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> > On Sun, Aug 13, 2017 at 5:17 PM, Amit Kapila <amit.kapil...@gmail.com>
> > wrote:
> >>
> >>
> >> Also, it is quite possible that some of the storage Am's don't even
> >> want to return bool as a parameter from HeapTupleSatisfies* API's.  I
> >> guess what we need here is to provide a way so that different storage
> >> am's can register their function pointer for an equivalent to
> >> satisfies function.  So, we need to change
> >> SnapshotData.SnapshotSatisfiesFunc in some way so that different
> >> handlers can register their function instead of using that directly.
> >> I think that should address the problem you are planning to solve by
> >> omitting buffer parameter.
> >
> >
> > Thanks for your suggestion. Yes, it is better to go in the direction of
> > SnapshotSatisfiesFunc.
> >
> > I verified the above idea of implementing the Tuple visibility functions
> > and assign them into the snapshotData structure based on the snapshot.
> >
> > The Tuple visibility functions that are specific to the relation are
> > available
> > with the RelationData structure and this structure may not be available,
> >
>
> Which functions are you referring here?  I don't see anything in
> tqual.h that uses RelationData.



With the storage APIs, the tuple visibility functions are available through
RelationData, and those need to be used to set the SnapshotSatisfiesFunc
member of the SnapshotData structure.

But RelationData is not available everywhere a snapshot is created, while it
is available at every place where tuple visibility is checked. So I changed
the visibility check to use the information in the snapshot and call the
corresponding tuple visibility function from RelationData.

If the SnapshotData asks for MVCC, the MVCC-specific tuple visibility
function from RelationData is called. The SnapshotSatisfiesFunc member is
changed to an enum that holds the tuple visibility type, such as MVCC, DIRTY,
SELF and so on. Whenever a visibility check is needed, the corresponding
function is called.
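
A minimal illustration of that shape (all names here are placeholders I made
up, not identifiers from the patch):

typedef enum TupleVisibilityType
{
    TUPLE_VISIBILITY_MVCC,
    TUPLE_VISIBILITY_SELF,
    TUPLE_VISIBILITY_DIRTY,
    TUPLE_VISIBILITY_ANY
} TupleVisibilityType;

/* SnapshotData carries the enum instead of a function pointer; at a
 * visibility check, the handler registered by the storage AM is looked up
 * through RelationData, roughly:
 *
 *     satisfies = rel->rd_stamroutine->satisfies[snapshot->visibility_type];
 *     valid = satisfies(tuple, snapshot, buffer);
 */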


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Pluggable storage

2017-08-22 Thread Haribabu Kommi
On Tue, Aug 15, 2017 at 4:53 PM, Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
>
> On 2017-06-13 11:50:27 +1000, Haribabu Kommi wrote:
> > Here I attached WIP patches to support pluggable storage. The patch
> series
> > are may not work individually. Still so many things are under
> development.
> > These patches are just to share the approach of the current development.
>
> Making a pass through the patchset to get a feel where this at, and
> where this is headed.  I previously skimmed the thread to get a rough
> sense on what's discused, but not in a very detailed manner.
>

Thanks for the review.


General:
>
> - I think one important discussion we need to have is what kind of
>   performance impact we're going to accept introducing this. It seems
>   very likely that this'll cause some slowdown.  We can kind of
>   alleviate that by doing some optimizations at the same time, but
>   nevertheless, this abstraction is going to cost.
>

OK. To take a decision here we may need some performance figures; I will
measure the performance once the APIs have stabilized.

- I don't think we should introduce this without a user besides
>   heapam. The likelihood that API will be usable by anything else
>   without a testcase seems fairly remote.  I think some src/test/modules
>   type implementation of a per-session, in-memory storage - relatively
>   easy to implement - or such is necessary.
>

Sure, I will add a test module once the APIs are stabilized.



> - I think, and detailed some of that, we're going to need some cleanups
>   that go in before this, to decrease the size / increase the quality of
>   the new APIs.  It's going to get more painful to change APIs
>   subsequently.
>
> - We'll have to document clearly that these APIs are going to change for
>   a while, even after the release introducing them.
>

Yes, that's correct. Since this is the first time we are developing storage
APIs to support pluggable storage, they may need some refinement based on how
they are used to support different storage methods.



> StorageAm - Scan stuff:
>
> - I think API isn't quite right. There's a lot of granular callback
>   functionality like scan_begin_function / scan_begin_catalog /
>   scan_begin_bm - these largely are convenience wrappers around the same
>   function, and I don't think that would, or rather should, change in
>   any abstracted storage layer.  So I think there needs to be some
>   unification first (pretty close w/ beginscan_internal already, but
>   perhaps we should get rid of a few of these wrappers).
>

OK. I will change the API to add a function for beginscan_internal and
replace the usage of the rest of the functions with beginscan_internal.
There are also many bool flags passed to beginscan_internal; I will try to
optimize those as well.



> - Some of the exposed functionality, e.g. scan_getpage,
>   scan_update_snapshot, scan_rescan_set_params looks like it should just
>   be excised, i.e. there's no reason for it to exist.
>

Currently these APIs are used only in the bitmap and sample scans. These
scan methods fully depend on the heap format. I will check how to remove
these APIs.

- Minor: don't think the _function suffix for Storis necessary, just
>   makes things long, and every member has it. Besides that, it's also
>   easy to misunderstand - for a second I understood
>   scan_getnext_function to be about getting the next function...
>

OK. How about adding _hook?



> - Scans are still represented as HeapScanDesc - I don't think that's
>   going to fly. Either this needs to be an opaque type (i.e. a struct
>   that's not defined, just forward declared), or it needs to be a base
>   struct that individual AMs embed in their own structs.  Individual AMs
>   definitely are going to need different pieces of data.
>

Currently the internal members of HeapScanDesc are used directly in many
places, especially in the bitmap and sample scans. I am yet to work out the
best way to handle these scan methods; once that is done, removing that
usage will be easy.


> Storage AM - tuple stuff:
>
> - tuple_get_{xmin, updated_xid, cmin, itempointer, ctid, heaponly} are
>   each individual functions, that seems pretty painful to maintain, and
>   v/ likely to just grow and grow. Not sure what the solution is, but
>   this seems like a hard sell.
>

OK. How about adding one API that takes a flag representing what type of
data is needed from the tuple and returns the corresponding data as void *?
The caller must cast the result to the corresponding type before using it.
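
Purely as an illustration of that shape (the names below are invented, not
from the patch):

typedef void *StorageTuple;     /* opaque tuple as returned by the storage AM */

typedef enum StorageTupleField
{
    STORAGE_TUPLE_XMIN,
    STORAGE_TUPLE_UPDATED_XID,
    STORAGE_TUPLE_CMIN,
    STORAGE_TUPLE_ITEMPOINTER,
    STORAGE_TUPLE_CTID
} StorageTupleField;

/* one accessor instead of tuple_get_xmin / tuple_get_ctid / ...;
 * the caller casts the result, e.g.
 *     xmin = *(TransactionId *) storage_tuple_get_data(tuple, STORAGE_TUPLE_XMIN);
 */
extern void *storage_tuple_get_data(StorageTuple tuple, StorageTupleField field);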

- The three *speculative* functions don't quite seem right to me, nor do
>   I understand:
> +*
> +* Setting a tuple's speculative token is a slot-only operation,
> s

Re: [HACKERS] parallelize queries containing initplans

2017-08-21 Thread Haribabu Kommi
On Mon, Aug 14, 2017 at 8:41 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Sun, Aug 13, 2017 at 6:49 PM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > On Fri, Aug 11, 2017 at 1:18 AM, Amit Kapila <amit.kapil...@gmail.com>
> > wrote:
> >>
> >
> > Thanks for the updated patch. Patch looks fine. I just have some
> > minor comments.
> >
> > + * ExecEvalParamExecParams
> > + *
> > + * Execute the subplan stored in PARAM_EXEC initplans params, if not
> > executed
> > + * till now.
> > + */
> > +void
> > +ExecEvalParamExecParams(Bitmapset *params, EState *estate)
> >
> > I feel it is better to explain when this function executes the sub plans
> > that are
> > not executed till now? Means like in what scenario?
> >
>
> It just means that it will execute the same initplan (subplan) just
> once in master backend even if it used in multiple places.  This is
> the same kind of usage as we have in ExecEvalParamExec.  You can find
> its usage by using some query like
>
> explain analyse select sum(t1.i) from t1, t2 where t1.j=t2.j and t1.k
> = (select count(k) from t3) and t1.k=t2.k;
>
> Ensure you insert some rows in t1 and t2 that match the count from t3.
> If you are using the schema and data given in Kuntal's script in email
> above, then you need to insert something like
> t1(900,900,900);t2(900,900,900);
>
> It generates plan like below:
>
> postgres=# explain analyse select sum(t1.i) from t1, t2 where
> t1.j=t2.j and t1.k = (select count(k) from t3) and t1.k=t2.k;
> QUERY PLAN
> 
> ---
>  Aggregate  (cost=29.65..29.66 rows=1 width=8) (actual
> time=22572.521..22572.521 rows=1 loops=1)
>InitPlan 1 (returns $0)
>  ->  Finalize Aggregate  (cost=9.70..9.71 rows=1 width=8) (actual
> time=4345.110..4345.111 rows=1 loops=1)
>->  Gather  (cost=9.69..9.70 rows=2 width=8) (actual
> time=4285.019..4345.098 rows=3 loops=1)
>  Workers Planned: 2
>  Workers Launched: 2
>  ->  Partial Aggregate  (cost=9.69..9.70 rows=1
> width=8) (actual time=0.154..0.155 rows=1 loops=3)
>->  Parallel Seq Scan on t3  (cost=0.00..8.75
> rows=375 width=4) (actual time=0.011..0.090 rows=300 loops=3)
>->  Nested Loop  (cost=0.00..19.93 rows=1 width=4) (actual
> time=22499.918..22572.512 rows=1 loops=1)
>  Join Filter: (t1.j = t2.j)
>  ->  Gather  (cost=0.00..9.67 rows=2 width=12) (actual
> time=10521.356..10521.363 rows=1 loops=1)
>Workers Planned: 2
>Params Evaluated: $0
>Workers Launched: 2
>->  Parallel Seq Scan on t1  (cost=0.00..9.67 rows=1
> width=12) (actual time=0.506..0.507 rows=0 loops=3)
>  Filter: (k = $0)
>  Rows Removed by Filter: 299
>  ->  Materialize  (cost=0.00..10.21 rows=2 width=8) (actual
> time=11978.557..12051.142 rows=1 loops=1)
>->  Gather  (cost=0.00..10.20 rows=2 width=8) (actual
> time=11978.530..12051.113 rows=1 loops=1)
>  Workers Planned: 2
>  Params Evaluated: $0
>  Workers Launched: 2
>  ->  Parallel Seq Scan on t2  (cost=0.00..10.20
> rows=1 width=8) (actual time=0.067..0.067 rows=0 loops=3)
>Filter: (k = $0)
>Rows Removed by Filter: 333
>  Planning time: 15103.237 ms
>  Execution time: 22574.703 ms
> (27 rows)
>
> You can notice that initplan is used at multiple nodes, but it will be
> evaluated just once.  If you want, I can add a sentence in the
> comments, but I think this is somewhat obvious and the same use case
> already exists.  Let me know if you still think that comments need to
> be expanded?
>

Thanks for providing details. Yes it is clear to me.


>
> > + if (IsA(plan, Gather))
> > + ((Gather *) plan)->initParam = bms_intersect(plan->lefttree->extParam,
> > initSetParam);
> > + else
> > + ((GatherMerge *) plan)->initParam =
> > bms_intersect(plan->lefttree->extParam, initSetParam);
> >
> >
> > I think the above code is to find out the common parameters that are
> prsent
> > in the external
> > and out params.
> >
>
> Here, we want to save all the initplan params that can be used below
> the gather node.  extParam contains the set of all external PARAM_E

Re: [HACKERS] Pluggable storage

2017-08-21 Thread Haribabu Kommi
On Sun, Aug 13, 2017 at 5:17 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Sat, Aug 12, 2017 at 10:31 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >>
> >> Why do we need to store handler function in TupleDesc?  As of now, the
> >> above patch series has it available in RelationData and
> >> TupleTableSlot, I am not sure if instead of that keeping it in
> >> TupleDesc is a good idea.  Which all kind of places require TupleDesc
> >> to contain handler?  If those are few places, can we think of passing
> >> it as a parameter?
> >
> >
> > Till now I am to able to proceed without adding any storage handler
> > functions to
> > TupleDesc structure. Sure, I will try the way of passing as a parameter
> when
> > there is a need of it.
> >
>
> Okay, I think it is better if you discuss such locations before
> directly modifying those.
>

Sure. I will check with the community before making any such changes.



> > During the progress of the patch, I am facing problems in designing the
> > storage API
> > regarding the Buffer. For example To replace all the
> HeapTupleSatisfiesMVCC
> > and
> > related functions with function pointers, In HeapTuple format, the tuple
> may
> > belongs
> > to one buffer, so the buffer is passed to the HeapTupleSatifisifes***
> > functions along
> > with buffer, But in case of other storage formats, the single buffer may
> not
> > contains
> > the actual data.
> >
>
> Also, it is quite possible that some of the storage Am's don't even
> want to return bool as a parameter from HeapTupleSatisfies* API's.  I
> guess what we need here is to provide a way so that different storage
> am's can register their function pointer for an equivalent to
> satisfies function.  So, we need to change
> SnapshotData.SnapshotSatisfiesFunc in some way so that different
> handlers can register their function instead of using that directly.
> I think that should address the problem you are planning to solve by
> omitting buffer parameter.
>

Thanks for your suggestion. Yes, it is better to go in the direction of
SnapshotSatisfiesFunc.

I tried out the above idea of implementing the tuple visibility functions
and assigning them to the SnapshotData structure based on the snapshot.

The tuple visibility functions that are specific to the relation are
available in the RelationData structure, and that structure may not be
available everywhere a snapshot is created. So I changed the SnapshotData
structure to hold an enum representing what type of snapshot it is, instead
of storing a pointer to the tuple visibility function. Whenever a tuple
visibility check is needed, the storage AM handler pointer corresponding to
the snapshot type is called and the result is obtained as before.


> This buffer is used to set the Hint bits and mark the
> > buffer as dirty.
> > In case if the buffer is not available, the performance may affect for
> the
> > following
> > queries if the hint bits are not set.
> >
>
> I don't think it is advisable to change that for the current heap.
>

I didn't change the prototypes of the existing functions. Currently the
tuple visibility functions assume that the Buffer is always valid, but that
may not be correct depending on the storage.


>
> > And also the Buffer is used to get from heap_fetch, heap_lock_tuple and
> > related
> > functions to check the Tuple visibility, but currently returning a buffer
> > from the above
> > heap_** function is not possible for other formats.
> >
>
> Why not?  I mean if we consider that all the formats we are worried at
> this stage have TID (block number, tuple location), then we can get
> the buffer.  We might want to consider passing TID as a parameter to
> these API's if required to make that possible.  You also agreed above
> [1] that we can first design the API considering storage formats
> having TID.
>

The current approach is to support the storages that have TIDs. But what I
mean here is that in some storage methods (for example column storage), the
tuple is not present in a single buffer; the tuple data may be assembled from
many buffers before the slot/storage tuple is returned (unless and until we
change everything over to slots).

Any code that runs after the storage methods and expects a valid Buffer may
need some changes to first check whether the buffer is valid and perform its
operations based on that.


> > And also for the
> > HeapTuple data,
> > the tuple data is copied into palloced buffer instead of pointing
> directly
> > to the page.
> > So, returning a Buffer is a valid or not here?
> >
>
> Yeah, but I think for the sake of compatibility and no

Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-08-20 Thread Haribabu Kommi
On Tue, Aug 15, 2017 at 7:29 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 4/4/17 01:06, Haribabu Kommi wrote:
> > Both pg_dump and pg_upgrade tests are passed. Updated patch attached
> > I will add this patch to the next commitfest.
>
> This patch needs to be rebased for the upcoming commit fest.
>

Thanks for checking. Rebased patch is attached.

Regards,
Hari Babu
Fujitsu Australia


0001-pg_dump-and-pg_dumpall-database-handling-refactoring_v6.patch
Description: Binary data



Re: [HACKERS] Pluggable storage

2017-08-13 Thread Haribabu Kommi
On Sun, Aug 13, 2017 at 5:21 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Sat, Aug 12, 2017 at 10:34 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> > On Tue, Aug 8, 2017 at 2:21 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> >>
> >> +typedef struct StorageAmRoutine
> >> +{
> >>
> >> In this structure, you have already covered most of the API's that a
> >> new storage module needs to provide, but I think there could be more.
> >> One such API could be heap_hot_search.  This seems specific to current
> >> heap where we have the provision of HOT.  I think we can provide a new
> >> API tuple_search or something like that.
> >
> >
> > Thanks for the review.
> >
> > Yes, the storageAmRoutine needs more function pointers. Currently I am
> > adding all the functions that are present in the heapam.h and some slot
> > related function from tuptable.h.
> >
>
> Hmm, this API is exposed via heapam.h.  Am I missing something?
>

Sorry, I did not explain this clearly in my earlier mail. Yes, you are right
that the heap_hot_search function exists in heapam.h, but I am yet to add all
the exposed functions that are present in heapam.h to the StorageAmRoutine
structure.

> Once I stabilize the code and API's that
> > are
> > currently added, then I will further enhance it with remaining functions
> > that
> > are necessary to support pluggable storage API.
> >
>
> Sure, but I think if we found any thing during development/review,
> then we should either add it immediately or at the very least add fix
> me in a patch to avoid forgetting the finding.


OK. I will add all the functions that have been identified so far.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] parallelize queries containing initplans

2017-08-13 Thread Haribabu Kommi
On Fri, Aug 11, 2017 at 1:18 AM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Wed, Aug 9, 2017 at 6:51 PM, Haribabu Kommi <kommi.harib...@gmail.com>
> wrote:
> >
> >
> > + if (IsA(plan, Gather))
> > + ((Gather *) plan)->initParam = bms_intersect(plan->lefttree->extParam,
> > initSetParam);
> > + else if (IsA(plan, GatherMerge))
> > + ((GatherMerge *) plan)->initParam =
> > bms_intersect(plan->lefttree->extParam, initSetParam);
> > + else
> > + elog(ERROR, "unrecognized node type: %d", nodeTag(plan));
> >
> > The else case is not possible, because it is already validated for
> Gather or
> > GatherMerge.
> > Can we change it simple if and else?
> >
>
> As we already have an assert in this function to protect from any
> other node type (nodes other than Gather and Gather Merge), it makes
> sense to change the code to just if...else, so changed accordingly.


Thanks for the updated patch. Patch looks fine. I just have some
minor comments.

+ * ExecEvalParamExecParams
+ *
+ * Execute the subplan stored in PARAM_EXEC initplans params, if not
executed
+ * till now.
+ */
+void
+ExecEvalParamExecParams(Bitmapset *params, EState *estate)

I feel it is better to explain when this function executes the subplans that
have not been executed yet, i.e. in what scenario that happens.


+ if (params == NULL)
+ nparams = 0;
+ else
+ nparams = bms_num_members(params);

bms_num_members returns 0 when params is NULL.
Is the explicit NULL check worth keeping for the performance benefit, or
should it just be removed? Either is fine with me.


+ if (IsA(plan, Gather))
+ ((Gather *) plan)->initParam = bms_intersect(plan->lefttree->extParam,
initSetParam);
+ else
+ ((GatherMerge *) plan)->initParam =
bms_intersect(plan->lefttree->extParam, initSetParam);


I think the above code finds the common parameters that are present in both
the external params and the initplan output params. It may be better to
explain the logic in a comment.
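
For example, a comment along these lines could work (wording based on how
this is explained elsewhere in the thread, not taken from the patch):

/*
 * Compute the PARAM_EXEC params of the initplans at this query level that
 * are also required at or below the Gather (Merge) node, i.e. the
 * intersection of the child plan's external params with the params set by
 * the initplans.  These are evaluated in the leader and passed down to the
 * workers.
 */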


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Pluggable storage

2017-08-11 Thread Haribabu Kommi
On Tue, Aug 8, 2017 at 2:21 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Tue, Jun 13, 2017 at 7:20 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> >
> > On Fri, Oct 14, 2016 at 7:26 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > wrote:
> >>
> >> I have sent the partial patch I have to Hari Babu Kommi.  We expect that
> >> he will be able to further this goal some more.
> >
> >
> > Thanks Alvaro for sharing your development patch.
> >
> > Most of the patch design is same as described by Alvaro in the first mail
> > [1].
> > I will detail the modifications, pending items and open items (needs
> > discussion)
> > to implement proper pluggable storage.
> >
> > Here I attached WIP patches to support pluggable storage. The patch
> series
> > are may not work individually. Still so many things are under
> development.
> > These patches are just to share the approach of the current development.
> >
>
> +typedef struct StorageAmRoutine
> +{
>
> In this structure, you have already covered most of the API's that a
> new storage module needs to provide, but I think there could be more.
> One such API could be heap_hot_search.  This seems specific to current
> heap where we have the provision of HOT.  I think we can provide a new
> API tuple_search or something like that.


Thanks for the review.

Yes, the storageAmRoutine needs more function pointers. Currently I am
adding all the functions that are present in the heapam.h and some slot
related function from tuptable.h. Once I stabilize the code and API's that
are
currently added, then I will further enhance it with remaining functions
that
are necessary to support pluggable storage API.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Pluggable storage

2017-08-11 Thread Haribabu Kommi
On Mon, Aug 7, 2017 at 11:12 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Tue, Aug 1, 2017 at 1:56 PM, Haribabu Kommi <kommi.harib...@gmail.com>
> wrote:
> >
> >
> > On Sun, Jul 23, 2017 at 4:10 PM, Amit Kapila <amit.kapil...@gmail.com>
> > wrote:
> >>
> >
> >>
> >> > 1. Design an API that returns values/nulls array and convert that
> into a
> >> > HeapTuple whenever it is required in the upper layers. All the
> existing
> >> > heap form/deform tuples are used for every tuple with some
> adjustments.
> >> >
> >>
> >> So, this would have the additional cost of form/deform.  Also, how
> >> would it have lesser changes as compare to what you have described
> >> earlier?
> >
> >
> > Yes, It have the additional cost of form/deform. It is the same approach
> > that
> > is described earlier. But I have an intention of modifying everywhere the
> > HeapTuple is accessed. But with the other prototype changes of removing
> > HeapTuple usage from triggers, I realized that it needs some clear design
> > to proceed further, instead of combining those changes with pluggable
> > Storage API.
> >
> > - heap_getnext function is kept as it as and it is used only for system
> > table
> >   access.
> > - heap_getnext_slot function is introduced to return the slot whenever
> the
> >   data is found, otherwise NULL, This function is used in all the places
> > from
> >   Executor and etc.
> >
> > - The TupleTableSlot structure is modified to contain a void* tuple
> instead
> > of
> > HeapTuple. And also it contains the storagehanlder functions.
> > - heap_insert and etc function can take Slot as an argument and perform
> the
> > insert operation.
> >
> > The cases where the TupleTableSlot is not possible to sent, form a
> HeapTuple
> > from the data and sent it and also note down that it is a HeapTuple data,
> > not
> > the tuple from the storage.
> >
> ..
> >
> >
> >>
> >> > 3. Design an API that returns StorageTuple(void *) but the necessary
> >> > format information of that tuple can be get from the tupledesc.
> wherever
> >> > the tuple is present, there exists a tupledesc in most of the cases.
> How
> >> > about adding some kind of information in tupledesc to find out the
> tuple
> >> > format and call the necessary functions
> >> >
> >
> >
> > heap_getnext function returns StorageTuple instead of HeapTuple. The
> tuple
> > type information is available in the TupleDesc structure.
> >
> > All heap_insert and etc function accepts TupleTableSlot as input and
> perform
> > the insert operation. This approach is almost same as first approach
> except
> > the
> > storage handler functions are stored in TupleDesc.
> >
>
> Why do we need to store handler function in TupleDesc?  As of now, the
> above patch series has it available in RelationData and
> TupleTableSlot, I am not sure if instead of that keeping it in
> TupleDesc is a good idea.  Which all kind of places require TupleDesc
> to contain handler?  If those are few places, can we think of passing
> it as a parameter?


So far I have been able to proceed without adding any storage handler
functions to the TupleDesc structure. Sure, I will try passing it as a
parameter when there is a need for it.

While making progress on the patch, I am facing problems in designing the
storage API with regard to the Buffer. For example, to replace
HeapTupleSatisfiesMVCC and the related functions with function pointers: in
the HeapTuple format the tuple belongs to one buffer, so that buffer is
passed to the HeapTupleSatisfies*** functions along with the tuple. But with
other storage formats, a single buffer may not contain the actual data. The
buffer is used to set the hint bits and to mark the buffer as dirty; if the
buffer is not available and the hint bits are not set, the performance of
the following queries may suffer.

The Buffer is also obtained from heap_fetch, heap_lock_tuple and related
functions in order to check tuple visibility, but returning a buffer from
those heap_** functions is not currently possible for other formats. Also,
for HeapTuple data, the tuple data may be copied into a palloc'd buffer
instead of pointing directly into the page. So, is returning a Buffer valid
here or not?

Currently I am proceeding by removing the Buffer parameter from the API; in
case that affects performance, we need to find a different approach to
handling the hint bits.

Comments?

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] parallelize queries containing initplans

2017-08-09 Thread Haribabu Kommi
On Wed, Aug 9, 2017 at 9:26 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Wed, Aug 9, 2017 at 10:24 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> >
> > For the following query the parallel plan is not chosen. The query
> contains
> > an init plan that refer the outer node.
> >
>
> We don't want to generate the parallel plan for such cases.  Whenever
> initplan refers to any outer node (aka correlated plan), it won't
> generate a parallel plan.  Also, for t2, it doesn't choose a parallel
> plan because one-time filter refers to the outer node (again
> correlated plan case). Basically, till now we don't support parallel
> plan for any case where the correlated plan is used.  So, it is
> perfectly valid that it doesn't use parallel plan here.


Thanks for providing the details.


> Here if you notice the parallel node t2 refers to the initplan which
> can be parallelised after this patch.  Basically, whenever the
> initplan is attached at or above Gather node, we compute its value and
> pass down to workers.
>

Thanks for the details. I checked the code also.

By the way, I tested this patch together with my parallel DML support patch
to check the RETURNING clause of INSERT, and all the RETURNING clause init
plans are parallel plans with this patch.


> By the way, the patch doesn't apply on HEAD, so attached rebased patch.



Thanks for the updated patch. I have some comments and I am yet to finish
the review.

+ /*
+ * We don't want to generate gather or gather merge node if there are
+ * initplans at some query level below the current query level as those
+ * plans could be parallel-unsafe or undirect correlated plans.  Ensure to
+ * mark all the partial and non-partial paths for a relation at this level
+ * to be parallel-unsafe.
+ */
+ if (is_initplan_below_current_query_level(root))
+ {
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path   *subpath = (Path *) lfirst(lc);
+
+ subpath->parallel_safe = false;
+ }
+
+ foreach(lc, rel->pathlist)
+ {
+ Path   *subpath = (Path *) lfirst(lc);
+
+ subpath->parallel_safe = false;
+ }
+ return;
+ }
+

The above part of the code is almost the same in multiple places; is it
possible to move it into a function?
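
A possible shape for such a helper, sketched from the duplicated hunk above
(the function name is just my suggestion):

static void
mark_rel_paths_parallel_unsafe(RelOptInfo *rel)
{
    ListCell   *lc;

    /* mark all partial and non-partial paths of this rel as parallel-unsafe */
    foreach(lc, rel->partial_pathlist)
    {
        Path       *subpath = (Path *) lfirst(lc);

        subpath->parallel_safe = false;
    }

    foreach(lc, rel->pathlist)
    {
        Path       *subpath = (Path *) lfirst(lc);

        subpath->parallel_safe = false;
    }
}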


+ node->initParam = NULL;

This new parameter is set to NULL in the make_gather function. The same
parameter is added to the GatherMerge structure as well, but the parameter
is already set to NULL by the makeNode macro, so why set it here and not in
the other place?

Do we need to set a field to a default value such as NULL or false if it
already has that value? This question is not specific to the above
parameter; it applies to all the existing parameters as well.


+ if (IsA(plan, Gather))
+ ((Gather *) plan)->initParam = bms_intersect(plan->lefttree->extParam,
initSetParam);
+ else if (IsA(plan, GatherMerge))
+ ((GatherMerge *) plan)->initParam =
bms_intersect(plan->lefttree->extParam, initSetParam);
+ else
+ elog(ERROR, "unrecognized node type: %d", nodeTag(plan));

The else case is not possible, because the node has already been validated
to be a Gather or GatherMerge. Can we change it to a simple if/else?

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] parallelize queries containing initplans

2017-08-09 Thread Haribabu Kommi
On Wed, Aug 9, 2017 at 8:54 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
wrote:

> On Wed, Aug 9, 2017 at 10:24 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> > I tested the latest patch and the parallel plan is getting choose for
> most
> > of
> > the init plans.
> >
> Thanks for testing.
>
> > For the following query the parallel plan is not chosen. The query
> contains
> > an init plan that refer the outer node.
> >
> > postgres=# explain analyze select * from t1 where t1.i in (select t2.i
> from
> > t2 where t1.k = (select max(k) from t3 where t3.i=t1.i));
> >  QUERY PLAN
> > 
> -
> >  Seq Scan on t1  (cost=0.00..22426.28 rows=448 width=12) (actual
> > time=8.335..132.557 rows=2 loops=1)
> >Filter: (SubPlan 2)
> >Rows Removed by Filter: 894
> >SubPlan 2
> >  ->  Result  (cost=16.27..31.26 rows=999 width=4) (actual
> > time=0.146..0.146 rows=0 loops=896)
> >One-Time Filter: (t1.k = $1)
> >InitPlan 1 (returns $1)
> >  ->  Aggregate  (cost=16.25..16.27 rows=1 width=4) (actual
> > time=0.145..0.145 rows=1 loops=896)
> >->  Seq Scan on t3  (cost=0.00..16.25 rows=2 width=4)
> > (actual time=0.131..0.144 rows=0 loops=896)
> >  Filter: (i = t1.i)
> >  Rows Removed by Filter: 900
> >->  Seq Scan on t2  (cost=16.27..31.26 rows=999 width=4)
> (actual
> > time=0.012..0.013 rows=10 loops=2)
> >  Planning time: 0.272 ms
> >  Execution time: 132.623 ms
> > (14 rows)
> >
> An observation is that the filter at Result node can't be pushed down
> to the sequential scan on t2 because the filter is on t1. So, it has
> to scan the complete t2 relation and send all the tuple to upper node,
> a worst case for parallelism. Probably, this is the reason the
> optimizer doesn't pick parallel plan for the above case.
>
> Just for clarification, do you see any changes in the plan after
> forcing parallelism(parallel_tuple_cost, parallel_setup_cost,
> min_parallel_table_scan_size=0)?


There is no plan change with parallel* GUC changes.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] parallelize queries containing initplans

2017-08-08 Thread Haribabu Kommi
On Mon, Jul 17, 2017 at 10:53 PM, Amit Kapila 
wrote:

> On Tue, Mar 28, 2017 at 7:25 AM, Amit Kapila 
> wrote:
> > On Thu, Mar 16, 2017 at 2:34 AM, Kuntal Ghosh
> >  wrote:
> >> On Tue, Mar 14, 2017 at 3:20 PM, Amit Kapila 
> wrote:
> >>> Based on that idea, I have modified the patch such that it will
> >>> compute the set of initplans Params that are required below gather
> >>> node and store them as bitmap of initplan params at gather node.
> >>> During set_plan_references, we can find the intersection of external
> >>> parameters that are required at Gather or nodes below it with the
> >>> initplans that are passed from same or above query level. Once the set
> >>> of initplan params are established, we evaluate those (if they are not
> >>> already evaluated) before execution of gather node and then pass the
> >>> computed value to each of the workers.   To identify whether a
> >>> particular param is parallel safe or not, we check if the paramid of
> >>> the param exists in initplans at same or above query level.  We don't
> >>> allow to generate gather path if there are initplans at some query
> >>> level below the current query level as those plans could be
> >>> parallel-unsafe or undirect correlated plans.
> >>
> >> I would like to mention different test scenarios with InitPlans that
> >> we've considered while developing and testing of the patch.
> >>
> >
> > Thanks a lot Kuntal for sharing different test scenarios.
> > Unfortunately, this patch doesn't received any review till now, so
> > there is no chance of making it in to PostgreSQL-10.  I have moved
> > this to next CF.
> >
>
> Attached is a rebased version of the patch with below changes:
> a. SubplanState now directly stores Subplan rather than ExprState, so
> patch needs some adjustment in that regard.
> b. While rejecting the paths (based on if there are initplans at level
> below the current query level) for parallelism, the rejected paths
> were not marked as parallel unsafe.  Due to this in
> force_parallel_mode=regress, we were able to add gather node above
> parallel unsafe paths.  The modified patch ensures to mark such paths
> as parallel unsafe.
> c. Added regression test.
> d. Improve comments in the code.
>
>
I tested the latest patch, and the parallel plan is getting chosen for most
of the init plans.

For the following query the parallel plan is not chosen. The query contains
an init plan that refers to the outer node.

postgres=# explain analyze select * from t1 where t1.i in (select t2.i from
t2 where t1.k = (select max(k) from t3 where t3.i=t1.i));
 QUERY PLAN

-
 Seq Scan on t1  (cost=0.00..22426.28 rows=448 width=12) (actual
time=8.335..132.557 rows=2 loops=1)
   Filter: (SubPlan 2)
   Rows Removed by Filter: 894
   SubPlan 2
 ->  Result  (cost=16.27..31.26 rows=999 width=4) (actual
time=0.146..0.146 rows=0 loops=896)
   One-Time Filter: (t1.k = $1)
   InitPlan 1 (returns $1)
 ->  Aggregate  (cost=16.25..16.27 rows=1 width=4) (actual
time=0.145..0.145 rows=1 loops=896)
   ->  Seq Scan on t3  (cost=0.00..16.25 rows=2 width=4)
(actual time=0.131..0.144 rows=0 loops=896)
 Filter: (i = t1.i)
 Rows Removed by Filter: 900
   ->  Seq Scan on t2  (cost=16.27..31.26 rows=999 width=4) (actual
time=0.012..0.013 rows=10 loops=2)
 Planning time: 0.272 ms
 Execution time: 132.623 ms
(14 rows)

If I change the query a little bit, the Result node doesn't appear and the
parallel plan
gets chosen.

postgres=# explain analyze select * from t1 where t1.i in (select t2.i from
t2 where t2.k = (select max(k) from t3 where t3.i=t1.i));
   QUERY PLAN

-
 Seq Scan on t1  (cost=0.00..19162.88 rows=448 width=12) (actual
time=3501.483..3501.483 rows=0 loops=1)
   Filter: (SubPlan 2)
   Rows Removed by Filter: 896
   SubPlan 2
 ->  Gather  (cost=16.27..26.47 rows=2 width=4) (actual
time=3.471..3.795 rows=0 loops=896)
   Workers Planned: 2
   Params Evaluated: $1
   Workers Launched: 2
   InitPlan 1 (returns $1)
 ->  Aggregate  (cost=16.25..16.27 rows=1 width=4) (actual
time=0.161..0.161 rows=1 loops=896)
   ->  Seq Scan on t3  (cost=0.00..16.25 rows=2 width=4)
(actual time=0.144..0.156 rows=0 loops=896)
 Filter: (i = t1.i)
 Rows Removed by Filter: 900
   ->  Parallel Seq Scan on t2  (cost=0.00..10.20 rows=1 width=4)
(actual time=0.001..0.001 rows=0 loops=804608)
 Filter: (k = $1)

Re: [HACKERS] Pluggable storage

2017-08-01 Thread Haribabu Kommi
On Sun, Jul 23, 2017 at 4:10 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Wed, Jul 19, 2017 at 11:33 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> > I am finding out that eliminating the HeapTuple usage in the upper layers
> > needs some major changes, How about not changing anything in the upper
> > layers of storage currently and just support the pluggable tuple with
> one of
> > the following approach for first version?
> >
>
> It is not very clear to me how any of the below alternatives are
> better or worse as compare to the approach you are already working.
> Do you mean to say that we will get away without changing all the
> places which take HeapTuple as input or return it as output with some
> of the below approaches?
>

Yes. With the following approaches, my intention is to reduce how much the
HeapTuple usage needs to change everywhere, so that the pluggable storage
API can be supported with minimal changes and improved further later. But
unless and until we change the HeapTuple usage everywhere properly, we may
not see the benefits of pluggable storage.



> > 1. Design an API that returns values/nulls array and convert that into a
> > HeapTuple whenever it is required in the upper layers. All the existing
> > heap form/deform tuples are used for every tuple with some adjustments.
> >
>
> So, this would have the additional cost of form/deform.  Also, how
> would it have lesser changes as compare to what you have described
> earlier?
>

Yes, it has the additional cost of form/deform. It is the same approach that
was described earlier, but my intention is to eventually modify every place
where a HeapTuple is accessed. While prototyping the removal of HeapTuple
usage from triggers, I realized that this needs a clear design of its own to
proceed further, rather than combining those changes with the pluggable
storage API.

- The heap_getnext function is kept as is and is used only for system table
  access.
- A heap_getnext_slot function is introduced that returns the slot whenever
  data is found, otherwise NULL. This function is used in all the relevant
  places in the executor and so on.

- The TupleTableSlot structure is modified to contain a void * tuple instead
  of a HeapTuple, and it also contains the storage handler functions.
- heap_insert and the related functions can take a slot as an argument and
  perform the insert operation.

In the cases where a TupleTableSlot cannot be sent, a HeapTuple is formed
from the data and sent instead, with a note that it is HeapTuple data rather
than a tuple from the storage.
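
Illustrative prototypes for the changes listed above (the exact names and
members in the WIP patch may differ):

/* returns the next tuple in the given slot, or NULL when the scan is done */
extern TupleTableSlot *heap_getnext_slot(HeapScanDesc scan,
                                         ScanDirection direction,
                                         TupleTableSlot *slot);

/* TupleTableSlot carrying a storage-specific tuple instead of a HeapTuple */
typedef struct TupleTableSlot
{
    NodeTag             type;
    void               *tts_tuple;      /* tuple in the storage's own format */
    ItemPointerData     tts_tid;        /* its location, when it has one */
    StorageAmRoutine   *tts_storageam;  /* handler used to form/deform it */
    /* ... the remaining existing members stay unchanged ... */
} TupleTableSlot;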

> 2. Design an API that returns StorageTuple(void *) with first member
> > represents the TYPE of the storage, so that corresponding registered
> > function calls can be called to deform/form the tuple whenever there is
> > a need of tuple.
> >
>
> Do you intend to say that we store such information in disk tuple or
> only in the in-memory version of same?


Only in-memory version.


>   Also, what makes you think
> that we would need hooks only for form and deform?  Right now, in many
> cases tuple will directly point to disk page and we deal with it by
> retaining the pin on the corresponding buffer, what if some kinds of
> tuple don't follow that rule?  For ex. to support in-place updates, we
> might always need a separate copy of tuple rather than the one
> pointing to disk page.
>

In any of the approaches, except for system tables, we are going to remove
direct disk access to the tuple. Whether by replacing the tuple with a slot
or by some other means, the direct disk access will go away; otherwise this
will be difficult to support, and it also doesn't provide much performance
benefit.

All the storage handler functions need to be maintained separately and
accessed using the handler ID.

The heap_getnext function returns a StorageTuple and not a HeapTuple. The
StorageTuple can be mapped to a HeapTuple or another tuple format based on
the type in its first member.

heap_insert and related functions take a StorageTuple as input and
internally decide how to perform the insert operation based on the type of
the tuple.

Instead of storing the handler functions inside a relation/slot, the
function pointers can be accessed directly based on the storage type data.

This works in every case where the tuple is accessed, but the problem is
that it may need changes everywhere the tuple is accessed.
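
As a small illustration of the dispatch being described (the names are
invented for the example, not from the patch):

typedef enum StorageTupleKind
{
    STORAGE_TUPLE_HEAP,
    STORAGE_TUPLE_OTHER
} StorageTupleKind;

typedef struct StorageTupleHeader
{
    StorageTupleKind kind;      /* always the first member of any storage tuple */
} StorageTupleHeader;

/* dispatch on the first member, roughly:
 *     StorageTupleKind kind = ((StorageTupleHeader *) tuple)->kind;
 *     storage_handlers[kind]->deform_tuple(tuple, tupdesc, values, isnull);
 */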



> > 3. Design an API that returns StorageTuple(void *) but the necessary
> > format information of that tuple can be get from the tupledesc. wherever
> > the tuple is present, there exists a tupledesc in most of the cases. How
> > about adding some kind of information in tupledesc to find out the tuple
> > format and call the necessary functions
> >
>

heap_getnext function returns StorageTuple instead of HeapTuple. The tuple
type information is available in the

Re: [HACKERS] Pluggable storage

2017-07-19 Thread Haribabu Kommi
On Sat, Jul 15, 2017 at 12:30 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Jul 14, 2017 at 8:35 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > To replace tuple with slot, I took trigger and SPI calls as the first
> step
> > in modifying
> > from tuple to slot, Here I attached a WIP patch. The notable changes are,
> >
> > 1. Replace most of the HeapTuple with Slot in SPI interface functions.
> > 2. In SPITupleTable, Instead of HeapTuple, it is changed to
> TupleTableSlot.
> > But this change may not be a proper approach, because a duplicate copy of
> > TupleTableSlot is generated and stored.
> > 3. Changed all trigger interfaces to accept TupleTableSlot Instead of
> > HeapTuple.
> > 4. ItemPointerData is added as a member to the TupleTableSlot structure.
> > 5. Modified the ExecInsert and others to work directly on TupleTableSlot
> > instead
> > of tuple(not completely).
>
> What problem are you trying to solve with these changes?  I'm not
> saying that it's a bad idea, but I think you should spell out the
> motivation a little more explicitly.
>

Sorry for not providing complete details. I am trying these experiments
to find out the best way to return the tuple from the storage methods by
designing a proper API.

The changes I am doing are aimed at reducing the dependency on the
HeapTuple format in favour of values/nulls arrays. If there is no dependency
on the HeapTuple format in the upper layers of PostgreSQL, then we can
directly define the storage API to return the values/nulls array instead of
one big chunk of tuple data like HeapTuple or StorageTuple (void *).

I am finding that eliminating the HeapTuple usage in the upper layers
needs some major changes. How about not changing anything in the upper
layers for now and just supporting the pluggable tuple with one of
the following approaches for the first version?

1. Design an API that returns a values/nulls array and convert that into a
HeapTuple whenever one is required in the upper layers. All the existing
heap form/deform functions are used for every tuple, with some adjustments
(see the sketch after this list).

2. Design an API that returns a StorageTuple (void *) whose first member
represents the TYPE of the storage, so that the corresponding registered
functions can be called to form/deform the tuple whenever a tuple is
needed.

3. Design an API that returns a StorageTuple (void *) where the necessary
format information of that tuple can be obtained from the tupledesc.
Wherever the tuple is present, there exists a tupledesc in most cases. How
about adding some kind of information to the tupledesc to find out the
tuple format and call the necessary functions?

4. Design an API that always returns a StorageTuple, which is converted to
a HeapTuple by a function hook if one gets registered (heap storage does
not need to register the hook, because it is already in HeapTuple format).
This function hook would be placed in the heap form/deform functions.
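
As a rough illustration of option 1 (StorageScanDesc and
storage_getnext_values() are made-up names here; heap_form_tuple() is the
existing function):

/*
 * Sketch of option 1: the storage method fills values/isnull arrays and a
 * HeapTuple is formed only where the upper layers still need one.
 */
static HeapTuple
fetch_as_heaptuple(StorageScanDesc scan, TupleDesc tupdesc)
{
    Datum      *values = (Datum *) palloc(tupdesc->natts * sizeof(Datum));
    bool       *isnull = (bool *) palloc(tupdesc->natts * sizeof(bool));
    HeapTuple   tuple = NULL;

    if (storage_getnext_values(scan, values, isnull))   /* hypothetical */
        tuple = heap_form_tuple(tupdesc, values, isnull);

    pfree(values);
    pfree(isnull);
    return tuple;
}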

Any other or better ideas for a first version?

Regards,
Hari Babu
Fujitsu Australia


[HACKERS] Re: [BUGS] BUG #14634: On Windows pg_basebackup should write tar to stdout in binary mode

2017-07-13 Thread Haribabu Kommi
On Fri, Jul 14, 2017 at 2:54 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:

> On 05/03/2017 07:32 AM, Haribabu Kommi wrote:
>
>> [Adding -hackers mailing list]
>>
>> On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehl...@agilent.com> wrote:
>>
>> The following bug has been logged on the website:
>>>
>>> Bug reference:  14634
>>> Logged by:  Henry Boehlert
>>> Email address:  henry_boehl...@agilent.com
>>> PostgreSQL version: 9.6.2
>>> Operating system:   Windows Server 2012 R2 6.3.9600
>>> Description:
>>>
>>> Executing command pg_basebackup -D -F t writes its output to stdout,
>>> which
>>> is open in text mode, causing LF to be converted to CR LF thus corrupting
>>> the resulting archive.
>>>
>>> To write the tar to stdout, on Windows stdout's mode should be
>>> temporarily
>>> switched to binary.
>>>
>>> https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
>>>
>>>
>> Thanks for reporting the issue.
>> With the attached patch, I was able to extract the tar file that gets
>> generated when the tar file is written into stdout. I tested the
>> the compressed tar also.
>>
>> This bug needs to be fixed in back branches also.
>>
>
> Seems reasonable. One question:
>
> In the patch, you used "_setmode" function, while the calls in
> src/bin/pg_dump/pg_backup_archiver.c use "setmode". There are a few
> places in the backend that also use "_setmode". What's the difference?
> Should we change some of them to be consistent?
>

Actually there is no functional difference between these two functions:
one is the POSIX variant and the other is the ISO C++ variant [1]. The POSIX
variant is deprecated on Windows, which is why we should use _setmode
instead of setmode.

I attached a patch that changes the pg_dump code to use the _setmode
function instead of setmode, to be consistent with the other call sites.
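
For reference, the kind of change involved is roughly the following (a
sketch only, not the exact attached patch):

#ifdef WIN32
#include <io.h>
#include <fcntl.h>
#endif
#include <stdio.h>
#include <string.h>

/*
 * Sketch: switch stdout to binary mode on Windows when the archive is
 * being written to stdout (e.g. pg_basebackup -D - -F t), so that LF
 * bytes are not rewritten as CR LF.
 */
static void
set_stdout_binary_if_needed(const char *destdir)
{
#ifdef WIN32
    if (strcmp(destdir, "-") == 0)
        _setmode(_fileno(stdout), _O_BINARY);
#else
    (void) destdir;             /* nothing to do elsewhere */
#endif
}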

[1] -
https://docs.microsoft.com/en-gb/cpp/c-runtime-library/reference/posix-setmode


Regards,
Hari Babu
Fujitsu Australia


0002-Replace-setmode-with-_setmode-function.patch
Description: Binary data


0001-pg_basebackup-windows-tar-mode-to-stdout-fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2017-06-27 Thread Haribabu Kommi
On Tue, Jun 27, 2017 at 11:53 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Jun 27, 2017 at 4:08 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
>
>> On Thu, Jun 22, 2017 at 8:00 PM, Alexander Korotkov
>> <a.korot...@postgrespro.ru> wrote:
>> > On Wed, Jun 21, 2017 at 10:47 PM, Robert Haas <robertmh...@gmail.com>
>> wrote:
>> >>
>> >> On Mon, Jun 12, 2017 at 9:50 PM, Haribabu Kommi
>> >> <kommi.harib...@gmail.com> wrote:
>> >> > Open Items:
>> >> >
>> >> > 1. The BitmapHeapScan and TableSampleScan are tightly coupled with
>> >> > HeapTuple and HeapScanDesc, So these scans are directly operating
>> >> > on those structures and providing the result.
>> >> >
>> >> > These scan types may not be applicable to different storage formats.
>> >> > So how to handle them?
>> >>
>> >> I think that BitmapHeapScan, at least, is applicable to any table AM
>> >> that has TIDs.   It seems to me that in general we can imagine three
>> >> kinds of table AMs:
>> >>
>> >> 1. Table AMs where a tuple can be efficiently located by a real TID.
>> >> By a real TID, I mean that the block number part is really a block
>> >> number and the item ID is really a location within the block.  These
>> >> are necessarily quite similar to our current heap, but they can change
>> >> the tuple format and page format to some degree, and it seems like in
>> >> many cases it should be possible to plug them into our existing index
>> >> AMs without too much heartache.  Both index scans and bitmap index
>> >> scans ought to work.
>> >
>> >
>> > If #1 is only about changing tuple and page formats, then could be much
>> > simpler than the patch upthread?  We can implement "page format access
>> > methods" with routines for insertion, update, pruning and deletion of
>> tuples
>> > *in particular page*.  There is no need to redefine high-level logic for
>> > scanning heap, doing updates and so on...
>>
>> If you are changing tuple format then you do need to worry about
>> places wherever we are using HeapTuple like TupleTableSlots,
>> Visibility routines, etc. (just think if somebody changes tuple
>> header, then all such places are susceptible to change).
>
>
> Agree.  I think that we can consider pluggable tuple format as an
> independent feature which is desirable to have before pluggable storages.
> For example, I believe some FDWs could already have benefit from pluggable
> tuple format.
>

Accepting multiple tuple formats is possible with a complete replacement of
HeapTuple by TupleTableSlot, or by something like a values/nulls array
instead of a single memory chunk of tuple data.

Currently I am working on replacing the HeapTuple with a TupleTableSlot in
the upper layers once the tuple is returned from the scan. In most of the
places where a HeapTuple is present, it is either replaced with a
TupleTableSlot or changed to a StorageTuple (void *).

I am yet to evaluate whether it is possible to support this as an
independent feature without needing some heap format function to understand
the tuple format in every place.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Pluggable storage

2017-06-27 Thread Haribabu Kommi
On Wed, Jun 28, 2017 at 12:00 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Jun 27, 2017 at 4:19 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
>
>> On Thu, Jun 22, 2017 at 5:46 PM, Alexander Korotkov
>> <a.korot...@postgrespro.ru> wrote:
>> > On Tue, Jun 13, 2017 at 4:50 AM, Haribabu Kommi <
>> kommi.harib...@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Oct 14, 2016 at 7:26 AM, Alvaro Herrera <
>> alvhe...@2ndquadrant.com>
>> >> wrote:
>> >>>
>> >>> I have sent the partial patch I have to Hari Babu Kommi.  We expect
>> that
>> >>> he will be able to further this goal some more.
>> >>
>> >>
>> >> Thanks Alvaro for sharing your development patch.
>> >>
>> >> Most of the patch design is same as described by Alvaro in the first
>> mail
>> >> [1].
>> >> I will detail the modifications, pending items and open items (needs
>> >> discussion)
>> >> to implement proper pluggable storage.
>> >>
>> >> Here I attached WIP patches to support pluggable storage. The patch
>> series
>> >> are may not work individually. Still so many things are under
>> development.
>> >> These patches are just to share the approach of the current
>> development.
>> >>
>> >> Some notable changes that I did to make the patch work:
>> >>
>> >> 1. Added storageam handler to the slot, this is because not all places
>> >> the relation is not available in handy.
>> >> 2. Retained the minimal Tuple in the slot, as this is used in HASH
>> join.
>> >> As per the first version, I feel it is fine to allow creating HeapTuple
>> >> format data.
>> >>
>> >> Thanks everyone for sharing their ideas in the developer's
>> unconference at
>> >> PGCon Ottawa.
>> >>
>> >> Pending items:
>> >>
>> >> 1. Replacement of Tuple with slot in Trigger functionality
>> >> 2. Replacement of Tuple with Slot from storage handler functions.
>> >> 3. Remove/minimize the use of HeapTuple as a Datum.
>> >> 4. Replace all references of HeapScanDesc with StorageScanDesc
>> >> 5. Planner changes to consider the relation storage during the
>> planning.
>> >> 6. Any planner changes based on the discussion of open items?
>> >> 7. some Executor changes to consider the storage advantages?
>> >>
>> >> Open Items:
>> >>
>> >> 1. The BitmapHeapScan and TableSampleScan are tightly coupled with
>> >> HeapTuple and HeapScanDesc, So these scans are directly operating
>> >> on those structures and providing the result.
>> >
>> >
>> > What about vacuum?  I see vacuum is untouched in the patchset and it is
>> not
>> > mentioned in this discussion.
>> > Do you plan to override low-level function like heap_page_prune(),
>> > lazy_vacuum_page() etc., but preserve high-level logic of vacuum?
>> > Or do you plan to let pluggable storage implement its own high-level
>> vacuum
>> > algorithm?
>> >
>>
>> Probably, some other algorithm for vacuum.  I am not sure current
>> vacuum with its parameters can be used so easily.  One thing that
>> might need some thoughts is that is it sufficient to say that keep
>> autovacuum as off and call some different API for places where the
>> vacuum can be invoked manually like Vacuum command to the developer
>> implementing some different strategy for vacuum or we need something
>> more as well.
>
>
> What kind of other vacuum algorithm do you expect?  It would be rather
> easier to understand if we would have examples.
>
> For me, changing of vacuum algorithm is not needed for just heap page
> format change.  Existing vacuum algorithm could just call page format API
> functions for manipulating individual pages.
>
> Changing of vacuum algorithm might be needed for more invasive changes
> than just heap page format.  However, we should first understand what these
> changes could be and how are they consistent with rest of API design.
>

Yes, I agree that we need some changes in vacuum to handle pluggable
storage.
I haven't yet fully checked the changes that are needed in vacuum, but
I feel that low-level changes to the functions are enough. There should
also be some option in the storage handler to decide whether it needs a
vacuum or not; based on this flag, vacuum could be skipped on those tables,
and such handlers would not need to register those APIs.
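
For example, the handler could expose something like this (the struct and
field names are assumptions, not an actual API):

/* Illustrative only. */
typedef struct StorageAmRoutine
{
    /* ... scan, insert, update and delete callbacks would live here ... */

    /*
     * Whether relations using this storage method need the regular
     * heap-style VACUUM.  When false, autovacuum and manual VACUUM could
     * skip such relations, or call a storage-specific cleanup callback
     * if one is registered.
     */
    bool        relation_needs_vacuum;
} StorageAmRoutine;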

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Pluggable storage

2017-06-27 Thread Haribabu Kommi
On Thu, Jun 22, 2017 at 5:47 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, Jun 12, 2017 at 9:50 PM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > Open Items:
> >
> > 1. The BitmapHeapScan and TableSampleScan are tightly coupled with
> > HeapTuple and HeapScanDesc, So these scans are directly operating
> > on those structures and providing the result.
> >
> > These scan types may not be applicable to different storage formats.
> > So how to handle them?
>
> I think that BitmapHeapScan, at least, is applicable to any table AM
> that has TIDs.   It seems to me that in general we can imagine three
> kinds of table AMs:
>
> 1. Table AMs where a tuple can be efficiently located by a real TID.
> By a real TID, I mean that the block number part is really a block
> number and the item ID is really a location within the block.  These
> are necessarily quite similar to our current heap, but they can change
> the tuple format and page format to some degree, and it seems like in
> many cases it should be possible to plug them into our existing index
> AMs without too much heartache.  Both index scans and bitmap index
> scans ought to work.
>
> 2. Table AMs where a tuple has some other kind of locator.  For
> example, imagine an index-organized table where the locator is the
> primary key, which is a bit like what Alvaro had in mind for indirect
> indexes.  If the locator is 6 bytes or less, it could potentially be
> jammed into a TID, but I don't think that's a great idea.  For things
> like int8 or numeric, it won't work at all.  Even for other things,
> it's going to cause problems because the bit patterns won't be what
> the code is expecting; e.g. bitmap scans care about the structure of
> the TID, not just how many bits it is.  (Due credit: Somebody, maybe
> Alvaro, pointed out this problem before, at PGCon.)  For these kinds
> of tables, larger modifications to the index AMs are likely to be
> necessary, at least if we want a really general solution, or maybe we
> should have separate index AMs - e.g. btree for traditional TID-based
> heaps, and generic_btree or indirect_btree or key_btree or whatever
> for heaps with some other kind of locator.  It's not too hard to see
> how to make index scans work with this sort of structure but it's very
> unclear to me whether, or how, bitmap scans can be made to work.
>
> 3. Table AMs where a tuple doesn't really have a locator at all.  In
> these cases, we can't support any sort of index AM at all.  When the
> table is queried, there's really nothing the core system can do except
> ask the table AM for a full scan, supply the quals, and hope the table
> AM has some sort of smarts that enable it to optimize somehow.  For
> example, you can imagine converting cstore_fdw into a table AM of this
> sort - ORC has a sort of inbuilt BRIN-like indexing that allows whole
> chunks to be proven uninteresting and skipped.  (You could use chunk
> number + offset to turn this into a table AM of the previous type if
> you wanted to support secondary indexes; not sure if that'd be useful,
> but it'd certainly be harder.)
>
> I'm more interested in #1 than in #3, and more interested in #3 than
> #2, but other people may have different priorities.


Hi Robert,

Thanks for the details and your opinion.
I also agree that it is better to do option #1 first.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] visual studio 2017 build support

2017-06-20 Thread Haribabu Kommi
On Mon, Apr 24, 2017 at 5:50 PM, Ideriha, Takeshi <
ideriha.take...@jp.fujitsu.com> wrote:

> Hi
>
>
>
> I’ve noticed src/tools/msvc/README also needs some fix together with your
> patch.
>
> README discription haven’t updated since VS 2012.
>

Thanks for the review. I have attached an updated patch that also updates the README.

Regards,
Hari Babu
Fujitsu Australia


0001-VS-2017-build-support-to-PostgreSQL.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Increasing parallel workers at runtime

2017-05-16 Thread Haribabu Kommi
On Wed, May 17, 2017 at 2:35 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, May 16, 2017 at 8:18 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > In the current state of the patch, the main backend tries to start the
> > extra workers only when there is no tuples that are available from the
> > available workers. I feel that the invocation for more workers doesn't
> > do for every tuple.
>
> Well, the point is, the frequency with which the leader will try to
> acquire more workers is going to be highly variable with your patch,
> and might sometimes be extremely frequent.  It depends on the timing
> of the workers and of the leader, and I don't think that's something
> we want to depend on here.
>

Ok.


> > Another background process logic can produce a fair distribution of
> > workers to the parallel queries. In this case also, the backend should
> > advertise only when the allotted workers are not enough, this is because
> > there may be a case where the planned workers may be 5, but because
> > of other part of the query, the main backend is feed by the tuples just
> by
> > 2 workers, then there is no need to provide extra workers.
>
> That's true, but I think taking it into account might be difficult.
>
> > The another background process approach of wait interval to reassign
> > more workers after an interval period doesn't work for the queries that
> > are getting finished before the configured time of the wait. May be we
> > can ignore those scenarios?
>
> I think we can ignore that.  We can still help a lot of cases, and
> queries with very short run times obviously aren't being seriously
> hurt by the lack of full parallelism.


OK. In this approach, we need to share some of the GatherState structure
members in shared memory so they can be accessed by the other background
process, or find some better logic to update these details whenever a new
worker gets allotted.

I will give it a try and see how easy it is to implement.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Increasing parallel workers at runtime

2017-05-16 Thread Haribabu Kommi
On Tue, May 16, 2017 at 1:53 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, May 15, 2017 at 10:06 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > This still needs some adjustments to fix for the cases where
> > the main backend also does the scan instead of waiting for
> > the workers to finish the job. As increasing the workers logic
> > shouldn't add an overhead in this case.
>
> I think it would be pretty crazy to try relaunching workers after
> every tuple, as this patch does.  The overhead of that will be very
> high for queries where the number of tuples passing through the Gather
> is large, whereas when the number of tuples passing through Gather is
> small, or where tuples are sent all at once at the end of procesisng,
> it will not actually be very effective at getting hold of more
> workers.


In the current state of the patch, the main backend tries to start the
extra workers only when there are no tuples available from the
already-running workers. I feel that the attempt to add more workers
therefore does not happen for every tuple.

1. When a large number of tuples is being transferred from the workers,
there is very little chance that the backend is free to start more workers,
because the backend itself may not even need to execute the plan locally.

2. When tuples are transferred only at the end of the plan, for example
when it involves a sort node or an aggregate, the backend is either waiting
for the tuples to arrive or executing the plan itself along with the
workers, after trying to extend the number of workers once.

3. When only a small number of tuples is being transferred, there is a
chance of invoking extra workers more often than in the other scenarios.
But even then, the low tuple transfer rate is probably caused by a complex
filter condition that is taking time and filtering out more records. So in
this case too, once the backend has tried to extend the number of workers
it also participates in executing the plan, and it takes time to get a
tuple by executing the plan locally; by that time there is a good chance
that the workers are already ready with tuples.

The problem of repeatedly asking for more workers really only arises when a
single worker is allotted to the query execution.

Am I missing something?
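
To make the timing concrete, the check is roughly of the following shape
(a sketch under assumptions only; the real POC patch differs, and an actual
relaunch needs more than a bare LaunchParallelWorkers() call):

/* Sketch: try to add workers only when the leader has no tuple ready. */
static void
try_to_add_parallel_workers(GatherState *gatherstate)
{
    ParallelContext *pcxt = gatherstate->pei->pcxt;
    int         planned = ((Gather *) gatherstate->ps.plan)->num_workers;

    if (pcxt->nworkers_launched < planned)
        LaunchParallelWorkers(pcxt);
}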


>   A different idea is to have an area in shared memory where
> queries can advertise that they didn't get all of the workers they
> wanted, plus a background process that periodically tries to launch
> workers to help those queries as parallel workers become available.
> It can recheck for available workers after some interval, say 10s.
> There are some problems there -- the process won't have bgw_notify_pid
> pointing at the parallel leader -- but I think it might be best to try
> to solve those problems instead of making it the leader's job to try
> to grab more workers as we go along.  For one thing, the background
> process idea can attempt to achieve fairness.  Suppose there are two
> processes that didn't get all of their workers; one got 3 of 4, the
> other 1 of 4.  When a worker becomes available, we'd presumably like
> to give it to the process that got 1 of 4, rather than having the
> leaders race to see who grabs the new worker first.  Similarly if
> there are four workers available and two queries that each got 1 of 5
> workers they wanted, we'd like to split the workers two and two,
> rather than having one leader grab all four of them.  Or at least, I
> think that's what we want.


A separate background process could indeed produce a fair distribution of
workers across the parallel queries. Even in that case, the backend should
advertise only when the allotted workers are not enough: the planned number
of workers may be 5, but because of other parts of the query the main
backend may be fed tuples adequately by just 2 workers, in which case there
is no need to provide extra workers.

The background process approach of waiting for an interval before
reassigning more workers does not help queries that finish before the
configured wait time. Maybe we can ignore those scenarios?

It needs some smarter logic to share the details required to start a
worker, since workers are normally started by the main backend itself. But
I feel this approach is useful for the cases where the query doesn't get
any workers at all.
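
The shared-memory advertisement could be as simple as the sketch below
(all names here are made up, just to make the idea concrete):

#include <sys/types.h>

/* Illustrative sketch only; not an actual PostgreSQL structure. */
typedef struct ParallelShortfallSlot
{
    pid_t       leader_pid;         /* leader still waiting for workers */
    int         nworkers_planned;   /* what the planner asked for */
    int         nworkers_launched;  /* what it actually got so far */
} ParallelShortfallSlot;

/*
 * A small array of these slots would live in shared memory.  A leader
 * fills a slot when it comes up short; the background process scans the
 * array periodically and hands spare workers to the neediest entries,
 * for example the ones with the largest planned-minus-launched gap.
 */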

Regards,
Hari Babu
Fujitsu Australia


[HACKERS] Increasing parallel workers at runtime

2017-05-15 Thread Haribabu Kommi
In the current parallel implementation, if the planned number of
workers is not available at the start of query execution, the query
runs with the available number of workers until
the end of the query.

It may be that during query processing the required
number of workers becomes available, so how about
increasing the parallel workers up to the planned
number (only when the main backend is about to wait
for the workers to send tuples, or has to process the
plan by itself)?

The wait for workers to send tuples may be caused by
having too few workers, so increasing the number of
parallel workers may improve performance.

POC patch attached for the same.

This still needs some adjustments for the cases where
the main backend also does the scan instead of waiting for
the workers to finish the job, as the worker-increasing logic
shouldn't add overhead in that case.

Regards,
Hari Babu
Fujitsu Australia


parallel_workers_increase_at_runtime.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)

2017-05-08 Thread Haribabu Kommi
On Mon, May 8, 2017 at 11:39 AM, David Rowley 
wrote:

>
> We really need a machine with good IO concurrency, and not too much
> RAM to test these things out. It could well be that for a suitability
> large enough table we'd want to scan a whole 1GB extent per worker.
>
> I did post a patch to have heap_parallelscan_nextpage() use atomics
> instead of locking over in [1], but I think doing atomics there does
> not rule out also adding batching later. In fact, I think it
> structures things so batching would be easier than it is today.
>

As part of our internal PostgreSQL project, we developed a parallel seq
scan with batch mode only. The problem we faced with batch mode is making
sure that all the parallel workers finish at almost the same time, which
requires a proper distribution of data pages. Otherwise it can lead to a
situation where one worker is still doing the last batch while all the
others have already finished their jobs. In those cases we cannot achieve
good performance.

Whereas with the current approach, the most extra work the last worker
will do is scanning the final page of the table.

If we go with batching of 1GB per worker, there is a chance that the data
satisfying the query condition falls into only one extent, and in that
case too the batching may not yield good results.

Regards,
Hari Babu
Fujitsu Australia


[HACKERS] compiler warning with VS 2017

2017-05-04 Thread Haribabu Kommi
I am getting a compiler warning when I build the latest PostgreSQL HEAD
with Visual Studio 2017.

src/backend/replication/logical/proto.c(482): warning C4312: 'type cast':
conversion from 'unsigned int' to 'char *' of greater size

Details of the warning are available at the link [1].

The code at that line is:

tuple->values[i] = (char *) 0xdeadbeef; /* make bad usage more obvious */

If I change the code to (char *) (Size) 0xdeadbeef
or (char *) (INT_PTR) 0xdeadbeef, the warning disappears.

How about fixing it as below by adding the typecast, or alternatively
disabling this warning?

/*
 * PTR_CAST
 * Used when converting an integer value to a pointer.
 * The extra cast avoids generating a size-conversion
 * warning on Windows.
 */
#ifdef WIN32
#define PTR_CAST(val) ((INT_PTR) (val))
#else
#define PTR_CAST(val) (val)
#endif
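
Presumably the usage in proto.c would then read something like:

tuple->values[i] = (char *) PTR_CAST(0xdeadbeef);  /* make bad usage more obvious */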

[1] - https://msdn.microsoft.com/en-us/library/h97f4b9y.aspx

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Re: [BUGS] BUG #14634: On Windows pg_basebackup should write tar to stdout in binary mode

2017-05-03 Thread Haribabu Kommi
On Wed, May 3, 2017 at 10:44 PM, Ashutosh Sharma <ashu.coe...@gmail.com>
wrote:

> Hi Craig,
>
> On Wed, May 3, 2017 at 10:50 AM, Craig Ringer <cr...@2ndquadrant.com>
> wrote:
> > On 3 May 2017 at 12:32, Haribabu Kommi <kommi.harib...@gmail.com> wrote:
> >> [Adding -hackers mailing list]
> >>
> >> On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehl...@agilent.com> wrote:
> >>>
> >>> The following bug has been logged on the website:
> >>>
> >>> Bug reference:  14634
> >>> Logged by:  Henry Boehlert
> >>> Email address:  henry_boehl...@agilent.com
> >>> PostgreSQL version: 9.6.2
> >>> Operating system:   Windows Server 2012 R2 6.3.9600
> >>> Description:
> >>>
> >>> Executing command pg_basebackup -D -F t writes its output to stdout,
> which
> >>> is open in text mode, causing LF to be converted to CR LF thus
> corrupting
> >>> the resulting archive.
> >>>
> >>> To write the tar to stdout, on Windows stdout's mode should be
> temporarily
> >>> switched to binary.
> >>>
> >>> https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
> >>
> >>
> >> Thanks for reporting the issue.
> >> With the attached patch, I was able to extract the tar file that gets
> >> generated when the tar file is written into stdout. I tested the
> >> the compressed tar also.
> >>
> >> This bug needs to be fixed in back branches also.
> >
> > We should do the same for pg_dump in -Fc mode.
>
> Did you meant -Fp mode. I think we are already setting stdout file to
> binary mode if the output format is custom. Please refer to the
> following code in parseArchiveFormat() and _allocAH() respectively
>
> static ArchiveFormat
> parseArchiveFormat(const char *format, ArchiveMode *mode)
> {
> ...
> ...
> else if (pg_strcasecmp(format, "c") == 0)
> archiveFormat = archCustom;
> else if (pg_strcasecmp(format, "custom") == 0)
> archiveFormat = archCustom;
>
> else if (pg_strcasecmp(format, "p") == 0)
> archiveFormat = archNull;
> else if (pg_strcasecmp(format, "plain") == 0)
> archiveFormat = archNull;
> ...
> ...
> }
>
> static ArchiveHandle *
> _allocAH(const char *FileSpec, const ArchiveFormat fmt,
>  const int compression, bool dosync, ArchiveMode mode,
>  SetupWorkerPtrType setupWorkerPtr)
> {
>
> ...
> ...
> #ifdef WIN32
> if (fmt != archNull &&
> (AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0))
> {
> if (mode == archModeWrite)
> setmode(fileno(stdout), O_BINARY);
> else
> setmode(fileno(stdin), O_BINARY);
> }
> #endif
> ..
> ..
> }
>
> Please confirm.
>

I also think it is only the plain-text mode. There is no problem with a
text-mode file that contains CR LF characters.

> Meanwhile, I have unit tested the patch submitted by Hari upthread on
> postgresql v10 and it works fine. Below are the steps that i have
> followed to test Hari's patch.
>
> Without patch:
> ==
> C:\Users\ashu\postgresql\TMP\test\bin>.\pg_basebackup.exe -D - -F t -X
> none > base.tar
> NOTICE:  WAL archiving is not enabled; you must ensure that all required
> WAL seg
> ments are copied through other means to complete the backup
>
> C:\Users\ashu\postgresql\TMP\test\bin>tar -xf base.tar
> tar: Skipping to next header
> tar: Exiting with failure status due to previous errors
>
>
> With patch:
> ===
> C:\Users\ashu\git-clone-postgresql\postgresql\TMP\
> test\bin>.\pg_basebackup.exe
> -D - -F t -X none > base.tar
> NOTICE:  WAL archiving is not enabled; you must ensure that all required
> WAL seg
> ments are copied through other means to complete the backup
>
> C:\Users\ashu\postgresql\TMP\test\bin>cp base.tar ..\basebakup
>
> C:\Users\ashu\postgresql\TMP\test\basebakup>tar -xf base.tar
>
> C:\Users\ashu\postgresql\TMP\test\basebakup>ls
> PG_VERSIONpg_commit_ts   pg_multixact  pg_stat  pg_wal
> backup_label  pg_dynshmempg_notify pg_stat_tmp  pg_xact
> base  pg_hba.confpg_replslot   pg_subtrans
> postgresql.auto.conf
> base.tar  pg_ident.conf  pg_serial pg_tblspcpostgresql.conf
> globalpg_logical pg_snapshots  pg_twophase  tablespace_map
>
>
Thanks for the tests to verify the patch.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Re: [BUGS] BUG #14634: On Windows pg_basebackup should write tar to stdout in binary mode

2017-05-03 Thread Haribabu Kommi
On Wed, May 3, 2017 at 3:20 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:

> On 3 May 2017 at 12:32, Haribabu Kommi <kommi.harib...@gmail.com> wrote:
> > [Adding -hackers mailing list]
> >
> > On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehl...@agilent.com> wrote:
> >>
> >> The following bug has been logged on the website:
> >>
> >> Bug reference:  14634
> >> Logged by:  Henry Boehlert
> >> Email address:  henry_boehl...@agilent.com
> >> PostgreSQL version: 9.6.2
> >> Operating system:   Windows Server 2012 R2 6.3.9600
> >> Description:
> >>
> >> Executing command pg_basebackup -D -F t writes its output to stdout,
> which
> >> is open in text mode, causing LF to be converted to CR LF thus
> corrupting
> >> the resulting archive.
> >>
> >> To write the tar to stdout, on Windows stdout's mode should be
> temporarily
> >> switched to binary.
> >>
> >> https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
> >
> >
> > Thanks for reporting the issue.
> > With the attached patch, I was able to extract the tar file that gets
> > generated when the tar file is written into stdout. I tested the
> > the compressed tar also.
> >
> > This bug needs to be fixed in back branches also.
>
> We should do the same for pg_dump in -Fc mode.


There are no "CR LF" characters present in the dump file
that is created with the custom format.

What problem do you see in the custom format that needs handling
similar to pg_basebackup?

Regards,
Hari Babu
Fujitsu Australia


[HACKERS] Re: [BUGS] BUG #14634: On Windows pg_basebackup should write tar to stdout in binary mode

2017-05-02 Thread Haribabu Kommi
[Adding -hackers mailing list]

On Fri, Apr 28, 2017 at 6:28 PM,  wrote:

> The following bug has been logged on the website:
>
> Bug reference:  14634
> Logged by:  Henry Boehlert
> Email address:  henry_boehl...@agilent.com
> PostgreSQL version: 9.6.2
> Operating system:   Windows Server 2012 R2 6.3.9600
> Description:
>
> Executing command pg_basebackup -D -F t writes its output to stdout, which
> is open in text mode, causing LF to be converted to CR LF thus corrupting
> the resulting archive.
>
> To write the tar to stdout, on Windows stdout's mode should be temporarily
> switched to binary.
>
> https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
>

Thanks for reporting the issue.
With the attached patch, I was able to extract the tar file that gets
generated when the tar output is written to stdout. I tested the
compressed tar as well.

This bug needs to be fixed in the back branches also.

Regards,
Hari Babu
Fujitsu Australia


pg_basebackup_tar_to_stdout_windows_fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] visual studio 2017 build support

2017-04-23 Thread Haribabu Kommi
Here I have attached a small patch that adds build support for
Visual Studio 2017.

The tools version number is still 14.X, even though the VS 2017 product
version is 15.0. I modified the versions accordingly.

Regards,
Hari Babu
Fujitsu Australia


vs2017_build_support.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2017-04-06 Thread Haribabu Kommi
On Thu, Apr 6, 2017 at 5:17 AM, Andres Freund  wrote:

> Hi,
>
>
> I'm somewhat inclined to think that this really would be better done in
> an extension like pg_stat_statements.
>

Thanks for the review.



>
> > +}
> > +
> > +/*
> > + * Count SQL statement for pg_stat_sql view
> > + */
> > +void
> > +pgstat_count_sqlstmt(const char *commandTag)
> > +{
> > + PgStat_SqlstmtEntry *htabent;
> > + bool    found;
> > +
> > + if (!pgstat_backend_sql)
> > + {
> > + /* First time through - initialize SQL statement stat
> table */
> > + HASHCTL hash_ctl;
> > +
> > + memset(&hash_ctl, 0, sizeof(hash_ctl));
> > + hash_ctl.keysize = NAMEDATALEN;
> > + hash_ctl.entrysize = sizeof(PgStat_SqlstmtEntry);
> > + pgstat_backend_sql = hash_create("SQL statement stat
> entries",
> > +
>   PGSTAT_SQLSTMT_HASH_SIZE,
> > +
>   &hash_ctl,
> > +
>   HASH_ELEM | HASH_BLOBS);
> > + }
> > +
> > + /* Get the stats entry for this SQL statement, create if necessary
> */
> > + htabent = hash_search(pgstat_backend_sql, commandTag,
> > +   HASH_ENTER, &found);
> > + if (!found)
> > + htabent->count = 1;
> > + else
> > + htabent->count++;
> > +}
>
>
> That's not really something in this patch, but a lot of this would be
> better if we didn't internally have command tags as strings, but as an
> enum.  We should really only convert to a string with needed.  That
> we're doing string comparisons on Portal->commandTag is just plain bad.
>
> If so, we could also make this whole patch a lot cheaper - have a fixed
> size array that has an entry for every possible tag (possibly even in
> shared memory, and then use atomics there).
>

During development I thought of using an array of all command tags
and updating the counters by tag name, rather than by enum values.
Now I would have to create a mapping array of enums and tag names for
easier counter updates. The problem with this approach is that whenever
a new command is added, the mapping array needs to be updated with both
the new enum and the new tag name, whereas the hash table approach
automatically works for future command additions, at the cost of some
performance overhead compared to the array approach.

I will modify the patch to use the array approach and provide it for the
next commitfest.
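
A minimal sketch of the array-based approach (the enum and names here are
assumptions, not the committed design):

/* Illustrative only. */
typedef enum SqlCommandTag
{
    SQLTAG_SELECT = 0,
    SQLTAG_INSERT,
    SQLTAG_UPDATE,
    SQLTAG_DELETE,
    /* ... one entry per supported command tag ... */
    SQLTAG_NUM_TAGS             /* must be last: size of the counter array */
} SqlCommandTag;

/*
 * One counter per tag.  The array could also live in shared memory and be
 * updated with atomic increments instead of a per-backend hash table.
 */
static uint64 sql_stmt_counts[SQLTAG_NUM_TAGS];

static void
pgstat_count_sqlstmt_tag(SqlCommandTag tag)
{
    sql_stmt_counts[tag]++;
}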



> > +++ b/src/backend/tcop/postgres.c
> > @@ -1109,6 +1109,12 @@ exec_simple_query(const char *query_string)
> >
> >   PortalDrop(portal, false);
> >
> > + /*
> > +  * Count SQL statement for pg_stat_sql view
> > +  */
> > + if (pgstat_track_sql)
> > + pgstat_count_sqlstmt(commandTag);
> > +
>
> Isn't doing this at the message level quite wrong?  What about
> statements executed from functions and such?  Shouldn't this integrate
> at the executor level instead?
>

Currently the patch is designed to count only the user-executed statements
that finished successfully (not internal statements that are executed from
functions and so on).

The same question was asked by Dilip in earlier mails; maybe now is the
time to decide on the approach for the statement counts.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-04-03 Thread Haribabu Kommi
On Thu, Mar 30, 2017 at 12:00 PM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

>
>
> On Wed, Mar 29, 2017 at 11:04 PM, Andreas Karlsson <andr...@proxel.se>
> wrote:
>
>> On 03/29/2017 05:43 AM, Haribabu Kommi wrote:
>> > Updated patch attached.
>>
>> I get a test failure in the pg_upgrade tests, but I do not have time
>> right now to investigate.
>>
>> The failing test is "Restoring database schemas in the new cluster".
>>
>
> Thanks for test.
>
> I found the reason for failure.
>
> Before this refactor patch, in case of --binary-upgrade, the pg_dumpall
> dumps all the global objects and also the database objects. These objects
> will be restored first during the preparation of the new cluster and later
> each individual database is restored.
>
> Because of the refactoring of the database objects, currently as part of
> globals dump with --binary-upgrade, no database objects gets dumped.
> During restore no databases are created. so while restoring individual
> database, it leads to failure as it not able to connect to the target
> database.
>

I modified the pg_upgrade code to use the template1 database as the
connection database while restoring the dump, along with the --create
option to pg_restore to create the databases, instead of connecting to
each individual database.

And while dumping the database objects, I passed the new
--enable-pgdumpall-behaviour option to pg_dump so that the database objects
are dumped as pg_dumpall --binary-upgrade expects.

Both the pg_dump and pg_upgrade tests now pass. Updated patch attached;
I will add this patch to the next commitfest.

Regards,
Hari Babu
Fujitsu Australia


pg_dump_changes_5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] RE: DECLARE STATEMENT setting up a connection in ECPG

2017-03-30 Thread Haribabu Kommi
On Thu, Mar 30, 2017 at 1:57 PM, Ideriha, Takeshi <
ideriha.take...@jp.fujitsu.com> wrote:

>
> >+ if(connection_name == NULL)
> >+ {
> >+ /*
> >+ * Going to here means not using AT clause in the DECLARE STATEMENT
> >+ * We don't allocate a node to store the declared name because the
> >+ * DECLARE STATEMENT without using AT clause will be ignored.
> >+ */
> >+ return true;
> >+ }
> >
> >I am not sure that just ignore the declare statement may be wrong.
> >I feel whether such case is possible? Does the preprocessor allows it?
>
> As you pointed out, the above thing should be discussed.
> The current implementation is as follows:
>
> ECPG pre-processor allows the DECLARE STATEMENT without using AT clause.
> And the following statement after DECLARE STATEMENT such as PREPARE,
> EXECUTE are
> executed as usual on the current connection.
>
> But there's a limitation here.
>  (This limitation should be disccused earlier and be specified in the
> doc...
>   but I didn't realize this clearly by myself, sorry)
>
> When using DECLARE STATEMENT without AT clause
> and using OPEN statement with AT clause, it doesn't work.
>
> There's an example where you cannot fetch rows from db:
> EXEC SQL CONNECT TO db AS con;
>
> EXEC SQL DECLARE stmt STATEMENT;
> EXEC SQL AT con PREPARE stmt FROM :selectString;
> EXEC SQL AT con DECLARE cur CURSOR FOR stmt;
> EXEC SQL AT con OPEN cur;
>  ...
>
> This limitation looks troublesome for users,
> so maybe I need to fix this implementation.
>

With the above test steps, it doesn't produce the results and doesn't
generate an error either. I feel this needs to be fixed.

As we are at the end of the commitfest, it would be better to move it
to the next commitfest and provide an updated patch that solves the
above problem.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-29 Thread Haribabu Kommi
On Wed, Mar 29, 2017 at 5:10 AM, Fujii Masao <masao.fu...@gmail.com> wrote:

> On Tue, Mar 28, 2017 at 1:40 PM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> >
> > Added stats collection for walsender, statrup and autovacuum processes.
> > The background workers that call pgstat_report_stat() function will
> > automatically
> > included.
>
> So you added pgstat_send_walwrites() into several places. But WAL activity
> by the background workers which don't call pgstat_report_stat() is still
> not
> considered in pg_stat_walwrites view. This design looks fragile to me.
> I think that we should capture all the WAL activities (even by such
> background
> workers) in XLogWrite(), instead.
>
> For example, we can add the "shared" counters (protected by WALWriteLock)
> into XLogCtl and make XLogWrite() update those counters if the process is
> NOT walwriter. OTOH, if it's walwriter, XLogWrite() updates its local
> counters.
> Then walwriter periodically packs those counters in the message and sends
> it to the stats collector.
>

I updated the patch to use shared counters instead of adding pg_stat_ calls
to send the statistics from each background process/worker.

All the processes update the shared counters; the walwriter process fills a
local structure from the shared counters and sends it to the stats
collector.
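
Roughly, the shared part looks like the sketch below (names are
assumptions; the counters sit in shared memory as suggested upthread and
are bumped while WALWriteLock is held):

/* Illustrative only. */
typedef struct WalWriteStats
{
    uint64      writes;         /* number of XLogWrite() calls */
    uint64      write_blocks;   /* number of WAL pages written */
    uint64      backend_writes; /* subset issued by ordinary backends */
} WalWriteStats;

/*
 * Inside XLogWrite(), while WALWriteLock is held, processes other than
 * the walwriter bump the shared counters, e.g.:
 *
 *     XLogCtl->wal_write_stats.writes++;
 *     XLogCtl->wal_write_stats.write_blocks += npages;
 *     if (!AmWalWriterProcess())
 *         XLogCtl->wal_write_stats.backend_writes++;
 *
 * The walwriter periodically copies the shared counters (plus its own
 * activity) into its local stats structure and sends them to the
 * statistics collector.
 */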


> >
> > update patch attached.
>
> Thanks for updating the patch!
>
> +  writes
> +  bigint
> +  
> +  Number of WAL writes that are carried out by background
> processes and workers.
> +  
> 
> +  write_blocks
> +  bigint
> +  Number of WAL pages written to the disk by the
> background processes/workers
> + 
>
> Could you tell me why both numbers of WAL writes and pages written need to
> be reported? How can we tune the system by using those information?
>

The write_blocks column is useful to find out how many blocks get written
by the backends, along with the number of writes. I feel that:
1. If we only get the count of writes and the amount written is very
small, it may not need any tuning.
2. If the amount written is large even for a small number of writes, then
it needs some tuning.

Updated patch attached.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_walwrites_view_5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-03-29 Thread Haribabu Kommi
On Wed, Mar 29, 2017 at 11:04 PM, Andreas Karlsson <andr...@proxel.se>
wrote:

> On 03/29/2017 05:43 AM, Haribabu Kommi wrote:
> > Updated patch attached.
>
> I get a test failure in the pg_upgrade tests, but I do not have time right
> now to investigate.
>
> The failing test is "Restoring database schemas in the new cluster".
>

Thanks for the test.

I found the reason for the failure.

Before this refactoring patch, in the --binary-upgrade case pg_dumpall
dumped all the global objects and also the database objects. These objects
are restored first while preparing the new cluster, and later
each individual database is restored.

Because of the refactoring of the database objects, currently no database
objects get dumped as part of the globals dump with --binary-upgrade.
During restore no databases are created, so restoring each individual
database fails because it is not able to connect to the target
database.

For now I have marked the patch in the commitfest as "returned with
feedback", as in the current situation this needs some analysis of how to
handle database objects in --binary-upgrade mode.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-03-28 Thread Haribabu Kommi
On Tue, Mar 28, 2017 at 12:50 AM, Andreas Karlsson 
wrote:

> Hi,
>
> Here is my review. I agree with the goal of the refactoring, as we want to
> make it easier to dump all the properties for the database object. But I
> think we need to solve the issues with the special casing of postgres and
> template1 which I personally would find very surprising if pg_dump -C did.
> On the other hand I think that we cannot get away from having pg_dumpall
> give them a special treatment.
>

Thanks for the review.

I added a new option, --enable-pgdumpall-behaviour, to get the pg_dumpall
behaviour for the database objects while dumping them through pg_dump.
I am open to changing the option name if we come up with a better one.


> The nitpicking section is for minor code style errors.
>
> = Functional review
>
> I have not done an in depth functional review due to the discussion about
> how postgres and template1 should be handled.
>
> - The patch does not apply cleanly anymore
>
> - I do not like the change in behavior which causes "pg_dump -C postgres"
> to no longer include CREATE DATABASE. Special treatment of specific
> databases based on name makes sense in pg_dumpall, but not in pg_dump.
>

With the new option, the CREATE DATABASE command for postgres and the
special treatment of
"SET default_transaction_read_only = off" are still retained.

- There are test failures in the pg_dump tests. It seems like some could be
> related to that you do not include CREATE DATABASE postgres in the dumps
> but I also get errors like 'ERROR:  syntax error at or near
> "fault_tablespace"'.
>
> not ok 691 - createdb: dumps CREATE DATABASE postgres
> not ok 3003 - pg_dumpall_dbprivs: dumps CREATE DATABASE dump_test
> not ok 11 - restore full dump using environment variables for connection
> parameters
> not ok 12 - no dump errors
> not ok 13 - restore full dump with command-line options for connection
> parameters
> not ok 14 - no dump errors
>

Fixed. Now all tests pass.

= Code review
>
> - As a response to "TBD -- is it necessary to get the default encoding": I
> think so, but either way changing this seems unrelated to this patch.
>

Removed.


> - I know it is taken from the old pg_dumpall code, but the way the
> database owner is handled seems I wrong.think we should set it like the
> owner for other objects. And more importantly it should respect --no-owner.
>

Removed the code for owner, as it is handled in another place with ALTER
DATABASE
command.


> - The logic for switching database when setting the default table space is
> broken. You generate "\ connect" rather than "\connect".
>

Fixed.



> - I saw the comment "Note that we do not support initial privileges
> (pg_init_privs) on databases." and wondered: why not? I definitly think
> that we should support this.
>

This is existing code that was moved from pg_dumpall.

= Nitpicking
>
> - You should probably use SGML style  over  and
>  for inline tags.
>

Corrected.


> - In "database-level properties such as Ownership, ACLs, [...]" I do not
> think that "Ownerships" shuld be capitalized.
>

Fixed.

- There are two extra spaces on the lines below, and a space is missing
> after the closing tag.
>
>  ALTER ROLE IN DATABASE ...  SET commands.
>
> with --create option to dump  ALTER ROLE IN DATABASE ...  SET
> 
>

Fixed.


> - On the following comment ".." should be "...", since that is the correct
> way to write an ellipsis.
>
> * Frame the ALTER .. SET .. commands and fill it in buf.
>

Fixed.


> - Rename arrayitem to configitem in makeAlterConfigCommand().
>

Corrected.


> - In makeAlterConfigCommand() you should do "*pos++ = '\0';" rather than
> "*pos = 0;" and then remove the later + 1 so our code matches with the code
> in dumpFunc(). Either is correct, but it would be nice if both pieces of
> code looked more similar.
>

Corrected.


> - You removed an empty line in pg_backup_utils.h between globals variables
> and function declartions which I think should be left there. It should be
> directly after g_verbose.


Fixed.

- There is something wrong with the indentation of the query for collecting
> info about databases in dumpDatabase() for PG >= 9.6.
>

Fixed.

- Missing space before "'' as rdatacl" in dumpDatabase(), and a missing
> space at the end of the string.
>

Fixed.


> - Double space in 'FROM pg_database  "' in dumpDatabase().


Fixed.

Updated patch attached.


Regards,
Hari Babu
Fujitsu Australia


pg_dump_changes_4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-27 Thread Haribabu Kommi
On Tue, Mar 28, 2017 at 3:40 PM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

>
> update patch attached.
>

Forgot to execute git commit, new patch attached.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_walwrites_view_4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-27 Thread Haribabu Kommi
On Mon, Mar 27, 2017 at 1:27 PM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

>
>
> On Sat, Mar 25, 2017 at 6:40 AM, Fujii Masao <masao.fu...@gmail.com>
> wrote:
>
>> On Wed, Feb 15, 2017 at 12:53 PM, Haribabu Kommi
>> <kommi.harib...@gmail.com> wrote:
>> >
>> >
>> > On Wed, Feb 8, 2017 at 9:36 PM, Amit Kapila <amit.kapil...@gmail.com>
>> wrote:
>> >>
>> >> On Tue, Feb 7, 2017 at 11:47 AM, Haribabu Kommi
>> >> <kommi.harib...@gmail.com> wrote:
>> >> > Hi Hackers,
>> >> >
>> >> > I just want to discuss adding of a new statistics view that provides
>> >> > the information of wal writing details as follows
>> >> >
>> >>
>> >> +1.  I think it will be useful to observe WAL activity.
>> >
>> >
>> > Thanks for your opinion.
>> >
>> >> > postgres=# \d pg_stat_wal_writer
>> >> > View "pg_catalog.pg_stat_wal_writer"
>> >> > Column |   Type   | Collation |
>> Nullable
>> >> > |
>> >> > Default
>> >> >
>> >> > ---+--+-
>> --+--+-
>> >> >  num_backend_writes   | bigint   |   |
>> >> > |
>> >> >  num_total_writes  | bigint   |   |
>> |
>> >> >  num_blocks  | bigint   |   |  |
>> >> >  total_write_time   | bigint|   |  |
>> >> >  stats_reset   | timestamp with time zone |   |
>> >> > |
>> >> >
>> >> > The columns of the view are
>> >> > 1. Total number of xlog writes that are called from the backend.
>> >> > 2. Total number of xlog writes that are called from both backend
>> >> >  and background workers. (This column can be changed to just
>> >> >  display on the background writes).
>> >> > 3. The number of the blocks that are written.
>> >> > 4. Total write_time of the IO operation it took, this variable data
>> is
>> >> > filled only when the track_io_timing GUC is enabled.
>> >>
>> >> So, here is *write_time* the total time system has spent in WAL
>> >> writing before the last reset?
>> >
>> >
>> > total write_time spent in WAL writing "after" the last reset in
>> > milliseconds.
>> >
>> >> I think there should be a separate column for write and sync time.
>> >>
>> >
>> > Added.
>> >
>> >>
>> >> > Or it is possible to integrate the new columns into the existing
>> >> > pg_stat_bgwriter view also.
>> >> >
>> >>
>> >> I feel separate view is better.
>> >
>> >
>> > Ok.
>> >
>> > Following the sample out of the view after regress run.
>> >
>> > postgres=# select * from pg_stat_walwrites;
>> > -[ RECORD 1 ]--+--
>> > backend_writes | 19092
>> > writes | 663
>> > write_blocks   | 56116
>> > write_time | 0
>> > sync_time  | 3064
>> > stats_reset| 2017-02-15 13:37:09.454314+11
>> >
>> > Currently, writer, walwriter and checkpointer processes
>> > are considered as background processes that can do
>> > the wal write mainly.
>>
>
> Thanks for the review.
>
> I'm not sure if this categorization is good. You told that this view is
>> useful
>> to tune walwriter parameters at the top of this thread. If so, ISTM that
>> the information about walwriter's activity should be separated from
>> others.
>>
>
> Yes, that's correct. First I thought of providing the statistics of
> walwriter, but
> later in development, it turned into showing statistics of all wal write
> activity
> of background processes also to differentiate the actual write by the
> backends.
>
>
>> What about other processes which *can* write WAL, for example walsender
>> (e.g., BASE_BACKUP can cause WAL record), startup process (e.g., end-of-
>> recovery checkpoint) and logical replication worker (Not sure if it always
>> works with synchronous_commit=off, though). There might be other processes
>> which can write WAL
>
>
> It is possible to add the walsender, st

Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-26 Thread Haribabu Kommi
On Sat, Mar 25, 2017 at 6:40 AM, Fujii Masao <masao.fu...@gmail.com> wrote:

> On Wed, Feb 15, 2017 at 12:53 PM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >
> >
> > On Wed, Feb 8, 2017 at 9:36 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> >>
> >> On Tue, Feb 7, 2017 at 11:47 AM, Haribabu Kommi
> >> <kommi.harib...@gmail.com> wrote:
> >> > Hi Hackers,
> >> >
> >> > I just want to discuss adding of a new statistics view that provides
> >> > the information of wal writing details as follows
> >> >
> >>
> >> +1.  I think it will be useful to observe WAL activity.
> >
> >
> > Thanks for your opinion.
> >
> >> > postgres=# \d pg_stat_wal_writer
> >> > View "pg_catalog.pg_stat_wal_writer"
> >> > Column |   Type   | Collation |
> Nullable
> >> > |
> >> > Default
> >> >
> >> > ---+--+-
> --+--+-
> >> >  num_backend_writes   | bigint   |   |
> >> > |
> >> >  num_total_writes  | bigint   |   |  |
> >> >  num_blocks  | bigint   |   |  |
> >> >  total_write_time   | bigint|   |  |
> >> >  stats_reset   | timestamp with time zone |   |
> >> > |
> >> >
> >> > The columns of the view are
> >> > 1. Total number of xlog writes that are called from the backend.
> >> > 2. Total number of xlog writes that are called from both backend
> >> >  and background workers. (This column can be changed to just
> >> >  display on the background writes).
> >> > 3. The number of the blocks that are written.
> >> > 4. Total write_time of the IO operation it took, this variable data is
> >> > filled only when the track_io_timing GUC is enabled.
> >>
> >> So, here is *write_time* the total time system has spent in WAL
> >> writing before the last reset?
> >
> >
> > total write_time spent in WAL writing "after" the last reset in
> > milliseconds.
> >
> >> I think there should be a separate column for write and sync time.
> >>
> >
> > Added.
> >
> >>
> >> > Or it is possible to integrate the new columns into the existing
> >> > pg_stat_bgwriter view also.
> >> >
> >>
> >> I feel separate view is better.
> >
> >
> > Ok.
> >
> > Following the sample out of the view after regress run.
> >
> > postgres=# select * from pg_stat_walwrites;
> > -[ RECORD 1 ]--+--
> > backend_writes | 19092
> > writes | 663
> > write_blocks   | 56116
> > write_time | 0
> > sync_time  | 3064
> > stats_reset| 2017-02-15 13:37:09.454314+11
> >
> > Currently, writer, walwriter and checkpointer processes
> > are considered as background processes that can do
> > the wal write mainly.
>

Thanks for the review.

I'm not sure if this categorization is good. You told that this view is
> useful
> to tune walwriter parameters at the top of this thread. If so, ISTM that
> the information about walwriter's activity should be separated from others.
>

Yes, that's correct. First I thought of providing the statistics of
walwriter, but
later in development, it turned into showing statistics of all wal write
activity
of background processes also to differentiate the actual write by the
backends.


> What about other processes which *can* write WAL, for example walsender
> (e.g., BASE_BACKUP can cause WAL record), startup process (e.g., end-of-
> recovery checkpoint) and logical replication worker (Not sure if it always
> works with synchronous_commit=off, though). There might be other processes
> which can write WAL


It is possible to add the walsender, startup and other processes easily,
but not background workers that do some WAL write operations, unless they
report their stats with pgstat_report_stat(). Is it fine to ignore the
workers that do not report the stats?



> Why didn't you separate "write_blocks", "write_time" and "sync_time" per
> the process category, like "backend_writes" and "writes"?
>

Ok. I will add those columns.


> This view doesn't seem to take into consideration the WAL writing and
> flushing
> during creating new WAL segment file.
>
> I think that it's better to make this view report also the number of WAL
> pages
> which are written when wal_buffer is full. This information is useful to
> tune the size of wal_buffers. This was proposed by Nagayasu before.
> https://www.postgresql.org/message-id/4ff824f3.5090...@uptime.jp


OK. But this new column will just show how many times the WAL buffers were
flushed because they were full, rather than a separate column showing the
number of WAL pages that were actually flushed because the buffers were
full.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] ANALYZE command progress checker

2017-03-23 Thread Haribabu Kommi
On Wed, Mar 22, 2017 at 8:11 PM, vinayak <pokale_vinayak...@lab.ntt.co.jp>
wrote:

>
> On 2017/03/21 21:25, Haribabu Kommi wrote:
>
>
>
> On Tue, Mar 21, 2017 at 3:41 PM, vinayak <pokale_vinayak...@lab.ntt.co.jp>
> wrote:
>
>> Thank you for testing the patch on Windows platform.
>>
>>
> Thanks for the updated patch.
>
> It works good for a normal relation.  But for a relation that contains
> child tables,
> the PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results.
>
> Thank you for reviewing the patch.
> The attached patch implements a way to report sample rows count from
> acquire_sample_rows() even if called for child tables.
>
> How about adding another phase called PROGRESS_ANALYZE_PHASE_
> COLLECT_INHERIT_SAMPLE_ROWS
> and set this phase only when it is an inheritance analyze operation. And
> adding
> some explanation of ROWS_SAMPLED phase about inheritance tables
> how these sampled rows are calculated will provide good analyze progress of
> relation that contains child relations also.
>
> I have added the phase called PROGRESS_ANALYZE_PHASE_
> COLLECT_INH_SAMPLE_ROWS.
> I have also updated the documentation.
>

Thanks for the updated patch. I will check it.


> The ANALYZE command takes long time in computing statistics phase.So I
> think we can add some column or phase so that user can easily understand
> the progress.
> How about adding new column like "num_rows_processed" will compute the
> statistics of specified column?
>

I prefer a column with the number of rows processed instead of a phase.
Since we already set the compute stats phase, showing the progress there
as the number of rows processed will be good.
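
Just to illustrate what I have in mind (a rough sketch only -- the view and
column names below are assumptions for illustration, not what the current
patch uses), the progress could then be watched like:

postgres=# SELECT pid, relid, phase, num_rows_processed
             FROM pg_stat_progress_analyze;

with num_rows_processed increasing while the compute stats phase is running.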


> How about separate the computing "inheritance statistics" phase from the
> computing regular "single table" statistics.
> Comment?
>

Yes, it will be good to show both states, the collection of inherited
sample rows and the computation of inheritance stats, so that the current
status is clearly visible to the user.


Regards,
Hari Babu
Fujitsu Australia


Re: [WIP] RE: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG

2017-03-22 Thread Haribabu Kommi
On Wed, Mar 22, 2017 at 7:57 PM, Ideriha, Takeshi <
ideriha.take...@jp.fujitsu.com> wrote:

> Hi, thank you very much for reviewing.
> Attached is v6 patch.
>
> >There was a minor conflict in applying 004_declareXX patch.
>
> I rebased 004_declareStmt_test_v6.patch.
>
> >Some comments in 001_declareStmt_preproc_v5.patch:
>
> >+  if (INFORMIX_MODE)
> >+  {
> >+  if (pg_strcasecmp(stmt+strlen("close "), "database") == 0)
> >+  {
> >+  if (connection)
> >+  mmerror(PARSE_ERROR, ET_ERROR, "AT option
> not allowed in CLOSE DATABASE statement");
> +
> >+  fprintf(base_yyout, "{ ECPGdisconnect(__LINE__,
> \"CURRENT\");");
> >+  whenever_action(2);
> >+  free(stmt);
> >+  break;
> >+  }
> >+  }
>
> >The same code block is present in the stmtClosePortalStmt condition to
> verify the close
> >statement. Because of the above code addition, the code present in
> stmtClosePortalStmt
> >is a dead code. So remove the code present in stmtClosePortalStmt or try
> to reuse it.
>
> I remove almost all the stmtClosePortalStmt code
> and just leave the interface which actually does nothing not to affect
> other codes.
>
> But I'm not sure this implementation is good or not.
> Maybe I should completely remove stmtClosePortalStmt code
> and rename ClosePortalStmtCLOSEcursor_name to stmtClosePortalStmt to make
> code easier to understand.
>
> >+static void
> >+output_cursor_name(char *str)
>
> >This function needs some comments to explain the code flow for better
> understanding.
>
> I add some explanation that output_cursor_name() filters escape sequences.
>
> >+/*
> >+ * Translate the EXEC SQL DECLARE STATEMENT into ECPGdeclare function
> >+ */
>
> >How about using Transform instead of Translate? (similar references also)
>
> I changed two comments with the word Translate.
>

Thanks for the updated patch. It looks good to me.
I have some comments in the doc patch.

+
+  
+   The third option is to declare a sql identifier linked to
+   the connection, for example:
+
+EXEC SQL AT connection-name DECLARE
statement-name STATEMENT;
+EXEC SQL PREPARE statement-name FROM
:dyn-string;
+
+   Once you link a sql identifier to a connection, you execute a dynamic
SQL
+   without AT clause.
+  

The above code part should be moved below the following code location,
and providing some example usage of it in the example code will be better.


EXEC SQL SET CONNECTION connection-name;
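
For example, something like the following could be added there (just a
sketch; the connection and statement names are made up):

EXEC SQL CONNECT TO testdb AS con1;
EXEC SQL AT con1 DECLARE stmt1 STATEMENT;
EXEC SQL PREPARE stmt1 FROM :dyn_string;  /* runs on con1, no AT clause */
EXEC SQL EXECUTE stmt1;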



+
+ DELARE CURSOR with a SQL statement identifier can
be written before PREPARE.
+

What is the need of providing details about DECLARE CURSOR here?

+
+ AT clause cannot be used with the SQL statement which have been
identified by DECLARE STATEMENT
+

I didn't clearly understand the limitation here. If you can provide an
example, it will be good.

The test patch looks good to me.

Regards,
Hari Babu
Fujitsu Australia


Re: [WIP] RE: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG

2017-03-22 Thread Haribabu Kommi
On Wed, Mar 22, 2017 at 12:51 PM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

>
>
> On Tue, Mar 7, 2017 at 4:09 PM, Ideriha, Takeshi <
> ideriha.take...@jp.fujitsu.com> wrote:
>
>>
>> Attached 004_declareStmt_test_v5.patch is a rebased one.
>> The rest of patches are same as older version.
>>
>
> Thanks for the update patch. I started reviewing the patches.
>
> There was a minor conflict in applying 004_declareXX patch.
>
> Some comments in 001_declareStmt_preproc_v5.patch:
>
> + if (INFORMIX_MODE)
> + {
> + if (pg_strcasecmp(stmt+strlen("close "), "database") == 0)
> + {
> + if (connection)
> + mmerror(PARSE_ERROR, ET_ERROR, "AT option not allowed in CLOSE DATABASE
> statement");
> +
> + fprintf(base_yyout, "{ ECPGdisconnect(__LINE__, \"CURRENT\");");
> + whenever_action(2);
> + free(stmt);
> + break;
> + }
> + }
>
> The same code block is present in the stmtClosePortalStmt condition to
> verify the close
> statement. Because of the above code addition, the code present in
> stmtClosePortalStmt
> is a dead code. So remove the code present in stmtClosePortalStmt or try
> to reuse it.
>
>
> +static void
> +output_cursor_name(char *str)
>
> This function needs some comments to explain the code flow for better
> understanding.
>
> +/*
> + * Translate the EXEC SQL DECLARE STATEMENT into ECPGdeclare function
> + */
>
> How about using Transform instead of Translate? (similar references also)
>
>
002_declareStmt_ecpglib_v5.patch:

+ struct connection *f = NULL;
+
  ecpg_init_sqlca(sqlca);
  for (con = all_connections; con;)
  {
- struct connection *f = con;
+ f = con;

What is the need of moving the structure declaration
outside the loop? I didn't find any usage of it later.

+ con = ecpg_get_connection(connection_name);
+ if (!con)
+ {
+ return;
+ }

No need of {}.

+ if (find_cursor(cursor_name, con))
+ {
+ /*
+ * Should never goto here, because ECPG has checked the duplication of
+ * the cursor in pre-compile stage.
+ */
+ return;
+ }

Do we really need this check? If it is just an additional check, how about
checking it with an Assert? (Check for similar instances.)

+ if (!ecpg_init(con, real_connection_name, line))
+ return false;

What is the need of the ecpg_init call? Why is the same not done in other
functions? It would be better to provide some comments about the need of
the function call.

-#endif   /* _ECPG_LIB_EXTERN_H */
+#endif   /* _ECPG_LIB_EXTERN_H */

Not related change.

+ if(connection_name == NULL)
+ {
+ /*
+ * Going to here means not using AT clause in the DECLARE STATEMENT
+ * We don't allocate a node to store the declared name because the
+ * DECLARE STATEMENT without using AT clause will be ignored.
+ */
+ return true;
+ }

I am not sure that just ignoring the declare statement is right.
Is such a case even possible? Does the preprocessor allow it?

+ * Find the declared name referred by the cursor,
+ * then return the connection name used by the declared name.

How about rewriting the above comment as follows? This is because we are
not finding the declared name itself; we are looping through all the
declared statements for this cursor.

"Find the connection name by looking up the declared statements
using the provided cursor name"

+ struct declared_statement *cur = NULL;
+ struct declared_statement *prev = NULL;
+ struct declared_statement *next = NULL;
+
+ if(connection_name == NULL)
+ return;
+
+ cur = g_declared_list;
+ while(cur)
+ {
+ next = cur->next;
+ if (strcmp(cur->connection_name, connection_name) == 0)
+ {
+ /* If find then release the declared node from the list */
+ if(prev)
+ prev->next = next;
+ else
+ g_declared_list = next;
+
+ ecpg_log("ecpg_release_declared_statement: declared name %s is
released\n", cur->name);
+
+ ecpg_free(cur->name);
+ ecpg_free(cur->connection_name);
+ ecpg_free(cur->cursor_name);
+ ecpg_free(cur);
+
+ /* One connection can be used by multiple declared name, so no break here
*/
+ }
+ else
+ prev = cur;
+
+ cur = next;
+ }

The above logic can be written without the "next" pointer.
That way it should be simpler.
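
For example, using a pointer to the link field, something like the below
should work (only a sketch based on the struct fields above; untested):

struct declared_statement **link = &g_declared_list;

while (*link)
{
    struct declared_statement *cur = *link;

    if (strcmp(cur->connection_name, connection_name) == 0)
    {
        /* unlink the node from the list, then release it */
        *link = cur->next;
        ecpg_log("ecpg_release_declared_statement: declared name %s is released\n", cur->name);
        ecpg_free(cur->name);
        ecpg_free(cur->connection_name);
        ecpg_free(cur->cursor_name);
        ecpg_free(cur);
    }
    else
        link = &cur->next;
}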

-#endif   /* _ECPGTYPE_H */
+#endif /* _ECPGTYPE_H */

Not related change.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags

2017-03-21 Thread Haribabu Kommi
On Wed, Mar 22, 2017 at 1:53 PM, Seki, Eiji 
wrote:

>
> Thank you for your review, again.
>
> I think your proposals are better, so I reflected them.


Thanks for the updated patch. Patch looks good to me.
I marked it as "ready for committer".

While reviewing this patch, I found that the PGXACT->vacuumFlags
variable needs a rename, because with the addition of the
PROC_IN_LOGICAL_DECODING flag, "vacuumFlags" is no longer used
only for vacuum operations. I feel this variable can be renamed to
just "flags", but anyway that is a different patch.

Regards,
Hari Babu
Fujitsu Australia


Re: [WIP] RE: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG

2017-03-21 Thread Haribabu Kommi
On Tue, Mar 7, 2017 at 4:09 PM, Ideriha, Takeshi <
ideriha.take...@jp.fujitsu.com> wrote:

>
> Attached 004_declareStmt_test_v5.patch is a rebased one.
> The rest of patches are same as older version.
>

Thanks for the update patch. I started reviewing the patches.

There was a minor conflict in applying 004_declareXX patch.

Some comments in 001_declareStmt_preproc_v5.patch:

+ if (INFORMIX_MODE)
+ {
+ if (pg_strcasecmp(stmt+strlen("close "), "database") == 0)
+ {
+ if (connection)
+ mmerror(PARSE_ERROR, ET_ERROR, "AT option not allowed in CLOSE DATABASE
statement");
+
+ fprintf(base_yyout, "{ ECPGdisconnect(__LINE__, \"CURRENT\");");
+ whenever_action(2);
+ free(stmt);
+ break;
+ }
+ }

The same code block is present in the stmtClosePortalStmt condition to
verify the close
statement. Because of the above code addition, the code present in
stmtClosePortalStmt
is a dead code. So remove the code present in stmtClosePortalStmt or try to
reuse it.


+static void
+output_cursor_name(char *str)

This function needs some comments to explain the code flow for better
understanding.

+/*
+ * Translate the EXEC SQL DECLARE STATEMENT into ECPGdeclare function
+ */

How about using Transform instead of Translate? (similar references also)


I am yet to review the other patches.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] ANALYZE command progress checker

2017-03-21 Thread Haribabu Kommi
On Tue, Mar 21, 2017 at 3:41 PM, vinayak 
wrote:

> Thank you for testing the patch on Windows platform.
>
>
Thanks for the updated patch.

It works well for a normal relation.  But for a relation that contains
child tables, the PROGRESS_ANALYZE_NUM_ROWS_SAMPLED counter produces
wrong results.

How about adding another phase called
PROGRESS_ANALYZE_PHASE_COLLECT_INHERIT_SAMPLE_ROWS
and set this phase only when it is an inheritance analyze operation. And
adding
some explanation of ROWS_SAMPLED phase about inheritance tables
how these sampled rows are calculated will provide good analyze progress of
relation that contains child relations also.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags

2017-03-21 Thread Haribabu Kommi
On Tue, Mar 21, 2017 at 3:16 PM, Seki, Eiji 
wrote:
>
>
> Thank you for you review.
>
> I reflected your comment and attach the updated patch.


Thanks for the updated patch.

+/* Use these flags in GetOldestXmin as "flags" */

How about something like the following?
/* Use the following flags as the "flags" input to the GetOldestXmin function */


+/* Ignore vacuum backends (Note: this also ignores analyze with vacuum
backends) */
+#define PROCARRAY_FLAGS_VACUUM PROCARRAY_FLAGS_DEFAULT |
PROCARRAY_VACUUM_FLAG
+/* Ignore analyze backends (Note: this also ignores vacuum with analyze
backends) */
+#define PROCARRAY_FLAGS_ANALYZE PROCARRAY_FLAGS_DEFAULT |
PROCARRAY_ANALYZE_FLAG

Whenever the above flags are passed to the GetOldestXmin() function,
it just verifies whether any one of them is set in the backend flags
or not. Also, the PROC_IN_ANALYZE flag is set when an analyze operation
starts and reset at the end, so I feel it is not required to mention
the Note section.

+/* Ignore vacuum backends and analyze ones */

How about changing it as "Ignore both vacuum and analyze backends".
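
If I understood the patch correctly, a caller would then simply do
something like the below (a sketch using the flag names from the patch):

    /* compute the horizon while ignoring backends running plain ANALYZE */
    TransactionId oldestXmin = GetOldestXmin(rel, PROCARRAY_FLAGS_ANALYZE);

and GetOldestXmin() would skip any backend whose flags match the passed-in
PROCARRAY_XXX bits.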


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-03-21 Thread Haribabu Kommi
Because of this refactoring of database object handling between
pg_dump and pg_dumpall, the latest pg_dump TAP tests are
failing in the following scenarios.
1. CREATE DATABASE postgres

Before this patch, pg_dump used to dump the CREATE DATABASE
command for postgres, but pg_dumpall did not.
During this refactoring, the approach I took for the --create
option in pg_dump follows the pg_dumpall approach of not printing
the CREATE DATABASE commands for the "postgres" and "template1"
databases.

It just prints the ALTER DATABASE commands to SET the
TABLESPACE for those two databases.

Solution-1) Just skip dumping these CREATE DATABASE commands and
explain in the documentation that the user must create the "postgres"
and "template1" databases in the target in case they don't exist,
assuming this kind of case is very rare.

Solution-2) Add a new command line option (or some other setting)
to indicate that the pg_dump execution is from pg_dumpall and follow
the current refactored behavior in that case; otherwise follow the
earlier pg_dump behavior in handling CREATE DATABASE commands for
"postgres" and "template1" databases.


2.  In the dumpDatabases function, before calling the runPgDump command,
the code before refactoring used to connect to the database and dump
"SET default_transaction_read_only = off;" to prevent an accidental
overwrite of the target.

I fixed it in the attached patch by removing the connection and the
dumping of the SET command.

Does it need a similar approach to solution-2) of the previous problem,
i.e. should we handle dumping "SET default_transaction_read_only = off;"
whenever the CREATE DATABASE and \connect commands are issued?
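
In other words, each per-database section of the dump would then start
with something like this (a rough sketch of the intended output; the
database name is made up and the CREATE DATABASE options are omitted):

CREATE DATABASE mydb;
\connect mydb
SET default_transaction_read_only = off;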

Documentation is yet to be updated to reflect the above changes.

Regards,
Hari Babu
Fujitsu Australia


pg_dump_changes_3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-15 Thread Haribabu Kommi
On Thu, Mar 16, 2017 at 4:15 PM, vinayak <pokale_vinayak...@lab.ntt.co.jp>
wrote:
>
> On 2017/03/16 10:34, Haribabu Kommi wrote:
>
>
> Updated patch attached.
>
> The patch looks good to me.
>

Thanks for the review.

How about rename the view as "pg_stat_walwriter"?
>

With the name "walwriter" instead of "walwrites", the user may be
confused into thinking that this view displays statistics of the
walwriter process, whereas it actually shows the WAL write activity
of the whole instance. For this reason, I went with the name
"walwrites".


> The columns of view :
> backend_writes -> backend_wal_writes
> writes-> background_wal_writes
> write_blocks-> wal_write_blocks
> write_time->wal_write_time
> sync_time->wal_sync_time
>

As the view name already contains WAL, I am not sure whether it is
required to include WAL in every column name. I am fine with changing
it if others share the opinion of adding WAL to the column names.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags

2017-03-15 Thread Haribabu Kommi
On Fri, Feb 24, 2017 at 3:17 PM, Seki, Eiji 
wrote:

>
> Thank you for your comments.
>
> I reflected these comments to the attached patch. And I renamed IGNORE_XXX
> flags to PROCARRAY_XXX flags.


I checked the latest patch and I have some comments.

+static int
+ConvertProcarrayFlagToProcFlag(int flags)

I feel this function is not needed if we maintain the same flag values
for both PROC_XXX and PROCARRAY_XXX, with comments at both declaration
sites to make sure that anyone modifying the flag values updates them in
both places. I feel it is usually rare that the flag values get changed.
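
For example, something like the below comments in both headers should be
enough (the values here are only for illustration):

/* storage/proc.h */
#define PROC_IN_VACUUM      0x02  /* must match PROCARRAY_VACUUM_FLAG */
#define PROC_IN_ANALYZE     0x04  /* must match PROCARRAY_ANALYZE_FLAG */

/* storage/procarray.h */
#define PROCARRAY_VACUUM_FLAG   0x02  /* keep in sync with PROC_IN_VACUUM */
#define PROCARRAY_ANALYZE_FLAG  0x04  /* keep in sync with PROC_IN_ANALYZE */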

+ * Now, flags is used only for ignoring backends with some flags.

How about writing something like below.

The flags are used to ignore the backends in calculation when any of the
corresponding flags is set.

+#define PROCARRAY_A_FLAG_VACUUM

How about changing the flag name to PROCARRAY_VACUUM_FLAG
(similar changes to other declarations)


+/* Use these flags in GetOldestXmin as "flags" */

Add some comments here to explain how to use these flags.

+#define PROCARRAY_FLAGS_DEFAULT

same here also as PROCARRAY_DEFAULT_FLAGS
(similar changes to other declarations)


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-15 Thread Haribabu Kommi
On Thu, Mar 16, 2017 at 9:55 AM, Julien Rouhaud <julien.rouh...@dalibo.com>
wrote:

> On Wed, Feb 15, 2017 at 02:53:44PM +1100, Haribabu Kommi wrote:
> > Here I attached patch that implements the view.
> > I will add this patch to next commitfest.
>
> Hello,
>
> I just reviewed the patch.
>

Thanks for the review.

First, there are some whitespace issues that make git-apply complaining (at
> least lines 218 and 396).
>

Removed.

Patch is rather straightforward and works as expected, doc compiles without
> issue.
>
> I only found some minor issues:
>
> +  One row only, showing statistics about the
> +   wal writing activity. See
>
> +  Number of wal writes that are carried out by the backend
> process
>
>
> WAL should be uppercase (and for some more occurences).
>

Fixed.


> +  
> +  Number of wal writes that are carried out by background workers
> such as checkpointer,
> +  writer and walwriter.
>
> I guess you meant backgroung processes?
>

Yes, it is background processes. Updated.


> >+  This field data will be populated only when the track_io_timing
> GUC is enabled
> (and similar occurences)
>
> track_io_timing should link to 
> instead of
> mentionning GUC.
>

Updated accordingly.

I think you also need to update the track_io_timing description in
> sgml/config.sgml to mention this new view.
>

Added the reference of pg_stat_walwrites in the GUC description.


>
> +   else
> +   {
> +   LocalWalWritesStats.m_wal_total_write_time = 0;
> +   }
> (and similar ones)
>
> The brackets seem unnecessary.
>

Corrected.

Updated patch attached.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_walwrites_view_2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-03-15 Thread Haribabu Kommi
On Thu, Mar 16, 2017 at 2:20 AM, Stephen Frost <sfr...@snowman.net> wrote:

> Greetings Hari Babu,
>
> * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
> > On Mon, Mar 13, 2017 at 6:52 AM, Stephen Frost <sfr...@snowman.net>
> wrote:
> > > And, naturally, re-reading the email as it hit the list made me realize
> > > that the documentation/error-message incorrectly said "3rd and 4th"
> > > bytes were being set to FF and FE, when it's actually the 4th and 5th
> > > byte.  The code was correct, just the documentation and the error
> > > message had the wrong numbers.  The commit message is also correct.
> >
> > Thanks for the review and corrections.
> >
> > I found some small corrections.
> >
> > s/4rd/4th/g -- Type correction.
> >
> > + Input is accepted in the following formats:
> >
> > As we are supporting many different input variants, and all combinations
> > are not listed, so how about changing the above statement as below.
> >
> > "Following are the some of the input formats that are accepted:"
>
> Good points, I incorporated them along with a bit of additional
> information in the documentation as to what we do actually support.
>
> > I didn't find any other problems during testing and review. The patch
> > is fine.
>
> Great!  I've committed this now.  If you see anything additional or
> other changes which should be made, please let me know.
>
> ... and bumped catversion after (thanks for the reminder, Robert!).
>

Thanks for the review and changes.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] utility commands benefiting from parallel plan

2017-03-15 Thread Haribabu Kommi
On Tue, Feb 28, 2017 at 12:48 PM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

>
>
> On Sat, Feb 25, 2017 at 3:21 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
>
>> On Fri, Feb 24, 2017 at 11:43 AM, Haribabu Kommi
>> <kommi.harib...@gmail.com> wrote:
>> > Here I attached an implementation patch that allows
>> > utility statements that have queries underneath such as
>> > CREATE TABLE AS, CREATE MATERIALIZED VIEW
>> > and REFRESH commands to benefit from parallel plan.
>> >
>> > These write operations not performed concurrently by the
>> > parallel workers, but the underlying query that is used by
>> > these operations are eligible for parallel plans.
>> >
>> > Currently the write operations are implemented for the
>> > tuple dest types DestIntoRel and DestTransientRel.
>> >
>> > Currently I am evaluating other write operations that can
>> > benefit with parallelism without side effects in enabling them.
>> >
>> > comments?
>>
>> I think a lot more work than this will be needed.  See:
>>
>> https://www.postgresql.org/message-id/CA+TgmoZC5ft_t9uQWSO5_
>> 1vU6H8oVyD=zyuLvRnJqTN==fv...@mail.gmail.com
>>
>> ...and the discussion which followed.
>>
>
>
> Thanks for the link.
> Yes, it needs more work to support parallelism even for
> queries that involved in write operations like INSERT,
> DELETE and UPDATE commands.
>

This patch is marked as "returned with feedback" in the ongoing
commitfest.

The proposed DML write operations patch has a good number of limitations
(triggers and so on), but the utility write operations patch is in good
shape, in my view, to start supporting write operations. This is useful
for materialized views while refreshing the data.

Do you find any problems or missing pieces in supporting a parallel plan
for utility commands with the attached updated patch?  Or should we
rather support all write operations at once?

Regards,
Hari Babu
Fujitsu Australia


0001_utility_write_using_parallel_2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE command progress checker

2017-03-14 Thread Haribabu Kommi
On Fri, Mar 10, 2017 at 6:46 PM, vinayak 
wrote:

>
> + /* Report total number of heap blocks and collectinf sample row phase*/
> + initprog_val[0] = PROGRESS_ANALYZE_PHASE_COLLECT_HEAP_SAMPLE_ROWS;
> + initprog_val[1] = totalblocks;
> + pgstat_progress_update_multi_param(2, initprog_index, initprog_val);
>
> acquire_sample_rows function is called from acquire_inherited_sample_rows
> function, so adding the phase in that function will provide wrong info.
>
> I agree with you.
>
>
> +#define PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS 2
>
> why there is no code added for the phase, any specific reason?
>
> I am thinking how to report this phase. Do you have any suggestion?
>

Thanks for the updated patch.

Actually the analyze operation is done in two stages for a relation that
contains child relations:
1. For the parent relation
2. For all child relations

So the progress will start over for each of the above two stages. This
needs to be clearly mentioned in the docs.

The acquire_sample_rows function collects statistics of the relation that
is provided. Updating the analyze details, like the number of rows, works
fine for a single relation, but if it contains many child relations, the
data will be misleading.

Apart from the above, some more comments.


+ /* Report compute index stats phase */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);

The above block of code should set the phase only when it is actually
doing compute_index_stats.

+ /* Report that we are now doing index cleanup */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);

The above code block should be inside  if (!(options & VACOPT_VACUUM))
block.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-03-13 Thread Haribabu Kommi
On Mon, Mar 13, 2017 at 6:52 AM, Stephen Frost <sfr...@snowman.net> wrote:

> Greetings,
>
> * Stephen Frost (sfr...@snowman.net) wrote:
> > * Stephen Frost (sfr...@snowman.net) wrote:
> > > * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
> > > > On Wed, Feb 1, 2017 at 6:27 AM, Vitaly Burovoy <
> vitaly.buro...@gmail.com> wrote:
> > > > > The new status of this patch is: Ready for Committer
> > > >
> > > > Thanks for the review.
> > >
> > > I've started taking a look at this with an eye towards committing it
> > > soon.
> >
> > I've spent a good bit of time going over this, possibly even more than
> > it was worth, but hopefully we'll see people making use of this data
> > type with PG10 and as more IPv6 deployment happens.
>
> And, naturally, re-reading the email as it hit the list made me realize
> that the documentation/error-message incorrectly said "3rd and 4th"
> bytes were being set to FF and FE, when it's actually the 4th and 5th
> byte.  The code was correct, just the documentation and the error
> message had the wrong numbers.  The commit message is also correct.
>

Thanks for the review and corrections.

I found some small corrections.

s/4rd/4th/g -- Typo correction.

+ Input is accepted in the following formats:

As we support many different input variants and not all combinations
are listed, how about changing the above statement as below?

"Following are some of the input formats that are accepted:"

I didn't find any other problems during testing and review. The patch
is fine.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-03-13 Thread Haribabu Kommi
On Mon, Mar 13, 2017 at 6:38 AM, Stephen Frost <sfr...@snowman.net> wrote:

> Greetings,
>
> * Stephen Frost (sfr...@snowman.net) wrote:
> > * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
> > > On Wed, Feb 1, 2017 at 6:27 AM, Vitaly Burovoy <
> vitaly.buro...@gmail.com> wrote:
> > > > The new status of this patch is: Ready for Committer
> > >
> > > Thanks for the review.
> >
> > I've started taking a look at this with an eye towards committing it
> > soon.
>
> I've spent a good bit of time going over this, possibly even more than
> it was worth, but hopefully we'll see people making use of this data
> type with PG10 and as more IPv6 deployment happens.
>

Thanks for the review.


> Of particular note, I rewrote macaddr8_in to not use sscanf().
> sscanf(), at least on my system, would accept negative values even for
> '%2x', leading to slightly odd errors with certain inputs, including
> with our existing macaddr type:
>
> =# select '00-203040506'::macaddr;
> ERROR:  invalid octet value in "macaddr" value: "00-203040506"
> LINE 1: select '00-203040506'::macaddr;
>
> With my rewrite, the macaddr8 type will throw a clearer (in  my view, at
> least) error:
>
> =# select '00-203040506'::macaddr8;
> ERROR:  invalid input syntax for type macaddr8: "00-203040506"
> LINE 1: select '00-203040506'::macaddr8;
>
> One other point is that the previously disallowed format with just two
> colons ('0800:2b01:0203') is now allowed.  Given that both the two dot
> format ('0800.2b01.0203') and the two dash format ('0800-2b01-0203')
> were accepted, this seemed alright to me.  Is there really a good reason
> to disallow the two colon format?
>

No. There is no special reason to disallow.
The rewrite of macaddr8_in will allow all possible combinations of spacers.


> I also thought about what we expect the usage of macaddr8 to be and
> realized that we should really have a function to help go from EUI-48 to
> the IPv6 Modified EUI-64 format, since users will almost certainly want
> to do exactly that.  As such, I added a macaddr8_set7bit() function
> which will perform the EUI-64 -> Modified EUI-64 change (which is just
> setting the 7th bit) and added associated documentation and reasoning
> for why that function exists.
>

I checked and it is working as expected.
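
For reference, usage looks like this (a sketch; the output shown is what
the conversion should produce):

postgres=# SELECT macaddr8_set7bit('08:00:2b:01:02:03'::macaddr8);
    macaddr8_set7bit
-------------------------
 0a:00:2b:ff:fe:01:02:03
(1 row)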

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2017-03-07 Thread Haribabu Kommi
On Wed, Feb 1, 2017 at 3:13 PM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Fri, Jan 27, 2017 at 10:26 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > Thanks for the review.
> > Let's wait for the committer's opinion.
>
> I have moved this patch to CF 2017-03 to wait for this to happen.
>

Attached a rebased patch to resolve the OID conflict.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_sql_row_mode_5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-03-07 Thread Haribabu Kommi
On Wed, Mar 1, 2017 at 12:59 PM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

>
> Patch attached. Still some more docs needs to be added.
>

Updated patch attached to resolve the conflicts with following commit.

commit 9a83d56b38c870ce47b7651385ff2add583bf136
Author: Simon Riggs <si...@2ndquadrant.com>
Date:   Tue Mar 7 22:00:54 2017 +0800

Allow pg_dumpall to dump roles w/o user passwords

Add new option --no-role-passwords which dumps roles without passwords.
Since we don’t need passwords, we choose to use pg_roles in preference
to pg_authid since access may be restricted for security reasons in
some configrations.

Robins Tharakan and Simon Riggs



Regards,
Hari Babu
Fujitsu Australia


pg_dump_changes_2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE command progress checker

2017-03-06 Thread Haribabu Kommi
On Tue, Mar 7, 2017 at 5:01 PM, Michael Paquier 
wrote:

>
> @@ -496,7 +499,6 @@ do_analyze_rel(Relation onerel, int options,
> VacuumParams *params,
> numrows = (*acquirefunc) (onerel, elevel,
>   rows, targrows,
>   , );
> -
> /*
> Useless diff.
>
> + 
> +   ANALYZE is currently collecting the sample rows.
> +   The sample it reads is taken randomly.Its size depends on
> +   the default_statistics_target parameter value.
> + 
> This should use a  markup for default_statistics_target.
>
> @@ -203,6 +204,8 @@ analyze_rel(Oid relid, RangeVar *relation, int options,
> if (onerel->rd_rel->relkind == RELKIND_RELATION ||
> onerel->rd_rel->relkind == RELKIND_MATVIEW)
> {
> +   pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
> +   RelationGetRelid(onerel));
> It seems to me that the report should begin in do_analyze_rel().


some more comments,

+ /* Report compute heap stats phase */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);

The above analyze phase is updated inside a for loop; instead, just set it
once above the loop.

+ /* Report compute index stats phase */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);

Irrespective of whether any index exists on the table or not, the above
analyze phase is set. It is better to set the above phase and the index
cleanup phase only when there are indexes on the table.

+ /* Report total number of heap blocks and collectinf sample row phase*/

Typo? collecting?


+ /* Report total number of heap blocks and collectinf sample row phase*/
+ initprog_val[0] = PROGRESS_ANALYZE_PHASE_COLLECT_HEAP_SAMPLE_ROWS;
+ initprog_val[1] = totalblocks;
+ pgstat_progress_update_multi_param(2, initprog_index, initprog_val);

acquire_sample_rows function is called from acquire_inherited_sample_rows
function, so adding the phase in that function will provide wrong info.


+#define PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS 2

Why is there no code added for this phase? Any specific reason?


+/* Phases of analyze */

This can be written as follows for better understanding, and also to be
similar to vacuum.

/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */


Regards,
Hari Babu
Fujitsu Australia


[HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-02-28 Thread Haribabu Kommi
Subject changed for better context of the patch.
(was - Re: Question about grant create on database and pg_dump/pg_dumpall)

On Fri, Sep 30, 2016 at 12:29 AM, Tom Lane
 wrote:
>
>1. pg_dump without --create continues to do what it does today, ie it just
>dumps objects within the database, assuming that database-level properties
>will already be set correctly for the target database.
>
>2. pg_dump with --create creates the target database and also sets all
>database-level properties (ownership, ACLs, ALTER DATABASE SET, etc etc).
>
>3. pg_dumpall loses all code relating to individual-database creation
>and property setting and instead relies on pg_dump --create to do that.
>This would leave only the code relating to "pg_dumpall -g" (ie, dump roles
>and tablespaces) within pg_dumpall itself.

I removed all the database-related code from pg_dumpall, moved the
necessary part of the code into pg_dump, and made pg_dumpall call pg_dump
with the --create option to ensure that all the database create commands
get dumped.

The CREATE DATABASE command is issued for every database except postgres
and template1. All other database-level properties are dumped the same
way for every database.

>One thing that would still be messy is that presumably "pg_dumpall -g"
>would issue ALTER ROLE SET commands, but it's unclear what to do with
>ALTER ROLE IN DATABASE SET commands.  Should those become part of
>"pg_dump --create"'s charter?  It seems like not, but I'm not certain.

Yes, I moved the ALTER ROLE IN DATABASE SET commands as well into the
pg_dump --create command. This way it will be easier to dump all the
database objects using pg_dumpall -g plus pg_dump -C.
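
So a whole-cluster dump conceptually becomes something like this (a rough
sketch; the file and database names are made up):

    pg_dumpall --globals-only > globals.sql
    pg_dump --create db1 > db1.sql    # repeated for each database

with pg_dumpall itself invoking pg_dump --create for each database when it
is used to dump everything.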

>Another thing that requires some thought is that pg_dumpall is currently
>willing to dump ACLs and other properties for template1/template0, though
>it does not invoke pg_dump on them.  If we wanted to preserve that
>behavior while still moving the code that does those things to pg_dump,
>pg_dump would have to grow an option that would let it do that.  But
>I'm not sure how much of that behavior is actually sensible.

Currently the ACLs and other changes related to the template databases are
getting dumped with the --create option in pg_dump. Do we still need
another option?

>This would probably take a pg_dump archive version bump, since I think
>we don't currently record enough information for --create to do this
>(and we can't just cram the extra commands into the DATABASE entry,
>since we don't know whether --create will be specified to pg_restore).
>But we've done those before.

No specific code is required related to the archive version check. Do we
still need to bump the archive version, given that it just adds some new
commands as part of --create in pg_dump?

Patch attached. Still some more docs needs to be added.

comments?

[1] - https://www.postgresql.org/message-id/21573.1475162...@sss.pgh.pa.us

Regards,
Hari Babu
Fujitsu Australia


pg_dump_changes_1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] utility commands benefiting from parallel plan

2017-02-27 Thread Haribabu Kommi
On Sat, Feb 25, 2017 at 3:21 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Feb 24, 2017 at 11:43 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > Here I attached an implementation patch that allows
> > utility statements that have queries underneath such as
> > CREATE TABLE AS, CREATE MATERIALIZED VIEW
> > and REFRESH commands to benefit from parallel plan.
> >
> > These write operations not performed concurrently by the
> > parallel workers, but the underlying query that is used by
> > these operations are eligible for parallel plans.
> >
> > Currently the write operations are implemented for the
> > tuple dest types DestIntoRel and DestTransientRel.
> >
> > Currently I am evaluating other write operations that can
> > benefit with parallelism without side effects in enabling them.
> >
> > comments?
>
> I think a lot more work than this will be needed.  See:
>
> https://www.postgresql.org/message-id/CA+TgmoZC5ft_t9uQWSO5_1vU6H8oVyD=
> zyuLvRnJqTN==fv...@mail.gmail.com
>
> ...and the discussion which followed.
>


Thanks for the link.
Yes, it needs more work to support parallelism even for the
queries that are involved in write operations like INSERT,
DELETE and UPDATE commands.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] utility commands benefiting from parallel plan

2017-02-27 Thread Haribabu Kommi
On Sat, Feb 25, 2017 at 2:45 AM, Dilip Kumar <dilipbal...@gmail.com> wrote:

> On Fri, Feb 24, 2017 at 11:43 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > Here I attached an implementation patch that allows
> > utility statements that have queries underneath such as
> > CREATE TABLE AS, CREATE MATERIALIZED VIEW
> > and REFRESH commands to benefit from parallel plan.
> >
> > These write operations not performed concurrently by the
> > parallel workers, but the underlying query that is used by
> > these operations are eligible for parallel plans.
> >
> > Currently the write operations are implemented for the
> > tuple dest types DestIntoRel and DestTransientRel.
> >
> > Currently I am evaluating other write operations that can
> > benefit with parallelism without side effects in enabling them.
>
> The Idea looks good to me.
>
> Since we are already modifying heap_prepare_insert, I am thinking that
> we can as well enable queries like "insert into .. select from .."
> with minor modification?
>

Thanks for the review.

I am finding it not so easy to support write operations like INSERT,
DELETE and UPDATE commands in a way that lets the queries underneath
them use the parallelism benefits.

Currently the parallelism is enabled only for tables that don't have
any triggers or indexes with expressions. This limitation can be
removed after thorough testing.

To support this, I removed all the errors from the heap functions and
from the functions that get a new transaction and update the command id
in the current snapshot (required for the cases where a single command
validates its input).

Attached a WIP patch for the support for DML write operations.
There is no functional change in base utility write support patch.

Regards,
Hari Babu
Fujitsu Australia


0002_dml_write_using_parallel_1.patch
Description: Binary data


0001_utility_write_using_parallel_1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] utility commands benefiting from parallel plan

2017-02-23 Thread Haribabu Kommi
Hi Hackers,

Here I attached an implementation patch that allows
utility statements that have queries underneath such as
CREATE TABLE AS, CREATE MATERIALIZED VIEW
and REFRESH commands to benefit from parallel plan.

These write operations are not performed concurrently by the
parallel workers, but the underlying query that is used by
these operations is eligible for parallel plans.

Currently the write operations are implemented for the
tuple dest types DestIntoRel and DestTransientRel.

Currently I am evaluating other write operations that can
benefit with parallelism without side effects in enabling them.

comments?

Regards,
Hari Babu
Fujitsu Australia


utlity_write_with_query_parallel_1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags

2017-02-15 Thread Haribabu Kommi
On Wed, Feb 15, 2017 at 11:35 PM, Amit Kapila 
wrote:

> On Wed, Feb 15, 2017 at 12:03 PM, Seki, Eiji 
> wrote:
> > Amit Kapila wrote:
> >> How will you decide just based on oldest xmin whether the tuple is
> visible or not?  How will you take decisions about tuples which have xmax
> set?
> >
> > In our use case, GetOldestXmin is used by an original maintainer
> process[es] to an original control table[s]. The table can be concurrently
> read or inserted in any transactions. However, rows in the table can be
> deleted (set xmax) only by the maintainer process. Then, one control table
> can be processed by only one maintainer process at once.
> >
> > So I do MVCC as following.
> >
> > - The maintainer's transaction:
> >   - If xmax is set, simply ignore the tuple.
> >   - For other tuples, read tuples if GetOldestXmin() > xmin.
> > - Other transactions: Do ordinal MVCC using his XID.
> >
>
> Oh, this is a very specific case for which such an API can be useful.
> Earlier, I have seen that people proposing some kind of hooks which
> can be used for their specific purpose but exposing an API or changing
> the signature of an API sound bit awkward.  Consider tomorrow someone
> decides to change this API for some reason, it might become difficult
> to decide because we can't find it's usage.


The proposed change to the API is in the context of fixing a performance
problem of the vertical clustered index (VCI) feature [1].

During performance tests of VCI running in parallel with OLTP queries,
the query performance sometimes drops because VCI is not chosen as the
best plan. This is due to an increase in the number of records in the WOS
relation that still need to be moved to the ROS. If there are more
records in the WOS relation, it takes more time to generate the LOCAL
ROS, and for this reason the VCI plan is not chosen.

Why are there more records in the WOS? We used the GetOldestXmin()
function to identify the minimum transaction present in the cluster in
order to move the data from WOS to ROS. This function doesn't ignore the
ANALYZE transactions running in the system, even though those
transactions don't make any changes that would affect the data movement
from WOS to ROS.

Because of the above reason, we need a new API, or a change to the
existing API, that provides the oldest xmin while ignoring the ANALYZE
transactions, so that the size of the WOS is reduced and the VCI query
performance improves.

[1] -
https://www.postgresql.org/message-id/cajrrpgfac7wc9nk6ptty6yn-nn+hcy8xolah2doyhvg5d6h...@mail.gmail.com

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-15 Thread Haribabu Kommi
On Thu, Feb 16, 2017 at 12:48 AM, Robert Haas  wrote:

> On Wed, Feb 15, 2017 at 12:24 AM, Joshua Chamberlain 
> wrote:
> > Hello,
> >
> > (I'm posting to hackers since I got no response on the general list.)
> >
> > I use Postgres + PostGIS quite heavily, and recently have been taking
> full
> > advantage of the new parallelism in 9.6. I'm now running queries in a few
> > hours that would otherwise take more than a day.
> >
> > However, parallelism is disabled for all queries that perform writes (as
> > documented). I would normally run "CREATE TABLE AS [some super-expensive
> > query]", but since that can't use parallelism I'm using the \o option in
> > psql, creating the table separately, and then \copy-ing in the results.
> That
> > works, but "CREATE TABLE AS" would be more convenient.
> >
> > Are there plans in 10.0 to allow parallelism in queries that write, or at
> > least in "CREATE TABLE AS" queries? (Support in materialized views would
> be
> > great, too!)
>
> Somebody else asked me about this in private email.  Nobody at
> EnterpriseDB is working on this right now, and I don't think anybody
> else is working on it either.  There are several ways to do it, but
> none of them are entirely easy and the ones likely to perform better
> are less easy.
>
> I think that what we need to do to make this work is come up with a
> way to fix the interaction between group locking (which allows
> multiple processes to hold mutually conflicting locks at the same time
> if they are in a parallel group) and relation extension (which uses
> heavyweight locking to prevent multiple processes from trying to
> extend the same relation at the same time).  I think that perhaps the
> right way to solve that problem is to come up with a way of providing
> mutual exclusion around relation extension that doesn't need the
> heavyweight lock manager.  Relation extension locks don't need
> multiple lock modes or deadlock detection or any of the other
> frammishes that the heavyweight lock manager provides, but they do
> need to be fast, which the heavyweight lock manager isn't especially
> good at.  So moving this out of the heavyweight lock manager might be
> a way to solve two problems at once.
>
> There's also a hazard related to GIN indexes since
> e2c79e14d998cd31f860854bc9210b37b457bb01, which introduced a new use
> of Page locks, which have a similar kind of problem.  We got rid of
> the major existing use of page locks in
> 6d46f4783efe457f74816a75173eb23ed8930020, which extirpated them from
> hash indexes, and I was kind of hoping they could go away altogether,
> but we can't do that as long as GIN is using them.
>
> Anyway, if we solve those problems, we can allow inserts (not updates
> or deletes, those have other problems, principally relating to combo
> CIDs) in parallel mode, which would make it possible to allow the
> kinds of things you are asking about here.  Then you could fix things
> so that each worker generates a subset of the tuples and inserts the
> ones it generates.  You'd end up with a parallel plan but no Gather
> node!   The workers could feed tuples directly to a suitable
> DestReceiver, which would be really spiffy.
>
> The other way of fixing this problem is to have each worker generate a
> subset of the tuples and funnel them all back to the leader through a
> Gather node; the leader then does all the inserts.  That avoids having
> to solve the problems mentioned above, but it probably doesn't perform
> nearly as well.
>

How about supporting something like this: the backend does the write
operations whereas the workers produce the results. This may not give
good performance in all cases compared to doing the write operation in
all parallel workers, but it may be useful for some scenarios such as
CREATE MATERIALIZED VIEW.

Following are the explain plans with minimal changes in the code to allow
write operations. I haven't verified all the scenarios. How about
supporting write operations as below first, and later enhancing it to do
the write operations in the parallel workers as well?

POC patch is attached.

postgres=# explain create materialized view mat_view as select * from tbl
where f1 =10;
   QUERY PLAN

 Gather  (cost=1000.00..37458.43 rows=1 width=214)
   Workers Planned: 2
   ->  Parallel Seq Scan on tbl  (cost=0.00..36458.33 rows=1 width=214)
 Filter: (f1 = 10)
(4 rows)

postgres=# explain insert into tbl select * from tbl where f1 = 10;
 QUERY PLAN


 Insert on tbl  (cost=1000.00..37458.43 rows=1 width=214)
   ->  Gather  (cost=1000.00..37458.43 rows=1 width=214)
 Workers Planned: 2
 ->  Parallel Seq Scan on tbl tbl_1  (cost=0.00..36458.33 rows=1
width=214)
   Filter: (f1 = 10)

Re: [HACKERS] pg_stat_wal_write statistics view

2017-02-14 Thread Haribabu Kommi
On Wed, Feb 8, 2017 at 9:36 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Tue, Feb 7, 2017 at 11:47 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> > Hi Hackers,
> >
> > I just want to discuss adding of a new statistics view that provides
> > the information of wal writing details as follows
> >
>
> +1.  I think it will be useful to observe WAL activity.
>

Thanks for your opinion.

> postgres=# \d pg_stat_wal_writer
> > View "pg_catalog.pg_stat_wal_writer"
> > Column |   Type   | Collation | Nullable
> |
> > Default
> > ---+--+-
> --+--+-
> >  num_backend_writes   | bigint   |   |
> > |
> >  num_total_writes  | bigint   |   |  |
> >  num_blocks  | bigint   |   |  |
> >  total_write_time   | bigint|   |  |
> >  stats_reset   | timestamp with time zone |   |
> |
> >
> > The columns of the view are
> > 1. Total number of xlog writes that are called from the backend.
> > 2. Total number of xlog writes that are called from both backend
> >  and background workers. (This column can be changed to just
> >  display on the background writes).
> > 3. The number of the blocks that are written.
> > 4. Total write_time of the IO operation it took, this variable data is
> > filled only when the track_io_timing GUC is enabled.
>
> So, here is *write_time* the total time system has spent in WAL
> writing before the last reset?
>

total write_time spent in WAL writing "after" the last reset in
milliseconds.

I think there should be a separate column for write and sync time.
>
>
Added.


> > Or it is possible to integrate the new columns into the existing
> > pg_stat_bgwriter view also.
> >
>
> I feel separate view is better.
>

Ok.

Following is a sample output of the view after a regress run.

postgres=# select * from pg_stat_walwrites;
-[ RECORD 1 ]--+--
backend_writes | 19092
writes | 663
write_blocks   | 56116
write_time | 0
sync_time  | 3064
stats_reset| 2017-02-15 13:37:09.454314+11

Currently, the writer, walwriter and checkpointer processes
are considered as the background processes that mainly do
the WAL writes.

Here I attached patch that implements the view.
I will add this patch to next commitfest.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_walwrites_view_1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-02-13 Thread Haribabu Kommi
On Tue, Feb 14, 2017 at 2:57 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> Hi,
>
> I wonder if it is possible to somehow benchmark your clustered index
> implementation.
> I tried to create VCI index for lineitem table from TPC and run Q6 query.
> After index creation Postgres is not using parallel execution plan any
> more but speed of sequential plan is not changed
> and nothing in query execution plan indicates that VCI index is used:
>
>
> postgres=# explain select
> sum(l_extendedprice*l_discount) as revenue
> from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
>
>  QUERY
> PLAN
>
>
> 
> 
> ---
> 
> -
>  Finalize Aggregate  (cost=608333.85..608333.86 rows=1 width=4)
>->  Gather  (cost=608333.23..608333.84 rows=6 width=4)
>  Workers Planned: 6
>  ->  Partial Aggregate  (cost=607333.23..607333.24 rows=1 width=4)
>->  Parallel Seq Scan on lineitem_projection
> (cost=0.00..607024.83 rows=61680 width=8)
>  Filter: ((l_shipdate >= '1996-01-01'::date) AND
> (l_shipdate <= '1997-01-01'::date) AND (l_discount >= '0.08'::double
> precision) AN
> D (l_discount <= '0.1'::double precision) AND (l_quantity < '24'::double
> precision))
> (6 rows)
>
> postgres=# select
> sum(l_extendedprice*l_discount) as revenue
> from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
>revenue
> -
>  6.2e+08
> (1 row)
>
> Time: 1171.324 ms (00:01.171)
>
> postgres=# create index vci_idx on lineitem_projection using
> vci(l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax,
> l_returnflag,l_linestatus);
> CREATE INDEX
> Time: 4.705 ms
>
>
> postgres=# explain select
> * from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
>
> QUERY
> PLAN
>
> 
> 
> ---
> ---
>  Seq Scan on lineitem_projection  (cost=0.00..382077.00 rows=1 width=22)
>Filter: ((l_shipdate >= '1996-01-01'::date) AND (l_shipdate <=
> '1997-01-01'::date) AND (l_discount >= '0.08'::double precision) AND
> (l_discount <= '
> 0.1'::double precision) AND (l_quantity < '24'::double precision))
> (2 rows)
>
> postgres=# select
>
>
> sum(l_extendedprice*l_discount) as revenue
> from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
>   revenue
> 
>  6.2112e+08
> (1 row)
>
> Time: 4304.355 ms (00:04.304)
>
>
> I wonder if there is any query which can demonstrate advantages of using
> VCI index?
>

The current patch that I shared doesn't contain the plan and executor
changes needed to show the performance benefit of the clustered index.
We used a custom plan to generate the plan for the clustered index.
Currently I am working on rebasing it onto the current master along
with the other necessary changes.

In the current state of the patch, I cannot run any performance tests,
as it needs some major changes to work with the latest PostgreSQL
version. I have attached an old performance report that was taken on
9.5 for your reference.

The patch that is shared is meant to find out the best approach for
developing columnar storage in PostgreSQL: by adopting index access
methods plus additional hooks, or by using pluggable storage access
methods?

The only problem I can think of with pluggable storage methods is that,
to get the proper benefits of columnar storage, the planner and executor
need to be changed to support vector processing. In the current model we
implemented this with a custom plan and additional hooks (a rough sketch
follows below); the same may be possible with pluggable storage methods
as well.
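
To make the "custom plan + hooks" point a bit more concrete, here is a
rough sketch only (hypothetical names such as vci_scan_path_methods,
index checks and costing omitted; this is not the actual patch code) of
how a VCI path can be injected from the planner hook:

#include "postgres.h"
#include "fmgr.h"
#include "nodes/extensible.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

void        _PG_init(void);

static set_rel_pathlist_hook_type prev_set_rel_pathlist_hook = NULL;

/* The PlanCustomPath callback is left out; it would build the CustomScan. */
static const CustomPathMethods vci_scan_path_methods = {
    .CustomName = "vci_scan"
};

static void
vci_set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                     Index rti, RangeTblEntry *rte)
{
    CustomPath *cpath;

    if (prev_set_rel_pathlist_hook)
        prev_set_rel_pathlist_hook(root, rel, rti, rte);

    /* ... return here unless the relation actually has a VCI index ... */

    cpath = makeNode(CustomPath);
    cpath->path.pathtype = T_CustomScan;
    cpath->path.parent = rel;
    cpath->path.pathtarget = rel->reltarget;
    cpath->methods = &vci_scan_path_methods;
    /* ... fill rows/startup_cost/total_cost from the column store ... */

    add_path(rel, &cpath->path);
}

void
_PG_init(void)
{
    prev_set_rel_pathlist_hook = set_rel_pathlist_hook;
    set_rel_pathlist_hook = vci_set_rel_pathlist;
}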


Regards,
Hari Babu
Fujitsu Australia


VCI_DBT3_Query_Performance.xlsx
Description: MS-Excel 2007 spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel bitmap heap scan

2017-02-13 Thread Haribabu Kommi
On Tue, Feb 14, 2017 at 12:48 AM, Dilip Kumar  wrote:

> On Mon, Feb 13, 2017 at 6:24 PM, Robert Haas 
> wrote:
> > I don't think it's acceptable (or necessary) to move the DSA
> > definitions into postgres.h.  Why do you think you need to do that,
> > vs. just including dsa.h in a few more places?
>
> I need to access dsa_pointer in tidbitmap.h, which is included from
> FRONTEND as well. Now, problem is that dsa.h is including #include
> "port/atomics.h", but atomic.h can not be included if FRONTEND is
> defined.
>
> #ifndef ATOMICS_H
> #define ATOMICS_H
> #ifdef FRONTEND
> #error "atomics.h may not be included from frontend code"
> #endif
>
> Is there any other solution to this ?


How about creating another header file with the parallel changes
and including it only in the places that need it?
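
A minimal sketch of that idea (with a hypothetical header name, not
something that exists in the tree today) would expose only the typedef,
so that frontend-safe headers such as tidbitmap.h never pull in
port/atomics.h through dsa.h:

/* Hypothetical "utils/dsa_pointer.h": just the pointer typedef, with no
 * dependency on port/atomics.h, so it stays safe for FRONTEND code.
 * Both dsa.h and tidbitmap.h would include this header instead. */
#ifndef DSA_POINTER_H
#define DSA_POINTER_H

#if SIZEOF_DSA_POINTER == 4
typedef uint32 dsa_pointer;
#else
typedef uint64 dsa_pointer;
#endif

#endif                          /* DSA_POINTER_H */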

Following are my observations while going through the patch.

+#if SIZEOF_DSA_POINTER == 4
+typedef uint32 dsa_pointer;
+#else
+typedef uint64 dsa_pointer;
+#endif

I feel the declaration of the above typedef can be moved into the
section above, if we are going ahead with the current move into
postgres.h.

+/*
+ * tbm_alloc_shared
+ *
+ * Callback function for allocating the memory for hashtable elements.
+ * It allocates memory from DSA if tbm holds a reference to a dsa.
+ */
+static inline void *
+pagetable_allocate(pagetable_hash *pagetable, Size size)

The function name and the comment above it don't match?


Regards,
Hari Babu
Fujitsu Australia


[HACKERS] pg_stat_wal_write statistics view

2017-02-06 Thread Haribabu Kommi
Hi Hackers,

I would like to discuss adding a new statistics view that provides
details about WAL writing, as follows:

postgres=# \d pg_stat_wal_writer
                     View "pg_catalog.pg_stat_wal_writer"
       Column       |           Type           | Collation | Nullable | Default
--------------------+--------------------------+-----------+----------+---------
 num_backend_writes | bigint                   |           |          |
 num_total_writes   | bigint                   |           |          |
 num_blocks         | bigint                   |           |          |
 total_write_time   | bigint                   |           |          |
 stats_reset        | timestamp with time zone |           |          |

The columns of the view are:
1. num_backend_writes - total number of WAL writes issued from backends.
2. num_total_writes - total number of WAL writes issued from both backends
   and background workers. (This column could be changed to show only
   the background writes.)
3. num_blocks - the number of WAL blocks that were written.
4. total_write_time - total time the write I/O operations took; this value
   is filled in only when the track_io_timing GUC is enabled (see the
   snippet after this list).
5. stats_reset - the last time these statistics were reset.
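
As a side note on item 4, switching that GUC on is just the standard
procedure, nothing specific to this proposal:

-- total_write_time is collected only while track_io_timing is enabled
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();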

I feel this view may be useful for finding out how much time a backend
spends writing WAL; based on this information, it may be possible to
tune the wal_writer_delay GUC.
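
Something like the following (purely a hypothetical usage sketch against
the proposed view, not part of the patch) could be used to watch what
share of the WAL writes is done directly by backends:

-- A high backend share may suggest lowering wal_writer_delay.
SELECT num_backend_writes,
       num_total_writes,
       round(100.0 * num_backend_writes / NULLIF(num_total_writes, 0), 2)
           AS backend_write_pct,
       num_blocks,
       total_write_time,       -- populated only when track_io_timing is on
       stats_reset
FROM pg_stat_wal_writer;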

Alternatively, the new columns could also be integrated into the
existing pg_stat_bgwriter view.

Opinions?

Regards,
Hari Babu
Fujitsu Australia

