[HACKERS] get current log file

2016-02-02 Thread Armor
Hello,


I find there is a new feature about getting current log file name on the 
TODO list (for detail please check 
http://www.postgresql.org/message-id/pine.gso.4.64.0811101325260.9...@westnet.com).
 On the other side, we finish a ticket to this requirement for our customer. 
If the PG community still need this feature,  there will be a pleasure for 
us to make contribution. 


--
Jerry Yu

Fw: Re: [HACKERS] get current log file

2016-02-24 Thread Armor
Sorry, forgot forwarding the mail to the mail list.
Please put some comments.


--
Jerry Yu
https://github.com/scarbrofair


 




-- Original --
From:  "Armor";<yupengst...@qq.com>;
Date:  Tue, Feb 2, 2016 09:22 PM
To:  "Alvaro Herrera"<alvhe...@2ndquadrant.com>; 

Subject:  Re:  [HACKERS] get current log file



 As we known, the name of current log file depends on the number of seconds 
(for simple, later I will call it last_syslogger_file_time) since Epoch when 
create new log file. So, for this feature, the key is how syslogger process 
pass last_syslogger_file_time to backend processes.


To pass last_syslogger_file_time, we have 2 solutions: 1, add a global 
variable to record last_syslogger_file_time which shared by backends and 
syslogger, so backends can get last_syslogger_file_time very easily; 2 
syslogger process send last_syslogger_file_time to pgstat process when 
last_syslogger_file_time changes, just as other auxiliary processes send stat  
message to pgstat process, and  pgstat process will write  
last_syslogger_file_time into stat file so that backend can get 
last_syslogger_file_time via reading this stat file.


   For these 2 solutions, we prefer to later, because we want to keep the 
global variables space much simpler.
   On the other side, we need to add a new function named 
pg_stat_get_log_file_name() which will return the current log file name  
according to last_syslogger_file_time and log file name format.
   If you have any question, please let me know.
--
Jerry Yu


 




-- Original --
From:  "Alvaro Herrera";<alvhe...@2ndquadrant.com>;
Date:  Tue, Feb 2, 2016 06:30 PM
To:  "Armor"<yupengst...@qq.com>; 
Cc:  "pgsql-hackers"<pgsql-hackers@postgresql.org>; 
Subject:  Re: [HACKERS] get current log file



Armor wrote:
> Hello,
> 
> 
> I find there is a new feature about getting current log file name on the 
> TODO list (for detail please check 
> http://www.postgresql.org/message-id/pine.gso.4.64.0811101325260.9...@westnet.com).
>  On the other side, we finish a ticket to this requirement for our customer. 
> If the PG community still need this feature,  there will be a pleasure 
> for us to make contribution. 

Please propose a design and we'll discuss.  There's clearly need for
this feature.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Service

Re: [HACKERS] get current log file

2016-02-24 Thread Armor
As we known, the name of current log file depends on the number of seconds (for 
simple, later I will call it last_syslogger_file_time) since Epoch when create 
new log file. So, for this feature, the key is how syslogger process pass 
last_syslogger_file_time to backend processes.


To pass last_syslogger_file_time, we have 2 solutions: 1, add a global 
variable to record last_syslogger_file_time which shared by backends and 
syslogger, so backends can get last_syslogger_file_time very easily; 2 
syslogger process send last_syslogger_file_time to pgstat process when 
last_syslogger_file_time changes, just as other auxiliary processes send stat  
message to pgstat process, and  pgstat process will write  
last_syslogger_file_time into stat file so that backend can get 
last_syslogger_file_time via reading this stat file.


   For these 2 solutions, we prefer to later, because we want to keep the 
global variables space much simpler.
   On the other side, we need to add a new function named 
pg_stat_get_log_file_name() which will return the current log file name  
according to last_syslogger_file_time and log file name format.
   If you have any question, please let me know.
--
Jerry Yu


 




-- Original --
From:  "Alvaro Herrera";<alvhe...@2ndquadrant.com>;
Date:  Tue, Feb 2, 2016 06:30 PM
To:  "Armor"<yupengst...@qq.com>; 
Cc:  "pgsql-hackers"<pgsql-hackers@postgresql.org>; 
Subject:  Re: [HACKERS] get current log file



Armor wrote:
> Hello,
> 
> 
> I find there is a new feature about getting current log file name on the 
> TODO list (for detail please check 
> http://www.postgresql.org/message-id/pine.gso.4.64.0811101325260.9...@westnet.com).
>  On the other side, we finish a ticket to this requirement for our customer. 
> If the PG community still need this feature,  there will be a pleasure 
> for us to make contribution. 

Please propose a design and we'll discuss.  There's clearly need for
this feature.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Service

Re: [HACKERS] get current log file

2016-02-29 Thread Armor
Yes, if we cannot find a perfect solution, we need to wait. 
Actually, the customer need a unified interface to access the status of 
database, so we implement it. 


--
Jerry Yu
https://github.com/scarbrofair


 




-- Original --
From:  "Robert Haas";<robertmh...@gmail.com>;
Date:  Fri, Feb 26, 2016 07:33 PM
To:  "Armor"<yupengst...@qq.com>; 
Cc:  "Tom Lane"<t...@sss.pgh.pa.us>; "Euler Taveira"<eu...@timbira.com.br>; 
"Alvaro Herrera"<alvhe...@2ndquadrant.com>; 
"pgsql-hackers"<pgsql-hackers@postgresql.org>; 
Subject:  Re: [HACKERS] get current log file



On Fri, Feb 26, 2016 at 8:31 AM, Armor <yupengst...@qq.com> wrote:
> I think I know what you are concerned about. May be I did not explain my
> solution very clearly.
> (i) Using a variable named last_syslogger_file_time replace
> first_syslogger_file_time in syslogger.c. When postmaster initialize logger
> process,   last_syslogger_file_time will be assign the time stamp when
> logger start, then fork the child process for logger. Later logger will
> create a log file based on last_syslogger_file_time . And
> last_syslogger_file_time in the postmaster process will be inherited by
> other  auxiliary processes
> (ii) when pgstat process initialize, it will read  last_syslogger_file_time
> from pg stat file of last time (because pgstat process will write it to pg
> stat file). And then pgstat process will get last_syslogger_file_time
> inherit from postmaster,  if this version of  last_syslogger_file_time is
> larger then that read from the stat file, it means logger create a new log
> file so use it as the latest value; else means pgstat process crashed
> before, so it need to use the value from stat file as the latest.
> (iii) when logger rotate a log file, it will assign time stamp to
> last_syslogger_file_time  and send it to pg_stat process. And pg_stat
> process will write last_syslogger_file_time to stat file so can be read by
> other backends.
> () Adding a stat function named pg_stat_get_log_file_name, when user
> call it, it will read  last_syslogger_file_time from stat file and construct
> the log file name based on log file name format and
> last_syslogger_file_time, return the log file name eventually.
>
> However, there is a risk for this solution: when logger create a new log
> file and then try to send new last_syslogger_file_time to pg_stat process,
> and pg_stat process crash at this moment, so the new pg_stat process cannot
> get the latest  last_syslogger_file_time. However, I think this case is a
> corner case.

I don't think we're going to accept this feature if it might fail in
corner cases.  And that design seems awfully complex.

The obvious way to implement this, to me at least, seems to be for the
syslogger to write a file someplace in the data directory containing
the name of the current log file.  When it switches log files, it
rewrites that file.  When you want to know what the current logfile
is, you read that file.

But there's one thing I'm slightly baffled about: why would you
actually need this?  I mean, it seems like a good idea to set
log_filename to a pattern that makes the name of the current logfile
pretty well predictable.  If not, maybe you should just fix that.
Also, if not on Windows, if you do get confused about which logfile is
active, you could just use lsof on the log_directory to figure out
which file the syslogger has open.  I just can't really remember
having a problem with this, and I'm wondering why someone would.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] get current log file

2016-02-26 Thread Armor
I think I know what you are concerned about. May be I did not explain my 
solution very clearly.
(i) Using a variable named last_syslogger_file_time replace 
first_syslogger_file_time in syslogger.c. When postmaster initialize logger 
process,   last_syslogger_file_time will be assign the time stamp when logger 
start, then fork the child process for logger. Later logger will create a log 
file based on last_syslogger_file_time . And   last_syslogger_file_time in the 
postmaster process will be inherited by other  auxiliary processes 

(ii) when pgstat process initialize, it will read  last_syslogger_file_time 
from pg stat file of last time (because pgstat process will write it to pg stat 
file). And then pgstat process will get last_syslogger_file_time inherit from 
postmaster,  if this version of  last_syslogger_file_time is larger then that 
read from the stat file, it means logger create a new log file so use it as the 
latest value; else means pgstat process crashed before, so it need to use the 
value from stat file as the latest.
(iii) when logger rotate a log file, it will assign time stamp to 
last_syslogger_file_time  and send it to pg_stat process. And pg_stat process 
will write last_syslogger_file_time to stat file so can be read by other 
backends.
() Adding a stat function named pg_stat_get_log_file_name, when user call 
it, it will read  last_syslogger_file_time from stat file and construct the log 
file name based on log file name format and last_syslogger_file_time, return 
the log file name eventually.


However, there is a risk for this solution: when logger create a new log file 
and then try to send new last_syslogger_file_time to pg_stat process, and 
pg_stat process crash at this moment, so the new pg_stat process cannot get the 
latest  last_syslogger_file_time. However, I think this case is a corner case. 

--
Jerry Yu
https://github.com/scarbrofair


 




-- Original --
From:  "Tom Lane";<t...@sss.pgh.pa.us>;
Date:  Thu, Feb 25, 2016 10:47 PM
To:  "Robert Haas"<robertmh...@gmail.com>; 
Cc:  "Euler Taveira"<eu...@timbira.com.br>; "Armor"<yupengst...@qq.com>; 
"Alvaro Herrera"<alvhe...@2ndquadrant.com>; "Pgsql 
Hackers"<pgsql-hackers@postgresql.org>; 
Subject:  Re: [HACKERS] get current log file



Robert Haas <robertmh...@gmail.com> writes:
> On Thu, Feb 25, 2016 at 1:15 AM, Euler Taveira <eu...@timbira.com.br>>> wrote:
>>> To pass last_syslogger_file_time, we have 2 solutions: 1, add a
>>> global variable to record last_syslogger_file_time which shared by
>>> backends and syslogger, so backends can get last_syslogger_file_time
>>> very easily; 2 syslogger process send last_syslogger_file_time to pgstat
>>> process when last_syslogger_file_time changes, just as other auxiliary
>>> processes send stat  message to pgstat process, and  pgstat process will
>>> write  last_syslogger_file_time into stat file so that backend can
>>> get last_syslogger_file_time via reading this stat file.

>> I prefer (1) because (i) logfile name is not statistics and (ii) stats
>> collector could not respond in certain circumstances (and even discard
>> some messages).

> (1) seems like a bad idea, because IIUC, the syslogger process doesn't
> currently touch shared memory.  And in fact, shared memory can be
> reset after a backend exits abnormally, but the syslogger (alone among
> all PostgreSQL processes other than the postmaster) lasts across
> multiple such resets.

Yes, allowing the syslogger to depend on shared memory is right out.
I don't particularly care for having it assume the stats collector
exists, either -- in fact, given the current initialization order
it's physically impossible for syslogger to send to stats collector
because the former is started before the latter's communication
socket is made.

I haven't actually heard a use-case for exposing the current log file name
anyway.  But if somebody convinced me that there is one, I should think
that the way to implement it is to report the actual *name*, not
components out of which you could reconstruct the name only by assuming
that you know everything about the current syslogger configuration and
the code that builds log file names.  That's obviously full of race
conditions and code-maintenance hazards.

regards, tom lane

[HACKERS] One question about transformation ANY Sublinks into joins

2016-07-17 Thread Armor
Hi
I run a simple SQL with latest PG??
postgres=# explain select * from t1 where id1 in (select id2 from t2 where 
c1=c2);
 QUERY PLAN 

 Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
   Filter: (SubPlan 1)
   SubPlan 1
 ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
   Filter: (t1.c1 = c2)
(5 rows)



and the table schema are as following:


postgres=# \d t1
  Table "public.t1"
 Column |  Type   | Modifiers 
+-+---
 id1| integer | 
 c1 | integer | 


postgres=# \d t2
  Table "public.t2"
 Column |  Type   | Modifiers 
+-+---
 id2| integer | 
 c2 | integer | 



 I find PG decide not to pull up this sublink because the whereClauses in 
this sublink refer to the Vars of parent query, for detail please check the 
function named convert_ANY_sublink_to_join in 
src/backend/optimizer/plan/subselect.c. 
 However, for such simple sublink which has no agg, no window function, no 
limit, may be we can carefully pull up the predicates in whereCluase which 
refers to the Vars of parent query, then pull up this sublink and produce a 
query plan as following:

postgres=# explain select * from t1 where id1 in (select id2 from t2 where 
c1=c2);
   QUERY PLAN   

 Hash Join  (cost=49.55..99.23 rows=565 width=8)
   Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
   ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=46.16..46.16 rows=226 width=8)
 ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
   Group Key: t2.id2, t2.c2
   ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)
   
--
Jerry Yu
https://github.com/scarbrofair

Re: [HACKERS] One question about transformation ANY Sublinks into joins

2016-07-23 Thread Armor
After we pull up this sublink as semi join , when make join rel for semi join, 
the optimizer will take hash join method into account if a unique path can be 
created with the RHS, for detail please check make_join_rel in 
src/backend/optimizer/path/joinrels.c. 
For this case, the cost of  hash join is cheaper than semi join, so you can see 
the planner chose the hash join rather than semi join.


--
Jerry Yu
https://github.com/scarbrofair


 




-- Original --
From:  "Robert Haas";<robertmh...@gmail.com>;
Date:  Fri, Jul 22, 2016 00:23 AM
To:  "Armor"<yupengst...@qq.com>; 
Cc:  "pgsql-hackers"<pgsql-hackers@postgresql.org>; 
Subject:  Re: [HACKERS] One question about transformation ANY Sublinks into 
joins



On Sun, Jul 17, 2016 at 5:33 AM, Armor <yupengst...@qq.com> wrote:
> Hi
> I run a simple SQL with latest PG??
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>  QUERY PLAN
> 
>  Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
>Filter: (SubPlan 1)
>SubPlan 1
>  ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
>Filter: (t1.c1 = c2)
> (5 rows)
>
> and the table schema are as following:
>
> postgres=# \d t1
>   Table "public.t1"
>  Column |  Type   | Modifiers
> +-+---
>  id1| integer |
>  c1 | integer |
>
> postgres=# \d t2
>   Table "public.t2"
>  Column |  Type   | Modifiers
> +-+---
>  id2| integer |
>  c2 | integer |
>
>  I find PG decide not to pull up this sublink because the whereClauses
> in this sublink refer to the Vars of parent query, for detail please check
> the function named convert_ANY_sublink_to_join in
> src/backend/optimizer/plan/subselect.c.
>  However, for such simple sublink which has no agg, no window function,
> no limit, may be we can carefully pull up the predicates in whereCluase
> which refers to the Vars of parent query, then pull up this sublink and
> produce a query plan as following:
>
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where
> c1=c2);
>QUERY PLAN
> 
>  Hash Join  (cost=49.55..99.23 rows=565 width=8)
>Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
>->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
>->  Hash  (cost=46.16..46.16 rows=226 width=8)
>  ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
>Group Key: t2.id2, t2.c2
>->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)

It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[HACKERS] propose to pushdown qual into EXCEPT

2016-12-23 Thread Armor
Recently, we find PG fails to generate an effective plan for following SQL:
select * from (select * from table1 execpt select * from table2) as foo 
where foo.a > 0;
Because PG does not pushdown qual to the none of the subquery. And I check 
the source code, find some comments in src/backend/optimizer/path/allpaths.c, 
which says "If the subquery contains EXCEPT or EXCEPT ALL set ops we cannot 
push quals into it, because that could change the results".
However, for this case, I think we can pushdown qual to the  left most 
subquery of EXCEPT, just like other database does. And we can get an more 
effective plan such as:
postgres=# explain select * from (select * from table1 except select * from 
table2) as foo where foo.a > 0;
   QUERY PLAN   


 Subquery Scan on foo  (cost=0.00..118.27 rows=222 width=8)
   ->  HashSetOp Except  (cost=0.00..116.05 rows=222 width=12)
 ->  Append  (cost=0.00..100.98 rows=3013 width=12)
   ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..45.78 rows=753 
width=12)
 ->  Seq Scan on table1  (cost=0.00..38.25 rows=753 width=8)
   Filter: (a > 0)
   ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..55.20 rows=2260 
width=12)
 ->  Seq Scan on table2  (cost=0.00..32.60 rows=2260 
width=8)
(8 rows)



   And the attached patch is a draft, it works for this case.



--
Jerry Yu
https://github.com/scarbrofair

push_qual_to_except.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers