[HACKERS] Re[2]: [HACKERS] Remove fsync ON/OFF as a visible option?

2015-03-21 Thread Миша Тюрин


why does we take so many attention to fsync issue?
but there are also table spaces in tmpfs, wal in tmpfs, disks with cache 
without bbu, writeback writes and fs without ordering and journal, any CLOUDS, 
etc etc... in our real world installations.
more over not all of these issues are usually in dba's medium, and what dba 
really have to do -- is to accept ugly bottom storage properties and DO 
properly PITR/standby.
and if we have PITR then fsync or not fsync in master host doesn't matter so 
much. and could matter providing fsync in archive host. but doing fsync in 
archive -- it is workaround for archive_command realization.
in conclusion -- imho, full-page writes is more sensual than fsync when we 
guarantee PITR.
-- misha



[HACKERS] Re: [HACKERS] Streaming replication and WAL archive interactions

2015-02-11 Thread Миша Тюрин

  This should be a very common setup in the field, so how are  people doing it 
in practice?

One of possible workaround with archive and streaming was to use pg_receivexlog 
from standby to copy/save WALs to archive. but with pg_receivexlog was also 
issue with fsync.


[ master ] -- streaming -- [ standby ] -- pg_receivexlog -- [ /archive ]


In that case archive is always in pre standby state and it could be better than 
had archive broken on promote.
--
Misha

[HACKERS] Re: [HACKERS] On partitioning

2014-12-03 Thread Миша Тюрин

maybe vertica's approach will be a useful example

http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/AdministratorsGuide/Partitions/PartitioningTables.htm

http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/CREATETABLE.htm
... [ PARTITION BY partition-clause ]
--
Mikhail



[HACKERS] Re[2]: [HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-21 Thread Миша Тюрин

Hi
But maybe postgres should provide its own subsystem like linux active/inactive 
memory over and/or near shared buffers? There 
could be some postgres special heuristics in its own approach.
And does anyone know how mysql-innodb guys are getting with similar issues?
Thank you!

[HACKERS] freeze cannot be finished

2013-11-16 Thread Миша Тюрин
Hello! 

Could anyone review patch suggested by Jeff Janes ?

Initial thread 
http://www.postgresql.org/message-id/flat/1384356585.995240...@f50.i.mail.ru#1384356585.995240...@f50.i.mail.ru

Thanks in advance!

 
 On Wed, Nov 13, 2013 at 3:53 PM, Sergey Burladyan   eshkin...@gmail.com  
 wrote:
 Jeff Janes  jeff.ja...@gmail.com  writes:
 
 If I not mistaken, looks like lazy_scan_heap() called from lazy_vacuum_rel()
 (see [1]) skip pages, even if it run with scan_all == true, lazy_scan_heap()
 does not increment scanned_pages if lazy_check_needs_freeze() return false, 
 so
 if this occurred at wraparound vacuum it cannot update pg_class, because
 pg_class updated via this code:
 
     new_frozen_xid = FreezeLimit;
     if (vacrelstats-scanned_pages  vacrelstats-rel_pages)
         new_frozen_xid = InvalidTransactionId;
 
     vac_update_relstats(onerel,
                         new_rel_pages,
                         new_rel_tuples,
                         new_rel_allvisible,
                         vacrelstats-hasindex,
                         new_frozen_xid);
 
 so i think in our prevent wraparound vacuum vacrelstats-scanned_pages always
 less than vacrelstats-rel_pages and pg_class relfrozenxid never updated.
 
 Yeah, I think that that is a bug.  If the clean-up lock is unavailable but 
 the page is inspected without it and found not to need freezing, then the 
 page needs to be counted as scanned, but is not so counted.
 
 commit bbb6e559c4ea0fb4c346beda76736451dc24eb4e
 Date:   Mon Nov 7 21:39:40 2011 -0500
 
 But this was introduced in 9.2.0, so unless the OP didn't upgrade to 9.2 
 until recently, I don't know why it just started happening.
 
 It looks like a simple fix (to HEAD attached), but I don't know how to test 
 it.
 
 Also, it seem like it might be worth issuing a warning if scan_all is true 
 but all was not scanned.
 
 Cheers,
 
 Jeff
 
 --
 Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general

 

--
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
new file mode 100644
index e1d6d1c..6778c7d
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*** lazy_scan_heap(Relation onerel, LVRelSta
*** 602,607 
--- 602,608 
  			if (!lazy_check_needs_freeze(buf))
  			{
  UnlockReleaseBuffer(buf);
+ vacrelstats-scanned_pages++;
  continue;
  			}
  			LockBuffer(buf, BUFFER_LOCK_UNLOCK);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [HACKERS] Wal sync odirect

2013-07-22 Thread Миша Тюрин

i tell about wal_level is higher than MINIMAL


wal_level != minimal
http://doxygen.postgresql.org/xlogdefs_8h_source.html

48   * Because O_DIRECT bypasses the kernel buffers, and because we never
49   * read those buffers except during crash recovery or if wal_level != 
minimal 

 hi, list. there are my proposal. i would like to tell about odirect in wal 
 sync in wal_level is higher than minimal. i think in my case when wal 
 traffic is up to 1gb per 2-3 minutes but discs hardware with 2gb bbu cache 
 (or maybe ssd under wal) - there would be better if wall traffic could not 
 harm os memory eviction. and i do not use streaming. my archive command may 
 read wal directly without os cache. just opinion, i have not done any tests 
 yet. but i am still under the some memory eviction anomaly.

PostgreSQL already uses O_DIRECT for WAL writes if you use O_SYNC mode
for WAL writes. See comments in src/include/access/xlogdefs.h (search
for O_DIRECT). You should also examine
src/backend/access/transam/xlog.c, particularly the function
get_sync_bit(...)

Try doing some tests with pg_test_fsync, see how performance looks. If
your theory is right and WAL traffic is putting pressure on kernel write
buffers, using fsync=open_datasync - which should be the default on
Linux - may help.

I'd recommend doing some detailed tracing and performance measurements
before trying to proceed further.

-- 
 Craig Ringer  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



[HACKERS] Wal sync odirect

2013-07-21 Thread Миша Тюрин
hi, list. there are my proposal. i would like to tell about odirect in wal sync 
in wal_level is higher than minimal. i think in my case when wal traffic is up 
to 1gb per 2-3 minutes but discs hardware with 2gb bbu cache (or maybe ssd 
under wal) - there would be better if wall traffic could not harm os memory 
eviction. and i do not use streaming. my archive command may read wal directly 
without os cache. just opinion, i have not done any tests yet. but i am still 
under the some memory eviction anomaly.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Auto explain target tables

2013-07-21 Thread Миша Тюрин

hi, list, again. the next proposal into auto explain. one would be happy if 
could set list of target tables and indexes. sometimes it is very hard to 
detect who is using your indexes. but turn total logging on under thousands 
transactions per seconds is not seems like nice idea couse size of resulting 
log files (cpu utilization might not be so critical)
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re[2]: [HACKERS] Re: [HACKERS] high io BUT huge amount of free memory

2013-06-03 Thread Миша Тюрин
 hm, in that case, wouldn't adding 48gb of physical memory have
 approximately the same effect?  or is something else going on?

imho, adding 48gb would have no effects.

server already has 376GB memory and still has a lot of unused GB.
let me repeat, we added 80GB for files cache by decreasing buffers from 64GB to 
16GB.
there are was 150GB of unused, and now unused part is only 70GB.

some of links i read about eviction
http://linux-mm.org/PageReplacementDesign
http://linux-mm.org/PageReplacementRequirements


mikhail
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re[2]: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Миша Тюрин

  thanks a lot for responses
 
1) just remind my case

Intel 32 core = 2*8 *2threads
Linux 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux
PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Debian 
4.4.5-8) 4.4.5, 64-bit
shared_buffers 64GB / constant hit rate - 99,18
max_connections 160 / with pgbouncer pools there could not be more than 120 
connections at all
work_mem 32M
checkpoint 1h 1.0
swap off
numa off, interleaving on 
and
! disks usage 100% (free 128GB! WHY?)
disk throughput - up-to 30MB/s (24r+6w)
io - up-to 2,5-3K/s (0,5w + 2-2,5r)
typical work load - pk-index-scans
my warm work set is about 400GB
db at all - 700GB

2) numactl 
mtyurin@avi-sql09:~$ numactl --hardware
available: 1 nodes (0-0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31
node 0 size: 393181 MB
node 0 free: 146029 MB
node distances:
node   0
  0:  10

3) !! i just found suspicious relation between active processes and free 
memory. ~1GB per process.
in 376GB total memmory and 32 core
if ( user cpu + io wait ) is ~140% then i have ~140GB free.
but it could be just a coincidence.

4) now i think
a) upgrade linux core (to 3.2!?) and then (if case still will be)
b) set buffers to something like 300-320Gb

5) what do you know about workload in Berkus's case 
http://www.databasesoup.com/2012/04/red-hat-kernel-cache-clearing-issue.html ?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re[3]: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Миша Тюрин

typo

 if ( user cpu + io wait ) is ~140% then i have ~140GB free.

140% === 1400%

if ~14 cores are busy then ~140GB is free

10GB per process

hmmm...



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [HACKERS] Re[3]: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Миша Тюрин
vm state

root@avi-sql09:~# /sbin/sysctl -a|grep vm
vm.overcommit_memory = 0
vm.panic_on_oom = 0
vm.oom_kill_allocating_task = 0
vm.oom_dump_tasks = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.nr_pdflush_threads = 0
vm.swappiness = 0
vm.nr_hugepages = 0
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0
vm.lowmem_reserve_ratio = 256   256 32
vm.drop_caches = 0
vm.min_free_kbytes = 65536
vm.percpu_pagelist_fraction = 0
vm.max_map_count = 65530
vm.laptop_mode = 0
vm.block_dump = 0
vm.vfs_cache_pressure = 100
vm.legacy_va_layout = 0
vm.zone_reclaim_mode = 0
vm.min_unmapped_ratio = 1
vm.min_slab_ratio = 5
vm.stat_interval = 1
vm.mmap_min_addr = 65536
vm.numa_zonelist_order = default
vm.scan_unevictable_pages = 0
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pending triggers infinite list

2013-02-08 Thread Миша Тюрин

  Hi all

  I've got suspicious behavior for transaction cooked with deferrable trigger. 
if trigger has update on row of his target table we get infinite recursion 
without limitation of stack depth. 

  trigger - update - trigger - update - ... ...  -- infinite pending list :)


 - Misha


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re[2]: [HACKERS] standby, pg_basebackup and last xlog file

2013-02-07 Thread Миша Тюрин


Hello all and Heikki personally

Thank you for your answer

I have some new points:


Понедельник, 21 января 2013, 10:08 +02:00 от Heikki Linnakangas 
hlinnakan...@vmware.com:
On 21.01.2013 09:14, Миша Тюрин wrote:
Is there any reason why pg_basebackup has limitation in an online backup 
 from the standby: The backup history file is not created in the database 
 cluster backed up. ?

WAL archiving isn't active in a standby, so even if it created a backup 
history file, it wouldn't go anywhere. Also, the way the backup history 
files are named, if you take a backup on the master at the same time (or 
another backup at the same time in the standby), you would end up with 
multiple backup history files with the same name.

So i can't get last xlog file needed to restore :(

Good point. That was an oversight in the patch that allowed base backups 
from a standby.

Also maybe i can use something like ( pg_last_xlog_replay_location() + 1 
 ) after pg_basebackup finished.

Yeah, that should work.

Does anybody know true way to getting last xlog file in case of applying 
 pg_basebackup to standby?
How pg_basebackup gets last xlog file in case of standby and -x option?

The server returns the begin and end WAL locations to pg_basebackup, 
pg_basebackup just doesn't normally print them out to the user. In 
verbose mode, it does actually print them out, but only with -x, so that 
doesn't help you either. If you can compile from source, you could 
modify pg_basebackup.c to print the locations without -x and --verbose, 
search lines that print out transaction log start point / end position.

1) we can get last xlog by using control data's Minimum recovery ending 
location 




As a workaround, without modifying the source, you can do this after 
pg_basebackup has finished, to get the first WAL file you need:

$ pg_controldata backupdir | grep REDO WAL file
Latest checkpoint's REDO WAL file:00030009

and I would like to correct your suggestion about first wal file
2.1)  we should use backup_label to determine first needed wal
2.2)  and we must not use redo from checkpoint. because there are might be more 
than one checkpoint during base_backup



And as you suggested, pg_last_xlog_replay_location() for the last WAL 
file you need.

- Heikki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

- Misha



[HACKERS] Re[2]: [HACKERS] standby, pg_basebackup and last xlog file

2013-02-07 Thread Миша Тюрин


Hello all and Heikki personally
Thank you for your answer

I have some new points:


21.01.2013, 10:08 +02:00 от Heikki Linnakangas hlinnakan...@vmware.com:
On 21.01.2013 09:14, Миша Тюрин wrote:
Is there any reason why pg_basebackup has limitation in an online backup 
 from the standby: The backup history file is not created in the database 
 cluster backed up. ?

WAL archiving isn't active in a standby, so even if it created a backup 
history file, it wouldn't go anywhere. Also, the way the backup history 
files are named, if you take a backup on the master at the same time (or 
another backup at the same time in the standby), you would end up with 
multiple backup history files with the same name.

So i can't get last xlog file needed to restore :(

Good point. That was an oversight in the patch that allowed base backups 
from a standby.

Also maybe i can use something like ( pg_last_xlog_replay_location() + 1 
 ) after pg_basebackup finished.

Yeah, that should work.

Does anybody know true way to getting last xlog file in case of applying 
 pg_basebackup to standby?
How pg_basebackup gets last xlog file in case of standby and -x option?

The server returns the begin and end WAL locations to pg_basebackup, 
pg_basebackup just doesn't normally print them out to the user. In 
verbose mode, it does actually print them out, but only with -x, so that 
doesn't help you either. If you can compile from source, you could 
modify pg_basebackup.c to print the locations without -x and --verbose, 
search lines that print out transaction log start point / end position.


1) we can get last xlog by using control data's Minimum recovery ending 
location 




As a workaround, without modifying the source, you can do this after 
pg_basebackup has finished, to get the first WAL file you need:

$ pg_controldata backupdir | grep REDO WAL file
Latest checkpoint's REDO WAL file:00030009


and I would like to correct your suggestion about first wal file
2.1)  we should use backup_label to determine first needed wal
2.2)  and we must not use redo from checkpoint. because there are might be more 
than one checkpoint during base_backup



And as you suggested, pg_last_xlog_replay_location() for the last WAL 
file you need.

- Heikki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

- Misha

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] standby, pg_basebackup and last xlog file

2013-01-20 Thread Миша Тюрин

  Hello!

  I wrote to general (  [GENERAL] standby, pg_basebackup and last xlog file ) 
some times ago. but still hasn't got any feedback.


  Hello!

  Is there any reason why pg_basebackup has limitation in an online backup from 
the standby: The backup history file is not created in the database cluster 
backed up. ?

  So i can't get last xlog file needed to restore :(

  Think i can use -x option for getting last xlog file, but i would like to 
minimize size of resulting backup. // i also get all WALs by archive_command, 
so there is no reason to get two copies of each wal during basebackup.

  Also maybe i can use something like ( pg_last_xlog_replay_location() + 1 ) 
after pg_basebackup finished.

  Does anybody know true way to getting last xlog file in case of applying 
pg_basebackup to standby?
  How pg_basebackup gets last xlog file in case of standby and -x option?