[HACKERS] Re[2]: [HACKERS] Remove fsync ON/OFF as a visible option?
Why do we pay so much attention to the fsync issue? In our real-world installations there are also tablespaces on tmpfs, WAL on tmpfs, disks with cache but without BBU, writeback writes and filesystems without ordering or journaling, all kinds of clouds, etc. etc. Moreover, not all of these issues are usually within the DBA's reach, and what a DBA really has to do is accept the ugly properties of the underlying storage and set up PITR/standby properly. And if we have PITR, then "fsync or not fsync" on the master host doesn't matter so much. What could matter is providing fsync on the archive host, but doing fsync in the archive is a workaround for the archive_command implementation. In conclusion: IMHO, full_page_writes is more meaningful than fsync once we guarantee PITR.

-- misha
[HACKERS] Re: [HACKERS] Streaming replication and WAL archive interactions
> This should be a very common setup in the field, so how are people doing it
> in practice?

One possible workaround combining archive and streaming was to use pg_receivexlog from the standby to copy/save WALs to the archive, though pg_receivexlog had an fsync issue of its own.

[ master ] -- streaming --> [ standby ] -- pg_receivexlog --> [ /archive ]

In that case the archive is always in a pre-standby state, which is better than being left with a broken archive on promote.

-- Misha
[HACKERS] Re: [HACKERS] On partitioning
Maybe Vertica's approach would be a useful example:
http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/AdministratorsGuide/Partitions/PartitioningTables.htm
http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/CREATETABLE.htm
... [ PARTITION BY partition-clause ]

-- Mikhail
[HACKERS] Re[2]: [HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)
Hi! But maybe postgres should provide its own subsystem, like Linux's active/inactive memory lists, on top of and/or alongside shared buffers? Some postgres-specific heuristics could go into such an approach. And does anyone know how the MySQL/InnoDB folks are getting on with similar issues? Thank you!
[HACKERS] freeze cannot be finished
Hello! Could anyone review the patch suggested by Jeff Janes? Initial thread:
http://www.postgresql.org/message-id/flat/1384356585.995240...@f50.i.mail.ru#1384356585.995240...@f50.i.mail.ru
Thanks in advance!

> On Wed, Nov 13, 2013 at 3:53 PM, Sergey Burladyan < eshkin...@gmail.com > wrote:
>> Jeff Janes < jeff.ja...@gmail.com > writes:
>>
>> If I am not mistaken, it looks like lazy_scan_heap() called from
>> lazy_vacuum_rel() (see [1]) skips pages even when it runs with
>> scan_all == true: lazy_scan_heap() does not increment scanned_pages if
>> lazy_check_needs_freeze() returns false, so if this happens during a
>> wraparound vacuum it cannot update pg_class, because pg_class is updated
>> via this code:
>>
>>     new_frozen_xid = FreezeLimit;
>>     if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
>>         new_frozen_xid = InvalidTransactionId;
>>
>>     vac_update_relstats(onerel,
>>                         new_rel_pages,
>>                         new_rel_tuples,
>>                         new_rel_allvisible,
>>                         vacrelstats->hasindex,
>>                         new_frozen_xid);
>>
>> So I think in our prevent-wraparound vacuum vacrelstats->scanned_pages is
>> always less than vacrelstats->rel_pages and pg_class relfrozenxid is never
>> updated.
>
> Yeah, I think that that is a bug.  If the clean-up lock is unavailable but
> the page is inspected without it and found not to need freezing, then the
> page needs to be counted as scanned, but is not so counted.
>
> commit bbb6e559c4ea0fb4c346beda76736451dc24eb4e
> Date: Mon Nov 7 21:39:40 2011 -0500
>
> But this was introduced in 9.2.0, so unless the OP didn't upgrade to 9.2
> until recently, I don't know why it just started happening.
>
> It looks like a simple fix (to HEAD attached), but I don't know how to test
> it.
>
> Also, it seems like it might be worth issuing a warning if scan_all is true
> but all was not scanned.
> Cheers,
>
> Jeff
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
new file mode 100644
index e1d6d1c..6778c7d
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*** lazy_scan_heap(Relation onerel, LVRelSta
*** 602,607 ****
--- 602,608 ----
      if (!lazy_check_needs_freeze(buf))
      {
          UnlockReleaseBuffer(buf);
+         vacrelstats->scanned_pages++;
          continue;
      }
      LockBuffer(buf, BUFFER_LOCK_UNLOCK);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
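The accounting problem the patch fixes can be illustrated with a small simulation (Python with hypothetical page flags; this is not PostgreSQL code): a scan_all vacuum that skips a pinned page without counting it always ends with scanned_pages < rel_pages, so the computed new_frozen_xid is discarded.

```python
def scan_all_vacuum(pages, patched):
    """Simulate lazy_scan_heap()'s scanned_pages accounting during a
    scan_all (anti-wraparound) vacuum.

    Each page is a tuple (pinned_elsewhere, needs_freeze).  When the
    cleanup lock is unavailable and the page does not need freezing,
    the unpatched code skips the page without counting it as scanned.
    """
    scanned = 0
    for pinned_elsewhere, needs_freeze in pages:
        if pinned_elsewhere and not needs_freeze:
            if patched:
                scanned += 1  # the fix: count the page anyway
            continue          # unpatched: page silently not counted
        scanned += 1          # page processed normally
    return scanned

# Nine ordinary pages plus one page pinned by another backend.
pages = [(False, False)] * 9 + [(True, False)]
rel_pages = len(pages)

# Unpatched: scanned_pages < rel_pages, so vac_update_relstats() is called
# with new_frozen_xid = InvalidTransactionId and relfrozenxid never advances.
assert scan_all_vacuum(pages, patched=False) < rel_pages
assert scan_all_vacuum(pages, patched=True) == rel_pages
```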
[HACKERS] Re: [HACKERS] Wal sync odirect
I am talking about the case where wal_level is higher than MINIMAL (wal_level != minimal). From http://doxygen.postgresql.org/xlogdefs_8h_source.html :

"48 * Because O_DIRECT bypasses the kernel buffers, and because we never
 49 * read those buffers except during crash recovery or if wal_level != minimal"

>> Hi, list. Here is my proposal: I would like to talk about O_DIRECT for WAL
>> sync when wal_level is higher than minimal. I think in my case, where WAL
>> traffic is up to 1 GB per 2-3 minutes but the disk hardware has a 2 GB BBU
>> cache (or maybe an SSD under WAL), it would be better if WAL traffic could
>> not harm OS memory eviction. And I do not use streaming; my archive_command
>> could read WAL directly, bypassing the OS cache. Just an opinion, I have not
>> done any tests yet, but I am still seeing some memory-eviction anomaly.
>
> PostgreSQL already uses O_DIRECT for WAL writes if you use O_SYNC mode
> for WAL writes. See comments in src/include/access/xlogdefs.h (search
> for O_DIRECT). You should also examine
> src/backend/access/transam/xlog.c, particularly the function
> get_sync_bit(...)
>
> Try doing some tests with pg_test_fsync, see how performance looks. If
> your theory is right and WAL traffic is putting pressure on kernel write
> buffers, using fsync=open_datasync - which should be the default on
> Linux - may help.
>
> I'd recommend doing some detailed tracing and performance measurements
> before trying to proceed further.
>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
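The rule quoted from xlogdefs.h can be condensed into a sketch of the decision get_sync_bit() makes (Python, with made-up flag values; the real logic lives in src/backend/access/transam/xlog.c): O_DIRECT is only added when nothing will ever read the WAL back through the kernel cache.

```python
# Illustrative flag values only; the real ones come from <fcntl.h>.
O_SYNC = 0x1000
O_DIRECT = 0x4000

def wal_open_flags(open_sync_method, wal_level):
    """Sketch of the O_DIRECT decision described in xlogdefs.h: bypassing
    the kernel buffers is only safe if WAL is never read back through
    them, i.e. only at wal_level = minimal (no archiving or streaming
    readers)."""
    flags = 0
    if open_sync_method:              # fsync = open_sync / open_datasync
        flags |= O_SYNC
        if wal_level == 'minimal':
            flags |= O_DIRECT
    return flags

# The poster's case: wal_level above minimal, so O_DIRECT is not used.
assert (wal_open_flags(True, 'hot_standby') & O_DIRECT) == 0
assert (wal_open_flags(True, 'minimal') & O_DIRECT) != 0
```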
[HACKERS] Auto explain target tables
Hi, list, again. The next proposal for auto_explain: one would be happy to be able to set a list of target tables and indexes. Sometimes it is very hard to detect who is using your indexes, but turning on full logging under thousands of transactions per second does not seem like a nice idea because of the size of the resulting log files (CPU utilization might not be so critical).
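The proposed knob boils down to a per-plan predicate: log a plan only when it touches one of the listed relations. A trivial sketch of that filter (Python; the names and structure are hypothetical, auto_explain has no such option today):

```python
def should_log_plan(plan_relations, target_relations):
    """Return True if an executed plan touched any of the DBA's target
    tables or indexes; everything else stays out of the log, so full
    statement logging is never needed."""
    return bool(set(plan_relations) & set(target_relations))

# Only plans touching the watched index would be logged.
assert should_log_plan({'orders', 'orders_pkey'}, {'orders_pkey'})
assert not should_log_plan({'users', 'users_pkey'}, {'orders_pkey'})
```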
[HACKERS] Wal sync odirect
Hi, list. Here is my proposal: I would like to talk about O_DIRECT for WAL sync when wal_level is higher than minimal. I think in my case, where WAL traffic is up to 1 GB per 2-3 minutes but the disk hardware has a 2 GB BBU cache (or maybe an SSD under WAL), it would be better if WAL traffic could not harm OS memory eviction. And I do not use streaming; my archive_command could read WAL directly, bypassing the OS cache. Just an opinion, I have not done any tests yet, but I am still seeing some memory-eviction anomaly.
[HACKERS] Re[2]: [HACKERS] Re: [HACKERS] high io BUT huge amount of free memory
> hm, in that case, wouldn't adding 48gb of physical memory have
> approximately the same effect?  or is something else going on?

IMHO, adding 48 GB would have no effect. The server already has 376 GB of memory and still has a lot of unused GB. Let me repeat: we added 80 GB to the file cache by decreasing shared_buffers from 64 GB to 16 GB. There used to be 150 GB unused, and now the unused part is only 70 GB.

Some links I have read about eviction:
http://linux-mm.org/PageReplacementDesign
http://linux-mm.org/PageReplacementRequirements

mikhail
[HACKERS] Re: [HACKERS] Re[3]: [HACKERS] high io BUT huge amount of free memory
VM state:

root@avi-sql09:~# /sbin/sysctl -a | grep vm
vm.overcommit_memory = 0
vm.panic_on_oom = 0
vm.oom_kill_allocating_task = 0
vm.oom_dump_tasks = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.nr_pdflush_threads = 0
vm.swappiness = 0
vm.nr_hugepages = 0
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.drop_caches = 0
vm.min_free_kbytes = 65536
vm.percpu_pagelist_fraction = 0
vm.max_map_count = 65530
vm.laptop_mode = 0
vm.block_dump = 0
vm.vfs_cache_pressure = 100
vm.legacy_va_layout = 0
vm.zone_reclaim_mode = 0
vm.min_unmapped_ratio = 1
vm.min_slab_ratio = 5
vm.stat_interval = 1
vm.mmap_min_addr = 65536
vm.numa_zonelist_order = default
vm.scan_unevictable_pages = 0
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
[HACKERS] Re[3]: [HACKERS] high io BUT huge amount of free memory
A typo in my previous mail:

> if ( user cpu + io wait ) is ~140% then i have ~140GB free.

140% should read 1400%: if ~14 cores are busy then ~140 GB is free, i.e. ~10 GB per process. Hmmm...
[HACKERS] Re[2]: [HACKERS] high io BUT huge amount of free memory
Thanks a lot for the responses.

1) Just to recap my case:
Intel 32 cores = 2 sockets * 8 cores * 2 threads
Linux 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux
PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
shared_buffers 64GB / constant hit rate - 99.18
max_connections 160 / with pgbouncer pools there cannot be more than 120 connections in total
work_mem 32M
checkpoint 1h 1.0
swap off, numa off, interleaving on
and! disk usage 100% (128 GB free! WHY?)
disk throughput - up to 30 MB/s (24r+6w)
io - up to 2.5-3K/s (0.5w + 2-2.5r)
typical workload - pk-index-scans
my warm working set is about 400 GB; the whole db - 700 GB

2) numactl
mtyurin@avi-sql09:~$ numactl --hardware
available: 1 nodes (0-0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 0 size: 393181 MB
node 0 free: 146029 MB
node distances:
node   0
  0:  10

3) !! I just found a suspicious relation between "active" processes and free memory: ~1 GB per process. With 376 GB total memory and 32 cores, if ( user cpu + io wait ) is ~140% then I have ~140 GB free. But it could be just a coincidence.

4) Now I am thinking of a) upgrading the Linux kernel (to 3.2!?) and then, if the problem persists, b) setting shared_buffers to something like 300-320 GB.

5) What do you know about the workload in Berkus's case http://www.databasesoup.com/2012/04/red-hat-kernel-cache-clearing-issue.html ?
[HACKERS] pending triggers infinite list
Hi all. I've run into suspicious behavior in a transaction with a deferrable trigger: if the trigger performs an UPDATE on a row of its own target table, we get infinite "recursion" with no stack-depth limit:

trigger -> update -> trigger -> update -> ...

... an infinite pending list :)

- Misha
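What makes this different from ordinary trigger recursion is that deferred events queue up instead of nesting, so no stack-depth check ever fires. A tiny simulation of the pending-list behavior (Python; not PostgreSQL code):

```python
def run_deferred_events(max_events):
    """Simulate a deferrable AFTER trigger whose body updates its own
    target table: firing one pending event enqueues another, so the
    pending list never drains.  The cap only exists so the demo stops;
    the real backend would keep processing the list indefinitely."""
    pending = ['update']              # event from the original UPDATE
    fired = 0
    while pending and fired < max_events:
        pending.pop(0)                # fire one deferred event...
        fired += 1
        pending.append('update')      # ...whose UPDATE queues another
    return fired, len(pending)

fired, still_pending = run_deferred_events(10000)
assert fired == 10000 and still_pending == 1  # work remains: it never terminates
```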
[HACKERS] Re[2]: [HACKERS] standby, pg_basebackup and last xlog file
Hello all, and Heikki personally. Thank you for your answer. I have some new points:

On 21.01.2013, 10:08 +02:00, Heikki Linnakangas wrote:
> On 21.01.2013 09:14, Миша Тюрин wrote:
>> Is there any reason why pg_basebackup has limitation in an online backup
>> from the standby: "The backup history file is not created in the database
>> cluster backed up." ?
>
> WAL archiving isn't active in a standby, so even if it created a backup
> history file, it wouldn't go anywhere. Also, the way the backup history
> files are named, if you take a backup on the master at the same time (or
> another backup at the same time in the standby), you would end up with
> multiple backup history files with the same name.
>
>> So i can't get last xlog file needed to restore :(
>
> Good point. That was an oversight in the patch that allowed base backups
> from a standby.
>
>> Also maybe i can use something like ( pg_last_xlog_replay_location() + 1 )
>> after pg_basebackup finished.
>
> Yeah, that should work.
>
>> Does anybody know true way to getting last xlog file in case of applying
>> pg_basebackup to standby?
>> How pg_basebackup gets last xlog file in case of standby and -x option?
>
> The server returns the begin and end WAL locations to pg_basebackup,
> pg_basebackup just doesn't normally print them out to the user. In
> verbose mode, it does actually print them out, but only with -x, so that
> doesn't help you either. If you can compile from source, you could
> modify pg_basebackup.c to print the locations without -x and --verbose,
> search lines that print out "transaction log start point / end position".
1) We can get the last xlog by using the control data's "Minimum recovery ending location".

> As a workaround, without modifying the source, you can do this after
> pg_basebackup has finished, to get the first WAL file you need:
>
> $ pg_controldata backupdir | grep "REDO WAL file"
> Latest checkpoint's REDO WAL file:00030009

And I would like to correct your suggestion about the first WAL file:
2.1) we should use backup_label to determine the first needed WAL;
2.2) and we must not use the REDO pointer from the checkpoint, because there might be more than one checkpoint during the base backup.

> And as you suggested, pg_last_xlog_replay_location() for the last WAL
> file you need.
>
> - Heikki

- Misha
[HACKERS] standby, pg_basebackup and last xlog file
Hello! I wrote to -general ( [GENERAL] standby, pg_basebackup and last xlog file ) some time ago but still haven't got any feedback:

Hello! Is there any reason why pg_basebackup has this limitation for an online backup from the standby: "The backup history file is not created in the database cluster backed up." ? So I can't get the last xlog file needed to restore :(

I think I can use the -x option to get the last xlog file, but I would like to minimize the size of the resulting backup. // I also get all WALs via archive_command, so there is no reason to keep two copies of each WAL during the base backup.

Also, maybe I can use something like ( pg_last_xlog_replay_location() + 1 ) after pg_basebackup has finished.

Does anybody know the right way to get the last xlog file when applying pg_basebackup to a standby? How does pg_basebackup get the last xlog file in the standby-plus--x case?
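The pg_last_xlog_replay_location() idea can be made concrete: the last WAL file needed is the segment containing the replay location when pg_basebackup finishes. A sketch of the LSN-to-filename mapping (Python), assuming the default 16 MB segment size and the standard naming of 8 hex digits each for timeline, log, and seg; version-specific corner cases around log-file boundaries are ignored:

```python
WAL_SEG_SIZE = 16 * 1024 * 1024          # default 16 MB WAL segments

def wal_file_containing(lsn, timeline=1):
    """WAL segment file name containing the given LSN ('hi/lo' hex form,
    as returned by pg_last_xlog_replay_location())."""
    hi, lo = lsn.split('/')
    pos = (int(hi, 16) << 32) | int(lo, 16)
    segno = pos // WAL_SEG_SIZE
    # Split the segment number into the 'log' and 'seg' name components.
    log, seg = divmod(segno, 0x100000000 // WAL_SEG_SIZE)
    return '%08X%08X%08X' % (timeline, log, seg)

# A replay location of 3/9A3C528 lies in segment 9 of log file 3:
assert wal_file_containing('3/9A3C528') == '000000010000000300000009'
```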