Re: [HACKERS] Where to point CommitFestOpen?

2008-11-03 Thread Jens-Wolfhard Schicke

Tom Lane wrote:
 Somehow, "prevent infinite loops" doesn't seem like justification for
 "refuse to deal with a situation that the software creates automatically".
 They ought to be willing to burrow more than one level ... see any Unix
 kernel's treatment of symlinks for behavior that has actually stood the
 test of usability over time.
Having faced similar problems in another wiki some months ago, I wrote the
attached script to automate some tasks in a wiki. Maybe it will be of use.

Unfortunately I wrote it for a German wiki, so some of the special pages
are named differently. Hence something probably needs to be done before it
can be used in the Postgres wiki (not much, though).

In particular it includes a function to execute a redirect in all pages
referencing a redirect page, i.e. change the links within all incoming pages.

  Jens-W. Schicke


use strict;
use warnings;
use MediaWiki;
use Data::Dumper;
use LWP;
use LWP::UserAgent;

my $wiki = MediaWiki->new() or die "Wiki init failed";
my $lwp = LWP::UserAgent->new();
$lwp->agent("Drahflow's Wiki Bot");

my $WIKINAME = $ARGV[0] or die "usage: ./<script> <Wiki>";

my $conf;

if($WIKINAME eq 'AK') {
  $conf = {
    'wiki' => { 'host' => '', 'path' => '/' },
    'bot' => { 'user' => 'Drahflow\'s Bot', 'pass' => 'secret' },
  };
} else {
  die "Unknown wiki: $WIKINAME";
}

while(my $command = <STDIN>) {
  chomp $command;
  last if($command eq 'q' or $command eq 'quit');

  eval {
    dumpContent($1) if($command =~ /^DUMP ([^|]*)$/);
    execTest() if($command eq 'TEST');
    cleanupRedirect($1, $2) if($command =~ /^CREDIR ([^|]*)\|?(del)?$/);
    checkout($1) if($command =~ /^MVOUT ([^|]*)$/);
    checkin($1, $2) if($command =~ /^MVIN ([^|]*)\|?([^|]*)$/);
    copyout($1) if($command =~ /^GET ([^|]*)$/);
    checkToDoUsage() if($command =~ /^QTODO$/);
    checkLanguageSync() if($command =~ /^QLANG$/);
    moveCategory($1, $2) if($command =~ /^CMV ([^|]*)\|?([^|]*)$/);
    addCategories($1, $2) if($command =~ /^CADD (.*)\|\|(.*)$/);
  };
  print STDERR $@ if $@;
}


sub loadSure {
  my ($name, $mode) = @_;

  die "no mode given" unless $mode;

  my $page = $wiki->get($name, $mode);
  unless($page and $page->load()) {
    die "could not load $name";
  }

  print "Page $name loaded.\n";
  return $page;
}

sub loadCategorySure {
  my ($name) = @_;

  unless($name =~ /Kategorie:|Category:/) {
    die "category name must be given with prefix";
  }

  my $req = HTTP::Request->new(
    'GET' => 'http://' . $conf->{'wiki'}->{'host'} . '/' . $name);
  my $res = $lwp->request($req);

  if(not $res->is_success()) {
    die "could not load $name";
  }

  my ($subcatsPart) = $res->content() =~ /\n<h2>Unterkategorien(.*?)\n<h2>/s;
  my ($articlesPart) = $res->content() =~ /\n<h2>Seiten in der Kategorie(.*?)\nVon/s;

  my $subcats = [];
  my $articles = [];

  while(defined $subcatsPart and $subcatsPart =~ s/.*?<a href="\/([^"]+)" title="([^"]+)">//) {
    push @$subcats, $2;
  }

  while(defined $articlesPart and $articlesPart =~ s/.*?<a href="\/([^"]+)" title="([^"]+)">//) {
    push @$articles, $2;
  }

  print "Category $name loaded.\n";
  return $articles, $subcats;
}

sub saveSure {
  my ($page, $summary, $minor) = @_;

  if($page->{'title'} =~ /Ortsgruppe/) {
    askConfirmation("Page " . $page->{'title'} . " looks like it should be left alone");
  }

  die "no summary given" unless $summary;

  if($minor) {
    $page->{'minor'} = 1;
  } else {
    $page->{'minor'} = 0;
  }

  $page->{'summary'} = $summary;
  unless($page->save()) {
    die "could not save " . $page->{'title'};
  }

  print "Page " . $page->{'title'} . " saved.\n";
  return $page;
}

sub askConfirmation {
  my ($message) = @_;

  while(1) {
    print "== $message, continue [N/y]\n";
    my $answer = <STDIN>;
    chomp $answer;
    if($answer eq '' or $answer eq 'n') {
      die "User confirmation failed.";
    }
    if($answer eq 'y') {
      return;
    }
  }
}

sub dumpContent {
  my ($name) = @_;

  die "no name given" unless $name;

  my $page = loadSure($name, 'r');

  my $text = $page->content();
  print Dumper($text);
}

sub execTest {
  my $page = loadSure('Benutzer:Drahflow/Sandkasten', 'rw');

  my $text = $page->content();
  print Dumper($text);
  $page->{'content'} .= 'Minimaler Testlauf';
  saveSure($page, 'Testing [[Benutzer:Drahflow]]\'s Bot');
}

sub cleanupRedirect {
  my ($name, $del) = @_;

  die "no name given" unless $name;

  my @incoming;
  my $page = loadSure("Spezial:Linkliste/$name", 'r');
  my $content = $page->content();
  while($content =~ s!<a href="/([^"]+)" title="([^"]+)">\2</a>[^<]*<span class="mw-whatlinkshere-tools">!!) {
    my ($url, $title) = ($1, $2);
    push @incoming, $title;
  }
  print "Incoming links:\n";
  print Dumper(\@incoming);

  $page = 

Re: [HACKERS] [patch] gsoc, improving hash index v2

2008-08-05 Thread Jens-Wolfhard Schicke

Xiao Meng wrote:
 Hi, hackers. Here is some test I run on a bigger set.
 The time of the two index is
 btree: 1/0.174700=5.00250125
 hash-patch: 1/0.199900=5.724098
Just to bring it to the attention of everybody: those figures are the
reciprocals of the reported tps, i.e. seconds per transaction, and above
they are attached to the wrong indexes. Correctly they read:

btree: 1/0.174700 = 5.724098 s/transaction
hash-patch: 1/0.199900 = 5.00250125 s/transaction

Hence the hash _is_ actually faster.

 $ pgbench -n -f /tmp/query.sql dict
 tps = 0.174694 (including connections establishing)
 tps = 0.174700 (excluding connections establishing)
 ----- hash patch -----
 $ pgbench -n -f /tmp/query.sql dict
 transaction type: Custom query
 tps = 0.199892 (including connections establishing)
 tps = 0.199900 (excluding connections establishing)
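Spelled out, the tps-to-seconds conversion in the correction above is trivial to check (plain Python, numbers taken from the quoted runs):

```python
# pgbench reports throughput as tps; with a single long-running custom
# query, the interesting figure is the reciprocal: seconds per transaction.
btree_tps = 0.174700   # from the btree run above
hash_tps = 0.199900    # from the hash-patch run above

btree_seconds_per_xact = 1.0 / btree_tps   # ~5.7241 s
hash_seconds_per_xact = 1.0 / hash_tps     # ~5.0025 s

# Higher tps <=> fewer seconds per transaction: the hash index wins here.
assert hash_seconds_per_xact < btree_seconds_per_xact
```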



Sent via pgsql-hackers mailing list (
To make changes to your subscription:

Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Jens-Wolfhard Schicke

Tom Lane wrote:
 ERROR: database %s is being accessed by other users
 DETAIL: There are %d session(s) and %d prepared transaction(s) using the 
 I'm aware that this phrasing might not translate very nicely ... anyone
 have a suggestion for better wording?
I can only estimate translation effort into German, but how about:

DETAIL: Active users of the database: %d session(s), %d prepared transaction(s)




Re: [HACKERS] Transaction-controlled robustness for replication

2008-07-22 Thread Jens-Wolfhard Schicke

Simon Riggs wrote:
 Asynchronous commit controls whether we go to disk at time of commit, or
 whether we defer this slightly. We have the same options with
 replication: do we replicate at time of commit, or do we defer this
 slightly for performance reasons. DRBD and other replication systems
 show us that there is actually another difference when talking about
 synchronous replication: do we go to disk on the standby before
 acknowledging the primary?
 We can generalise this as three closed questions, answered either Yes
 (Synchronous) or No (Asynchronous)
 * Does WAL get forced to disk on primary at commit time?
 * Does WAL get forced across link to standby at commit time?
 * Does WAL get forced to disk on standby at commit time?
* Does WAL get applied [and synced] to disk on standby at commit time?
This is important if you want to use the standby as a read-only server.
I am slightly confused about what the fsync setting does to all this, hence
the brackets.

I think that questions 2 and 3 are trivially bundled together. Once the
user can specify 2, implementing 3 should be trivial and vice versa.
I am not even convinced that these need to be two different parameters.
Also please note that an answer of "yes" to 3 means that 2 must also
be answered "yes".

 We could represent this with 3 parameters:
 synchronous_commit = on | off
 synchronous_standby_transfer = on | off
 synchronous_standby_wal_fsync = on | off
synchronous_standby_apply = on | off# just to propose a name

 Changing the parameter setting at transaction-level would be expensive
 if we had to set three parameters.
What exactly does "expensive" mean? All three parameters can probably be set
in one TCP packet from client to server.

 Or we could use just a single parameter
 synchronous_commit = 'AAA', 'SAA', 'SSA', 'SSS' or on |off when no
 log-based replication is defined
 Having the ability to set these at the transaction-level would be very
 cool. Having it set via a *single* parameter would make it much more
 viable to switch between AAA for bulk, low importance data and SSS for
 very low volume, critical data, or somewhere in between on the same
 server, at the same time.
The problem with a single parameter is that everything becomes position
dependent, and if for whatever reason a new parameter is introduced, it's not
easy to upgrade old application code.

 So proposal in summary is
 * allow various modes of synchronous replication for perf/robustness
 * allow modes to be specified per-transaction
 * allow modes to be specified as a single parameter
How about creating named modes? This would give the user the ability to
define more fine-grained control, especially in larger clusters of
servers, without totally clogging the parameter space and application code.
Whether this should be done SQL-style or in some config file is not so clear to me,
although I'd prefer SQL-style like

  LOCAL SYNCHRONOUS APPLY
  SYNCHRONOUS APPLY     -- read-only slave
  SYNCHRONOUS APPLY     -- read-only slave
  SYNCHRONOUS SHIP      -- backup-server
  SYNCHRONOUS SHIP      -- backup-server
  SYNCHRONOUS FSYNC     -- backup-server with fast disks

and then something like

synchronize_mode = immediate_readonly;

Yeah, I know, give patches not pipe-dreams :)

  Jens-Wolfhard Schicke



Re: [HACKERS] Array behavior oddities

2008-01-16 Thread Jens-Wolfhard Schicke

Bruce Momjian wrote:
   An array subscript expression will return null if either the array itself
   or any of the subscript expressions are null.  Also, null is returned
   if a subscript is outside the array bounds (this case does not raise an
   error).  For example, if <literal>schedule</> currently has the
   dimensions <literal>[1:3][1:2]</> then referencing
   <literal>schedule[3][3]</> yields NULL.  Similarly, an array reference
   with the wrong number of subscripts yields a null rather than an error.
   An array slice expression likewise yields null if the array itself or
   any of the subscript expressions are null.  However, in other corner
   cases such as selecting an array slice that is completely outside the
   current array bounds, a slice expression yields an empty
   (zero-dimensional) array instead of null.  If the requested slice
   partially overlaps the array bounds, then it is silently reduced to just
   the overlapping region.
 Is there a reason out-of-bounds array accesses behave differently for
 slices and non-slices?
 Having slices and non-slices behave differently is very confusing to me.
I think the case of partially-out-of-bound slices is a good reason to have
this difference:

fastgraph=# select ('{foo,bar}'::text[])[1:2];
   text    
-----------
 {foo,bar}
(1 row)

fastgraph=# select ('{foo,bar}'::text[])[2:3];
 text  
-------
 {bar}
(1 row)

fastgraph=# select ('{foo,bar}'::text[])[3:4];
 text 
------
 {}
(1 row)

We cannot return an empty array in case of unsliced out-of-bounds access
because the type wouldn't match at all.



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] There's random access and then there's random access

2007-12-02 Thread Jens-Wolfhard Schicke

Tom Lane wrote:
 Gregory Stark [EMAIL PROTECTED] writes:
 Recently there was a post on -performance about a particular case where
 Postgres doesn't make very good use of the I/O system. This is when you try 
 fetch many records spread throughout a table in random order.
 Since the OP in that thread has still supplied zero information
 (no EXPLAIN, let alone ANALYZE, and no version info), it's pure
 guesswork as to what his problem is.
Nonetheless, asynchronous IO will reap performance improvements.
Whether one specific case would indeed benefit from it is IMHO irrelevant
if other cases can indeed be found where performance would be
improved significantly.

I experimented with a RAID of 8 solid state devices and found that
the blocks/second for random access improved significantly with the
number of processes doing the accesses. I actually wanted to use said
RAID as a tablespace for PostgreSQL, but alas, the speedup did not
scale with the number of drives in the RAID, which is very unfortunate.

I still got the lower solid-state latency, but the RAID did not help.
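One way to get that kind of overlap from user space without full asynchronous IO is to hint the kernel about the upcoming random reads before issuing them. A minimal sketch (my own illustration, not PostgreSQL code), assuming a POSIX system; the hint is skipped where `posix_fadvise` is unavailable:

```python
import os

BLOCK = 8192  # read unit, chosen here to match PostgreSQL's default page size

def read_blocks(path, block_numbers):
    """Read the given blocks, hinting the kernel about all of them first."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):  # POSIX-only; skipped elsewhere
            for n in block_numbers:
                # WILLNEED lets the kernel keep several reads in flight at
                # once, which is where the multi-process speedup came from.
                os.posix_fadvise(fd, n * BLOCK, BLOCK, os.POSIX_FADV_WILLNEED)
        return [os.pread(fd, BLOCK, n * BLOCK) for n in block_numbers]
    finally:
        os.close(fd)
```

Whether this actually helps on a given RAID depends, as above, on how many requests the device queue really keeps in flight.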

   Jens-Wolfhard Schicke



Re: [HACKERS] Ordered Append Node

2007-11-23 Thread Jens-Wolfhard Schicke

Gregory Stark wrote:
 But that requires a) dealing with the problem of the parent table which has no
 constraints and b) an efficient way to prove that constraints don't overlap
 and order them properly. The latter looks like an O(n^2) problem to me, though
 it's a planning problem which might be worth making slow in exchange for even
 a small speedup at run-time.

Is it really worthwhile to optimize away the heap access by thinking about what
the child tables hold? If the tables are read using IO, I think the complete plan
will turn out to be IO-bound, and the heap is of no interest. If the tables reside
in memory, the heap still only slows the process down by O(log(number of tables)),
which usually won't be that much IMHO.

Nonetheless, in the case of range partitioning, a sort on the lower ends of the
ranges and a linear test of neighbouring ranges for overlap, skipping empty ranges,
should work in O(n log(n)).
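That check can be sketched as follows (a toy model with child-table ranges as half-open (low, high) pairs; the names are mine):

```python
# Sort the child tables' ranges by lower bound (O(n log n)), skip empty
# ranges, then one linear pass over neighbours proves they don't overlap,
# i.e. that the partitions can simply be appended in sorted order.
def ranges_are_ordered(ranges):
    nonempty = sorted(r for r in ranges if r[0] < r[1])
    for (lo1, hi1), (lo2, hi2) in zip(nonempty, nonempty[1:]):
        if lo2 < hi1:  # neighbouring ranges overlap
            return False
    return True

assert ranges_are_ordered([(10, 20), (0, 10), (20, 30), (5, 5)])  # (5,5) is empty
assert not ranges_are_ordered([(0, 15), (10, 20)])
```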

   Jens-W. Schicke



Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Jens-Wolfhard Schicke

Shreya Bhargava wrote:
 Note that the bottom line for the problems with hash indexes is that the
 current implementation doesn't offer any advantages over btree indexes. Hash
 indexes need to be not only as good of a choice as btree indexes but
 a significantly better choice, at least for some specific circumstances.
Oh, it does. I recently used a hash index to speed up my database. Namely,
I found it improved performance when indexing a non-integer column
containing English words.

I don't know how much of that data was cached; according to the sound
of my harddrive it wasn't all of it. Consider this anecdotal
evidence, but the speedup was noticeable.

 We performed some probe tests on our patch on 
 hash index and the original btree index to compare the 
 performance between the two. We used a 80 million tuple table.
 The table contained single integer attribute and the data
 range was 0 - 1. (generated by a random generator).

I'd be interested in how much difference there is for non-integer
index behaviour. I at least had the impression that in my case
the sorted strings in the btree pages didn't compare too well.

 Gregory Stark [EMAIL PROTECTED] wrote:
 Kenneth Marshall writes:
  On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
  Using our implementation, build times and index sizes are
  comparable with btree index build times and index sizes.
Way to go! Currently building hash indices is no fun.

  Jens-W. Schicke



Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke
--On Saturday, September 08, 2007 18:56:23 -0400 Mark Mielke 

Kenneth Marshall wrote:
Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in todays system memories.

Space efficiency is provided by not storing the key, nor the header data
required (length prefix?).
Space efficiency at ~1 entry per bucket: how about using closed hashing,
saving in each page a bitmask in front which specifies which slots hold
valid entries, and in the rest of the page row-pointers (is this the correct
expression? I don't know...) without further data. That should provide a
reasonably simple data structure and alignment for the pointers.
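As a toy model of that page layout (my names and sizes, not from any actual patch): a validity bitmask in front, bare row pointers behind it, open addressing within the page:

```python
# Toy model of the proposed bucket page: a bitmask marks which fixed-size
# slots are in use; the slots store only row pointers (no keys, no per-item
# headers), probed by open addressing within the page.
class HashPage:
    def __init__(self, slots=256):
        self.slots = slots
        self.valid = 0                  # bitmask: bit i set => slot i used
        self.pointers = [None] * slots  # bare row pointers

    def insert(self, hashval, rowptr):
        for probe in range(self.slots):
            i = (hashval + probe) % self.slots
            if not (self.valid >> i) & 1:
                self.valid |= 1 << i
                self.pointers[i] = rowptr
                return True
        return False  # page full; a real index would go to an overflow page

    def candidates(self, hashval):
        # With no keys stored, every valid slot in the probe run is only a
        # candidate: the caller must recheck against the heap tuple.
        out = []
        for probe in range(self.slots):
            i = (hashval + probe) % self.slots
            if not (self.valid >> i) & 1:
                break  # an empty slot ends the probe run
            out.append(self.pointers[i])
        return out
```

An empty probe start also answers "bucket is empty" cheaply, without touching the heap at all.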

Please keep the ideas and comments coming. I am certain that a synthesis
of them will provide an implementation with the performance
that we are seeking.

One should look into new plan nodes for != ANY(), NOT EXISTS and
similar. A node like "look into hash and return true if bucket is empty" would
work without checking tuple visibility when the bucket is empty and could
be a win in some situations.

Do we want special cases for short keys like INT4? In those cases the
implementation might use hash == key and put that knowledge to use in
plans. Even a unique constraint might then be doable. Does the
postgresql-storage backend on Linux support sparse files? Might be a win
when holes in the sequence turn up.


Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke

More random thoughts:

- Hash indices are best for unique keys, but every table needs its own hash
index, which means one more random page access per table. Is there any way to build
multi-_table_ indices? A join might then fetch all table rows with a given
unique key after one page fetch for the combined index.

- Hashes with trivial hash-functions (like identity) can also return rows 
in a desired order.

- Is there a case where sequentially scanning a hash-index is useful? I
can't find any, but maybe somebody else has a use-case.

- What about HashJoins when the referenced tables have hash-indices?

- What about hash-indices where entries are inserted for multiple columns. 
Assume a table like this:

CREATE TABLE link (obj_id1 INT4, obj_id2 INT4);

and a query like

SELECT * FROM link WHERE ? IN (obj_id1, obj_id2);

or some join using a similar condition. It might be a nice thing to insert 
entries at both HASH(obj_id1) and HASH(obj_id2), which would eliminate the 
need to check in two indices and do a bitmap OR. OTOH it might not be 
faster in any significant use cases because who'd need a link table with 
nearly unique linked objects?

- In cases where the distribution of the hash-function is good, but a small 
and relatively even number of rows exist for each key (like it might be the 
case in the above example), it might be nice to reserve a given amount of 
same-key row entries in each bucket, and hold a fill-count at the front of 
it. That would avoid costly page fetches after each collision. You'd create 
a hash-index with n-buckets, each m-elements large. When the bucket is 
full, the usual collision handling continues.

- About hash enlargement: What about always using only the first k bits of
each hash value? When you find that the hash is "quite full" (however that
is defined and detected), k is increased by one, effectively doubling the
hash size. New entries are then written as usual, while retrieving old
entries needs to test at the k-bit position first and, if there is a miss,
also at the (k-1)-position and so forth. To limit this search, some
background process could, after analyzing the index, move old entries to the
now-correct k-bit position and increment some min-k value once all old
entries have been moved. After the hash has been increased, the index would
approximately halve its speed for some time. Additionally, one could
also insert the entry at the new position if it has been found at the old
one only while using the index. A special miss-entry at the new position
doesn't help if nothing could be found, because the old positions will
usually hold some data which resides there even if it uses k bits.
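The scheme reads like a cousin of linear hashing; a toy model (a dictionary stands in for bucket pages, and the names are mine) of the k-bit lookup with fallback:

```python
# Toy model of the doubling scheme: a bucket is selected by the first k bits
# of the hash.  After k grows, lookups probe the k-bit position first and
# fall back to k-1, k-2, ... down to min_k, the point below which a
# background process would already have migrated all old entries.
class GrowableHash:
    def __init__(self, k=2):
        self.k = k
        self.min_k = k
        self.buckets = {}  # bucket number -> list of (full hash, value)

    def insert(self, h, value):
        self.buckets.setdefault(h & ((1 << self.k) - 1), []).append((h, value))

    def lookup(self, h):
        for bits in range(self.k, self.min_k - 1, -1):  # newest width first
            for h2, value in self.buckets.get(h & ((1 << bits) - 1), []):
                if h2 == h:
                    return value
        return None

    def grow(self):
        self.k += 1  # doubles the logical hash size; old entries stay put
```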


Re: [HACKERS] Improving NOT IN

2007-01-31 Thread Jens-Wolfhard Schicke
--On Tuesday, January 30, 2007 23:24:40 + Simon Riggs 

Basically what I see here is a whole lot of work and new executor
infrastructure for something that will be a win in a very narrow
use-case and a significant loss the rest of the time.  I think there
are more productive ways to spend our development effort.
Maybe one should not aim for a special case of continuous sequences. It
might be a better thing to have a fast look-up data structure for
row-existence. The optimization over the usual indices is that only
existence, and no other information, must be saved; thus a bit-field is
really possible. Even 100 million rows would fit in about 12 MB.

So, instead of trying to find a sequence, find (or guess and later correct
your bitfield) the minimum, and then set the bits as you encounter rows.
During the join, test whether the bit you want to join to exists and, voilà,
depending on whether you used IN or NOT IN, decide what to do.

This datastructure could be used everywhere where only existence is 
important and no columns of a table are selected.

Possibly the bit-field should allow for large gaps to be represented more
efficiently: if you have a 32-bit index column, make a 256-entry top-level
array pointing to bitfields representing the numbers 0x00000000-0x00ffffff,
0x01000000-0x01ffffff, ... Each such bitfield would need 2 MB; the pointers
are negligible. But now large holes in the sequence don't waste too much
space, and thus the minimum need not be known.
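A sketch of that two-level layout (my names; a dict stands in for the 256-entry pointer array):

```python
# Sketch of the proposed existence structure for 32-bit keys: the high 8
# bits select a top-level entry, each lazily pointing at a bitfield that
# covers 2^24 keys (2^21 bytes = 2 MB), so holes in the key space allocate
# nothing and the minimum key need not be known in advance.
class ExistenceMap:
    CHUNK_BITS = 24  # low bits covered by one bitfield

    def __init__(self):
        self.top = {}  # high byte -> bytearray bitfield, made on demand

    def set(self, key):
        hi, lo = key >> self.CHUNK_BITS, key & ((1 << self.CHUNK_BITS) - 1)
        chunk = self.top.setdefault(hi, bytearray(1 << (self.CHUNK_BITS - 3)))
        chunk[lo >> 3] |= 1 << (lo & 7)

    def test(self, key):
        hi, lo = key >> self.CHUNK_BITS, key & ((1 << self.CHUNK_BITS) - 1)
        chunk = self.top.get(hi)
        return bool(chunk and chunk[lo >> 3] & (1 << (lo & 7)))
```

For NOT IN, the join probe is then a single `test()`; only rows whose bit is absent survive.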

Jens Schicke
asco GmbH
Mittelweg 7   Tel 0531/3906-127
38106 BraunschweigFax 0531/3906-400


Re: [HACKERS] insert/update/delete returning and rules

2006-08-17 Thread Jens-Wolfhard Schicke
--On Tuesday, August 15, 2006 16:33:27 -0400 Tom Lane [EMAIL PROTECTED] 

I'm tempted to suggest that the RETURNING commands might need to be
separate rule events, and that to support this you'd need to write
an additional rule:


But even this seems like it would fail in complicated cases.  What if
the view is a join, and your ON INSERT rule inserts into two different
underlying tables in two commands?  If you need fields from both tables
to generate a full RETURNING list then there's no apparent way to make
it work.

Ugh.  Any ideas out there?

   INSERT ... INTO tbl_1;
   INSERT ... INTO tbl_2;
   RETURNING SELECT ... FROM tbl_1, tbl_2 WHERE ...;

Just what crossed my mind first, no idea whether this is implementable or 
realistic or whatever.

Kind regards,
Jens Schicke
asco GmbH
Mittelweg 7   Tel 0531/3906-127
38106 BraunschweigFax 0531/3906-400


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Jens-Wolfhard Schicke
--On Thursday, February 16, 2006 10:39:45 -0800 Dann Corbit 

He refers to counting sort and radix sort (which comes in most
significant digit and least significant digit format).  These are also
called distribution (as opposed to comparison) sorts.

These sorts are O(n) as a function of the data size, but really they are
O(M*n) where M is the average key length and n is the data set size.
(In the case of MSD radix sort M is the average length to completely
differentiate strings)

So if you have an 80 character key, then 80*log(n) will only be faster

I suppose you meant 80 * n here?

than n*log(n) when you have 2^80th elements -- in other words -- never.
I think this is wrong. You can easily improve radix sort by a constant factor if
you don't take single bytes as the digits but rather k-byte values. At
least 2 bytes should be possible without problems. This would give you 40 *
n time, not 80 * n. And you assume that the comparison of an 80-byte-wide
value only costs 1, which might be (and in many cases will be, IMHO) wrong.
Actually it might mean comparing 80 bytes as well, but I might be wrong.
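For concreteness, an LSD radix sort over 16-bit digits (fixed-width 32-bit unsigned keys assumed for simplicity, which sidesteps the locale problem below):

```python
# LSD radix sort with 16-bit digits: a 32-bit key takes two counting passes
# instead of four byte-wide ones -- the constant-factor improvement from
# using k-byte digits, at the price of a 65536-entry count array per pass.
def radix_sort_u32(keys):
    RADIX = 1 << 16
    for shift in (0, 16):  # least significant 16-bit digit first
        counts = [0] * RADIX
        for k in keys:
            counts[(k >> shift) & (RADIX - 1)] += 1
        pos, total = [0] * RADIX, 0
        for d in range(RADIX):  # prefix sums give each digit's start offset
            pos[d], total = total, total + counts[d]
        out = [0] * len(keys)
        for k in keys:  # stable scatter by the current digit
            d = (k >> shift) & (RADIX - 1)
            out[pos[d]] = k
            pos[d] += 1
        keys = out
    return keys
```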

What I see as the biggest problem is the digit representation necessary
for radix sort in the case of locales which sort without looking at spaces. I
assume that would be hard to implement. The same goes for the proposed
mapping of string values onto 4/8-byte values.

Kind regards,
Jens Schicke
asco GmbH
Mittelweg 7   Tel 0531/3906-127
38106 BraunschweigFax 0531/3906-400


Re: [HACKERS] optimizer questions

2006-02-15 Thread Jens-Wolfhard Schicke
--On Tuesday, February 14, 2006 10:35:12 -0600 hector Corrada Bravo 

Before I start trying this (creating aggregate paths seems the
reasonable thing to do) I would like your counsel.

1) Regardless of the optimization problem, is the executor able to
execute aggregate nodes within join trees (that is, not as the result
of subqueries)?

2) Has anyone tried something like this before?
I did and failed, because I did not quite understand how the Postgres
internal variables should be initialized.

My approach was to supply other join paths if one of the two tables was an

Kind regards,
Jens Schicke
asco GmbH
Mittelweg 7   Tel 0531/3906-127
38106 BraunschweigFax 0531/3906-400
