Re: [HACKERS] FDW and parallel execution

2017-04-11 Thread PostgreSQL - Hans-Jürgen Schönig
customer_fdw.c_nationkey))
>  ->  Hash Join  (cost=335084.53..2247223.46 rows=701672 width=52)
>        Hash Cond: (lineitem_fdw.l_orderkey = orders_fdw.o_orderkey)
>        ->  Hash Join  (cost=2887.07..1786058.18 rows=4607421 width=52)
>              Hash Cond: (lineitem_fdw.l_suppkey = supplier_fdw.s_suppkey)
>              ->  Foreign Scan on lineitem_fdw  (cost=0.00..1512151.52 rows=59986176 width=16)
>              ->  Hash  (cost=2790.80..2790.80 rows=7702 width=44)
>                    ->  Hash Join  (cost=40.97..2790.80 rows=7702 width=44)
>                          Hash Cond: (supplier_fdw.s_nationkey = nation.n_nationkey)
>                          ->  Foreign Scan on supplier_fdw  (cost=0.00..2174.64 rows=100032 width=8)
>                          ->  Hash  (cost=40.79..40.79 rows=15 width=36)
>                                ->  Hash Join  (cost=20.05..40.79 rows=15 width=36)
>                                      Hash Cond: (nation.n_regionkey = region.r_regionkey)
>                                      ->  Seq Scan on nation  (cost=0.00..17.70 rows=770 width=40)
>                                      ->  Hash  (cost=20.00..20.00 rows=4 width=4)
>                                            ->  Seq Scan on region  (cost=0.00..20.00 rows=4 width=4)
>                                                  Filter: ((r_name)::text = 'ASIA'::text)
>        ->  Hash  (cost=294718.76..294718.76 rows=2284376 width=8)
>              ->  Foreign Scan on orders_fdw  (cost=0.00..294718.76 rows=2284376 width=8)
>  ->  Hash  (cost=32605.64..32605.64 rows=1500032 width=8)
>        ->  Foreign Scan on customer_fdw  (cost=0.00..32605.64 rows=1500032 width=8)
>
> The plans look very similar, but the first one is parallel and the second is not.
> My FDW provides implementation for IsForeignScanParallelSafe which
> returns true.
> I wonder what can prevent optimizer from using parallel plan in this case?
>
> Thank in advance,
> -- 
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

-- 
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at



Re: [HACKERS] remove checkpoint_warning

2016-07-12 Thread PostgreSQL - Hans-Jürgen Schönig



On 07/09/2016 11:12 PM, Tom Lane wrote:

Alvaro Herrera  writes:

the checkpoint_warning feature was added by commit 2986aa6a668bce3cfb836
in November 2002 when we didn't have any logging of checkpointing at
all.  I propose to remove it: surely anyone who cares about analyzing
checkpointing behavior nowadays is using the log_checkpoint feature
instead, which contains much more detail.  The other one is just noise
now, and probably ignored amidst the number of other warning traffic.

Hmm, not sure.  ISTM log_checkpoint is oriented to people who know what
they are doing, whereas checkpoint_warning is more targeted to trying
to help people who don't.  Perhaps you could make an argument that
checkpoint_warning is useless because the people whom it's meant to help
won't notice the warning anyway --- but I doubt that it's been
"superseded" by log_checkpoint, because the latter would only be enabled
by people who already have a clue that checkpoint performance is something
to worry about.

Or in short, this may be a fine change to make, but I don't like your
argument for it.

regards, tom lane




i think tom is right here.
log_checkpoint and checkpoint_warning are for totally different people.
we might just want to do one thing: state explicitly that the database 
cannot break down if this warning shows up.
many people are scared to death that this warning somehow indicates that 
PostgreSQL is about to go up in flames, which is of course not true.
maybe we could do "consider increasing  to ensure good performance" 
or so ...


regards,

hans

--
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Data at rest encryption

2016-06-15 Thread PostgreSQL - Hans-Jürgen Schönig



On 06/14/2016 09:59 PM, Jim Nasby wrote:

On 6/12/16 2:13 AM, Ants Aasma wrote:

On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
 wrote:

> 1. Instead of doing the entire database files encryption, how about
> providing user an option to protect only some particular tables that
> wants the encryption at table/tablespace level. This not only 
provides

> an option to the user, it reduces the performance impact on tables
> that doesn't need any encryption. The problem with this approach
> is that every xlog record needs to validate to handle the encryption
> /decryption, instead of at page level.

Is there a real need for this? The customers I have talked to want to
encrypt the whole database and my goal is to make the feature fast
enough to make that feasible for pretty much everyone. I guess
switching encryption off per table would be feasible, but the key
setup would still need to be done at server startup. Per record
encryption would result in some additional information leakage though.
Overall I thought it would not be worth it, but I'm willing to have my
mind changed on this.


I actually design with this in mind. Tables that contain sensitive 
info go into designated schemas, partly so that you can blanket move 
all of those to an encrypted tablespace (or safer would be to move 
things not in those schemas to an unencrypted tablespace). Since that 
can be done with an encrypted filesystem maybe that's good enough. 
(It's not really clear to me what this buys us over an encrypted FS, 
other than a feature comparison checkmark...)


the reason why this is needed is actually very simple: security 
guidelines and legal requirements ...
we have dealt with a couple of companies recently, who explicitly 
demanded PostgreSQL level encryption in a transparent way to fulfill 
some internal or legal requirements. this is especially true for 
financial stuff. and yes, sure ... you can do a lot of stuff with 
filesystem encryption.
the core idea of this entire thing is however to have a counterpart on 
the database level. if you don't have the key you cannot start the 
instance and if you happen to get access to the filesystem you are still 
not able to fire up the DB.

as i said: requirements by ever bigger companies.

as far as benchmarking is concerned: i did a quick test yesterday (not 
with the final AES implementation yet) and i got pretty good results. 
with a reasonably well cached database in a typical application I expect 
to lose around 10-20%. if everything fits in memory there is 0 loss of 
course. the worst I saw with standard AES (no hardware support used 
yet) was a loss of around 45% or so, but that requires a value as low as 
32 MB of shared buffers.


    many thanks,

hans


--
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Implement failover on libpq connect level.

2015-08-18 Thread PostgreSQL - Hans-Jürgen Schönig

> On 18 Aug 2015, at 11:19, Albe Laurenz  wrote:
> 
> Hans-Jürgen Schönig wrote:
>> in addition to that you have the “problem” of transactions. if you failover 
>> in the middle
>> of a transaction, strange things might happen from the application point of 
>> view.
>> 
>> the good thing, however, is that stupid middleware is sometimes not able to 
>> handle
>> failed connections. however, overall i think it is more of a danger than a 
>> benefit.
> 
> Maybe I misunderstood the original proposal, but my impression was that the 
> alternative
> servers would be tried only at the time the connection is established, and 
> there would be no
> such problems as you describe.



it would still leave the problem of having a read-only server on the other side unless 
you are using BDR or so.

regards,

hans




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Implement failover on libpq connect level.

2015-08-18 Thread PostgreSQL - Hans-Jürgen Schönig

> On 18 Aug 2015, at 10:32, Albe Laurenz  wrote:
> 
> Victor Wagner wrote:
>> Rationale
>> =
>> 
>> Since introduction of the WAL-based replication into the PostgreSQL, it is
>> possible to create high-availability and load-balancing clusters.
>> 
>> However, there is no support for failover in the client libraries. So, only
>> way to provide transparent for client application failover is IP address
>> migration. This approach has some limitation, i.e. it requires that
>> master and backup servers reside in the same subnet or may not be
>> feasible for other reasons.
>> 
>> Commercial RDBMS, such as Oracle, employ more flexible approach. They
>> allow to specify multiple servers in the connect string, so if primary
>> server is not available, client library tries to connect to other ones.
>> 
>> This approach allows to use geographically distributed failover clusters
>> and also is a cheap way to implement load-balancing (which is not
>> possible with IP address migration).
> 
> I wonder how useful this is at the present time.
> 
> If the primary goes down and the client gets connected to the standby,
> it would have read-only access there.  Most applications wouldn't cope
> well with that.
> 
> Once we have multi-master replication that can be used for fail-over,
> the picture will change.  Then a feature like that would be very useful 
> indeed.
> 
>>"host=main-server host=standby1 host=standby2 port=5432 dbname=database"
> 
> It seems a bit arbitrary to require that all servers use the same port.
> 
> Maybe parameters like host2, port2, host3, port3 etc. might be better.
> 
> Yours,
> Laurenz Albe


i totally agree with laurenz.
in addition to that you have the “problem” of transactions. if you failover in 
the middle 
of a transaction, strange things might happen from the application point of 
view.

the good thing, however, is that stupid middleware is sometimes not able to 
handle
failed connections. however, overall i think it is more of a danger than a 
benefit.

regards,

hans



> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Priority table or Cache table

2014-05-16 Thread Hans-Jürgen Schönig

On 20 Feb 2014, at 01:38, Tom Lane  wrote:

> Haribabu Kommi  writes:
>> I want to propose a new feature called "priority table" or "cache table".
>> This is same as regular table except the pages of these tables are having
>> high priority than normal tables. These tables are very useful, where a
>> faster query processing on some particular tables is expected.
> 
> Why exactly does the existing LRU behavior of shared buffers not do
> what you need?
> 
> I am really dubious that letting DBAs manage buffers is going to be
> an improvement over automatic management.
> 
>   regards, tom lane



the reason for a feature like that is to define an area of the application 
which needs more predictable runtime behaviour.
not all tables are created equal in terms of importance. 

example: user authentication should always be supersonic fast while some 
reporting tables might gladly be forgotten even if they happened to be in use 
recently.

i am not saying that we should have this feature. 
however, there are definitely use cases which would justify some more control 
here.
otherwise people will fall back and use dirty tricks such as “SELECT count(*)” 
or so to emulate what we got here.

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small GIN optimizations (after 9.4)

2014-02-06 Thread PostgreSQL - Hans-Jürgen Schönig
i think there is one more thing which would be really good in GIN and which 
would solve a ton of issues.
atm GIN entries are sorted by item pointer.
if we could sort them by a "column" it would fix a couple of real world issues 
such as ...

SELECT ... FROM foo WHERE "tsearch_query" ORDER BY price DESC LIMIT 10 

... or so.
in many cases you want to search for, say, a product and find the cheapest / 
most expensive one.
if the tsearch_query yields a high number of rows (which it often does) the 
subsequent sort will kill you.
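
to make it concrete (table and column names made up), the pattern is basically:

SELECT p.*
FROM products p
WHERE p.fti @@ to_tsquery('english', 'laptop')
ORDER BY p.price DESC
LIMIT 10;

with entries sorted by item pointer only, the index can hand back all matching
rows quickly, but the ORDER BY price still forces a sort of the whole result set
before the LIMIT can be applied.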

many thanks,

hans


On Feb 6, 2014, at 12:36 PM, Heikki Linnakangas wrote:

> While hacking on the GIN patches, I've come up with a few different ideas for 
> improving performance. It's too late for 9.4, but I'll list them here if 
> someone wants to work on them later:
> 
> * Represent ItemPointers as uint64's, to speed up comparisons. 
> ginCompareItemPointers is inlined into only a few instructions, but it's 
> still more expensive than a single-instruction 64-bit comparison. 
> ginCompareItemPointers is called very heavily in a GIN scan, so even a small 
> improvement there would make for a noticeable speedup. It might be an 
> improvement in code clarity, too.
> 
> * Keep the entry streams of a GinScanKey in a binary heap, to quickly find 
> the minimum curItem among them.
> 
> I did this in various versions of the fast scan patch, but then I realized 
> that the straightforward way of doing it is wrong, because a single 
> GinScanEntry can be part of multiple GinScanKeys. If an entry's curItem is 
> updated as part of advancing one key, and the entry is in a heap of another 
> key, updating the curItem can violate the heap property of the other entry's 
> heap.
> 
> * Build a truth table (or cache) of consistent-function's results, and use 
> that instead of calling consistent for every item.
> 
> * Deduce AND or OR logic from the consistent function. Or have the opclass 
> provide a tree of AND/OR/NOT nodes directly, instead of a consistent 
> function. For example, if the query is "foo & bar", we could avoid calling 
> consistent function altogether, and only return items that match both.
> 
> * Delay decoding segments during a scan. Currently, we decode all segments of 
> a posting tree page into a single array at once. But with "fast scan", we 
> might be able to skip over all entries in some of the segments. So it would 
> be better to copy the segments into backend-private memory in compressed 
> format, and decode them one segment at a time (or maybe even one item at a 
> time), when needed. That would avoid the unnecessary decoding of segments 
> that can be skipped over, and would also reduce memory usage of a scan.
> 
> I'll add these to the TODO.
> 
> - Heikki
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-08 Thread Hans-Jürgen Schönig

On Jan 8, 2014, at 9:27 PM, Bruce Momjian wrote:

> On Wed, Jan  8, 2014 at 05:39:23PM +, Simon Riggs wrote:
>> On 8 January 2014 09:07, Heikki Linnakangas  wrote:
>> 
>>> I'm going to say right off the bat that I think the whole notion to
>>> automatically disable synchronous replication when the standby goes down is
>>> completely bonkers.
>> 
>> Agreed
>> 
>> We had this discussion across 3 months and we don't want it again.
>> This should not have been added as a TODO item.
> 
> I am glad Heikki and Simon agree, but I don't.  ;-)
> 
> The way that I understand it is that you might want durability, but
> might not want to sacrifice availability.  Phrased that way, it makes
> sense, and notifying the administrator seems the appropriate action.
> 

technically and conceptually i agree with andres and simon but from daily 
experience i would say that we should make it configurable.
some people have had some nasty experiences when their systems stopped working.

+1 for a GUC to control this one.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backup throttling

2013-08-21 Thread PostgreSQL - Hans-Jürgen Schönig

On Aug 21, 2013, at 10:57 AM, Andres Freund wrote:

> On 2013-08-21 08:10:42 +0200, PostgreSQL - Hans-Jürgen Schönig wrote:
>> 
>> On Aug 19, 2013, at 9:11 PM, Andres Freund wrote:
>> 
>>> On 2013-08-19 20:15:51 +0200, Boszormenyi Zoltan wrote:
>>>> 2013-08-19 19:20 keltezéssel, Andres Freund írta:
>>>>> Hi,
>>>>> 
>>>>> On 2013-07-24 09:20:52 +0200, Antonin Houska wrote:
>>>>>> Hello,
>>>>>> the purpose of this patch is to limit impact of pg_backup on running 
>>>>>> server.
>>>>>> Feedback is appreciated.
>>>>> Based on a quick look it seems like you're throttling on the receiving
>>>>> side. Is that a good idea? Especially over longer latency links, TCP
>>>>> buffering will reduce the effect on the sender side considerably.
>>> 
>>>> Throttling on the sender side requires extending the syntax of
>>>> BASE_BACKUP and maybe START_REPLICATION so both can be
>>>> throttled but throttling is still initiated by the receiver side.
>>> 
>>> Seems fine to me. Under the premise that the idea is decided to be
>>> worthwile to be integrated. Which I am not yet convinced of.
>> 
>> i think there is a lot of value for this one. the scenario we had a couple 
>> of times is pretty simple:
>> just assume a weak server - maybe just one disk or two - and a slave.
>> master and slave are connected via a 1 GB network.
>> pg_basebackup will fetch data full speed basically putting those lonely 
>> disks out of business.
>> we actually had a case where a client asked if "PostgreSQL is locked during 
>> base backup". of 
>> course it was just disk wait caused by a full speed pg_basebackup.
> 
>> regarding the client side implementation: we have chosen this way because it 
>> is less invasive. 
>> i cannot see a reason to do this on the server side because we won't have 10 
>> pg_basebackup-style tools making use of this feature anyway.
> 
> The problem is that receiver side throttling over TCP doesn't always
> work all that nicely unless you have a low rate of transfer and/or very
> low latency . Quite often you will have OS buffers/the TCP Window being
> filled in bursts where the sender sends at max capacity and then a
> period where nothing happens on the sender. That's often not what you
> want when you need to throttle.
> 
> Besides, I can see some value in e.g. normal streaming replication also
> being rate limited...
> 


what would be a reasonable scenario where limiting streaming would make sense? 
i cannot think of any to be honest.

regards,

hans




--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backup throttling

2013-08-20 Thread PostgreSQL - Hans-Jürgen Schönig

On Aug 19, 2013, at 9:11 PM, Andres Freund wrote:

> On 2013-08-19 20:15:51 +0200, Boszormenyi Zoltan wrote:
>> 2013-08-19 19:20 keltezéssel, Andres Freund írta:
>>> Hi,
>>> 
>>> On 2013-07-24 09:20:52 +0200, Antonin Houska wrote:
>>>> Hello,
>>>> the purpose of this patch is to limit impact of pg_backup on running 
>>>> server.
>>>> Feedback is appreciated.
>>> Based on a quick look it seems like you're throttling on the receiving
>>> side. Is that a good idea? Especially over longer latency links, TCP
>>> buffering will reduce the effect on the sender side considerably.
> 
>> Throttling on the sender side requires extending the syntax of
>> BASE_BACKUP and maybe START_REPLICATION so both can be
>> throttled but throttling is still initiated by the receiver side.
> 
> Seems fine to me. Under the premise that the idea is decided to be
> worthwile to be integrated. Which I am not yet convinced of.

i think there is a lot of value for this one. the scenario we had a couple of 
times is pretty simple:
just assume a weak server - maybe just one disk or two - and a slave.
master and slave are connected via a 1 Gbit network.
pg_basebackup will fetch data full speed basically putting those lonely disks 
out of business.
we actually had a case where a client asked if "PostgreSQL is locked during 
base backup". of 
course it was just disk wait caused by a full speed pg_basebackup.

regarding the client side implementation: we have chosen this way because it is 
less invasive. 
i cannot see a reason to do this on the server side because we won't have 10 
pg_basebackup-style tools making use of this feature anyway.

of course, if you have 20 disks and a 1 Gbit network this is useless - but many 
people don't have that.

regards,

hans-jürgen schönig


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] "Bloom filter" for 9.2 ...

2012-07-03 Thread Hans-Jürgen Schönig
hello,

some time ago oleg and teodor posted a PostgreSQL version of bloom filters.
as this appears to be a useful thing for many people i have ported this 
prototype to PostgreSQL 9.2.
it seems to work as expected on OS X and Linux.

as it is a contrib module it lacks xlog support.
maybe some people can make use of this one.

many thanks,

hans





bloom-0.4.tar.gz
Description: GNU Zip compressed data


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby

2012-07-01 Thread Hans-Jürgen Schönig
On Jul 1, 2012, at 5:44 PM, Magnus Hagander wrote:

> On Sun, Jul 1, 2012 at 1:02 PM, Boszormenyi Zoltan  wrote:
>> Hi,
>> 
>> attached is a patch that does $SUBJECT.
>> 
>> It's a usability enhancement, to take a backup, write
>> a minimalistic recovery.conf and start the streaming
>> standby in one go.
> 
> I like the writing of recovery.conf. In fact, I had it in my code at
> one very early point and took it out in order to get a clean patch
> ready :)
> 
> But I think that part is lacking in functionality: AFAICT it's
> hardcoded to only handle host, port, user and password. What about
> other connection parameters, likely passed to pg_basebackup through
> the environment in that case? isn't that quite likely to break the
> server later?
> 


one option would be to check the environment variables and take them if needed.
however, i am not sure if this is a good idea either - just think of PGPASSWORD 
or so. do we really want to take it and write it to a file straight away? i 
guess there are arguments for both ideas.

still, i guess your argument is a reasonable one.



> Maybe the proper way around that is to provide the ability for
> pg_basebackup to take a full connection string, just like we allow
> psql to do?
> 


this would make things redundant. i am quite sure some users might not get the 
distinction straight away.


> 
> 
> I'm not sure we should go the way of providing the "start slave".
> Given thta how you want to start the slave differs so much on
> platforms. The most glaring example is on windows you really need to
> *start the service* rather than use pg_ctl. Sure, you can document
> your way around that, but I'm not sure the functionality added is
> really worth it. What about all the other potential connection
> parameters.


regards,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combine non-recursive and recursive CTEs?

2012-06-15 Thread PostgreSQL - Hans-Jürgen Schönig
On Jun 16, 2012, at 8:27 AM, Magnus Hagander wrote:

> I'm not sure if this is something I don't know how to do, or if it's
> something we simply can't do, or if it's something we could do but the
> syntax can't handle :-)
> 
> Basically, I'd like to combine a recursive and a non-recursive CTE in
> the same query. If I do it non-recursive, I can do something like:
> 
> WITH t1(z) AS (
>   SELECT a FROM x
> ),
> t2 AS (
>   SELECT z FROM t1
> )
> SELECT * FROM t2;
> 
> 
> But what if I want t2 to be recursive?
> 
> Trying something like:
> WITH t1 (z,b) AS (
>   SELECT a,b FROM x
> ),
> RECURSIVE t2(z,b) AS (
>   SELECT z,b FROM t1 WHERE b IS NULL
> UNION ALL
>   SELECT z,b FROM t2 INNER JOIN t1 ON t2.b=t1.z
> )
> 
> I get a syntax error on the RECURSIVE.
> 
> Is there any other position in this query that I can put the RECURSIVE
> in order for it to get through?
> 
> -- 
>  Magnus Hagander
>  Me: http://www.hagander.net/
>  Work: http://www.redpill-linpro.com/
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 


hm, this is interesting ...

cat /tmp/a.sql 
WITH y AS ( SELECT 1 AS n),
g AS (WITH RECURSIVE x(n) AS
(
SELECT (SELECT n FROM y) AS n
UNION ALL
SELECT n + 1 AS n
FROM x
WHERE n < 10))
SELECT * FROM g;

Hans-Jurgen-Schonigs-MacBook-Pro:sql hs$ psql test < /tmp/a.sql 
ERROR:  syntax error at or near ")"
LINE 8:  WHERE n < 10))

this gives a syntax error as well ... 
if my early morning brain is correct this should be a proper statement ...
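
(on a second look the inner WITH above is also missing its trailing SELECT, which
may well be all the parser is complaining about. for magnus' original question, a
form that does get through the parser here - RECURSIVE decorating the whole WITH
list - would be something like this:)

WITH RECURSIVE y AS (
    SELECT 1 AS n
),
x(n) AS (
    SELECT n FROM y
  UNION ALL
    SELECT n + 1 FROM x WHERE n < 10
)
SELECT * FROM x;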


regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting rid of cheap-startup-cost paths earlier

2012-05-22 Thread PostgreSQL - Hans-Jürgen Schönig

On May 22, 2012, at 9:57 AM, Simon Riggs wrote:

> On 22 May 2012 06:50, Tom Lane  wrote:
> 
>> Currently, the planner keeps paths that appear to win on the grounds of
>> either cheapest startup cost or cheapest total cost.  It suddenly struck
>> me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor
>> fast-start preference, etc) we could know a-priori that cheapest startup
>> cost is not going to be interesting, and hence immediately discard any
>> path that doesn't win on total cost.
> 
> My experience is that most people don't provide a LIMIT explicitly
> even when they know that's the desired behaviour. That's because
> either they simply don't understand that SQL can return lots of rows,
> or SQL knowledge isn't enough, or worse that people don't even know
> that specifying it would alter query plans.
> 
> Regrettably the current planning of LIMIT clauses causes more problems
> so in many cases these have been explicitly removed from SQL by
> developers that know how many rows they wish to see.
> 
> I would have proposed a default-LIMIT parameter before now, but for
> that last point.


this sounds like a total disaster to me ... 
why in the world should we have a default LIMIT parameter? 
i guess if somebody is not able to use LIMIT he had better not touch the DB.
we clearly cannot fix incompetence by adding parameters. 

regards,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd out of memory problem.

2012-03-26 Thread Hans-Jürgen Schönig
hello,

does the problem show up on 2% of all boxes after 2 weeks or so?
we had a similar problem on UNIX as well. it even materialized on 100 identical 
boxes (on 2% of them). it pops up randomly and never stops …
i checked some code paths. some of those messages are direct output via stderr 
(not even elog).
unfortunately i did not manage to find a box where i could use GDB to attack the 
problem.
it was 8.4.8 as well.

do you see a certain workload which would make the problem reproducible?


regards,

hans



On Mar 26, 2012, at 5:03 PM, Andrew Dunstan wrote:

> 
> I'm not sure if this is a bug, but I have wrestling with this problem for a 
> client.
> 
> Platform is Windows Servers 2003 64 bit, PostgreSQL 8.4.8., 4Gb RAM, running 
> on an Amazon VM.
> 
> Shared buffers: 512Mb, work_mem: 25Mb. There are only a handful of 
> connections to the database, and no other activity.
> 
> We are seeing the error shown below. The table in question has two columns 
> (Oid, int) and roughly 43m rows. The only other thing remarkable about the 
> settings is that effective_cache_size is set to 5Gb, which is clearly too 
> high, but surely that shouldn't cause a memory error.
> 
> I'm really perplexed as to why this fairly simple query should cause an out 
> of memory error:
> 
>   select loid, max(pageno) from ldata group by loid order by 2 desc
>   limit 10;
> 
> I can't see what I might be missing.
> 
> 
> cheers
> 
> andrew
> 
>   TopMemoryContext: 49816 total in 6 blocks; 5384 free (7 chunks);
>   44432 used
>  TopTransactionContext: 8192 total in 1 blocks; 7696 free (0
>   chunks); 496 used
>  Record information cache: 8192 total in 1 blocks; 1800 free (0
>   chunks); 6392 used
>  Type information cache: 8192 total in 1 blocks; 1800 free (0
>   chunks); 6392 used
>  Operator class cache: 8192 total in 1 blocks; 3848 free (0
>   chunks); 4344 used
>  Operator lookup cache: 24576 total in 2 blocks; 14072 free (6
>   chunks); 10504 used
>  MessageContext: 40960 total in 3 blocks; 29920 free (6 chunks);
>   11040 used
>  smgr relation table: 8192 total in 1 blocks; 2816 free (0
>   chunks); 5376 used
>  TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0
>   chunks); 16 used
>  Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
>  PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
>PortalHeapMemory: 1024 total in 1 blocks; 920 free (0 chunks);
>   104 used
>  ExecutorState: 8192 total in 1 blocks; 2144 free (1 chunks);
>   6048 used
>TupleSort: 40984 total in 3 blocks; 24208 free (10 chunks);
>   16776 used
>ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
>AggContext: 864018432 total in 127 blocks; 3400 free (110
>   chunks); 864015032 used
>  TupleHashTable: 619175960 total in 95 blocks; 821528 free
>   (331 chunks); 618354432 used
>ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
>ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
>ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
>  Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks);
>   4816 used
>  CacheMemoryContext: 667696 total in 20 blocks; 169960 free (2
>   chunks); 497736 used
>pg_shdepend_reference_index: 1024 total in 1 blocks; 240 free
>   (0 chunks); 784 used
>pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0
>   chunks); 872 used
>pg_depend_reference_index: 1024 total in 1 blocks; 152 free (0
>   chunks); 872 used
>pg_largeobject_loid_pn_index: 1024 total in 1 blocks; 280 free
>   (0 chunks); 744 used
>pg_database_datname_index: 1024 total in 1 blocks; 344 free (0
>   chunks); 680 used
>pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0
>   chunks); 720 used
>pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free
>   (0 chunks); 872 used
>pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 344
>   free (0 chunks); 680 used
>pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks);
>   680 used
>pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0
>   chunks); 784 used
>pg_foreign_server_oid_index: 1024 total in 1 blocks; 344 free
>   (0 chunks); 680 used
>pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free
>   (0 chunks); 784 used
>pg_cast_source_target_index: 1024 total in 1 blocks; 240 free
>   (0 chunks); 784 used
>pg_language_name_index: 1024 total in 1 blocks; 344 free (0
>   chunks); 680 used
>pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0
>   chunks); 720 used
>pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0
>   chunks); 936 used
>pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0
>   chunks); 720 used
>pg_ts_template_tmplname_index: 

Re: [HACKERS] pg_prewarm

2012-03-10 Thread Hans-Jürgen Schönig
On Mar 9, 2012, at 2:34 PM, Robert Haas wrote:

> On Fri, Mar 9, 2012 at 5:42 AM, Hans-Jürgen Schönig
>  wrote:
>> we had some different idea here in the past: what if we had a procedure / 
>> method to allow people to save the list of current buffers / cached blocks 
>> to be written to disk (sorted). we could then reload this "cache profile" on 
>> startup in the background or people could load a certain cache content at 
>> runtime (maybe to test or whatever).
>> writing those block ids in sorted order would help us to avoid some random 
>> I/O on reload.
> 
> I don't think that's a bad idea at all, and someone actually did write
> a patch for it at one point, though it didn't get committed, partly I
> believe because of technical issues and partly because Greg Smith was
> uncertain how much good it did to restore shared_buffers without
> thinking about the OS cache.  Personally, I don't buy into the latter
> objection: a lot of people are running with data sets that fit inside
> shared_buffers, and those people would benefit tremendously.
> 
> However, this just provides mechanism, not policy, and is therefore
> more general.  You could use pg_buffercache to save the cache contents
> at shutdown and pg_prewarm to load those blocks back in at startup, if
> you were so inclined.  Or if you just want to load up your main
> relation, and its indexes, you can do that, too.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



i also think that it can be beneficial. 
once in a while people ask how to "bring a database up to speed" after a 
restart. i have seen more than one case when a DB was close to death after a 
restart because random I/O was simply killing it during cache warmup. it seems 
the problem is getting worse as we see machines with more and more RAM in the 
field.
technically i would see a rather brute force approach: if we just spill out 
the list of blocks we have in shared buffers atm (not the content of course, just 
the physical location sorted by file / position in file) it would be good enough. 
if a block physically no longer exists on reload it would not even be an 
issue, and it would basically allow people to "snapshot" their cache status. we could 
allow named cache profiles or so and add a GUC to indicate if one of them 
should be preloaded on startup (in the background or beforehand - i see use cases for 
both approaches).

yes, somehow linking to pg_buffercache makes a lot of sense. maybe just 
extending it with some extra functions is already enough for most cases.
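
just to illustrate the manual path robert describes - a rough sketch, assuming
both pg_buffercache and the proposed pg_prewarm are installed; prewarming block
by block like this is of course not efficient, it is only meant to show the idea:

-- dump the buffers currently cached for this database, sorted so the
-- reload avoids random I/O (main fork only, to keep the sketch simple)
CREATE TABLE cache_profile AS
    SELECT c.oid::regclass AS rel, b.relblocknumber AS blk
    FROM pg_buffercache b
    JOIN pg_class c ON pg_relation_filenode(c.oid) = b.relfilenode
    WHERE b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database())
      AND b.relforknumber = 0
    ORDER BY rel, blk;

-- after the restart: pull those blocks back into shared buffers
SELECT pg_prewarm(rel, 'buffer', 'main', blk, blk)
FROM cache_profile
ORDER BY rel, blk;

a named-profile version of this plus a GUC to trigger it at startup (in the
background or beforehand) is pretty much what i have in mind.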

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_prewarm

2012-03-09 Thread Hans-Jürgen Schönig
we had some different idea here in the past: what if we had a procedure / 
method to allow people to save the list of current buffers / cached blocks to 
be written to disk (sorted). we could then reload this "cache profile" on 
startup in the background or people could load a certain cache content at 
runtime (maybe to test or whatever).
writing those block ids in sorted order would help us to avoid some random I/O 
on reload.

regards,

hans



On Mar 9, 2012, at 5:13 AM, Robert Haas wrote:

> It's been bugging me for a while now that we don't have a prewarming
> utility, for a couple of reasons, including:
> 
> 1. Our customers look at me funny when I suggest that they use
> pg_relation_filepath() and /bin/dd for this purpose.
> 
> 2. Sometimes when I'm benchmarking stuff, I want to get all the data
> cached in shared_buffers.  This is surprisingly hard to do if the size
> of any relation involved is >=1/4 of shared buffers, because the
> BAS_BULKREAD stuff kicks in.  You can do it by repeatedly seq-scanning
> the relation - eventually all the blocks trickle in - but it takes a
> long time, and that's annoying.
> 
> So I wrote a prewarming utility.  Patch is attached.  You can prewarm
> either the OS cache or PostgreSQL's cache, and there are two options
> for prewarming the OS cache to meet different needs.  By passing the
> correct arguments to the function, you can prewarm an entire relation
> or just the blocks you choose; prewarming of blocks from alternate
> relation forks is also supported, for completeness.
> 
> Hope you like it.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-11 Thread PostgreSQL - Hans-Jürgen Schönig
On Oct 7, 2011, at 8:47 PM, Joshua D. Drake wrote:

> 
> On 10/07/2011 11:40 AM, Tom Lane wrote:
>> Robert Haas  writes:
>>> Please find attached a patch implementing a basic version of
>>> index-only scans.
>> 
>> I'm making some progress with this, but I notice what seems like a
>> missing feature: there needs to be a way to turn it off.  Otherwise
>> performance comparisons will be difficult to impossible.
>> 
>> The most obvious solution is a planner control GUC, perhaps
>> "enable_indexonlyscan".  Anyone object, or want to bikeshed the name?
> 
> enable_onlyindexscan
> 
> I'm kidding.
> 
> +1 on Tom's proposed name.


+1 ...
definitely an important thing to do.

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What is known about PostgreSQL HP-UX support?

2011-10-10 Thread PostgreSQL - Hans-Jürgen Schönig
On Oct 10, 2011, at 4:21 AM, Alex Goncharov wrote:

> [ Thanks all for the very productive discussion in the thread
>  "libpq, PQdescribePrepared -> PQftype, PQfmod, no PQnullable"
>  which I originated.  Very useful.  Now on something different. ]
> 
> About two years ago, I had to research some PostgreSQL failures on
> HP-UX on a lame PA-RISC box.  Looking at the PostgreSQL source code
> then, I got an impression that running PostgreSQL on HP-UX was an open
> question -- HP-UX didn't seem like a seriously targeted platform.
> 
> Was I wrong in my assessment?  Does anybody have a good experience
> running PostgreSQL on HP-UX?  What version of both? PA-RISC? IA64?
> 
> Thanks,




hello,

HPUX is not too common these days but it works like a charm.
we have a couple of large IA64 servers running on HPUX at a major customer.
things work without any problems. it compiled out of the box just as expected 
and we have not seen any failures or so for almost 2 years now.

so, thumbs up ... nothing to be afraid of.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CUDA Sorting

2011-09-19 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 19, 2011, at 5:16 PM, Tom Lane wrote:

> Greg Stark  writes:
>> That said, to help in the case I described you would have to implement
>> the tapesort algorithm on the GPU as well.
> 
> I think the real problem would be that we are seldom sorting just the
> key values.  If you have to push the tuples through the GPU too, your
> savings are going to go up in smoke pretty quickly …
> 


i would argue along a similar line.
to make GPU code fast it has to be pretty much tailored to do exactly one thing 
- otherwise you have no chance to get anywhere close to card bandwidth.
if you look at "two similar" GPU codes which seem to do the same thing you 
might easily see that one is 10 times faster than the other - for bloody reasons 
such as memory alignment, memory transaction size or whatever.
this opens a bit of a problem: PostgreSQL sorting is so generic and so flexible 
that i would be really surprised if somebody could come up with a solution 
which really comes close to what the GPU can do.
it would definitely be interesting to see a prototype, however.

btw, there is a handful of interesting talks / lectures about GPU programming 
provided by the university of chicago (just cannot find the link atm).

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-02 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 2, 2011, at 2:59 PM, Hannu Krosing wrote:

> On Fri, 2011-09-02 at 14:51 +0200, PostgreSQL - Hans-Jürgen Schönig
> wrote:
>> hello …
>> 
>> the goal of the entire proxy thing is to make the right query go to the 
>> right node / nodes.
>> we determine this by using a partitioning function and so on …
>> currently PL/proxy has only a handful of commands - one is RUN ON … which 
>> tells us where to put things.
>> assume you issue a select … some select will "fall out" on the target node. 
>> to restrict the data coming from the node you could add an additional 
>> constraint on the way …
>> 
>> say:
>>  SELECT * FROM proxy_table WHERE a = 20;
>> 
>> what you want to reach the node after a split is …
>> 
>>  SELECT * FROM proxy_table WHERE a = 20 AND col = "filter the wrong half 
>> away"
>> 
>> my idea is to add an additional command to the PL/proxy command set.
>> it should call a function generating this additional filter.
>> maybe somehow like that …
>> 
>>  RUN ON hashtext($1) 
>> -- this one already knows about the increased cluster
>>  GENERATE FILTER my_create_the_bloody_filter_func($1)-- this one 
>> would "massage" the query going to the node.
>> 
>> it would actually open the door for a lot of additional trickery.
>> the function would tell the proxy what to append - and: this "what" would be 
>> under your full control.
>> 
>> what do you think?
> 
> Hmm, could work for simplest cases, but this has 2 main problems:
> 
> 1) you need a full SQL parser to make this generally useful for plain
> SQL


i think that everything beyond a simple case is pretty hard to achieve anyway. 
to me it looks pretty impossible to solve this in a generic way without some 
insane amount of labor input - at least given the ideas that have come to me in the 
past.
and yes, functions are an issue. unless you have some sort of "virtually 
private database" thing it is close to impossible (unless you want to try some 
nightmare based on views / constraint exclusion on the partitions or so).

regards,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-02 Thread PostgreSQL - Hans-Jürgen Schönig
hello …

the goal of the entire proxy thing is to make the right query go to the right 
node / nodes.
we determine this by using a partitioning function and so on …
currently PL/proxy has only a handful of commands - one is RUN ON … which tells 
us where to put things.
assume you issue a select … some select will "fall out" on the target node. 
to restrict the data coming from the node you could add an additional 
constraint on the way …

say:
SELECT * FROM proxy_table WHERE a = 20;

what you want to reach the node after a split is …

	SELECT * FROM proxy_table WHERE a = 20 AND col = "filter the wrong half away"

my idea is to add an additional command to the PL/proxy command set.
it should call a function generating this additional filter.
maybe somehow like that …

	RUN ON hashtext($1)                                   -- this one already knows about the increased cluster
	GENERATE FILTER my_create_the_bloody_filter_func($1)  -- this one would "massage" the query going to the node.

it would actually open the door for a lot of additional trickery.
the function would tell the proxy what to append - and: this "what" would be 
under your full control.

what do you think?
i have got to think about it further but i can envision that this could be feasible 
...
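
to make it a bit more tangible, a sketch of what such a filter-generating
function could look like (the column name and the GUC names are made up for
illustration - the only point is that the function hands back a text fragment
which the proxy appends to the WHERE clause of the forwarded query):

CREATE OR REPLACE FUNCTION my_create_the_bloody_filter_func(key integer)
RETURNS text AS $$
    -- key is passed in because the proxy hands over $1; this simple
    -- version does not actually need it
    SELECT format('hashtext(a::text) & %s = %s',
                  current_setting('proxy.cluster_mask'),     -- e.g. 1 after splitting into 2
                  current_setting('proxy.this_partition'))   -- 0 or 1
$$ LANGUAGE sql STABLE;

so on the node keeping the "even" half, SELECT * FROM proxy_table WHERE a = 20
would go out as SELECT * FROM proxy_table WHERE a = 20 AND hashtext(a::text) & 1 = 0.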

hans


On Sep 2, 2011, at 2:36 PM, Hannu Krosing wrote:

> On Fri, 2011-09-02 at 14:01 +0200, PostgreSQL - Hans-Jürgen Schönig
> wrote:
>> hello …
>> 
>> i have been thinking about this issue for quite a while ...
>> given your idea i am not sure how this can work at all.
>> 
>> consider:
>>  begin;
>>  insert 1
>>  insert 2
>>  commit
>> 
>> assume this ends up in the same node,
>> now you split it into two …
>> 1 and 2 will have exactly the same visibility to and transaction.
>> i am not sure how you can get this right without looking at the data.
> 
> It has to consider the data when determining visibility, that's the
> whole point of the plug-in .
> 
> The idea is, that each row "belongs" to a certain partition, as
> determined by some function over it's fields. Most often this function
> is hash of primary key OR-ed by a bitmap representing cluster size and
> AND-ed by bitmap for partition(s) stored in this database. 
> 
> when you split the parition, then some row's don't belong in the old
> partition database anymore (and if you did a full copy, then the other
> half dont belong to the new one), so they should be handled as
> invisible / deleted. As this can be only done by looking at the tuple
> data, this needs an additional visibility function. And as this is only
> needed for partitioned databases, it makes sense to implement it as a
> plogin, so it would not wast cycles on non-partitioned databases
> 
>> alternative idea: what if the proxy would add / generate a filter by 
>> looking at the data?
>> a quick idea would be that once you split you add a simple directive 
>> such as "FILTER GENERATOR $1" or so to the PL/proxy code.
>> it would then behind the scene arrange the filter passed on.
>> what do you think?
> 
> Hmm. I'm not sure I understand what you are trying to say. Can you
> elaborate a little ?
> 
>> 
>>  regards,
>> 
>>  hans
>> 
>> 
>> 
>> On Sep 1, 2011, at 10:13 AM, Hannu Krosing wrote:
>> 
>>> Hallow hackers
>>> 
>>> I have the following problem to solve and would like to get advice on
>>> the best way to do it.
>>> 
>>> The problem:
>>> 
>>> When growing a pl/proxy based database cluster, one of the main
>>> operations is splitting a partition. The standard flow is as follows:
>>> 
>>> 1) make a copy of the partitions table(s) to another database
>>> 2) reconfigure pl/proxy to use 2 partitions instead of one
>>> 
>>> The easy part is making a copy of all or half of the table to another
>>> database. The hard part is fast deletion (i mean milliseconds,
>>> comparable to TRUNCATE) the data that should not be in a partition (so
>>> that RUN ON ALL functions will continue to return right results).
>>> 
>>> It would be relatively easy, if we still had the RULES for select
>>> available for plain tables, but even then the eventual cleanup would
>>> usually mean at least 3 passes of disk writes (set xmax, write deleted
>>> flag, vacuum and remove)
>>> 
>>> What I would like to have is possibility for additional visibility
>>> checks, which would run some simple C function

Re: [HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-02 Thread PostgreSQL - Hans-Jürgen Schönig
hello …

i have been thinking about this issue for quite a while ...
given your idea i am not sure how this can work at all.

consider:
begin;
insert 1
insert 2
commit

assume this ends up in the same node,
now you split it into two …
1 and 2 will have exactly the same visibility to any transaction.
i am not sure how you can get this right without looking at the data.

alternative idea: what if the proxy would add / generate a filter by looking at 
the data?
a quick idea would be that once you split you add a simple directive such as 
"FILTER GENERATOR $1" or so to the PL/proxy code.
it would then, behind the scenes, arrange for the filter to be passed on.
what do you think?

regards,

hans



On Sep 1, 2011, at 10:13 AM, Hannu Krosing wrote:

> Hallow hackers
> 
> I have the following problem to solve and would like to get advice on
> the best way to do it.
> 
> The problem:
> 
> When growing a pl/proxy based database cluster, one of the main
> operations is splitting a partition. The standard flow is as follows:
> 
> 1) make a copy of the partitions table(s) to another database
> 2) reconfigure pl/proxy to use 2 partitions instead of one
> 
> The easy part is making a copy of all or half of the table to another
> database. The hard part is fast deletion (i mean milliseconds,
> comparable to TRUNCATE) the data that should not be in a partition (so
> that RUN ON ALL functions will continue to return right results).
> 
> It would be relatively easy, if we still had the RULES for select
> available for plain tables, but even then the eventual cleanup would
> usually mean at least 3 passes of disk writes (set xmax, write deleted
> flag, vacuum and remove)
> 
> What I would like to have is possibility for additional visibility
> checks, which would run some simple C function over tuple data (usually
> hash(fieldval) + and + or ) and return visibility (is in this partition)
> as a result. It would be best if this is run at so low level that also
> vacuum would use it and can clean up the foreign partition data in one
> pass, without doing the delete dance first.
> 
> So finally the QUESTION :
> 
> where in code would be the best place to check for this so that
> 
> 1) both regular queries and VACUUM see it
> 2) the tuple data (and not only system fields or just xmin/xmax) would
> be available for the function to use
> 
> 
> -- 
> ---
> Hannu Krosing
> PostgreSQL Unlimited Scalability and Performance Consultant
> 2ndQuadrant Nordic
> PG Admin Book: http://www.2ndQuadrant.com/books/
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Caching Python modules

2011-08-17 Thread PostgreSQL - Hans-Jürgen Schönig
On Aug 17, 2011, at 2:19 PM, Jan Urbański wrote:

> On 17/08/11 14:09, PostgreSQL - Hans-Jürgen Schönig wrote:
>> CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text) 
>> RETURNS float4 AS $$
>> 
>>from SecondCorpus import SecondCorpus
>>from SecondDocument import SecondDocument
>> 
>> i am doing some intense text mining here.
>> the problem is: is it possible to cache those imported modules from function 
>> to function call.
>> GD works nicely for variables but can this actually be done with imported 
>> modules as well?
>> the import takes around 95% of the total time so it is definitely something 
>> which should go away somehow.
>> i have checked the docs but i am not more clever now.
> 
> After a module is imported in a backend, it stays in the interpreter's
> sys.modules dictionary and importing it again will not cause the module
> Python code to be executed.
> 
> As long as you are using the same backend you should be able to call
> add_to_corpus repeatedly and the import statements should take a long
> time only the first time you call them.
> 
> This simple test demonstrates it:
> 
> $ cat /tmp/slow.py
> import time
> time.sleep(5)
> 
> $ PYTHONPATH=/tmp/ bin/postgres -p 5433 -D data/
> LOG:  database system was shut down at 2011-08-17 14:16:18 CEST
> LOG:  database system is ready to accept connections
> 
> $ bin/psql -p 5433 postgres
> Timing is on.
> psql (9.2devel)
> Type "help" for help.
> 
> postgres=# select slow();
> slow
> --
> 
> (1 row)
> 
> Time: 5032.835 ms
> postgres=# select slow();
> slow
> --
> 
> (1 row)
> 
> Time: 1.051 ms
> 
> Cheers,
> Jan




hello jan …

the code is actually like this …
the first function is called once per backend. it compiles some fairly fat in 
memory stuff …
this takes around 2 secs or so … but this is fine and not an issue.

-- setup the environment
CREATE OR REPLACE FUNCTION textprocess.setup_sentiment(pypath text, lang text) 
RETURNS void AS $$
import sys
sys.path.append(pypath)
sys.path.append(pypath + "/external")

from SecondCorpus import SecondCorpus
import const

GD['path_to_classes'] = pypath
GD['corpus'] = SecondCorpus(lang)
GD['lang'] = lang

return;
$$ LANGUAGE 'plpythonu' STABLE;

this is called more frequently ...

-- add a document to the corpus
CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text) RETURNS 
float4 AS $$

from SecondCorpus import SecondCorpus
from SecondDocument import SecondDocument

doc1 = SecondDocument(GD['corpus'].senti_provider, lang, t)
doc1.create_sentences()
GD['corpus'].add_document(doc1)
GD['corpus'].process()
return doc1.total_score
$$ LANGUAGE 'plpythonu' STABLE;

the point here actually is: if i use the classes in a normal python command 
line program this routine does not look like an issue.
creating the document object and doing the magic in there is not a problem 
actually …

on the SQL side this is already fairly heavy for some reason ...

 funcid | schemaname  |    funcname     | calls | total_time | self_time | ?column?
--------+-------------+-----------------+-------+------------+-----------+----------
 235287 | textprocess | setup_sentiment |    54 |     100166 |    100166 |     1854
 235288 | textprocess | add_to_corpus   |   996 |     438909 |    438909 |      440

looks like some afternoon with some more low level tools :(.
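
one more thing i want to try, just to take the import machinery out of the
picture completely: stash the class object itself in GD from setup_sentiment
(GD['SecondDocument'] = SecondDocument) and fetch it in the hot function. a
sketch of that variant:

-- assumes setup_sentiment has additionally stored
--     GD['SecondDocument'] = SecondDocument
-- so the hot function needs no import statement at all
CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text) RETURNS 
float4 AS $$

SecondDocument = GD['SecondDocument']

doc1 = SecondDocument(GD['corpus'].senti_provider, lang, t)
doc1.create_sentences()
GD['corpus'].add_document(doc1)
GD['corpus'].process()
return doc1.total_score
$$ LANGUAGE 'plpythonu' STABLE;

if the numbers stay the same after that, the time is going into the processing
itself rather than into the imports - which would match what jan says about
sys.modules.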

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Caching Python modules

2011-08-17 Thread PostgreSQL - Hans-Jürgen Schönig
hello …

i have just fallen over a nasty problem (maybe missing feature) with PL/Pythonu 
…
consider:

-- add a document to the corpus
CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text) RETURNS 
float4 AS $$

from SecondCorpus import SecondCorpus
from SecondDocument import SecondDocument

i am doing some intense text mining here.
the problem is: is it possible to cache those imported modules from one function 
call to the next.
GD works nicely for variables but can this actually be done with imported 
modules as well?
the import takes around 95% of the total time so it is definitely something 
which should go away somehow.
i have checked the docs but i am not more clever now.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] synchronized snapshots

2011-08-15 Thread PostgreSQL - Hans-Jürgen Schönig

On Aug 15, 2011, at 9:40 AM, Simon Riggs wrote:

> On Mon, Aug 15, 2011 at 2:31 AM, Joachim Wieland  wrote:
> 
>> In short, this is how it works:
>> 
>> SELECT pg_export_snapshot();
>>  pg_export_snapshot
>> 
>>  03A1-1
>> (1 row)
>> 
>> 
>> (and then in a different session)
>> 
>> BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ (SNAPSHOT = '03A1-1');
> 
> I don't see the need to change the BEGIN command, which is SQL
> Standard. We don't normally do that.
> 
> If we have pg_export_snapshot() why not pg_import_snapshot() as well?
> 
> -- 
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



i would definitely argue for a syntax like the one proposed by Joachim ... it 
could stay the same if this is turned into some sort of flashback 
implementation some day.
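
just to make the two flavors being discussed concrete - a sketch only, neither 
syntax is committed yet, and pg_import_snapshot() is purely Simon's 
hypothetical counterpart:

-- session 1: publish the snapshot
SELECT pg_export_snapshot();
--  pg_export_snapshot
-- --------------------
--  03A1-1

-- session 2, Joachim's proposal: attach the snapshot at BEGIN time
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ (SNAPSHOT = '03A1-1');

-- session 2, Simon's alternative: keep BEGIN untouched and import explicitly
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_import_snapshot('03A1-1');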

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-08-12 Thread PostgreSQL - Hans-Jürgen Schönig
On Aug 12, 2011, at 10:03 PM, Heikki Linnakangas wrote:

> On 11.08.2011 23:06, Robert Haas wrote:
>> Comments, testing, review appreciated...
> 
> I would've expected this to use an index-only scan:
> 
> postgres=# CREATE TABLE foo AS SELECT generate_series(1,10) AS id;
> SELECT 10
> postgres=# CREATE INDEX i_foo ON foo (id) WHERE id = 10;
> CREATE INDEX
> postgres=# VACUUM ANALYZE foo;
> VACUUM
> postgres=# EXPLAIN SELECT id FROM foo WHERE id = 10;
>   QUERY PLAN
> -
> Index Scan using i_foo on foo  (cost=0.00..8.27 rows=1 width=4)
>   Index Cond: (id = 10)
> (2 rows)
> 
> If it's not a predicate index, then it works:
> 
> postgres=# DROP INDEX i_foo;
> DROP INDEX
> postgres=# EXPLAIN SELECT id FROM foo WHERE id = 10;
>  QUERY PLAN
> ---
> Index Only Scan using i_foo2 on foo  (cost=0.00..8.28 rows=1 width=4)
>   Index Cond: (id = 10)
> (2 rows)


is there any plan to revise the cost for index only scans compared to what it 
is now?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Will switchover still need a checkpoint in 9.1 SR Hot Standby

2011-08-07 Thread PostgreSQL - Hans-Jürgen Schönig

On Aug 7, 2011, at 11:01 AM, Simon Riggs wrote:

> On Sun, Aug 7, 2011 at 8:57 AM, Hannu Krosing  wrote:
>> In 9.0 (as in earlier versions) a former standby host has to do a full
>> checkpoint before becoming available as an independent database instance
>> in either switchover or failover scenarios.
>> 
>> For most combinations of of bigger than minimal shared buffers and
>> non-memory-speed disks this can take from several seconds to tens of
>> minutes on busy systems.
> 
> For switchover, you issue a checkpoint first, to reduce this time as
> much as possible.
> 
>> Is the pre-activation checkpoint still required in 9.1 ?
> 
> Yes, but I've found a way to remove them in 9.2 and will be patching that 
> soon.



hi simon,

this is highly interesting. this is an important issue for big iron.
can you share the idea you have in mind?

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SYNONYMS (again)

2011-06-23 Thread PostgreSQL - Hans-Jürgen Schönig

On Jun 23, 2011, at 12:52 AM, Alvaro Herrera wrote:

> Excerpts from Joshua D. Drake's message of mié jun 22 15:37:17 -0400 2011:
>> Per:
>> 
>> http://archives.postgresql.org/pgsql-hackers/2010-11/msg02043.php
>> 
>> It seems we did come up with a use case in the procpid discussion. The 
>> ability to change the names of columns/databases etc, to handle the 
>> fixing of bad decision decisions during development over time.
>> 
>> Thoughts?
> 
> Let's start with what was discussed and supported in that thread, that
> is, databases.  It seems less clear that columns are widely believed to
> be a good idea to have synonyms for.  Besides, synonyms for databases
> should be reasonably simple to implement, which is not something I would
> say for columns.



sorry, i missed the links:

http://archives.postgresql.org/pgsql-patches/2006-03/msg00085.php

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SYNONYMS (again)

2011-06-23 Thread PostgreSQL - Hans-Jürgen Schönig

On Jun 23, 2011, at 12:52 AM, Alvaro Herrera wrote:

> Excerpts from Joshua D. Drake's message of mié jun 22 15:37:17 -0400 2011:
>> Per:
>> 
>> http://archives.postgresql.org/pgsql-hackers/2010-11/msg02043.php
>> 
>> It seems we did come up with a use case in the procpid discussion. The 
>> ability to change the names of columns/databases etc, to handle the 
>> fixing of bad decision decisions during development over time.
>> 
>> Thoughts?
> 
> Let's start with what was discussed and supported in that thread, that
> is, databases.  It seems less clear that columns are widely believed to
> be a good idea to have synonyms for.  Besides, synonyms for databases
> should be reasonably simple to implement, which is not something I would
> say for columns.



yes, implementing synonyms is not too hard.
some time ago (3 or 4 years ago most likely) we already posted a patch 
providing support for synonyms.
it was rejected because synonyms were said to be a bad design pattern which 
encourages app developers to do nasty things.
so, if you want to work on it maybe this patch is the place to start.
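
from memory the interface was roughly Oracle-style - just to illustrate what we 
are talking about (sketch only, object names made up, and the details of the 
old patch may have differed):

-- none of this syntax exists in PostgreSQL today
CREATE SYNONYM customers FOR accounting.t_customer;

SELECT * FROM customers;    -- resolved to accounting.t_customer

DROP SYNONYM customers;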

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bad posix_fadvise support causes initdb to exit ungracefully

2011-06-15 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

2.4? we know that some versions of 2.4 cause problems due to broken 
posix_fadvise. if i remember correctly we built some configure magic into 
PostgreSQL to check for this bug. what does this check do?

many thanks,

hans



On Jun 15, 2011, at 6:12 PM, Merlin Moncure wrote:

> Due to unfortunate environmental conditions (don't ask) I've been
> trying to get postgres 9.0 up and running on a fairly ancient linux --
> redhat EL 3 which as kernel 2.4.21.   initdb borks on the create
> database step with the error message "child process exited with error
> code 139".  A bit of tracing revealed the exit was happening at the
> pg_flush_data which basically wraps posix_fadvise.   Disabling fadvise
> support in pg_config_manual.h fixed the problem.
> 
> Things brings up a couple of questions:
> *) Are linuxes this old out of support?
> *) Should configure be testing for working posix_fadvise?
> 
> merlin
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] workaround for expensive KNN?

2011-04-11 Thread PostgreSQL - Hans-Jürgen Schönig
price has a problem :(.
"iphone" can be a 20 cent bag or a sticker or a 900 euro thing signed by 
whoever ...
so, words and the sort number / price are not related in any way. price is, in 
this case, no way to narrow down the problem (e.g. by evaluating it first or so).

many thanks,

hans


On Apr 8, 2011, at 5:25 PM, Oleg Bartunov wrote:

> Hans,
> 
> what if you create index (price,title) ?
> 
> 
>> On Fri, 8 Apr 2011, PostgreSQL - Hans-Jürgen Schönig wrote:
> 
>> hello ...
>> 
>> i got that one ...
>> 
>>   "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, 
>> title), int_price)
>> 
>> so, i have a combined index on text + number.
>> to me the plan seems fine ... it looks like a proper KNN traversal.
>> the difference between my plan and your plan seems to be the fact that i 
>> have, say, 1 mio rows which have "handy" or so in it (1 mio out of 11 mio or 
>> so). you are moving out from one specific place.
>> 
>> my maths is like that:
>>  11 mio in total
>>  1 mio matching "iphone"
>>  cheapest / most expensive 10 out of this mio needed.
>> 
>> operator classes are all nice and in place:
>> 
>> SELECT 10 <-> 4 as distance;
>> distance
>> --
>>   6
>> (1 row)
>> 
>> what does "buffers true" in your case say?
>> 
>> many thanks,
>> 
>>  hans
>> 
>> 
>> On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:
>> 
>>> Probably, you miss two-column index. From my early post:
>>> http://www.sai.msu.su/~megera/wiki/knngist
>>> 
>>> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, 
>>> to_tsvector('french',address));
>>> =# SELECT id, address,  (coordinates <-> 
>>> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE 
>>> coordinates >< '(2.29470491409302,48.858263472125)'::point AND 
>>> to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
>>>  id|   address   |  
>>>dist 
>>> -+-+-
>>> 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 
>>> 2.32488941293945e-05
>>> 4356328 | r Champ de Mars 75007 PARIS |  
>>> 0.00421854756964406
>>> 5200167 | Champ De Mars 75007 Paris   |  
>>> 0.00453564562587288
>>> 9301676 | Champ de Mars, 75007 Paris, |  
>>> 0.00453564562587288
>>> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France |  
>>> 0.00624152097590896
>>> 1923818 | Champ de Mars Paris, France |  
>>> 0.00838214733539654
>>> 5165953 | 39 Rue Champ De Mars Paris, France  |  
>>> 0.00874410234569529
>>> 7395870 | 39 Rue Champ De Mars Paris, France  |  
>>> 0.00874410234569529
>>> 4358671 | 32 Rue Champ De Mars Paris, France  |  
>>> 0.00876089659276339
>>> 1923742 | 12 rue du Champ de Mars Paris, France   |  
>>> 0.00876764731845995
>>> (10 rows)
>>> 
>>> Time: 7.859 ms
>>> 
>>> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< 
>>> '(2.29470491409302,48.858263472125)'::point
>>> AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
>>> 
>>>   QUERY PLAN
>>> --
>>> Limit
>>>  ->  Index Scan using spots_idx on spots
>>>Index Cond: ((coordinates >< 
>>> '(2.29470491409302,48.858263472125)'::point) AND 
>>> (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery))
>>> (3 rows)
>>> 
>>> 
>>> On Fri, 8 Apr 2011, PostgreSQL - Hans-Jürgen Schönig wrote:
>>> 
 hello all ...
 
 given oleg's posting before i also wanted to fire up some KNN related 
 question.
 let us consider a simple example. i got some million lines and i want all 
 rows matching a tsquery sorted by price.
 i did some tests:
 
 test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
 product.t_product WHERE to_tsvector('german', title) @@ 
 to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
  
 QUERY PLAN
 
 -
 --
 Limit  (cost=0.00..41.11 rows=10 width=16) (actual 
 time=36391.717..45542.590 rows=10 loops=1)
 Buffers: shared hit=9 read=5004
 ->  Index Scan using idx_product_t_product_titleprice on t_product  
 (cost=0.00..13251.91 rows=3224 width=16) (actual time=
 36391.715..45542.573 rows=10 loops=1)
   Index Cond: (to_tsvector('german'::regconfig, title) @@ 
 '''iphon'''::

Re: [HACKERS] workaround for expensive KNN?

2011-04-08 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

i got that one ...

"idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, 
title), int_price)

so, i have a combined index on text + number.
to me the plan seems fine ... it looks like a proper KNN traversal.
the difference between my plan and your plan seems to be the fact that i have, 
say, 1 mio rows which have "handy" or so in it (1 mio out of 11 mio or so). you 
are moving out from one specific place.

my maths is like that:
11 mio in total
1 mio matching "iphone"
cheapest / most expensive 10 out of this mio needed.

operator classes are all nice and in place:

SELECT 10 <-> 4 as distance;
 distance 
--
6
(1 row)

what does "buffers true" in your case say?

many thanks,

hans


On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:

> Probably, you miss two-column index. From my early post:
> http://www.sai.msu.su/~megera/wiki/knngist
> 
> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, 
> to_tsvector('french',address));
> =# SELECT id, address,  (coordinates <-> 
> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE 
> coordinates >< '(2.29470491409302,48.858263472125)'::point AND 
> to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
>   id|   address   |   
>   dist 
> -+-+-
>  366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 
> 2.32488941293945e-05
> 4356328 | r Champ de Mars 75007 PARIS |  
> 0.00421854756964406
> 5200167 | Champ De Mars 75007 Paris   |  
> 0.00453564562587288
> 9301676 | Champ de Mars, 75007 Paris, |  
> 0.00453564562587288
> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France |  
> 0.00624152097590896
> 1923818 | Champ de Mars Paris, France |  
> 0.00838214733539654
> 5165953 | 39 Rue Champ De Mars Paris, France  |  
> 0.00874410234569529
> 7395870 | 39 Rue Champ De Mars Paris, France  |  
> 0.00874410234569529
> 4358671 | 32 Rue Champ De Mars Paris, France  |  
> 0.00876089659276339
> 1923742 | 12 rue du Champ de Mars Paris, France   |  
> 0.00876764731845995
> (10 rows)
> 
> Time: 7.859 ms
> 
> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< 
> '(2.29470491409302,48.858263472125)'::point
> AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
> 
>QUERY PLAN
> --
> Limit
>   ->  Index Scan using spots_idx on spots
> Index Cond: ((coordinates >< 
> '(2.29470491409302,48.858263472125)'::point) AND 
> (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery))
> (3 rows)
> 
> 
> On Fri, 8 Apr 2011, PostgreSQL - Hans-Jürgen Schönig wrote:
> 
>> hello all ...
>> 
>> given oleg's posting before i also wanted to fire up some KNN related 
>> question.
>> let us consider a simple example. i got some million lines and i want all 
>> rows matching a tsquery sorted by price.
>> i did some tests:
>> 
>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
>> product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
>> 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
>>   
>> QUERY PLAN
>> 
>> -
>> --
>> Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 
>> rows=10 loops=1)
>>  Buffers: shared hit=9 read=5004
>>  ->  Index Scan using idx_product_t_product_titleprice on t_product  
>> (cost=0.00..13251.91 rows=3224 width=16) (actual time=
>> 36391.715..45542.573 rows=10 loops=1)
>>Index Cond: (to_tsvector('german'::regconfig, title) @@ 
>> '''iphon'''::tsquery)
>>Order By: (int_price <-> 0::bigint)
>>Buffers: shared hit=9 read=5004
>> Total runtime: 45542.676 ms
>> (7 rows)
>> 
>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
>> product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
>> 'handy') ORDER BY int_price <-> 0 LIMIT 10;
>>   
>> QUERY PLAN
>> 
>> -
>> -
>> Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 
>> rows=10 loops=1)

[HACKERS] workaround for expensive KNN?

2011-04-08 Thread PostgreSQL - Hans-Jürgen Schönig
hello all ...

given oleg's posting before i also wanted to fire up some KNN related question.
let us consider a simple example. i got some million lines and i want all rows 
matching a tsquery sorted by price.
i did some tests:

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'iphone') ORDER BY int_price <-> 0 LIMIT 10;

QUERY PLAN   
  
-
--
 Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 
rows=10 loops=1)
   Buffers: shared hit=9 read=5004
   ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..13251.91 rows=3224 width=16) (actual time=
36391.715..45542.573 rows=10 loops=1)
 Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''iphon'''::tsquery)
 Order By: (int_price <-> 0::bigint)
 Buffers: shared hit=9 read=5004
 Total runtime: 45542.676 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'handy') ORDER BY int_price <-> 0 LIMIT 10;

QUERY PLAN   
 
-
-
 Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 
rows=10 loops=1)
   Buffers: shared hit=3 read=2316
   ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..29762.61 rows=7255 width=16) (actual time=
7243.524..10935.217 rows=10 loops=1)
 Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''handy'''::tsquery)
 Order By: (int_price <-> 0::bigint)
 Buffers: shared hit=3 read=2316
 Total runtime: 10935.265 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'handy') ORDER BY int_price <-> 0 LIMIT 1;
 QUERY 
PLAN  
   
-
---
 Limit  (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 
loops=1)
   Buffers: shared hit=1 read=1577
   ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..29762.61 rows=7255 width=16) (actual time=
28.525..28.525 rows=1 loops=1)
 Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''handy'''::tsquery)
 Order By: (int_price <-> 0::bigint)
 Buffers: shared hit=1 read=1577
 Total runtime: 28.558 ms
(7 rows)


under any circumstances - there is no way to reduce the number of buffers 
needed for a query like that.
if everything is cached this is still ok but as soon as you have to take a 
single block from disk you will die a painful random I/O death.
is there any alternative which does not simply die when i try to achieve what i 
want?

the use case is quite simple: all products with a certain word (10 cheapest or 
so).

is there any alternative approach to this?
i was putting some hope into KNN but it seems it needs too much random I/O :(.
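
for comparison, the obvious fallback i can see (sketch only; the index name is 
made up) is a plain fulltext index plus an explicit sort - it avoids the 
KNN-style random heap I/O but has to fetch and sort every matching row before 
the LIMIT kicks in:

CREATE INDEX idx_t_product_title_fts ON product.t_product
    USING gin (to_tsvector('german', title));

SELECT id
  FROM product.t_product
 WHERE to_tsvector('german', title) @@ to_tsquery('german', 'iphone')
 ORDER BY int_price
 LIMIT 10;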

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] k-neighbourhood search in databases

2011-04-08 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

i put some research into that some time ago and as far as i can see there is a 
99% chance that no other database can do it the way we do it. it seems nobody 
comes even close (especially not in the flexibility arena).

oracle: disgusting workaround ...
http://www.orafaq.com/usenet/comp.databases.oracle.misc/2005/11/03/0083.htm

db2: disgusting workaround (no server side code it seems)

sybase: disgusting workaround (no serverside code it seems)

microsoft: there seems to be something coming out (or just out) but i have not 
seen anything working yet.

regards,

hans



On Apr 8, 2011, at 2:21 PM, Oleg Bartunov wrote:

> Hi there,
> 
> I'm interesting if other databases provides built-in effective knn search ? 
> Google didn't help me.
> 
>   Regards,
>   Oleg
> _
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread PostgreSQL - Hans-Jürgen Schönig
>>> 
>> 
>> Still, having more data a user can probe would be nice.
>> 
>> I wonder why everyone avoids Microsoft's approach to the subject. 
>> Apparently, they go in the 'auto-tune as much as possible' direction.
>> And tests we did a while ago, involving asking team from Microsoft and a 
>> team from oracle to optimise set of queries for the same set of data 
>> (bookies data, loads of it) showed that the auto-tuning Microsoft has in 
>> their
>> sql server performed much better than a team of over-sweating oracle dba's.
> 
> I don't think *anyone* is avoiding that approach.  There is almost
> universal consensus here that auto-tuning is better than manual
> tuning, even to the extent of being unwilling to add knobs to allow
> manual tuning of settings we have no idea how to auto-tune and no
> plans to auto-tune.
> 
>> In my current work place/camp we have many deployments of the same system, 
>> over different types of machines, each with different customer data that 
>> vary so much that queries need to be rather generic.
>> Postgresql shows its strength with planner doing a good job for different 
>> variants of data, however we do a very little tweaking to the configuration 
>> parameters. Just because it is just too hard to overlook all of them.
>> I guess that the systems could behave much better, but no one is going to 
>> tweak settings for 50 different installations over 50 different type of data 
>> and 50 different sets of hardware.
>> If there was even a tiny amount of automation provided in the postgresql, I 
>> would welcome it with open arms.
> 
> What do you have in mind?
> 



what we are trying to do is to explicitly store column correlations. so, a 
histogram for (a, b) correlation and so on.
the planner code then goes through its restrictions in the query and finds the 
best / longest combination it can find and which has some statistics defined.
it seems we can also do this for join selectivity and expressions. the planner 
code for "raw column correlation" without expression ( cos(id) or so)  and 
joins is there (WIP, no ANALYZE support and so on so far).

i think auto tuning is a good thing to have and the door to actually do it is 
wide open with our approach.
all it takes is a mechanism to see which "conditions" are used how often and 
somebody could write a job which automatically tells the system which stats to 
collect / sample.
i think for an "average" user this is the simplest approach then. but, to 
get there we have to get the bloody sampling and the rest of the planner code 
right in the first place.
auto tuning in this area is still something which is far in the future - but at 
least the road to it is clear.

some people suggested some approach dealing with effective_cache_size and so on 
... there are many good approaches here but they don't address the actual 
problem of wrong size-estimates.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
On Feb 24, 2011, at 2:09 AM, Josh Berkus wrote:

> 
>> Personally, I think the first thing we ought to do is add a real, bona
>> fide planner hint to override the selectivity calculation manually,
>> maybe something like this:
>> 
>> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);
>> 
>> Then, having provided a method for the DBA to extinguish the raging
>> flames of searing agony which are consuming them while a crocodile
>> chews off their leg and their boss asks them why they didn't use
>> Oracle, we can continue bikeshedding about the best way of fixing this
>> problem in a more user-transparent fashion.
> 
> Is there some way we can do that without adding the selectivity hint to
> the query itself?  That's the biggest issue with hints.
> 



well, you could hide this hint in a system table - say, instead of decorating 
the query you could store the decoration in some system relation ... but, if 
you get it right, you call this decoration a histogram ;).
i think the patch with a multi-dim histogram is good (i have seen something 
similar for PostGIS).
what is still needed in our patch is a.) multi-dim sampling (no idea how to get 
it right) and b.) investigating how to deal with joins and expressions (e.g. 
cos(id) ).
hints in the right direction are highly welcome.

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig

On Feb 23, 2011, at 3:46 PM, Robert Haas wrote:

> On Wed, Feb 23, 2011 at 12:50 AM, Nathan Boley  wrote:
>>> Personally, I think the first thing we ought to do is add a real, bona
>>> fide planner hint to override the selectivity calculation manually,
>>> maybe something like this:
>>> 
>>> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);
>> 
>> If you're going to go that far, why not just collect statistics on
>> that specific predicate?
>> 
>> ie,  ANALYZE SELECTIVITY ON tablename (x, y) WHERE (x < 5 AND y = 1);
>> 
>> Then it won't fall subject to all of the pitfalls that Tom outlines below.
>> 
>> Selectivities are easy to estimate if we know the predicate. They only
>> become hard when they have to work for every possible predicate.
> 
> Fair point.
> 
> -- 
> Robert Haas


basically we got the idea of allowing "expressions" in cross column stuff. i 
think this can be very useful. it would fix the problem of a query like that:

SELECT * FROM table WHERE cos(field) = some_number;

the planner currently assumes this matches a constant fraction of the table, 
which is usually plain wrong as well (and the error tends to multiply inside 
the plan).
i am just not sure if i have understood all corner cases of that already.
ultimate goal: get it right for join estimates (this is why a syntax extension 
is definitely needed - you cannot track all of them automatically).

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
>>> 
>> 
>> cheapest and easiest solution if you run into this: add "fake" functions 
>> which the planner cannot estimate properly.
>> use OR to artificially prop up estimates or use AND to artificially lower 
>> them. there is actually no need to redesign the schema to get around it but 
>> it is such an ugly solution that it does not even deserve to be called 
>> "ugly" ...
>> however, fast and reliable way to get around it.
> 
> We couldn't possibly design a hint mechanism that would be uglier or
> less future-proof than this workaround (which, by the way, I'll keep
> in mind for the next time I get bitten by this).
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 


i think the main issue is: what we do is ugly because of despair and a lack of 
alternative ... what you proposed is ugly by design ;).
overall: the workaround will win the ugliness contest, however ;).

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
> Those are real problems, but I still want it.  The last time I hit
> this problem I spent two days redesigning my schema and adding
> triggers all over the place to make things work.  If I had been
> dealing with a 30TB database instead of a 300MB database I would have
> been royally up a creek.
> 
> To put that another way, it's true that some people can't adjust their
> queries, but also some people can.  It's true that nonstandard stuff
> sucks, but queries that don't work suck, too.  And as for better
> solutions, how many major release cycles do we expect people to wait
> for them?  Even one major release cycle is an eternity when you're
> trying to get the application working before your company runs out of
> money, and this particular problem has had a lot of cycles expended on
> it without producing anything very tangible (proposed patch, which
> like you I can't spare a lot of cycles to look at just now, possibly
> excepted).



cheapest and easiest solution if you run into this: add "fake" functions which 
the planner cannot estimate properly.
use OR to artificially prop up estimates or use AND to artificially lower them. 
there is actually no need to redesign the schema to get around it but it is 
such an ugly solution that it does not even deserve to be called "ugly" ...
however, it is a fast and reliable way to get around it.
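
a minimal sketch of that trick (function and table names are purely 
illustrative; plpgsql is used on purpose so the call cannot be inlined and 
folded away):

CREATE FUNCTION dummy_true(int) RETURNS boolean AS
$$ BEGIN RETURN true; END; $$ LANGUAGE plpgsql STABLE;

CREATE FUNCTION dummy_false(int) RETURNS boolean AS
$$ BEGIN RETURN false; END; $$ LANGUAGE plpgsql STABLE;

-- lower the estimate: the extra AND clause gets the default selectivity
-- for an opaque boolean function multiplied into the existing estimate
SELECT * FROM some_table WHERE x < 5 AND y = 1 AND dummy_true(x);

-- prop the estimate up: the OR branch adds its default selectivity
-- even though it never matches at runtime
SELECT * FROM some_table WHERE (x < 5 AND y = 1) OR dummy_false(x);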

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig

On Feb 23, 2011, at 2:58 AM, Robert Haas wrote:

> 2011/2/22 PostgreSQL - Hans-Jürgen Schönig :
>> how does it work? we try to find suitable statistics for an arbitrary length 
>> list of conditions so that the planner can use it directly rather than 
>> multiplying all the selectivities. this should make estimates a lot more 
>> precise.
>> the current approach can be extended to work with expressions and well as 
>> "straight" conditions.
> 
> /me prepares to go down in flames.
> 
> Personally, I think the first thing we ought to do is add a real, bona
> fide planner hint to override the selectivity calculation manually,
> maybe something like this:
> 
> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);


i thought there was an agreement that we don't want planner hints?
as tom pointed out - many broken queries come out of some query generator where 
even the design that produces the queries is broken by design.
personally i like query generators as long as other people use them ... telling 
people that this is the wrong way to go is actually financing my holiday next 
week ... ;).  in general - hibernate and stuff like that is a no-go.

personally i like the type of planner hints oleg and teodor came up with - i 
think we should do more of those hooks they are using but hiding it in some 
syntax is not a good idea.
it does not change the query and it still gives a lot of room to toy around. it 
looks like a compromise.

however, oleg's contrib module does not fix the core problem of cross column 
statistics because a hint is usually static but you want flexible selectivity.

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WIP: cross column correlation ...

2011-02-22 Thread PostgreSQL - Hans-Jürgen Schönig
hello everybody,

we have spent some time in finally attacking cross column correlation, as this 
is an issue which has been bugging us in a couple of applications for some years. 
this is a WIP patch which can do:

special cross column correlation specific syntax:

CREATE CROSS COLUMN STATISTICS ON tablename (field, ...);
DROP CROSS COLUMN STATISTICS ON tablename (field, ...);

we use specific syntax because we simply cannot keep track of all possible 
correlations in the DB, so the admin can take care of things explicitly. some 
distant day somebody might want to write a mechanism to derive the desired 
stats automatically but this is beyond the scope of our project for now.

as far as the patch is concerned:
it is patched nicely into clauselist_selectivity(), but has some rough edges: 
even when a cross-column stat is found, the single-column selectivities are 
still counted (= lowering the overall selectivity even more); this is a TODO.
this patch adds the grammar and the start of planner integration with a static 
selectivity value for now; the previous discussion about cross-column 
statistics can be continued and perhaps comes to fruition soon.

how does it work? we try to find suitable statistics for an arbitrary length 
list of conditions so that the planner can use it directly rather than 
multiplying all the selectivities. this should make estimates a lot more 
precise. 
the current approach can be extended to work with expressions as well as 
"straight" conditions.

goal: to make cross column correlation work for 9.2 ...
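
to make the intended workflow concrete (table and column names are made up; in 
the current WIP state the lookup still yields a static selectivity value):

-- declare explicitly which combination is worth tracking
CREATE CROSS COLUMN STATISTICS ON t_customer (country, city);

-- clauselist_selectivity() then looks for the longest declared combination
-- matching the quals instead of multiplying the single-column selectivities
SELECT * FROM t_customer WHERE country = 'Austria' AND city = 'Wiener Neustadt';

-- and get rid of it again
DROP CROSS COLUMN STATISTICS ON t_customer (country, city);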

the purpose of this mail is mostly to get the race for a patch going and to see 
if the approach as such is reasonable / feasible.

many thanks,

hans



cross-column-v5.patch
Description: Binary data






--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] SELECT ... WHERE fti_query ORDER BY numeric_col LIMIT x - problem

2011-02-04 Thread PostgreSQL - Hans-Jürgen Schönig
I have toyed around with KNN a little and I am pretty impressed when it comes 
to the results we have seen in the GIS world.
Given the infrastructure we have at the moment I wonder if KNN can help to 
speed up queries like this:

SELECT ... WHERE fti_query ORDER BY numeric_col LIMIT x

The use case is fairly simple: Give me all products matching a certain tsquery 
and order those products by price or so to show the top 10.
KNN is supposed to give sorted output, which works perfectly for points and so 
on, but can there theoretically be a sensible / reliable distance function for a 
(tsvector / numeric) combination? Some first testing gave me some "interesting" 
output.
If there is no way of defining a reasonable distance function, performance is 
screwed up whenever fti_query returns a large list (it requires a complete sort 
of all prices then).

does anybody have some useful input here?

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ugly locking corner cases ...

2010-10-04 Thread Hans-Jürgen Schönig
On Oct 4, 2010, at 1:23 PM, Heikki Linnakangas wrote:

> On 04.10.2010 14:02, Hans-Jürgen Schönig wrote:
>> it seems we have found a fairly nasty problem.
>> imagine a long transaction which piles up XX.XXX of locks (count on 
>> pg_locks) inside the same transaction by doing some tasty savepoints, with 
>> hold cursors and so on.
>> in this case we see that a normal count issued in a second database 
>> connection will take ages. in a practical case we did a plain seq_scan in 
>> connection 2. instead of 262 ms (cached case) it started to head north 
>> linearily with the number of locks taken by connection 1. in an extreme case 
>> it took around 1.5 hours or so (on XXX.XXX pg_locks entries).
>> 
>> i tracked down the issue quickly and make the following profile (in 10k 
>> locks or so):
>> 
>> Flat profile:
>> 
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls   s/call   s/call  name
>>  32.49      6.01     6.01 98809118     0.00     0.00  SimpleLruReadPage_ReadOnly
>>  26.97     11.00     4.99 98837761     0.00     0.00  LWLockAcquire
>>  21.89     15.05     4.05 98837761     0.00     0.00  LWLockRelease
>>   8.70     16.66     1.61 98789370     0.00     0.00  SubTransGetParent
>>   4.38     17.47     0.81    19748     0.00     0.00  SubTransGetTopmostTransaction
>>   2.41     17.92     0.45 98851951     0.00     0.00  TransactionIdPrecedes
>>   0.59     18.03     0.11                             LWLockAssign
>>   0.54     18.13     0.10                             LWLockConditionalAcquire
>>   0.46     18.21     0.09    19748     0.00     0.00  TransactionLogFetch
>>   0.38     18.28     0.07                             SimpleLruReadPage
>>   0.27     18.33     0.05                             SubTransSetParent
>>   0.05     18.34     0.01   136778     0.00     0.00  AllocSetAlloc
>>   0.05     18.35     0.01    42996     0.00     0.00  slot_deform_tuple
>>   0.05     18.36     0.01    42660     0.00     0.00  TransactionIdIsCurrentTransactionId
>> 
>> it seems we are running into a nice shared buffer / locking contention here 
>> and the number of calls explodes (this profiling infos is coming from a seq 
>> scan on a 500.000 rows table - 400 mb or so).
> 
> That doesn't seem related to the lock manager. Is the long-running 
> transaction inserting a lot of tuples (by INSERT or UPDATE) to the same table 
> that the seqscan scans? With a lot of different subtransaction xids. That 
> profile looks like the seqscan is spending a lot of time swapping pg_subtrans 
> pages in and out of the slru buffers.
> 
> Increasing NUM_SUBTRANS_BUFFERS should help. A more sophisticated solution 
> would be to allocate slru buffers (for clog and other slru caches as well) 
> dynamically from shared_buffers. That's been discussed before but no-one has 
> gotten around to it.
> 
> -- 
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com



hello ...

yeah, it seems this solves the problem.
i had a closer look at the SQL trace and did some more profiling.
this was indeed the case.

many thanks for the quick hint.

hans






--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ugly locking corner cases ...

2010-10-04 Thread Hans-Jürgen Schönig
hello all ...

it seems we have found a fairly nasty problem.
imagine a long transaction which piles up XX.XXX locks (count on pg_locks) 
inside the same transaction by doing some tasty savepoints, WITH HOLD cursors 
and so on.
in this case we see that a normal count issued in a second database connection 
will take ages. in a practical case we did a plain seq_scan in connection 2. 
instead of 262 ms (cached case) it started to head north linearly with the 
number of locks taken by connection 1. in an extreme case it took around 1.5 
hours or so (on XXX.XXX pg_locks entries).

i tracked down the issue quickly and made the following profile (at 10k locks 
or so):

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 32.49      6.01     6.01 98809118     0.00     0.00  SimpleLruReadPage_ReadOnly
 26.97     11.00     4.99 98837761     0.00     0.00  LWLockAcquire
 21.89     15.05     4.05 98837761     0.00     0.00  LWLockRelease
  8.70     16.66     1.61 98789370     0.00     0.00  SubTransGetParent
  4.38     17.47     0.81    19748     0.00     0.00  SubTransGetTopmostTransaction
  2.41     17.92     0.45 98851951     0.00     0.00  TransactionIdPrecedes
  0.59     18.03     0.11                             LWLockAssign
  0.54     18.13     0.10                             LWLockConditionalAcquire
  0.46     18.21     0.09    19748     0.00     0.00  TransactionLogFetch
  0.38     18.28     0.07                             SimpleLruReadPage
  0.27     18.33     0.05                             SubTransSetParent
  0.05     18.34     0.01   136778     0.00     0.00  AllocSetAlloc
  0.05     18.35     0.01    42996     0.00     0.00  slot_deform_tuple
  0.05     18.36     0.01    42660     0.00     0.00  TransactionIdIsCurrentTransactionId

it seems we are running into a nice shared buffer / locking contention here and 
the number of calls explodes (this profiling info is coming from a seq scan on 
a 500.000 row table - 400 mb or so).

i am thinking of doing a workaround for this problem.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] Parallel Query Execution Project

2010-09-28 Thread Hans-Jürgen Schönig
On Sep 28, 2010, at 10:15 AM, Markus Wanner wrote:

> Hi,
> 
> On 09/28/2010 07:24 AM, Li Jie wrote:
>> I'm interested in this parallel project,
>> http://wiki.postgresql.org/wiki/Parallel_Query_Execution
>> 
>> But I can't find any discussion and current progress in the website, it
>> seems to stop for nearly a year?
> 
> Yeah, I don't know of anybody really working on it ATM.
> 
> If you are interested in a process based design, please have a look at
> the bgworker infrastructure stuff. It could be of help for a
> process-based implementation.
> 
> Regards
> 
> Markus Wanner



yes, i don't know of anybody either.
in addition to that it is more than a giant task. it means working on more than 
just one isolated part.
practically i cannot think of any stage of query execution which would not need 
some changes.
i don't see a feature like that within a realistic timeframe.

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Path question

2010-09-23 Thread Hans-Jürgen Schönig
On Sep 23, 2010, at 3:29 PM, Robert Haas wrote:

> On Tue, Sep 21, 2010 at 12:29 AM, David Fetter  wrote:
>> On Mon, Sep 20, 2010 at 10:57:00PM -0400, Robert Haas wrote:
>>> 2010/9/3 Hans-Jürgen Schönig :
>>>> On Sep 2, 2010, at 1:20 AM, Robert Haas wrote:
>>>>> I agree. Explicit partitioning may open up some additional
>>>>> optimization possibilities in certain cases, but Merge Append is
>>>>> more general and extremely valuable in its own right.
>>>> 
>>>> we have revised greg's wonderful work and ported the entire thing
>>>> to head.  it solves the problem of merge_append. i did some
>>>> testing earlier on today and it seems most important cases are
>>>> working nicely.
>>> 
>>> First, thanks for merging this up to HEAD.  I took a look through
>>> this patch tonight, and the previous reviews thereof that I was able
>>> to find, most notably Tom's detailed review on 2009-07-26.  I'm not
>>> sure whether or not it's accidental that this didn't get added to
>>> the CF,
>> 
>> It's because I missed putting it in, and oversight I've corrected.  If
>> we need to bounce it on to the next one, them's the breaks.
>> 
>>> [points elided]
>>> 
>>> 7. I think there's some basic code cleanup needed here, also: comment
>>> formatting, variable naming, etc.
>> 
>> Hans-Jürgen,
>> 
>> Will you be able to get to this in the next couple of days?
> 
> I don't see a response to this which I assume means "no" - I'm going
> to take a crack at fixing some of these issues.



hello ...

sorry for not getting back to you sooner. i am currently on the road for some 
days.
we got the top 3 things fixed already. however, some code seems to be relying 
on a sorted list somewhere(???).
we are in the process of sorting out most of the stuff.
i guess we will have something done next week.

sorry for the delay.

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-08 Thread Hans-Jürgen Schönig
On Sep 8, 2010, at 4:57 PM, Stephen Frost wrote:

> * Robert Haas (robertmh...@gmail.com) wrote:
>> Neat.  Have you checked what effect this has on memory consumption?
>> 
>> Also, don't forget to add it to
>> https://commitfest.postgresql.org/action/commitfest_view/open
> 
> Would be good to have the patch updated to be against HEAD before
> posting to the commitfest.
> 
>   Thanks,
> 
>   Stephen



we will definitely provide something against HEAD.
but, it seems the problem we are looking at is not sufficiently fixed yet.
in our case we shaved off some 18% of planning time or so - looking at the 
other top 2 functions i got the feeling that more can be done to reduce this. i 
guess we have to attack this as well.

regards,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-08 Thread Hans-Jürgen Schönig
here is the patch again.
we accidentally attached a wrong test file to the original posting so it grew 
too big. we had to revoke it from the moderator (this happens if you code from 
8am to 10pm).
here is just the patch - it is nice and small.

you can easily test it by making yourself a nice parent table, many subtables 
(hundreds or thousands) and a decent number of indexes per partition.
then run PREPARE with \timing to see what happens.
you should get scary planning times. the more potential indexes and tables the 
more scary it will be.
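
something along these lines is enough to reproduce it (a sketch; table / index 
names and the partition count are arbitrary, and more indexes per child make 
it even scarier):

CREATE TABLE t_data (id int4, tstamp date);

DO $$
DECLARE
    i  int;
    lo int;
    hi int;
BEGIN
    FOR i IN 1..1000 LOOP
        lo := (i - 1) * 10000 + 1;
        hi := i * 10000;
        EXECUTE 'CREATE TABLE t_data_' || i
             || ' (CHECK (id >= ' || lo || ' AND id <= ' || hi || '))'
             || ' INHERITS (t_data)';
        EXECUTE 'CREATE INDEX idx_data_' || i || ' ON t_data_' || i || ' (id)';
    END LOOP;
END;
$$;

-- then, with \timing switched on in psql:
PREPARE x(int4) AS SELECT * FROM t_data ORDER BY id DESC;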

using this wonderful RB tree the time for this function call goes down to 
basically zero.
i hope this is something which is useful to some folks out there.

many thanks,

hans






canon-pathkeys-as-rbtree-3-ctxdiff.patch
Description: Binary data



On Sep 8, 2010, at 4:18 PM, Stephen Frost wrote:

> * Hans-Jürgen Schönig (postg...@cybertec.at) wrote:
>> no, we have not checked memory consumption.
>> there is still some stuff left to optimize away - it seems we are going 
>> close to O(n^2) somewhere.
>> "equal" is called really often in our sample case as well:
> 
> Did the mail with the scripts, etc, get hung up due to size or
> something..?  I didn't see it on the mailing list nor in the archives..
> If so, could you post them somewhere so others could look..?
> 
>   Thanks,
> 
>   Stephen


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-08 Thread Hans-Jürgen Schönig
hello ...

no, we have not checked memory consumption.
there is still some stuff left to optimize away - it seems we are going close 
to O(n^2) somewhere.
"equal" is called really often in our sample case as well:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 18.87      0.80     0.80     4812     0.00     0.00  make_canonical_pathkey
 15.33      1.45     0.65     4811     0.00     0.00  get_eclass_for_sort_expr
 14.15      2.05     0.60  8342410     0.00     0.00  equal
  6.13      2.31     0.26   229172     0.00     0.00  SearchCatCache
  3.66      2.47     0.16  5788835     0.00     0.00  _equalList
  3.07      2.60     0.13  1450043     0.00     0.00  hash_search_with_hash_value
  2.36      2.70     0.10  2272545     0.00     0.00  AllocSetAlloc
  2.12      2.79     0.09   811460     0.00     0.00  hash_any
  1.89      2.87     0.08  3014983     0.00     0.00  list_head
  1.89      2.95     0.08   574526     0.00     0.00  _bt_compare
  1.77      3.02     0.08 11577670     0.00     0.00  list_head
  1.42      3.08     0.06     1136     0.00     0.00  tzload
  0.94      3.12     0.04  2992373     0.00     0.00  AllocSetFreeIndex


look at the number of calls ...
"equal" is scary ...

make_canonical_pathkey is fixed it seems.
get_eclass_for_sort_expr seems a little more complex to fix.

great you like it ...

regards,

hans



On Sep 8, 2010, at 3:54 PM, Robert Haas wrote:

> On Tue, Sep 7, 2010 at 2:14 PM, Boszormenyi Zoltan  wrote:
>> Hi,
>> 
>> Robert Haas írta:
>>> 2010/9/3 PostgreSQL - Hans-Jürgen Schönig :
>>> 
>>>> i tried this one with 5000 unindexed tables (just one col):
>>>> 
>>>> test=# \timing
>>>> Timing is on.
>>>> test=# prepare x(int4) AS select * from t_data order by id desc;
>>>> PREPARE
>>>> Time: 361.552 ms
>>>> 
>>>> you will see similar or higher runtimes in case of 500 partitions and a 
>>>> handful of indexes.
>>>> 
>>> 
>>> I'd like to see (1) a script to reproduce your test environment (as
>>> Stephen also requested) and (2) gprof or oprofile results.
>>> 
>> 
>> attached are the test scripts, create_tables.sql and childtables.sql.
>> The following query takes 4.7 seconds according to psql with \timing on:
>> EXPLAIN SELECT * FROM qdrs
>> WHERE streamstart BETWEEN '2010-04-06' AND '2010-06-25'
>> ORDER BY streamhash;
> 
> Neat.  Have you checked what effect this has on memory consumption?
> 
> Also, don't forget to add it to
> https://commitfest.postgresql.org/action/commitfest_view/open
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 3, 2010, at 4:40 PM, Tom Lane wrote:

> PostgreSQL - Hans-Jürgen Schönig  writes:
>> imagine a system with, say, 1000 partitions (heavily indexed) or so. the 
>> time taken by the planner is already fairly heavy in this case.
> 
> As the fine manual points out, the current scheme for managing
> partitioned tables isn't intended to scale past a few dozen partitions.
> 
> I think we'll be able to do better when we have an explicit
> representation of partitioning, since then the planner won't
> have to expend large amounts of effort reverse-engineering knowledge
> about how an inheritance tree is partitioned.  Before that happens,
> it's not really worth the trouble to worry about such cases.
> 
>   regards, tom lane
> 


thank you ... - the manual is clear here but we wanted to see if there is some 
reasonably low hanging fruit to get around this.
it is no solution but at least a clear statement ...

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Path question

2010-09-03 Thread Hans-Jürgen Schönig

On Sep 2, 2010, at 1:20 AM, Robert Haas wrote:

> On Sep 1, 2010, at 10:21 AM, Greg Stark  wrote:
>> For what it's worth I disagree with Tom. I think this is a situation
>> where we need *both* types of solution. Ideally we will be able to use
>> a plain Append node for cases where we know the relative ordering of
>> the data in different partitions, but there will always be cases where
>> the structured partition data doesn't actually match up with the
>> ordering requested and we'll need to fall back to a merge-append node.
> 
> I agree. Explicit partitioning may open up some additional optimization 
> possibilities in certain cases, but Merge Append is more general and 
> extremely valuable in its own right.
> 
> ...Robert
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



we have revised greg's wonderful work and ported the entire thing to HEAD.
it solves the problem of merge_append. i did some testing earlier today and 
it seems the most important cases are working nicely.

here are some test cases:

test=# \d t_data
Table "public.t_data"
 Column |  Type   | Modifiers 
+-+---
 id | integer | 
 tstamp | date| 

test=# \d t_data_1
   Table "public.t_data_1"
 Column |  Type   | Modifiers 
+-+---
 id | integer | 
 tstamp | date| 
Indexes:
"idx_1" btree (id)
Check constraints:
"t_data_1_id_check" CHECK (id >= 1 AND id <= 1)
Inherits: t_data

test=# \d t_data_2
   Table "public.t_data_2"
 Column |  Type   | Modifiers 
+-+---
 id | integer | 
 tstamp | date| 
Indexes:
"idx_2" btree (id)
Check constraints:
"t_data_2_id_check" CHECK (id >= 10001 AND id <= 2)
Inherits: t_data

test=# \d t_data_3
   Table "public.t_data_3"
 Column |  Type   | Modifiers 
+-+---
 id | integer | 
 tstamp | date| 
Indexes:
"idx_3" btree (id)
Check constraints:
"t_data_3_id_check" CHECK (id >= 20001 AND id <= 3)
Inherits: t_data


simple windowing ...

test=# explain select *, max(id) OVER ( ORDER BY id) from t_data ; 
 QUERY PLAN 
 
-
 WindowAgg  (cost=149.99..2154.43 rows=32140 width=8)
   ->  Result  (cost=149.99..1672.33 rows=32140 width=8)
 ->  Append  (cost=149.99..1672.33 rows=32140 width=8)
   ->  Sort  (cost=149.78..155.13 rows=2140 width=8)
 Sort Key: public.t_data.id
 ->  Seq Scan on t_data  (cost=0.00..31.40 rows=2140 
width=8)
   ->  Index Scan using idx_1 on t_data_1 t_data  
(cost=0.00..318.25 rows=1 width=8)
   ->  Index Scan using idx_2 on t_data_2 t_data  
(cost=0.00..318.25 rows=1 width=8)
   ->  Index Scan using idx_3 on t_data_3 t_data  
(cost=0.00..318.25 rows=1 width=8)
(9 rows)

it does a nice index scan on each partition, merges the streams and hands the 
result up to the node doing the windowing.

test=# select *, max(id) OVER ( ORDER BY id) from t_data LIMIT 10; 
 id |   tstamp   | max 
++-
  1 | 2010-01-01 |   1
  2 | 2010-01-01 |   2
  3 | 2010-01-01 |   3
  4 | 2010-01-01 |   4
  5 | 2010-01-01 |   5
  6 | 2010-01-01 |   6
  7 | 2010-01-01 |   7
  8 | 2010-01-01 |   8
  9 | 2010-01-01 |   9
 10 | 2010-01-01 |  10
(10 rows)

the cost model does what it should as well:

test=# explain select *, max(id) OVER ( ORDER BY id) from t_data ; 
 QUERY PLAN 
 
-
 WindowAgg  (cost=2872.41..3434.86 rows=32140 width=8)
   ->  Sort  (cost=2872.41..2952.76 rows=32140 width=8)
 Sort Key: public.t_data.id
 ->  Result  (cost=0.00..466.40 rows=32140 width=8)
   ->  Append  (cost=0.00..466.40 rows=32140 width=8)
 ->  Seq Scan on t_data  (cost=0.00..31.40 rows=2140 
width=8)
 ->  Seq Scan on t_data_1 t_data  (cost=0.00..145.00 
rows=1 width=8)
 ->  Seq Scan on t_data_2 t_data  (cost=0.00..145.00 
rows=1 width=8)
 ->  Seq Scan on t_data_3 t_data  (cost=0.00..145.00 
rows=1 width=8)
(9 rows)

it has proven to be really valuable in my first tests.
maybe this is helpful for some people out there.

many thanks,

hans




merge-append-91-v1.diff
Description: Binary data


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 3, 2010, at 2:04 PM, Stephen Frost wrote:

> * PostgreSQL - Hans-Jürgen Schönig (postg...@cybertec.at) wrote:
>> did anybody think of a solution to this problem.
>> or more precisely: can there be a solution to this problem?
> 
> Please post to the correct list (-performance) and provide information
> like PG version, postgresql.conf, the actual table definition, the
> resulting query plan, etc, etc...
> 
>   Thanks,
> 
>   Stephen



hello stephen,

this seems more like a developer question to me than a -performance one.
it is not related to the table structure at all - it is basically an issue with 
incredibly large inheritance lists.
it applies to postgres 9 and most likely to everything before.
postgresql.conf is not relevant at all at this point.

the plan is perfectly fine.
the question is rather: does anybody see a chance to handle such lists more
efficiently inside postgres?
also, the point is not whether my data structure is sane or not. it is really
more generic - namely a shortcut for this case inside the planning process.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread PostgreSQL - Hans-Jürgen Schönig
hello everybody,

we came across an issue which turned out to be more serious than previously 
expected.
imagine a system with, say, 1000 partitions (heavily indexed) or so. the time 
taken by the planner is already fairly heavy in this case.

i tried this one with 5000 unindexed tables (just one col):

test=# \timing
Timing is on.
test=# prepare x(int4) AS select * from t_data order by id desc;
PREPARE
Time: 361.552 ms

you will see similar or higher runtimes in case of 500 partitions and a handful 
of indexes.
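
for anybody who wants to reproduce something in this ballpark, here is a rough psql sketch which generates the children (names and the CHECK condition are made up for the example, not taken from my actual test):

CREATE TABLE t_data (id int4);

-- write one CREATE TABLE ... INHERITS statement per child to a file ...
\t on
\o /tmp/create_children.sql
SELECT 'CREATE TABLE t_data_' || i
       || ' (CHECK (id = ' || i || ')) INHERITS (t_data);'
FROM generate_series(1, 5000) AS i;
\o
\t off

-- ... and execute it
\i /tmp/create_children.sql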

does anybody see a potential way to do a shortcut through the planner?
a prepared query is no solution here as constraint exclusion would be dead in 
this case (making the entire thing an even bigger drama).

has anybody thought of a solution to this problem?
or, more precisely: can there be a solution to this problem?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Path question

2010-09-01 Thread PostgreSQL - Hans-Jürgen Schönig
hello tom,

yeah, we have followed quite a lot of discussion as well ...
and yes, no patches.

as far as this problem is concerned: we are working on a patch and have already done some
prototyping inside the planner (attached).
the code we have is pretty limited atm (it checks for a sort clause explicitly,
has no support for windowing-related optimizations and so on).

the cost model is not our problem - it is a lot easier to read than the code we 
are fighting with (it is a different level of complexity). i think costs can be 
handled.

yes, this merging adds some cost for sure. we might see a hell of a lot of
comparison operators being called - but i think given a reasonable number of partitions it
is still a lot cheaper than actually re-sorting ... and it is a lot more space
efficient.
in my practical case i cannot re-sort because we would simply run out of space.
an index scan is expensive but needs no additional sort space ...
and merge is O(n), which sort clearly is not.

advice is highly appreciated.

many thanks,

hans




push-down-sort-into-inh-2.patch
Description: Binary data

On Sep 1, 2010, at 5:00 PM, Tom Lane wrote:

> =?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?=  
> writes:
>> On Sep 1, 2010, at 4:10 PM, Tom Lane wrote:
>>> This is really premature, and anything you do along those lines now will
>>> probably never get committed.
> 
>> well, why non-overlapping? the idea is to make append smart enough to
>> take the sorted lists from below and merge them which will give sorted
>> output as well.
> 
> Well, an extra merge step is going to change the cost comparisons quite
> a bit; see Greg Starks' comments.  But in any case, my point wasn't that
> this is something we should never do; it was that it makes more sense to
> wait till something has happened with explicit partitioning.
> 
>>> The project direction is that we are going to add some explicit
>>> representation of partitioned tables.
> 
>> can you outline some ideas here and maybe point to some useful discussion 
>> here?
> 
> There's been boatloads of discussion of partitioning, and at least one
> submitted patch, over the past year or so ...
> 
>   regards, tom lane
> 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Path question

2010-09-01 Thread PostgreSQL - Hans-Jürgen Schönig
On Sep 1, 2010, at 4:10 PM, Tom Lane wrote:

> Boszormenyi Zoltan  writes:
>> we are experimenting with modifying table partitioning
>> so the ORDER BY clause can be pushed down to
>> child nodes on the grounds that:
> 
> This is really premature, and anything you do along those lines now will
> probably never get committed.  The problem is that the transformation
> you propose is wrong unless the planner can prove that the different
> child tables contain nonoverlapping ranges of the sort key.  Now you
> might be intending to add logic to try to prove that from inspection of
> constraints, but I don't believe that reverse-engineering such knowledge
> on the fly is a sane approach: it will be hugely expensive and will add
> that cost even in many situations where the optimization fails to apply.
> 


well, why non-overlapping? the idea is to make append smart enough to take the 
sorted lists from below and merge them which will give sorted output as well.
my original idea was what you described but given Martijn van Oosterhout's 
posting we were pretty confident that we can get along without non-overlapping 
partitions.


> The project direction is that we are going to add some explicit
> representation of partitioned tables.  After that, the planner can just
> know immediately that a range-partitioned sort key is amenable to this
> treatment, and at that point it'll make sense to work on it.
> 


can you outline some ideas and maybe point to some useful discussion here?


many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to construct an exact plan

2010-08-31 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

here is the ultimate revelation of planner hints in postgres ...
let us praise oleg and teodor for solving a practical problem for practical 
people ...

http://www.sai.msu.su/~megera/wiki/plantuner

try this one ...
it is excellent and definitely helpful for many many people out there.
unfortunately this code is not too well known.

many thanks,

hans



On Aug 30, 2010, at 9:41 PM, Pavel Stehule wrote:

> 2010/8/30 Pei He :
>> Hi,
>> I am hacking postgresql 8.2.5. a) and b) do not work for me.
>> 
>> The situation is that I made a join operator, and a scan operator.
>> And, The join operator requires the scan operator as the inner. So, I
>> need to have the full control of the join plan.
>> 
>> I am not ready to provide the optimization support for the two new
>> operators. And, I want to run some performance tests before working on
>> the optimization part.
>> 
>> So, I want to know if it is possible to directly create a path or a
>> plan, and do a unit test for the operators.
>> 
> 
> yes, it is possible - but it isn't simple. I think a better approach is a
> simple implementation of all parts and then blocking some (for you not
> interesting) built-in methods at runtime via SET enable_* TO off.
> 
> Regards
> 
> Pavel Stehule
> 
>> 
>> Thanks
>> --
>> Pei
>> 
>> On Mon, Aug 30, 2010 at 1:59 PM, Josh Berkus  wrote:
>>> 
 I have developed a new operators, and I want to do some tests on it.
 I do not want the optimizer to choose the plan for me, and I need to
 construct a plan as exact as I want.
 
 Can anyone provide me a way to achieve that?
>>> 
>>> a) easy: choose a simple enough query that its plan is always predictable.
>>> 
>>> b) moderate: choose a query whose plan is predictable if you manipulate
>>> the enable_* configuration settings
>>> 
>>> c) hard: hack the PostgreSQL planner to choose a specific execution
>>> plan, and recompile Postgres.
>>> 
>>> --
>>>  -- Josh Berkus
>>> PostgreSQL Experts Inc.
>>> http://www.pgexperts.com
>>> 
>> 
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "micro bucket sort" ...

2010-08-11 Thread PostgreSQL - Hans-Jürgen Schönig
as tom pointed out - this is not possible.
there is no limit 20 in my case - i just used it to show that limiting does
not make the index scan possible here, as it does in some other cases.
the partial sort thing simon pointed out is what is needed at this point.

many thanks,

hans



On Aug 11, 2010, at 5:29 PM, Alvaro Herrera wrote:

> Excerpts from Hans-Jürgen Schönig's message of mié ago 11 08:21:10 -0400 2010:
> 
>> same with limit ...
>> 
>> 
>> test=# explain analyze select * from t_test order by x, y limit 20;
> 
> But if you put the limit in a subquery which is ordered by the
> known-indexed condition, it is very fast:
> 
> alvherre=# explain analyze select * from (select * from t_test order by x 
> limit 20) f order by x, y;
>   QUERY PLAN  
>   
> ─
> Sort  (cost=1.24..1.29 rows=20 width=8) (actual time=0.252..0.296 rows=20 
> loops=1)
>   Sort Key: t_test.x, t_test.y
>   Sort Method:  quicksort  Memory: 26kB
>   ->  Limit  (cost=0.00..0.61 rows=20 width=8) (actual time=0.051..0.181 
> rows=20 loops=1)
> ->  Index Scan using idx_a on t_test  (cost=0.00..30408.36
> rows=1000000 width=8) (actual time=0.046..0.098 rows=20 loops=1)
> Total runtime: 0.425 ms
> (6 filas)
> 
> 
> I guess it boils down to being able to sort a smaller result set.
> 
> -- 
> Álvaro Herrera 
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] "micro bucket sort" ...

2010-08-11 Thread Hans-Jürgen Schönig
hello all ...

i am bugged with a small issue which is basically like this ...

test=# create table t_test as select x, x % 5 as y from generate_series(1, 1000000) AS x;
SELECT
test=# create index idx_a on t_test (x) ;
CREATE INDEX
test=# ANALYZE ;
ANALYZE
test=# explain analyze select * from t_test order by x;
                                                         QUERY PLAN                                                         
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a on t_test  (cost=0.00..30408.36 rows=1000000 width=8) (actual time=0.057..311.832 rows=1000000 loops=1)
 Total runtime: 392.943 ms
(2 rows)

we know that we get sorted output from the index and thus we do the index 
traversal here ...
if you add a condition to the sorting you will naturally get a sort in postgres 
because y is clearly now known to be sorted.

test=# explain analyze select * from t_test order by x, y;
                                                       QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=141431.84..143931.84 rows=1000000 width=8) (actual time=1086.014..1271.257 rows=1000000 loops=1)
   Sort Key: x, y
   Sort Method:  external sort  Disk: 17608kB
   ->  Seq Scan on t_test  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.024..143.474 rows=1000000 loops=1)
 Total runtime: 1351.848 ms
(5 rows)


same with limit ...


test=# explain analyze select * from t_test order by x, y limit 20;
                                                          QUERY PLAN                                                          
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=41034.64..41034.69 rows=20 width=8) (actual time=317.939..317.943 rows=20 loops=1)
   ->  Sort  (cost=41034.64..43534.64 rows=1000000 width=8) (actual time=317.934..317.936 rows=20 loops=1)
         Sort Key: x, y
         Sort Method:  top-N heapsort  Memory: 26kB
         ->  Seq Scan on t_test  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.019..144.109 rows=1000000 loops=1)
 Total runtime: 317.995 ms
(6 rows)

now, the problem is: i cannot easily create additional indexes as i have too
many possible "second" conditions here.
what makes it even more funny: i don't have enough space to re-sort the
entire thing (X TB).
so, a more expensive index traversal is my only option.

my question is: is there already a concept out there to make this work, or does
anybody know of a patch addressing an issue like that?
any ideas are highly appreciated. it seems our sort key infrastructure is not
enough for this.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] non-overlapping, consecutive partitions

2010-07-29 Thread Hans-Jürgen Schönig
hello ...

yeah, this is fairly complicated.

greg:
can you send me what you have so far?
i would be curious to see how you attacked this issue.

i am still in the process of checking the code.
we somehow have to find a solution for this, otherwise we are in slight trouble
here.
it seems we have to solve it no matter what it takes.

many thanks,

hans


On Jul 26, 2010, at 1:14 AM, Robert Haas wrote:

> On Sun, Jul 25, 2010 at 6:40 PM, Greg Stark  wrote:
>> 2010/7/25 Robert Haas :
>>> 2010/7/25 PostgreSQL - Hans-Jürgen Schönig :
>>>> 
>>>> On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:
>>>> 
>>>>> I think the right way to approach this is to teach the planner about
>>>>> merge sorts.
>> 
>> For what it's worth I think this is a belt-and-suspenders type of
>> situation where we want two solutions which overlap somewhat.
>> 
>> I would really like to have merge-append nodes because there are all
>> sorts of plans where append nodes destroying the ordering of their
>> inputs eliminates a lot of good plans. Those cases can be UNION ALL
>> nodes, or partitions where there's no filter on the partition key at
>> all.
>> 
>> But for partitioned tables like the OPs the "real" solution would be
>> to have more structured meta-data about the partitions that allows the
>> planner to avoid needing the merge at all. It would also means the
>> planner wouldn't need to look at every node; it could do a binary
>> search or equivalent for the right partitions.
> 
> Agreed on all points.
> 
>>> Greg Stark had a patch to do this a while back called merge append,
>>> but it never got finished...
>> 
>> I was basically in over my head with the planner. I don't understand
>> how equivalent classes are used or should be used and didn't
>> understand the code I was pointed at as being analogous. It's probably
>> not so complicated as all that, but I never really wrapped my head
>> around it and moved onto tasks I could make more progress on.
> 
> Yeah, I don't fully understand those either.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] non-overlapping, consecutive partitions

2010-07-25 Thread PostgreSQL - Hans-Jürgen Schönig

On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:

> On Fri, Jul 23, 2010 at 10:04:00PM +0200, Hans-Jürgen Schönig wrote:
>>  create table foo ( x date );
>>  create table foo_2010 () INHERITS (foo)
>>  create table foo_2009 () INHERITS (foo)
>>  create table foo_2008 () INHERITS (foo)
>> 
>> now we add constraints to make sure that data is only in 2008, 2009 and 2010.
>> we assume that everything is indexed:
>> 
>> SELECT * FROM foo ORDER BY bar  will now demand an ugly sort for this data.
>> this is not an option if you need more than a handful of rows ...
> 
> I think the right way to approach this is to teach the planner about
> merge sorts. This is, if the planner has path to foo_* all ordered by
> the same key (because they have the same indexes) then it has a path to
> the UNION of those tables simply by merging the results of those paths.
> 
> This would be fairly straight forward to implement I think, you may
> even be able to reuse the merge sort in the normal sort machinery.
> (You'll need to watch out for UNION vs UNION ALL.)
> 
> The real advantage of this approach is that you no longer have to prove
> anything about the constraints or various datatypes and it is more
> general. Say you have partitioned by start_date but you want to sort by
> end_date, simple index scanning won't work while a merge sort will work
> beautifully.
> 
> You're also not limited to how the partitioning machinery will
> eventually work.
> 
> Hope this helps,


i think this is excellent input.
i will do some research in that direction.

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] non-overlapping, consecutive partitions

2010-07-23 Thread Hans-Jürgen Schönig
hello everybody,

i have just come across some issue which has been bugging me for a while.
consider:

SELECT * FROM foo ORDER BY bar;

if we have an index on bar, we can nicely optimize away the sort step by
consulting the index - a btree will return sorted output.
under normal circumstances it will be seq->sort, but with some config settings
we can turn this into an index scan and avoid the sort (disk space is
my issue here).

this is not so easy anymore:

create table foo ( x date );
create table foo_2010 () INHERITS (foo);
create table foo_2009 () INHERITS (foo);
create table foo_2008 () INHERITS (foo);

now we add constraints to make sure that data is only in 2008, 2009 and 2010.
we assume that everything is indexed:
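
a sketch of what i mean by that (constraint boundaries and index names are only placeholders for the example):

ALTER TABLE foo_2008 ADD CHECK (x >= '2008-01-01' AND x < '2009-01-01');
ALTER TABLE foo_2009 ADD CHECK (x >= '2009-01-01' AND x < '2010-01-01');
ALTER TABLE foo_2010 ADD CHECK (x >= '2010-01-01' AND x < '2011-01-01');

CREATE INDEX foo_2008_x_idx ON foo_2008 (x);
CREATE INDEX foo_2009_x_idx ON foo_2009 (x);
CREATE INDEX foo_2010_x_idx ON foo_2010 (x);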

SELECT * FROM foo ORDER BY bar  will now demand an ugly sort for this data.
this is not an option if you need more than a handful of rows ...

if constraints are non-overlapping and if they are based on a "sortable" data
type, we might be able to scan one index after the other and get a sorted list.
why is this an issue? imagine a case where you want to do billing, e.g. some
phone calls. the job now is: the last 10 calls of a customer are free and you
want to sum up those which are not free.
to do that you basically need a sorted list per customer. if your data is
partitioned over time you are screwed because you want to return a
sorted list taken from X partitions to some higher level operation (windowing
or whatever).
re-sorting vast amounts of data is a killer here. in the particular case i am
talking about, my problem is roughly 2 TB scaled out to some PL/proxy farm.

does anybody see a solution to this problem?
what are the main showstoppers to make something like this work?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SHOW TABLES

2010-07-16 Thread Hans-Jürgen Schönig

On Jul 15, 2010, at 6:43 PM, Magnus Hagander wrote:

> On Thu, Jul 15, 2010 at 18:35, Simon Riggs  wrote:
>> On Thu, 2010-07-15 at 17:38 +0200, Magnus Hagander wrote:
>> 
>>> Is there an actual common use-case for having these commands available
>>> for *non-psql* interfaces?
>> 
>> There are many interfaces out there and people writing new ones
>> everyday. We just wrote an interface for Android, for example.
>> 
>> It is arguably *more* important to do this from non-psql interfaces.
>> 
>> There should be one command to "display a list of tables" and it needs
>> to be easily guessable for those who have forgotten.
> 
> The downside is that you are then limited to what can be returned as a
> resultset. A "\d table" in psql returns a hell of a lot more than
> that. So do we keep two separate formats for this? Or do we remove the
> current, useful, output format in favor of a much worse formt just to
> support more clients?
> 


i am not seeing this as an "instead" solution. this is an "additional" solution.
SHOW TABLES etc. could return a set of tables. there is no need to change the
good old \d for that.
it is just a plain add-on. everything else would simply be bad.

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SHOW TABLES

2010-07-15 Thread Hans-Jürgen Schönig

On Jul 15, 2010, at 6:20 PM, Thom Brown wrote:

> On 15 July 2010 17:16, Marc G. Fournier  wrote:
>> On Thu, 15 Jul 2010, Thom Brown wrote:
>> 
>>> On 15 July 2010 17:07, Marc G. Fournier  wrote:
 
 On Thu, 15 Jul 2010, Thom Brown wrote:
 
> If it's only a psql problem, why implement it as SQL?  Is it just so
> we're
> not adding keywords specifically to psql?  In that case, it shouldn't
> support QUIT.
 
 Personally, I think this is somethign that should go into the backend ...
 I'd like to be able to write perl scripts that talk to the backend
 without
 having to remember all the various system tables I need to query / join
 to
 get the same results as \d gives me in psql ... same for any interface
 language, really ...
 
>>> 
>>> Isn't that what the information_schema catalog is for?
>> 
>> I'd rather write:
>> 
>> SHOW TABLES;
>> 
>> then:
>> 
>> SELECT  table_name
>>  FROM information_schema.tables
>>  WHERE table_type = 'BASE TABLE'
>>   AND table_schema NOT IN
>>   ('pg_catalog', 'information_schema');
>> 
>> And, the latter, unless I'm doing it regularly, is alot harder to remember
>> then the former ...
> 
> Yes, I see what you mean now.  That would simplify things greatly.
> 
> Thom
> 


exactly ...
and also: how many people outside the "inner circle" do you know who have ever 
seen the information schema?
i have been in the postgres business for more than 10 years (full time) and i
cannot name 5 customers who ever used the information schema to do "show
tables" ...
a big argument is: "show tables" (or whatever) could work for all versions to
come, while a direct hit on pg_class or so would not give you total
portability forever.

and yes, it is all about simplicity ...
it would not even add too much code to the backend and thus the complexity of 
this feature can really be neglected from a maintenance point of view.
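
just to illustrate what such a command would boil down to - a user can of course approximate it today with a view (a sketch only, not a proposal for the actual implementation):

CREATE VIEW show_tables AS
    SELECT table_schema, table_name
    FROM information_schema.tables
    WHERE table_type = 'BASE TABLE'
      AND table_schema NOT IN ('pg_catalog', 'information_schema');

SELECT * FROM show_tables;

the point of a built-in SHOW TABLES is precisely that nobody should have to remember or type the definition above.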

regards,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SHOW TABLES

2010-07-15 Thread Hans-Jürgen Schönig
On Jul 15, 2010, at 5:20 PM, Simon Riggs wrote:

> On Thu, 2010-07-15 at 11:05 -0400, Tom Lane wrote:
>> Simon Riggs  writes:
>>> The biggest turn off that most people experience when using PostgreSQL
>>> is that psql does not support memorable commands.
>> 
>>> I would like to implement the following commands as SQL, allowing them
>>> to be used from any interface.
>> 
>>> SHOW TABLES
>>> SHOW COLUMNS
>>> SHOW DATABASES
>> 
>> This has been discussed before, and rejected before.  Please see
>> archives.
> 
> Many years ago. I think it's worth revisiting now in light of the number
> of people now joining the PostgreSQL community and the greater
> prevalence other ways of doing it. The world has changed, we have not.
> 
> I'm not proposing any change in function, just a simpler syntax to allow
> the above information to be available, for newbies.
> 
> Just for the record, I've never ever met anyone that said "Oh, this \d
> syntax makes so much sense. I'm a real convert to Postgres now you've
> shown me this". The reaction is always the opposite one; always
> negative. Which detracts from our efforts elsewhere.
> 
> -- 
> Simon Riggs   www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Training and Services
> 



simon is absolutely right here.
we should not mind being a little more user friendly in this area.
many people are simply used to this kind of stuff.

remember the last time you rejected something (not necessarily software).
was it because you could not make it work in 2 minutes, or was it because you did
not like something else?
do you reject buying a car because of a non obvious screw in the engine or 
because "it somehow does not feel right"?

simon made an important point and i can simply agree - regardless of whether it 
has been discussed before or not.
if you die a beautiful death you are still dead after all.

regards,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cross column correlation revisted

2010-07-15 Thread Hans-Jürgen Schönig
hello ...

a view is already nice but i think it is still too narrow.
the problem is: you don't want a view for every potential join.
in addition to that - ideally there is not much left of a view (after it has been
flattened by the planner) when it comes to checking for costs.
so, i think, this is not the kind of approach leading to total success here.

one side question: does anybody happen to know how this is done in oracle or db2?

many thanks,

hans



On Jul 15, 2010, at 1:33 AM, Dimitri Fontaine wrote:

> Joshua Tolley  writes:
>   ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2)
> 
>>> it says X and Y ...  the selectivity of joins are what i am most
>>> interested in. cross correlation of columns within the same table are
>>> just a byproduct.  the core thing is: how can i estimate the number
>>> of rows returned from a join?
>> 
>> All the discussion of this topic that I've seen has been limited to the single
>> table case. The hard problem in that case is coming up with something you can
>> precalculate that will actually be useful during query planning, without
>> taking too much disk, memory, CPU, or something else. Expanding the discussion
>> to include join relations certainly still has valid use cases, but is even
>> harder, because you've also got to keep track of precisely how the underlying
>> relations are joined, so you know in what context the statistics remain valid.
> 
> Well I've been proposing to handle the correlation problem in another
> way in some past mails here, and I've been trying to write it down too:
> 
>  http://archives.postgresql.org/pgsql-performance/2009-06/msg00118.php
>  http://tapoueh.org/char10.html#sec13
> 
> What I propose is to extend ANALYZE to be able to work on a VIEW too,
> rather than just a table. The hard parts seems to be:
> 
> a. what stats to record, exploiting the view definition the best we can
> b. how to match a user query against the view definitions we have in
>    order to actually use the stats
> 
> If you have answers or good ideas :)
> 
> Regards,
> -- 
> dim
> 
> 
> -- 
> dim
> 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

look at the syntax i posted in more detail:

>>  ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = 
>> y.id2)
> 



it says X and Y ...
the selectivity of joins is what i am most interested in. cross correlation of
columns within the same table is just a byproduct.
the core thing is: how can i estimate the number of rows returned from a join?

an example would be: you have a email accounts + messages. you know that each 
row will match in a join as you can assume that every account will have a 
message.
what we need is a syntax which covers the join case and the case where columns 
inside the same table correlate.
and the fact that an index cannot cover two tables leads me to the conclusion 
that stats on an index are not the solution to the join problem.
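
a stripped-down version of the accounts/messages case (table and column names are made up for the example): every account has messages, so the join matches every row, and the join size estimate has to come from per-column statistics only - exactly the place where additional cross-table statistics could help.

CREATE TABLE account (account_id int4 PRIMARY KEY);
CREATE TABLE message (message_id serial PRIMARY KEY,
                      account_id int4 REFERENCES account);

-- every account gets exactly 20 messages
INSERT INTO account SELECT i FROM generate_series(1, 100000) AS i;
INSERT INTO message (account_id)
    SELECT i FROM generate_series(1, 100000) AS i, generate_series(1, 20);
ANALYZE;

-- the estimated row count of this join is what better statistics should improve
EXPLAIN
SELECT *
FROM account a
JOIN message m ON m.account_id = a.account_id
WHERE a.account_id < 1000;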

many thanks,

hans


On Jul 14, 2010, at 4:22 PM, Tom Lane wrote:

> =?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?=  
> writes:
>> i think that having stats on an index is a problem by itself for 2 reasons - 
>> for cross column correlation at least:
> 
>>  a.) joins cannot be covered by an index on two tables - we would fix 
>> "inside a table correlation problems" but not joins.
> 
> Your proposed command didn't cover the two-table case either, and anyway
> what the heck do you mean by cross-correlation across tables?
> Cross-correlation is about the correlation between values in the same
> row.
> 
>>  b.) who says that there is actually an index in place?
> 
> If the combination of columns is actually interesting, there might well
> be an index in place, or the DBA might be willing to create it.  For
> that matter, have you considered the idea of examining the index
> contents to derive the statistics?  Might work better than trying to get
> numbers via ANALYZE.
> 
>   regards, tom lane
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello tom,

i think that having stats on an index is a problem by itself for 2 reasons - 
for cross column correlation at least:

a.) joins cannot be covered by an index on two tables - we would fix 
"inside a table correlation problems" but not joins.
b.) who says that there is actually an index in place? assume you are 
doing some big seq scan to do analytics. you don't want it to be indexed for 10 
different types of queries.

i think it is pretty hard to determine automatically what to collect because we
cannot know which permutations of cross-column magic people will use.
i was thinking along the lines of having it automatic as well but i could not
figure out how to do it.
i think we can suggest additional stats to the user and we can write tools to
somehow figure out what would be useful, but personally i cannot see anything
better than a command here.

many thanks,

hans



On Jul 14, 2010, at 4:01 PM, Tom Lane wrote:

> Heikki Linnakangas  writes:
>> On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote:
>>> maybe somehow like this ...
>>> ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)
> 
>> +1 is my general feeling, it's good if you can tell the system to 
>> collect additional statistics where needed.
> 
> The previous discussions about this went in the direction of
> "automatically collect stats if there is an index on that combination of
> columns".  Do we really need a command?
> 
>> However, the problem is how to represent and store the 
>> cross-correlation.
> 
> Yes, whatever the triggering mechanism is for collecting cross-column
> stats, actually doing something useful is the hard part.
> 
>   regards, tom lane
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig

On Jul 14, 2010, at 12:40 PM, Heikki Linnakangas wrote:

> On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote:
>> hello everybody,
>> 
>> we are currently facing some serious issues with cross correlation issue.
>> consider: 10% of all people have breast cancer. we have 2 genders (50:50).
>> if i select all the men with breast cancer, i will get basically nobody - 
>> the planner will overestimate the output.
>> this is the commonly known problem ...
>> 
>> this cross correlation problem can be quite nasty in many many cases.
>> underestimated nested loops can turn joins into a never ending nightmare and 
>> so on and so on.
>> 
>> my ideas is the following:
>> what if we allow users to specifiy cross-column combinations where we keep 
>> separate stats?
>> maybe somehow like this ...
>> 
>>  ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)
>> 
>> or ...
>> 
>>  ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = 
>> y.id2)
>> 
>> clearly we cannot store correlation for all combinations of all columns so 
>> we somehow have to limit it.
>> 
>> what is the general feeling about something like that?
> 
> +1 is my general feeling, it's good if you can tell the system to collect 
> additional statistics where needed. And once you have that, you can write an 
> agent or something to detect automatically which extra statistics might be 
> useful.
> 


it seems i can leave the bunker i was hiding in while waiting
for a reply ;).
yes, my idea was to have an agent as well - but that is just a follow-up
problem.


> However, the problem is how to represent and store the cross-correlation. For 
> fields with low cardinality, like "gender" and boolean "breast-cancer-or-not" 
> you can count the prevalence of all the different combinations, but that 
> doesn't scale. Another often cited example is zip code + street address. 
> There's clearly a strong correlation between them, but how do you represent 
> that?


we could tell the same story with a table storing people including their home
country and the color of their skin.
obviously we will have more black people in african countries ...


> 
> For scalar values we currently store a histogram. I suppose we could create a 
> 2D histogram for two columns, but that doesn't actually help with the zip 
> code + street address problem.
> 


i think we might go for a second relation here specifically for this issue and 
a boolean flag in the current stats table indicating that additional 
correlation stats exist (to avoid an additional lookup unless really necessary).
do you have a useful syntax in mind? the thing is: this issue can be isolated 
inside a table (e.g. WHERE a.id = a.id2 AND a.id3 = a.id4) or it might span two 
tables with an arbitrary number of fields.

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello everybody,

we are currently facing some serious issues with cross correlation issue.
consider: 10% of all people have breast cancer. we have 2 genders (50:50).
if i select all the men with breast cancer, i will get basically nobody - the 
planner will overestimate the output.
this is the commonly known problem ...

this cross correlation problem can be quite nasty in many many cases.
underestimated nested loops can turn joins into a never ending nightmare and so 
on and so on.
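
a self-contained version of the example (made-up table and column names): the planner multiplies the selectivities of the two conditions as if they were independent, so it expects roughly 5% of the table while the real result is (nearly) empty.

CREATE TABLE person (
    gender            text,
    has_breast_cancer boolean
);

-- 50:50 gender split, breast cancer only in the female half (10% overall)
INSERT INTO person
    SELECT CASE WHEN i % 2 = 0 THEN 'male' ELSE 'female' END,
           (i % 2 = 1 AND i % 10 = 1)
    FROM generate_series(1, 100000) AS i;
ANALYZE person;

-- estimated as ~0.5 * 0.1 of the table, actual row count is 0
EXPLAIN ANALYZE
SELECT * FROM person
WHERE gender = 'male' AND has_breast_cancer;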

my idea is the following:
what if we allow users to specify cross-column combinations for which we keep
separate stats?
maybe somehow like this ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)

or ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = 
y.id2)

clearly we cannot store correlation for all combinations of all columns so we 
somehow have to limit it.

what is the general feeling about something like that?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG FETCH readahead

2010-06-24 Thread PostgreSQL - Hans-Jürgen Schönig
On Jun 24, 2010, at 2:13 PM, Michael Meskes wrote:

>> I think, yes, it does make sense. Because we are talking
>> about porting a whole lot of COBOL applications.
> 
> COBOL???
> 


yes, COBOL :).
it is much more common than people think.
it is not the first COBOL request for PostgreSQL hitting my desk.
in our concrete example we are using a C module written with ECPG which is 
magically attached to tons of COBOL code ...



>> The ESQL/C or ECPG connector was already written
>> the Informix quirks in mind, so it fetches only one record
>> at a time passing it to the application. And similar performance
>> is expected from ECPG - which excpectation is not fulfilled
>> currently because libecpg doesn't do the same caching as
>> ESQL/C does.
> 
> Eh, you are talking about a program you wrote for your customer or they wrote
> themselves, right? I simply refuse to add this stuff only to fix this 
> situation
> for that one customer of yours if it only hits them. Now the thing to discuss
> is how common is this situation.
> 
> Michael


i think this cursor issue is a pretty common thing in a lot of code.
people are usually not aware of the fact that the network round trips and parsing
which naturally go along with "FETCH 1" are a lot more expensive than fetching
one row somewhere deep inside the DB engine.
there are many applications out there which fetch data row by row. if an app
fetches data row by row in PostgreSQL it will be A LOT slower than in, say,
Informix, because most commercial database clients cache data inside a
cursor behind the scenes to avoid the problem we are trying to solve.

currently we are talking about a performance penalty of factor 5 or so. so - it 
is not a small thing; it is a big difference.
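
to illustrate the round-trip issue at the SQL level (the readahead in libecpg would do the batching transparently; the table and batch size below are just placeholders):

CREATE TEMP TABLE some_big_table AS
    SELECT i AS id FROM generate_series(1, 100000) AS i;

BEGIN;
DECLARE cur CURSOR FOR SELECT * FROM some_big_table;

-- what the application logic effectively asks for: one row per round trip
FETCH 1 FROM cur;
FETCH 1 FROM cur;

-- what a caching client library does behind the scenes: one round trip,
-- many rows, handed out to the application one by one from its cache
FETCH 100 FROM cur;

CLOSE cur;
COMMIT;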

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] one more index for pg_tablespace?

2010-02-15 Thread Hans-Jürgen Schönig

hello ...

i have come across an interesting corner case this morning and i am not sure
if it is worth treating this as a bug or just as "bad luck".

imagine creating a directory along with a tablespace ...
hans-jurgen-schonigs-macbook:html hs$ mkdir /tmp/x
hans-jurgen-schonigs-macbook:html hs$ psql test
psql (8.4.1)
Type "help" for help.

test=# create tablespace x location '/tmp/x';
CREATE TABLESPACE
test=# create tablespace x2 location '/tmp/x';
ERROR:  directory "/tmp/x" is not empty
test=# \q

postgres errors out here correctly because it sees that the directory
is not empty. this is perfect ...


hans-jurgen-schonigs-macbook:html hs$ cd /tmp/x
hans-jurgen-schonigs-macbook:x hs$ ls
PG_VERSION
hans-jurgen-schonigs-macbook:x hs$ rm PG_VERSION

now, after killing the PG_VERSION file, i am able to create a
tablespace pointing to the same directory.

this should be prevented by one more unique index on the directory.

hans-jurgen-schonigs-macbook:x hs$ psql test
psql (8.4.1)
Type "help" for help.

test=# create tablespace x2 location '/tmp/x';
CREATE TABLESPACE
test=# \d pg_tablespace
   Table "pg_catalog.pg_tablespace"
   Column    |   Type    | Modifiers 
-------------+-----------+-----------
 spcname     | name      | not null
 spcowner    | oid       | not null
 spclocation | text      | 
 spcacl      | aclitem[] | 
Indexes:
    "pg_tablespace_oid_index" UNIQUE, btree (oid), tablespace "pg_global"
    "pg_tablespace_spcname_index" UNIQUE, btree (spcname), tablespace "pg_global"
Tablespace: "pg_global"

test=# SELECT * FROM pg_tablespace;
  spcname   | spcowner | spclocation | spcacl 
------------+----------+-------------+--------
 pg_default |       10 |             | 
 pg_global  |       10 |             | 
 x          |       10 | /tmp/x      | 
 x2         |       10 | /tmp/x      | 
(6 rows)

now, killing PG_VERSION manually is not what people do but what can  
happen is that, say, an NFS connection is gone or that somehow the  
directory is empty because of some other network filesystem doing some  
funny thing. it is quite realistic that this can happen.


how about one more unique index here?
pg_tablespace does not look too good with a duplicate entry ...
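
until something like that exists, a quick sanity check is easy to run by hand (this relies on the spclocation column as it is laid out in this catalog version):

SELECT spclocation, count(*)
FROM pg_tablespace
WHERE spclocation <> ''
GROUP BY spclocation
HAVING count(*) > 1;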

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] draft RFC: concept for partial, wal-based replication

2009-11-30 Thread Hans-Jürgen Schönig




Just a side note: in addition to its use for partial replication, this
might have potential for performance-prioritizing databases or  
tablespaces.




hello ...

this is an absolute non-starter. the WAL is designed to be "hyper
ordered" and hyper critical. once you fuck up the order you will end up
with a total disaster. WAL has to be applied in perfect order without
skipping dependent objects and so on. any concept which tries to get
around this fundamental law is simply broken.


hans


--
Cybertec Schönig & Schönig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] draft RFC: concept for partial, wal-based replication

2009-11-30 Thread Hans-Jürgen Schönig


On Nov 30, 2009, at 10:32 AM, Stefan Kaltenbrunner wrote:


Andres Freund wrote:

On Monday 30 November 2009 03:57:11 Itagaki Takahiro wrote:

Boszormenyi Zoltan  wrote:

we tried to discuss on a lower level what should be needed
for a partial replication based on streaming replication.
We need to discuss a "partial recovery" before the partial  
replication.
If you do the filtering on the sending side you dont actually need  
partial recover in the sense that you filter in the rmgr or similar.

Or do I miss something?


the question is if filtering on the sending side is actually the  
"right thing" to do.
It increases the overhead and the complexity on the master,  
especially if you think about different (partial) replication  
agreements for different slaves and it might also be hard to  
integrate with the planned sync/async modes.
On the other hand if you filter on the master you might be able to
avoid a lot of network traffic due to filtered wal records.
I think for a first step it might make more sense to look into doing  
the filtering on the receiving side and look into actual integration  
with SR at a later stage.



Stefan



hello ...

one problem with not filtering on the master is that you will end up
with a lot of complexity if you start adding new tables to a replica,
because you just cannot add tables as easily as when you are doing stuff
on the slave. the procedure seems way more complex.
in addition to that you are sending WAL which has to be discarded
anyway.
we thought about filtering "outside the master" a lot but to me it did
not sound like a good plan.


regards,

hans


--
Cybertec Schönig & Schönig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-09, two weeks on

2009-11-13 Thread Hans-Jürgen Schönig


On Nov 13, 2009, at 8:06 AM, Michael Meskes wrote:


On Thu, Nov 12, 2009 at 03:07:27PM -0500, Robert Haas wrote:
If you want to submit patches in a series like this one, they need  
to be
considered standalone, I think.  The Linux kernel devs work  
differently

than us here.


Zoltan broke them up because Michael asked him to do so.


Actually these patchsets add different features. I see no reason why  
they
should be done as one patch. However, I haven't had the time to look  
into the
latest ones, but at least that was the situation when I asked Zoltan  
to split

the patch.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ: 179140304, AIM/Yahoo/Skype: michaelmeskes, Jabber: mes...@jabber.org
VfL Borussia! Forca Barca! Go SF 49ers! Use: Debian GNU/Linux,  
PostgreSQL






good morning,

are there some pending technical issues with those patches or can we  
basically review and commit?


many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5

2009-05-11 Thread Hans-Jürgen Schönig

hello tom ...

the reason for SELECT FOR UPDATE is very simple:
this is the typical lock obtained by basically every business  
application if written properly (updating a product, whatever).
the problem with NOWAIT basically is that if a small transaction holds
a lock for a subsecond, you will already lose your transaction
because it does not wait at all (which is exactly what you want in
some cases). however, in many cases you want a compromise between waiting
forever and dying instantly.
depending on the code path we could decide how long to wait for which  
operation. this makes sense as we would only fire 1 statement instead  
of 3 (set, run, set back).


i agree that a GUC is definitely an option.
however, i would say that adding an extension to SELECT FOR UPDATE,
UPDATE and DELETE would make more sense from a usability point of view
(just my 0.02 cents).


if -hackers decides to go for a GUC, we are fine as well and we will
add it to 8.5.
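
to make the two variants concrete (the WAIT clause is only the proposed syntax, and the GUC name is just the one suggested in this thread - neither exists yet; t_product is a placeholder table):

-- proposed statement-level syntax:
SELECT * FROM t_product WHERE id = 42 FOR UPDATE WAIT 2;

-- GUC-based alternative: three statements instead of one
SET lock_timeout = '2s';
SELECT * FROM t_product WHERE id = 42 FOR UPDATE;
RESET lock_timeout;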


many thanks,

hans



On May 11, 2009, at 4:46 PM, Tom Lane wrote:


Hans-Juergen Schoenig  writes:
i would like to propose an extension to our SELECT FOR UPDATE  
mechanism.

especially in web applications it can be extremely useful to have the
chance to terminate a lock after a given timeframe.


I guess my immediate reactions to this are:

1. Why SELECT FOR UPDATE in particular, and not other sorts of locks?

2. That "clear and easy to use" oracle syntax sucks.  You do not want
to be embedding lock timeout constants in your application queries.
When you move to a new server and the appropriate timeout changes,
do you want to be trying to update your clients for that?

What I think has been proposed previously is a GUC variable named
something like "lock_timeout", which would cause a wait for *any*
heavyweight lock to abort after such-and-such an interval.  This
would address your point about not wanting to use an overall
statement_timeout, and it would be more general than a feature
that only works for SELECT FOR UPDATE row locks, and it would allow
decoupling the exact length of the timeout from application query
logic.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SYNONYMs revisited

2009-03-04 Thread Hans-Jürgen Schönig

Joshua Tolley wrote:

Way back in this thread[1] one of the arguments against allowing
some version of CREATE SYNONYM was that we couldn't create a synonym for
an object in a remote database. Will the SQL/MED work make this sort of
thing a possibility? I realize since it's not standard anyway, there's
still a discussion or two to be had about how precisely it should work,
but thought I'd raise the possibility.

- Josh / eggyknap

[1] http://archives.postgresql.org/pgsql-patches/2006-03/msg00085.php
  


I still think that this is a useful feature.
If i remember correctly there were two killer arguments against this:
   - it encourages people to do "lousy development"
   - extra overhead
personally i think that this won't be revisited - i cannot see a real
connection to SQL/MED here.


   best regards,

  hans


--
Cybertec Schönig & Schönig GmbH
PostgreSQL Support, Consulting, Training
www.postgresql-support.de



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hadoop backend?

2009-02-24 Thread Hans-Jürgen Schönig
why not just stream it in via set-returning functions and make sure
that we can mark a set-returning function as "STREAMABLE" or so (to
prevent joins, whatever)?

it is the easiest way to get this right and it helps in many other cases.
i think that the storage manager is definitely the wrong place to do
this.


it is also easy to use more than just one backend then if you get the  
interface code right.
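
a sketch of the kind of interface i have in mind, seen from the SQL level (hdfs_read and its C implementation are purely hypothetical and only serve to show the shape of the API):

-- hypothetical wrapper around a C function streaming records out of hadoop
CREATE FUNCTION hdfs_read(path text)
    RETURNS SETOF record
    AS 'MODULE_PATHNAME', 'hdfs_read'
    LANGUAGE C STRICT;

-- the column definition list tells postgres what the stream contains
SELECT *
FROM hdfs_read('/logs/2009/02/22') AS t(ts timestamptz, msg text)
WHERE msg LIKE '%ERROR%';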


regards,

hans


On Feb 24, 2009, at 12:03 AM, Jonah H. Harris wrote:

On Sun, Feb 22, 2009 at 3:47 PM, Robert Haas   
wrote:

In theory, I think you could make postgres work on any type of
underlying storage you like by writing a second smgr implementation
that would exist alongside md.c.  The fly in the ointment is that
you'd need a more sophisticated implementation of this line of code,
from smgropen:

   reln->smgr_which = 0;   /* we only have md.c at present */

I believe there is more than that which would need to be done  
nowadays.  I seem to recall that the storage manager abstraction has  
slowly been dedicated/optimized for md over the past 6 years or so.   
It may even be easier/preferred to write a hadoop specific access  
method depending on what you're looking for from hadoop.


--
Jonah H. Harris, Senior DBA
myYearbook.com




--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de



Re: [HACKERS] Hadoop backend?

2009-02-22 Thread Hans-Jürgen Schönig

hi ...

i think the easiest way to do this is to simply add a mechanism to  
functions which allows a function to "stream" data through.
it would basically mean losing join support as you cannot "read the data
again" in a way which is good enough for joining with the
function providing the data from hadoop.


hannu (i think) brought up a similar concept some time ago.

i think a straight forward implementation would not be too hard.

best regards,

hans



On Feb 22, 2009, at 3:37 AM, pi song wrote:


1) Hadoop file system is very optimized for mostly read operation
2) As of a few months ago, hdfs doesn't support file appending.

There might be a bit of impedance to make them go together.

However, I think it would be a very good initiative to come up with
ideas to be able to run postgres on a distributed file system (doesn't
have to be specifically hadoop).


Pi Song

On Sun, Feb 22, 2009 at 7:17 AM, Paul Sheer   
wrote:

Hadoop backend for PostGreSQL

A problem that my client has, and one that I come across often,
is that a database seems to always be associated with a particular
physical machine, a physical machine that has to be upgraded,
replaced, or otherwise maintained.

Even if the database is replicated, it just means there are two or
more machines. Replication is also a difficult thing to properly
manage.

With a distributed data store, the data would become a logical
object - no adding or removal of machines would affect the data.
This is an ideal that would remove a tremendous maintenance
burden from many sites ... well, at least the ones I have worked
at, as far as I can see.

Does anyone know of plans to implement PostGreSQL over Hadoop?

Yahoo seems to be doing this:
 
http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html

But they store tables column-ways for their performance situation.
If one is doing a lot of inserts I don't think this is most  
efficient - ?


Has Yahoo put the source code for their work online?

Many thanks for any pointers.

-paul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de



Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!

2008-10-30 Thread Hans-Jürgen Schönig

Bruce Momjian wrote:

Greg Stark wrote:
  
I couldn't get async I/O to work on Linux. That is it "worked" but  
performed the same as reading one block at a time. On solaris the  
situation is reversed.


In what way is fadvise a kludge?



I think he is saying AIO gives us more flexibility, but I am unsure we
need it.
  



absolutely.
posix_fadvise is easy to implement and i would assume that it takes away 
a lot of "guessing" on the OS internals side.
the database usually knows that it is gonna read a lot of data in a 
certain way and it cannot be a bad idea to give the kernel a hint here.
especially synchronized seq scans and so on are real winners here as you 
stop confusing the kernel with XX concurrent readers on the same file.

this can also be an issue with some controller firmwares and so on.

   many thanks,

  hans


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Initial prefetch performance testing

2008-09-22 Thread Hans-Jürgen Schönig


On Sep 22, 2008, at 12:02 PM, Simon Riggs wrote:



On Mon, 2008-09-22 at 04:57 -0400, Greg Smith wrote:


-As Greg Stark suggested, the larger the spindle count the larger the
speedup, and the larger the prefetch size that might make sense.  His
suggestion to model the user GUC as "effective_spindle_count" looks  
like a
good one.  The sequential scan fadvise implementation patch  
submitted uses
the earlier preread_pages name for that parameter, which I agree  
seems

less friendly.


Good news about the testing.



absolutely; we ran tests and got similar figures.
also, I/O is much more stable and steady with the patch.




I'd prefer to set this as a tablespace level storage parameter. Since
that is where it would need to live when we have multiple tablespaces.
Specifically as a storage parameter, so we have same syntax for
table-level and tablespace-level storage parameters. That would also
allow us to have tablespace-level defaults for table-level settings.




+1


prefetch_... is a much better name since it's an existing industry term.

I'm not in favour of introducing the concept of spindles, since I can
almost hear the questions about ramdisks and memory-based storage.  Plus
I don't ever want to discover that the best setting for
effective_spindles is 7 (or 5) when I have 6 disks because of some
technology shift or postgres behaviour change in the future.




i would definitely avoid the use of "spindles".
i totally agree with simon here. once mature SSD storage or some
in-memory stuff is available for the masses, this concept is not
suitable anymore.
the best thing would be to simply use the parameter as it was in the
original patch.
maybe we should simply make the parameter adjustable per table and per
index. this would automatically cover 95% of all cases, such as
clustered tables and so on.


many thanks and best regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Toasted table not deleted when no out of line columns left

2008-09-22 Thread Hans-Jürgen Schönig


On Sep 22, 2008, at 9:46 AM, Simon Riggs wrote:



On Sun, 2008-09-21 at 12:05 -0400, Tom Lane wrote:

... and it goes on to point out how to force immediate space reclamation
if you need that.  These statements apply independently of whether any
particular value is toasted or not.

The reason for this choice is that reclaiming the space immediately
would turn DROP COLUMN from a quick operation into a slow one, as it
would have to grovel over every row of the table looking for TOAST
pointers.


Judging from that, the toasted table
cleanup may be part of ALTER TABLE DROP COLUMN.


I thought Hans meant cleanup, not drop?

Perhaps there is room for a function that scans a toast table to remove
unreferenced toast data? It could be done much more efficiently than the
UPDATE and VACUUM FULL technique. No need to add it into DROP COLUMN,
but that doesn't mean it shouldn't be available somewhere, somehow.

Hans is likely to write this anyway for his customer, so it seems worth
defining how it should look so we can accept it into core. VACUUM TOAST
perhaps?




hello simon,

we definitely have to do something about this problem. VACUUM FULL is
not an option at all.
once the last toastable column (e.g. text) is gone, we definitely
have to reclaim the space.
we just cannot afford to lose hundreds of gigs of good storage because
of this missing feature.


so, to comment on tom's answer - it is not about not understanding "no";
it was more a request for a "how to do it best", because we have to
do it somehow.


best regards,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Toasted table not deleted when no out of line columns left

2008-09-21 Thread Hans-Jürgen Schönig




*snip*


Judging from that, the toasted table
cleanup may be part of ALTER TABLE DROP COLUMN.


That would only help if you were dropping the last potentially-toastable
column of a table.  And implementing it would require introducing weird
corner cases into the tuple toaster, because it might now come across
TOAST pointers that point to a no-longer-existent table, and have to
consider that to be a no-op instead of an error condition.

regards, tom lane





tom,

in our test case we had a table with 10 integer columns (nothing else)
along with a 10 GB toast table - this is why we were a little surprised.

in this case it can definitely be cleaned up.
it is clear that we don't want to touch the rows directly when a
column is dropped - however, if there is not a single toastable
column left, we should definitely clean up.

we will put together a patch within the next few days to cover this case.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-17 Thread Hans-Jürgen Schönig




this entire thing is not about cartesian products at all.
it is about kicking out "expensive" queries before they even start to
eat up tons of CPU.
imagine a user asking for "give me all phone calls in the US within the
past 10 years". you could reject that query instantly because you know
it would take ages.
in addition to that, you know that in an OLTP context everything which
is expected to take longer than X cannot be useful anyway.
this has nothing to do with cartesian products or other bad things you
can do in SQL.

it is just a simple, heuristic check.

many thanks,

hans




My point is that people should _know_ they are using a cartesian
product, and a warning would do that for users who have no need for a
cartesian product and want to be warned about a possible error.

--
 Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
 EnterpriseDB http://enterprisedb.com

 + If your life is a hard drive, Christ can be your backup. +



--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-03 Thread Hans-Jürgen Schönig

hello ...




I still support it. Regrettably, many SQL developers introduce product
joins and other unintentional errors. Why let problem queries through?



i think the killer feature is that we don't have to wait until the query
dies with a statement_timeout.
it is way more elegant to kill things before they have eaten
too many cycles.
one thing which is important as well: statement_cost_limit does not
kill queries which have merely been waiting for a lock.

this makes things slightly more predictable.



Security-wise they're great Denial of Service attacks, bringing the
server to its knees better than most ways I know, in conjunction with a
nice hefty work_mem setting. 27 table product joins: memory, CPU, I/O
and diskspace resources used all in a simple killer query.




i am not too concerned about DoS, i have to admit.
i would rather see it as a way to make developers do better things.



If anybody thinks costs are inaccurate, don't use it. Or better still
improve the cost models. It isn't any harder or easier to find a useful
value than it is to use statement_timeout. What's the difference between
picking an arbitrary time and an arbitrary cost? You need to alter the
value according to people's complaints in both cases.



the cost model is good enough to see if something is good or bad.
this is basically all we want to do here --- killing all evil.



*snip*






A compromise would be to have log_min_statement_cost (or
warn_min_statement_cost) which will at least help find these problems in
testing before we put things live, but that still won't help with
production issues.




definitely. a good idea as well - but people will hardly read it, i  
guess :(.



Another alternative would be to have a plugin that can examine the plan
immediately after planner executes, so you can implement this yourself,
plus some other possibilities.




this would be really fancy.
what would a plugin like that look like?
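
just so we are talking about the same thing - this is roughly how i would
imagine such a plugin, based on the planner_hook (a rough, untested sketch;
the module name, function name and cost limit are made up):

/* rough sketch only - untested, names are made up */
#include "postgres.h"
#include "fmgr.h"
#include "optimizer/planner.h"

PG_MODULE_MAGIC;

void _PG_init(void);

static planner_hook_type prev_planner_hook = NULL;

/* hardcoded to keep the sketch short; a real module would expose a GUC */
static double max_plan_cost = 1000000.0;

static PlannedStmt *
cost_guard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
    PlannedStmt *result;

    /* let the regular planner (or the next hook in the chain) do its work */
    if (prev_planner_hook)
        result = prev_planner_hook(parse, cursorOptions, boundParams);
    else
        result = standard_planner(parse, cursorOptions, boundParams);

    /* kick out the statement if its estimated total cost is above the limit */
    if (result->planTree->total_cost > max_plan_cost)
        ereport(ERROR,
                (errmsg("estimated plan cost %.2f exceeds limit %.2f",
                        result->planTree->total_cost, max_plan_cost)));

    return result;
}

void
_PG_init(void)
{
    prev_planner_hook = planner_hook;
    planner_hook = cost_guard_planner;
}

something like this could be loaded via shared_preload_libraries and would
reject plans whose estimated cost is above the limit before execution starts.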

hans



--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-02 Thread Hans-Jürgen Schönig

On Aug 2, 2008, at 8:38 PM, Tom Lane wrote:


Andrew Dunstan <[EMAIL PROTECTED]> writes:

Hans-Jürgen Schönig wrote:

i introduced a GUC called statement_cost_limit which can be used to
error out if a statement is expected to be too expensive.



You clearly have far more faith in the cost estimates than I do.


Wasn't this exact proposal discussed and rejected awhile back?

regards, tom lane




i don't remember precisely.
i have seen it on simon's wiki page and it is something which would  
have been useful in some cases in the past.


many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Mini improvement: statement_cost_limit

2008-08-02 Thread Hans-Jürgen Schönig

hello ...

i picked up csaba nagy's idea and implemented a very simple yet very  
useful extension.
i introduced a GUC called statement_cost_limit which can be used to  
error out if a statement is expected to be too expensive.
the advantage over statement_timeout is that we are actually able to
error out before spending many seconds on a query that would be killed by
statement_timeout anyway.


best regards,

hans




statement_cost_limit1.patch
Description: Binary data



--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Bug with UTF-8 character

2006-05-25 Thread Hans-Jürgen Schönig

good morning,

I got a bug report about the following Unicode character in PostgreSQL
8.1.4: 0xedaeb8


ERROR:  invalid byte sequence for encoding "UTF8": 0xedaeb8

This one seemed to work properly in PostgreSQL 8.0.3.

I think the following code in PostgreSQL 8.1.4 has a bug in it.

File: postgresql-8.1.4/src/backend/utils/mb/wchar.c


The entry values to the function are:

source = ed ae b8 20 20 20 20 20 20 20 20 20 20 20 20

length = 3 (length is the length of current utf-8 character)

But the code does a check that the second byte must not be greater
than 0x9F when the first byte is 0xED. This is not according to the
UTF-8 standard in RFC 3629, and I believe it is not a valid test.
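
For reference, this is roughly the shape of the check in question as I read
it (my own sketch with a made-up function name, not the actual wchar.c code):

/* sketch only - not the actual wchar.c code */
#include <stdbool.h>

/* validity of the second byte of a three-byte UTF-8 sequence */
static bool
second_byte_ok(unsigned char b0, unsigned char b1)
{
    if (b0 == 0xE0)
        return b1 >= 0xA0 && b1 <= 0xBF;   /* excludes overlong encodings */
    if (b0 == 0xED)
        return b1 >= 0x80 && b1 <= 0x9F;   /* 0xA0..0xBF would encode U+D800..U+DFFF */
    return b1 >= 0x80 && b1 <= 0xBF;
}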


This test fails on our string, when it shouldn’t.

I believe this is a bug, could you please confirm or let me know what I 
am doing wrong.



Many thanks,

Hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Tru64/Alpha problems

2006-04-07 Thread Hans-Jürgen Schönig

Tom Lane wrote:

Andrew Dunstan <[EMAIL PROTECTED]> writes:

I'd like to know some settings that we can use that will get Tru64 
cleanly through the buildfarm set. If noone offers any, I propose that 
we revert the getaddrinfo() test in configure and use our own on Tru64 
until they do.



I have not had any response to this. Is there any objection to my 
reverting the configure changes for the head and 8.1 branches?



Presumably, whoever was complaining beforehand will come back ...
but I don't remember who that was.

regards, tom lane




i think the issue you are referring to comes from a Solaris report.
some patch levels of Solaris have a seriously broken getaddrinfo(). in
that case pg_hba.conf cannot be read anymore.
we got a similar report some time ago. we did a simple configure tweak
to make sure that our own built-in replacement is used instead. it seems
to happen only on some strange patch levels (god knows which ones).


best regards,

hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposal for SYNONYMS

2006-03-09 Thread Hans-Jürgen Schönig

Jonah H. Harris wrote:



This email is a preliminary design for the implementation of synonyms in 
PostgreSQL.  Comments and suggestions are welcomed.


BACKGROUND

Synonyms are database objects which can be used in place of their 
referenced object in SELECT, INSERT, UPDATE, and DELETE SQL statements.


There are two reasons to use synonyms which include:

- Abstraction from changes made to the name or location of database objects
- Alternative naming for another database object

Similarly, RDBMS support for synonyms exists in Oracle, SQL Server, DB2, 
SAP DB/MAX DB, and Mimer.


PROPOSED SQL ADDITIONS

CREATE SYNONYM qualified_name FOR qualified_name
DROP SYNONYM qualified_name

In addition, SYNONYMS do participate in ACLs and support GRANT/REVOKE 
for table privileges. DROP TABLE and TRUNCATE cannot be used with synonyms.


DESCRIPTION

- A synonym can be created for a table, view, or synonym.
- Synonyms can reference objects in any schema

RESTRICTIONS

- A synonym may only be created if the creator has some access privilege 
on the referenced object.

- A synonym can only be created for an existing table, view or synonym.
- A synonym name cannot be the same as the name of any other table, view 
or synonym which exists in the schema where the synonym is to be created.


PROPOSED IMPLEMENTATION

- Introduce a new relkind for synonyms
- Synonyms only act as pointers to a real object by oid
- Permission on a synonym does not override the permission on the 
referenced object
- Referenced objects becomes dependencies of the synonyms that reference 
them

- Synonyms follow PostgreSQL's current search_path behavior

RUNTIME COST

- Dependent on database user/administrator
- In catalog searches which do not reference a synonym, the only cost 
incurred is that of searching the additional number of synonym objects 
in the catalog
- In catalog searches which use a synonym, an additional cost is 
incurred to reference the real object

- If no synonyms are created, no additional costs are incurred




hi jonah ...

the main problem i can see here is that it is strictly limited to 
objects stored in pg_class.
however, support for stored procedures would be cool as well. what do 
you suggest for those?


best regards,

hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] status of concurrent VACUUM patch ...

2005-12-20 Thread Hans-Jürgen Schönig

i was just wondering about the status of hannu's concurrent vacuum patch.
are there any plans to integrate this or are there still improvements 
which have to be made?


many thanks,

  hans-juergen schoenig

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] request for enhancement of protocol

2005-11-19 Thread Hans-Jürgen Schönig


i think if the protocol is enhanced again we should also consider adding
protocol-level support for RESET CONNECTION.
i sent this patch some time ago, but i think it is not worth doing
the remaining protocol-level changes (the SQL-level support is finished) if
this is the only change at the protocol level.


best regards,

hans


Pavel Stehule wrote:

Hello

Maybe it is time for some changes. Maybe. I don't have the courage for it,
but maybe it is a good time for a discussion.  What do I miss in the protocol?


1. debug support + another level for elog. The current elog is too heavy
(sometimes).
2. multiple result sets. This is necessary to support procedures in the DB2,
MySQL, "ANSI", MSSQL style.
3. session (package) variables and calling procedures with OUT, INOUT in the
normal style, i.e. the CALL statement - a heavy task, because I can write both
function a(IN int, IN int) and a(OUT int, OUT int) now. This is a problem and
needs a restriction.

4. ping

What is my motivation for 2?
 1. I can write a "solution" - a stored application. Example: info about the
growth of a database. The output is n tables: the first table is info about
the database, the others about the top n - 1 tables, ...
 2. easy reporting. I currently have no way to write a stored procedure for
generating a cross table. I have to do it all in two steps (for example):
generate a view, then select from the view. This is the difference between
procedures and functions. A function has to have an exactly defined
interface; procedures don't.

 3. easy porting from databases which support this style.

sorry for my poor English.

best regards
Pavel Stehule




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match



--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] comments on prepared transactions ...

2005-10-06 Thread Hans-Jürgen Schönig

i had to deal with oracle in the past couple of days (*mega sigh*) and
i have seen a very interesting feature which would also make sense for
PostgreSQL users.


currently we have:

test=# \h PREPARE TRANSACTION
Command: PREPARE TRANSACTION
Description: prepare the current transaction for two-phase commit
Syntax:
PREPARE TRANSACTION transaction_id

in oracle it is possible to comment transactions:

COMMIT COMMENT 'ORA-2PC-CRASH-TEST-n';

if we added the possibility to attach a comment to a prepared transaction, it
would be far easier for DBAs to figure out what to do with prepared
transactions once something has gone wrong (at least if the application puts
some useful data into the comment). usually when the DBA has to fix something
it has to be done FAST - some additional info would definitely help here ...


what do people think about this?

best regards,

hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] postgresql clustering

2005-09-30 Thread Hans-Jürgen Schönig

Luke Lonergan wrote:

Dan,

On 9/29/05 3:23 PM, "Daniel Duvall" <[EMAIL PROTECTED]> wrote:



What about clustered filesystems?  At first blush I would think the
overhead of something like GFS might kill performance.  Could one
potentially achieve a fail-over config using multiple nodes with GFS,
each having there own instance of PostgreSQL (but only one running at
any given moment)?



Interestingly - my friend Matt O'Keefe built GFS at UMN, I was one of his
first customers/sponsors of the research in 1998 when I implemented an
8-node shared disk cluster on Alpha Linux using GFS and Fibre Channel.

Again - it depends on what you're doing - if it's OLTP, you will spend too
much time in lock management for disk access and things like Oracle RAC's
CacheFusion become critical to reduce the number of times you have to hit
disks.  



Hitting the disk is really bad. However, we have seen that going over
the network for small portions of data (e.g. locks) is even more
critical. you will see that the CPU on all nodes is running at 1% or so
while the nodes are waiting for data to be exchanged over the network
(latency) - this is the real problem.


i don't know what oracle is doing in detail, but they have a real problem
when losing a node inside the cluster (syncing up again is really time
consuming).




For warehousing/sequential scans, this kind of clustering is
irrelevant.


I suggest looking at Teradata - they do really nice query partitioning on
so-called AMPs (we'd simply call them nodes). It is really nice for really
ugly warehousing queries (ugly in terms of the amount of data).


Hans



--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: R: [HACKERS] feature proposal ...

2005-09-21 Thread Hans-Jürgen Schönig

Joshua D. Drake wrote:

Hans-Jürgen Schönig wrote:


no, because a view is not a heap ...



Why not use a function with a temporary table?

That way you can pass a table parameter that
is the temporary table with a select statement
that you can populate the temp table with.

Sincerely,

Joshua D. Drake




hi joshua ...

temp tables are not an option - there is too much data around.
views are better here, i think ...

cheers,

hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: R: [HACKERS] feature proposal ...

2005-09-21 Thread Hans-Jürgen Schönig

no, because a view is not a heap ...

em=# create view x as select * from pg_class;
CREATE VIEW

em=# copy x to '/tmp/x';
ERROR:  cannot copy from view "x"

best regards,

hans



Paolo Magnoli wrote:

Can't you just use a view?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] on behalf of Hans-Jürgen
Schönig
Sent: Wednesday, 21 September 2005 15:30
To: pgsql-hackers@postgresql.org; [EMAIL PROTECTED]
Subject: [HACKERS] feature proposal ...


hackers,

currently we have to hack tons of export scripts for various customers.
the problem is: if tables can be exported straightforwardly, COPY will
give you all you need, but when data has to be transformed while
exporting, things start becoming a bit more complex. usually people want
to have a CSV file (excel-ified data), which is supported by COPY.

the problem is: COPY cannot write data returned by a SELECT statement to a
file. our idea is to implement precisely that.

example:

COPY TO file_name USING some_select_statement;

the advantage would be that COPY would then be able to export data and
transform it on the fly. this would save many people a lot of work
because complex data extractors could in many cases be replaced by
simple SQL scripts.

how we plan to implement that:
currently copy simply opens a table and loops through the tuples (see
command/copy.c starting at line 1115).
to implement the desired feature we would just have to add some SPI code to
the scenery (SPI also returns HeapTuples, so it should fit in there).

Any comments?

Best regards,

Hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 6: explain analyze is your friend




--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] feature proposal ...

2005-09-21 Thread Hans-Jürgen Schönig

Rod Taylor wrote:
the problem is: COPY cannot write data returned by a SELECT statement to a
file. our idea is to implement precisely that.


example:

COPY TO file_name USING some_select_statement;



I have run into plenty of cases where I wanted to dump part of a
structure and this could be used for that, but I've always found that
temporary tables were sufficient and equally SQL-scriptable:

CREATE TEMP TABLE tab AS SELECT ...; COPY tab TO file_name;



Hi Rod,

TEMP TABLEs are not suitable for my case. Using a temp table would
essentially mean that we have to store the data 3 times: original data,
temp table, and dump. Temp tables are only fine for small amounts of data,
but we are talking about too much data here (my smallest export will
contain 15,000,000 records).


Best regards,

Hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[HACKERS] feature proposal ...

2005-09-21 Thread Hans-Jürgen Schönig

hackers,

currently we have to hack tons of export scripts for various customers.
the problem is: if tables can be exported straightforwardly, COPY will
give you all you need, but when data has to be transformed while
exporting, things start becoming a bit more complex. usually people want
to have a CSV file (excel-ified data), which is supported by COPY.


the problem is: COPY cannot write data returned by a SELECT statement to a
file. our idea is to implement precisely that.


example:

COPY TO file_name USING some_select_statement;

the advantage would be that COPY would then be able to export data and 
transform it on the fly. this would save many people a lot of work 
because complex data extractors could in many cases be replaced by 
simple SQL scripts.


how we plan to implement that:
currently copy simply opens a table and loops through the tuples (see 
command/copy.c starting at line 1115).
to implement the desired feature we would just have to add some SPI code to
the scenery (SPI also returns HeapTuples, so it should fit in there).
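
roughly, this is the kind of loop we have in mind (a simplified, untested
sketch with a made-up function name - not the actual patch):

/* simplified sketch only - not the actual patch */
#include "postgres.h"
#include "executor/spi.h"

static void
copy_query_to_file(const char *query, FILE *fp)
{
    int         row;

    if (SPI_connect() != SPI_OK_CONNECT)
        elog(ERROR, "SPI_connect failed");

    /* note: the real thing should use an SPI cursor so that the whole
     * result set does not have to be materialized in memory */
    if (SPI_exec(query, 0) != SPI_OK_SELECT)
        elog(ERROR, "query did not return tuples");

    for (row = 0; row < (int) SPI_processed; row++)
    {
        HeapTuple   tuple = SPI_tuptable->vals[row];
        TupleDesc   tupdesc = SPI_tuptable->tupdesc;
        int         att;

        /* one output line per tuple, attributes separated by tabs */
        for (att = 1; att <= tupdesc->natts; att++)
        {
            char       *value = SPI_getvalue(tuple, tupdesc, att);

            fprintf(fp, "%s%s", value ? value : "\\N",
                    att < tupdesc->natts ? "\t" : "\n");
        }
    }

    SPI_finish();
}

a real implementation would of course reuse the existing per-attribute
output code in copy.c instead of plain fprintf.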


Any comments?

Best regards,

Hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] DISTINCT vs. GROUP BY

2005-09-19 Thread Hans-Jürgen Schönig
I was wondering whether it is possible to teach the planner to handle 
DISTINCT in a more efficient way:


em=# explain select distinct lastname from import.testtest;
   QUERY PLAN

 Unique  (cost=2647377.45..2709467.70 rows=1 width=7)
   ->  Sort  (cost=2647377.45..2678422.58 rows=12418051 width=7)
 Sort Key: lastname
 ->  Seq Scan on testtest  (cost=0.00..370082.51 rows=12418051 width=7)

(4 rows)


Isn't it possible to perform the same operation using a HashAggregate?
We have seen that a GROUP BY workaround is usually a lot faster than 
sort->unique - at least when work_mem is large enough.


best regards,

hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] random system table corruption ...

2005-09-11 Thread Hans-Jürgen Schönig

Alvaro Herrera wrote:

On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote:

in the past we have faced a couple of problems with corrupted system 
tables. this seems to be a version-independent problem which comes up on
-hackers from time to time.
i have checked a broken file and i have seen that the corrupted page has 
actually been zeroed out.



IIRC the XFS filesystem zeroes out pages that it recovers from the
journal but did not have a fsync on them (AFAIK XFS journals only
metadata, so page creation but not the content itself).  I don't think
this would be applicable to your case, because we do fsync modified
files on checkpoint, and rewrite them completely from WAL images after
that.  But I thought I'd mention it.




alvaro,

thanks a lot.
we have some reports about sun systems.
meanwhile i got the impression that the filesystem might be doing
something wrong. i have seen that the page is not completely zeroed out:
at some odd positions there are 2 bytes of garbage (i had overlooked
that at first glance). the first couple of hundred bytes are garbage,
however. very strange ...


best regards,

hans

--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

