Re: [HACKERS] FDW and parallel execution

2017-04-11 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

did you check out Antonin Houska's patches?
we basically got code which can do that.

many thanks,

hans


On 04/02/2017 03:30 PM, Konstantin Knizhnik wrote:
> Hi hackers and personally Robert (you are the best expert in both areas).
> I want to ask one more question concerning parallel execution and FDW.
> Below are two plans for the same query (TPC-H Q5): one for normal
> tables, another for FDW to vertical representation of the same data.
> FDW supports analyze function and is expected to produce the similar
> statistic as for original tables.
>
> Query plans are the following:
>
> Normal tables:
>
>  Sort  (cost=2041588.48..2041588.98 rows=200 width=48)
>    Sort Key: (sum((lineitem.l_extendedprice * ('1'::double precision - lineitem.l_discount)))) DESC
>    ->  Finalize GroupAggregate  (cost=2041335.76..2041580.83 rows=200 width=48)
>          Group Key: nation.n_name
>          ->  Gather Merge  (cost=2041335.76..2041568.33 rows=1400 width=48)
>                Workers Planned: 7
>                ->  Partial GroupAggregate  (cost=2040335.64..2040396.71 rows=200 width=48)
>                      Group Key: nation.n_name
>                      ->  Sort  (cost=2040335.64..2040345.49 rows=3938 width=40)
>                            Sort Key: nation.n_name
>                            ->  Hash Join  (cost=605052.97..2040100.48 rows=3938 width=40)
>                                  Hash Cond: ((orders.o_custkey = customer.c_custkey) AND (nation.n_nationkey = customer.c_nationkey))
>                                  ->  Hash Join  (cost=525126.37..1951647.85 rows=98414 width=52)
>                                        Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
>                                        ->  Hash Join  (cost=3802.22..1404473.37 rows=654240 width=52)
>                                              Hash Cond: (lineitem.l_suppkey = supplier.s_suppkey)
>                                              ->  Parallel Seq Scan on lineitem  (cost=0.00..1361993.36 rows=8569436 width=16)
>                                              ->  Hash  (cost=3705.97..3705.97 rows=7700 width=44)
>                                                    ->  Hash Join  (cost=40.97..3705.97 rows=7700 width=44)
>                                                          Hash Cond: (supplier.s_nationkey = nation.n_nationkey)
>                                                          ->  Seq Scan on supplier  (cost=0.00..3090.00 rows=10 width=8)
>                                                          ->  Hash  (cost=40.79..40.79 rows=15 width=36)
>                                                                ->  Hash Join  (cost=20.05..40.79 rows=15 width=36)
>                                                                      Hash Cond: (nation.n_regionkey = region.r_regionkey)
>                                                                      ->  Seq Scan on nation  (cost=0.00..17.70 rows=770 width=40)
>                                                                      ->  Hash  (cost=20.00..20.00 rows=4 width=4)
>                                                                            ->  Seq Scan on region  (cost=0.00..20.00 rows=4 width=4)
>                                                                                  Filter: ((r_name)::text = 'ASIA'::text)
>                                        ->  Hash  (cost=484302.37..484302.37 rows=2256542 width=8)
>                                              ->  Seq Scan on orders  (cost=0.00..484302.37 rows=2256542 width=8)
>                                                    Filter: ((o_orderdate >= '1996-01-01'::date) AND (o_orderdate < '1997-01-01'::date))
>                                  ->  Hash  (cost=51569.64..51569.64 rows=1499864 width=8)
>                                        ->  Seq Scan on customer  (cost=0.00..51569.64 rows=1499864 width=8)
>
>
> Plan with FDW:
>
>  Sort  (cost=2337312.28..2337312.78 rows=200 width=48)
>    Sort Key: (sum((lineitem_fdw.l_extendedprice * ('1'::double precision - lineitem_fdw.l_discount)))) DESC
>    ->  GroupAggregate  (cost=2336881.54..2337304.64 rows=200 width=48)
>          Group Key: nation.n_name
>          ->  Sort  (cost=2336881.54..2336951.73 rows=28073 width=40)
>                Sort Key: nation.n_name
>                ->  Hash Join  (cost=396050.65..2334807.39 rows=28073 width=40)
>                      Hash Cond: ((orders_fdw.o_custkey = customer_fdw.c_custkey) AND (nation.n_nationkey = customer_fdw.c_nationkey))
>                      ->  Hash Join  (cost=335084.53..2247223.46 rows=701672 width=52)
>                            Hash Cond: (lineitem_fdw.l_orderkey = orders_fdw.o_orderkey)
>                            ->  Hash Join  (cost=2887.07..1786058.18 rows=4607421 width=52)
>                                  Hash Cond: (lineitem_fdw.l_suppkey = supplier_fdw.s_suppkey)
> 

Re: [HACKERS] remove checkpoint_warning

2016-07-12 Thread PostgreSQL - Hans-Jürgen Schönig



On 07/09/2016 11:12 PM, Tom Lane wrote:

Alvaro Herrera <alvhe...@2ndquadrant.com> writes:

the checkpoint_warning feature was added by commit 2986aa6a668bce3cfb836
in November 2002 when we didn't have any logging of checkpointing at
all.  I propose to remove it: surely anyone who cares about analyzing
checkpointing behavior nowadays is using the log_checkpoint feature
instead, which contains much more detail.  The other one is just noise
now, and probably ignored amidst the number of other warning traffic.

Hmm, not sure.  ISTM log_checkpoint is oriented to people who know what
they are doing, whereas checkpoint_warning is more targeted to trying
to help people who don't.  Perhaps you could make an argument that
checkpoint_warning is useless because the people whom it's meant to help
won't notice the warning anyway --- but I doubt that it's been
"superseded" by log_checkpoint, because the latter would only be enabled
by people who already have a clue that checkpoint performance is something
to worry about.

Or in short, this may be a fine change to make, but I don't like your
argument for it.

regards, tom lane




i think tom is right here.
log_checkpoints and checkpoint_warning are for totally different people.
we might just want to do one thing: state explicitly that the database
cannot break if this warning shows up.
many people are scared to death that this warning somehow indicates that
PostgreSQL is about to go up in flames, which is of course not true.
maybe we could reword the hint to something like "consider increasing  to
ensure good performance" or so ...
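
for reference, a rough sketch of the two knobs being discussed (the log output
shown in the comments is from memory and its exact wording differs between
versions, so treat it as an approximation):

-- log_checkpoints: detailed logging for people who already know they care
ALTER SYSTEM SET log_checkpoints = on;

-- checkpoint_warning: complain when checkpoints happen closer together than this (default 30s)
ALTER SYSTEM SET checkpoint_warning = '30s';
SELECT pg_reload_conf();

-- the warning in question then shows up in the server log roughly as:
--   LOG:  checkpoints are occurring too frequently (9 seconds apart)
--   HINT:  Consider increasing the configuration parameter "max_wal_size".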


regards,

hans

--
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Data at rest encryption

2016-06-15 Thread PostgreSQL - Hans-Jürgen Schönig



On 06/14/2016 09:59 PM, Jim Nasby wrote:

On 6/12/16 2:13 AM, Ants Aasma wrote:

On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
<kommi.harib...@gmail.com> wrote:

> 1. Instead of doing the entire database files encryption, how about
> providing user an option to protect only some particular tables that
> wants the encryption at table/tablespace level. This not only 
provides

> an option to the user, it reduces the performance impact on tables
> that doesn't need any encryption. The problem with this approach
> is that every xlog record needs to validate to handle the encryption
> /decryption, instead of at page level.

Is there a real need for this? The customers I have talked to want to
encrypt the whole database and my goal is to make the feature fast
enough to make that feasible for pretty much everyone. I guess
switching encryption off per table would be feasible, but the key
setup would still need to be done at server startup. Per record
encryption would result in some additional information leakage though.
Overall I thought it would not be worth it, but I'm willing to have my
mind changed on this.


I actually design with this in mind. Tables that contain sensitive 
info go into designated schemas, partly so that you can blanket move 
all of those to an encrypted tablespace (or safer would be to move 
things not in those schemas to an unencrypted tablespace). Since that 
can be done with an encrypted filesystem maybe that's good enough. 
(It's not really clear to me what this buys us over an encrypted FS, 
other than a feature comparison checkmark...)
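
(as an aside, the schema / tablespace shuffling described above boils down to
something like this - tablespace, path and table names are made up for
illustration:)

CREATE TABLESPACE enc_space LOCATION '/mnt/encrypted_volume/pgdata';
ALTER TABLE sensitive.customer_cards SET TABLESPACE enc_space;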


the reason why this is needed is actually very simple: security
guidelines and legal requirements ...
we have dealt with a couple of companies recently who explicitly
demanded PostgreSQL-level encryption in a transparent way to fulfill
some internal or legal requirements. this is especially true for
financial stuff. and yes, sure ... you can do a lot with
filesystem encryption.
the core idea of this entire thing is however to have a counterpart on
the database level. if you don't have the key you cannot start the
instance, and if you happen to get access to the filesystem you are still
not able to fire up the DB.

as i said: these are requirements from ever bigger companies.

as far as benchmarking is concerned: i did a quick test yesterday (not
with the final AES implementation yet) and i got pretty good results.
with a reasonably well cached database in a typical application i expect
to lose around 10-20%. if everything fits in memory there is no loss at
all. the worst i saw with plain AES (no hardware support used yet) was a
loss of around 45% or so, but that requires shared_buffers as low as
32 MB or so.


many thanks,

hans


--
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Implement failover on libpq connect level.

2015-08-18 Thread PostgreSQL - Hans-Jürgen Schönig

 On 18 Aug 2015, at 11:19, Albe Laurenz laurenz.a...@wien.gv.at wrote:
 
 Hans-Jürgen Schönig wrote:
 in addition to that you have the “problem” of transactions. if you failover 
 in the middle
 of a transaction, strange things might happen from the application point of 
 view.
 
 the good thing, however, is that stupid middleware is sometimes not able to 
 handle
 failed connections. however, overall i think it is more of a danger than a 
 benefit.
 
 Maybe I misunderstood the original proposal, but my impression was that the 
 alternative
 servers would be tried only at the time the connection is established, and 
 there would be no
 such problems as you describe.



it would still leave the problem of having a read-only server on the other side
unless you are using BDR or something like that.

regards,

hans




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Implement failover on libpq connect level.

2015-08-18 Thread PostgreSQL - Hans-Jürgen Schönig

 On 18 Aug 2015, at 10:32, Albe Laurenz laurenz.a...@wien.gv.at wrote:
 
 Victor Wagner wrote:
 Rationale
 =
 
 Since introduction of the WAL-based replication into the PostgreSQL, it is
 possible to create high-availability and load-balancing clusters.
 
 However, there is no support for failover in the client libraries. So, only
 way to provide transparent for client application failover is IP address
 migration. This approach has some limitation, i.e. it requires that
 master and backup servers reside in the same subnet or may not be
 feasible for other reasons.
 
 Commercial RDBMS, such as Oracle, employ more flexible approach. They
 allow to specify multiple servers in the connect string, so if primary
 server is not available, client library tries to connect to other ones.
 
 This approach allows to use geographically distributed failover clusters
 and also is a cheap way to implement load-balancing (which is not
 possible with IP address migration).
 
 I wonder how useful this is at the present time.
 
 If the primary goes down and the client gets connected to the standby,
 it would have read-only access there.  Most applications wouldn't cope
 well with that.
 
 Once we have multi-master replication that can be used for fail-over,
 the picture will change.  Then a feature like that would be very useful 
 indeed.
 
host=main-server host=standby1 host=standby2 port=5432 dbname=database
 
 It seems a bit arbitrary to require that all servers use the same port.
 
 Maybe parameters like host2, port2, host3, port3 etc. might be better.
 
 Yours,
 Laurenz Albe


i totally agree with laurenz.
in addition to that you have the “problem” of transactions. if you failover in 
the middle 
of a transaction, strange things might happen from the application point of 
view.

the good thing, however, is that stupid middleware is sometimes not able to 
handle
failed connections. however, overall i think it is more of a danger than a 
benefit.

regards,

hans



 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small GIN optimizations (after 9.4)

2014-02-06 Thread PostgreSQL - Hans-Jürgen Schönig
i think there is one more thing which would be really good in GIN and which
would solve a ton of issues.
atm GIN entries are sorted by item pointer.
if we could sort them by a column instead it would fix a couple of real-world
issues such as ...

SELECT ... FROM foo WHERE tsearch_query ORDER BY price DESC LIMIT 10

... or so (see the sketch below).
in many cases you want to search for, say, a product and find the cheapest /
most expensive one.
if the tsearch_query yields a high number of rows (which it often does) the
subsequent sort will kill you.
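
to make the pattern concrete, a minimal sketch (table, index and query are made
up for illustration, not taken from a real workload):

CREATE TABLE foo (id serial PRIMARY KEY, title text, price numeric);
CREATE INDEX foo_title_fts ON foo USING gin (to_tsvector('simple', title));

-- the GIN index can answer the @@ predicate, but every matching row still has
-- to be fetched and sorted by price before the LIMIT can cut the result down
SELECT id, title, price
FROM foo
WHERE to_tsvector('simple', title) @@ to_tsquery('simple', 'product')
ORDER BY price DESC
LIMIT 10;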

many thanks,

hans


On Feb 6, 2014, at 12:36 PM, Heikki Linnakangas wrote:

 While hacking on the GIN patches, I've come up with a few different ideas for 
 improving performance. It's too late for 9.4, but I'll list them here if 
 someone wants to work on them later:
 
 * Represent ItemPointers as uint64's, to speed up comparisons. 
 ginCompareItemPointers is inlined into only a few instructions, but it's 
 still more expensive than a single-instruction 64-bit comparison. 
 ginCompareItemPointers is called very heavily in a GIN scan, so even a small 
 improvement there would make for a noticeable speedup. It might be an 
 improvement in code clarity, too.
 
 * Keep the entry streams of a GinScanKey in a binary heap, to quickly find 
 the minimum curItem among them.
 
 I did this in various versions of the fast scan patch, but then I realized 
 that the straightforward way of doing it is wrong, because a single 
 GinScanEntry can be part of multiple GinScanKeys. If an entry's curItem is 
 updated as part of advancing one key, and the entry is in a heap of another 
 key, updating the curItem can violate the heap property of the other entry's 
 heap.
 
 * Build a truth table (or cache) of consistent-function's results, and use 
 that instead of calling consistent for every item.
 
 * Deduce AND or OR logic from the consistent function. Or have the opclass 
 provide a tree of AND/OR/NOT nodes directly, instead of a consistent 
 function. For example, if the query is foo & bar, we could avoid calling 
 consistent function altogether, and only return items that match both.
 
 * Delay decoding segments during a scan. Currently, we decode all segments of 
 a posting tree page into a single array at once. But with fast scan, we 
 might be able to skip over all entries in some of the segments. So it would 
 be better to copy the segments into backend-private memory in compressed 
 format, and decode them one segment at a time (or maybe even one item at a 
 time), when needed. That would avoid the unnecessary decoding of segments 
 that can be skipped over, and would also reduce memory usage of a scan.
 
 I'll add these to the TODO.
 
 - Heikki
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backup throttling

2013-08-21 Thread PostgreSQL - Hans-Jürgen Schönig

On Aug 19, 2013, at 9:11 PM, Andres Freund wrote:

 On 2013-08-19 20:15:51 +0200, Boszormenyi Zoltan wrote:
 2013-08-19 19:20 keltezéssel, Andres Freund írta:
 Hi,
 
 On 2013-07-24 09:20:52 +0200, Antonin Houska wrote:
 Hello,
 the purpose of this patch is to limit impact of pg_backup on running 
 server.
 Feedback is appreciated.
 Based on a quick look it seems like you're throttling on the receiving
 side. Is that a good idea? Especially over longer latency links, TCP
 buffering will reduce the effect on the sender side considerably.
 
 Throttling on the sender side requires extending the syntax of
 BASE_BACKUP and maybe START_REPLICATION so both can be
 throttled but throttling is still initiated by the receiver side.
 
 Seems fine to me. Under the premise that the idea is decided to be
 worthwile to be integrated. Which I am not yet convinced of.

i think there is a lot of value in this one. the scenario we have seen a couple
of times is pretty simple:
just assume a weak server - maybe just one disk or two - and a slave.
master and slave are connected via a 1 Gbit network.
pg_basebackup will fetch data at full speed, basically putting those lonely
disks out of business.
we actually had a case where a client asked if PostgreSQL is locked during
base backup. of course it was just disk wait caused by a full-speed
pg_basebackup.

regarding the client-side implementation: we have chosen this way because it is
less invasive.
i cannot see a reason to do this on the server side because we won't have 10
pg_basebackup-style tools making use of this feature anyway.

of course, if you have got 20 disks and a 1 Gbit network this is useless - but
many people don't have that.

regards,

hans-jürgen schönig


--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backup throttling

2013-08-21 Thread PostgreSQL - Hans-Jürgen Schönig

On Aug 21, 2013, at 10:57 AM, Andres Freund wrote:

 On 2013-08-21 08:10:42 +0200, PostgreSQL - Hans-Jürgen Schönig wrote:
 
 On Aug 19, 2013, at 9:11 PM, Andres Freund wrote:
 
 On 2013-08-19 20:15:51 +0200, Boszormenyi Zoltan wrote:
 2013-08-19 19:20 keltezéssel, Andres Freund írta:
 Hi,
 
 On 2013-07-24 09:20:52 +0200, Antonin Houska wrote:
 Hello,
 the purpose of this patch is to limit impact of pg_backup on running 
 server.
 Feedback is appreciated.
 Based on a quick look it seems like you're throttling on the receiving
 side. Is that a good idea? Especially over longer latency links, TCP
 buffering will reduce the effect on the sender side considerably.
 
 Throttling on the sender side requires extending the syntax of
 BASE_BACKUP and maybe START_REPLICATION so both can be
 throttled but throttling is still initiated by the receiver side.
 
 Seems fine to me. Under the premise that the idea is decided to be
 worthwile to be integrated. Which I am not yet convinced of.
 
 i think there is a lot of value for this one. the scenario we had a couple 
 of times is pretty simple:
 just assume a weak server - maybe just one disk or two - and a slave.
 master and slave are connected via a 1 GB network.
 pg_basebackup will fetch data full speed basically putting those lonely 
 disks out of business.
 we actually had a case where a client asked if PostgreSQL is locked during 
 base backup. of 
 course it was just disk wait caused by a full speed pg_basebackup.
 
 regarding the client side implementation: we have chosen this way because it 
 is less invasive. 
 i cannot see a reason to do this on the server side because we won't have 10 
 pg_basebackup-style tools making use of this feature anyway.
 
 The problem is that receiver side throttling over TCP doesn't always
 work all that nicely unless you have a low rate of transfer and/or very
 low latency . Quite often you will have OS buffers/the TCP Window being
 filled in bursts where the sender sends at max capacity and then a
 period where nothing happens on the sender. That's often not what you
 want when you need to throttle.
 
 Besides, I can see some value in e.g. normal streaming replication also
 being rate limited...
 


what would be a reasonable scenario where limiting streaming would make sense? 
i cannot think of any to be honest.

regards,

hans




--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combine non-recursive and recursive CTEs?

2012-06-16 Thread PostgreSQL - Hans-Jürgen Schönig
On Jun 16, 2012, at 8:27 AM, Magnus Hagander wrote:

 I'm not sure if this is something I don't know how to do, or if it's
 something we simply can't do, or if it's something we could do but the
 syntax can't handle :-)
 
 Basically, I'd like to combine a recursive and a non-recursive CTE in
 the same query. If I do it non-recursive, I can do something like:
 
 WITH t1(z) AS (
   SELECT a FROM x
 ),
 t2 AS (
   SELECT z FROM t1
 )
 SELECT * FROM t2;
 
 
 But what if I want t2 to be recursive?
 
 Trying something like:
 WITH t1 (z,b) AS (
   SELECT a,b FROM x
 ),
 RECURSIVE t2(z,b) AS (
   SELECT z,b FROM t1 WHERE b IS NULL
 UNION ALL
   SELECT z,b FROM t2 INNER JOIN t1 ON t2.b=t1.z
 )
 
 I get a syntax error on the RECURSIVE.
 
 Is there any other position in this query that I can put the RECURSIVE
 in order for it to get through?
 
 -- 
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 


hm, this is interesting ...

cat /tmp/a.sql
WITH y AS ( SELECT 1 AS n),
g AS (WITH RECURSIVE x(n) AS
(
SELECT (SELECT n FROM y) AS n
UNION ALL
SELECT n + 1 AS n
FROM x
WHERE n < 10))
SELECT * FROM g;

Hans-Jurgen-Schonigs-MacBook-Pro:sql hs$ psql test < /tmp/a.sql
ERROR:  syntax error at or near ")"
LINE 8:  WHERE n < 10))

this gives a syntax error as well ... 
if my early morning brain is correct this should be a proper statement ...
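
fwiw, a variant that does parse: RECURSIVE attaches to the WITH keyword itself
and then covers the whole CTE list, so mixing non-recursive and recursive CTEs
looks roughly like this (minimal sketch, reusing magnus' hypothetical table
x(a, b)):

WITH RECURSIVE t1(z, b) AS (
    SELECT a, b FROM x
),
t2(z, b) AS (
    SELECT z, b FROM t1 WHERE b IS NULL
  UNION ALL
    SELECT t1.z, t1.b FROM t2 JOIN t1 ON t2.b = t1.z
)
SELECT * FROM t2;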


regards,

hans

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting rid of cheap-startup-cost paths earlier

2012-05-22 Thread PostgreSQL - Hans-Jürgen Schönig

On May 22, 2012, at 9:57 AM, Simon Riggs wrote:

 On 22 May 2012 06:50, Tom Lane t...@sss.pgh.pa.us wrote:
 
 Currently, the planner keeps paths that appear to win on the grounds of
 either cheapest startup cost or cheapest total cost.  It suddenly struck
 me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor
 fast-start preference, etc) we could know a-priori that cheapest startup
 cost is not going to be interesting, and hence immediately discard any
 path that doesn't win on total cost.
 
 My experience is that most people don't provide a LIMIT explicitly
 even when they know that's the desired behaviour. That's because
 either they simply don't understand that SQL can return lots of rows,
 or SQL knowledge isn't enough, or worse that people don't even know
 that specifying it would alter query plans.
 
 Regrettably the current planning of LIMIT clauses causes more problems
 so in many cases these have been explicitly removed from SQL by
 developers that know how many rows they wish to see.
 
 I would have proposed a default-LIMIT parameter before now, but for
 that last point.


this sounds like a total disaster to me ... 
why in the world should we have a default LIMIT parameter? 
i guess if somebody is not able to use LIMIT he should better not touch the DB.
we clearly cannot fix incompetence by adding parameters. 

regards,

hans


--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-11 Thread PostgreSQL - Hans-Jürgen Schönig
On Oct 7, 2011, at 8:47 PM, Joshua D. Drake wrote:

 
 On 10/07/2011 11:40 AM, Tom Lane wrote:
 Robert Haasrobertmh...@gmail.com  writes:
 Please find attached a patch implementing a basic version of
 index-only scans.
 
 I'm making some progress with this, but I notice what seems like a
 missing feature: there needs to be a way to turn it off.  Otherwise
 performance comparisons will be difficult to impossible.
 
 The most obvious solution is a planner control GUC, perhaps
 enable_indexonlyscan.  Anyone object, or want to bikeshed the name?
 
 enable_onlyindexscan
 
 I'm kidding.
 
 +1 on Tom's proposed name.


+1 ...
definitely an important thing to do.
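
(for reference, a sketch assuming the GUC keeps the proposed name; some_table
is just a stand-in:)

SET enable_indexonlyscan = off;
EXPLAIN SELECT id FROM some_table WHERE id = 10;   -- plan without index-only scans
RESET enable_indexonlyscan;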

regards,

hans

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What is known about PostgreSQL HP-UX support?

2011-10-10 Thread PostgreSQL - Hans-Jürgen Schönig
On Oct 10, 2011, at 4:21 AM, Alex Goncharov wrote:

 [ Thanks all for the very productive discussion in the thread
  libpq, PQdescribePrepared - PQftype, PQfmod, no PQnullable
  which I originated.  Very useful.  Now on something different. ]
 
 About two years ago, I had to research some PostgreSQL failures on
 HP-UX on a lame PA-RISC box.  Looking at the PostgreSQL source code
 then, I got an impression that running PostgreSQL on HP-UX was an open
 question -- HP-UX didn't seem like a seriously targeted platform.
 
 Was I wrong in my assessment?  Does anybody have a good experience
 running PostgreSQL on HP-UX?  What version of both? PA-RISC? IA64?
 
 Thanks,




hello,

HPUX is not too common these days but it works like a charm.
we have a couple of large IA64 servers running on HPUX at a major customer.
things work without any problems. it compiled out of the box just like expected 
and we have not seen any failures or so for almost 2 years now.

so, thumbs up ... nothing to be afraid of.

many thanks,

hans

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CUDA Sorting

2011-09-19 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 19, 2011, at 5:16 PM, Tom Lane wrote:

 Greg Stark st...@mit.edu writes:
 That said, to help in the case I described you would have to implement
 the tapesort algorithm on the GPU as well.
 
 I think the real problem would be that we are seldom sorting just the
 key values.  If you have to push the tuples through the GPU too, your
 savings are going to go up in smoke pretty quickly …
 


i would argue along a similar line.
to make GPU code fast it has to be pretty much tailored to do exactly one thing
- otherwise you have no chance of getting anywhere close to card bandwidth.
if you look at two similar GPU codes which seem to do the same thing you
might easily see that one is 10 times faster than the other - for bloody reasons
such as memory alignment, memory transaction size or whatever.
this opens a bit of a problem: PostgreSQL sorting is so generic and so flexible 
that i would be really surprised if somebody could come up with a solution 
which really comes close to what the GPU can do.
it would definitely be interesting to see a prototype, however.

btw, there is a handful of interesting talks / lectures about GPU programming 
provided by the university of chicago (just cannot find the link atm).

regards,

hans

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-02 Thread PostgreSQL - Hans-Jürgen Schönig
hello …

i have been thinking about this issue for quite a while ...
given your idea i am not sure how this can work at all.

consider:
begin;
insert 1
insert 2
commit

assume this ends up in the same node,
now you split it into two …
1 and 2 will have exactly the same visibility to any transaction.
i am not sure how you can get this right without looking at the data.

alternative idea: what if the proxy would add / generate a filter by looking at 
the data?
a quick idea would be that once you split you add a simple directive such as 
FILTER GENERATOR $1 or so to the PL/proxy code.
it would then, behind the scenes, arrange for the filter to be passed on.
what do you think?

regards,

hans



On Sep 1, 2011, at 10:13 AM, Hannu Krosing wrote:

 Hallow hackers
 
 I have the following problem to solve and would like to get advice on
 the best way to do it.
 
 The problem:
 
 When growing a pl/proxy based database cluster, one of the main
 operations is splitting a partition. The standard flow is as follows:
 
 1) make a copy of the partitions table(s) to another database
 2) reconfigure pl/proxy to use 2 partitions instead of one
 
 The easy part is making a copy of all or half of the table to another
 database. The hard part is fast deletion (i mean milliseconds,
 comparable to TRUNCATE) the data that should not be in a partition (so
 that RUN ON ALL functions will continue to return right results).
 
 It would be relatively easy, if we still had the RULES for select
 available for plain tables, but even then the eventual cleanup would
 usually mean at least 3 passes of disk writes (set xmax, write deleted
 flag, vacuum and remove)
 
 What I would like to have is possibility for additional visibility
 checks, which would run some simple C function over tuple data (usually
 hash(fieldval) + and + or ) and return visibility (is in this partition)
 as a result. It would be best if this is run at so low level that also
 vacuum would use it and can clean up the foreign partition data in one
 pass, without doing the delete dance first.
 
 So finally the QUESTION :
 
 where in code would be the best place to check for this so that
 
 1) both regular queries and VACUUM see it
 2) the tuple data (and not only system fields or just xmin/xmax) would
 be available for the function to use
 
 
 -- 
 ---
 Hannu Krosing
 PostgreSQL Unlimited Scalability and Performance Consultant
 2ndQuadrant Nordic
 PG Admin Book: http://www.2ndQuadrant.com/books/
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-02 Thread PostgreSQL - Hans-Jürgen Schönig
hello …

the goal of the entire proxy thing is to make the right query go to the right 
node / nodes.
we determine this by using a partitioning function and so on …
currently PL/proxy has only a handful of commands - one is RUN ON … which tells 
us where to put things.
assume you issue a select … some SELECT will end up on the target node.
to restrict the data coming from the node you could add an additional
constraint on the way …

say:
SELECT * FROM proxy_table WHERE a = 20;

what you want to reach the node after a split is …

SELECT * FROM proxy_table WHERE a = 20 AND col = <filter the wrong half away>

my idea is to add an additional command to the PL/proxy command set.
it should call a function generating this additional filter.
maybe somewhat like this …

RUN ON hashtext($1)                                   -- this one already knows about the increased cluster
GENERATE FILTER my_create_the_bloody_filter_func($1)  -- this one would massage the query going to the node

it would actually open the door for a lot of additional trickery.
the function would tell the proxy what to append - and this would be
under your full control.

what do you think?
i have got to think about it further, but i can envision that this could be
feasible (rough sketch below) ...
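
roughly, in PL/Proxy terms, this is the shape i have in mind - GENERATE FILTER
is of course hypothetical syntax, the rest is ordinary PL/Proxy, and the
function / table names are made up:

CREATE FUNCTION get_user_docs(username text)
RETURNS SETOF user_docs AS $$
    CLUSTER 'usercluster';
    -- this part already knows about the enlarged cluster
    RUN ON hashtext(username);
    -- hypothetical new command: ask a function for an extra WHERE clause
    -- that the proxy appends to the query before shipping it to the node
    GENERATE FILTER my_create_the_bloody_filter_func(username);
    SELECT * FROM user_docs WHERE owner = username;
$$ LANGUAGE plproxy;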

hans


On Sep 2, 2011, at 2:36 PM, Hannu Krosing wrote:

 On Fri, 2011-09-02 at 14:01 +0200, PostgreSQL - Hans-Jürgen Schönig
 wrote:
 hello …
 
 i have been thinking about this issue for quite a while ...
 given your idea i am not sure how this can work at all.
 
 consider:
  begin;
  insert 1
  insert 2
  commit
 
 assume this ends up in the same node,
 now you split it into two …
 1 and 2 will have exactly the same visibility to and transaction.
 i am not sure how you can get this right without looking at the data.
 
 It has to consider the data when determining visibility, that's the
 whole point of the plug-in .
 
 The idea is, that each row belongs to a certain partition, as
 determined by some function over it's fields. Most often this function
 is hash of primary key OR-ed by a bitmap representing cluster size and
 AND-ed by bitmap for partition(s) stored in this database. 
 
 when you split the parition, then some row's don't belong in the old
 partition database anymore (and if you did a full copy, then the other
 half dont belong to the new one), so they should be handled as
 invisible / deleted. As this can be only done by looking at the tuple
 data, this needs an additional visibility function. And as this is only
 needed for partitioned databases, it makes sense to implement it as a
 plogin, so it would not wast cycles on non-partitioned databases
 
 alternative idea: what if the proxy would add / generate a filter by 
 looking at the data?
 a quick idea would be that once you split you add a simple directive 
 such as FILTER GENERATOR $1 or so to the PL/proxy code.
 it would then behind the scene arrange the filter passed on.
 what do you think?
 
 Hmm. I'm not sure I understand what you are trying to say. Can you
 elaborate a little ?
 
 
  regards,
 
  hans
 
 
 
 On Sep 1, 2011, at 10:13 AM, Hannu Krosing wrote:
 
 Hallow hackers
 
 I have the following problem to solve and would like to get advice on
 the best way to do it.
 
 The problem:
 
 When growing a pl/proxy based database cluster, one of the main
 operations is splitting a partition. The standard flow is as follows:
 
 1) make a copy of the partitions table(s) to another database
 2) reconfigure pl/proxy to use 2 partitions instead of one
 
 The easy part is making a copy of all or half of the table to another
 database. The hard part is fast deletion (i mean milliseconds,
 comparable to TRUNCATE) the data that should not be in a partition (so
 that RUN ON ALL functions will continue to return right results).
 
 It would be relatively easy, if we still had the RULES for select
 available for plain tables, but even then the eventual cleanup would
 usually mean at least 3 passes of disk writes (set xmax, write deleted
 flag, vacuum and remove)
 
 What I would like to have is possibility for additional visibility
 checks, which would run some simple C function over tuple data (usually
 hash(fieldval) + and + or ) and return visibility (is in this partition)
 as a result. It would be best if this is run at so low level that also
 vacuum would use it and can clean up the foreign partition data in one
 pass, without doing the delete dance first.
 
 So finally the QUESTION :
 
 where in code would be the best place to check for this so that
 
 1) both regular queries and VACUUM see it
 2) the tuple data (and not only system fields or just xmin/xmax) would
 be available for the function to use
 
 
 -- 
 ---
 Hannu Krosing
 PostgreSQL Unlimited Scalability and Performance Consultant
 2ndQuadrant Nordic
 PG Admin Book: http://www.2ndQuadrant.com/books/
 
 
 -- 
 Sent via pgsql

Re: [HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-02 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 2, 2011, at 2:59 PM, Hannu Krosing wrote:

 On Fri, 2011-09-02 at 14:51 +0200, PostgreSQL - Hans-Jürgen Schönig
 wrote:
 hello …
 
 the goal of the entire proxy thing is to make the right query go to the 
 right node / nodes.
 we determine this by using a partitioning function and so on …
 currently PL/proxy has only a handful of commands - one is RUN ON … which 
 tells us where to put things.
 assume you issue a select … some select will fall out on the target node. 
 to restrict the data coming from the node you could add an additional 
 constraint on the way …
 
 say:
  SELECT * FROM proxy_table WHERE a = 20;
 
 what you want to reach the node after a split is …
 
  SELECT * FROM proxy_table WHERE a = 20 AND col = filter the wrong half 
 away
 
 my idea is to add an additional command to the PL/proxy command set.
 it should call a function generating this additional filter.
 maybe somehow like that …
 
  RUN ON hashtext($1) 
 -- this one already knows about the increased cluster
  GENERATE FILTER my_create_the_bloody_filter_func($1)-- this one 
 would massage the query going to the node.
 
 it would actually open the door for a lot of additional trickery.
 the function would tell the proxy what to append - and: this what would be 
 under your full control.
 
 what do you think?
 
 Hmm, could work for simplest cases, but this has 2 main problems:
 
 1) you need a full SQL parser to make this generally useful for plain
 SQL


i think that everything beyond a simple case is pretty hard to achieve anyway.
to me it looks pretty impossible to solve this in a generic way without some
insane amount of labor input - at least given the ideas that have come to me in
the past.
and yes, functions are an issue. unless you have some sort of virtual
private database thing it is close to impossible (unless you want to try some
nightmare based on views / constraint exclusion on the partitions or so).

regards,

hans


--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Caching Python modules

2011-08-17 Thread PostgreSQL - Hans-Jürgen Schönig
hello …

i have just fallen over a nasty problem (maybe missing feature) with PL/Pythonu 
…
consider:

-- add a document to the corpus
CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text) RETURNS 
float4 AS $$

from SecondCorpus import SecondCorpus
from SecondDocument import SecondDocument

i am doing some intense text mining here.
the problem is: is it possible to cache those imported modules from one
function call to the next?
GD works nicely for variables but can this actually be done with imported
modules as well?
the import takes around 95% of the total time so it is definitely something
which should go away somehow.
i have checked the docs but i am none the wiser.

many thanks,

hans

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Caching Python modules

2011-08-17 Thread PostgreSQL - Hans-Jürgen Schönig
On Aug 17, 2011, at 2:19 PM, Jan Urbański wrote:

 On 17/08/11 14:09, PostgreSQL - Hans-Jürgen Schönig wrote:
 CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text) 
 RETURNS float4 AS $$
 
from SecondCorpus import SecondCorpus
from SecondDocument import SecondDocument
 
 i am doing some intense text mining here.
 the problem is: is it possible to cache those imported modules from function 
 to function call.
 GD works nicely for variables but can this actually be done with imported 
 modules as well?
 the import takes around 95% of the total time so it is definitely something 
 which should go away somehow.
 i have checked the docs but i am not more clever now.
 
 After a module is imported in a backend, it stays in the interpreter's
 sys.modules dictionary and importing it again will not cause the module
 Python code to be executed.
 
 As long as you are using the same backend you should be able to call
 add_to_corpus repeatedly and the import statements should take a long
 time only the first time you call them.
 
 This simple test demonstrates it:
 
 $ cat /tmp/slow.py
 import time
 time.sleep(5)
 
 $ PYTHONPATH=/tmp/ bin/postgres -p 5433 -D data/
 LOG:  database system was shut down at 2011-08-17 14:16:18 CEST
 LOG:  database system is ready to accept connections
 
 $ bin/psql -p 5433 postgres
 Timing is on.
 psql (9.2devel)
 Type help for help.
 
 postgres=# select slow();
 slow
 --
 
 (1 row)
 
 Time: 5032.835 ms
 postgres=# select slow();
 slow
 --
 
 (1 row)
 
 Time: 1.051 ms
 
 Cheers,
 Jan




hello jan …

the code is actually like this …
the first function is called once per backend. it builds some fairly fat
in-memory stuff …
this takes around 2 secs or so … but this is fine and not an issue.

-- setup the environment
CREATE OR REPLACE FUNCTION textprocess.setup_sentiment(pypath text, lang text) 
RETURNS void AS $$
import sys
sys.path.append(pypath)
sys.path.append(pypath + /external)

from SecondCorpus import SecondCorpus
import const

GD['path_to_classes'] = pypath
GD['corpus'] = SecondCorpus(lang)
GD['lang'] = lang

return;
$$ LANGUAGE 'plpythonu' STABLE;

this is called more frequently ...

-- add a document to the corpus
CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text) RETURNS 
float4 AS $$

from SecondCorpus import SecondCorpus
from SecondDocument import SecondDocument

doc1 = SecondDocument(GD['corpus'].senti_provider, lang, t)
doc1.create_sentences()
GD['corpus'].add_document(doc1)
GD['corpus'].process()
return doc1.total_score
$$ LANGUAGE 'plpythonu' STABLE;

the point here actually is: if i use the classes in a normal python command
line program this routine does not look like an issue;
creating the document object and doing the magic in there is not a problem
actually …

on the SQL side this is already fairly heavy for some reason ...

 funcid | schemaname  |    funcname     | calls | total_time | self_time | ?column?
--------+-------------+-----------------+-------+------------+-----------+----------
 235287 | textprocess | setup_sentiment |    54 |     100166 |    100166 |     1854
 235288 | textprocess | add_to_corpus   |   996 |     438909 |    438909 |      440
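
(for context: those numbers come from pg_stat_user_functions, and the unnamed
last column is presumably total_time / calls, i.e. average milliseconds per
call - roughly a query like this:)

SELECT funcid, schemaname, funcname, calls, total_time, self_time,
       total_time / calls        -- shows up as "?column?" above
FROM pg_stat_user_functions
WHERE schemaname = 'textprocess';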

looks like some afternoon with some more low level tools :(.

many thanks,

hans

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] synchronized snapshots

2011-08-15 Thread PostgreSQL - Hans-Jürgen Schönig

On Aug 15, 2011, at 9:40 AM, Simon Riggs wrote:

 On Mon, Aug 15, 2011 at 2:31 AM, Joachim Wieland j...@mcknight.de wrote:
 
 In short, this is how it works:
 
 SELECT pg_export_snapshot();
  pg_export_snapshot
 
  03A1-1
 (1 row)
 
 
 (and then in a different session)
 
 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ (SNAPSHOT = '03A1-1');
 
 I don't see the need to change the BEGIN command, which is SQL
 Standard. We don't normally do that.
 
 If we have pg_export_snapshot() why not pg_import_snapshot() as well?
 
 -- 
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



i would definitely argue for a syntax like the one proposed by Joachim ... it
could stay the same if this is turned into some sort of flashback
implementation some day.

regards,

hans

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-08-12 Thread PostgreSQL - Hans-Jürgen Schönig
On Aug 12, 2011, at 10:03 PM, Heikki Linnakangas wrote:

 On 11.08.2011 23:06, Robert Haas wrote:
 Comments, testing, review appreciated...
 
 I would've expected this to use an index-only scan:
 
 postgres=# CREATE TABLE foo AS SELECT generate_series(1,10) AS id;
 SELECT 10
 postgres=# CREATE INDEX i_foo ON foo (id) WHERE id = 10;
 CREATE INDEX
 postgres=# VACUUM ANALYZE foo;
 VACUUM
 postgres=# EXPLAIN SELECT id FROM foo WHERE id = 10;
   QUERY PLAN
 -
 Index Scan using i_foo on foo  (cost=0.00..8.27 rows=1 width=4)
   Index Cond: (id = 10)
 (2 rows)
 
 If it's not a predicate index, then it works:
 
 postgres=# DROP INDEX i_foo;
 DROP INDEX
 postgres=# EXPLAIN SELECT id FROM foo WHERE id = 10;
  QUERY PLAN
 ---
 Index Only Scan using i_foo2 on foo  (cost=0.00..8.28 rows=1 width=4)
   Index Cond: (id = 10)
 (2 rows)


is there any plan to revise the cost model for index-only scans compared to
what it is now?

many thanks,

hans

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Will switchover still need a checkpoint in 9.1 SR Hot Standby

2011-08-07 Thread PostgreSQL - Hans-Jürgen Schönig

On Aug 7, 2011, at 11:01 AM, Simon Riggs wrote:

 On Sun, Aug 7, 2011 at 8:57 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 In 9.0 (as in earlier versions) a former standby host has to do a full
 checkpoint before becoming available as an independent database instance
 in either switchover or failover scenarios.
 
 For most combinations of of bigger than minimal shared buffers and
 non-memory-speed disks this can take from several seconds to tens of
 minutes on busy systems.
 
 For switchover, you issue a checkpoint first, to reduce this time as
 much as possible.
 
 Is the pre-activation checkpoint still required in 9.1 ?
 
 Yes, but I've found a way to remove them in 9.2 and will be patching that 
 soon.



hi simon,

this is highly interesting. this is an important issue for big iron.
can you share the idea you have in mind?

many thanks,

hans


--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SYNONYMS (again)

2011-06-23 Thread PostgreSQL - Hans-Jürgen Schönig

On Jun 23, 2011, at 12:52 AM, Alvaro Herrera wrote:

 Excerpts from Joshua D. Drake's message of mié jun 22 15:37:17 -0400 2011:
 Per:
 
 http://archives.postgresql.org/pgsql-hackers/2010-11/msg02043.php
 
 It seems we did come up with a use case in the procpid discussion. The 
 ability to change the names of columns/databases etc, to handle the 
 fixing of bad decision decisions during development over time.
 
 Thoughts?
 
 Let's start with what was discussed and supported in that thread, that
 is, databases.  It seems less clear that columns are widely believed to
 be a good idea to have synonyms for.  Besides, synonyms for databases
 should be reasonably simple to implement, which is not something I would
 say for columns.



yes, implementing synonyms is not too hard.
some time ago (3 or 4 years ago most likely) we already posted a patch 
providing support for synonyms.
it was rejected because synonyms were said to be a bad design pattern which
tempts app developers to do nasty things.
so, if you want to work on it, maybe that patch is the place to start.

many thanks,

hans

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SYNONYMS (again)

2011-06-23 Thread PostgreSQL - Hans-Jürgen Schönig

On Jun 23, 2011, at 12:52 AM, Alvaro Herrera wrote:

 Excerpts from Joshua D. Drake's message of mié jun 22 15:37:17 -0400 2011:
 Per:
 
 http://archives.postgresql.org/pgsql-hackers/2010-11/msg02043.php
 
 It seems we did come up with a use case in the procpid discussion. The 
 ability to change the names of columns/databases etc, to handle the 
 fixing of bad decision decisions during development over time.
 
 Thoughts?
 
 Let's start with what was discussed and supported in that thread, that
 is, databases.  It seems less clear that columns are widely believed to
 be a good idea to have synonyms for.  Besides, synonyms for databases
 should be reasonably simple to implement, which is not something I would
 say for columns.



sorry, i missed the links:

http://archives.postgresql.org/pgsql-patches/2006-03/msg00085.php

many thanks,

hans


--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bad posix_fadvise support causes initdb to exit ungracefully

2011-06-15 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

2.4? we know that some versions of 2.4 cause problems due to broken 
posix_fadvise. if i remember correctly we built some configure magic into 
PostgreSQL to check for this bug. what does this check do?

many thanks,

hans



On Jun 15, 2011, at 6:12 PM, Merlin Moncure wrote:

 Due to unfortunate environmental conditions (don't ask) I've been
 trying to get postgres 9.0 up and running on a fairly ancient linux --
 redhat EL 3 which as kernel 2.4.21.   initdb borks on the create
 database step with the error message child process exited with error
 code 139.  A bit of tracing revealed the exit was happening at the
 pg_flush_data which basically wraps posix_fadvise.   Disabling fadvise
 support in pg_config_manual.h fixed the problem.
 
 Things brings up a couple of questions:
 *) Are linuxes this old out of support?
 *) Should configure be testing for working posix_fadvise?
 
 merlin
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] workaround for expensive KNN?

2011-04-11 Thread PostgreSQL - Hans-Jürgen Schönig
price has a problem :(.
"iphone" can be a 20 cent bag or a sticker or a 900 euro thing signed by
whoever ...
so, the words and the sort number / price are not related in any way. price is
in this case no way to narrow down the search (e.g. by evaluating it first or
so).

many thanks,

hans


On Apr 8, 2011, at 5:25 PM, Oleg Bartunov wrote:

 Hans,
 
 what if you create index (price,title) ?
 
 
 On Fri, 8 Apr 2011, PostgreSQL - Hans-Jürgen Schönig wrote:
 
 hello ...
 
 i got that one ...
 
   idx_product_t_product_titleprice gist (to_tsvector('german'::regconfig, 
 title), int_price)
 
 so, i have a combined index on text + number.
 to me the plan seems fine ... it looks like a proper KNN traversal.
 the difference between my plan and your plan seems to be the fact that i 
 have, say, 1 mio rows which have handy or so in it (1 mio out of 11 mio or 
 so). you are moving out from one specific place.
 
 my maths is like that:
  11 mio in total
  1 mio matching iphone
  cheapest / most expensive 10 out of this mio needed.
 
 operator classes are all nice and in place:
 
 SELECT 10 <-> 4 AS distance;
  distance
 ----------
         6
 (1 row)
 
 what does buffers true in your case say?
 
 many thanks,
 
  hans
 
 
 On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:
 
 Probably, you miss two-columnt index. From my early post:
 http://www.sai.msu.su/~megera/wiki/knngist
 
 =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, 
 to_tsvector('french',address));
 =# SELECT id, address,  (coordinates - 
 '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE 
 coordinates  '(2.29470491409302,48.858263472125)'::point AND 
 to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
  id|   address   |  
dist 
 -+-+-
 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 
 2.32488941293945e-05
 4356328 | r Champ de Mars 75007 PARIS |  
 0.00421854756964406
 5200167 | Champ De Mars 75007 Paris   |  
 0.00453564562587288
 9301676 | Champ de Mars, 75007 Paris, |  
 0.00453564562587288
 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France |  
 0.00624152097590896
 1923818 | Champ de Mars Paris, France |  
 0.00838214733539654
 5165953 | 39 Rue Champ De Mars Paris, France  |  
 0.00874410234569529
 7395870 | 39 Rue Champ De Mars Paris, France  |  
 0.00874410234569529
 4358671 | 32 Rue Champ De Mars Paris, France  |  
 0.00876089659276339
 1923742 | 12 rue du Champ de Mars Paris, France   |  
 0.00876764731845995
 (10 rows)
 
 Time: 7.859 ms
 
 =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates  
 '(2.29470491409302,48.858263472125)'::point
 AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
 
   QUERY PLAN
 --
 Limit
  -  Index Scan using spots_idx on spots
Index Cond: ((coordinates  
 '(2.29470491409302,48.858263472125)'::point) AND 
 (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery))
 (3 rows)
 
 
  On Fri, 8 Apr 2011, PostgreSQL - Hans-Jürgen Schönig wrote:
 
 hello all ...
 
 given oleg's posting before i also wanted to fire up some KNN related 
 question.
 let us consider a simple example. i got some million lines and i want all 
 rows matching a tsquery sorted by price.
 i did some tests:
 
 test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
 
                                                                      QUERY PLAN
 --------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1)
    Buffers: shared hit=9 read=5004
    ->  Index Scan using idx_product_t_product_titleprice on t_product  (cost=0.00..13251.91 rows=3224 width=16) (actual time=36391.715..45542.573 rows=10 loops=1)
          Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery)
          Order By: (int_price <-> 0::bigint)
          Buffers: shared hit=9 read=5004
  Total runtime: 45542.676 ms
 (7 rows)
 
 test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10

Re: [HACKERS] k-neighbourhood search in databases

2011-04-08 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

i have put some research into that some time ago and as far as i have seen 
there is a 99% chance that no other database can do it the way we do it. it 
seems nobody comes even close to it (especially not in the flexibility-arena).

oracle: disgusting workaround ...
http://www.orafaq.com/usenet/comp.databases.oracle.misc/2005/11/03/0083.htm

db2: disgusting workaround (no server side code it seems)

sybase: disgusting workaround (no serverside code it seems)

microsoft: there seems to be something coming out (or just out) but i have not 
seen anything working yet.

regards,

hans



On Apr 8, 2011, at 2:21 PM, Oleg Bartunov wrote:

 Hi there,
 
 I'm interesting if other databases provides built-in effective knn search ? 
 Google didn't help me.
 
   Regards,
   Oleg
 _
 Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
 Sternberg Astronomical Institute, Moscow University, Russia
 Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
 phone: +007(495)939-16-83, +007(495)939-23-83
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] workaround for expensive KNN?

2011-04-08 Thread PostgreSQL - Hans-Jürgen Schönig
hello all ...

given oleg's posting before i also wanted to fire up some KNN related question.
let us consider a simple example. i got some million lines and i want all rows 
matching a tsquery sorted by price.
i did some tests:

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'iphone') ORDER BY int_price <-> 0 LIMIT 10;

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1)
   Buffers: shared hit=9 read=5004
   ->  Index Scan using idx_product_t_product_titleprice on t_product  (cost=0.00..13251.91 rows=3224 width=16) (actual time=36391.715..45542.573 rows=10 loops=1)
         Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery)
         Order By: (int_price <-> 0::bigint)
         Buffers: shared hit=9 read=5004
 Total runtime: 45542.676 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'handy') ORDER BY int_price <-> 0 LIMIT 10;

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1)
   Buffers: shared hit=3 read=2316
   ->  Index Scan using idx_product_t_product_titleprice on t_product  (cost=0.00..29762.61 rows=7255 width=16) (actual time=7243.524..10935.217 rows=10 loops=1)
         Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
         Order By: (int_price <-> 0::bigint)
         Buffers: shared hit=3 read=2316
 Total runtime: 10935.265 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'handy') ORDER BY int_price <-> 0 LIMIT 1;

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1)
   Buffers: shared hit=1 read=1577
   ->  Index Scan using idx_product_t_product_titleprice on t_product  (cost=0.00..29762.61 rows=7255 width=16) (actual time=28.525..28.525 rows=1 loops=1)
         Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
         Order By: (int_price <-> 0::bigint)
         Buffers: shared hit=1 read=1577
 Total runtime: 28.558 ms
(7 rows)


under any circumstances - there is no way to reduce the number of buffers 
needed for a query like that.
if everything is cached this is still ok but as soon as you have to take a 
single block from disk you will die a painful random I/O death.
is there any alternative which does not simply die when i try to achieve what i 
want?

the use case is quite simple: all products with a certain word (10 cheapest or 
so).

is there any alternative approach to this?
i was putting some hope into KNN but it seems it needs too much random I/O :(.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] workaround for expensive KNN?

2011-04-08 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

i got that one ...

idx_product_t_product_titleprice gist (to_tsvector('german'::regconfig, 
title), int_price)

so, i have a combined index on text + number.
to me the plan seems fine ... it looks like a proper KNN traversal.
the difference between my plan and your plan seems to be the fact that i have, 
say, 1 mio rows which contain 'handy' or so (1 mio out of 11 mio). you 
are moving out from one specific place.

my maths is like that:
11 mio in total
1 mio matching iphone
cheapest / most expensive 10 out of this mio needed.

operator classes are all nice and in place:

SELECT 10 <-> 4 AS distance;
 distance 
----------
        6
(1 row)

what does buffers true in your case say?

many thanks,

hans


On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:

 Probably, you miss two-columnt index. From my early post:
 http://www.sai.msu.su/~megera/wiki/knngist
 
 =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address));
 =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist
    FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point
    AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10;
    id   |                           address                            |         dist
 --------+--------------------------------------------------------------+----------------------
  366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France  | 2.32488941293945e-05
 4356328 | r Champ de Mars 75007 PARIS                                  |  0.00421854756964406
 5200167 | Champ De Mars 75007 Paris                                    |  0.00453564562587288
 9301676 | Champ de Mars, 75007 Paris,                                  |  0.00453564562587288
 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France      |  0.00624152097590896
 1923818 | Champ de Mars Paris, France                                  |  0.00838214733539654
 5165953 | 39 Rue Champ De Mars Paris, France                           |  0.00874410234569529
 7395870 | 39 Rue Champ De Mars Paris, France                           |  0.00874410234569529
 4358671 | 32 Rue Champ De Mars Paris, France                           |  0.00876089659276339
 1923742 | 12 rue du Champ de Mars Paris, France                        |  0.00876764731845995
 (10 rows)
 
 Time: 7.859 ms
 
 =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots
    WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point
    AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10;

                                 QUERY PLAN
 ------------------------------------------------------------------------------
 Limit
   ->  Index Scan using spots_idx on spots
         Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point)
                  AND (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery))
 (3 rows)
 
 
 On Fri, 8 Apr 2011, PostgreSQL - Hans-Jürgen Schönig wrote:
 
 hello all ...
 
 given oleg's posting before i also wanted to fire up some KNN related 
 question.
 let us consider a simple example. i got some million lines and i want all 
 rows matching a tsquery sorted by price.
 i did some tests:
 
 test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
 product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
 'iphone') ORDER BY int_price <-> 0 LIMIT 10;

                                   QUERY PLAN
 --------------------------------------------------------------------------------
  Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1)
    Buffers: shared hit=9 read=5004
    ->  Index Scan using idx_product_t_product_titleprice on t_product  (cost=0.00..13251.91 rows=3224 width=16) (actual time=36391.715..45542.573 rows=10 loops=1)
          Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery)
          Order By: (int_price <-> 0::bigint)
          Buffers: shared hit=9 read=5004
  Total runtime: 45542.676 ms
 (7 rows)
 
 test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
 product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
 'handy') ORDER BY int_price <-> 0 LIMIT 10;

                                   QUERY PLAN
 --------------------------------------------------------------------------------
  Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1)
    Buffers: shared hit=3 read=2316
    ->  Index Scan using idx_product_t_product_titleprice on t_product  (cost=0.00..29762.61 rows=7255 width=16) (actual

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread PostgreSQL - Hans-Jürgen Schönig
 
 
 Still, having more data a user can probe would be nice.
 
 I wonder why everyone avoids Microsoft's approach to the subject. 
 Apparently, they go in the 'auto-tune as much as possible' direction.
 And tests we did a while ago, involving asking team from Microsoft and a 
 team from oracle to optimise set of queries for the same set of data 
 (bookies data, loads of it) showed that the auto-tuning Microsoft has in 
 their
 sql server performed much better than a team of over-sweating oracle dba's.
 
 I don't think *anyone* is avoiding that approach.  There is almost
 universal consensus here that auto-tuning is better than manual
 tuning, even to the extent of being unwilling to add knobs to allow
 manual tuning of settings we have no idea how to auto-tune and no
 plans to auto-tune.
 
 In my current work place/camp we have many deployments of the same system, 
 over different types of machines, each with different customer data that 
 vary so much that queries need to be rather generic.
 Postgresql shows its strength with planner doing a good job for different 
 variants of data, however we do a very little tweaking to the configuration 
 parameters. Just because it is just too hard to overlook all of them.
 I guess that the systems could behave much better, but no one is going to 
 tweak settings for 50 different installations over 50 different type of data 
 and 50 different sets of hardware.
 If there was even a tiny amount of automation provided in the postgresql, I 
 would welcome it with open arms.
 
 What do you have in mind?
 



what we are trying to do is to explicitly store column correlations. so, a 
histogram for (a, b) correlation and so on.
the planner code then goes through its restrictions in the query and finds the 
best / longest combination it can find and which has some statistics defined.
it seems we can also do this for join selectivity and expressions. the planner 
code for raw column correlation without expressions (e.g. cos(id) or so) and 
joins is there (WIP, no ANALYZE support and so on so far).

i think auto tuning is a good thing to have and the door to actually do it is 
wide open with our approach.
all it takes is a mechanism to see which conditions are used how often and 
somebody could write a job which automatically tells the system which stats to 
collect / sample.
i think for an average user this is the most simplistic thing then. but, to 
get there we have to get the bloody sampling and the rest of the planner code 
right in the first place.
auto tuning in this area is still something which is far in the future - but at 
least the road to it is clear.

some people suggested some approach dealing with effective_cache_size and so on 
... there are many good approaches here but they don't address the actual 
problem of wrong size-estimates.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig

On Feb 23, 2011, at 2:58 AM, Robert Haas wrote:

 2011/2/22 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at:
 how does it work? we try to find suitable statistics for an arbitrary length 
 list of conditions so that the planner can use it directly rather than 
 multiplying all the selectivities. this should make estimates a lot more 
 precise.
 the current approach can be extended to work with expressions as well as 
 straight conditions.
 
 /me prepares to go down in flames.
 
 Personally, I think the first thing we ought to do is add a real, bona
 fide planner hint to override the selectivity calculation manually,
 maybe something like this:
 
 WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);


i thought there was an agreement that we don't want planner hints?
as tom pointed out - many broken queries come out of some query generator where 
even the design of the generator is broken by design.
personally i like query generators as long as other people use them ... telling 
people that this is the wrong way to go is actually financing my holiday next 
week ... ;).  in general - hibernate and stuff like that is a no-go.

personally i like the type of planner hints oleg and teodor came up with - i 
think we should do more of those hooks they are using but hiding it in some 
syntax is not a good idea.
it does not change the query and it still gives a lot of room to toy around. it 
looks like a compromise.

however, oleg's contrib module does not fix the core problem of cross column 
statistics because a hint is usually static but you want flexible selectivity.

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
 Those are real problems, but I still want it.  The last time I hit
 this problem I spent two days redesigning my schema and adding
 triggers all over the place to make things work.  If I had been
 dealing with a 30TB database instead of a 300MB database I would have
 been royally up a creek.
 
 To put that another way, it's true that some people can't adjust their
 queries, but also some people can.  It's true that nonstandard stuff
 sucks, but queries that don't work suck, too.  And as for better
 solutions, how many major release cycles do we expect people to wait
 for them?  Even one major release cycle is an eternity when you're
 trying to get the application working before your company runs out of
 money, and this particular problem has had a lot of cycles expended on
 it without producing anything very tangible (proposed patch, which
 like you I can't spare a lot of cycles to look at just now, possibly
 excepted).



cheapest and easiest solution if you run into this: add fake functions which 
the planner cannot estimate properly.
use OR to artificially prop up estimates or use AND to artificially lower them. 
there is actually no need to redesign the schema to get around it but it is 
such an ugly solution that it does not even deserve to be called ugly ...
however, fast and reliable way to get around it.
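
for illustration, a minimal sketch of the trick described above (function and 
table names are made up; the planner simply falls back to a default selectivity 
for the opaque call):

    -- a black-box function the planner cannot fold away at plan time
    CREATE FUNCTION always_true(int) RETURNS boolean AS
    $$ BEGIN RETURN true; END; $$ LANGUAGE plpgsql VOLATILE;

    -- AND-ing the opaque call: the planner applies a default selectivity and
    -- lowers the row estimate, while the result set stays exactly the same
    SELECT * FROM t_order WHERE region = 'EU' AND status = 1 AND always_true(status);

    -- OR-ing a clause which can never be true at runtime props the estimate up,
    -- again without changing the result set
    SELECT * FROM t_order WHERE (region = 'EU' AND status = 1) OR NOT always_true(status);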

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
 
 
 cheapest and easiest solution if you run into this: add fake functions 
 which the planner cannot estimate properly.
 use OR to artificially prop up estimates or use AND to artificially lower 
 them. there is actually no need to redesign the schema to get around it but 
 it is such an ugly solution that it does not even deserve to be called 
 ugly ...
 however, fast and reliable way to get around it.
 
 We couldn't possibly design a hint mechanism that would be uglier or
 less future-proof than this workaround (which, by the way, I'll keep
 in mind for the next time I get bitten by this).
 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
 


i think the main issue is: what we do is ugly because of despair and a lack of 
alternative ... what you proposed is ugly by design ;).
overall: the workaround will win the ugliness contest, however ;).

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig

On Feb 23, 2011, at 3:46 PM, Robert Haas wrote:

 On Wed, Feb 23, 2011 at 12:50 AM, Nathan Boley npbo...@gmail.com wrote:
 Personally, I think the first thing we ought to do is add a real, bona
 fide planner hint to override the selectivity calculation manually,
 maybe something like this:
 
 WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);
 
 If you're going to go that far, why not just collect statistics on
 that specific predicate?
 
 ie,  ANALYZE SELECTIVITY ON tablename (x, y) WHERE (x < 5 AND y = 1);
 
 Then it won't fall subject to all of the pitfalls that Tom outlines below.
 
 Selectivities are easy to estimate if we know the predicate. They only
 become hard when they have to work for every possible predicate.
 
 Fair point.
 
 -- 
 Robert Haas


basically we got the idea of allowing expressions in cross column stuff. i 
think this can be very useful. it would fix the problem of a query like that:

SELECT * FROM table WHERE cos(field) = some_number;

this takes a constant fraction of the table which is usually plain wrong as 
well (and the error tends to multiply inside the plan).
i am just not sure if i have understood all corner cases of that already.
ultimate goal: get it right for join estimates (this is why a syntax extension 
is definitely needed - you cannot track all of them automatically).
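
a purely hypothetical sketch of how the syntax from the original patch mail 
might be extended to the expression case discussed here (nothing of this exists 
in the patch yet; names are made up):

    -- cover the cos(field) restriction with its own statistics instead of the
    -- hard-wired constant fraction mentioned above
    CREATE CROSS COLUMN STATISTICS ON t_data (cos(field), field2);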

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
On Feb 24, 2011, at 2:09 AM, Josh Berkus wrote:

 
 Personally, I think the first thing we ought to do is add a real, bona
 fide planner hint to override the selectivity calculation manually,
 maybe something like this:
 
 WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);
 
 Then, having provided a method for the DBA to extinguish the raging
 flames of searing agony which are consuming them while a crocodile
 chews off their leg and their boss asks them why they didn't use
 Oracle, we can continue bikeshedding about the best way of fixing this
 problem in a more user-transparent fashion.
 
 Is there some way we can do that without adding the selectivity hint to
 the query itself?  That's the biggest issue with hints.
 



well, you could hide this hint in a system table - say, instead of decorating 
the query you could store the decoration in some system relation ... but, if 
you get it right, you call this decoration a histogram ;).
i think the patch with a multi-dim histogram is good (i have seen something 
similar for PostGIS).
what is still needed in our patch is a.) multi-dim sampling (no idea how to get 
it right) and b.) investigating how to deal with joins and expressions (e.g. 
cos(id) ).
hints into the right direction are highly welcome.

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WIP: cross column correlation ...

2011-02-22 Thread PostgreSQL - Hans-Jürgen Schönig
hello everybody,

we have spent some time in finally attacking cross column correlation. as this 
is an issue which keeps bugging us for a couple of applications (some years). 
this is a WIP patch which can do:

special cross column correlation specific syntax:

CREATE CROSS COLUMN STATISTICS ON tablename (field, ...);
DROP CROSS COLUMN STATISTICS ON tablename (field, ...);

we use specific syntax because we simply cannot keep track of all possible 
correlations in the DB so the admin can take care of things explicitly. some 
distant day somebody might want to write a mechanism to derive the desired 
stats automatically but this is beyond the scope of our project for now.

as far as the patch is concerned:
it is patched nicely into clauselist_selectivity(), but has some rough edges: 
even when a cross-col stat is found, the single col selectivities are still 
counted (= lowering the selectivity even more); this is a TODO.
this patch adds the grammar and the start of planner integration with a static 
selectivity value for now, the previous discussion about cross-column 
statistics can be continued and perhaps comes to fruition soon.

how does it work? we try to find suitable statistics for an arbitrary length 
list of conditions so that the planner can use it directly rather than 
multiplying all the selectivities. this should make estimates a lot more 
precise. 
the current approach can be extended to work with expressions as well as 
straight conditions.
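
for illustration, a hypothetical usage example of the syntax above (table and 
column names are made up):

    -- tell the system to keep combined statistics for the two columns
    CREATE CROSS COLUMN STATISTICS ON t_customer (country, zipcode);

    -- a query whose combined restriction can now be estimated from the
    -- cross-column statistics instead of multiplying single-column selectivities
    SELECT * FROM t_customer WHERE country = 'AT' AND zipcode = '2700';

    DROP CROSS COLUMN STATISTICS ON t_customer (country, zipcode);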

goal: to make cross column correlation work for 9.2 ...

the purpose of this mail is mostly to get the race for a patch going and to see 
if the approach as such is reasonable / feasible.

many thanks,

hans



cross-column-v5.patch
Description: Binary data






--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] SELECT ... WHERE fti_query ORDER BY numeric_col LIMIT x - problem

2011-02-04 Thread PostgreSQL - Hans-Jürgen Schönig
I have toyed around with KNN a little and I am pretty impressed when it comes 
to the results we have seen in the GIS world.
Given the infrastructure we have at the moment I wonder if KNN can help to 
speed up queries like this:

SELECT ... WHERE fti_query ORDER BY numeric_col LIMIT x

The use case is fairly simple: Give me all products matching a certain tsquery 
and order those products by price or so to show the top 10.
KNN is supposed to give sorted output which works perfectly for points and so 
on, but can there theoretically be a sensible / reliable distance function for a 
(tsvector / numeric) combination? Some first testing gave me some interesting 
output.
If there is no way of defining a reasonable distance function performance is 
screwed up if fti_query returns a large list (it requires a complete sort of 
all prices then).
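
for illustration, the query shape in question - roughly what the KNN threads 
above experiment with (the combined gist index is only a sketch and assumes a 
suitable opclass for the numeric column, e.g. via btree_gist):

    CREATE INDEX idx_product_titleprice ON t_product
        USING gist (to_tsvector('german', title), int_price);

    SELECT id FROM t_product
     WHERE to_tsvector('german', title) @@ to_tsquery('german', 'iphone')
     ORDER BY int_price <-> 0
     LIMIT 10;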

does anybody have some useful input here?

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread PostgreSQL - Hans-Jürgen Schönig
hello everybody,

we came across an issue which turned out to be more serious than previously 
expected.
imagine a system with, say, 1000 partitions (heavily indexed) or so. the time 
taken by the planner is already fairly heavy in this case.

i tried this one with 5000 unindexed tables (just one col):

test=# \timing
Timing is on.
test=# prepare x(int4) AS select * from t_data order by id desc;
PREPARE
Time: 361.552 ms

you will see similar or higher runtimes in case of 500 partitions and a handful 
of indexes.
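
for reference, a rough sketch of how the 5000-table setup above can be 
reproduced (plain inheritance, made-up names):

    CREATE TABLE t_data (id int4);

    DO $$
    BEGIN
        FOR i IN 1..5000 LOOP
            EXECUTE 'CREATE TABLE t_data_' || i || ' () INHERITS (t_data)';
        END LOOP;
    END
    $$;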

does anybody see a potential way to do a shortcut through the planner?
a prepared query is no solution here as constraint exclusion would be dead in 
this case (making the entire thing an even bigger drama).

did anybody think of a solution to this problem.
or more precisely: can there be a solution to this problem?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 3, 2010, at 2:04 PM, Stephen Frost wrote:

 * PostgreSQL - Hans-Jürgen Schönig (postg...@cybertec.at) wrote:
 did anybody think of a solution to this problem.
 or more precisely: can there be a solution to this problem?
 
 Please post to the correct list (-performance) and provide information
 like PG version, postgresql.conf, the actual table definition, the
 resulting query plan, etc, etc...
 
   Thanks,
 
   Stephen



hello stephen,

this seems like more a developer question to me than a pgsql-performance one.
it is not related to the table structure at all - it is basically an issue with 
incredibly large inheritance lists.
it applies to postgres 9 and most likely to everything before.
postgresql.conf is not relevant at all at this point.

the plan is pretty fine.
the question is rather: does anybody see a chance to handle such lists more 
efficiently inside postgres?
also, it is not the point if my data structure is sane or not. it is really 
more generic - namely a shortcut for this case inside the planning process.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 3, 2010, at 4:40 PM, Tom Lane wrote:

 =?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?= postg...@cybertec.at 
 writes:
 imagine a system with, say, 1000 partitions (heavily indexed) or so. the 
 time taken by the planner is already fairly heavy in this case.
 
 As the fine manual points out, the current scheme for managing
 partitioned tables isn't intended to scale past a few dozen partitions.
 
 I think we'll be able to do better when we have an explicit
 representation of partitioning, since then the planner won't
 have to expend large amounts of effort reverse-engineering knowledge
 about how an inheritance tree is partitioned.  Before that happens,
 it's not really worth the trouble to worry about such cases.
 
   regards, tom lane
 


thank you ... - the manual is clear here but we wanted to see if there is some 
reasonably low hanging fruit to get around this.
it is no solution but at least a clear statement ...

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Path question

2010-09-01 Thread PostgreSQL - Hans-Jürgen Schönig
On Sep 1, 2010, at 4:10 PM, Tom Lane wrote:

 Boszormenyi Zoltan z...@cybertec.at writes:
 we are experimenting with modifying table partitioning
 so the ORDER BY clause can be pushed down to
 child nodes on the grounds that:
 
 This is really premature, and anything you do along those lines now will
 probably never get committed.  The problem is that the transformation
 you propose is wrong unless the planner can prove that the different
 child tables contain nonoverlapping ranges of the sort key.  Now you
 might be intending to add logic to try to prove that from inspection of
 constraints, but I don't believe that reverse-engineering such knowledge
 on the fly is a sane approach: it will be hugely expensive and will add
 that cost even in many situations where the optimization fails to apply.
 


well, why non-overlapping? the idea is to make append smart enough to take the 
sorted lists from below and merge them which will give sorted output as well.
my original idea was what you described but given Martijn van Oosterhout's 
posting we were pretty confident that we can get along without non-overlapping 
partitions.


 The project direction is that we are going to add some explicit
 representation of partitioned tables.  After that, the planner can just
 know immediately that a range-partitioned sort key is amenable to this
 treatment, and at that point it'll make sense to work on it.
 


can you outline some ideas here and maybe point to some useful discussion here?


many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Path question

2010-09-01 Thread PostgreSQL - Hans-Jürgen Schönig
hello tom,

yeah, we have followed quite a lot of discussion as well ...
and yes, no patches.

as far as this problem is concerned: we are working on a patch and did some 
prototyping inside the planner already (attached).
the code we have is pretty limited atm (such as checking for a sort clause 
explicitly and so on - it has no support for windowing related optimizations 
and so on so far).

the cost model is not our problem - it is a lot easier to read than the code we 
are fighting with (it is a different level of complexity). i think costs can be 
handled.

yes, this merging adds some costs for sure. we might see a hell amount of 
operators being called - but i think given a reasonable number of partitions it 
is still a lot cheaper than actually resorting ... and, it is a lot more space 
efficient.
in my practical case i cannot resort because we would simply run out of space. 
an index scan is expensive but needs no additional sort space ...
and, merge is O(n) which sort is clearly not.

advise is highly appreciated.

many thanks,

hans




push-down-sort-into-inh-2.patch
Description: Binary data

On Sep 1, 2010, at 5:00 PM, Tom Lane wrote:

 =?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?= postg...@cybertec.at 
 writes:
 On Sep 1, 2010, at 4:10 PM, Tom Lane wrote:
 This is really premature, and anything you do along those lines now will
 probably never get committed.
 
 well, why non-overlapping? the idea is to make append smart enough to
 take the sorted lists from below and merge them which will give sorted
 output as well.
 
 Well, an extra merge step is going to change the cost comparisons quite
 a bit; see Greg Starks' comments.  But in any case, my point wasn't that
 this is something we should never do; it was that it makes more sense to
 wait till something has happened with explicit partitioning.
 
 The project direction is that we are going to add some explicit
 representation of partitioned tables.
 
 can you outline some ideas here and maybe point to some useful discussion 
 here?
 
 There's been boatloads of discussion of partitioning, and at least one
 submitted patch, over the past year or so ...
 
   regards, tom lane
 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to construct an exact plan

2010-08-31 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

here is the ultimate revelation of planner hints in postgres ...
let us praise oleg and teodor for solving a practical problem for practical 
people ...

http://www.sai.msu.su/~megera/wiki/plantuner

try this one ...
it is excellent and definitely helpful for many many people out there.
unfortunately this code is not too well known.

many thanks,

hans



On Aug 30, 2010, at 9:41 PM, Pavel Stehule wrote:

 2010/8/30 Pei He hepeim...@gmail.com:
 Hi,
 I am hacking postgresql 8.2.5. a) and b) do not work for me.
 
 The situation is that I made a join operator, and a scan operator.
 And, The join operator requires the scan operator as the inner. So, I
 need to have the full control of the join plan.
 
 I am not ready to provide the optimization support for the two new
 operators. And, I want to run some performance tests before working on
 the optimization part.
 
 So, I want to know if it is possible to directly create a path or a
 plan, and do a unit test for the operators.
 
 
 yes, it is possible - but it isn't simple. I think a better way is a
 simple implementation of all parts and then runtime blocking of some (for
 you not interesting) builtin methods via SET enable_* to off.
 
 Regards
 
 Pavel Stehule
 
 
 Thanks
 --
 Pei
 
 On Mon, Aug 30, 2010 at 1:59 PM, Josh Berkus j...@agliodbs.com wrote:
 
 I have developed a new operators, and I want to do some tests on it.
 I do not want the optimizer to choose the plan for me, and I need to
 construct a plan as exact as I want.
 
 Can anyone provide me a way to achieve that?
 
 a) easy: choose a simple enough query that its plan is always predictable.
 
 b) moderate: choose a query whose plan is predictable if you manipulate
 the enable_* configuration settings
 
 c) hard: hack the PostgreSQL planner to choose a specific execution
 plan, and recompile Postgres.
 
 --
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com
 
 
 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] micro bucket sort ...

2010-08-11 Thread PostgreSQL - Hans-Jürgen Schönig
as tom pointed out - this is not possible.
there is no limit 20 in my case - i just used it to indicate that limiting does 
not make the index scan possible which it does in some other cases.
the partial sort thing simon pointed out is what is needed at this point.

many thanks,

hans



On Aug 11, 2010, at 5:29 PM, Alvaro Herrera wrote:

 Excerpts from Hans-Jürgen Schönig's message of Wed Aug 11 08:21:10 -0400 2010:
 
 same with limit ...
 
 
 test=# explain analyze select * from t_test order by x, y limit 20;
 
 But if you put the limit in a subquery which is ordered by the
 known-indexed condition, it is very fast:
 
 alvherre=# explain analyze select * from (select * from t_test order by x limit 20) f order by x, y;
                                                  QUERY PLAN
 ─────────────────────────────────────────────────────────────────────────────────────────────────────
  Sort  (cost=1.24..1.29 rows=20 width=8) (actual time=0.252..0.296 rows=20 loops=1)
    Sort Key: t_test.x, t_test.y
    Sort Method:  quicksort  Memory: 26kB
    ->  Limit  (cost=0.00..0.61 rows=20 width=8) (actual time=0.051..0.181 rows=20 loops=1)
          ->  Index Scan using idx_a on t_test  (cost=0.00..30408.36 rows=100 width=8) (actual time=0.046..0.098 rows=20 loops=1)
  Total runtime: 0.425 ms
 (6 rows)
 
 
 I guess it boils down to being able to sort a smaller result set.
 
 -- 
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support
 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] non-overlapping, consecutive partitions

2010-07-25 Thread PostgreSQL - Hans-Jürgen Schönig

On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:

 On Fri, Jul 23, 2010 at 10:04:00PM +0200, Hans-Jürgen Schönig wrote:
  create table foo ( x date );
  create table foo_2010 () INHERITS (foo)
  create table foo_2009 () INHERITS (foo)
  create table foo_2008 () INHERITS (foo)
 
 now we add constraints to make sure that data is only in 2008, 2009 and 2010.
 we assume that everything is indexed:
 
 SELECT * FROM foo ORDER BY bar  will now demand an ugly sort for this data.
 this is not an option if you need more than a handful of rows ...
 
 I think the right way to approach this is to teach the planner about
 merge sorts. This is, if the planner has path to foo_* all ordered by
 the same key (because they have the same indexes) then it has a path to
 the UNION of those tables simply by merging the results of those paths.
 
 This would be fairly straight forward to implement I think, you may
 even be able to reuse the merge sort in the normal sort machinery.
 (You'll need to watch out for UNION vs UNION ALL.)
 
 The real advantage of this approach is that you no longer have to prove
 anything about the constraints or various datatypes and it is more
 general. Say you have partitioned by start_date but you want to sort by
 end_date, simple index scanning won't work while a merge sort will work
 beautifully.
 
 You're also not limited to how the partitioning machinery will
 eventually work.
 
 Hope this helps,


i think this is excellent input.
i will do some research going into that direction.

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello everybody,

we are currently facing some serious issues with cross correlation issue.
consider: 10% of all people have breast cancer. we have 2 genders (50:50).
if i select all the men with breast cancer, i will get basically nobody - the 
planner will overestimate the output.
this is the commonly known problem ...
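
for illustration, a tiny made-up example of the independence assumption 
described above:

    CREATE TABLE t_person (gender text, breast_cancer boolean);

    -- the planner multiplies the single-column selectivities:
    -- 0.5 (gender = 'male') * 0.1 (breast_cancer) = 5 % of the table,
    -- while the real fraction is close to zero
    SELECT count(*) FROM t_person WHERE gender = 'male' AND breast_cancer;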

this cross correlation problem can be quite nasty in many many cases.
underestimated nested loops can turn joins into a never ending nightmare and so 
on and so on.

my ideas is the following:
what if we allow users to specifiy cross-column combinations where we keep 
separate stats?
maybe somehow like this ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)

or ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = 
y.id2)

clearly we cannot store correlation for all combinations of all columns so we 
somehow have to limit it.

what is the general feeling about something like that?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig

On Jul 14, 2010, at 12:40 PM, Heikki Linnakangas wrote:

 On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote:
 hello everybody,
 
 we are currently facing some serious issues with cross correlation issue.
 consider: 10% of all people have breast cancer. we have 2 genders (50:50).
 if i select all the men with breast cancer, i will get basically nobody - 
 the planner will overestimate the output.
 this is the commonly known problem ...
 
 this cross correlation problem can be quite nasty in many many cases.
 underestimated nested loops can turn joins into a never ending nightmare and 
 so on and so on.
 
 my ideas is the following:
 what if we allow users to specifiy cross-column combinations where we keep 
 separate stats?
 maybe somehow like this ...
 
  ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)
 
 or ...
 
  ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = 
 y.id2)
 
 clearly we cannot store correlation for all combinations of all columns so 
 we somehow have to limit it.
 
 what is the general feeling about something like that?
 
 +1 is my general feeling, it's good if you can tell the system to collect 
 additional statistics where needed. And once you have that, you can write an 
 agent or something to detect automatically which extra statistics might be 
 useful.
 


it seems i can leave my bunker where i was hiding for cover when i was waiting 
for a reply ;).
yes, my idea was to have an agent as well - but this is just some follow up 
problem.


 However, the problem is how to represent and store the cross-correlation. For 
 fields with low cardinality, like gender and boolean breast-cancer-or-not 
 you can count the prevalence of all the different combinations, but that 
 doesn't scale. Another often cited example is zip code + street address. 
 There's clearly a strong correlation between them, but how do you represent 
 that?


we could play the same story with a table storing people including their home 
country and the color of their skin.
obviously we will have more black people in african countries..


 
 For scalar values we currently store a histogram. I suppose we could create a 
 2D histogram for two columns, but that doesn't actually help with the zip 
 code + street address problem.
 


i think we might go for a second relation here specifically for this issue and 
a boolean flag in the current stats table indicating that additional 
correlation stats exist (to avoid an additional lookup unless really necessary).
do you have a useful syntax in mind? the thing is: this issue can be isolated 
inside a table (e.g. WHERE a.id = a.id2 AND a.id3 = a.id4) or it might span two 
tables with an arbitrary number of fields.

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello tom,

i think that having stats on an index is a problem by itself for 2 reasons - 
for cross column correlation at least:

a.) joins cannot be covered by an index on two tables - we would fix 
inside a table correlation problems but not joins.
b.) who says that there is actually an index in place? assume you are 
doing some big seq scan to do analytics. you don't want it to be indexed for 10 
different types of queries.

i think it is pretty hard to determine automatically what to collect because we 
cannot know which permutations of cross-column magic people will use.
i was thinking along the line of having it automatic as well but i could not 
figure out how to do it.
i think we can suggest additional stats to the user and we can write tools to 
figure out somehow what would be useful but personally i cannot see anything 
which is better than a command here.

many thanks,

hans



On Jul 14, 2010, at 4:01 PM, Tom Lane wrote:

 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote:
 maybe somehow like this ...
 ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)
 
 +1 is my general feeling, it's good if you can tell the system to 
 collect additional statistics where needed.
 
 The previous discussions about this went in the direction of
 automatically collect stats if there is an index on that combination of
 columns.  Do we really need a command?
 
 However, the problem is how to represent and store the 
 cross-correlation.
 
 Yes, whatever the triggering mechanism is for collecting cross-column
 stats, actually doing something useful is the hard part.
 
   regards, tom lane
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

look at the syntax i posted in more detail:

  ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = 
 y.id2)
 



it says X and Y ...
the selectivity of joins is what i am most interested in. cross correlation of 
columns within the same table is just a byproduct.
the core thing is: how can i estimate the number of rows returned from a join?

an example would be: you have email accounts + messages. you know that each 
row will match in a join as you can assume that every account will have a 
message.
what we need is a syntax which covers the join case and the case where columns 
inside the same table correlate.
and the fact that an index cannot cover two tables leads me to the conclusion 
that stats on an index are not the solution to the join problem.
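
a hypothetical example of the two-table form of the syntax above, for the 
account / message case (table and column names are made up):

    -- every account has at least one message, so the join is far from
    -- independent - keep explicit correlation stats for the join condition
    ALTER TABLE t_account SET CORRELATION STATISTICS
        FOR (t_account.id = t_message.account_id);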

many thanks,

hans


On Jul 14, 2010, at 4:22 PM, Tom Lane wrote:

 =?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?= postg...@cybertec.at 
 writes:
 i think that having stats on an index is a problem by itself for 2 reasons - 
 for cross column correlation at least:
 
  a.) joins cannot be covered by an index on two tables - we would fix 
 inside a table correlation problems but not joins.
 
 Your proposed command didn't cover the two-table case either, and anyway
 what the heck do you mean by cross-correlation across tables?
 Cross-correlation is about the correlation between values in the same
 row.
 
  b.) who says that there is actually an index in place?
 
 If the combination of columns is actually interesting, there might well
 be an index in place, or the DBA might be willing to create it.  For
 that matter, have you considered the idea of examining the index
 contents to derive the statistics?  Might work better than trying to get
 numbers via ANALYZE.
 
   regards, tom lane
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG FETCH readahead

2010-06-24 Thread PostgreSQL - Hans-Jürgen Schönig
On Jun 24, 2010, at 2:13 PM, Michael Meskes wrote:

 I think, yes, it does make sense. Because we are talking
 about porting a whole lot of COBOL applications.
 
 COBOL???
 


yes, COBOL :).
it is much more common than people think.
it is not the first COBOL request for PostgreSQL hitting my desk.
in our concrete example we are using a C module written with ECPG which is 
magically attached to tons of COBOL code ...



 The ESQL/C or ECPG connector was already written
 the Informix quirks in mind, so it fetches only one record
 at a time passing it to the application. And similar performance
 is expected from ECPG - which excpectation is not fulfilled
 currently because libecpg doesn't do the same caching as
 ESQL/C does.
 
 Eh, you are talking about a program you wrote for your customer or they wrote
 themselves, right? I simply refuse to add this stuff only to fix this 
 situation
 for that one customer of yours if it only hits them. Now the thing to discuss
 is how common is this situation.
 
 Michael


i think that this cursor issue is a pretty common thing for many codes.
people are usually not aware of the fact that network round trips and parsing 
which are naturally related to FETCH 1 are a lot more expensive than fetching 
one row somewhere deep inside the DB engine. 
out there there are many applications which fetch data row by row. if an app 
fetches data row by row in PostgreSQL it will be A LOT slower than in, say, 
Informix because most commercial database clients will cache data inside a 
cursor behind the scenes to avoid the problem we try to solve.

currently we are talking about a performance penalty of factor 5 or so. so - it 
is not a small thing; it is a big difference.
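
for illustration, the access pattern in question at the SQL level (not ECPG; 
cursor and table names are made up):

    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM t_orders ORDER BY id;

    FETCH 1 FROM c;    -- what the ported COBOL/ECPG code does: one round trip per row
    FETCH 1 FROM c;

    FETCH 100 FROM c;  -- what a caching connector does behind the scenes

    CLOSE c;
    COMMIT;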

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] next CommitFest

2009-11-09 Thread Hans-Juergen Schoenig -- PostgreSQL

*snip*


One pretty major fly in the ointment is that neither Hot Standby nor
Streaming Replication has been committed or shows much sign of being
about to be committed.  I think this is bad.  These are big features
that figure to have some bugs and break some things.  If they're not
committed in time for alpha3, then there won't be any significant
testing of these prior to alpha4/beta1, at the earliest.  I think
that's likely to lead to either (1) a very long beta period followed
by a late release or (2) a buggy release.  I feel like Simon Riggs and
Fujii Masao really pulled out all the stops to get these ready in time
for the September CommitFest, and while I'm not in a hurry to break
the world, I think the sooner these can hit the tree, the better of
we'll be in terms of releasing 8.5.

Just my $0.02,

  


absolutely, we should commit this.
we did some testing and things look stable.
also, people would most likely want to build code on top of it to be 
ready for 8.5 (support scripts, etc.). this is important in order to 
create some acceptance in user land.

this stuff seems mature and very well thought out.

just my $0.02 ...

   regards,

   hans-jürgen schönig

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] draft RFC: concept for partial, wal-based replication

2009-10-30 Thread Hans-Juergen Schoenig -- PostgreSQL

hello ...

as my day has worked out quite nicely up to now i thought to f... it up 
and post a new concept which has been requested by a customer. the goal 
is to collect some feedback, ideas and so on (not to be mixed up with 
flames).
we have funding for this and we are trying to sort out how to do it the 
best way. comments are welcome ...

note, this is a first draft i want to refine based on some comments.
here we go ...


Partial WAL Replication for PostgreSQL:
---

As of now the PostgreSQL community has provided patches and functionalities
which allow full WAL-based replication as well as hot standby. To extend this
functionality and to make PostgreSQL even more suitable for enterprise
computing than it is today, we have the commitment of a sponsor to fund partial
replication for PostgreSQL 8.5 / 8.6.

This is the first draft of a proposal to make partial WAL-based replication work
and to provide an additional set of fancy features to the community which has
been waiting for real in-core replication for a decade or more.


Why partial replication?


In some cases people have master servers which contain enormous amounts of data
(XX TB or so). If more than just one replica of this data is needed it might
happen that different slaves are used for different purposes. This implies that
not all data will be used by all machines.
An example: Consider a server at a phone company collecting phone calls, billing
data, and maybe network routing data. Data is used by different departments and
one machine is not enough to serve all three departments. With the new
functionality proposed here we could make 3 replicas each holding just a group
of tables for specific tasks thus allowing people to buy cheaper hardware for
slaves and use more machines instead.


Current status:
---

Hot standby and streaming replication have been a huge step forward for the
community and what is proposed here will be an extension to those patches and
functionalities. This concept is NOT aimed to replace anything - it is mainly an
addon.


Nodes and replication filters:
--

As of 8.4 standby systems are done by creating an archive_command along with a
base backup. Although it is easy to do, some users still reported some
difficulties due to a total misunderstanding of PITR.

The idea is to add a functionality to add slaves like this:

CREATE REPLICA node_name
   CONNECT FROM SLAVE 'connect_string'
   TRANSFER COMMAND 'command'
   [ USING replication_filter ];

'command' would be any shell script copying data from the local master to the
new database node called node_name. Replication filters can be used to make X
replicas contain the same tables. Filter sets can be created like this:

CREATE REPLICATION FILTER filter_name
   [ EMPTY | FULL ] [ INCLUDE | EXCLUDE CHANGES ];

Replication filters can be modified ...

ALTER REPLICATION FILTER filter_name RENAME TO new_filtername;
ALTER REPLICATION FILTER filter_name
   { ADD | REMOVE } { TABLE | INDEX | SEQUENCE } object;

Filter sets can be dropped like this ...

DROP REPLICATION FILTER filter_name;

Internally CREATE REPLICA would initiate a base backup to the new slave server
just like we would do it manually otherwise. The server would automatically use
the user defined 'command' to copy one file after the other to the slave box.
The idea is basically stolen from archive_command and friends. At this stage we
either copy the entire instance as we would do it with a normal base backup or
just what is needed (defined by the replication filter). Users would
automatically only copy data to a slave which is really needed there and which
matches their filter config. If the copy is done, we can register the new node
inside a system table and commit the transaction. Also, we can automatically
create a useful recovery.conf setup - we know how to connect from the slave to
the master (we can use 'CONNECT FROM SLAVE [ USING ]' to write a proper
recovery.conf file).

Tables can easily be added or removed from a replication filter with ALTER
REPLICATION FILTER.

Replicas can be removed easily:

DROP REPLICA node_name;
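
A purely illustrative end-to-end example using the draft syntax above (table
names, connect string and transfer command are made up):

    CREATE REPLICATION FILTER billing_filter EMPTY INCLUDE CHANGES;

    ALTER REPLICATION FILTER billing_filter ADD TABLE t_call;
    ALTER REPLICATION FILTER billing_filter ADD TABLE t_invoice;

    CREATE REPLICA billing_node
        CONNECT FROM SLAVE 'host=10.0.0.7 user=replica'
        TRANSFER COMMAND 'scp %f 10.0.0.7:/data/'
        USING billing_filter;

    DROP REPLICA billing_node;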

Why SQL to add a node? We are convinced that this is the most simplistic way of
doing things. It is the most intuitive way of doing things. We believe it
gives users a real feeling of simplicity. The current way of doing base backups
should stay in place as it is - it has proven to be nice for countless tasks.
However, it is not suitable for managing 10 or more replicas easily. Especially
not when they are not full blown copies of the master.


Technical ideas:


System tables:

We suggest to always replicate the entire system catalog. It would be a total
disaster to try some other implementation. The same applies for other tables - we
always replicate entire tables; no WHERE-clauses allowed when it comes

Re: [HACKERS] contrib/plantuner - enable PostgreSQL planner hints

2009-10-12 Thread Hans-Juergen Schoenig -- PostgreSQL

hi there ...

for this work i will include you in my evening prayers for at least one 
week.
i know there has been a lot of discussion about this but what you just 
posted is excellent and more important: USEFUL to many people.


i had something else in mind recently as well: virtual indexes. it would 
help people to decide whether an index would make sense if it actually 
existed. in some cases this would make sense as many datasets are just 
too big to try out whether an index helps.


if there was a vote whether this should be in contrib or in core: +999 
from me ...


   many thanks,

  hans


Oleg Bartunov wrote:

Hi there,

this is an announcement of our new contribution module for PostgreSQL 
- Plantuner - enable planner hints

(http://www.sai.msu.su/~megera/wiki/plantuner).

Example:

=# LOAD 'plantuner';
=# create table test(id int);
=# create index id_idx on test(id);
=# create index id_idx2 on test(id);
=# \d test
      Table "public.test"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Indexes:
    "id_idx" btree (id)
    "id_idx2" btree (id)
=# explain select id from test where id=1;
                              QUERY PLAN
-----------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.34..15.03 rows=12 width=4)
   Recheck Cond: (id = 1)
   ->  Bitmap Index Scan on id_idx2  (cost=0.00..4.34 rows=12 width=0)
         Index Cond: (id = 1)
(4 rows)
=# set enable_seqscan=off;
=# set plantuner.forbid_index='id_idx2';
=# explain select id from test where id=1;
                              QUERY PLAN
----------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.34..15.03 rows=12 width=4)
   Recheck Cond: (id = 1)
   ->  Bitmap Index Scan on id_idx  (cost=0.00..4.34 rows=12 width=0)
         Index Cond: (id = 1)
(4 rows)
=# set plantuner.forbid_index='id_idx2,id_idx';
=# explain select id from test where id=1;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on test  (cost=100.00..140.00 rows=12 width=4)
   Filter: (id = 1)
(2 rows)



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83




--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5

2009-09-23 Thread Hans-Juergen Schoenig -- PostgreSQL

Tom Lane wrote:

Josh Berkus j...@agliodbs.com writes:
  

Jeff,


Will statement_timeout not suffice for that use case?
  


  

Well, currently statement_timeout doesn't affect waiting for locks.



Sure it does.

  

And as a DBA, I don't think I'd want the same timeout for executing
queries as for waiting for a lock.



this is exactly the point - it is simply an additional use case.
while statement_timeout is perfect for kicking out queries which take too
long, a lock_timeout serves a totally different purpose because you will
get a totally different error message. imagine some old 4GL terminal
application: in this case you will hardly ever reach a statement_timeout
because you simply want to wait until things appear on your screen.
however, you definitely don't want to wait forever if somebody keeps
working on some product which is in stock and never finishes.


btw, this old terminal application i was talking about is exactly the
usecase we had - this is why this patch has been made.
we are porting roughly 2500 terminal applications from informix to
postgresql. we are talking about entire factory production lines and so
on here (the ECPG patches posted recently are for the same project, btw.).
there are countless use-cases where you want to know whether you are
locked out or whether you are just taking too long - the message is
totally different. the goal of the patch is to have a mechanism to make
sure that you don't starve to death.


as far as syntax is concerned: there are good reasons for WAIT and good
reasons for a GUC.
while the WAIT syntax is clearly a very precise instruction for a very
specific place in a program, a GUC is more of an overall policy. i don't
see a reason why we should not have both anyway.
a GUC has the charm that it can be assigned to roles, procedures, etc.
nicely; a WAIT clause has the charm of being incredibly precise. i can
see good arguments for both.
the code itself is pretty simple - it needs no effort to keep up to
date and it does not harm anything else - it is pretty isolated.
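
just to illustrate the difference (a sketch only - the table and column names
are made up, and WAIT / lock_timeout are the proposals discussed in this
thread, not released syntax; FOR UPDATE NOWAIT is what already exists):

    BEGIN;
    SET LOCAL statement_timeout = '30s';          -- caps total execution time
    -- proposed GUC flavour: give up if the row lock cannot be acquired in time
    -- SET LOCAL lock_timeout = '2s';
    -- proposed per-statement flavour:
    -- SELECT * FROM stock WHERE product_id = 42 FOR UPDATE WAIT 2;
    SELECT * FROM stock WHERE product_id = 42 FOR UPDATE NOWAIT;  -- existing: fail immediately
    COMMIT;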


   many thanks,

  hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5

2009-09-23 Thread Hans-Juergen Schoenig -- PostgreSQL

Jeff Janes wrote:

Will statement_timeout not suffice for that use case?


we tried to get around it without actually touching the core but we 
really need this functionality.
patching the core here is not the primary desire we have. it is all 
about modeling some functionality which was truly missing.


   many thanks,

  hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] happy birthday Tom Lane ...

2009-09-18 Thread Hans-Juergen Schoenig -- PostgreSQL

Tom,

On behalf of the entire PostgreSQL team here in Austria I want to wish 
you a happy birthday.

We hope that you will be a vital part of PostgreSQL for many years to come.

  Best regards,

  Hans-Jürgen Schönig + team


--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] combined indexes with Gist - planner issues?

2009-08-31 Thread Hans-Juergen Schoenig -- PostgreSQL

hello everybody,

we are seriously fighting with a planner issue which seems to be slightly
obscure to us.

we have a table which is nicely indexed (several GB in size).
i am using btree_gist operator classes to build a combined index including
a full-text (FTI) expression along with a number:


db=# \d product.t_product
                                   Table "product.t_product"
    Column     |     Type      |                           Modifiers
---------------+---------------+-----------------------------------------------------------------
 id            | bigint        | not null default nextval('product.t_product_id_seq'::regclass)
 shop_id       | integer       |
 art_number    | text          |
 title         | text          |
 description   | text          |
 display_price | numeric(10,4) |
Indexes:
    "t_product_pkey" PRIMARY KEY, btree (id)
    "idx_test" gist (display_price, to_tsvector('german'::regconfig, (title || ' '::text) || description))
    "idx_test2" gist (to_tsvector('german'::regconfig, (title || ' '::text) || description), display_price)



what we basically expected here is that Postgres will scan the table
using the index to give us the cheapest products containing the words we
are looking for.
i am totally surprised to see that we have to fetch all products matching
the words, sort, and only then apply the limit.
this totally kills performance because some words simply show up
millions of times.


the plans look like this:

db=#  explain analyze SELECT art_number, title
   FROM product.t_product
   WHERE to_tsvector('german'::regconfig, (title || ' '::text) || description) @@ plainto_tsquery('harddisk')
   ORDER BY display_price
   LIMIT 10;
                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=108340.08..108340.10 rows=10 width=54) (actual time=1328.900..1328.909 rows=10 loops=1)
   ->  Sort  (cost=108340.08..108422.48 rows=32961 width=54) (actual time=1328.899..1328.905 rows=10 loops=1)
         Sort Key: display_price
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on t_product  (cost=2716.62..107627.80 rows=32961 width=54) (actual time=1052.706..1328.772 rows=55 loops=1)
               Recheck Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('harddisk'::text))
               ->  Bitmap Index Scan on idx_test2  (cost=0.00..2708.38 rows=32961 width=0) (actual time=1052.576..1052.576 rows=55 loops=1)
                     Index Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('harddisk'::text))
 Total runtime: 1328.942 ms
(9 rows)


runtime increases badly if words start to be more likely ...


db=#  explain analyze SELECT art_number, title
   FROM product.t_product
   WHERE to_tsvector('german'::regconfig, (title || ' '::text) || description) @@ plainto_tsquery('spiel')
   ORDER BY display_price
   LIMIT 10;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=108340.08..108340.10 rows=10 width=54) (actual time=33489.675..33489.682 rows=10 loops=1)
   ->  Sort  (cost=108340.08..108422.48 rows=32961 width=54) (actual time=33489.675..33489.675 rows=10 loops=1)
         Sort Key: display_price
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on t_product  (cost=2716.62..107627.80 rows=32961 width=54) (actual time=774.923..33408.522 rows=56047 loops=1)
               Recheck Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('spiel'::text))
               ->  Bitmap Index Scan on idx_test2  (cost=0.00..2708.38 rows=32961 width=0) (actual time=759.078..759.078 rows=56047 loops=1)
                     Index Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('spiel'::text))
 Total runtime: 33489.906 ms
(9 rows)

i am wondering why postgres is not able to use a combined index here?
is this some obscure thing related to gist, a logical problem or a 
planner deficiency?


ideas are welcome.

   many thanks,

  hans



--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] combined indexes with Gist - planner issues?

2009-08-31 Thread Hans-Juergen Schoenig -- PostgreSQL

Tom Lane wrote:

Hans-Juergen Schoenig -- PostgreSQL postg...@cybertec.at writes:
  
what we basically expected here is that Postgres will scan the table 
using the index to give us the cheapest products containing the words we 
are looking for.
i am totally surprised to see that we have to fetch all products given 
the words, sort and then do the limit.



I don't know why you'd find that surprising.  GIST indexes have no
support for ordering.

regards, tom lane

  


ok, i thought it would be something gist specific i was not aware of.
the golden question now is: i am looking for the cheapest products given 
a certain text in an insane amount of data.
how to do it? other quals which could narrow down the amount of data 
would not help.


i cannot see an option with regular weapons ...
maybe you have an idea how to fix core to make it work? maybe there is a
mechanism we need.

we really have to make this work - no matter what it takes.
we are willing to put effort into that.

   many thanks,

  hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bison crashes postgresql

2009-08-31 Thread Hans-Juergen Schoenig -- PostgreSQL

Andrew Dunstan wrote:



Werner Echezuria wrote:

Hi, I have a code in which I translate some code from sqlf to sql, but
when it comes to yy_parse the server crashes, I have no idea why,
because it works fine in other situations.
  


I don't understand why you're doing what you're doing this way. 
Wouldn't it be better to patch the main postgres parser and make your 
functionality first class rather than having it run via an SQL string 
and a function that calls a secondary parser?


cheers

andrew



yes, this is the thing i had in mind as well.
what is your ultimate goal?

   many thanks,

  hans


--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] combined indexes with Gist - planner issues?

2009-08-31 Thread Hans-Juergen Schoenig -- PostgreSQL

hello ...

we did some experiments with such a table.
the problem arises if you want to allow arbitrary combinations of words,
which can be modeled perfectly with FTI: you would instantly end up with a
self join across 5 relations or so - which is again bad.


there are too many common words to consider handling them partly with gist
and partly with a btree.


is there any option to adapt gist in a way that a combined index would 
make sense here?


   many thanks,

  hans




Heikki Linnakangas wrote:

Hans-Juergen Schoenig -- PostgreSQL wrote:
  

my knowledge of how gist works internally is not too extensive. any
kickstart idea would be appreciated.



If there's not too many of those common words, you can create a simple
partial b-tree index for each, and handle the less common words with the
gist index you have (you can drop the display_price column from the index).

Another idea:

Create a table containing one row for each word in each product:

CREATE TABLE t_product_word (id bigint, word text, display_price
numeric(10,4));

with triggers to keep it up-to-date. You can then create a regular two
column b-tree index on that:

CREATE INDEX idx_word_price ON t_product_word (word, display_price);

And query with:

SELECT p.art_number, p.title
   FROM t_product p INNER JOIN t_product_word pw ON p.id = pw.id
   WHERE pw.word = 'harddisk'
ORDER BY pw.display_price DESC LIMIT 10;

The t_product_word table will be huge, but with a few gigabytes of data
it should still be manageable.
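
For what it is worth, a minimal sketch of the maintenance trigger Heikki
describes (illustration only: it assumes the t_product / t_product_word
definitions above, and unnest(tsvector) only exists in much later PostgreSQL
releases - at the time of this thread one would have needed a helper such as
the tsvcontent() function posted elsewhere in this archive):

    CREATE OR REPLACE FUNCTION product_word_sync() RETURNS trigger AS $$
    BEGIN
        IF TG_OP IN ('UPDATE', 'DELETE') THEN
            DELETE FROM t_product_word WHERE id = OLD.id;
        END IF;
        IF TG_OP IN ('INSERT', 'UPDATE') THEN
            INSERT INTO t_product_word (id, word, display_price)
            SELECT NEW.id, u.lexeme, NEW.display_price
            FROM unnest(to_tsvector('german',
                        coalesce(NEW.title, '') || ' ' || coalesce(NEW.description, ''))) AS u;
        END IF;
        RETURN NULL;  -- AFTER trigger, return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER trg_product_word_sync
        AFTER INSERT OR UPDATE OR DELETE ON product.t_product
        FOR EACH ROW EXECUTE PROCEDURE product_word_sync();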

  



--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] tsvector extraction patch

2009-07-03 Thread Hans-Juergen Schoenig -- PostgreSQL

hello,

this patch has not made it through yesterday, so i am trying to send it 
again.

i made a small patch which i found useful for my personal tasks.
it would be nice to see this in 8.5. if not core then maybe contrib.
it transforms a tsvector to table format which is really nice for text 
processing and comparison.


test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure 
this is a good patch'));

lex   | rank
+--
good   |8
patch  |9
pretti |3
sure   |4
(4 rows)
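
as a further, purely illustrative example of the comparison use case - it
simply assumes the tsvcontent() function from the attached patch is installed:

    -- which lexemes do two documents share?
    SELECT a.lex
    FROM tsvcontent(to_tsvector('english', 'i am pretty sure this is a good patch')) AS a
    JOIN tsvcontent(to_tsvector('english', 'a patch this good is rarely seen')) AS b
         USING (lex);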

  many thanks,

 hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tsvector extraction patch

2009-07-03 Thread Hans-Juergen Schoenig -- PostgreSQL

Hans-Juergen Schoenig -- PostgreSQL wrote:

hello,

this patch has not made it through yesterday, so i am trying to send 
it again.

i made a small patch which i found useful for my personal tasks.
it would be nice to see this in 8.5. if not core then maybe contrib.
it transforms a tsvector to table format which is really nice for text 
processing and comparison.


test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty 
sure this is a good patch'));

lex   | rank
+--
good   |8
patch  |9
pretti |3
sure   |4
(4 rows)

  many thanks,

 hans




--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de

diff -dcrpN postgresql-8.4.0.old/contrib/Makefile postgresql-8.4.0/contrib/Makefile
*** postgresql-8.4.0.old/contrib/Makefile	2009-03-26 00:20:01.0 +0100
--- postgresql-8.4.0/contrib/Makefile	2009-06-29 11:03:04.0 +0200
*** WANTED_DIRS = \
*** 39,44 
--- 39,45 
  		tablefunc	\
  		test_parser	\
  		tsearch2	\
+ 		tsvcontent	\
  		vacuumlo
  
  ifeq ($(with_openssl),yes)
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/Makefile postgresql-8.4.0/contrib/tsvcontent/Makefile
*** postgresql-8.4.0.old/contrib/tsvcontent/Makefile	1970-01-01 01:00:00.0 +0100
--- postgresql-8.4.0/contrib/tsvcontent/Makefile	2009-06-29 11:20:21.0 +0200
***
*** 0 
--- 1,19 
+ # $PostgreSQL: pgsql/contrib/tablefunc/Makefile,v 1.9 2007/11/10 23:59:51 momjian Exp $
+ 
+ MODULES = tsvcontent
+ DATA_built = tsvcontent.sql
+ DATA = uninstall_tsvcontent.sql
+ 
+ 
+ SHLIB_LINK += $(filter -lm, $(LIBS))
+ 
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/tsvcontent
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.c postgresql-8.4.0/contrib/tsvcontent/tsvcontent.c
*** postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.c	1970-01-01 01:00:00.0 +0100
--- postgresql-8.4.0/contrib/tsvcontent/tsvcontent.c	2009-06-29 11:18:35.0 +0200
***
*** 0 
--- 1,169 
+ #include "postgres.h"
+ 
+ #include "fmgr.h"
+ #include "funcapi.h"
+ #include "miscadmin.h"
+ #include "executor/spi.h"
+ #include "lib/stringinfo.h"
+ #include "nodes/nodes.h"
+ #include "utils/builtins.h"
+ #include "utils/lsyscache.h"
+ #include "utils/syscache.h"
+ #include "utils/memutils.h"
+ #include "tsearch/ts_type.h"
+ #include "tsearch/ts_utils.h"
+ #include "catalog/pg_type.h"
+ 
+ #include "tsvcontent.h"
+ 
+ PG_MODULE_MAGIC;
+ 
+ PG_FUNCTION_INFO_V1(tsvcontent);
+ 
+ Datum
+ tsvcontent(PG_FUNCTION_ARGS)
+ {
+ 	FuncCallContext 	*funcctx;
+ 	TupleDesc		ret_tupdesc;
+ 	AttInMetadata		*attinmeta;
+ 	int			call_cntr;
+ 	int			max_calls;
+ 	ts_to_txt_fctx		*fctx;
+ 	Datum			result[2];
+ 	bool			isnull[2] = { false, false };
+ 	MemoryContext 		oldcontext;
+ 
+ 	/* input value containing the TS vector */
+ 	TSVector	in = PG_GETARG_TSVECTOR(0);
+ 
+ 	/* stuff done only on the first call of the function */
+ 	if (SRF_IS_FIRSTCALL())
+ 	{
+ 		TupleDesc	tupdesc;
+ 		int		i, j;
+ 		char		*wepv_base;
+ 
+ 		/* create a function context for cross-call persistence */
+ 		funcctx = SRF_FIRSTCALL_INIT();
+ 
+ 		/*
+ 		 * switch to memory context appropriate for multiple function calls
+ 		 */
+ 		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+ 
+ 		switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+ 		{
+ 			case TYPEFUNC_COMPOSITE:
+ 				/* success */
+ 				break;
+ 			case TYPEFUNC_RECORD:
+ 				/* failed to determine actual type of RECORD */
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 						 errmsg("function returning record called in context "
+ 								"that cannot accept type record")));
+ 				break;
+ 			default:
+ 				/* result type isn't composite */
+ 				elog(ERROR, "return type must be a row type");
+ 				break;
+ 		}
+ 
+ 		/* make sure we have a persistent copy of the tupdesc */
+ 		tupdesc = CreateTupleDescCopy(tupdesc);
+ 
+ 		/*
+ 		 * Generate attribute metadata needed later to produce tuples from raw
+ 		 * C strings
+ 		 */
+ 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ 		funcctx->attinmeta = attinmeta;
+ 
+ 		/* allocate memory */
+ 		fctx = (ts_to_txt_fctx *) palloc(sizeof(ts_to_txt_fctx));
+ 
+ 		wepv_base = (char *) in + offsetof(TSVectorData, entries) + in->size * sizeof(WordEntry);
+ 
+ 		fctx->n_tsvt = 0;
+ 		for (i = 0; i < in->size; i++)
+ 		{
+ 			if (in->entries[i].haspos)
+ 			{
+ 				WordEntryPosVector *wepv = (WordEntryPosVector *)
+ 					(wepv_base + in->entries[i].pos + SHORTALIGN(in->entries[i].len));
+ 
+ 				fctx->n_tsvt += wepv->npos;
+ 			}
+ 			else
+ 				fctx->n_tsvt++;
+ 		}
+ 
+ 		fctx->tsvt = palloc(fctx->n_tsvt * sizeof(tsvec_tuple));
+ 
+ 		for (i = 0, j = 0; i < in->size; i++)
+ 		{
+ 			int pos = in->entries[i].pos

[HACKERS] tsvector extraction patch

2009-07-02 Thread Hans-Juergen Schoenig -- PostgreSQL

hello,

i made a small patch which i found useful for my personal tasks.
it would be nice to see this in 8.5. if not core then maybe contrib.
it transforms a tsvector to table format which is really nice for text 
processing and comparison.


test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure 
this is a good patch'));

 lex   | rank
+--
good   |8
patch  |9
pretti |3
sure   |4
(4 rows)

   many thanks,

  hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de

diff -dcrpN postgresql-8.4.0.old/contrib/Makefile postgresql-8.4.0/contrib/Makefile
*** postgresql-8.4.0.old/contrib/Makefile	2009-03-26 00:20:01.0 +0100
--- postgresql-8.4.0/contrib/Makefile	2009-06-29 11:03:04.0 +0200
*** WANTED_DIRS = \
*** 39,44 
--- 39,45 
  		tablefunc	\
  		test_parser	\
  		tsearch2	\
+ 		tsvcontent	\
  		vacuumlo
  
  ifeq ($(with_openssl),yes)
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/Makefile postgresql-8.4.0/contrib/tsvcontent/Makefile
*** postgresql-8.4.0.old/contrib/tsvcontent/Makefile	1970-01-01 01:00:00.0 +0100
--- postgresql-8.4.0/contrib/tsvcontent/Makefile	2009-06-29 11:20:21.0 +0200
***
*** 0 
--- 1,19 
+ # $PostgreSQL: pgsql/contrib/tablefunc/Makefile,v 1.9 2007/11/10 23:59:51 momjian Exp $
+ 
+ MODULES = tsvcontent
+ DATA_built = tsvcontent.sql
+ DATA = uninstall_tsvcontent.sql
+ 
+ 
+ SHLIB_LINK += $(filter -lm, $(LIBS))
+ 
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/tsvcontent
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.c postgresql-8.4.0/contrib/tsvcontent/tsvcontent.c
*** postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.c	1970-01-01 01:00:00.0 +0100
--- postgresql-8.4.0/contrib/tsvcontent/tsvcontent.c	2009-06-29 11:18:35.0 +0200
***
*** 0 
--- 1,169 
+ #include "postgres.h"
+ 
+ #include "fmgr.h"
+ #include "funcapi.h"
+ #include "miscadmin.h"
+ #include "executor/spi.h"
+ #include "lib/stringinfo.h"
+ #include "nodes/nodes.h"
+ #include "utils/builtins.h"
+ #include "utils/lsyscache.h"
+ #include "utils/syscache.h"
+ #include "utils/memutils.h"
+ #include "tsearch/ts_type.h"
+ #include "tsearch/ts_utils.h"
+ #include "catalog/pg_type.h"
+ 
+ #include "tsvcontent.h"
+ 
+ PG_MODULE_MAGIC;
+ 
+ PG_FUNCTION_INFO_V1(tsvcontent);
+ 
+ Datum
+ tsvcontent(PG_FUNCTION_ARGS)
+ {
+ 	FuncCallContext 	*funcctx;
+ 	TupleDesc		ret_tupdesc;
+ 	AttInMetadata		*attinmeta;
+ 	int			call_cntr;
+ 	int			max_calls;
+ 	ts_to_txt_fctx		*fctx;
+ 	Datum			result[2];
+ 	bool			isnull[2] = { false, false };
+ 	MemoryContext 		oldcontext;
+ 
+ 	/* input value containing the TS vector */
+ 	TSVector	in = PG_GETARG_TSVECTOR(0);
+ 
+ 	/* stuff done only on the first call of the function */
+ 	if (SRF_IS_FIRSTCALL())
+ 	{
+ 		TupleDesc	tupdesc;
+ 		int		i, j;
+ 		char		*wepv_base;
+ 
+ 		/* create a function context for cross-call persistence */
+ 		funcctx = SRF_FIRSTCALL_INIT();
+ 
+ 		/*
+ 		 * switch to memory context appropriate for multiple function calls
+ 		 */
+ 		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+ 
+ 		switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+ 		{
+ 			case TYPEFUNC_COMPOSITE:
+ 				/* success */
+ 				break;
+ 			case TYPEFUNC_RECORD:
+ 				/* failed to determine actual type of RECORD */
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 						 errmsg("function returning record called in context "
+ 								"that cannot accept type record")));
+ 				break;
+ 			default:
+ 				/* result type isn't composite */
+ 				elog(ERROR, "return type must be a row type");
+ 				break;
+ 		}
+ 
+ 		/* make sure we have a persistent copy of the tupdesc */
+ 		tupdesc = CreateTupleDescCopy(tupdesc);
+ 
+ 		/*
+ 		 * Generate attribute metadata needed later to produce tuples from raw
+ 		 * C strings
+ 		 */
+ 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ 		funcctx->attinmeta = attinmeta;
+ 
+ 		/* allocate memory */
+ 		fctx = (ts_to_txt_fctx *) palloc(sizeof(ts_to_txt_fctx));
+ 
+ 		wepv_base = (char *) in + offsetof(TSVectorData, entries) + in->size * sizeof(WordEntry);
+ 
+ 		fctx->n_tsvt = 0;
+ 		for (i = 0; i < in->size; i++)
+ 		{
+ 			if (in->entries[i].haspos)
+ 			{
+ 				WordEntryPosVector *wepv = (WordEntryPosVector *)
+ 					(wepv_base + in->entries[i].pos + SHORTALIGN(in->entries[i].len));
+ 
+ 				fctx->n_tsvt += wepv->npos;
+ 			}
+ 			else
+ 				fctx->n_tsvt++;
+ 		}
+ 
+ 		fctx->tsvt = palloc(fctx->n_tsvt * sizeof(tsvec_tuple));
+ 
+ 		for (i = 0, j = 0; i < in->size; i++)
+ 		{
+ 			int pos = in->entries[i].pos;
+ 			int len = in->entries[i].len;
+ 
+ 			if (in->entries[i].haspos)
+ 			{
+ 				WordEntryPosVector *wepv = (WordEntryPosVector

Re: [HACKERS] Time to update list of contributors

2007-12-02 Thread FAST PostgreSQL
   Josh Berkus wrote:

 Arul Shaji
  

   Sydney, Australia.

   Rgds,
   Arul Shaji


[HACKERS] Updatable cursor doubt

2007-09-03 Thread FAST PostgreSQL

In CVS HEAD

workspace=# begin;
BEGIN
workspace=# declare cu cursor for select * from t1 for read only;
DECLARE CURSOR
workspace=# fetch cu;
 a
---
 1
(1 row)

workspace=# delete from t1 where current of cu;
DELETE 1
workspace=# commit;
COMMIT

Is this the intended behaviour? If so should we remove the 'READ ONLY' 
clause from the allowable syntax?


The documentation does not have 'READ ONLY' as part of the cursor syntax 
anymore.


Rgds,
Arul Shaji



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] CSVlog vs tabs

2007-06-17 Thread FAST PostgreSQL


Andrew Dunstan wrote:


Now that we've fixed the partial/interleaved log line issue, I have 
returned to trying toi get the CSV log patch into shape. Sadly, it still 
needs lots of work, even after Greg Smith and I both attacked it, so I 
am now going through it with a fine tooth comb.


One issue I notice is that it mangles the log message to add a tab 
character before each newline. We do this in standard text logs to make 
them more readable for humans. but the whole point of having CSV logs is 
to make them machine readable, and I'm rather inclined to think this 
sort of behaviour is both unnecessary and undesirable.  So I'm intending 
to leave it out for CSV logs.


Comments?

With tab characters in the log, the user has to only enable csvlog to 
have both machine and human readable logs. The user can use the csvlog 
always and use it for human reading and load it into the database only 
if he wants it for further analysis. That was my original intention anyway.


(Also as I wrote the patch some time ago, I forgot how the loaded log 
gets displayed after a select. If it gets displayed with the tabs, then 
isn't it desirable that way as well?)


Rgds,
Arul Shaji




cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org





---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] COPYable logs status

2007-06-11 Thread FAST PostgreSQL

Andrew Dunstan wrote:

 The CSVlog pipe is a separate pipe from the stderr pipe. Anything that
 goes to stderr now will continue to go to stderr, wherever that is.

 I like this scheme for a couple of reasons:
 . it will include the ability to tell the real end of a message
 . it will let us handle non-protocol messages (although there shouldn't
 be any in the CSVlog pipe).

Another important reason I went for two separate pipes is that, in
Windows, the pipe calls are blocking calls, so performance really
deteriorates unless we dramatically increase the buffer allocated to the
pipes.


On a rather decent machine, simply running the regression tests would 
consume a lot of resources, especially when it comes to the errors tests.


Rgds,
Arul Shaji


Andrew Dunstan wrote:



Tom Lane wrote:

Alvaro Herrera [EMAIL PROTECTED] writes:
 

The idea of one pipe per process is not really workable, because it
would mean having as many pipes as backends which does not sound very
good.  But how about a mixed approach -- like have the all the backends
share a pipe, controlled by an LWLock, and the auxiliary process have a
separate pipe each?



Multiple pipes seem like a mess, and in any case the above still doesn't
work for stderr output produced by non-cooperative software (dynamic
loader for instance).

The only solution that I can see is to invent some sort of simple
protocol for the syslogger pipe.  Assume that the kernel honors PIPE_BUF
(this assumption may need proving, see other message).  We could imagine
having elog.c divvy up its writes to the pipe into chunks of less than
PIPE_BUF bytes, where each chunk carries info sufficient to let it be
reassembled.  Perhaps something on the order of

\0 \0 2-byte-length source-PID end-flag text...

The syslogger reassembles these by joining messages with the same
origination PID, until it gets one with the end-flag set.  It would need
enough code to track multiple in-progress messages.

The logger would have to also be able to deal with random text coming
down the pipe (due to aforesaid non-cooperative software).  I would be
inclined to say just take any text not preceded by \0\0 as a standalone
message, up to the next \0\0.  Long chunks of non-protocol text would
risk getting treated as multiple messages, but there's probably not a
lot of harm in that.

BTW, exactly what is the COPYable-logs code going to do with random
text?  I trust the answer is not throw it away.

   
  


The CSVlog pipe is a separate pipe from the stderr pipe. Anything that 
goes to stderr now will continue to go to stderr, wherever that is.


I like this scheme for a couple of reasons:
. it will include the ability to tell the real end of a message
. it will let us handle non-protocol messages (although there shouldn't 
be any in the CSVlog pipe).


I'll try to get a patch out for just the stderr case, which should be 
back-patchable, then adjust the CSVlog patch to use it.


I'm thinking of handling the partial lines with a small dynahash of 
StringInfo buffers, which get discarded whenever we don't have a partial 
line for the PID.


cheers

andrew

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Patch queue triage

2007-05-02 Thread FAST PostgreSQL

* [PATCHES] Updateable cursors patch  /FAST PostgreSQL/

This is incomplete, and I fear at this point has to be held over to 8.4.



It is true that my original patch post said that I need to modify the 
patch to work with tidscan. Since then I have realized that this 
modification is not needed as it would have the same result as the 
'branching out from sequential scan' solution currently implemented.


I was hoping that I could discuss this with whoever picks up the patch 
for review before doing modifications if any is needed. So in my humble 
opinion, it would be great if this can be considered for 8.3 as there 
are not many modifications needed.


P.S. Only Simon commented on my original patch. Simon, do you have time 
to revisit the patch so that we could discuss this?


Rgds,
Arul Shaji



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] SOS, help me please, one problem towards the postgresql developement on windows

2007-04-29 Thread FAST PostgreSQL
I assume you are trying to create a built-in function, right?

If that's the case, you may want to create an entry in pg_proc (that seems
to be missing). Search for box_add in the source code and you will get
an idea of what code needs to be added where.

Rgds,
Arul Shaji


shieldy wrote:
 my postgresql source code is at c:/mingw/postgresql and instal to
 C:/msys/1.0/local/pgsql/
 I add a function to src\backend\utils\adt\geo_ops.c as the following:
 Datum
 box_add2(PG_FUNCTION_ARGS)
 {
  BOX *box = PG_GETARG_BOX_P(0);
  Point *p = PG_GETARG_POINT_P(1);
 
  PG_RETURN_BOX_P(box_construct((box->high.x + 2 * p->x),
    (box->low.x + 2 * p->x),
    (box->high.y + 2 * p->y),
    (box->low.y + 2 * p->y)));
 }
 there is another similar one (this is the original one):
 Datum
 box_add(PG_FUNCTION_ARGS)
 {
  BOX *box = PG_GETARG_BOX_P(0);
  Point *p = PG_GETARG_POINT_P(1);
 
  PG_RETURN_BOX_P(box_construct((box->high.x + p->x),
    (box->low.x + p->x),
    (box->high.y + p->y),
    (box->low.y + p->y)));
 }
 And i also add the declaration to the src\include\utils\geo_decls.h
 like this:
 
 extern Datum box_add2(PG_FUNCTION_ARGS);
 
 and then I did the following like step by step:
 
 
 $ cd /c/mingw/postgresql
 $ ./configure
 
  // as i downloaded zlib but don't know where it should be put, i
  ignored this - does it
 
  matter?
 configure: error: zlib library not found
 If you have zlib already installed, see config.log for details on the
 failure.  It is possible the compiler isn't looking in the proper
 directory.
 Use --without-zlib to disable zlib support.
 $ make
 ...
 All of PostgreSQL successfully made. Ready to install.
 $ make install
 
 PostgreSQL installation complete.
 $ initdb -D /usr/local/pgsql/data   //before this i add the
 environments
 
 variableslike this:
 PGDATA=C:/msys/1.0/local/pgsql/data
 PGHOME=C:/msys/1.0/local/pgsql
 PGHOST=localhost
 PGPORT=5434
 PATH= C:/msys/1.0/local/pgsql/bin
 .
 Success. You can now start the database server using:
 
 C:\msys\1.0\local\pgsql\bin\postgres -D
 C:/msys/1.0/local/pgsql/data
 or
 C:\msys\1.0\local\pgsql\bin\pg_ctl -D
 C:/msys/1.0/local/pgsql/data -l logfile start
 
 $ pg_ctl start -l logfile
 server starting
 
 $ createdb testdb
 CREATE DATABASE
 
 then I use pgadminIII to open the database:
 just run the scripts:
 /select box_add(box '((0,0),(1,1))',point'(2,2)')/
 got:
 (3,3),(2,2)
 
 /select box_add2(box '((0,0),(1,1))',point'(2,2)')/
 got:
  ERROR: function box_add2(box, point) does not exist
  SQL state: 42883
  advice: No function matches the given name and argument types. You
  may need to add explicit type casts.
  chars: 8
 
 anyone know this??? why this happened? what should I do?
 thankyou very much!!!
 
  
 
 


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PATCHES] WIP Patch - Updateable Cursors

2007-03-01 Thread FAST PostgreSQL
On Fri, 2 Mar 2007 01:20, Simon Riggs wrote:
 On Thu, 2007-03-01 at 15:17 +1100, FAST PostgreSQL wrote:

Hi Simon,

  We are happy to provide that. If and when it comes to the final patch
  being accepted, we can send a copyright waiver mail which will put our
  source code contribution under the BSD license.

 This approach is not practically workable and is a terrible shame.

I already made a clarification on this subject yesterday.

http://archives.postgresql.org/pgsql-patches/2007-02/msg00579.php

It is for every patch we submit. Not just the final one. 

I also sent a contribution statement yesterday regarding one of my patch 
which is already pending.

http://archives.postgresql.org/pgsql-patches/2007-02/msg00581.php


 What would happen if everybody said, Well, since Fujitsu want to act
 like that, we won't grant a BSD licence on our material until they grant
 a BSD licence on theirs. Deadlock.

 How do we know that you'll ever give that waiver? What would stop you
 from making contributions right up to the last minute, receiving lots of
 useful feedback, then at the last minute pulling the patch, once you
 think its got no problems in it? If you do this, how will any of us fend
 off *our* corporate lawyers who would like to do the same (probably)? Or
 did you think the various companies on this list don't have any?

 I provided my detailed implementation thoughts on the initial proposal.
 Should I ignore posts from Fujitsu in the future because of this issue?

 Open source requires trust, not legal brinkmanship. If you're even
 thinking of submitting patches here, then it should already be clear
 that the people on this list are better friends to you than people from
 other companies who provide non-PostgreSQL-based services and products.
 If you don't believe that, it seems better not to post at all.

 I'll trust you, and hope that you'll grow to trust others back.

Of course it is. If we didn't trust the community, why would we even want to 
contribute the source code in the first place.? 

Rgds,
Arul Shaji


This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 
693 481. It is confidential to the ordinary user of the email address to which 
it was addressed and may contain copyright and/or legally privileged 
information. No one else may read, print, store, copy or forward all or any of 
it or its attachments. If you receive this email in error, please return to 
sender. Thank you.

If you do not wish to receive commercial email messages from Fujitsu Australia 
Software Technology Pty Ltd, please email [EMAIL PROTECTED]


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [PATCHES]

2007-02-28 Thread FAST PostgreSQL
On Thu, 1 Mar 2007 04:28, Bruce Momjian wrote:
 I have added this to the developer's FAQ to clarify the situtation of
 posting a patch:

 liPostgreSQL is licensed under a BSD license.  By posting a patch
 to the public PostgreSQL mailling lists, you are giving the PostgreSQL
 Global Development Group the non-revokable right to distribute your
 patch under the BSD license.  If you use code that is available under
 some other license that is BSD compatible (eg. public domain), please
 note that in your email submission./li


We are happy to do this for every patch we submit. We can add an explicit 
statement which will put our contribution under the BSD license. This 
statement will override the email signature and will be approved by the 
appropriate person.

Rgds,
Arul Shaji



 ---

 Tom Lane wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
   Neil Conway wrote:
   For the case in question, sure, requiring some clarification from FJ
   would be reasonable. But more broadly, my point is that I think you're
   fooling yourself if you think that requiring a disclaimer or explicit
   transfer of copyright for this *one* particular patch is likely to
   make any material difference to the overall copyright status of the
   code base.
  
   Yes, I do.  If there is an explicit claim, like an email footer or a
   copyright in the code, we do try to nail that down.
 
  AFAICT, the footer in question tries to make it illegal for us even to
  have the message in our mail archives.  If I were running the PG lists,
  I would install filters that automatically reject mails containing such
  notices, with a message like Your corporate lawyers do not deserve to
  have access to the internet.  Go away until you've acquired a clue.
 
  I fully support Bruce's demand that patches be submitted with no such
  idiocy attached.
 
  regards, tom lane
This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 
693 481. It is confidential to the ordinary user of the email address to which 
it was addressed and may contain copyright and/or legally privileged 
information. No one else may read, print, store, copy or forward all or any of 
it or its attachments. If you receive this email in error, please return to 
sender. Thank you.

If you do not wish to receive commercial email messages from Fujitsu Australia 
Software Technology Pty Ltd, please email [EMAIL PROTECTED]


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] WIP Patch - Updateable Cursors

2007-02-27 Thread FAST PostgreSQL
On Wed, 28 Feb 2007 09:48, Bruce Momjian wrote:

[Added a subejct line]

 FYI, I am not going to be comfortable accepting a final patch that
 contains this email signature:

   This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN
   27 003 693 481. It is confidential to the ordinary user of the email
   address to which it was addressed and may contain copyright and/or
 -
   legally privileged information. No one else may read, print, store, copy
   or forward all or any of it or its attachments. If you receive this
   email in error, please return to s ender. Thank you.

 unless you provide additional details on your contribution of this code
 under a BSD license.

We are happy to provide that. If and when it comes to the final patch being 
accepted, we can send a copyright waiver mail which will put our source code 
contribution under the BSD license.

Rgds,
Arul Shaji



 ---

 John Bartlett wrote:
  Hi,
 
 
 
  This is the first posting to the community of the WIP patch for the
  Updatable Cursor implementation.
 
 
 
  I want to confirm that the community is satisfied that the effort to date
  is in a suitable direction and to get comments on the development to
  date.
 
 
 
  The patch is in the following state:
 
 
 
  The grammar definition is complete and 'yacc'ed to produce gram.y.c.
 
 
 
  The functions transformUpdateStmt and transformDeleteStmt have been
  updated to process the cursor name and obtain the related portal.
 
 
 
  The change to save the current tuple id (ctid) into the portal, related
  to the Fetch command has been done.
 
 
 
  The ctids relating to the Update/Delete statements' TidScan are being
  extracted to be saved in the executor.
 
 
 
  The parts in progress are to complete the saving of the ctids from the
  TidScan into a list stored in a file, plus related searching the list for
  an individual ctid obtained from the Update/Delete statements.
 
 
 
  Unstarted as yet:
 
 
 
  1)Correctly process, in the database, the Delete / Update of
  the tuple from the cursor.
 
  2)To enable the cursor name to be defined as a parameter in a
  PREPARE statement and provided as part if an EXECUTE statement.
 
 
 
  The community may wish to comment on the following issue:
 
 
 
  1)At present the file that will contain the list of ctids is going
  into a new directory called pg_ctids, analogous to pg_twophase, and also
  stored in the pg_data directory.
 
 
 
  Regards,
  John Bartlett
 
  This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN
  27 003 693 481. It is confidential to the ordinary user of the email
  address to which it was addressed and may contain copyright and/or
  legally privileged information. No one else may read, print, store, copy
  or forward all or any of it or its attachments. If you receive this email
  in error, please return to sender. Thank you.
 
  If you do not wish to receive commercial email messages from Fujitsu
  Australia Software Technology Pty Ltd, please email
  [EMAIL PROTECTED]

 [ Attachment, skipping... ]

  ---(end of broadcast)---
  TIP 9: In versions below 8.0, the planner will ignore your desire to
 choose an index scan if your joining column's datatypes do not
 match
This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 
693 481. It is confidential to the ordinary user of the email address to which 
it was addressed and may contain copyright and/or legally privileged 
information. No one else may read, print, store, copy or forward all or any of 
it or its attachments. If you receive this email in error, please return to 
sender. Thank you.

If you do not wish to receive commercial email messages from Fujitsu Australia 
Software Technology Pty Ltd, please email [EMAIL PROTECTED]


---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Updateable cursors

2007-01-22 Thread FAST PostgreSQL
We are trying to develop the updateable cursors functionality into 
Postgresql. I have given below details of the design and also issues we are 
facing.  Looking forward to the advice on how to proceed with these issues.

Rgds,
Arul Shaji




 
1. Introduction
--
This is a combined proposal and design document for adding updatable 
(insensitive) cursor capability to the PostgreSQL database. 
There have already been a couple of previous proposals since 2003 for 
implementing this feature so there appears to be community interest in doing 
so. This will enable the following constructs to be processed:


UPDATE table_name SET value_list WHERE CURRENT OF cursor_name
DELETE FROM table_name WHERE CURRENT OF cursor_name

This has the effect of users being able to update or delete specific rows of 
a table, as defined by the row currently fetched into the cursor.
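
A minimal usage sketch of the intended behaviour (table and cursor names are
made up for illustration; this is proposed functionality, not yet implemented):

    BEGIN;
    DECLARE cur CURSOR FOR SELECT * FROM accounts FOR UPDATE OF accounts;
    FETCH NEXT FROM cur;                              -- position the cursor on a row
    UPDATE accounts SET balance = balance - 100 WHERE CURRENT OF cur;
    FETCH NEXT FROM cur;
    DELETE FROM accounts WHERE CURRENT OF cur;        -- deletes the row just fetched
    COMMIT;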


2. Overall Conceptual Design
-
The design is considered from the viewpoint of progression of a command 
through the various stages of processing, from changes to the file ‘gram.y’ 
to implement the actual grammar changes, through to changes in the Executor 
portion of the database architecture.

2.1 Changes to the Grammar
--
The following changes will be done to the PostgreSQL grammar:

UPDATE statement has the option ‘WHERE CURRENT OF cursor_name’ added
DELETE statement has the option ‘WHERE CURRENT OF cursor_name’ added

The cursor_name data is held in the UpdateStmt and DeleteStmt structures and 
contains just the name of the cursor.

The pl/pgsql grammar changes in the same manner.

The word CURRENT will be added to the ScanKeywords array in keywords.c.


2.2 Changes to Affected Data Structures
--
The following data structures are affected by this change: 

Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt 
structures

The Portal will contain a list of structures of relation ids and tuple ids 
relating to the tuple held in the QueryDesc structure. There will be one 
entry in the relation and tuple id list for each entry in the relation-list 
of the statement below: 

DECLARE cursor_name [WITH HOLD] SELECT FOR UPDATE OF relation-list 

The QueryDesc structure will contain the relation id and the tuple id 
relating to the tuple obtained via the FETCH command so that it can be 
propagated back to the Portal for storage in the list described above.

The UpdateStmt and DeleteStmt structures have the cursor name added so that 
the information is available for use in obtaining the portal structure 
related to the cursor previously opened via the DECLARE CURSOR request.


2.3 Changes to the SQL Parser

At present, although the FOR UPDATE clause of the DECLARE CURSOR command has 
been present in the grammar, it causes an error message later in the 
processing since cursors are currently not updatable. This now needs to 
change. The ‘FOR UPDATE’ clause has to be valid, but not the ‘FOR SHARE’ 
clause. 

The relation names that follow the ‘FOR UPDATE’ clause will be added to the 
rtable in the Query structure and identified by means of the rowMarks array. 
In the case of an updatable cursor the FOR SHARE option is not allowed 
therefore all entries in the rtable that are identified by the rowMarks array 
must relate to tables that are FOR UPDATE.

In the UPDATE or DELETE statements the ‘WHERE CURRENT OF cursor_name’ 
clause results in the cursor name being placed in the UpdateStmt or 
DeleteStmt structure. During the processing of the functions - 
transformDeleteStmt() and transformUpdateStmt() - the cursor name is used to 
obtain a pointer to the related Portal structure and the tuple affected by 
the current UPDATE or DELETE statement is extracted from the Portal, where it 
has been placed as the result of a previous FETCH request. At this point all 
the information for the UPDATE or DELETE statement is available so the 
statements can be transformed into standard UPDATE or DELETE statements and 
sent for re-write/planning/execution as usual.

2.4 Changes to the Optimizer
--
There is a need to add a TidScan node to planning UPDATE / DELETE statements 
where the statements are ‘UPDATE / DELETE at position’. This is to enable the 
tuple ids of the tuples in the tables relating to the query to be obtained. 
There will need to be a new mechanism to achieve this, as at present, a Tid 
scan is done only if there is a standard WHERE condition on update or delete 
statements to provide Tid qualifier data.


2.5 Changes to the Executor
---
There are various options that have been considered for this part of the 
enhancement. These are described in the sections below.

We would like to hear opinions on which option is the best way to go or, if
none of these is acceptable, any alternative ideas.

Option 1 - MVCC Via Continuous Searching of Database

Re: [HACKERS] Updateable cursors

2007-01-22 Thread FAST PostgreSQL
On Tue, 23 Jan 2007 15:48, Joshua D. Drake wrote:
 FAST PostgreSQL wrote:
  We are trying to develop the updateable cursors functionality into
  Postgresql. I have given below details of the design and also issues we
  are facing.  Looking forward to the advice on how to proceed with these
  issues.
 
  Rgds,
  Arul Shaji

 Would this be something that you would hope to submit for 8.3?

Yes definitely. If we can finish it before the feature freeze of course.

Rgds,
Arul Shaji


 Joshua D. Drake

  1. Introduction
  --
  This is a combined proposal and design document for adding updatable
  (insensitive) cursor capability to the PostgreSQL database.
  There have already been a couple of previous proposals since 2003 for
  implementing this feature so there appears to be community interest in
  doing so. This will enable the following constructs to be processed:
 
 
  UPDATE table_name SET value_list WHERE CURRENT OF cursor_name
  DELETE FROM table_name WHERE CURRENT OF cursor_name
 
  This has the effect of users being able to update or delete specific rows
  of a table, as defined by the row currently fetched into the cursor.
 
 
  2. Overall Conceptual Design
  -
  The design is considered from the viewpoint of progression of a command
  through the various stages of processing, from changes to the file
  'gram.y' to implement the actual grammar changes, through to changes in
  the Executor portion of the database architecture.
 
  2.1 Changes to the Grammar
  --
  The following changes will be done to the PostgreSQL grammar:
 
  UPDATE statement has the option 'WHERE CURRENT OF cursor_name' added
  DELETE statement has the option 'WHERE CURRENT OF cursor_name' added
 
  The cursor_name data is held in the UpdateStmt and DeleteStmt structures
  and contains just the name of the cursor.
 
  The pl/pgsql grammar changes in the same manner.
 
  The word CURRENT will be added to the ScanKeywords array in keywords.c.
 
 
  2.2 Changes to Affected Data Structures
  --
  The following data structures are affected by this change:
 
  Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt
  structures
 
  The Portal will contain a list of structures of relation ids and tuple
  ids relating to the tuple held in the QueryDesc structure. There will be
  one entry in the relation and tuple id list for each entry in the
  relation-list of the statement below:
 
  DECLARE cursor_name [WITH HOLD] SELECT FOR UPDATE OF relation-list
 
  The QueryDesc structure will contain the relation id and the tuple id
  relating to the tuple obtained via the FETCH command so that it can be
  propagated back to the Portal for storage in the list described above.
 
  The UpdateStmt and DeleteStmt structures have the cursor name added so
  that the information is available for use in obtaining the portal
  structure related to the cursor previously opened via the DECLARE CURSOR
  request.
 
 
  2.3 Changes to the SQL Parser
  
  At present, although the FOR UPDATE clause of the DECLARE CURSOR command
  has been present in the grammar, it causes an error message later in the
  processing since cursors are currently not updatable. This now needs to
  change. The 'FOR UPDATE' clause has to be valid, but not the 'FOR SHARE'
  clause.
 
  The relation names that follow the 'FOR UPDATE' clause will be added to
  the rtable in the Query structure and identified by means of the rowMarks
  array. In the case of an updatable cursor the FOR SHARE option is not
  allowed therefore all entries in the rtable that are identified by the
  rowMarks array must relate to tables that are FOR UPDATE.
 
  In the UPDATE or DELETE statements the 'WHERE CURRENT OF cursor_name'
  clause results in the cursor name being placed in the UpdateStmt or
  DeleteStmt structure. During the processing of the functions -
  transformDeleteStmt() and transformUpdateStmt() - the cursor name is used
  to obtain a pointer to the related Portal structure and the tuple
  affected by the current UPDATE or DELETE statement is extracted from the
  Portal, where it has been placed as the result of a previous FETCH
  request. At this point all the information for the UPDATE or DELETE
  statement is available so the statements can be transformed into standard
  UPDATE or DELETE statements and sent for re-write/planning/execution as
  usual.
 
  2.4 Changes to the Optimizer
  --
  There is a need to add a TidScan node to planning UPDATE / DELETE
  statements where the statements are 'UPDATE / DELETE at position'. This
  is to enable the tuple ids of the tuples in the tables relating to the
  query to be obtained. There will need to be a new mechanism to achieve
  this, as at present, a Tid scan is done only if there is a standard WHERE
  condition on update or delete statements to provide Tid qualifier

[HACKERS] pg_get_domaindef()

2006-10-25 Thread FAST PostgreSQL
Hi All,

I am now trying to implement the pg_get_domaindef() function, which is on the
TODO list, and have run into a minor issue.

When the following command is given

CREATE DOMAIN testdomain AS text CONSTRAINT testconstraint NOT NULL;

I couldn't find the CONSTRAINT name ('testconstraint' in this case) being 
stored in the system catalog. Any idea where I can find it?

Or is it acceptable, for the above statement, for pg_get_domaindef() to 
return 

CREATE DOMAIN testdomain AS text CONSTRAINT NOT NULL;
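
For what it's worth, a quick sketch of where the catalogs do and do not record
domain constraints (illustrative queries only; note that pg_constraint.consrc
was later removed in favour of pg_get_constraintdef() in newer releases):

    CREATE DOMAIN testdomain2 AS text
        CONSTRAINT testconstraint NOT NULL
        CONSTRAINT testcheck CHECK (length(VALUE) > 0);

    -- named CHECK constraints are stored (with their names) in pg_constraint:
    SELECT conname, consrc
    FROM pg_constraint
    WHERE contypid = 'testdomain2'::regtype;

    -- NOT NULL is only reflected as a boolean on the type itself,
    -- so the name 'testconstraint' does not appear to be preserved anywhere:
    SELECT typname, typnotnull FROM pg_type WHERE typname = 'testdomain2';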

Rgds,
Arul Shaji
This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 
693 481. It is confidential to the ordinary user of the email address to which 
it was addressed and may contain copyright and/or legally privileged 
information. No one else may read, print, store, copy or forward all or any of 
it or its attachments. If you receive this email in error, please return to 
sender. Thank you.

If you do not wish to receive commercial email messages from Fujitsu Australia 
Software Technology Pty Ltd, please email [EMAIL PROTECTED]


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] PG7.5

2003-09-02 Thread postgresql
Hi all
 Can anyone tell me the approximate pg 7.5 release date?
 Thanks
Josh




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] incremental backup

2003-06-10 Thread postgresql
Hi all.

I am new to PostgreSQL development, so my question may be too simple.

I have noticed that incremental backup and PITR have been discussed before.

Frankly, I am still interested in incremental backup. I am not sure
whether we can implement such a function based on the XLog.

Since there is a unique LSN in the XLog, if we back up the xlog content after
a given LSN, then when some error occurs we can use the partial xlog to restore
the database.

I am not familiar with the logging mechanism in PostgreSQL; maybe before we make
an incremental backup of the XLog, we should create a checkpoint in the log.
I would appreciate any kind of feedback.

Thanks



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html


[HACKERS] mvcc and lock

2003-03-23 Thread postgresql
Hi all
I have read some code in the transaction part.
When a new transaction starts, it records a snapshot of the database containing the
current transaction id, etc. So, depending on the snapshot, the transaction decides
which tuples are visible.
But transactions could also be implemented with locks, so I am not sure how
transactions are implemented: by MVCC, by locks, or by both?
In my opinion, when a tuple is processed in read-only mode (SELECT), MVCC is enough,
but when a tuple is changed, a lock is used. I am not sure
whether this explanation is correct.
Thanks for any comments.
Best regards.
josh 



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[HACKERS] mvcc and lock

2003-03-18 Thread postgresql


Hi all
I have read some code in the transaction part.
When a new transaction starts, it records a snapshot of the database containing
the current transaction id, etc. So, depending on the snapshot, the transaction
decides which tuples are visible.
But transactions could also be implemented with locks, so I am not sure how
transactions are implemented: by MVCC, by locks, or by both?
In my opinion, when a tuple is processed in read-only mode (SELECT), MVCC is
enough, but when a tuple is changed, a lock is used. I am not sure whether this
explanation is correct.
Thanks for any comments.
Best regards.
josh



[HACKERS] transaction abort

2003-03-17 Thread postgresql



Hi all.
I have read some code on the transaction abort operation. When a transaction
aborts, it seems that the tuples touched by the transaction are not dealt with:
a tuple's XMIN still equals the creating transaction's ID and its XMAX is still
null. Of course, some records are made in pg_log.
So I have one question: how does the system know which tuple is valid? Does it
need the help of pg_log? That is, do we check that a tuple's XMIN is valid and
that the flag in pg_log means COMMIT?
Thanks for any comments
Josh


[HACKERS]

2003-02-09 Thread postgresql
Hello to any person who is interested:
Our team is working hard on PostgreSQL testing, including JDBC, ODBC, and SQL.
Could anyone give us some instructions about SQL conformance?
We've got NIST's SQL Test Suite V6 and tried it. But after checking some of its
SQL scripts, I found
it was not as good as the publications said.
So I am asking for help finding an efficient tool for DBMS SQL testing.
Thanks a lot here first.
postgresql
[EMAIL PROTECTED]
2003-01-20




---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]





[HACKERS] about the sql conformance

2002-12-25 Thread postgresql



Hi all,
I want to make a SQL 92/99 conformance test for PostgreSQL; could you give
me some advice
on which tool can meet the demand?
Great thanks for any message.
Josh


[HACKERS] alter user problem

2002-12-19 Thread postgresql



Hi all
I have installed PostgreSQL 7.3 with the user name EDU. Then I entered psql to
change the password for EDU using "alter user EDU with password ...", but it
returns "user EDU does not exist". I checked the system table pg_user, and the
user 'EDU' actually exists. But if I run the following SQL, it works:
"create user TEST"; "alter user TEST with password ...". I checked pg_user
again, and found the user "TEST" had been replaced by 'test'. So I have a
question: how do I change the password for the initial user "EDU", or is it a
bug in PostgreSQL 7.3?
Great thanks for any message
Josh.
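
(For what it's worth, this looks like ordinary identifier case-folding rather
than a bug: unquoted names are folded to lower case, which is also why TEST
became test above. A sketch, with a made-up password:

    ALTER USER EDU WITH PASSWORD 'secret';    -- folded to 'edu', which does not exist
    ALTER USER "EDU" WITH PASSWORD 'secret';  -- double quotes preserve the case used at initdb time
)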


[HACKERS] a problem in authority

2002-12-18 Thread postgresql



Hi, all
I have installed PostgreSQL 7.3, but I think something is wrong with
authentication.
I performed the following operations:
1. I entered psql and ran 'alter user postgres with password
'postgres''
2. I changed pg_hba.conf and set the auth_type from 'trust' to
'password'
3. Then I cannot connect to the server.
I have tested connecting both locally and from another host, but it just returns
'authority fail for user postgres'.
I could connect to postgresql 7.23 successfully after the above steps;
Great thanks for any message.
Josh


Re: [GENERAL] [HACKERS] [Fwd: AW: More UB-Tree patent information]

2002-04-15 Thread postgresql


Hannu Krosing wrote:
 
 Have you found out _what_ exaclty is patented ?
 
 Is it just his concrete implementation of UB-Tree or something
 broader, like using one multi-dimensional index instead of multiple
 one-dimensional ones ?

(I know it is OT, please reply in private, I can summarize any reactions 
to the list ...)
 
Patents are supposed to be only applicable to an industrial application 
(with external side-effects).  So ideas in themselves are not patentable.

Anyway, this is once more a good example of the danger of software patents 
- you know what to reply when people say software patents promote 
innovation

IANAL, just my 0,02 Euro.

see also : http://www.gnu.org/philosophy/savingeurope.html (also 
interesting for non-europeans, of course !)

-- 
Tycho Fruru [EMAIL PROTECTED]
Prediction is extremely difficult. Especially about the future.
  - Niels Bohr



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]