Re: [HACKERS] jsonb problematic operators

2016-12-16 Thread Matteo Beccati
Hi,

On 12/12/2016 05:09, Craig Ringer wrote:
> Does PDO let you double question marks to escape them, writing ?? or
> \? instead of ? or anything like that?
> 
> If not, I suggest that you (a) submit a postgres patch adding
> alternative operator names for ? and ?|, and (b) submit a PDO patch to
> allow ?? or \? as an escape for ? .

For reference, my plan would be to get "\?" into PDO_pgsql for PHP 7.2.
I've tried to get it into 7.1, but I was a bit too late into the RC
process to safely do that.

Since PDO itself has no escaping yet, I'm open to suggestions wrt to the
actual escape method to use.


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kqueue

2016-09-20 Thread Matteo Beccati

Hi,

On 16/09/2016 05:11, Thomas Munro wrote:

Still no change measurable on my laptop.  Keith, would you be able to
test this on your rig and see if it sucks any less than the last one?


I've tested kqueue-v6.patch on the Celeron NetBSD machine and numbers 
were constantly lower by about 5-10% vs fairly recent HEAD (same as my 
last pgbench runs).



Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kqueue

2016-09-14 Thread Matteo Beccati

Hi,

On 14/09/2016 00:06, Tom Lane wrote:

I'm inclined to think the kqueue patch is worth applying just on the
grounds that it makes things better on OS X and doesn't seem to hurt
on FreeBSD.  Whether anyone would ever get to the point of seeing
intra-kernel contention on these platforms is hard to predict, but
we'd be ahead of the curve if so.

It would be good for someone else to reproduce my results though.
For one thing, 5%-ish is not that far above the noise level; maybe
what I'm measuring here is just good luck from relocation of critical
loops into more cache-line-friendly locations.


FWIW, I've tested HEAD vs patch on a 2-cpu low end NetBSD 7.0 i386 machine.

HEAD: 1890/1935/1889 tps
kqueue: 1905/1957/1932 tps

no weird surprises, and basically no differences either.


Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-27 Thread Matteo Beccati
Hi Tom,

On 27/05/2014 03:07, Tom Lane wrote:
 I've verified functionality of this patch on these scenarios:
 
 (1) --with-ossp-uuid on RHEL6, using uuid-1.6.1-10.el6.x86_64
 (2) --with-linux-uuid on RHEL6, using libuuid-2.17.2-12.14.el6_5.x86_64
 (3) --with-linux-uuid on OS X 10.9.3, Intel
 (4) --with-linux-uuid on OS X 10.4.11, PPC (hence, bigendian)
 
 I do not have a machine on which to try --with-bsd-uuid, so it's
 possible I broke that portion of Matteo's patch.  Would someone try
 that case on a FreeBSD box?

I've tested on NetBSD i386 and --with-bsd-uuid worked out of the box. I
could fire up some virtual machines with FreeBSD and other BSD flavours,
but maybe some buildfarm animals could be used for that.

I'm attaching a little patch to be applied on top of yours.

I didn't notice that buf ? 13 : 0 was raising a warning about the
condition being always true on BSD. I guess it's safe to hardcode 13 as
the argument is ignored anyway with ossp, so I've fixed that.

I've also fixed v1mc generation on linux to match the OSSP and BSD
variants and added a regression test for it.


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/
diff --git a/contrib/uuid-ossp/expected/uuid_ossp.out 
b/contrib/uuid-ossp/expected/uuid_ossp.out
index f393e86..c14db22 100644
--- a/contrib/uuid-ossp/expected/uuid_ossp.out
+++ b/contrib/uuid-ossp/expected/uuid_ossp.out
@@ -77,3 +77,18 @@ SELECT uuid_generate_v4()::text ~ 
'^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]
  t
 (1 row)
 
+DO $_$
+DECLARE
+  u text;
+  i int;
+  c int;
+BEGIN
+  FOR i in 1..32 LOOP
+u := substr(uuid_generate_v1mc()::text, 25, 2);
+EXECUTE 'SELECT x''' || u || '''::int  3' INTO c;
+IF c  3 THEN
+  RAISE WARNING 'v1mc broken';
+END IF;
+  END LOOP;
+END;
+$_$;
diff --git a/contrib/uuid-ossp/sql/uuid_ossp.sql 
b/contrib/uuid-ossp/sql/uuid_ossp.sql
index 8f22417..61a44a8 100644
--- a/contrib/uuid-ossp/sql/uuid_ossp.sql
+++ b/contrib/uuid-ossp/sql/uuid_ossp.sql
@@ -17,3 +17,20 @@ SELECT uuid_generate_v3(uuid_ns_dns(), 'www.widgets.com');
 SELECT uuid_generate_v5(uuid_ns_dns(), 'www.widgets.com');
 
 SELECT uuid_generate_v4()::text ~ 
'^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$';
+
+DO $_$
+DECLARE
+  u text;
+  i int;
+  c int;
+BEGIN
+  FOR i in 1..32 LOOP
+u := substr(uuid_generate_v1mc()::text, 25, 2);
+EXECUTE 'SELECT x''' || u || '''::int  3' INTO c;
+IF c  3 THEN
+  RAISE WARNING 'v1mc broken';
+END IF;
+  END LOOP;
+END;
+$_$;
+
diff --git a/contrib/uuid-ossp/uuid-ossp.c b/contrib/uuid-ossp/uuid-ossp.c
index bc29ade..7803dbe 100644
--- a/contrib/uuid-ossp/uuid-ossp.c
+++ b/contrib/uuid-ossp/uuid-ossp.c
@@ -460,6 +460,10 @@ uuid_generate_v1mc(PG_FUNCTION_ARGS)
uuid_t  uu;
 
uuid_generate_random(uu);
+
+   /* set IEEE802 multicast and local-admin bits */
+   ((dce_uuid_t *)uu)-node[0] |= 0x03;
+
uuid_unparse(uu, strbuf);
buf = strbuf + 24;
 #else  /* BSD */
@@ -472,7 +476,7 @@ uuid_generate_v1mc(PG_FUNCTION_ARGS)
 #endif
 
return uuid_generate_internal(UUID_MAKE_V1 | UUID_MAKE_MC, NULL,
- buf, buf ? 13 
: 0);
+ buf, 13);
 }
 
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-27 Thread Matteo Beccati
Hi Tom,

On 27/05/2014 15:52, Tom Lane wrote:
 Matteo Beccati p...@beccati.com writes:
 On 27/05/2014 03:07, Tom Lane wrote:
 I do not have a machine on which to try --with-bsd-uuid, so it's
 possible I broke that portion of Matteo's patch.  Would someone try
 that case on a FreeBSD box?
 
 I've tested on NetBSD i386 and --with-bsd-uuid worked out of the box.
 
 Ah, cool.  I had documented this option as only working for FreeBSD,
 but that's obviously too conservative.  Anyone know about whether it
 works on OpenBSD?

I've tried to google man uuid openbsd and I got the e2fs package
(which contains uuid/uuid.h and libuuid) instead of a man page, so I
believe that could be another use case for --with-linux-uuid.

If it's confirmed to be working, that makes two BSD-derived systems
requiring linux-uuid, so --with-e2fs-uuid or similar would be more
appropriate.


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-26 Thread Matteo Beccati
On 26/05/2014 19:31, Andres Freund wrote:
 On 2014-05-26 13:25:49 -0400, Tom Lane wrote:
 Matteo Beccati p...@beccati.com writes:
 I'm attaching v2 of the patch. Here's a list of changes from v1:

 * Restored --with-ossp-uuid. Configure tries ossp support first, then
 falls back to Linux and BSD variants
 
 Imo should be the other way round.

My goal was to avoid changing the underlying implementation if someone
has the ossp library installed. Tom is suggesting to split the configure
switch in two, so I guess that would allow the most flexibility.

 * md5.o and sha1.o are linked only when not using the ossp library
 
 The backend already has a md5 implementation, no?

Yes, but as far as I could see it can only convert a string to hash.
Making it fit the uuid use case would require concatenating the
namespace and source string before passing it to the md5 function. The
BSD system and pgcrypto implementations do not:

https://github.com/mbeccati/postgres/blob/ossp/contrib/uuid-ossp/uuid-ossp.c#L237

We could even use the system md5 and sha1 on BSD, but I've spotted at
least a difference between FreeBSD and NetBSD (sha.h vs sha1.h and
different function names) and I didn't think it was worth to pursue
that, especially if beta2 is the target.


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-25 Thread Matteo Beccati
Hi Tom,

thanks for the feedback.

On 25/05/2014 21:10, Tom Lane wrote:
 Matteo Beccati p...@beccati.com writes:
 here's the latest version of my uuid changes patch, according to
 proposal (2) from Tom in the thread about OSSP-UUID[1].
 
 Hmm ... this is not actually what I had in mind.  Unless I'm misreading
 the patch, this nukes the uuid-ossp extension entirely in favor of a
 new extension uuid (providing the same SQL functions with a different
 underlying implementation).  I don't think this works from the standpoint
 of providing compatibility for users of the existing extension.
 In particular, it'd break pg_upgrade (because of the change of the .so
 library name) as well as straight pg_dump upgrades (which would expect
 CREATE EXTENSION uuid-ossp to work).  Not to mention application code
 that might expect CREATE EXTENSION uuid-ossp to still work.
 
 Another objection is that for people for whom OSSP uuid still works fine,
 this is forcing them to adopt a new implementation whose compatibility is
 as yet unproven.
 
 What I'd rather do is preserve contrib/uuid-ossp with the same extension
 and .so name, but with two configure options that select different
 underlying implementations.

Sure, that makes sense. I wasn't actually sure it was ok to keep the
OSSP brand even though the extensions didn't use the oosp library,
hence the renaming. But I do agree upgrades wouldn't be very easy if we
don't.

 In the long run it'd be nice to migrate away from the uuid-ossp
 extension name, mostly because of the poorly-chosen use of a dash in the
 name.  But I'm not sure how we do that without breaking backwards
 compatibility, and anyway it's an entirely cosmetic thing that we can
 worry about later.
 
 Anyhow, doing it like that seems like it ought to be a pretty
 straightforward refactoring of your patch.  I could pursue that,
 or you can.

I do have a system with the ossp library installed: I'd be happy to give
it a try tomorrow morning my time, depending on my workload. I'll keep
you posted!


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] uuid-ossp (Re: [pgsql-packagers] Postgresapp 9.4 beta build ready)

2014-05-23 Thread Matteo Beccati
On 22/05/2014 21:55, Matteo Beccati wrote:
 On 22/05/2014 17:07, Tom Lane wrote:
 Well, *I* don't want to do that work.  I was hoping to find a volunteer,
 but the silence has been notable.  I think deprecation is the next step.
 
 This sounds an easy enough task to try and submit a patch, if I'm able
 to allocate enough time to work on it.
 
 I have successfully compiled the extension on a NetBSD box using a
 slightly modified version of Palle's patch. I have a few doubts though:
 
 - should we keep the extension name? If not, what would be the plan?
 - the patch also uses BSD's own md5 and sha1 implementations: for md5 I
 should be able to use pg's own core version, but I'm not sure about
 sha1, as it lives in pgcrypto. Any suggestion?

Maybe I've put the cart before the horse a little bit ;)

Anyway, BSD and Linux UUID implementations are slightly different, but I
was able to get two variants of the extension to compile on NetBSD and
Ubuntu. I don't have the necessary autoconf-fu to merge them together
though, and to make sure that they compile on the various bsd/linux
flavours.

You can find the code here:
https://github.com/mbeccati/uuid # NetBSD variant
https://github.com/mbeccati/uuid/tree/linux # Ubuntu variant

For now, I've forked just RhodiumToad's uuid-freebsd extension, but I've
made sure make works fine when cloned in the contrib folder.

* Both the variants use a copy of pgcrypto md5/sha1 implementations to
generate v3 and v5 UUIDs as porting is much easier than trying to use
the system provided ones, if any.
* I've fixed a bug in v3/v5 generation wrt endianness as the results I
was getting didn't match the RFC.
* The code is PoC quality and I haven't touched the docs/readme yet.


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] uuid-ossp (Re: [pgsql-packagers] Postgresapp 9.4 beta build ready)

2014-05-23 Thread Matteo Beccati
On 23/05/2014 10:05, Matteo Beccati wrote:
 You can find the code here:
 https://github.com/mbeccati/uuid # NetBSD variant
 https://github.com/mbeccati/uuid/tree/linux # Ubuntu variant
 
 For now, I've forked just RhodiumToad's uuid-freebsd extension, but I've
 made sure make works fine when cloned in the contrib folder.
 
 * Both the variants use a copy of pgcrypto md5/sha1 implementations to
 generate v3 and v5 UUIDs as porting is much easier than trying to use
 the system provided ones, if any.
 * I've fixed a bug in v3/v5 generation wrt endianness as the results I
 was getting didn't match the RFC.
 * The code is PoC quality and I haven't touched the docs/readme yet.

And here's my last effort w/ autoconf support:

https://github.com/mbeccati/postgres/compare/postgres:master...master

It's surely far from perfect, but maybe closer to something that can be
considered as a replacement for OSSP.

Especially I'm not that happy about the #ifdefs cluttering the code and
AC_SEARCH_LIB putting libuuid in $LIBS. Any suggestion?


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] uuid-ossp (Re: [pgsql-packagers] Postgresapp 9.4 beta build ready)

2014-05-22 Thread Matteo Beccati
On 22/05/2014 17:07, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, May 18, 2014 at 12:28 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 So, having seen that proof-of-concept, I'm wondering if we shouldn't make
 an effort to support contrib/uuid-ossp with a choice of UUID libraries
 underneath it.  There is a non-OSSP set of UUID library functions
 available on Linux (libuuid from util-linux-ng).  I don't know whether
 that's at all compatible with the BSD functions, but even if it's not,
 presumably a shim for it wouldn't be much larger than the BSD patch.
 
 Well, if you want to do the work, I'm fine with that.  But if you want
 to just shoot uuid-ossp in the head, I'm fine with that, too.  As
 Peter says, perfectly reasonable alternatives are available.
 
 Well, *I* don't want to do that work.  I was hoping to find a volunteer,
 but the silence has been notable.  I think deprecation is the next step.

This sounds an easy enough task to try and submit a patch, if I'm able
to allocate enough time to work on it.

I have successfully compiled the extension on a NetBSD box using a
slightly modified version of Palle's patch. I have a few doubts though:

- should we keep the extension name? If not, what would be the plan?
- the patch also uses BSD's own md5 and sha1 implementations: for md5 I
should be able to use pg's own core version, but I'm not sure about
sha1, as it lives in pgcrypto. Any suggestion?


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Matteo Beccati
On 17/01/2012 17:50, Alvaro Herrera wrote:
 
 Excerpts from Matteo Beccati's message of mar ene 17 12:33:27 -0300 2012:
 My proof of concept archive for the hackers ML site is still online, in
 case anyone has trouble downloading the patches or just wants to have
 the full thread handy.
 
 I was going to ping you about this, because I tried it when I wrote this
 message and it got stuck waiting for response.

Hmm, works for me, e.g. the recently cited message:

http://archives.postgresql.org/message-id/4f12f9e5.3090...@dunslane.net


 Now that we've migrated the website, it's time to get back to our
 conversations about migrating archives to your stuff too.  How confident
 with Django are you?

I've never wrote a line of Python in my life, so someone else should
work on porting the web part, I'm afraid...


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Matteo Beccati
On 17/01/2012 18:10, Matteo Beccati wrote:
 On 17/01/2012 17:50, Alvaro Herrera wrote:

 Excerpts from Matteo Beccati's message of mar ene 17 12:33:27 -0300 2012:
 My proof of concept archive for the hackers ML site is still online, in
 case anyone has trouble downloading the patches or just wants to have
 the full thread handy.

 I was going to ping you about this, because I tried it when I wrote this
 message and it got stuck waiting for response.
 
 Hmm, works for me, e.g. the recently cited message:
 
 http://archives.postgresql.org/message-id/4f12f9e5.3090...@dunslane.net

Erm... I meant

http://archives.beccati.org/message-id/4f12f9e5.3090...@dunslane.net

which redirects to:

http://archives.beccati.org/pgsql-hackers/message/305925

for me.


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-17 Thread Matteo Beccati
On 16/01/2012 23:40, Alvaro Herrera wrote:
 
 Excerpts from Greg Smith's message of lun ene 16 19:25:50 -0300 2012:
 On 01/16/2012 03:48 PM, Josh Berkus wrote:
 
 I'll also point out that the process for *applying* a patch, if you
 don't subscribe to hackers and keep archives around on your personal
 machine for months, is also very cumbersome and error-prone.  Copy and
 paste from a web page?  Really?

 The most reasonable answer to this is for people to publish a git repo 
 URL in addition to the official submission of changes to the list in 
 patch form.
 
 It's expected that we'll get a more reasonable interface to attachments,
 one that will allow you to download patches separately.  (Currently,
 attachments that have mime types other than text/plain are already
 downloadable separately).

My proof of concept archive for the hackers ML site is still online, in
case anyone has trouble downloading the patches or just wants to have
the full thread handy.

All you need to do is to swap postgresql.org with beccati.org in the
message-id link:

http://archives.postgresql.org/message-id/1320343602-sup-2...@alvh.no-ip.org

-

http://archives.beccati.org/message-id/1320343602-sup-2...@alvh.no-ip.org


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-11-13 Thread Matteo Beccati

Hi Andrew,

On 13/11/2011 02:56, Andrew Dunstan wrote:

Here is a patch for that for pg_dump. The sections provided for are
pre-data, data and post-data, as discussed elsewhere. I still feel that
anything finer grained should be handled via pg_restore's --use-list
functionality. I'll provide a patch to do the same switch for pg_restore
shortly.

Adding to the commitfest.


FWIW, I've tested the patch as I've recently needed to build a custom 
splitting script for a project and the patch seemed to be a much more 
elegant solution. As far as I can tell, it works great and the output 
matches the result of my script.


The only little thing I've noticed is a missing ending ) in the --help 
message.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow substitute allocators for PGresult.

2011-11-12 Thread Matteo Beccati

On 12/11/2011 07:36, Robert Haas wrote:

On Sat, Nov 12, 2011 at 12:48 AM, Tom Lanet...@sss.pgh.pa.us  wrote:

AIUI Kyotaro-san is just suggesting that the app should be able to
provide a substitute malloc function for use in allocating PGresult
space (and not, I think, anything else that libpq allocates internally).
Basically this would allow PGresults to be cleaned up with methods other
than calling PQclear on each one.  It wouldn't affect how you'd interact
with one while you had it.  That seems like pretty much exactly what we
want for preventing memory leaks in the backend; but is it going to be
useful for other apps?


I think it will.


Maybe I've just talking nonsense, I just have little experience hacking 
the pgsql and pdo-pgsql exstensions, but to me it would seem something 
that could easily avoid an extra duplication of the data returned by 
pqgetvalue. To me it seems a pretty nice win.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] archives, attachments, etc

2010-10-29 Thread Matteo Beccati
Hi Gurjeet,

On 09/10/2010 22:54, Gurjeet Singh wrote:
 On Sat, Oct 9, 2010 at 3:30 PM, Dimitri Fontaine dimi...@2ndquadrant.fr
 mailto:dimi...@2ndquadrant.fr wrote:
 I wish our super admins would have some time to resume the work on the
 new archives infrastructure, that was about ready for integration if not
 prime time:
 
  http://archives.beccati.org/pgsql-hackers/message/276290
 
 As you see it doesn't suffer from this problem, the threading is not
 split arbitrarily, and less obvious but it runs from a PostgreSQL
 database. Yes, that means the threading code is exercising our recursive
 querying facility, as far as I understand it.
 
 
 Something looks wrong with that thread. The message text in my mails is
 missing. Perhaps that is contained in the .bin files but I can't tell as
 the link leads to 404 Not Found.

Thanks for the private email to point this thread out. I've been overly
busy lately and missed it.

I'll try to debug what happens with your message formatting as soon as I
can find some time.


Cheers
-- 
Matteo Beccati

Development  Consulting - http://www.beccati.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-02-01 Thread Matteo Beccati

On 01/02/2010 03:27, Alvaro Herrera wrote:

Matteo Beccati wrote:


Incidentally, I've just found out that the mailing lists are
dropping some messages. According to my qmail logs the AOX account
never received Joe's message yesterday, nor quite a few others:

M156252, M156259, M156262, M156273, M156275

and I've verified that it also has happened before. I don't know
why, but I'm pretty sure that my MTA was contacted only once for
those messages, while normally I get two connections (my own address
+ aox address).


Hmm, I see it here:
http://archives.postgresql.org/message-id/4B64A238.1050307%40joeconway.com
Maybe it was just delayed?


But not here:

http://archives.beccati.org/message-id/4B64A238.1050307%40joeconway.com

Anyway, I guess that on production we'll have a better way to inject 
emails into Archiveopteryx rather than relying on a email subscription, 
which seems a bit fragile. It's been ages since I last set up majordomo, 
but I guess there should be a way to also pipe outgoing messages through 
a script that performs the delivery to AOX.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-02-01 Thread Matteo Beccati

On 01/02/2010 10:26, Magnus Hagander wrote:

Does the MBOX importer support incremental loading? Because majordomo
spits out MBOX files for us already.


Unfortunately the aoximport shell command doesn't support incremental 
loading.



One option could be to use SMTP with a subscription as the primary way
(and we could set up a dedicated relaying from the mailserver for this
of course, so it's not subject to graylisting or anything like that),
and then daily or so load the MBOX files to cover anything that was
lost?


I guess we could write a script that parses the mbox and adds whatever 
is missing, as long as we keep it as a last resort if we can't make the 
primary delivery a fail proof.


My main concern is that we'd need to overcomplicate the thread detection 
algorithm so that it better deals with delayed messages: as it currently 
works, the replies to a missing message get linked to the 
grand-parent. Injecting the missing message afterwards will put it at 
the same level as its replies. If it happens only once in a while I 
guess we can live with it, but definitely not if it happens tens of 
times a day.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-02-01 Thread Matteo Beccati

On 01/02/2010 15:03, Magnus Hagander wrote:

2010/2/1 Matteo Beccatip...@beccati.com:

My main concern is that we'd need to overcomplicate the thread detection algorithm so 
that it better deals with delayed messages: as it currently works, the replies to a 
missing message get linked to the grand-parent. Injecting the missing message 
afterwards will put it at the same level as its replies. If it happens only once in a 
while I guess we can live with it, but definitely not if it happens tens of times a day.


That can potentially be a problem.

Consider the case where message A it sent. Mesasge B is a response to
A, and message C is a response to B. Now assume B is held for
moderation (because the poser is not on the list, or because it trips
some other thing), then message C will definitely arrive before
message B. Is that going to cause problems with this method?

Another case where the same thing will happen is if message delivery
of B gets for example graylisted, or is just slow from sender B, but
gets quickly delivered to the author of message A (because of a direct
CC). In this case, the author of message A may respond to it (making
message D),and this will again arrive before message B because author
A is not graylisted.

So the system definitely needs to deal with out-of-order delivery.


Hmm, it looks like I didn't factor in direct CCs when thinking about 
potential problems with the simplified algorithm. Thanks for raising that.


I'll be out of town for a few days, but I will see what I can do when I 
get back.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-31 Thread Matteo Beccati

On 30/01/2010 22:18, Joe Conway wrote:

On 01/30/2010 01:14 PM, Dimitri Fontaine wrote:

Matteo Beccatip...@beccati.com  writes:

I've been following the various suggestions. Please take a look at the
updated archives proof of concept:

http://archives.beccati.org/


I like the features a lot, and the only remarks I can think about are
bikeschedding, so I'll let it to the web team when they integrate it. It
sure looks like a when rather than an if as far as I'm concerned.

In short, +1! And thanks a lot!


+1 here too. That looks wonderful!


Thanks guys. Hopefully in the next few days I'll be able to catch up 
with Alvaro to see how we can proceed on this.


Incidentally, I've just found out that the mailing lists are dropping 
some messages. According to my qmail logs the AOX account never received 
Joe's message yesterday, nor quite a few others:


M156252, M156259, M156262, M156273, M156275

and I've verified that it also has happened before. I don't know why, 
but I'm pretty sure that my MTA was contacted only once for those 
messages, while normally I get two connections (my own address + aox 
address).



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-31 Thread Matteo Beccati

On 31/01/2010 13:45, Magnus Hagander wrote:

On Sat, Jan 30, 2010 at 22:43, Matteo Beccatip...@beccati.com  wrote:

On 30/01/2010 17:54, Alvaro Herrera wrote:

* While I don't personally care, some are going to insist that the site
works with Javascript disabled.  I didn't try but from your description
it doesn't seem like it would.  Is this easily fixable?


Date sorting works nicely even without JS, while thread sorting doesn't at
all. I've just updated the PoC so that thread sorting is not available when
JS is not available, while it still is the default otherwise. Hopefully
that's enough to keep JS haters happy.


I haven't looked at how it actually works, but the general requirement
is that it has to *work* without JS. It doesn't have to work *as
well*. That means serving up a page with zero contents, or a page that
you can't navigate, is not acceptable. Requiring more clicks to get
around the navigation and things like that, are ok.


As it currently stands, date sorting is the default and there are no 
links to the thread view, which would otherwise look empty. We can 
surely build a non-JS thread view as well, I'm just not sure if it's 
worth the effort.



* The old monthly interface /list/-mm/msg*php is not really
necessary to keep, *except* that we need the existing URLs to redirect
to the corresponding new message page.  I think we should be able to
create a database of URL redirects from the old site, using the
Message-Id URL style.  So each message accessed using the old URL style
would require two redirects, but I don't think this is a problem.  Do
you agree?


Sure. I was just hoping there was an even easier way (rescritct to month,
order by uid limit 1 offset X). I guess it wouldn't be hard to write a
script that populates a backward compatibility table. No need for double
redirects, it'd be just a matter of adding a JOIN or two to the query.


Once we go into production on this, we'll need to do some serious
thinking about the caching issues. And in any such scenario we should
very much avoid serving up the same content under different URLs,
since it'll blow away cache space for no reason - it's much better to
throw a redirct.


Yes, that was my point. A single redirect to the only URL for the message.


* We're using Subversion to keep the current code.  Is your code
version-controlled?  We'd need to import your code there, I'm afraid.


I do have a local svn repository. Given it's just a PoC that is going to be
rewritten I don't think it should live in the official repo, but if you
think id does, I'll be glad to switch.


Note that the plan is to switch pgweb to git as well. So if you just
want to push the stuff up during development so people can look at it,
register for a repository at git.postgresql.org - or just set one up
at github which is even easier.


The only reason why I used svn is that git support in netbeans is rather 
poor, or at least that was my impression. I think it won't be a problem 
to move to git, I probably just need some directions ;)



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-30 Thread Matteo Beccati

Il 19/01/2010 09:44, Magnus Hagander ha scritto:

As long as the templating is separated from the code, it doesn't
matter if it's a dedicated templating engine or PHP. The point being,
focus on the contents and interface, porting the actual
HTML-generation is likely to be easy compared to that.


I've been following the various suggestions. Please take a look at the 
updated archives proof of concept:


http://archives.beccati.org/

The PoC is now integrated with the website layout and has a working 
Mailing lists menu to navigate the available lists. The artificial 
monthly breakdown has been removed and both thread and date sorting use 
pagination instead.


The fancy tables are using the Ext JS framework as it was the only free 
one I could find that features column layout for trees. I'm not 
extremely happy about it, but it just works. Threads are loaded 
asynchronously (AJAX), while date sorting uses regular HTML tables with 
a bit of JS to get the fancy layout. This means that search engines 
still have a way to properly index all the messages.


Last but not least, it's backwards compatibile with the /message-id/* 
URI. The other one (/list/-mm/msg*.php) is implemented, but I just 
realized that it has problems dealing with the old archive weirdness 
(2009-12 shows also some messages dated aug 2009 nov 2009 or jan 2010 
for -hackers).


That said, there are still a few visual improvements to be done, but 
overall I'm pretty much satisfied.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-30 Thread Matteo Beccati

On 30/01/2010 17:54, Alvaro Herrera wrote:

Matteo Beccati wrote:

Il 19/01/2010 09:44, Magnus Hagander ha scritto:

As long as the templating is separated from the code, it doesn't
matter if it's a dedicated templating engine or PHP. The point being,
focus on the contents and interface, porting the actual
HTML-generation is likely to be easy compared to that.


I've been following the various suggestions. Please take a look at
the updated archives proof of concept:

http://archives.beccati.org/


I like this.

Sorry for being unable to get in touch with you on IM.  It's been a
hectic time here with only very few pauses.


Thanks :)

And no worries, I'm pretty sure you must be quite busy lately!


Some things:

* the list of lists and groups of lists are stored in two JSON files.
Should I send you a copy of them so that you can tweak your code to use
them?  They are generated automatically from the wwwmaster database.

* We have a bunch of templates that you could perhaps have used, if you
hadn't already written all of it ... :-(


The templates and especially the integration with the current layout 
still need to be rewritten when porting the code to python/Django, so I 
I'm not sure if it's wise to spend more time on it at this stage.


Not sure about the JSON approach either. Maybe it's something that needs 
to be further discussed when/if planning the migration of the archives 
to Archiveopteryx.




* While I don't personally care, some are going to insist that the site
works with Javascript disabled.  I didn't try but from your description
it doesn't seem like it would.  Is this easily fixable?


Date sorting works nicely even without JS, while thread sorting doesn't 
at all. I've just updated the PoC so that thread sorting is not 
available when JS is not available, while it still is the default 
otherwise. Hopefully that's enough to keep JS haters happy.



* The old monthly interface /list/-mm/msg*php is not really
necessary to keep, *except* that we need the existing URLs to redirect
to the corresponding new message page.  I think we should be able to
create a database of URL redirects from the old site, using the
Message-Id URL style.  So each message accessed using the old URL style
would require two redirects, but I don't think this is a problem.  Do
you agree?


Sure. I was just hoping there was an even easier way (rescritct to 
month, order by uid limit 1 offset X). I guess it wouldn't be hard to 
write a script that populates a backward compatibility table. No need 
for double redirects, it'd be just a matter of adding a JOIN or two to 
the query.



* We're using Subversion to keep the current code.  Is your code
version-controlled?  We'd need to import your code there, I'm afraid.


I do have a local svn repository. Given it's just a PoC that is going to 
be rewritten I don't think it should live in the official repo, but if 
you think id does, I'll be glad to switch.



Last but not least, it's backwards compatibile with the
/message-id/* URI. The other one (/list/-mm/msg*.php) is
implemented, but I just realized that it has problems dealing with
the old archive weirdness (2009-12 shows also some messages dated
aug 2009 nov 2009 or jan 2010 for -hackers).


I'm surprised about the Aug 2009 ones, but the others are explained
because the site divides the mboxes using one timezone and the time
displayed is a different timezone.  We don't really control the first
one so there's nothing to do about it; but anyway it's not really
important.


It's not a big deal, the BC-table approach will take care of those 
out-of-range messages. However there are a few messages in the hackers 
archive (and most likely others) that have wrong date headers (e.g. 
1980, 2036): we need to think what to do with them.




Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WARNING: pgstat wait timeout

2010-01-21 Thread Matteo Beccati

Il 21/01/2010 03:33, Jaime Casanova ha scritto:

On Wed, Jan 20, 2010 at 9:32 PM, Jaime Casanova
jcasa...@systemguards.com.ec  wrote:

On Wed, Jan 20, 2010 at 6:20 PM, Sergey E. Koposovm...@sai.msu.ru  wrote:

Hello hackers,

I've recently hit the message WARNING:  pgstat wait timeout with PG 8.4.2.


i see the same yesterday when initdb a freshly compiled 8.5dev +
lock_timeout patch
i thought maybe it was related to that patch and was thinking in
recompile without the patch but hadn't time, obviously i was wrong



ah! i forgot to say that it was on win32 + mingw, to confirme that
patch works fin in that os


I've seen it a few days ago with 8.5alpha3 on NetBSD when I left the 
backend running for a few days. Backend was completely inactive but the 
massage was repeated 3-4 times. Googling didn't help and I didnt' know 
how to replicate it.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-19 Thread Matteo Beccati

Il 18/01/2010 18:42, Magnus Hagander ha scritto:

On Mon, Jan 18, 2010 at 18:31, Matteo Beccatip...@beccati.com  wrote:

Il 18/01/2010 15:55, Magnus Hagander ha scritto:


If it wasn't for the fact that we're knee deep in two other major
projects for the infrastructure team right now, I'd be all over this
:-) But we really need to complete that before we put anything new in
production here.


Sure, that's completely understandable.


What I'd like to see is one that integrates with our general layouts.


Shoudln't bee too hard, but I wouldn't be very keen on spending time on
layout related things that are going to be thrown away to due the framework
and language being different from what is going to be used on production
(symfony/php vs django/python).


I don't know symfony, but as long as it's done in a template it is
probably pretty easy to move between different frameworks for the
layout part.


By default symfony uses plain PHP files as templates, but some plugins 
allow using a templating engine instead. I guess I can give them a try.



--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-18 Thread Matteo Beccati

Il 16/01/2010 14:21, Matteo Beccati ha scritto:

Il 16/01/2010 11:48, Dimitri Fontaine ha scritto:

Matteo Beccatip...@beccati.com writes:

Anyway, I've made further changes and I would say that at this point
the PoC
is feature complete. There surely are still some rough edges and a few
things to clean up, but I'd like to get your feedback once again:


[...]


Also, I'd need some help with the CTE query that was picking a wrong
plan
and led me to forcibly disable merge joins inside the application when
executing it. Plans are attached.


Sorry, not from me, still a CTE noob.


Actually the live db doesn't suffer from that problem anymore, but I've
able to reproduce the issue with a few days old backup running on a test
8.5alpha3 instance that still has a stock postgresql.conf.


Following advice from Andrew RodiumToad Gierth, I raised cpu costs 
back to the defaults (I did lower them following some tuning guide) and 
that seems to have fixed the problem.


My question now is... what next? :)


Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-18 Thread Matteo Beccati

Il 18/01/2010 15:55, Magnus Hagander ha scritto:

If it wasn't for the fact that we're knee deep in two other major
projects for the infrastructure team right now, I'd be all over this
:-) But we really need to complete that before we put anything new in
production here.


Sure, that's completely understandable.


What I'd like to see is one that integrates with our general layouts.


Shoudln't bee too hard, but I wouldn't be very keen on spending time on 
layout related things that are going to be thrown away to due the 
framework and language being different from what is going to be used on 
production (symfony/php vs django/python).



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-18 Thread Matteo Beccati

Il 18/01/2010 16:19, Dimitri Fontaine ha scritto:

Magnus Hagandermag...@hagander.net  writes:

Also, I tink one of the main issues with the archives today that
people bring up is the inability to have threads cross months. I think
that should be fixed. Basically, get rid of the grouping by month for
a more dynamic way to browse.


Clic a mail in a thread within more than one given month. See the Thread
index for this email. It's complete, for both the month. Example here:

   http://archives.beccati.org/pgsql-hackers-history/message/191438.html
   http://archives.beccati.org/pgsql-hackers-history/message/191334.html


Thanks Dimitri, you beat me to it ;)



That said, the month boundary is artificial, so maybe having a X
messages per page instead would be better?


Not sure. Having date based pages helps out reducing the set of messages 
that need to be scanned and sorted, increasing the likeliness of an 
index scan. But I'm happy to examine other alternatives too.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-16 Thread Matteo Beccati

Il 16/01/2010 11:48, Dimitri Fontaine ha scritto:

Matteo Beccatip...@beccati.com  writes:

Anyway, I've made further changes and I would say that at this point the PoC
is feature complete. There surely are still some rough edges and a few
things to clean up, but I'd like to get your feedback once again:

http://archives.beccati.org


I've been clicking around and like the speedy feeling and the Thread
index appearing under any mail. Also getting the attachments seems to be
just working™. I've also checked than this local thread works on month
boundaries, so that you're POC is in a way already better than the
current archives solution.


Thanks for the feedback.


Only missing is the search, but we have tsearch and pg_trgm masters not
far away…


I haven't even looked at it as I was under the impression that the old 
engine could still be used. If not, adding search support should be 
fairly easy.




You will find that pgsql-general and -hackers are subscribed and getting
messages live, wihle -hackers-history and -www have been imported from the
archives (about 200k and 1.5k messages respectively at 50 messages/s).


Tried clicking over there and very far in the past indexes show no
messages. Here's an example:

   http://archives.beccati.org/pgsql-hackers-history/1996-09/by/thread


Yeah, there are a few messages in the archives with a wrong date header. 
The list is generated using from min(date) to now(), so there are holes. 
At some point I'll run a few queries to fix that.




Also, I'd need some help with the CTE query that was picking a wrong plan
and led me to forcibly disable merge joins inside the application when
executing it. Plans are attached.


Sorry, not from me, still a CTE noob.


Actually the live db doesn't suffer from that problem anymore, but I've 
able to reproduce the issue with a few days old backup running on a test 
8.5alpha3 instance that still has a stock postgresql.conf.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-15 Thread Matteo Beccati

Hi everyone,

Il 14/01/2010 19:36, David Fetter ha scritto:

On Thu, Jan 14, 2010 at 03:08:22PM +0100, Matteo Beccati wrote:

Il 14/01/2010 14:39, Dimitri Fontaine ha scritto:

Matteo Beccatip...@beccati.com   writes:

Any improvements to sorting are welcome :)


...
 ARRAY[uid]
...


Thanks David, using an array rather than text concatenation is slightly 
slower and uses a bit more memory, but you've been able to convince me 
that it's The Right Way(TM) ;)


Anyway, I've made further changes and I would say that at this point the 
PoC is feature complete. There surely are still some rough edges and a 
few things to clean up, but I'd like to get your feedback once again:


http://archives.beccati.org

You will find that pgsql-general and -hackers are subscribed and getting 
messages live, wihle -hackers-history and -www have been imported from 
the archives (about 200k and 1.5k messages respectively at 50 messages/s).


Also, I'd need some help with the CTE query that was picking a wrong 
plan and led me to forcibly disable merge joins inside the application 
when executing it. Plans are attached.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/
archiveopteryx=# EXPLAIN ANALYZE WITH RECURSIVE t (mailbox, uid, date, subject, 
sender, has_attachments, parent_uid, idx, depth) AS (
  SELECT mailbox, uid, date, subject, sender, has_attachments, parent_uid, 
uid::text, 1
  FROM arc_messages
  WHERE parent_uid IS NULL AND mailbox = 17 AND date = '2007-11-01' AND date  
'2007-12-01'
  UNION ALL
  SELECT a.mailbox, a.uid, a.date, a.subject, a.sender, a.has_attachments, 
a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1
  FROM t JOIN arc_messages a USING (mailbox)
  WHERE t.uid = a.parent_uid
) SELECT * FROM t ORDER BY idx ;

   QUERY PLAN

 Sort  (cost=92761.67..92769.91 rows=1647 width=121) (actual 
time=4183.736..4185.762 rows=1428 loops=1)
   Sort Key: t.idx
   Sort Method:  quicksort  Memory: 366kB
   CTE t
 -  Recursive Union  (cost=0.00..92579.09 rows=1647 width=130) (actual 
time=0.030..4173.724 rows=1428 loops=1)
   -  Index Scan using arc_messages_mailbox_parent_id_date_key on 
arc_messages  (cost=0.00..486.42 rows=567 width=94) (actual time=0.025..1.432 
rows=482 loops=1)
 Index Cond: ((mailbox = 17) AND (parent_uid IS NULL) AND (date 
= '2007-11-01 00:00:00+01'::timestamp with time zone) AND (date  '2007-12-01 
00:00:00+01'::timestamp with time zone))
   -  Merge Join  (cost=729.68..9208.61 rows=108 width=130) (actual 
time=262.120..277.819 rows=63 loops=15)
 Merge Cond: ((a.mailbox = t.mailbox) AND (a.parent_uid = 
t.uid))
 -  Index Scan using arc_messages_mailbox_parent_id_key on 
arc_messages a  (cost=0.00..6452.25 rows=193871 width=94) (actual 
time=0.018..147.782 rows=85101 loops=15)
 -  Sort  (cost=729.68..758.03 rows=5670 width=44) (actual 
time=0.403..0.559 rows=109 loops=15)
   Sort Key: t.mailbox, t.uid
   Sort Method:  quicksort  Memory: 25kB
   -  WorkTable Scan on t  (cost=0.00..22.68 rows=5670 
width=44) (actual time=0.003..0.145 rows=95 loops=15)
   -  CTE Scan on t  (cost=0.00..6.59 rows=1647 width=121) (actual 
time=0.035..4179.686 rows=1428 loops=1)
 Total runtime: 4188.187 ms
(16 rows)
archiveopteryx=# SET enable_mergejoin = false;
SET
archiveopteryx=# EXPLAIN ANALYZE WITH RECURSIVE t (mailbox, uid, date, subject, 
sender, has_attachments, parent_uid, idx, depth) AS (
  SELECT mailbox, uid, date, subject, sender, has_attachments, parent_uid, 
uid::text, 1
  FROM arc_messages
  WHERE parent_uid IS NULL AND mailbox = 17 AND date = '2007-11-01' AND date  
'2007-12-01'
  UNION ALL
  SELECT a.mailbox, a.uid, a.date, a.subject, a.sender, a.has_attachments, 
a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1
  FROM t JOIN arc_messages a USING (mailbox)
  WHERE t.uid = a.parent_uid
) SELECT * FROM t ORDER BY idx ;

   QUERY PLAN

 Sort  (cost=104762.75..104770.98 rows=1647 width=121) (actual 
time=34.315..36.331 rows=1428 loops=1)
   Sort Key: t.idx
   Sort Method:  quicksort  Memory: 366kB
   CTE t
 -  Recursive Union  (cost=0.00..104580.17 rows=1647 width=130) (actual 
time=0.040..24.851 rows=1428 loops=1)
   -  Index Scan using arc_messages_mailbox_parent_id_date_key on 
arc_messages  (cost=0.00..486.42 rows=567

Re: [HACKERS] mailing list archiver chewing patches

2010-01-14 Thread Matteo Beccati

Il 14/01/2010 08:22, Matteo Beccati ha scritto:

Hi,


3) A nice set of SQL queries to return message, parts, threads,
folders based on $criteria (search, id, folder, etc)


I guess Matteo's working on that…


Right, but this is where I want to see the AOX schema imporove... In
ways like adding persistant tables for threading, which are updated by
triggers as new messages are delivered, etc. Documented queries that
show how to use CTEs, ltree, etc to get threaded views, good FTS support
(with indexes and triggers managing them), etc.


+1.

I just didn't understand how much your proposal fit into current work :)


I'm looking into it. The link I've previously sent will most likely
return a 500 error for the time being.


A quick update:

I've extended AOX with a trigger that takes care of filling a separate 
table that's used to display the index pages. The new table also stores 
threading information (standard headers + Exchange headers support) and 
whether or not the email has attachments.


Please check the updated PoC: http://archives.beccati.org/

pgsql-hackers and -general are currently subscribed, while -www only has 
2003 history imported via aoximport (very fast!).


BTW, I've just noticed a bug in the attachment detection giving false 
positives, but have no time to check now.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-14 Thread Matteo Beccati

Il 14/01/2010 14:39, Dimitri Fontaine ha scritto:

Matteo Beccatip...@beccati.com  writes:

I've extended AOX with a trigger that takes care of filling a separate table
that's used to display the index pages. The new table also stores threading
information (standard headers + Exchange headers support) and whether or not
the email has attachments.

Please check the updated PoC: http://archives.beccati.org/


Looks pretty good, even if some thread are still separated (this one for
example), and the ordering looks strange.


This one is separated as the first one is not in the archive yet, thus 
to the system there are multiple parent messages. It shouldn't happen 
with full archives. About sorting, here's the query I've used (my first 
try with CTEs incidentally):


WITH RECURSIVE t (mailbox, uid, date, subject, sender, has_attachments, 
parent_uid, idx, depth) AS (
  SELECT mailbox, uid, date, subject, sender, has_attachments, 
parent_uid, uid::text, 1

  FROM arc_messages
  WHERE parent_uid IS NULL AND mailbox = 15
  UNION ALL
  SELECT a.mailbox, a.uid, a.date, a.subject, a.sender, 
a.has_attachments, a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1

  FROM t JOIN arc_messages a USING (mailbox)
  WHERE t.uid = a.parent_uid
) SELECT * FROM t ORDER BY idx

Any improvements to sorting are welcome :)


Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-14 Thread Matteo Beccati

Il 14/01/2010 14:46, Dave Page ha scritto:

On Thu, Jan 14, 2010 at 7:09 PM, Dimitri Fontaine
dfonta...@hi-media.com  wrote:

Matteo Beccatip...@beccati.com  writes:

I've extended AOX with a trigger that takes care of filling a separate table
that's used to display the index pages. The new table also stores threading
information (standard headers + Exchange headers support) and whether or not
the email has attachments.

Please check the updated PoC: http://archives.beccati.org/


Looks pretty good, even if some thread are still separated (this one for
example), and the ordering looks strange.

Seems to be right on tracks, that said :)


Yup.

Matteo - Can you try loading up a lot more of the old mbox files,
particularly the very early ones from -hackers? It would be good to
see how it copes under load with a few hundred thousand messages in
the database.


Sure, I will give it a try in the evening or tomorrow.


Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-14 Thread Matteo Beccati

Il 14/01/2010 15:47, Dimitri Fontaine ha scritto:

Matteo Beccatip...@beccati.com  writes:

WITH RECURSIVE t (mailbox, uid, date, subject, sender, has_attachments,
parent_uid, idx, depth) AS (
   SELECT mailbox, uid, date, subject, sender, has_attachments, parent_uid,
uid::text, 1
   FROM arc_messages
   WHERE parent_uid IS NULL AND mailbox = 15
   UNION ALL
   SELECT a.mailbox, a.uid, a.date, a.subject, a.sender, a.has_attachments,
a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1
   FROM t JOIN arc_messages a USING (mailbox)
   WHERE t.uid = a.parent_uid
) SELECT * FROM t ORDER BY idx

Any improvements to sorting are welcome :)


What I'd like would be to have it sorted by activity, showing first the
thread which received the later messages. I'm yet to play with CTE and
window function myself so without a database example to play with I
won't come up with a nice query, but I guess a more educated reader will
solve this without a sweat, as it looks easier than sudoku-solving,
which has been done already :)


Eheh, that was my first try as well. CTEs look very nice even though I'm 
not yet very comfortable with the syntax. Anyway both for date and 
thread indexes sort is the other way around, with newer posts/threads at 
the bottom. Again I'll give it a try as soon as I find time to work 
again on it.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-13 Thread Matteo Beccati

Hi,


3) A nice set of SQL queries to return message, parts, threads,
folders based on $criteria (search, id, folder, etc)


I guess Matteo's working on that…


Right, but this is where I want to see the AOX schema imporove... In
ways like adding persistant tables for threading, which are updated by
triggers as new messages are delivered, etc.  Documented queries that
show how to use CTEs, ltree, etc to get threaded views, good FTS support
(with indexes and triggers managing them), etc.


+1.

I just didn't understand how much your proposal fit into current work :)


I'm looking into it. The link I've previously sent will most likely 
return a 500 error for the time being.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] planner or statistical bug on 8.5

2010-01-12 Thread Matteo Beccati

Il 12/01/2010 08:55, Pavel Stehule ha scritto:

I checked query and I was surprised with very strange plan:

postgres=# explain select  a, b from a,b,c;
   QUERY PLAN
---
  Nested Loop  (cost=0.00..276595350.00 rows=1382400 width=8)
-   Nested Loop  (cost=0.00..115292.00 rows=576 width=8)
  -   Seq Scan on a  (cost=0.00..34.00 rows=2400 width=4)
  -   Materialize  (cost=0.00..82.00 rows=2400 width=4)
-   Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
-   Materialize  (cost=0.00..82.00 rows=2400 width=0)
  -   Seq Scan on c  (cost=0.00..34.00 rows=2400 width=0)
(7 rows)


It doesn't surprise me. Tables are empty, thus get a default non-0 row 
estimate, which happens to be 2400:


test=# create table a (a int);
CREATE TABLE
test=# ANALYZE a;
ANALYZE
test=# EXPLAIN SELECT * from a;
 QUERY PLAN
-
 Seq Scan on a  (cost=0.00..14.80 rows=2400 width=4)
(1 row)


That said, 2400^3 (cross join of 3 tables) == 1382400


Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-12 Thread Matteo Beccati

Il 12/01/2010 10:30, Magnus Hagander ha scritto:

The problem is usually with strange looking emails with 15 different
MIME types. If we can figure out the proper way to render that, the
rest really is just a SMOP.


Yeah, I was expecting some, but all the message I've looked at seemed to 
be working ok.



(BTW, for something to actually be used In Production (TM), we want
something that uses one of our existing frameworks. So don't go
overboard in code-wise implementations on something else - proof of
concept on something else is always ok, of course)


OK, that's something I didn't know, even though I expected some kind of 
limitations. Could you please elaborate a bit more (i.e. where to find 
info)?


Having played with it, here's my feedback about AOX:

pros:
- seemed to be working reliably;
- does most of the dirty job of parsing emails, splitting parts, etc
- highly normalized schema
- thread support (partial?)

cons:
- directly publishing the live email feed might not be desirable
- queries might end up being a bit complicate for simple tasks
- might be not easy to add additional processing in the workflow



So just to put this into perspective and give anyone paying attention
an idea of the pain that lies ahead should they decide to work on
this:

- We need to import the old archives (of which there are hundreds of
thousands of messages, the first few years of which have, umm, minimal
headers.


Anyone having a local copy of this in his mailboxes? At some point there
were some NNTP gateway, so maybe there's a copy this way.


We have MBOX files.

IIRC, aox has an import function that can read MBOX files. The
interesting thing is what happens with the really old files that don't
have complete headers.

I don't think you can trust the NNTP gateway now or in the past,
messages are sometimes lost there. The mbox files are as complete as
anything we'll ever get.


Importing the whole pgsql-www archive with a perl script that bounces 
messages via SMTP took about 30m. Maybe there's even a way to skip SMTP, 
I haven't looked into it that much.



- We need to generate thread indexes


We have CTEs :)


Right. We still need the threading information, so we have something
to use our CTEs on :-)

But I assume that AOX already does this?


there are thread related tables and they seem to get filled when a SORT 
IMAP command is issued, however I haven't found a way to get the 
hierarchy out of them.


What that means is that we'd need some kind of post processing to 
populate a thread hierarchy.


If there isn't a fully usable thread hierarchy I was more thinking to 
ltree, mainly because I've successfully used it in past and I haven't 
had enough time yet to look at CTEs. But if performance is comparable I 
don't see a reason why we shouldn't use them.



- We need to re-generate the original URLs for backwards compatibility


I guess the message-id one ain't the tricky one... and it should be
possible to fill a relation table like
  monharc_compat(message_id, list, year, month, message_number);


Yeah. It's not so hard, you can just screen-scrape the current
archives the same way the search server does.


Definitely an easy enough task.

With all that said, I can't promise anything as it all depends on how 
much spare time I have, but I can proceed with the evaluation if you 
think it's useful. I have a feeling that AOX is not truly the right tool 
for the job, but we might be able to customise it to suit our needs. Are 
there any other requirements that weren't specified?



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-12 Thread Matteo Beccati

Il 12/01/2010 19:54, Magnus Hagander ha scritto:

On Tue, Jan 12, 2010 at 18:34, Dave Pagedp...@pgadmin.org  wrote:

On Tue, Jan 12, 2010 at 10:24 PM, Tom Lanet...@sss.pgh.pa.us  wrote:

Joshua D. Drakej...@commandprompt.com  writes:

On Tue, 2010-01-12 at 10:24 +0530, Dave Page wrote:

So just to put this into perspective and give anyone paying attention
an idea of the pain that lies ahead should they decide to work on
this:

- We need to import the old archives (of which there are hundreds of
thousands of messages, the first few years of which have, umm, minimal
headers.
- We need to generate thread indexes
- We need to re-generate the original URLs for backwards compatibility

Now there's encouragement :-)



Or, we just leave the current infrastructure in place and use a new one
for all new messages going forward. We shouldn't limit our ability to
have a decent system due to decisions of the past.


-1.  What's the point of having archives?  IMO the mailing list archives
are nearly as critical a piece of the project infrastructure as the CVS
repository.  We've already established that moving to a new SCM that
fails to preserve the CVS history wouldn't be acceptable.  I hardly
think that the bar is any lower for mailing list archives.

Now I think we could possibly skip the requirement suggested above for
URL compatibility, if we just leave the old archives on-line so that
those URLs all still resolve.  But if we can't load all the old messages
into the new infrastructure, it'll basically be useless for searching
purposes.

(Hmm, re-reading what you said, maybe we are suggesting the same thing,
but it's not clear.  Anyway my point is that Dave's first two
requirements are real.  Only the third might not be.)


The third actually isn't actually that hard to do in theory. The
message numbers are basically the zero-based position in the mbox
file, and the rest of the URL is obvious.


The third part is trivial. The search system already does 95% of it.
I've already implemented exactly that kind of redirect thing on top of
the search code once just as a poc, and it was less than 30 minutes of
hacking. Can't seem to find the script ATM though, but you get the
idea.

Let's not focus on that part, we can easily solve that.


Agreed. That's the part that worries me less.


Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-12 Thread Matteo Beccati

Il 12/01/2010 21:04, Magnus Hagander ha scritto:

On Tue, Jan 12, 2010 at 20:56, Matteo Beccatip...@beccati.com  wrote:

Il 12/01/2010 10:30, Magnus Hagander ha scritto:


The problem is usually with strange looking emails with 15 different
MIME types. If we can figure out the proper way to render that, the
rest really is just a SMOP.


Yeah, I was expecting some, but all the message I've looked at seemed to be
working ok.


Have you been looking at old or new messages? Try grabbing a couple of
MBOX files off archives.postgresql.org from several years back, you're
more likely to find weird MUAs then I think.


Both. pgsql-hacker and -general are subscribed and getting new emails 
and pgsql-www is just an import of the archives:


http://archives.beccati.org/pgsql-www/by/date (sorry, no paging)

(just fixed a 500 error that was caused by the fact that I've been 
playing with the db a bit and a required helper table was missing)



(BTW, for something to actually be used In Production (TM), we want
something that uses one of our existing frameworks. So don't go
overboard in code-wise implementations on something else - proof of
concept on something else is always ok, of course)


OK, that's something I didn't know, even though I expected some kind of
limitations. Could you please elaborate a bit more (i.e. where to find
info)?


Well, the framework we're moving towards is built on top of django, so
that would be a good first start.

There is also whever the commitfest thing is built on, but I'm told
that's basically no framework.


I'm afraid that's outside on my expertise. But I can get as far as 
having a proof of concept and the required queries / php code.



Having played with it, here's my feedback about AOX:

pros:
- seemed to be working reliably;
- does most of the dirty job of parsing emails, splitting parts, etc
- highly normalized schema
- thread support (partial?)


A killer will be if that thread support is enough. If we have to build
that completely ourselves, it'll take a lot more work.


Looks like we need to populate a helper table with hierarchy 
information, unless Ahijit has a better idea and knows how to get it 
from the aox main schema.



cons:
- directly publishing the live email feed might not be desirable


Why not?


The scenario I was thinking at was the creation of a static snapshot and 
potential inconsistencies that might occur if the threads get updated 
during that time.



- queries might end up being a bit complicate for simple tasks


As long as we don't have to hit them too often, which is solve:able
with caching. And we do have a pretty good RDBMS to run the queries on
:)


True :)


I don't think you can trust the NNTP gateway now or in the past,
messages are sometimes lost there. The mbox files are as complete as
anything we'll ever get.


Importing the whole pgsql-www archive with a perl script that bounces
messages via SMTP took about 30m. Maybe there's even a way to skip SMTP, I
haven't looked into it that much.


Um, yes. There is an MBOX import tool.


Cool.



With all that said, I can't promise anything as it all depends on how much
spare time I have, but I can proceed with the evaluation if you think it's
useful. I have a feeling that AOX is not truly the right tool for the job,
but we might be able to customise it to suit our needs. Are there any other
requirements that weren't specified?


Well, I think we want to avoid customizing it. Using a custom
frontend, sure. But we don't want to end up customizing the
parser/backend. That's the road to unmaintainability.


Sure. I guess my wording wasn't right... I was more thinking about 
adding new tables, materialized views or whatever else might be missing 
to make it fit out purpose.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-11 Thread Matteo Beccati

Hi,

Il 11/01/2010 11:18, Dimitri Fontaine ha scritto:

AOX is already a database backed email solution, offering an archive
page with searching. I believe the searching is baked by tsearch
indexing. That's why I think it'd be suitable.

They already archive and offer search over one of our mailing lists, and
from there it seems like we'd only miss the user interface bits:

   http://archives.aox.org/archives/pgsql-announce

I hope the UI bits are not the most time demanding one.

Is there someone with enough time to install aox somewhere and have it
subscribed to our lists?


I recall having tried AOX a long time ago but I can't remember the 
reason why I was not satisfied. I guess I can give another try by 
setting up a test ML archive.




My daugher was born yesterday and I'm having a bit of a calm before the
storm because she's not coming home until Tuesday or so (at this time of
the day, that is, because I have to take care of the other daughter).
I'll be probably away for (at least) a week when she does; and I'll
probably have somewhat of a shortage of spare time after that.


BTW, congrats Alvaro!


Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-11 Thread Matteo Beccati

Il 11/01/2010 12:58, Dave Page ha scritto:

On Mon, Jan 11, 2010 at 5:23 PM, Matteo Beccatip...@beccati.com  wrote:

I recall having tried AOX a long time ago but I can't remember the reason
why I was not satisfied. I guess I can give another try by setting up a test
ML archive.


I tried it too, before I started writing the new prototype archiver
from scratch. I too forget why I gave up on it, but it was a strong
enough reason for me to start coding from scratch.

BTW, we only need to replace the archiver/display code. The search
works well already.


It took me no more than 10 minutes to set up AOX and hook it up to a 
domain. An email account is now subscribed to the hackers ML.


I'll try to estimate how hard it could be to write a web app that 
displays the archive from the db, even though I'm not sure that this is 
a good way to proceed.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-11 Thread Matteo Beccati

Il 11/01/2010 15:00, Abhijit Menon-Sen ha scritto:

I'll keep this short: Oryx, the company behind Archiveopteryx (aox), is
no longer around, but the software is still maintained. The developers
(myself included) are still interested in keeping it alive. It's been a
while since the last release, but it'll be ready soon. If you're having
any sort of problems with it, write to me, and I'll help you.


That's good news indeed for the project, AOX seems to be working fine on 
my server. I've had a few IMAP glitches, but it seems to live happily 
with my qmail and stores the emails on the db, fulfilling my current needs.


So, I've decided to spend a bit more time on this and here is a proof of 
concept web app that displays mailing list archives reading from the AOX 
database:


http://archives.beccati.org/

Please take it as an exercise I've made trying to learn how to use 
symfony this afternoon. It's not feature complete, nor probably very 
scalable, but at least it features attachment download ;)


http://archives.beccati.org/pgsql-hackers/message/37


Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PQescapeByteaConn and the new hex encoding

2009-12-25 Thread Matteo Beccati

Hi everyone,

I've been playing with the 8.5alpha3 in the last few days. Among other 
things, I'm making sure that the pgsql PHP extensions still work 
correctly with the new version. It would seems so, as all the errors in 
the standard pgsql extension test suite come from the fact that 
PQescapeByteaConn now defaults to the new hex format when connected to 
a 8.5+ server, which is cool. It's just a matter of updating the tests.


However, before taking a look at the actual code and understanding its 
behaviour, I tried using SET bytea_output = 'escape' and I was 
expecting PQescapeByteaConn to honour it. Not sure if changing the 
current behaviour is at all possible, desirable and really worth it, but 
I'm going to hold the patches to the php test suite until I get some 
feedback here.


Thoughts?


Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQescapeByteaConn and the new hex encoding

2009-12-25 Thread Matteo Beccati

Il 25/12/2009 18:54, Tom Lane ha scritto:

Matteo Beccatip...@beccati.com  writes:

However, before taking a look at the actual code and understanding its
behaviour, I tried using SET bytea_output = 'escape' and I was
expecting PQescapeByteaConn to honour it.


Why?  PQescapeByteaConn's charter is to produce something that will work
on the given connection, no more and no less.


Makes complete sense :)

I was just trying to find a way to get the PHP function pg_escape_bytea 
(which uses PQescapeByteaConn if available) to generate a backwards 
compatible escaped string. It's probably just a corner case though and 
it can be dealt with at the client side, if necessary.



Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PHP and PostgreSQL 8.5 compatibility

2009-12-25 Thread Matteo Beccati

Hi everyone,

I'm glad to announce that the pgsql and PDO PHP extensions test suites 
are now passing when used with 8.5-cvs. Mostly it was just a matter of 
updating the tests themselves, but a bug in PDO_PgSQL dealing with the 
new hex format affecting only the 5.2.x branch was fixed during the 
process[1].


This means that most of the PHP applications should work fine with 8.5 
when running recent enough PHP versions. The few that are using both PDO 
and bytea fields will require a switch to 5.3 (or 5.2.13 whenever it 
comes out).



[1] http://bugs.php.net/50575


Cheers
--
Matteo Beccati

Development  Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ERROR: out of memory when using aggregates over a partitioned table

2009-05-04 Thread Matteo Beccati
Hi everyone,

I'm unexpectedly getting out of memory error both with 8.3.3 and
8.4beta1 when doing something as simple as:

SELECT id, COUNT(*) AS counter, MAX(last_modified) AS last_modified FROM
foo GROUP BY id;

where foo is a partitioned table and id is a uuid column.

It looks like the HashAggregate estimate is set to a default of 200 even
though ndistinct in each partition is averaging at -0.59. As RhodiumToad
pointed out there's a comment explaining the behaviour:

 * XXX This means the Var represents a column of an append
 * relation. Later add code to look at the member relations and
 * try to derive some kind of combined statistics?

I just wanted to raise it as something that might happen to those using
partitions as it's very likely that a partitioned table is bigger than
the available RAM. However I don't think it happens very often that one
needs to run an aggregate query on it. I just needed it to populate a
separate table that will be kept up to date via triggers.

Here's the EXPLAIN output:

 HashAggregate  (cost=1344802.32..1344805.32 rows=200 width=24)
   -  Append  (cost=0.00..969044.47 rows=50101047 width=24)
 -  Seq Scan on foo  (cost=0.00..16.60 rows=660 width=24)
 -  Seq Scan on part_0 foo  (cost=0.00..60523.89 rows=3129289
width=24)
 -  Seq Scan on part_1 foo  (cost=0.00..60555.37 rows=3130937
width=24)
 -  Seq Scan on part_2 foo  (cost=0.00..60532.17 rows=3129717
width=24)
 -  Seq Scan on part_3 foo  (cost=0.00..60550.86 rows=3130686
width=24)
 -  Seq Scan on part_4 foo  (cost=0.00..60545.07 rows=3130407
width=24)
 -  Seq Scan on part_5 foo  (cost=0.00..60579.93 rows=3131393
width=24)
 -  Seq Scan on part_6 foo  (cost=0.00..60566.70 rows=3131470
width=24)
 -  Seq Scan on part_7 foo  (cost=0.00..60610.66 rows=3133766
width=24)
 -  Seq Scan on part_8 foo  (cost=0.00..60546.67 rows=3129667
width=24)
 -  Seq Scan on part_9 foo  (cost=0.00..60509.92 rows=3128592
width=24)
 -  Seq Scan on part_a foo  (cost=0.00..60581.25 rows=3132225
width=24)
 -  Seq Scan on part_b foo  (cost=0.00..60552.81 rows=3130781
width=24)
 -  Seq Scan on part_c foo  (cost=0.00..60621.15 rows=3134315
width=24)
 -  Seq Scan on part_d foo  (cost=0.00..60714.26 rows=3139126
width=24)
 -  Seq Scan on part_e foo  (cost=0.00..60552.85 rows=3130785
width=24)
 -  Seq Scan on part_f foo  (cost=0.00..60484.31 rows=3127231
width=24)


Cheers
--
Matteo Beccati
http://www.openx.org/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New trigger option of pg_standby

2009-03-26 Thread Matteo Beccati

Hi,

Guillaume Smet wrote:

On Thu, Mar 26, 2009 at 2:51 AM, Fujii Masao masao.fu...@gmail.com wrote:

What does the default mean? You mean that new trigger should use
the existing trigger option character (-t)?


Yes, that's my point.

I understand it seems weird to switch the options but I'm pretty sure
a lot of persons currently using -t would be surprised by the current
behaviour. Moreover playing all the remaining WALs before starting up
should be the most natural option when people are looking in the help.

That said, it would be nice to hear from people really using
pg_standby to know if they understand how it works now and if it's
what they intended when they set it up.


My fault not RTFM well enough, but I was surprised finding out that -t 
is working like that.


+1 for me to switch -t to the new behaviour.


Cheers

--
Matteo Beccati

OpenX - http://www.openx.org

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [BUGS] BUG #4689: Expanding the length of a VARCHAR column should not induce a table rewrite

2009-03-05 Thread Matteo Beccati
Guillaume Smet ha scritto:
 On Wed, Mar 4, 2009 at 11:50 AM, Peter Eisentraut pete...@gmx.net wrote:
 The question is how you want to implement this in a data type independent
 fashion.  You can't assume that increasing the typmod is a noop for all data
 types.
 
 Sure. See my previous answer on -hackers (I don't think this
 discussion belong to -bugs) and especially the discussion in the
 archives about Jonas' patch.

I recently had a similar problem when I added some domains to the
application. ALTER TABLE ... TYPE varchar_dom was leading to a full
table rewrite even though the underlying type definition were exactly
the same (i.e. varchar(64)). I can live with it, but I suppose this fix
might be related to the varlen one.


Cheers

-- 
Matteo Beccati

OpenX - http://www.openx.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL failing to acquire locks on pg_listen

2009-02-12 Thread Matteo Beccati
Hi Tom,

 Given I was using 8.3, it seemed quite right to set the reset statement
 to ABORT; DISCARD ALL. Everything works fine, until a load spike
 happens and pgpool-II reset queries start to lag behind, with DISCARD
 ALL failing to acquire an exclusive locks on the pg_listen system table,
 although the application isn't using any LISTEN/NOTIFY. The reason was
 not obvious to me, but looking at the man page explained a lot: DISCARD
 ALL also performs an UNLISTEN *.
 
 Seems like we could/should fix UNLISTEN * to not do anything if it is
 known that the current backend never did any LISTENs.

Here's my proposed patch, both for HEAD and 8.3:

http://www.beccati.com/misc/pgsql/async_unlisten_skip_HEAD.patch
http://www.beccati.com/misc/pgsql/async_unlisten_skip_REL8_3_STABLE.patch

I tried to write a regression test, but couldn't find a suitable way to
get the regression framework cope with trace_notify printing the backend
pid. I even tried to add a @backend_pid@ variable to pg_regress, but
soon realised that the pid is not available to psql when variable
substitution happens.

So, here's the output of some tests I made:

http://www.beccati.com/misc/pgsql/async_unlisten_skip.out

Note: DISCARD doesn't produce any debug output, because the guc
variables are being reset before the Async_UnlistenAll is called.


Cheers

-- 
Matteo Beccati

OpenX - http://www.openx.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL failing to acquire locks on pg_listen

2009-02-12 Thread Matteo Beccati

Tom Lane ha scritto:

Matteo Beccati p...@beccati.com writes:

Seems like we could/should fix UNLISTEN * to not do anything if it is
known that the current backend never did any LISTENs.



Here's my proposed patch, both for HEAD and 8.3:


This seems a bit overcomplicated.  I had in mind something like this...


Much easier indeed... I didn't notice the unlistenExitRegistered variable.


Cheers

--
Matteo Beccati

OpenX - http://www.openx.org

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL failing to acquire locks on pg_listen

2009-02-12 Thread Matteo Beccati
Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Feb 12, 2009 at 2:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Just for completeness, I attach another form of the patch that I thought
 about for a bit.  This adds the ability for UNLISTEN ALL to revert the
 backend to the state where subsequent UNLISTENs don't cost anything.
 This could be of value in a scenario where you have pooled connections
 and just a small fraction of the client threads are using LISTEN.  That
 seemed like kind of an unlikely use-case though.  The problem is that
 this patch adds some cycles to transaction commit/abort for everyone,
 whether they ever use LISTEN/UNLISTEN/DISCARD or not.  It's not a lot of
 cycles, but even so I'm thinking it's not a win overall.  Comments?
 
 This is so lightweight I'd be inclined to go for it, even if the use
 case is pretty narrow.  Do you think you can actually construct a
 benchmark where the difference is measurable?
 
 Almost certainly not, but a cycle saved is a cycle earned ...
 
 The real problem I'm having with it is that I don't believe the
 use-case.  The normal scenario for a listener is that you LISTEN and
 then you sit there waiting for events.  In the above scenario, a client
 thread would only be able to receive events when it actively had control
 of its pool connection; so it seems like it would be at risk of missing
 things when it didn't.  It seems much more likely that you'd design the
 application so that listening clients aren't pooled but are listening
 continuously.  The guys sending NOTIFY events might well be pooled, but
 they're not the issue.
 
 If someone can show me a plausible use-case that gets a benefit from
 this form of the patch, I don't have a problem with making other people
 pay a few cycles for that.  I'm just fearing that nobody would get a win
 at all, and then neither the cycles nor the extra complexity would give
 us any benefit.  (The extra hooks into xact.c are actually bothering me
 as much as the cycles.  Given that we're intending to throw all this
 code away and reimplement LISTEN/NOTIFY completely pretty soon, I'd just
 as soon keep down the number of contact points with the rest of the
 system.)

Imagine a web application interacting with a deamon using LISTEN/NOTIFY.
It happened in past to me to build one, so I guess it could be a fairly
common scenario, which you already described. Now if both the front end
and the deamon use the same pooler to have a common failover process,
previously listening connections could be reused by the web app if the
daemon is restarted and the pooler is not. Does it look plausible?

That said, I don't mind if we go with the previous two-liner fix :)


Cheers

-- 
Matteo Beccati

OpenX - http://www.openx.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] DISCARD ALL failing to acquire locks on pg_listen

2009-02-11 Thread Matteo Beccati
Hi everyone,

I've been recently testing PostgreSQL 8.3.4 (upgrade to 8.3.6 is
scheduled) with a large number of connections from separate boxes each
using a locally installed pgpool-II set in connection pooling mode, for
up to 2500 concurrent open connections.

Given I was using 8.3, it seemed quite right to set the reset statement
to ABORT; DISCARD ALL. Everything works fine, until a load spike
happens and pgpool-II reset queries start to lag behind, with DISCARD
ALL failing to acquire an exclusive locks on the pg_listen system table,
although the application isn't using any LISTEN/NOTIFY. The reason was
not obvious to me, but looking at the man page explained a lot: DISCARD
ALL also performs an UNLISTEN *. Since then I've crafted the reset
query to only reset what is actually used by the application, and things
are going much better.

I vaguely remember that a full LISTEN/NOTIFY overhaul is in the to-do
list with low priority, but my point is that DISCARD can be a bottleneck
when used in the scenario it is designed for, i.e. high concurrency
access from connection poolers.

I've been looking to the source code and I understand that async
operations are performed acquiring an exclusive lock at the end of the
transaction, but I have a proof of concept patch that avoids it in case
there are no pending listens or notifies and the backend is not already
listening.

I didn't have time to test it yet, but I can devote a little bit more
time to it in case it makes sense to you.


Cheers

-- 
Matteo Beccati

OpenX - http://www.openx.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL failing to acquire locks on pg_listen

2009-02-11 Thread Matteo Beccati
Hi Tom,

 Given I was using 8.3, it seemed quite right to set the reset statement
 to ABORT; DISCARD ALL. Everything works fine, until a load spike
 happens and pgpool-II reset queries start to lag behind, with DISCARD
 ALL failing to acquire an exclusive locks on the pg_listen system table,
 although the application isn't using any LISTEN/NOTIFY. The reason was
 not obvious to me, but looking at the man page explained a lot: DISCARD
 ALL also performs an UNLISTEN *.
 
 Seems like we could/should fix UNLISTEN * to not do anything if it is
 known that the current backend never did any LISTENs.

Ok, I'll take sometime tonight to give my patch a try and eventually
submit it.


Cheers

-- 
Matteo Beccati

OpenX - http://www.openx.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regression in analyze

2008-11-07 Thread Matteo Beccati
Hi,

  * We approximate never vacuumed by has relpages = 0, which
  * means this will also fire on genuinely empty relations.  Not
  * great, but fortunately that's a seldom-seen case in the real
  * world, and it shouldn't degrade the quality of the plan too
  * much anyway to err in this direction.
  */
 if (curpages  10  rel-rd_rel-relpages == 0)
 curpages = 10;
 
 
 commenting that two lines make the estimates correct. now that we have
 plan invalidation that hack is still needed?
 i know that as the comment suggest this has no serious impact but
 certainly this is user visible.

I guess the reason is that a 0 estimate for a non empty table which was
analyzed before the data was inserted and not yet analyzed again could
cause much more troubles... anyway, I was just curious to get an
official anwser ;)


Cheers

-- 
Matteo Beccati

OpenX - http://www.openx.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regression in analyze

2008-11-06 Thread Matteo Beccati
Hi,

 Attached test shows a regression in analyze command.
 Expected rows in an empty table is 2140 even after an ANALYZE is executed

Doesn't seem to be a regression to me, as I've just checked that 8.0 did
behave the same. However the question also was raised a few days ago on
the italian mailing list and I couldn't find a reasonable explanation
for it.


Cheers

-- 
Matteo Beccati

OpenX - http://www.openx.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patternsel() and histogram_selectivity() and the hard cutoff of 100

2008-02-12 Thread Matteo Beccati

Hi Greg,


So I had a thought about how to soften the controversial hard cutoff of 100
for the use of the histogram selectivity. Instead of switching 100% one way or
the other between the two heuristics why not calculate both and combine them.
The larger the sample size from the histogram the more we can weight the
histogram calculation. The smaller the histogram size the more we weight the
heuristic.

My first thought was to scale it linearly so we use 10% of the histogram
sample + 90% of the heuristic for default statistic sizes of 10 samples. That
degenerates to the status quo for 100 samples and up.


Incidentally I hacked up a patch to do this:


Sounds sensible to me, at least much more than a hardcoded magic number 
a few people know about...



Cheers

--
Matteo Beccati

Openads - http://www.openads.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-25 Thread Matteo Beccati

Tom Lane ha scritto:

Alvaro Herrera [EMAIL PROTECTED] writes:

Limit (50)
  Sort (key: pse_lastlogin)
Result
   Append
  Limit (50)
 SeqScan tbl_profile_search
  Limit (50)
 Indexscan tbl_profile_search_interest_1
  Limit (50)
 IndexScan on the index mentioned above


is wrong because there's no guarantee that the first 50 elements of a
seqscan will be anything special.  You could imagine dealing with that
by sorting the seqscan results and limiting to 50, or by not
sorting/limiting that data at all but letting the upper sort see all the
seqscan entries.  Offhand I think either of those could win depending on
how many elements the seqscan will yield.  Also, it might be interesting
to consider inventing a merge plan node type that takes N
already-sorted inputs and produces a sorted output stream.  Then we'd
need to trade off this approach versus doing the top-level sort, which
could cope with some of its inputs not being pre-sorted.

This seems to have some aspects in common with the recent discussion
about how to optimize min/max aggregates across an appendrel set.


The plan proposed by Alvaro reminds me of:

http://archives.postgresql.org/pgsql-performance/2005-09/msg00047.php

My proposal was in fact (Alvaro's plan + first Tom's suggested change):

Limit (50)
  Sort (key: pse_lastlogin)
Result
   Append
  Limit (50)
 Sort (key: pse_lastlogin)
SeqScan tbl_profile_search
  Limit (50)
 Indexscan tbl_profile_search_interest_1
  Limit (50)
 IndexScan on the index mentioned above

The plan was generated rewriting the query to use explicit subselect and 
forcing the planner to order by and limit for each subquery.


I've tried a few times to write a patch to handle it, but I wasn't able 
to do it because of my lack of internals knowledge and spare time.



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-27 Thread Matteo Beccati

Stefan Kaltenbrunner wrote:

too bad - however any idea on one of the other troubling querys (q21) I
mentioned in the mail I resent to the list (after the original one got
lost)?

http://archives.postgresql.org/pgsql-hackers/2006-09/msg02011.php


What happens if you increase statistics for l_orderkey?


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-20 Thread Matteo Beccati

Hi,

Tom Lane wrote:

I've committed this change with (for now) 100 as the minimum histogram
size to use.  Stefan, are you interested in retrying your benchmark?


A first try with ltree gave big improvements on my smaller data set: the 
estimated row count is correct or off by only 1 row. I'm now restoring a 
bigger database to get more reliable results.


I hope Stefan can confirm the improvement on dbt3 too.

Thanks Tom :)


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-18 Thread Matteo Beccati

Hi,

Tom Lane wrote:

Hmm ... pattern_sel already applies the operator directly to the
most_common_vals, but in this situation those aren't common enough
to help much.  With such an extensive histogram it is awfully tempting
to assume that the histogram members are a representative sample, and
take the selectivity as being the fraction of histogram entries that
match the pattern.  Maybe drop the first and last histogram entries
on the grounds they're probably outliers.  Thoughts?  What would be a
reasonable minimum histogram size to enable using this approach instead
of the guess-on-the-basis-of-the-pattern code?


That's what I was suggesting here respectively for ltree operators and like:

http://archives.postgresql.org/pgsql-patches/2006-05/msg00178.php
http://archives.postgresql.org/pgsql-performance/2006-01/msg00083.php

My original ltree patch was stripped of the histogram matching code and 
I will need to re-patch 8.2 when deploying it to get decent performance 
with a couple of queries, but it would be very nice to avoid it ;)


I cannot see anything bad by using something like that:

if (histogram is large/representative enough)
{
  recalculate_selectivity_matching_histogram_values()

  if (new_selectivity  old_selectivity)
return new_selectivity
  else
return old_selectivity
}


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-18 Thread Matteo Beccati

Tom Lane ha scritto:

Matteo Beccati [EMAIL PROTECTED] writes:

I cannot see anything bad by using something like that:
if (histogram is large/representative enough)


Well, the question is exactly what is large enough?  I feel a bit
uncomfortable about applying the idea to a histogram with only 10
entries (especially if we ignore two of 'em).  With 100 or more,
it sounds all right.  What's the breakpoint?


Yes, I think 100-200 could be a good breakpoint. I don't actually know 
what is the current usage of SET STATISTICS, I usually set it to 1000 
for columns which need more precise selectivity.


The breakpoint could be set even higher (500?) so there is space to 
increase statistics without enabling the histogram check, but I don't 
feel very comfortable though suggesting this kind of possibly 
undocumented side effect...



Best ragards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] @ versus ~, redux

2006-09-06 Thread Matteo Beccati

Tom Lane wrote:

The existing geometric containment tests seem to be nonstrict, so if we
wanted to leave room to add strict ones later, it might be best to
settle on

x @= y  x contains or equals y
x =@ y  x is contained in or equals y

reserving @ and @ for future strict comparison operators.


Since the choice of @ and @ comes from current ltree operators I'd 
like to point out that they are non-strict for ltree, and this could add 
a little bit of inconsistence.



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] @ versus ~, redux

2006-09-04 Thread Matteo Beccati

Tom Lane ha scritto:

OK, so if everyone is leaning to #3, the name game remains to be played.
Do we all agree on this:

x @ y means x contains y
x @ y means x is contained in y

Are we all prepared to sign a solemn oath to commit hara-kiri if we
invent a new datatype that gets this wrong?  No?  Maybe these still
aren't obvious enough.


Does this mean that also contrib/ltree operators will likely change for 
consistency?


ltree @ ltree
- returns TRUE if left argument is an ancestor of right argument 
(or equal).

ltree @ ltree
- returns TRUE if left argument is a descendant of right argument 
(or equal).



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Enhanced containment selectivity function

2006-03-10 Thread Matteo Beccati

Bruce Momjian ha scritto:

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.


Great. I would just like to remind that Tom said:


I'd be willing to consider exporting those functions from selfuncs.c.


so that the selector function could be moved to contrib/ltree, which is 
its natural place.


It could also be noted that a similar feature could be useful outside 
ltree: I guess there are plenty of cases when scanning statistics would 
give a better result than using a constant selectivity.



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Copyright

2006-03-05 Thread Matteo Beccati

Mark,


After all - you wouldn't want somebody to say that PostgreSQL copied
them, because the date was later, would you? :-)


I think it won't be hard to understand what Copyright (c) 1996-2006 
means ;)



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] PL/php in pg_pltemplate

2005-11-30 Thread Matteo Beccati

Hi Alvaro,

I've also seen there is an experimental embed sapi which could already 
be what you need (--enable-embed).


Interesting.  I'll explore this.  Is this available in PHP5 only?


I found it while checking the available SAPIs in PHP4. Looking to the 
cvs repository, it seems to be available since PHP 4.3.0:


http://cvs.php.net/php-src/sapi/embed/php_embed.c


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] PL/php in pg_pltemplate

2005-11-26 Thread Matteo Beccati

Hi,


The only sore point of the PL/php build is that it needs the Apache2
module, so it needs to know the path to it.  I haven't found a way to do
this automatically without requiring APXS which I certainly don't want
to do ...


Maybe I didn't get the point, but this could be as simple as writing a 
new PHP sapi (i.e. sapi/pgsql) which builds the .so without requiring 
Apache or other software.


I've also seen there is an experimental embed sapi which could already 
be what you need (--enable-embed).



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] MERGE vs REPLACE

2005-11-12 Thread Matteo Beccati

Tom Lane wrote:

Peter Eisentraut [EMAIL PROTECTED] writes:
It seems to me that it has always been implicitly assumed around here 
that the MERGE command would be a substitute for a MySQL-like REPLACE 
functionality.  After rereading the spec it seems that this is not the 
case.  MERGE always operates on two different tables, which REPLACE 
doesn't do.


Normally I'd plump for following the standard ... but AFAIR, we have had
bucketloads of requests for REPLACE functionality, and not one request
for spec-compatible MERGE.  If, as it appears, full-spec MERGE is also a
whole lot harder and slower than REPLACE, it seems that we could do
worse than to concentrate on doing REPLACE for now.  (We can always come
back to MERGE some other day.)


I would also like to add that MySQL's REPLACE is not exactly an INSERT 
OR UPDATE, rather and INSERT OR (DELETE then INSERT): I mean that the 
fields not specified in the query are set to their defaults:


i.e.

CREATE TABLE t (a int PRIMARY KEY, b int, c int);

INSERT INTO t (a, b, c) VALUES (1, 1, 2);

SELECT * FROM t;
+---+--+--+
| a | b| c|
+---+--+--+
| 1 |1 |2 |
+---+--+--+

REPLACE INTO t (a, b) VALUES (1, 1);

SELECT * FROM t;
+---+--+--+
| a | b| c|
+---+--+--+
| 1 |1 | NULL |
+---+--+--+


I wanted to point it out this because people are commonly mistaking this.


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[HACKERS] Does EXPLAIN ANALYZE show a wrong plan for MIN/MAX?

2005-11-11 Thread Matteo Beccati

Hi,

I've noticed that sometimes EXPLAIN ANALYZE is much slower than the 
plain query. After investigating I found that it happens when using MIN 
or MAX aggregates.


It seems that the plan outputted is not the optimized one (available 
since 8.1) that is really used when running the plain query.



I.e. this is about 14 times slower:


db= SELECT min(t_stamp) FROM stats;
  min

 2005-01-14 17:43:59+01
(1 row)

Time: 2206.841 ms
  

db= EXPLAIN ANALYZE SELECT min(t_stamp) FROM stats;
   QUERY PLAN
-
 Aggregate  (cost=65461.73..65461.74 rows=1 width=8) (actual 
time=30692.485..30692.488 rows=1 loops=1)
   -  Append  (cost=0.00..59648.38 rows=2325338 width=8) (actual 
time=0.043..22841.814 rows=2325018 loops=1)
 -  Seq Scan on stats  (cost=0.00..13.20 rows=320 width=8) 
(actual time=0.004..0.004 rows=0 loops=1)
 -  Seq Scan on stats_200501 stats  (cost=0.00..1.30 rows=30 
width=8) (actual time=0.030..0.132 rows=30 loops=1)
 -  Seq Scan on stats_200502 stats  (cost=0.00..117.81 
rows=4581 width=8) (actual time=0.055..16.635 rows=4581 loops=1)
 -  Seq Scan on stats_200503 stats  (cost=0.00..333.05 
rows=12905 width=8) (actual time=0.108..46.866 rows=12905 loops=1)
 -  Seq Scan on stats_200504 stats  (cost=0.00..805.40 
rows=31140 width=8) (actual time=0.212..113.868 rows=31140 loops=1)
 -  Seq Scan on stats_200505 stats  (cost=0.00..5432.80 
rows=211580 width=8) (actual time=1.394..767.939 rows=211580 loops=1)
 -  Seq Scan on stats_200506 stats  (cost=0.00..9533.68 
rows=371768 width=8) (actual time=2.870..1352.216 rows=371768 loops=1)
 -  Seq Scan on stats_200507 stats  (cost=0.00..9467.76 
rows=369176 width=8) (actual time=2.761..1348.064 rows=369176 loops=1)
 -  Seq Scan on stats_200508 stats  (cost=0.00..6023.04 
rows=234804 width=8) (actual time=1.537..853.712 rows=234804 loops=1)
 -  Seq Scan on stats_200509 stats  (cost=0.00..11600.68 
rows=452568 width=8) (actual time=3.608..1644.433 rows=452568 loops=1)
 -  Seq Scan on stats_200510 stats  (cost=0.00..16318.62 
rows=636462 width=8) (actual time=5.367..2329.015 rows=636462 loops=1)
 -  Seq Scan on stats_200511 stats  (cost=0.00..1.04 rows=4 
width=8) (actual time=0.028..0.041 rows=4 loops=1)

 Total runtime: 30692.627 ms
(15 rows)

Time: 30694.357 ms
  =


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Does EXPLAIN ANALYZE show a wrong plan for MIN/MAX?

2005-11-11 Thread Matteo Beccati

Martijn van Oosterhout wrote:
I've noticed that sometimes EXPLAIN ANALYZE is much slower than the 
plain query. After investigating I found that it happens when using MIN 
or MAX aggregates.


It seems that the plan outputted is not the optimized one (available 
since 8.1) that is really used when running the plain query.


It may also be that the overhead of calling gettimeofday() several
times per tuple is blowing the time out. What platform is this?


FreeBSD 5.4-RELEASE on an HP DL380 G4.

I've also tried to do the same on another machine which has 8.0.3 and 
FreeBSD 4.9-RELEASE-p3: times for the same query are 15s vs 63s with 
EXPLAIN ANALYZE. Of course I know 8.0 doesn't optimize min/max the same 
way 8.1 does.


Hope this helps.


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Does EXPLAIN ANALYZE show a wrong plan for MIN/MAX?

2005-11-11 Thread Matteo Beccati

Tom Lane wrote:

Martijn van Oosterhout kleptog@svana.org writes:

On Fri, Nov 11, 2005 at 11:05:45AM +0100, Matteo Beccati wrote:

It seems that the plan outputted is not the optimized one (available
since 8.1) that is really used when running the plain query.



It may also be that the overhead of calling gettimeofday() several
times per tuple is blowing the time out. What platform is this?


Martijn's explanation is by far the more probable.  The high overhead
of EXPLAIN ANALYZE has been documented before.


OK, I've had the same explaination on IRC by dennisb, but I thought it 
was strange to have a 15x slowdown.


So, does benchmarking queries using explain analyze lead to unreliable 
results? Shouldn't a min/max query use a index scan when possible?



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Matteo Beccati

Hi Tom,


Attached is a completed patch, which I've had no time to test yet, but
I have to leave for the evening right now --- so here it is in case
anyone is awake and wants to poke at it.


The patch was applied correctly only when I reverted Alvaro's first 
patch, so I suppose it was meant to be an alternative to it.


Unfortunately it doesn't solve the invalid alloc request issue.

Should I try Alvaro's second patch that you said not going to work?


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Matteo Beccati

Hi,


Should I try Alvaro's second patch that you said not going to work?


I'll add that this works for me, that's it prevents invalid alloc 
requests to show.



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Matteo Beccati

Tom Lane wrote:

OK, I think this version may actually work, and get the wraparound
case right too.  It hasn't failed yet on the pgbench test case anyway.
Matteo, could you try it on your test case?


Yes, it's working. The test case ran for a several minutes without errors.

Thank you all :)


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Hi,

I'm using 8.1beta4 on a development server for a rather db-intensive 
application. This application has a multiprocess daemon which was 
working fairly well in past. After some recent changes I started having 
deadlock problems. While investigating to remove what was causing them I 
removed some FOR UPDATE clauses (added on 8.0 to prevent other deadlock 
situations), hoping that the newly added FK share locks would better 
handle the concurrent access. In fact the deadlock errors went away, but 
I suddenly started getting some of these:


ERROR:  invalid memory alloc request size 4291419108
CONTEXT:  SQL statement SELECT 1 FROM ONLY public.gw_users x WHERE 
u_id = $1 FOR SHARE OF x
SQL statement INSERT INTO gw_log_credits (log_id, u_id, 
credit) SELECT currval('gw_log_log_id_seq'), u_id, SUM(credit) FROM 
gw_objects_credits WHERE o_id =  $1  GROUP BY u_id

PL/pgSQL function t_gw_outgoing_a_u line 8 at SQL statement
SQL statement UPDATE gw_outgoing SET ok = 't', response =  $1 
 WHERE o_id =  $2 

PL/pgSQL function gw_queue_ok line 30 at SQL statement
2

where 4291419108 is a big random number. Please let me know if you need 
other details.



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Hi Martijn,


Backtrace would be nice. I don't suppose your backend is compiled with
debugging? If so, try attaching to the backend and do:

break MemoryContextAlloc if size  10

Obviously something is trying to allocate and negative number of
bytes... 4291419108 = -3548188


Here is the backtrace, hoping I did it correctly:


Breakpoint 1, Mem
oryContextAlloc (context=0xbfbfd5d0, size=3217020368) at mcxt.c:501
501 {
(gdb) bt
#0  MemoryContextAlloc (context=0xbfbfd5d0, size=3217020368) at mcxt.c:501
#1  0x0812a586 in initStringInfo (str=0xbfbfd5d0) at stringinfo.c:50
#2  0x081303e5 in pq_beginmessage (buf=0xbfbfd5d0, msgtype=84 'T') at 
pqformat.c:92
#3  0x080778b5 in SendRowDescriptionMessage (typeinfo=0x8311420, 
targetlist=0xbfbfd5d0, formats=0x83df088) at printtup.c:170
#4  0x08117200 in ExecutorRun (queryDesc=0x83df040, 
direction=ForwardScanDirection, count=0) at execMain.c:222
#5  0x0818a16f in PortalRunSelect (portal=0x835f018, forward=32 ' ', 
count=0, dest=0x83a7448) at pquery.c:794
#6  0x0818a60e in PortalRun (portal=0x835f018, count=2147483647, 
dest=0x83a7448, altdest=0x83a7448, completionTag=0xbfbfd830 ) at 
pquery.c:646
#7  0x081868cc in exec_simple_query (query_string=0x8310228 SELECT * 
FROM gw_queue_get('7')) at postgres.c:1014
#8  0x08188e4f in PostgresMain (argc=4, argv=0x82dd3d0, 
username=0x82dd3a0 multilevel) at postgres.c:3168

#9  0x08165dbc in ServerLoop () at postmaster.c:2853
#10 0x081672bd in PostmasterMain (argc=3, argv=0xbfbfed3c) at 
postmaster.c:943

#11 0x08131092 in main (argc=3, argv=0xbfbfed3c) at main.c:256


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati
=0x8448e50) at 
pl_exec.c:991
#41 0x2dc44fbb in exec_stmt_block (estate=0xbfbfda40, block=0x84492b0) 
at pl_exec.c:936
#42 0x2dc43cda in plpgsql_exec_function (func=0x8448018, 
fcinfo=0xbfbfdb30) at pl_exec.c:286
#43 0x2dc3fcf5 in plpgsql_call_handler (fcinfo=0xbfbfdb30) at 
pl_handler.c:123
#44 0x081447e2 in ExecMakeFunctionResult (fcache=0x8436238, 
econtext=0x84361b0, isNull=0x84367e8 , isDone=0x8436840) at 
execQual.c:1096
#45 0x08145060 in ExecEvalFunc (fcache=0x8436238, econtext=0x84361b0, 
isNull=0x84367e8 , isDone=0x8436840) at execQual.c:1498
#46 0x0814888a in ExecTargetList (targetlist=0x84366b8, 
econtext=0x84361b0, values=0x84367d8, isnull=0x84367e8 , 
itemIsDone=0x8436840, isDone=0xbfbfde14)

at execQual.c:3670
---Type return to continue, or q return to quit---
#47 0x08148c7c in ExecProject (projInfo=0x84367f8, isDone=0xbfbfde14) at 
execQual.c:3871

#48 0x081544cc in ExecResult (node=0x8436128) at nodeResult.c:157
#49 0x08142c9d in ExecProcNode (node=0x8436128) at execProcnode.c:306
#50 0x08141331 in ExecutePlan (estate=0x8436018, planstate=0x8436128, 
operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection, 
dest=0x83d38e0)

at execMain.c:1102
#51 0x081404fb in ExecutorRun (queryDesc=0x8466c40, 
direction=ForwardScanDirection, count=0) at execMain.c:230
#52 0x081de829 in PortalRunSelect (portal=0x842a018, forward=1 '\001', 
count=0, dest=0x83d38e0) at pquery.c:794
#53 0x081de519 in PortalRun (portal=0x842a018, count=2147483647, 
dest=0x83d38e0, altdest=0x83d38e0, completionTag=0xbfbfe080 ) at 
pquery.c:611
#54 0x081da2a5 in exec_simple_query (query_string=0x83d3120 SELECT 
gw_queue_ok('1363827', 'OK 3000')) at postgres.c:1014
#55 0x081dd220 in PostgresMain (argc=4, argv=0x83643d0, 
username=0x83643a0 multilevel) at postgres.c:3168

#56 0x081b01ea in BackendRun (port=0x8361200) at postmaster.c:2855
#57 0x081af7d3 in BackendStartup (port=0x8361200) at postmaster.c:2498
#58 0x081ad86b in ServerLoop () at postmaster.c:1231
#59 0x081ad0eb in PostmasterMain (argc=3, argv=0xbfbfed3c) at 
postmaster.c:943

#60 0x08163962 in main (argc=3, argv=0xbfbfed3c) at main.c:256


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Hi Tom,


Well, this apparently indicates a bug in the new multixact code, but
there's not enough info here to figure out what went wrong.  Can you
create a test case that will let someone else reproduce the problem?


Unfortunately the error pops up randomly in a very complex app/db and I 
am unable to produce a test case :(


Lat me know what other I can do to help fixing the bug.


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Hi,


Go up a few levels to GetMultiXactIdMembers and type info locals, see
if we can get the values of some of the variables there. Also, if you
can turn the debugging down to -O0, that will make the results in gdb
much more reliable.

It's clear at least that length is negative, but what about the other
variables...


I already recompiled all with -O0 to be sure that I was able to have a 
backtrace. This is the full bt:


#2  0x0827b5cf in MemoryContextAlloc (context=0x856bcc8, 
size=4278026492) at mcxt.c:505

__func__ = MemoryContextAlloc
#3  0x080b6a16 in GetMultiXactIdMembers (multi=320306, xids=0xbfbfaba4) 
at multixact.c:935

pageno = 156
prev_pageno = 156
entryno = 819
slotno = 2
offptr = (MultiXactOffset *) 0x286536ac
offset = 4235201
length = -4235201
i = 138425096
nextMXact = 320308
tmpMXact = 320307
nextOffset = 4235265
ptr = (TransactionId *) 0xbfbfab78



Do you use a lot of subtransactions, function, savepoints, anything
like that?


I just removed a subtransaction that I put in a function that was used 
to capture the deadlock errors. That subtransaction was actually useless 
because I removed the FOR UPDATE clause that was causing the deadlock, 
but the alloc error is still there. I'll try to search through the code 
to find some other subtransactions.



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Hi Alvaro,


It would be good to see the contents of MultiXactState.  I suspect
there's a race condition in the MultiXact code.


Good, but... where do I find the contents of MultiXactState? ;)


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Hi Alvaro,


It would be good to see the contents of MultiXactState.  I suspect
there's a race condition in the MultiXact code.

Good, but... where do I find the contents of MultiXactState? ;)


Huh, it should be a global variable.  Try

p *MultiXactState


Done:

(gdb) p *MultiXactState
$1 = {nextMXact = 320308, nextOffset = 4235265, lastTruncationPoint = 
302016, perBackendXactIds = {0}}



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Alvaro Herrera wrote:

I don't see any easy way to fix this except by introducing a lot more
locking than is there now --- ie, holding the MultiXactGenLock until the
new mxact's starting offset has been written to disk.  Any better ideas?


Well, it isn't a very good solution because it requires us to retain the
MultiXactGenLock past a XLogInsert and some I/O on SLRU pages.
Previously the lock was mostly only used in short operations and very
rarely held during I/O.  But I don't see any other solution either.
Patch attached.


The patch works wonderfully. I'm trying to stress the whole app and with 
no errors until now.




Thanks to Matteo for finding the bug!


Thanks to you all for helping out and fixing it :)



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Tom, Alvaro


The remaining question for me is, how do we sleep until the correct
offset has been stored?


I was thinking of just pg_usleep for some nominal time (1ms maybe)
and try to read the offsets page again.  This is a corner case so
the performance doesn't have to be great.


Let me know if you need to test some other patches.

Again, thank you


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Enhanced containment selectivity function

2005-08-06 Thread Matteo Beccati

Hi,

Moving it in contrib/ltree would be more difficult to me because it 
depends on other functions declared in selfuncs.c 
(get_restriction_variable, etc).


I'd be willing to consider exporting those functions from selfuncs.c.


In the meanwhile here is the latest patch which uses both mcv and 
histogram values.



BTW, when restoring my test database I've found out that there were many 
errors on ALTER INDEX something OWNER TO ... :


ERROR:  something is not a table, view, or sequence

This using 8.1devel pg_restore and a 8.0.3 compressed dump. I could be 
wrong, but I didn't get those errors a few days ago (some cvs updates ago).



Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
Index: contrib/ltree/ltree.sql.in
===
RCS file: /projects/cvsroot/pgsql/contrib/ltree/ltree.sql.in,v
retrieving revision 1.9
diff -c -r1.9 ltree.sql.in
*** contrib/ltree/ltree.sql.in  30 Mar 2004 15:45:32 -  1.9
--- contrib/ltree/ltree.sql.in  6 Aug 2005 13:10:35 -
***
*** 230,236 
RIGHTARG = ltree,
PROCEDURE = ltree_isparent,
  COMMUTATOR = '@',
! RESTRICT = contsel,
JOIN = contjoinsel
  );
  
--- 230,236 
RIGHTARG = ltree,
PROCEDURE = ltree_isparent,
  COMMUTATOR = '@',
! RESTRICT = parentsel,
JOIN = contjoinsel
  );
  
***
*** 248,254 
RIGHTARG = ltree,
PROCEDURE = ltree_risparent,
  COMMUTATOR = '@',
! RESTRICT = contsel,
JOIN = contjoinsel
  );
  
--- 248,254 
RIGHTARG = ltree,
PROCEDURE = ltree_risparent,
  COMMUTATOR = '@',
! RESTRICT = parentsel,
JOIN = contjoinsel
  );
  
Index: src/backend/utils/adt/selfuncs.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.187
diff -c -r1.187 selfuncs.c
*** src/backend/utils/adt/selfuncs.c21 Jul 2005 04:41:43 -  1.187
--- src/backend/utils/adt/selfuncs.c6 Aug 2005 13:10:46 -
***
*** 1306,1311 
--- 1306,1488 
return (Selectivity) selec;
  }
  
+ #define DEFAULT_PARENT_SEL 0.001
+ 
+ /*
+  *parentsel   - Selectivity of parent relationship 
for ltree data types.
+  */
+ Datum
+ parentsel(PG_FUNCTION_ARGS)
+ {
+   PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+   Oid operator = PG_GETARG_OID(1);
+   List   *args = (List *) PG_GETARG_POINTER(2);
+   int varRelid = PG_GETARG_INT32(3);
+   VariableStatData vardata;
+   Node   *other;
+   boolvaronleft;
+   Datum  *values;
+   int nvalues;
+   float4 *numbers;
+   int nnumbers;
+   double  selec = 0.0;
+ 
+   /*
+* If expression is not variable @ something or something @ variable,
+* then punt and return a default estimate.
+*/
+   if (!get_restriction_variable(root, args, varRelid,
+ vardata, 
other, varonleft))
+   PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL);
+ 
+   /*
+* If the something is a NULL constant, assume operator is strict and
+* return zero, ie, operator will never return TRUE.
+*/
+   if (IsA(other, Const) 
+   ((Const *) other)-constisnull)
+   {
+   ReleaseVariableStats(vardata);
+   PG_RETURN_FLOAT8(0.0);
+   }
+ 
+   if (HeapTupleIsValid(vardata.statsTuple))
+   {
+   Form_pg_statistic stats;
+   double  mcvsum = 0.0;
+   double  mcvsel = 0.0;
+   double  hissel = 0.0;
+ 
+   stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
+ 
+   if (IsA(other, Const))
+   {
+   /* Variable is being compared to a known non-null 
constant */
+   Datum   constval = ((Const *) 
other)-constvalue;
+   boolmatch = false;
+   int i;
+ 
+   /*
+* Is the constant @ to any of the column's most 
common
+* values?
+*/
+   if (get_attstatsslot(vardata.statsTuple,
+
vardata.atttype, vardata.atttypmod,
+
STATISTIC_KIND_MCV, InvalidOid,
+values, 
nvalues,
+numbers, 
nnumbers

[HACKERS] Enhanced containment selectivity function

2005-08-04 Thread Matteo Beccati

Hi,

I've recently had problems with slow queries caused by the selectivity 
of the @ ltree operator, as you may see in my post here:


http://archives.postgresql.org/pgsql-performance/2005-07/msg00473.php


Someone on IRC (AndrewSN if I'm not wrong) pointed out that the 
restriction selectivity function for @ is contsel, which returns a 
constant value of 0.001. So I started digging in the source code trying 
to understand how the default behaviour could be enhanced, and ended up 
writing a little patch which adds an alternative containment selectivity 
function (called contstatsel) which is able to deliver better results.


This first version is based on the eqsel function and uses only 
histogram values to calculate the selectivity and uses the 0.001 
constant as a fallback.


This also made me think: is there a reason why geometric selectivity 
functions return constant values rather than checking statistics for a 
better result?


Attached you will find a patch suitable for current CVS HEAD. My C 
skills are a bit rusty and my knowledge of pg internals are very poor, 
so I'm sure it could be improved and modified to better fit the pg 
coding standards.



Here are the results on a slow query:

test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING 
(u_id) WHERE tree @ '1041' AND t_stamp  '2005-07-01';


QUERY PLAN 


--
 Nested Loop  (cost=0.00..553.02 rows=8 width=364) (actual 
time=2.423..19787.259 rows=6785 loops=1)
   -  Index Scan using gw_users_gisttree_key on gw_users 
(cost=0.00..21.63 rows=5 width=156) (actual time=0.882..107.434 
rows=4696 loops=1)

 Index Cond: (tree @ '1041'::ltree)
   -  Index Scan using gw_batches_t_stamp_u_id_key on gw_batches 
(cost=0.00..106.09 rows=15 width=212) (actual time=3.898..4.171 rows=1 
loops=4696)
 Index Cond: ((gw_batches.t_stamp  '2005-07-01 
00:00:00+02'::timestamp with time zone) AND (outer.u_id = 
gw_batches.u_id))

 Total runtime: 19805.447 ms
(6 rows)

test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING 
(u_id) WHERE tree @ '1041' AND t_stamp  '2005-07-01';


QUERY PLAN 


-
 Hash Join  (cost=245.26..1151.80 rows=7671 width=364) (actual 
time=69.562..176.966 rows=6785 loops=1)

   Hash Cond: (outer.u_id = inner.u_id)
   -  Bitmap Heap Scan on gw_batches  (cost=57.74..764.39 rows=8212 
width=212) (actual time=8.330..39.542 rows=7819 loops=1)
 Recheck Cond: (t_stamp  '2005-07-01 00:00:00+02'::timestamp 
with time zone)
 -  Bitmap Index Scan on gw_batches_t_stamp_u_id_key 
(cost=0.00..57.74 rows=8212 width=0) (actual time=8.120..8.120 rows=7819 
loops=1)
   Index Cond: (t_stamp  '2005-07-01 
00:00:00+02'::timestamp with time zone)
   -  Hash  (cost=175.79..175.79 rows=4692 width=156) (actual 
time=61.046..61.046 rows=4696 loops=1)
 -  Seq Scan on gw_users  (cost=0.00..175.79 rows=4692 
width=156) (actual time=0.083..34.200 rows=4696 loops=1)

   Filter: (tree @ '1041'::ltree)
 Total runtime: 194.621 ms
(10 rows)

The second query uses a custom @ operator I added to test the 
alternative selectivity function:


CREATE FUNCTION contstatsel(internal, oid, internal, integer) RETURNS 
double precision AS 'contstatsel' LANGUAGE internal;

CREATE OPERATOR @ (
 LEFTARG = ltree,
 LEFTARG = ltree,
 PROCEDURE = ltree_risparent,
 COMMUTATOR = '@',
 RESTRICT = contstatsel,
 JOIN = contjoinsel
);


Of course any comments/feedback are welcome.


Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
Index: src/backend/utils/adt/selfuncs.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.187
diff -r1.187 selfuncs.c
1309a1310,1433
  *contstatsel - Selectivity of containment for any 
 data types.
  */
 Datum
 contstatsel(PG_FUNCTION_ARGS)
 {
   PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
   Oid operator = PG_GETARG_OID(1);
   List   *args = (List *) PG_GETARG_POINTER(2);
   int varRelid = PG_GETARG_INT32(3);
   VariableStatData vardata;
   Node   *other;
   boolvaronleft;
   Datum  *values;
   int nvalues;
   double  selec = 0.0;
 
   /*
* If expression is not variable = something or something = variable,
* then punt and return a default estimate.
*/
   if (!get_restriction_variable(root, args, varRelid,
 vardata, 
 other, varonleft

Re: [HACKERS] Enhanced containment selectivity function

2005-08-04 Thread Matteo Beccati

Tom Lane wrote:

After looking at this a little, it doesn't seem like it has much to do
with the ordinary 2-D notion of containment.  In most of the core
geometric types, the histogram ordering is based on area, and so
testing the histogram samples against the query doesn't seem like it's
able to give very meaningful containment results --- the items shown
in the histogram could have any locations whatever.

The approach might be sensible for ltree's isparent operator --- I don't
have a very good feeling for the behavior of that operator, but it looks
like it has at least some relationship to the ordering induced by the
ltree  operator.


Actually, this was one of my doubts. The custom function seem to work 
well with ltree, but this also could be dependant from the way my 
dataset is organized.




So my thought is that (assuming Oleg and Teodor agree this is sensible
for ltree) we should put the selectivity function into contrib/ltree,
not directly into the core.  It might be best to call it something like
parentsel, too, to avoid giving the impression that it has something
to do with 2-D containment.

Also, you should think about using the most-common-values list as well
as the histogram.  I would guess that many ltree applications would have
enough duplicate entries that the MCV list represents a significant
fraction of the total population.  Keep in mind when thinking about this
that the histogram describes the population of data *exclusive of the
MCV entries*.


I also agree that parentsel would better fit its purpose.

My patch was originally using MCV without good results, until I realized 
that MCV was empty because the column contains unique values :)

I'll look into adding a MCV check to it.

Moving it in contrib/ltree would be more difficult to me because it 
depends on other functions declared in selfuncs.c 
(get_restriction_variable, etc).


Thank you for your feedback


Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org