date:20070317

Re: CREATE INDEX and HOT (was [HACKERS] Question: pg_classattributes and race conditions ?)

2007-03-17 Thread Simon Riggs

On Sat, 2007-03-17 at 12:43 +0530, Pavan Deolasee wrote:
> How do we move forward with the CREATE INDEX issue with
> HOT ? There are quite a few suggestions and objections.
> Can we please discuss and decide on the plan ? I am very
> comfortable with the current state of HOT, the results
> are encouraging and I hope this issue does not become
> a showstopper.

My suggested plan would be:

First, we should make CREATE INDEX CONCURRENTLY work, since we have a
design to do that. Many other things are acceptable if that works fully
and correctly with HOT, if it cannot, we have problems and that is a
showstopper. 

Second, I can't put my finger on it exactly, but I think having a DDL
command to chill a table, as Alvaro suggests, sounds like a very wise
utility to include, whatever we do later. That does have the useful
by-product of making CREATE INDEX work without change, but I would not
want to rely on it in the longer term.

While those are happening we can explore the other possibilities in more
detail, but they do seem like safe bets to me.

> Here is what different people have suggested:
> 
> 1. Simon and I posted couple of designs which include prior
> suggestions from Heikki, Tom, Hannu and others. I believe
> these designs are not very distinct and if done correctly
> would help us keep the current behavior of CREATE INDEX
> and CREATE INDEX CONCURRENTLY same. 

Agreed. Apologies for confusing things by posting the second design,
that wasn't very helpful or polite.

> There are concerns
> about race conditions and deadlock issues though. What
> is a general feeling at this point ? Are these issues very
> hard to address ? At a high level, IMO we need some or all
> of these things to make either of these designs work:
> 
> - ability to enable/disable HOT-updates on the table
> - ability to enable/disable HOT-fetches on the table
> - ability to wait for concurrent index scans to finish
> - ability to allow only one backend to CHILL the table

That seems like a good summary of what we need. I would add only that
these need not be explicit user-level commands, just internal
capabilities.

> How hard would these be without causing any race conditions
> and deadlocks ?

We should be able to take advantage of two things:

- We only need to chill some of the tuples, in some cases.

- Chilling only requires us to lock the indexes, which would be more
practical if the index locking were more transitory.

I note that IndexScans hold the locks they have on an index until
transaction end, even after the IndexScans and BitmapIndexScans
complete. ISTM we could relax that locking; that seems OK since we
earlier discussed removing locks completely on indexes. The lock on the
heap would not be upgraded or changed.

> 2. Heikki suggested an approach where we add a byte
> to tuple header and track HOT-ness of different indexes.
> The idea looks good but had a downside of increasing tuple
> header and complexity.

I'm very comfortable with the idea that HOT can be turned on/off for a
table. That gives us a workaround to bugs. Previously, changing things
like WITHOUT OIDS was done over two releases, so I'd suggest the same
thing here. Add the option now, disabled, then look to make it the
default option in the next release. We can override that with the
default_use_hot parameter for those that feel bold, at least initially.
I know Bruce has been long opposed to the idea of a table-level switch,
which is why we've been trying so hard to avoid it. So we should add his
-1 to this idea from the start.

Right now, ideas around (2) sound like too much complexity. Maybe
there's a good idea there waiting to break out, so we should pursue that
also - but I'm not sure you can wait for that to happen.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] [PATCHES] Bitmapscan changes

2007-03-17 Thread Grzegorz Jaskiewicz



On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:




You'll obviously need to run it with the patch applied. I'd suggest  
to enable stats_block_level to see the effect on buffer cache hit/ 
miss ratio.


groupeditems-42-pghead.patch.gz is enough, or it needs  
maintain_cluster_order_v5.patch ??





--
Grzegorz Jaskiewicz

C/C++ freelance for hire






---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-17 Thread tomas

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, Mar 16, 2007 at 09:56:23AM -0400, Andrew Dunstan wrote:
> [EMAIL PROTECTED] wrote:
> >
> >>Does hstore nest? [...]
> >
> >If what you mean is to have "mappings of mappings" then no.
> >  [...] think "hash" for perl folks [...]

> As a perl folk I think of hashes as nestable :-)

Heh. Point taken :-)

> Anyway, this means that you can't use hstore to cover the same field as 
> YAML or JSON. That doesn't mean it's not useful - far from it.

I think that hstore might be extensible in this direction. After all,
the keys and (key, value) combinations just get hashed into a bitmap.
There is no reason why one shouldn't be able to hash more complex data
structures (unless, of course, the bit map gets so densely poked with
ones that it stops being useful). The difficult part might be to reach a
consensus on what a "complex data structure" might look like.

The purists among the relational folks are sharpening their knives
already, I can hear that ;-)

Regards
- -- tomás
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFF++GaBcgs9XrR2kYRAs/hAJ9vfRy36T23vJ6eIYj6efrQVk9roQCeMkqQ
kWqzNbgbkMOXKIDQuzwDeFY=
=fCiG
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 6: explain analyze is your friend

[HACKERS] Bison 2.1 on win32

2007-03-17 Thread Magnus Hagander

I just tried building with Bison 2.1 on msvc, and it broke. For one
thing, the .BAT file rejects 2.1 as broken instead of 2.0, which is
obviously incorrect :-)

But the generated C file also does not compile causing the error on
http://msdn2.microsoft.com/en-us/library/93az0868.aspx, because msvc
doesn't define __STDC__, which causes Bison to generate code it can't
compile.  Defining __STDC__ globally breaks several other places, since
it affects a lot of include files that aren't necessarily others.

The attached patch seems to fix the build issue. Does it seem
acceptable/the right thing to do?

Another option would be to just reject both 2.0 and 2.1 as broken to
build pg with, I guess...

//Magnus
Index: src/backend/parser/gram.y
===
RCS file: /projects/cvsroot/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.581
diff -c -r2.581 gram.y
*** src/backend/parser/gram.y   13 Mar 2007 00:33:41 -  2.581
--- src/backend/parser/gram.y   17 Mar 2007 13:14:40 -
***
*** 62,67 
--- 62,71 
  #include "utils/numeric.h"
  #include "utils/xml.h"
  
+ /* MSVC does not define __STDC__, but Bison 2.1 generates broken code without 
it */
+ #ifdef WIN32_ONLY_COMPILER
+ #define __STDC__ 1
+ #endif
  
  /* Location tracking support --- simpler than bison's default */
  #define YYLLOC_DEFAULT(Current, Rhs, N) \
Index: src/backend/bootstrap/bootparse.y
===
RCS file: /projects/cvsroot/pgsql/src/backend/bootstrap/bootparse.y,v
retrieving revision 1.88
diff -c -r1.88 bootparse.y
*** src/backend/bootstrap/bootparse.y   13 Mar 2007 00:33:39 -  1.88
--- src/backend/bootstrap/bootparse.y   17 Mar 2007 13:14:56 -
***
*** 51,56 
--- 51,61 
  #include "tcop/dest.h"
  #include "utils/rel.h"
  
+ /* MSVC does not define __STDC__, but Bison 2.1 generates broken code without 
it */
+ #ifdef WIN32_ONLY_COMPILER
+ #define __STDC__ 1
+ #endif
+ 
  #define atooid(x) ((Oid) strtoul((x), NULL, 10))

  
Index: src/pl/plpgsql/src/gram.y
===
RCS file: /projects/cvsroot/pgsql/src/pl/plpgsql/src/gram.y,v
retrieving revision 1.99
diff -c -r1.99 gram.y
*** src/pl/plpgsql/src/gram.y   19 Feb 2007 03:18:51 -  1.99
--- src/pl/plpgsql/src/gram.y   17 Mar 2007 13:15:11 -
***
*** 18,23 
--- 18,27 
  
  #include "parser/parser.h"
  
+ /* MSVC does not define __STDC__, but Bison 2.1 generates broken code without 
it */
+ #ifdef WIN32_ONLY_COMPILER
+ #define __STDC__ 1
+ #endif
  
  static PLpgSQL_expr   *read_sql_construct(int until,

int until2,

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Bison 2.1 on win32

2007-03-17 Thread Andrew Dunstan


Magnus Hagander wrote:

I just tried building with Bison 2.1 on msvc, and it broke. For one
thing, the .BAT file rejects 2.1 as broken instead of 2.0, which is
obviously incorrect :-)

But the generated C file also does not compile causing the error on
http://msdn2.microsoft.com/en-us/library/93az0868.aspx, because msvc
doesn't define __STDC__, which causes Bison to generate code it can't
compile.  Defining __STDC__ globally breaks several other places, since
it affects a lot of include files that aren't necessarily others.

The attached patch seems to fix the build issue. Does it seem
acceptable/the right thing to do?

Another option would be to just reject both 2.0 and 2.1 as broken to
build pg with, I guess...

  


I rolled back to 1.875 to get MSVC builds working. In the longer term, 
though, falling behind upstream is probably not a good idea. Should this 
be reported to the bison people?


For now I could live with either of your solutions.

cheers

andrew

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] [PATCHES] Bitmapscan changes

2007-03-17 Thread Joshua D. Drake

Grzegorz Jaskiewicz wrote:
> 
> On Mar 16, 2007, at 10:12 PM, Heikki Linnakangas wrote:
> 
> 
>>
>> You'll obviously need to run it with the patch applied. I'd suggest to
>> enable stats_block_level to see the effect on buffer cache hit/miss
>> ratio.
> 
> groupeditems-42-pghead.patch.gz is enough, or it needs
> maintain_cluster_order_v5.patch ??

He has a patched source ball here of the whole thing, which is what I used:


http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz

The you just need to run the tests.

> 
> 
> 
> 
> --Grzegorz Jaskiewicz
> 
> C/C++ freelance for hire
> 
> 
> 
> 
> 
> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>   http://archives.postgresql.org
> 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] [PATCHES] xpath_array with namespaces support

2007-03-17 Thread Nikolay Samokhvalov


What about it? W/o this not large patch XML functionality in 8.3 will be weak...
Will it be accepted?

On 3/5/07, Nikolay Samokhvalov <[EMAIL PROTECTED]> wrote:

On 3/4/07, Nikolay Samokhvalov <[EMAIL PROTECTED]> wrote:
> I'll fix these issues and extend the patch with resgression tests and
> docs for xpath_array(). I'll resubmit it very soon.

Here is a new version of the patch. I didn't change any part of docs yet.
Since there were no objections I've changed the name of the function
to xmlpath().




--
Best regards,
Nikolay
Index: src/backend/utils/adt/xml.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/xml.c,v
retrieving revision 1.34
diff -u -r1.34 xml.c
--- src/backend/utils/adt/xml.c	3 Mar 2007 19:32:55 -	1.34
+++ src/backend/utils/adt/xml.c	5 Mar 2007 01:14:57 -
@@ -47,6 +47,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #endif /* USE_LIBXML */
 
 #include "catalog/namespace.h"
@@ -67,6 +69,7 @@
 #include "utils/datetime.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "access/tupmacs.h"
 #include "utils/xml.h"
 
 
@@ -88,6 +91,7 @@
 static int		parse_xml_decl(const xmlChar *str, size_t *lenp, xmlChar **version, xmlChar **encoding, int *standalone);
 static bool		print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone);
 static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, xmlChar *encoding);
+static text		*xml_xmlnodetotext(xmlNodePtr cur);
 
 #endif /* USE_LIBXML */
 
@@ -1463,7 +1467,6 @@
 	return buf.data;
 }
 
-
 /*
  * Map SQL value to XML value; see SQL/XML:2003 section 9.16.
  */
@@ -2403,3 +2406,247 @@
 	else
 		appendStringInfoString(result, "\n\n");
 }
+
+
+/*
+ * XPath related functions
+ */
+
+#ifdef USE_LIBXML
+/* 
+ * Convert XML node to text (return value only, it's not dumping)
+ */
+text *
+xml_xmlnodetotext(xmlNodePtr cur)
+{
+	xmlChar		*str;
+	text			*result;
+	size_t			len;	
+	
+	str = xmlXPathCastNodeToString(cur);
+	len = strlen((char *) str);
+	result = (text *) palloc(len + VARHDRSZ);
+	SET_VARSIZE(result, len + VARHDRSZ);
+	memcpy(VARDATA(result), str, len);
+	
+	return result;
+}
+#endif
+
+/*
+ * Evaluate XPath expression and return array of XML values.
+ * As we have no support of XQuery sequences yet, this functions seems
+ * to be the most useful one (array of XML functions plays a role of
+ * some kind of substritution for XQuery sequences).
+
+ * Workaround here: we parse XML data in different way to allow XPath for
+ * fragments (see "XPath for fragment" TODO comment inside).
+ */
+Datum
+xmlpath(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+	ArrayBuildState		*astate = NULL;
+	xmlParserCtxtPtr	ctxt = NULL;
+	xmlDocPtr			doc = NULL;
+	xmlXPathContextPtr	xpathctx = NULL;
+	xmlXPathCompExprPtr	xpathcomp = NULL;
+	xmlXPathObjectPtr	xpathobj = NULL;
+	int32len, xpath_len;
+	xmlChar*string, *xpath_expr;
+	boolres_is_null = FALSE;
+	int	i;
+	xmltype*data;
+	text*xpath_expr_text;
+	ArrayType			*namespaces;
+	int	*dims, ndims, ns_count = 0, bitmask = 1;
+	char*ptr;
+	bits8*bitmap;
+	char**ns_names = NULL, **ns_uris = NULL;
+	int16typlen;
+	booltypbyval;
+	chartypalign;
+	
+	/* the function is not strict, we must check first two args */
+	if (PG_ARGISNULL(0) || PG_ARGISNULL(1))
+		PG_RETURN_NULL();
+	
+	xpath_expr_text = PG_GETARG_TEXT_P(0);
+	data  = PG_GETARG_XML_P(1);
+	
+	/* Namespace mappings passed as text[].
+	 * Assume that 2-dimensional array has been passed, 
+	 * the 1st subarray is array of names, the 2nd -- array of URIs,
+	 * example: ARRAY[ARRAY['myns', 'myns2'], ARRAY['http://example.com', 'http://example2.com']]. 
+	 */
+	if (!PG_ARGISNULL(2))
+	{
+		namespaces = PG_GETARG_ARRAYTYPE_P(2);
+		ndims = ARR_NDIM(namespaces);
+		dims = ARR_DIMS(namespaces);
+		
+		/* Sanity check */
+		if (ndims != 2)
+			ereport(ERROR, (errmsg("invalid array passed for namespace mappings"),
+			errdetail("Only 2-dimensional array may be used for namespace mappings.")));
+		
+		Assert(ARR_ELEMTYPE(namespaces) == TEXTOID);
+		
+		ns_count = ArrayGetNItems(ndims, dims) / 2;
+		get_typlenbyvalalign(ARR_ELEMTYPE(namespaces),
+			 &typlen, &typbyval, &typalign);
+		ns_names = (char **) palloc(ns_count * sizeof(char *));
+		ns_uris = (char **) palloc(ns_count * sizeof(char *));
+		ptr = ARR_DATA_PTR(namespaces);
+		bitmap = ARR_NULLBITMAP(namespaces);
+		bitmask = 1;
+		
+		for (i = 0; i < ns_count * 2; i++)
+		{
+			if (bitmap && (*bitmap & bitmask) == 0)
+ereport(ERROR, (errmsg("neither namespace nor URI may be NULL"))); /* TODO: better message */
+			else
+			{
+if (i < ns_count)
+	ns_names[i] = DatumGetCString(DirectFunctionCall1(textout,
+		  PointerGetDatum(ptr)));
+else
+	ns_uris[i - ns_count] = DatumGetCString(DirectFunctionCall1(textout,
+		  PointerGetDatum(ptr)));
+ptr = att_ad

Re: [HACKERS] [PATCHES] xpath_array with namespaces support

2007-03-17 Thread Andrew Dunstan




Nikolay Samokhvalov wrote:
What about it? W/o this not large patch XML functionality in 8.3 will 
be weak...

Will it be accepted?



In principle I am in favor of the patch.

Would it be better to use some more unlikely name for the dummy root 
element used to process fragments than  ?


Perhaps even something in a special namespace?

cheers

andrew




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] [COMMITTERS] pgsql: Add GUC temp_tablespaces to provide a default location for

2007-03-17 Thread Jaime Casanova


On 3/5/07, Tom Lane <[EMAIL PROTECTED]> wrote:

In the second place, it's a serious violation of what little modularity
and layering we have for fd.c to be calling into commands/tablespace.c.
This is not merely cosmetic but has real consequences: one being that
it's now unsafe to call OpenTemporaryFile outside a transaction.



ok, you are right... what do you suggest?
maybe move the GetTempTablespace function to somewhere in src/backend/utils?

--
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning."
  Richard Cook

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] SoC ECPG Enhancements

2007-03-17 Thread Michael Meskes

> > ECPG Enhancments: Enable ECPG to generate calls directly for libpq
> > rather than calls to its own libraries.
> >
> > What would be the point of that?
> 
> More importantly, Michael hasn't volunteered to mentor this year, so I think 
> we should probably take the idea down.

And most importantly in my opinion: this simply doesn't make sense. :-)

I already refused to add this last year and unless someone can explain
to me what I am misunderstanding I you can simply remove it.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] [COMMITTERS] pgsql: Add GUC temp_tablespaces to provide a default location for

2007-03-17 Thread Tom Lane

"Jaime Casanova" <[EMAIL PROTECTED]> writes:
> On 3/5/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>> In the second place, it's a serious violation of what little modularity
>> and layering we have for fd.c to be calling into commands/tablespace.c.
>> This is not merely cosmetic but has real consequences: one being that
>> it's now unsafe to call OpenTemporaryFile outside a transaction.

> ok, you are right... what do you suggest?
> maybe move the GetTempTablespace function to somewhere in src/backend/utils?

You missed the point entirely.  Relocating the code to some other file
wouldn't change the objection: the problem is that fd.c mustn't invoke
any transactional facilities such as catalog lookups.  It's too low
level for that.

You could perhaps do it the other way around: some transactional
code (eg the assign hook for a GUC variable) tells fd.c to save
some private state controlling future temp file creations.

BTW, if we are now thinking of temp files as being directed to
particular tablespaces, is there any reason still to have per-database
subdirectories for them?  It might simplify life if there were just
one default temp directory, $PGDATA/base/pgsql_tmp/, plus one per
configured temp tablespace, $PGDATA/pg_tblspc//pgsql_tmp/.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: CREATE INDEX and HOT (was [HACKERS] Question: pg_class attributes and race conditions ?)

2007-03-17 Thread Tom Lane

"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> While creating an index, if a HEAP_ONLY tuple is found,
> CREATE INDEX [CONCURRENTLY] fails with an error and the
> user needs to SET HOT OFF and then try again. While turning
> HOT off, the entire table is CHILLed, holding AccessExclusive
> lock on the table. Once the new index is created, user
> can turn HOT on again.

It hardly seems acceptable to require exclusive lock to chill a table.
In production situations, knowing that you'd have to do that to do
index maintenance on a large table would probably scare you off of
ever enabling the feature at all.  Last year we were getting beaten up
about how it wasn't acceptable for CREATE INDEX to lock out writes
for a long time; how is it suddenly acceptable to need to lock out
both reads and writes for a long time before you can even think
about creating an index?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-17 Thread Andrew Dunstan




Jeff Davis wrote:

On Wed, 2007-03-14 at 01:29 -0600, Michael Fuhr wrote:
  

On Tue, Mar 13, 2007 at 04:42:35PM +0100, Mario Weilguni wrote:


Am Dienstag, 13. März 2007 16:38 schrieb Joshua D. Drake:
  

Is this any different than the issues of moving 8.0.x to 8.1 UTF8? Where
we had to use iconv?

What issues? I've upgraded several 8.0 database to 8.1. without having to use 
iconv. Did I miss something?
  

http://www.postgresql.org/docs/8.1/interactive/release-8-1.html

"Some users are having problems loading UTF-8 data into 8.1.X.  This
is because previous versions allowed invalid UTF-8 byte sequences
to be entered into the database, and this release properly accepts
only valid UTF-8 sequences. One way to correct a dumpfile is to run
the command iconv -c -f UTF-8 -t UTF-8 -o cleanfile.sql dumpfile.sql."




If the above quote were actually true, then Mario wouldn't be having a
problem. Instead, it's half-true: Invalid byte sequences are rejected in
some situations and accepted in others. If postgresql consistently
rejected or consistently accepted invalid byte sequences, that would not
cause problems with COPY (meaning problems with pg_dump, slony, etc.).


  


How can we fix this? Frankly, the statement in the docs warning about 
making sure that escaped sequences are valid in the server encoding is a 
cop-out. We don't accept invalid data elsewhere, and this should be no 
different IMNSHO. I don't see why this should be any different from, 
say, date or numeric data. For years people have sneered at MySQL 
because it accepted dates like Feb 31st, and rightly so. But this seems 
to me to be like our own version of the same problem.


Last year Jeff suggested adding something like:

   pg_verifymbstr(string,strlen(string),0);

to each relevant input routine. Would that be an acceptable solution? If 
not, what would be?


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Bison 2.1 on win32

2007-03-17 Thread Tom Lane

Magnus Hagander <[EMAIL PROTECTED]> writes:
> The attached patch seems to fix the build issue. Does it seem
> acceptable/the right thing to do?

No, it seems pretty bletcherous.

> Another option would be to just reject both 2.0 and 2.1 as broken to
> build pg with, I guess...

In bison 2.3 (which is shipping with Fedora Core 6), all the uses of
__STDC__ seem to also test _MSC_VER:

#if (defined __STDC__ || defined __C99__FUNC__ \
 || defined __cplusplus || defined _MSC_VER)

If this fixes the problem, then I'd vote for just stating you need
bison >= 2.3 (or 2.2?) to build on MSVC.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-17 Thread Tom Lane

Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Last year Jeff suggested adding something like:
> pg_verifymbstr(string,strlen(string),0);
> to each relevant input routine. Would that be an acceptable solution?

The problem with that is that it duplicates effort: in many cases
(especially COPY IN) the data's already been validated.  I'm not sure
how to fix that, but I think you'll get some push-back if you double
the encoding verification work in COPY for nothing.

Given that we are moving away from backslash-enabled literals, I'm
not as convinced as some that this must be fixed...

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Bison 2.1 on win32

2007-03-17 Thread Magnus Hagander

Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
>> The attached patch seems to fix the build issue. Does it seem
>> acceptable/the right thing to do?
> 
> No, it seems pretty bletcherous.

That's kind of what I thought :-)


>> Another option would be to just reject both 2.0 and 2.1 as broken to
>> build pg with, I guess...
> 
> In bison 2.3 (which is shipping with Fedora Core 6), all the uses of
> __STDC__ seem to also test _MSC_VER:
> 
> #if (defined __STDC__ || defined __C99__FUNC__ \
>  || defined __cplusplus || defined _MSC_VER)
> 
> If this fixes the problem, then I'd vote for just stating you need
> bison >= 2.3 (or 2.2?) to build on MSVC.

It certainly looks like it would fix it. However, the gnuwin32 project
doesn't have 2.3, not even 2.2. But let's add a check and say we need
it, and then we can fix it again once they do release that version if
it's broken still...

Do you happen to have a 2.2 around so you can see what happens there? Or
does someone else have that? So I know which version to test against...

//Magnus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-17 Thread Andrew Dunstan




Tom Lane wrote:

Andrew Dunstan <[EMAIL PROTECTED]> writes:
  

Last year Jeff suggested adding something like:
pg_verifymbstr(string,strlen(string),0);
to each relevant input routine. Would that be an acceptable solution?



The problem with that is that it duplicates effort: in many cases
(especially COPY IN) the data's already been validated.  I'm not sure
how to fix that, but I think you'll get some push-back if you double
the encoding verification work in COPY for nothing.

Given that we are moving away from backslash-enabled literals, I'm
not as convinced as some that this must be fixed...



  


They will still be available in E'\nn' form, won't they?

One thought I had was that it might make sense to have a flag that would 
inhibit the check, that could be set (and reset) by routines that check 
for themselves, such as COPY IN. Then bulk load performance should not 
be hit much.


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Bison 2.1 on win32

2007-03-17 Thread Magnus Hagander

Magnus Hagander wrote:
> I just tried building with Bison 2.1 on msvc, and it broke. For one
> thing, the .BAT file rejects 2.1 as broken instead of 2.0, which is
> obviously incorrect :-)

Actually, looking at the GNU ftp site, there isn't even a version 2.2
available. There is a 2.1a which should have the fix (based on file
dates - they don't use branches or tags in their cvs repository).

But I'll go ahead and just say it's fixed in 2.3, and patch accordingly.

//Magnus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Bison 2.1 on win32

2007-03-17 Thread Tom Lane

Magnus Hagander <[EMAIL PROTECTED]> writes:
> Actually, looking at the GNU ftp site, there isn't even a version 2.2
> available. There is a 2.1a which should have the fix (based on file
> dates - they don't use branches or tags in their cvs repository).

Huh?  At
http://ftp.gnu.org/gnu/bison/
I see 2.0, 2.1, 2.2, 2.3, and no 2.1a.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Bison 2.1 on win32

2007-03-17 Thread Magnus Hagander

Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
>> Actually, looking at the GNU ftp site, there isn't even a version 2.2
>> available. There is a 2.1a which should have the fix (based on file
>> dates - they don't use branches or tags in their cvs repository).
> 
> Huh?  At
> http://ftp.gnu.org/gnu/bison/
> I see 2.0, 2.1, 2.2, 2.3, and no 2.1a.

Gah. I clicked the wrong link that was titled "test releases" ;-) I just
found a second ftp link when the first one didn't work. Should've read
more carefully.

Ok. So it looks like 2.2 should be fine as well, as soon as they put out
a new win32 ver... (FYI, the code seems to come from data/c.m4)

/Magnus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Bison 2.1 on win32

2007-03-17 Thread Tom Lane

Magnus Hagander <[EMAIL PROTECTED]> writes:
> Do you happen to have a 2.2 around so you can see what happens there? Or
> does someone else have that? So I know which version to test against...

2.2 and 2.3 seem to use _MSC_VER in the same way.  I had occasion to
test both last fall, and they generate only trivially different
outputs from our grammar.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: CREATE INDEX and HOT (was [HACKERS] Question: pg_classattributes and race conditions ?)

2007-03-17 Thread Simon Riggs

On Sat, 2007-03-17 at 11:45 -0400, Tom Lane wrote:
> "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> > While creating an index, if a HEAP_ONLY tuple is found,
> > CREATE INDEX [CONCURRENTLY] fails with an error and the
> > user needs to SET HOT OFF and then try again. While turning
> > HOT off, the entire table is CHILLed, holding AccessExclusive
> > lock on the table. Once the new index is created, user
> > can turn HOT on again.
> 
> It hardly seems acceptable to require exclusive lock to chill a table.
> In production situations, knowing that you'd have to do that to do
> index maintenance on a large table would probably scare you off of
> ever enabling the feature at all.  Last year we were getting beaten up
> about how it wasn't acceptable for CREATE INDEX to lock out writes
> for a long time; how is it suddenly acceptable to need to lock out
> both reads and writes for a long time before you can even think
> about creating an index?

I agree with you: It isn't acceptable for us to contemplate an
AccessExclusiveLock before we can build any index.

We *must* make CREATE INDEX CONCURRENTLY work with HOT. The good news is
I think we can without significant difficulty.

The problems are with CREATE INDEX, in some cases. I regret that I did
not see those difficulties until recently, which is why I was concerned
that we spent time on VACUUM FULL rather than this issue. I'm glad now
that you both pressed ahead and solved that though.

As a result of the issues, I think Pavan is playing safe, to make sure
there is *an* option, so that we can build upwards from there. The
proposal is pragmatism only, while we discuss other approaches.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-17 Thread Tom Lane

Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> The problem with that is that it duplicates effort: in many cases
>> (especially COPY IN) the data's already been validated.

> One thought I had was that it might make sense to have a flag that would 
> inhibit the check, that could be set (and reset) by routines that check 
> for themselves, such as COPY IN. Then bulk load performance should not 
> be hit much.

Actually, I have to take back that objection: on closer look, COPY
validates the data only once and does so before applying its own
backslash-escaping rules.  So there is a risk in that path too.

It's still pretty annoying to be validating the data twice in the
common case where no backslash reduction occurred, but I'm not sure
I see any good way to avoid it.  I don't much want to add another
argument to input functions, and the global flag that you suggest
above seems too ugly/risky.

Would someone do some performance checking on the cost of adding
mbverify to textin()?  If it could be shown that it adds only
negligible overhead to COPY, on say hundred-byte-wide text fields,
then we could decide that this isn't worth worrying about.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: CREATE INDEX and HOT (was [HACKERS] Question: pg_class attributes and race conditions ?)

2007-03-17 Thread Pavan Deolasee

Tom Lane wrote:
> "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
>> While creating an index, if a HEAP_ONLY tuple is found,
>> CREATE INDEX [CONCURRENTLY] fails with an error and the
>> user needs to SET HOT OFF and then try again. While turning
>> HOT off, the entire table is CHILLed, holding AccessExclusive
>> lock on the table. Once the new index is created, user
>> can turn HOT on again.
>
> It hardly seems acceptable to require exclusive lock to chill a table.
> In production situations, knowing that you'd have to do that to do
> index maintenance on a large table would probably scare you off of
> ever enabling the feature at all.  Last year we were getting beaten up
> about how it wasn't acceptable for CREATE INDEX to lock out writes
> for a long time; how is it suddenly acceptable to need to lock out
> both reads and writes for a long time before you can even think
> about creating an index?
>

Yeah, I agree. I was proposing this as a stop-gap solution though.
Something which would help us solve the problem without changing
the current behavior for non-HOT tables.

So what do you suggest ? Do you feel that there is no way we can
solve the problem ?

ISTM that if we run CHILL as a seperate transaction (just like
VACUUM), we should be able to CHILL the table with
ShareUpdateExclusiveLock. Running it as a seperate transaction
would reduce the risk of causing deadlocks. Is that a fair
assessment ?

If we need to CHILL with ShareUpdateExclusiveLock, IMHO we
would again be back to something similar to the first approach.
I know you said its fragile and full of race conditions, but
do you think we can handle it better if we have a seperate
DDL command, running within its own transaction ?

The algorithm would look like:

1. Disable HOT-updates
2. CHILL the table by inserting appropriate index entries and
  marking tuples CHILL_IN_PROGRESS
3. Establish a point when there are no open index scans
4. Disable HOT-fetches
5. Scan the heap again, reset CHILL_IN_PROGRESS, HEAP_ONLY
  and HOT_UPDATED flags
6. Establish a point when there are no open index scans
7. Enable HOT-fetches
8. Enable HOT-updates

I need help to do the step 1,3,4,6,7 and 8 (well all :-))
in a deadlock and race condition free mannner. Any
suggestions ? Where do we keep the global state about
HOT-updates/HOT-fetches ? If we keep it in pg_class, a
crash of the CHILL command or the server may leave the
pg_class row in a stale state. That does not look like
a problem though. In the worst case, we might not be
able to do HOT-updates without manual intervention.

Again comments, suggestions ? I really appreciate
everyone's time and patience. Help is what I need to
solve this problem.

Thanks,
Pavan

--

EnterpriseDBhttp://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: CREATE INDEX and HOT (was [HACKERS] Question: pg_classattributes and race conditions ?)

2007-03-17 Thread Pavan Deolasee

Simon Riggs wrote:
>
> As a result of the issues, I think Pavan is playing safe, to make sure
> there is *an* option, so that we can build upwards from there. The
> proposal is pragmatism only, while we discuss other approaches.
>

Absolutely true. I agree that CHILLing the table with AccessExclusive
lock is not a solution that I like. I was looking at it just as a
stop-gap solution, given the time left for feature freeze.
But if there is something better that we can do, I am for it.

Thanks,
Pavan

--

EnterpriseDBhttp://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-17 Thread Tom Lane

I wrote:
> Actually, I have to take back that objection: on closer look, COPY
> validates the data only once and does so before applying its own
> backslash-escaping rules.  So there is a risk in that path too.

> It's still pretty annoying to be validating the data twice in the
> common case where no backslash reduction occurred, but I'm not sure
> I see any good way to avoid it.

Further thought here: if we put encoding verification into textin()
and related functions, could we *remove* it from COPY IN, in the common
case where client and server encodings are the same?  Currently, copy.c
forces a trip through pg_client_to_server for multibyte encodings
even when the encodings are the same, so as to perform validation.
But I'm wondering whether we'd still need that.  There's no risk of
SQL injection in COPY data.  Bogus input encoding could possibly
make for confusion about where the field boundaries are, but bad
data is bad data in any case.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] [PATCHES] xpath_array with namespaces support

2007-03-17 Thread Andrew Dunstan




Nikolay Samokhvalov wrote:

On 3/17/07, Andrew Dunstan <[EMAIL PROTECTED]> wrote:

In principle I am in favor of the patch.

Would it be better to use some more unlikely name for the dummy root
element used to process fragments than  ?

Perhaps even something in a special namespace?



I did think about it, but I didn't find any difficulties with simple
 The thing is that regardless the element name we have
corresponding shift in XPath epression -- so, there cannot be any
problem from my point of view... But maybe I don't see something and
it's better to avoid _possible_ problem. It depends on PostgreSQL code
style itself -- what is the best approach in such cases? To avoid
unknown possible difficulties or to be clear?



If you are sure that it won't cause a problem then I think it's ok to 
leave it, as long as there is a comment in the code that says why we are 
sure it's ok.


cheers

andrew

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

[HACKERS] Project suggestion: benchmark utility for PostgreSQL

2007-03-17 Thread Mickael DELOISON


Hello everybody.

I am a student and I would like to have your opinion on a project I
plan to submit for GSoC. At school, when I work with relational
databases I have problems to test tables' structure and queries
because I need to insert test data manually, which is very unpleasant.
Therefore, I suggest creating a benchmark utility for PostgreSQL.
This utility would be divided in two parts: first part is the **random
data generator** to fill in the tables with test data, and second part
is a **test module** in which we can execute a set of queries and
analyze response time and results.

The random data generator would be able to generate data of every type
(from 8.1 Numeric Types to 8.10 Arrays in official documentation).
When starting, the program connects to a database, retrieves the
tables and proposes adding random data to the user. The user uses
pattern and dictionaries to make those generators work.

The test module would consist in a sort of unit tests module. The user
adds queries and possibly sets the expected results of those queries.
The module executes the queries and gives for each one the results,
the time it took to execute, and if the results are those which were
expected.

And finally, I did not talk about that before. But I think it would be
a great idea for a complete benchmark utility... if it would give some
advices to optimize the tables' structure (columns' type, indexes,
foreign keys). Of course, those advices would be for a novice user as
expert already knows how to optimize the columns types, the joins...

For a programming language, as it would be for GSoC, it has to be
realized in three month and I believe the utility has to be
cross-platform (anyway I want it to be). So I think Java would be
good. I am very used to Java and Swing programming. What do you think
about that choice? If you feel Java is a bad choice, there is
C++/Boost/wxWidget/ (like pgAdmin). But with wxWidget, I am not sure
if a GUI works under Windows and Linux it will work under MacOS
without hacks.

Thank you for having read me until there. I am very excited about such
project. I wait for your comments.

Best regards,
Mickael Deloison

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: CREATE INDEX and HOT (was [HACKERS] Question: pg_classattributes and race conditions ?)

2007-03-17 Thread Simon Riggs

On Sat, 2007-03-17 at 23:11 +0530, Pavan Deolasee wrote:

> The algorithm would look like:
> 
> 1. Disable HOT-updates

The lock held by CREATE INDEX already prevents HOT updates. So steps 1
and 8 aren't needed.

We need to be clear that we already have a solution to CREATE INDEX
CONCURRENTLY. Do you agree that we do? Does anyone see a problem with
the posted design for that?

Hopefully it is only CREATE INDEX that we need to think about.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: CREATE INDEX and HOT (was [HACKERS] Question: pg_classattributes and race conditions ?)

2007-03-17 Thread Pavan Deolasee

Simon Riggs wrote:
>
> We need to be clear that we already have a solution to CREATE INDEX
> CONCURRENTLY. Do you agree that we do? Does anyone see a problem with
> the posted design for that?
>
> Hopefully it is only CREATE INDEX that we need to think about.
>

I agree. Lets first decide whether its only CREATE INDEX that
needs solution or its both.

Lets also decide whether we want to fix CREATE INDEX
[CONCURRENTLY] or we want to provide a seperate DLL to
CHILL the table and then build index normally. Tom
has already rejected the idea of holding exclusive lock
while chilling, but if we get around that, is rest of
the approach acceptable ?

If its going to be a seperate DLL, lets decide whether
its acceptable to run it as a seperate transaction.

Thanks,
Pavan

--

EnterpriseDBhttp://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Bison 2.1 on win32

2007-03-17 Thread Hiroshi Saito


Hi.

I was operating in a tentative way by 2.3.
Still, it is not sufficient. However, it moves.
I will think that I am glad, if it can adjust with Magnus.
http://winpg.jp/~saito/MinGW/bison-2.3_win32_src.tgz

Regards,
Hiroshi Saito

- Original Message - 
From: "Tom Lane" <[EMAIL PROTECTED]>

To: "Magnus Hagander" <[EMAIL PROTECTED]>
Cc: "PGSQL Hackers" 
Sent: Sunday, March 18, 2007 2:06 AM
Subject: Re: [HACKERS] Bison 2.1 on win32 




Magnus Hagander <[EMAIL PROTECTED]> writes:

Do you happen to have a 2.2 around so you can see what happens there? Or
does someone else have that? So I know which version to test against...


2.2 and 2.3 seem to use _MSC_VER in the same way.  I had occasion to
test both last fall, and they generate only trivially different
outputs from our grammar.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-17 Thread Andrew Dunstan




Tom Lane wrote:

I wrote:
  

Actually, I have to take back that objection: on closer look, COPY
validates the data only once and does so before applying its own
backslash-escaping rules.  So there is a risk in that path too.



  

It's still pretty annoying to be validating the data twice in the
common case where no backslash reduction occurred, but I'm not sure
I see any good way to avoid it.



Further thought here: if we put encoding verification into textin()
and related functions, could we *remove* it from COPY IN, in the common
case where client and server encodings are the same?  Currently, copy.c
forces a trip through pg_client_to_server for multibyte encodings
even when the encodings are the same, so as to perform validation.
But I'm wondering whether we'd still need that.  There's no risk of
SQL injection in COPY data.  Bogus input encoding could possibly
make for confusion about where the field boundaries are, but bad
data is bad data in any case.

regards, tom lane

  



Here are some timing tests in 1m rows of random utf8 encoded 100 char 
data. It doesn't look to me like the saving you're suggesting is worth 
the trouble.


baseline:

Time: 28228.325 ms
Time: 25987.740 ms
Time: 25950.707 ms
Time: 25756.371 ms
Time: 27589.719 ms
Time: 25774.417 ms


after adding suggested extra test to textin():


Time: 26722.376 ms
Time: 28343.226 ms
Time: 26529.364 ms
Time: 28020.140 ms
Time: 24836.853 ms
Time: 24860.530 ms


Script is:

\timing
create table xyz (x text);
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
drop table xyz;


Test platform: FC6, Athlon64.


cheers

andrew


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Project suggestion: benchmark utility for PostgreSQL

2007-03-17 Thread Florian G. Pflug


Mickael DELOISON wrote:

For a programming language, as it would be for GSoC, it has to be
realized in three month and I believe the utility has to be
cross-platform (anyway I want it to be). So I think Java would be
good. I am very used to Java and Swing programming. What do you think
about that choice? If you feel Java is a bad choice, there is
C++/Boost/wxWidget/ (like pgAdmin). But with wxWidget, I am not sure
if a GUI works under Windows and Linux it will work under MacOS
without hacks.


I just wanted to comment on the wxWidgest-under-OSX part. I works
pretty well, but of course there are some bugs in wxWidgets that
plague only a particular plattform. And since OSX seems to be one
of the lesser used one, there appear to be more bugs of this kind.

But if you grep through the pgAdmin3 sources, you'll find only
little OSX-specific code.

One nice thing about wxWidgets is that applications that use if
feel like native applications on the platforms that wxWidgets
supports. Java-GUIs often feel rather alien, at least IMHO.

greetings, Florian Pflug


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: CREATE INDEX and HOT (was [HACKERS] Question:pg_classattributes and race conditions ?)

2007-03-17 Thread Simon Riggs

On Sun, 2007-03-18 at 00:44 +0530, Pavan Deolasee wrote:
> Simon Riggs wrote:
>  >
>  > We need to be clear that we already have a solution to CREATE INDEX
>  > CONCURRENTLY. Do you agree that we do? Does anyone see a problem with
>  > the posted design for that?

If we have solved CREATE INDEX CONCURRENTLY, then I would propose that
this becomes the default option for creating an index, when the
statement is issued outside of a statement block.

That seems better than reminding everybody to run with the CONCURRENTLY
option, or advise them of different performance characteristics or
behaviour of the normal CREATE INDEX route.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] [PATCHES] Bitmapscan changes

2007-03-17 Thread Grzegorz Jaskiewicz

This is on dual ultra 2 sparc. with ultrawide 320 scsi drives. 512MB  
ram.
I had to drop size of DB, because the DB drive is 4GB (I do welecome  
bigger drives as donation, if someone asks - UWscsi 320).


here are my results. With only 4.2 patch (no maintain cluster order  
v5 patch). If the v5 patch was needed, please tell me - I am going  
rerun it with.


hope it is usefull.

Repeat 3 times to ensure repeatable results.
Timing is on.
select_with_normal_index
--
   10
(1 row)

Time: 1727891.334 ms
select_with_normal_index
--
   10
(1 row)

Time: 1325561.252 ms
select_with_normal_index
--
   10
(1 row)

Time: 1348530.100 ms
Timing is off.
And now run the same tests with clustered index
Timing is on.
select_with_clustered_index
-
  10
(1 row)

Time: 870246.856 ms
select_with_clustered_index
-
  10
(1 row)

Time: 477089.456 ms
select_with_clustered_index
-
  10
(1 row)

Time: 381880.965 ms
Timing is off.




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-17 Thread Tom Lane

Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Here are some timing tests in 1m rows of random utf8 encoded 100 char 
> data. It doesn't look to me like the saving you're suggesting is worth 
> the trouble.

Hmm ... not sure I believe your numbers.  Using a test file of 1m lines
of 100 random latin1 characters converted to utf8 (thus, about half and
half 7-bit ASCII and 2-byte utf8 characters), I get this in SQL_ASCII
encoding:

regression=# \timing
Timing is on.
regression=# create temp table test(f1 text);
CREATE TABLE
Time: 5.047 ms
regression=# copy test from '/home/tgl/zzz1m';
COPY 100
Time: 4337.089 ms

and this in UTF8 encoding:

utf8=# \timing
Timing is on.
utf8=# create temp table test(f1 text);
CREATE TABLE
Time: 5.108 ms
utf8=# copy test from '/home/tgl/zzz1m';
COPY 100
Time: 7776.583 ms

The numbers aren't super repeatable, but it sure looks to me like the
encoding check adds at least 50% to the runtime in this example; so
doing it twice seems unpleasant.

(This is CVS HEAD, compiled without assert checking, on an x86_64
Fedora Core 6 box.)

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-17 Thread Andrew Dunstan




Tom Lane wrote:

Andrew Dunstan <[EMAIL PROTECTED]> writes:
  
Here are some timing tests in 1m rows of random utf8 encoded 100 char 
data. It doesn't look to me like the saving you're suggesting is worth 
the trouble.



Hmm ... not sure I believe your numbers.  Using a test file of 1m lines
of 100 random latin1 characters converted to utf8 (thus, about half and
half 7-bit ASCII and 2-byte utf8 characters), I get this in SQL_ASCII
encoding:

regression=# \timing
Timing is on.
regression=# create temp table test(f1 text);
CREATE TABLE
Time: 5.047 ms
regression=# copy test from '/home/tgl/zzz1m';
COPY 100
Time: 4337.089 ms

and this in UTF8 encoding:

utf8=# \timing
Timing is on.
utf8=# create temp table test(f1 text);
CREATE TABLE
Time: 5.108 ms
utf8=# copy test from '/home/tgl/zzz1m';
COPY 100
Time: 7776.583 ms

The numbers aren't super repeatable, but it sure looks to me like the
encoding check adds at least 50% to the runtime in this example; so
doing it twice seems unpleasant.
 
(This is CVS HEAD, compiled without assert checking, on an x86_64

Fedora Core 6 box.)

  


Are you comparing apples with apples? The db is utf8 in both of my cases.

cheers

andrew

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] [PATCHES] Bitmapscan changes

2007-03-17 Thread Luke Lonergan

Wow, nice!

Can you tell us:
- how big is the table
- cardinality of the column
- how big is the index in each case
- how much memory on the machine
- query and explain analyze

Thanks!

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Grzegorz Jaskiewicz [mailto:[EMAIL PROTECTED]
Sent:   Saturday, March 17, 2007 05:16 PM Eastern Standard Time
To: Joshua D.Drake
Cc: Heikki Linnakangas; PostgreSQL-development Hackers
Subject:Re: [HACKERS] [PATCHES] Bitmapscan changes

This is on dual ultra 2 sparc. with ultrawide 320 scsi drives. 512MB  
ram.
I had to drop size of DB, because the DB drive is 4GB (I do welecome  
bigger drives as donation, if someone asks - UWscsi 320).

here are my results. With only 4.2 patch (no maintain cluster order  
v5 patch). If the v5 patch was needed, please tell me - I am going  
rerun it with.

hope it is usefull.

Repeat 3 times to ensure repeatable results.
Timing is on.
select_with_normal_index
--
10
(1 row)

Time: 1727891.334 ms
select_with_normal_index
--
10
(1 row)

Time: 1325561.252 ms
select_with_normal_index
--
10
(1 row)

Time: 1348530.100 ms
Timing is off.
And now run the same tests with clustered index
Timing is on.
select_with_clustered_index
-
   10
(1 row)

Time: 870246.856 ms
select_with_clustered_index
-
   10
(1 row)

Time: 477089.456 ms
select_with_clustered_index
-
   10
(1 row)

Time: 381880.965 ms
Timing is off.




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] [PATCHES] Bitmapscan changes

2007-03-17 Thread Grzegorz Jaskiewicz



On Mar 17, 2007, at 10:33 PM, Luke Lonergan wrote:


Wow, nice!

Can you tell us:
- how big is the table
- cardinality of the column
- how big is the index in each case
- how much memory on the machine
- query and explain analyze



All I changed, was the 400k to 150k
512MB of ram, as I said earlier. And it is running 64bit kernel,  
32bit user-land on linux 2.6.20


query and explain is going to run for a while, so I'll leave it - as  
it is going to be the same on other machines (much faster ones).
postgres=#  select pg_size_pretty( pg_relation_size 
( 'narrowtable_index' ) );

pg_size_pretty

321 MB
(1 row)

postgres=#  select pg_size_pretty( pg_relation_size 
( 'narrowtable2_clustered_index' ) );

pg_size_pretty

3960 kB
(1 row)

(so there's quite a difference).

Judging from noises coming out of machine, there was pretty loads of  
I/O activity. and funny enough, one CPU was stucked on 'wait' up to  
80% most of the time.


the 'cardinality', as I guess, uniqueness is the same as intended in  
original test. Like I said, only table size was changed.


select count(distinct key) from narrowtable; and select count(*) from  
narrowtable; are the same - 1500



hth.


--
Grzegorz Jaskiewicz

C/C++ freelance for hire






---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-17 Thread Andrew Dunstan




Tom Lane wrote:

Andrew Dunstan <[EMAIL PROTECTED]> writes:
  
Here are some timing tests in 1m rows of random utf8 encoded 100 char 
data. It doesn't look to me like the saving you're suggesting is worth 
the trouble.



Hmm ... not sure I believe your numbers.  Using a test file of 1m lines
of 100 random latin1 characters converted to utf8 (thus, about half and
half 7-bit ASCII and 2-byte utf8 characters), I get this in SQL_ASCII
encoding:

regression=# \timing
Timing is on.
regression=# create temp table test(f1 text);
CREATE TABLE
Time: 5.047 ms
regression=# copy test from '/home/tgl/zzz1m';
COPY 100
Time: 4337.089 ms

and this in UTF8 encoding:

utf8=# \timing
Timing is on.
utf8=# create temp table test(f1 text);
CREATE TABLE
Time: 5.108 ms
utf8=# copy test from '/home/tgl/zzz1m';
COPY 100
Time: 7776.583 ms

The numbers aren't super repeatable, but it sure looks to me like the
encoding check adds at least 50% to the runtime in this example; so
doing it twice seems unpleasant.

(This is CVS HEAD, compiled without assert checking, on an x86_64
Fedora Core 6 box.)


  


Here are some test results that are closer to yours. I used a temp table 
and had cassert off and fsync off, and tried with several encodings.


The additional load from the test isn't 50%, (I think you have added the 
cost of going from ascii to utf8 to the cost of the test to get that 
50%) but it is  nevertheless appreciable.


I agree that we should look at not testing if the client and server 
encodings are the same, so we can reduce the difference.


cheers

andrew

   Run SQL_ASCII LATIN1  UTF8


 1   4659.38 4766.07  9134.53

 2   7999.64 4003.13  6231.41

 3   4178.46 6178.89  7266.39

  Without test   44201.7 3930.84 10154.38

 5   4092.44 .52  9438.24

 6   3977.34 4197.09  8866.56

   Average   4851.49 4586.76  8515.25



 1  11993.86 12625.8 10109.89

 2   4647.16 9192.53 11251.27

  With test  3   4211.02 9903.77 10097.37

 4   9203.62 7045.06 10372.25

 5   4121.39 4138.78 10386.92

 6   3722.73 4552.09  7432.56

   Average   6316.63 7909.67  9941.71





---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

40 matches

Mail list logo