Re: [HACKERS] ANSI-strict pointer aliasing rules

2006-04-27 Thread Taral
On 4/27/06, Zeugswetter Andreas DCP SD [EMAIL PROTECTED] wrote:
 Can you please explain what exactly was not working ?
 xlc has in the past shown warnings that were actually problematic code
 that gcc did not show (and the cc variant of xlc also does not show).

This has nothing to do with warnings. With xlc version 6, this code:

Value *
makeString(char *str)
{
Value  *v = makeNode(Value);

v-type = T_String;
v-val.str = str;
return v;
}

Will return objects whose type field is T_Value (650), because the
compiler reorders the assignment that makeNode makes with that of the
main function.

--
Taral [EMAIL PROTECTED]
You can't prove anything.
-- Gödel's Incompetence Theorem

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ANSI-strict pointer aliasing rules

2006-04-27 Thread Taral
On 4/27/06, Tom Lane [EMAIL PROTECTED] wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  There are other ways of achieving the same thing. Structs containing a union
  for the subclass fields for example.

 Doesn't achieve the same thing, unless you mandate that every part of
 the system use the identical massively-overloaded union struct to refer
 to every node.

If we do subclassing like this:

struct Node { ... };
struct Value { struct Node; ... };
etc.

do we still run into the alias problem?

--
Taral [EMAIL PROTECTED]
You can't prove anything.
-- Gödel's Incompetence Theorem

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] ANSI-strict pointer aliasing rules

2006-04-27 Thread Taral
On 4/27/06, Taral [EMAIL PROTECTED] wrote:
 If we do subclassing like this:

 struct Node { ... };
 struct Value { struct Node; ... };
 etc.

 do we still run into the alias problem?

Nope, it appears to get rid of the alias problem completely. But it
requires anonymous structure support (C99?) to work without changing
anything other than headers.

As a bonus, if we ever change Node, we don't have to update any other
structures...

--
Taral [EMAIL PROTECTED]
You can't prove anything.
-- Gödel's Incompetence Theorem

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ANSI-strict pointer aliasing rules

2006-04-27 Thread Taral
On 27 Apr 2006 15:25:45 -0400, Greg Stark [EMAIL PROTECTED] wrote:
 It would be pretty cool to have a type-safe codebase. It just seems like too
 an awful lot of work for a mostly aesthetic improvement.

Does anyone have some benchmarks I can run? I can run tests to see if
this aliasing makes a noticeable difference or not...

--
Taral [EMAIL PROTECTED]
You can't prove anything.
-- Gödel's Incompetence Theorem

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] ANSI-strict pointer aliasing rules

2006-04-26 Thread Taral
I ran afoul of these rules the other day when compiling pgsql 8.1 on
AIX. The configure scripts are set up to look for xlc instead of
cc, and that command invokes cc with -qalias=ansi, the ANSI-strict
pointer aliasing mode.

Specifically, in this mode, the compiler assumes that accesses via
pointers of different types never alias. Unfortunately, this breaks
all of the Value construction code, because we end up with (for
example):

Node *n = palloc0fast(...);
n-tag = T_VALUE;
Value *v = (Value *) n;
v-tag = T_STRING;

And aggressive compiler optimization can reorder these two tag
assignments, resulting in the bizarre Unrecognized node type: 650
message.

The fix is one of two things:

1. Change the tag element of structures to be a Node, and access the
tag via it: Major code change, allows Node to change in the future for
instrumentation et cetera.
2. Make the makeNode macro cast to the derived structure before
assigning the tag: Minor code change, makes assumptions about derived
structures.
3. Get configure to select cc instead of xlc: No code change,
loses some performance.

--
Taral [EMAIL PROTECTED]
You can't prove anything.
-- Gödel's Incompetence Theorem

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] ANSI-strict pointer aliasing rules

2006-04-26 Thread Taral
On 4/26/06, Martijn van Oosterhout kleptog@svana.org wrote:
 Well, there are a number of fixes, it's questionable whether it's worth
 the effort. In GCC you can mark structures that are aliased to avoid
 the problem (attribute((may_alias)) iirc), but we don't do that.

There's also C99's restrict.

 4. Find the option for disabling strict alias and get configure to add
 that.

You'll still lose performance, but the option is -qalias=noansi.

--
Taral [EMAIL PROTECTED]
You can't prove anything.
-- Gödel's Incompetence Theorem

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: No index maximum? (was Re: [HACKERS] No merge sort?)

2003-03-23 Thread Taral
On Mon, Mar 17, 2003 at 11:23:47AM -0600, Taral wrote:
 Yes, that's exactly it. It's an index _scan_. It should simply be able
 to read the maximum straight from the btree.

Still doesn't work, even with rewritten query. It sort a
Limit(Sort(Index Scan)), with 1333 rows being pulled from the index.

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


Re: No index maximum? (was Re: [HACKERS] No merge sort?)

2003-03-17 Thread Taral
On Sat, Mar 15, 2003 at 09:23:28AM -0600, Bruno Wolff III wrote:
 On Fri, Mar 14, 2003 at 14:19:46 -0600,
   Taral [EMAIL PROTECTED] wrote:
  Same setup, different query:
  
  test= explain select max(time) from test where id = '1';
  NOTICE:  QUERY PLAN:
  
  Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
-  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)
  
  Since the index is (id, time), why isn't the index being used to
  retrieve the maximum value?
 
 It looks like an index scan is being done.
 
 If the index was on (time, id) instead of (id, time), then you could get
 a further speed up by rewriting the query as:
 select time from test where id = '1' order by time desc limit 1;

Yes, that's exactly it. It's an index _scan_. It should simply be able
to read the maximum straight from the btree.

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


No index maximum? (was Re: [HACKERS] No merge sort?)

2003-03-14 Thread Taral
Same setup, different query:

test= explain select max(time) from test where id = '1';
NOTICE:  QUERY PLAN:

Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
  -  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)

Since the index is (id, time), why isn't the index being used to
retrieve the maximum value?

On Thu, Mar 13, 2003 at 03:10:49PM -0600, Taral wrote:
 I have a table test that looks like this:
 
 CREATE TABLE test (
 id BIGINT,
 time INTEGER
 );
 
 There is an index:
 
 CREATE INDEX idx ON test(id, time);
 
 The table has been loaded with 2M rows, where time ranges sequentially
 from 0 to 199 and id is random values from 0 to 4.

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


Re: [HACKERS] No merge sort?

2003-03-14 Thread Taral
On Fri, Mar 14, 2003 at 10:43:30PM -0500, Tom Lane wrote:
 Sure.  That's why we have a planner that distinguishes between startup
 cost and total cost, and interpolates when a LIMIT is involved.  But
 if this mergesort idea only helps for small-limit cases, that's another
 restriction on its scope of usefulness...

I don't think so, since even in the non-limit case it avoids having to
do a full sort if the number of initial streams is finite and small (as
in the case I demonstrated), reducing time complexity to O(N).

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


[HACKERS] No merge sort?

2003-03-13 Thread Taral
I tried general, but no response. Anyone here can shed some light on the
issue? Do I need to code merge sort into postgresql?

- Forwarded message from Taral [EMAIL PROTECTED] -

From: Taral [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: Wed, 12 Mar 2003 17:54:35 -0600
Subject: [GENERAL] No merge sort?
Message-ID: [EMAIL PROTECTED]

I have a table test that looks like this:

CREATE TABLE test (
id BIGINT,
time INTEGER
);

There is an index:

CREATE INDEX idx ON test(id, time);

The table has been loaded with 2M rows, where time ranges sequentially
from 0 to 199 and id is random values from 0 to 4.

This query:

SELECT * FROM idx WHERE id IN (...) AND time  198000 AND time  199800
ORDER BY time DESC LIMIT 20;

has an EXPLAIN ANALYZE of:

Limit  (cost=3635.28..3635.28 rows=20 width=12) (actual time=22.94...22.96 rows=14 
loops=1)
  -  Sort  (cost=3635.28..3635.28 rows=23 width=12) (actual time=22.93..22.93 rows=14 
loops=1)
-  Index Scan using idx, idx, ..., idx, idx on test  (cost=0.00...3634.77 
rows=23 width=12) (actual time=1.01..22.10 rows=14 loops=1)
Total runtime: 29.12 msec

This query:

SELECT * FROM idx WHERE id IN (...) AND time  199800 ORDER BY time DESC
LIMIT 20;

has an EXPLAIN ANALYZE of:

Limit  (cost=14516.46..14516.46 rows=20 width=12) (actual time=1448..83..1448.86 
rows=20 loops=1)
  -  Sort  (cost=14516.46..14516.46 rows=2527 width=12) (actual time=1448.82..1448.83 
rows=21 loops=1)
-  Index Scan using idx, idx, ..., idx, idx on test  (cost=0.00...14373.67 
rows=2527 width=12) (actual time=0.14..1437.33 rows=2048 loops=1)
Total runtime: 1454.62 msec

Since the index will output 'time' sorted data for each 'id', why isn't
a merge sort being used here? A merge sort would reduce the execution
time back to 30 ms.

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


Re: [HACKERS] No merge sort?

2003-03-13 Thread Taral
On Thu, Mar 13, 2003 at 04:28:34PM -0500, Tom Lane wrote:
 Seems like a waste of effort to me.  I find this example less than
 compelling --- the case that could be sped up is quite narrow,
 and the potential performance gain not all that large.  (A sort
 is a sort however you slice it, with O(N log N) runtime...)

Actually, it's O(N) time. The index can produce time sorted data for
each id in linear time, and the merge sort can merge them in linear
time. Also, the existing system insists on loading _all_ candidate rows
whereas this method can benefit from the limit.

If you don't want to code it, I will. I need it for the livejournal
mysql-postgresql transition. (No, mysql doesn't do it right either.)
But a few pointers to the right places to look in the code would be
helpful.

 Also, the direction we'd likely be going in in future is to merge
 multiple indexscans using bitmap techniques, so that the output
 ordering of the scans couldn't be counted on anyway.

I don't understand this. What do these bitmap techniques do?

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


Re: [HACKERS] No merge sort?

2003-03-13 Thread Taral
On Thu, Mar 13, 2003 at 10:30:27PM -0500, Tom Lane wrote:
 The idea is you look at the index to make a list of main-table tuple
 positions you are interested in, which you represent compactly as a
 compressed bitmap.  (There is some finagling needed because PG actually
 uses block/line number rather than a pure tuple number to identify
 tuples, but you can fake it with a reasonably small amount of overhead.)
 Then you can combine multiple index inputs by ANDing or ORing bitmaps
 (the OR case applies to your example).  Finally, you traverse the heap,
 accessing the desired rows in heap-location order.  This loses in terms
 of producing presorted output --- but it often saves enough in I/O costs
 to more than justify doing the sort in memory.

And it loses bigtime in the case of LIMIT. If the unlimited query
returns 4,000 records and I only want 20, you're retrieving 200x too
much data from disk.

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature