[PATCHES] pg_next_dst_boundary optimization

2007-09-19 Thread Heikki Linnakangas
While profiling loading a table with timestamp columns with COPY, I
spotted that a lot of time is spent in pg_next_dst_boundary. It scans a
sorted array of time segments to find the one that the given timestamp
falls within. That's slow when the array is big; GB timezone for example
has  200 segments.

Attached is a patch to use binary search instead of linear search. On my
laptop, it reduces the time required to load million rows to this test
table:

CREATE TABLE timestamptable (
  id int,
  ts1 timestamp,
  ts2 timestamp,
  ts3 timestamp,
  ts4 timestamp
);

from ~16.4s to ~11.9s.

It would be nice to slip this into 8.3...

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/timezone/localtime.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/timezone/localtime.c,v
retrieving revision 1.17
diff -c -r1.17 localtime.c
*** src/timezone/localtime.c	4 Aug 2007 19:29:25 -	1.17
--- src/timezone/localtime.c	18 Sep 2007 13:41:03 -
***
*** 1008,1013 
--- 1008,1014 
  	int			i;
  	int			j;
  	const pg_time_t t = *timep;
+ 	int high, low;
  
  	sp = tz-state;
  	if (sp-timecnt == 0)
***
*** 1056,1064 
  		return 1;
  	}
  	/* Else search to find the containing segment */
! 	for (i = 1; i  sp-timecnt; ++i)
! 		if (t = sp-ats[i])
! 			break;
  	j = sp-types[i - 1];
  	ttisp = sp-ttis[j];
  	*before_gmtoff = ttisp-tt_gmtoff;
--- 1057,1074 
  		return 1;
  	}
  	/* Else search to find the containing segment */
! 	low = 1;
! 	high = sp-timecnt - 1;
! 	while (high  low)
! 	{
! 		int mid = low + ((high - low) / 2);
! 		if (t  sp-ats[mid])
! 			low = mid + 1;
! 		else
! 			high = mid;
! 	}
! 	i = high;
! 
  	j = sp-types[i - 1];
  	ttisp = sp-ttis[j];
  	*before_gmtoff = ttisp-tt_gmtoff;

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] pg_next_dst_boundary optimization

2007-09-19 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Has anyone looked what other changes there's been to the code in tz
 library?

Yeah, we need to re-sync that code --- one big item we are missing is
support for 64-bit timezone data (hence, tz info beyond 2038).  I don't
really want to touch it for 8.3 though.  There's been enough code drift
on our side (mostly from well-meaning activities like pg_indent and
ANSI-fying function headers) that a merge is going to be a bit painful,
and risk introducing some bugs.  We should hold that for a fresh
devel cycle rather than try to cram it in at the last minute.

 Which version of tz is our code based on?

Whatever was current when we imported the code (4/2004 looks like).

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match