Re: [PATCHES] strpos() KMP

2007-08-10 Thread Pavel Ajtkulov
Tom Lane writes:

 hash table?

 I'd think the cost of hashing would render it impractical.  Most of the
 papers I've seen on this topic worry about getting single instructions
 out of the search loop --- a hash lookup will cost lots more than that.
 Moreover, you'd lose the guarantee of not-worse-than-linear time,
 because hash lookup can be pathologically bad if you get a lot of hash
 collisions.

Compute max_wchar and min_wchar. If d = max_wchar - min_wchar < k (for
example, k = 1000), then we use an index table (wchar -> wchar -
min_wchar). Otherwise we use a hash table. The number of collisions would
be small, because the hash table is needed for the pattern characters
only. The characters are located serially, and the hash function is
wchar % const.
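
The table selection described above can be sketched roughly as follows. This is only an illustration, not the patch under discussion: it assumes min_wchar and max_wchar are taken over the pattern's characters, it uses the simplified Horspool bad-character rule rather than full Boyer-Moore (so it has no linear worst-case guarantee), and the names `build_shift_lookup` and `horspool_find` are made up for the example. A Python dict stands in for the hash table.

```python
from typing import Callable


def build_shift_lookup(pattern: str, k: int = 1000) -> Callable[[str], int]:
    """Return a function mapping a text character to its Horspool shift."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    # Shift for every pattern character except the last one: distance from
    # its rightmost occurrence to the end of the pattern (always >= 1).
    shifts = {ord(c): m - 1 - i for i, c in enumerate(pattern[:-1])}
    codes = [ord(c) for c in pattern]
    lo, hi = min(codes), max(codes)  # min_wchar, max_wchar (assumed: of the pattern)
    if hi - lo < k:
        # Dense case: direct-indexed table, index = wchar - min_wchar.
        table = [m] * (hi - lo + 1)
        for code, shift in shifts.items():
            table[code - lo] = shift

        def lookup(ch: str) -> int:
            idx = ord(ch) - lo
            return table[idx] if 0 <= idx < len(table) else m

        return lookup
    # Sparse case: hash table keyed by code point; absent characters shift by m.
    return lambda ch: shifts.get(ord(ch), m)


def horspool_find(text: str, pattern: str, k: int = 1000) -> int:
    """Return the 0-based index of the first match, or -1 if there is none."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    shift = build_shift_lookup(pattern, k)
    pos = 0
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift by the entry for the text character under the pattern's last position.
        pos += shift(text[pos + m - 1])
    return -1
```

Either lookup costs more per character than a plain array index, and the dict branch only gives expected, not guaranteed, constant time per lookup.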

 The main difficulty with BM is verifying and understanding the good
 suffix shift (the second part of BM) (I don't understand it entirely).

 Yeah, there seem to be a bunch of variants of BM (many of them not
 guaranteed linear, which I'm sure we don't want) and the earliest
 papers had bugs.  But a good implementation would be a lot easier
 sell because it would show benefits for a much wider set of use-cases
 than KMP.

Is there a requirement for some string matching algorithms/data
structures (suffix array/tree) in PG? Or: "We've
had no complaints about the speed of those functions"?



Ajtkulov Pavel
[EMAIL PROTECTED]



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Reduce the size of PageFreeSpaceInfo on 64bit platform

2007-08-10 Thread Decibel!
On Fri, Aug 10, 2007 at 10:32:35AM +0900, ITAGAKI Takahiro wrote:
 Here is a patch to reduce the size of PageFreeSpaceInfo on 64bit platform.
 With the patch, maintenance_work_mem is used twice as efficiently.
 
 The sizeof(PageFreeSpaceInfo) is 16 bytes there because the type of 'avail'
 is 'Size', which is typically 8 bytes and must be aligned on an 8-byte
 boundary. I changed the type of the field to uint32. We could store the
 free space in a uint16 at the smallest, but alignment padding throws that
 saving away.
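
The alignment arithmetic above can be checked with a small sketch. The three structures below are hypothetical stand-ins, assuming the struct is a 4-byte block number followed by the 'avail' field; they are not copied from the PostgreSQL headers, and the field name `blkno` is an assumption. `ctypes` lays structures out using the platform's native C alignment rules.

```python
import ctypes


class WideInfo(ctypes.Structure):
    # Assumed pre-patch layout: 4-byte block number + size_t-sized 'avail'.
    _fields_ = [("blkno", ctypes.c_uint32), ("avail", ctypes.c_size_t)]


class CompactInfo(ctypes.Structure):
    # Assumed patched layout: 'avail' narrowed to uint32.
    _fields_ = [("blkno", ctypes.c_uint32), ("avail", ctypes.c_uint32)]


class TinyInfo(ctypes.Structure):
    # 'avail' narrowed further to uint16; trailing padding is added so that
    # array elements keep the 4-byte alignment of 'blkno'.
    _fields_ = [("blkno", ctypes.c_uint32), ("avail", ctypes.c_uint16)]


for cls in (WideInfo, CompactInfo, TinyInfo):
    print(cls.__name__, ctypes.sizeof(cls))
```

Where size_t is 8 bytes, the first structure should come out at 16 bytes and the other two at 8 bytes each, which is why narrowing to uint16 gains nothing over uint32.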

So... does that mean that the comment in the config file about 6 bytes
per page is incorrect?
-- 
Decibel!, aka Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




[PATCHES] final CSVlog patch

2007-08-10 Thread Andrew Dunstan


I think this is ready to be committed now. It's been a long and tiresome 
road ;-)


Last-minute comments welcome.

cheers

andrew
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.136
diff -c -r1.136 config.sgml
*** doc/src/sgml/config.sgml	4 Aug 2007 01:26:53 -	1.136
--- doc/src/sgml/config.sgml	11 Aug 2007 02:00:58 -
***************
*** 2253,2259 ****
 <para>
  <productname>PostgreSQL</productname> supports several methods
   for logging server messages, including
!  <systemitem>stderr</systemitem> and
   <systemitem>syslog</systemitem>. On Windows, 
   <systemitem>eventlog</systemitem> is also supported. Set this
   parameter to a list of desired log destinations separated by
--- 2253,2259 ----
 <para>
  <productname>PostgreSQL</productname> supports several methods
   for logging server messages, including
!  <systemitem>stderr</systemitem>, <systemitem>csvlog</systemitem> and
   <systemitem>syslog</systemitem>. On Windows, 
   <systemitem>eventlog</systemitem> is also supported. Set this
   parameter to a list of desired log destinations separated by
***************
*** 2262,2278 ****
   This parameter can only be set in the <filename>postgresql.conf</>
   file or on the server command line.
 </para>
</listitem>
   </varlistentry>
  
!  <varlistentry id="guc-redirect-stderr" xreflabel="redirect_stderr">
!   <term><varname>redirect_stderr</varname> (<type>boolean</type>)</term>
<indexterm>
!<primary><varname>redirect_stderr</> configuration parameter</primary>
</indexterm>
<listitem>
 <para>
!  This parameter allows messages sent to <application>stderr</> to be
   captured and redirected into log files.
   This method, in combination with logging to <application>stderr</>,
   is often more useful than
--- 2262,2285 ----
   This parameter can only be set in the <filename>postgresql.conf</>
   file or on the server command line.
 </para>
+<para> If <varname>log_destination</> is set to <systemitem>csvlog</systemitem>, 
+  the log is output as comma separated values. The format is:
+  timestamp with milliseconds, username, database name, session id, host:port number,
+  process id, per process line number, command tag, session start time, transaction id, 
+  error severity, SQL state code, statement/error message. 
+</para>
</listitem>
   </varlistentry>
  
!  <varlistentry id="guc-start-log-collector" xreflabel="start_log_collector">
!   <term><varname>start_log_collector</varname> (<type>boolean</type>)</term>
<indexterm>
!<primary><varname>start_log_collector</> configuration parameter</primary>
</indexterm>
<listitem>
 <para>
!  This parameter allows messages sent to <application>stderr</>,
! 		 and CSV logs, to be
   captured and redirected into log files.
   This method, in combination with logging to <application>stderr</>,
   is often more useful than
***************
*** 2280,2285 ****
--- 2287,2293 ----
   might not appear in <application>syslog</> output (a common example
   is dynamic-linker failure messages).
   This parameter can only be set at server start.
+ 		 It is required to be on if CSV logs are to be generated.
 </para>
</listitem>
   </varlistentry>
***************
*** 2291,2298 ****
</indexterm>
<listitem>
 <para>
! When <varname>redirect_stderr</> is enabled, this parameter
! determines the directory in which log files will be created.
  It can be specified as an absolute path, or relative to the
  cluster data directory.
  This parameter can only be set in the <filename>postgresql.conf</>
--- 2299,2306 ----
</indexterm>
<listitem>
 <para>
! When <varname>start_log_collector</> is enabled, 
! this parameter determines the directory in which log files will be created.
  It can be specified as an absolute path, or relative to the
  cluster data directory.
  This parameter can only be set in the <filename>postgresql.conf</>
***************
*** 2308,2315 ****
</indexterm>
<listitem>
 <para>
! When <varname>redirect_stderr</varname> is enabled, this parameter
! sets the file names of the created log files.  The value
  is treated as a <systemitem>strftime</systemitem> pattern,
  so <literal>%</literal>-escapes can be used to specify time-varying
  file names.  (Note that if there are
--- 2316,2323 ----
</indexterm>
<listitem>
 <para>
! When <varname>start_log_collector</varname> is enabled,
! this parameter sets the file names of the created log files.  The value
  is treated as a 

Re: [PATCHES] strpos() KMP

2007-08-10 Thread Tom Lane
Pavel Ajtkulov [EMAIL PROTECTED] writes:
 Tom Lane writes:
 Moreover, you'd lose the guarantee of not-worse-than-linear time,
 because hash lookup can be pathologically bad if you get a lot of hash
 collisions.

 Compute max_wchar and min_wchar. If d = max_wchar - min_wchar < k (for
 example, k = 1000), then we use an index table (wchar -> wchar -
 min_wchar). Otherwise we use a hash table. The number of collisions would
 be small, because the hash table is needed for the pattern characters
 only.

I think you missed my point: there's a significant difference between
guaranteed good performance and probabilistically good performance.
Even when the probably-good algorithm wins for typical cases, there's a
strong argument to be made for guarantees.  The problem you set out to
solve really is that an algorithm that's all right in everyday cases
will suck in certain uncommon cases --- so why do you want to fix it
by just moving around the cases in which it fails to do well?

regards, tom lane
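
The difference between a guaranteed bound and a typically-good one can be made concrete by counting character comparisons. The sketch below is a textbook KMP search next to a naive scan, written only for illustration; it is not the code from the strpos() patch, and the function names are made up. KMP performs at most 2n comparisons on a text of length n regardless of input, while the naive scan degrades to roughly n*m on repetitive input.

```python
from typing import List, Tuple


def build_failure(pattern: str) -> List[int]:
    """failure[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_find(text: str, pattern: str) -> Tuple[int, int]:
    """Return (index of first match or -1, number of character comparisons)."""
    if not pattern:
        return 0, 0
    failure = build_failure(pattern)
    comparisons = 0
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while True:
            comparisons += 1
            if ch == pattern[k]:
                k += 1
                break
            if k == 0:
                break
            k = failure[k - 1]  # fall back without re-reading the text
        if k == len(pattern):
            return i - k + 1, comparisons
    return -1, comparisons


def naive_find(text: str, pattern: str) -> Tuple[int, int]:
    """Same contract as kmp_find, restarting the comparison at every offset."""
    comparisons = 0
    n, m = len(text), len(pattern)
    for pos in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return pos, comparisons
    return -1, comparisons


if __name__ == "__main__":
    text = "a" * 2000
    pattern = "a" * 50 + "b"
    print("kmp  :", kmp_find(text, pattern))
    print("naive:", naive_find(text, pattern))
```

On ordinary text the two counts are usually close; the gap only opens up on inputs like the repetitive one in the usage lines.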
