Re: [PATCHES] patch adding new regexp functions

2007-02-17 Thread Mark Dilger

Jeremy Drake wrote:

The regexp_split function code was based on some code that a friend of
mine wrote which used PCRE rather than postgres' internal regexp support.
I don't know exactly what his use-case was, but he probably had
one because he wrote the function and had it returning SETOF text ;)
Perhaps he can share a general idea of what it was (nudge nudge)?


db=# CREATE OR REPLACE FUNCTION split(p TEXT, t TEXT) RETURNS SETOF TEXT AS $$
db$# my ($p, $t) = @_;
db$# return [ split(/$p/,$t) ];
db$# $$ LANGUAGE plperl;
CREATE FUNCTION
Time: 1.254 ms
db=# select distinct word from (select * from split('\\W+','mary had a little 
lamb, whose fleece was black as soot') as word) as ss;

  word

 a
 as
 black
 fleece
 had
 lamb
 little
 mary
 soot
 was
 whose
(11 rows)

Time: 30.517 ms



As you can see, this can easily be done with a plperl function.  Some people may 
not want to install plperl, or may not want to allow arbitrary patterns to be 
handed to perl in this fashion.  That was not my concern.  I was simply trying 
to see if I could make it faster in a C-language coded function.


In the end I dropped the project because the plperl function works fast enough 
for me and I don't have any objection to plperl from a security standpoint, etc.


mark

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PATCHES] [SQL] Interval subtracting

2006-03-03 Thread Mark Dilger

Attached is the new patch.  To summarize:

  - new function justify_interval(interval)
  - modified function justify_hours(interval)
  - modified function justify_days(interval)

These functions are defined to meet the requirements as discussed in this 
thread.  Specifically:


  - justify_hours makes certain the sign bit on the hours
matches the sign bit on the days.  It only checks the
sign bit on the days, and not the months, when
determining if the hours should be positive or negative.
After the call, -24  hours  24.

  - justify_days makes certain the sign bit on the days
matches the sign bit on the months.  It's behavior does
not depend on the hours, nor does it modify the hours.
After the call, -30  days  30.

  - justify_interval makes sure the sign bits on all three
fields months, days, and hours are all the same.  After
the call, -24  hours  24 AND -30  days  30.

'make check' passes all tests.  There are no tests for justify_interval, as it 
is new.  But the existing tests for justify_hours and justify_days appear to 
still work, even though the behavior has changed.  Apparently, their test cases 
are not sensitive to the particular changes that have occurred.


I would include new tests in the patch but do not know on which reference 
machine/platform the patches are supposed to be generated.


mark
Index: src/backend/utils/adt/timestamp.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/timestamp.c,v
retrieving revision 1.160
diff --context -r1.160 timestamp.c
*** src/backend/utils/adt/timestamp.c   22 Nov 2005 22:30:33 -  1.160
--- src/backend/utils/adt/timestamp.c   3 Mar 2006 20:23:26 -
***
*** 1975,1980 
--- 1975,2054 
  }
  
  /*
+  *  interval_justify_interval()
+  *
+  *  Adjust interval so 'month', 'day', and 'time' portions are within
+  *  customary bounds.  Specifically:
+  *
+  *0 = abs(time)  24 hours
+  *0 = abs(day)   30 days
+  *
+  *  Also, the sign bit on all three fields is made equal, so either
+  *  all three fields are negative or all are positive.
+  */
+ Datum
+ interval_justify_interval(PG_FUNCTION_ARGS)
+ {
+   Interval   *span = PG_GETARG_INTERVAL_P(0);
+   Interval   *result;
+   
+ #ifdef HAVE_INT64_TIMESTAMP
+   int64   wholeday;
+ #else
+   double  wholeday;
+ #endif
+   int32   wholemonth;
+ 
+   result = (Interval *) palloc(sizeof(Interval));
+   result-month = span-month;
+   result-day = span-day;
+   result-time = span-time;
+ 
+ #ifdef HAVE_INT64_TIMESTAMP
+   TMODULO(result-time, wholeday, USECS_PER_DAY);
+ #else
+   TMODULO(result-time, wholeday, (double) SECS_PER_DAY);
+ #endif
+   result-day += wholeday;/* could overflow... */
+ 
+   wholemonth = result-day / DAYS_PER_MONTH;
+   result-day -= wholemonth * DAYS_PER_MONTH;
+   result-month += wholemonth;
+ 
+   if (result-month  0  result-day  0)
+   {
+   result-day -= DAYS_PER_MONTH;
+   result-month++;
+   }
+   else if (result-month  0  result-day  0)
+   {
+   result-day += DAYS_PER_MONTH;
+   result-month--;
+   }
+ 
+   if (result-time  0  result-day  0)
+   {
+ #ifdef HAVE_INT64_TIMESTAMP
+   result-time += USECS_PER_DAY;
+ #else
+   result-time += (double) SECS_PER_DAY;
+ #endif
+   result-day--;
+   }
+   else if (result-time  0  result-day  0)
+   {
+ #ifdef HAVE_INT64_TIMESTAMP
+   result-time -= USECS_PER_DAY;
+ #else
+   result-time -= (double) SECS_PER_DAY;
+ #endif
+   result-day++;
+   }
+ 
+   PG_RETURN_INTERVAL_P(result);
+ }
+ 
+ /*
   *interval_justify_hours()
   *
   *Adjust interval so 'time' contains less than a whole day, adding
***
*** 2006,2011 
--- 2080,2104 
  #endif
result-day += wholeday;/* could overflow... */
  
+   if (result-time  0  result-day  0)
+   {
+ #ifdef HAVE_INT64_TIMESTAMP
+   result-time += USECS_PER_DAY;
+ #else
+   result-time += (double) SECS_PER_DAY;
+ #endif
+   result-day--;
+   }
+   else if (result-time  0  result-day  0)
+   {
+ #ifdef HAVE_INT64_TIMESTAMP
+   result-time -= USECS_PER_DAY;
+ #else
+   result-time -= (double) SECS_PER_DAY;
+ #endif
+   result-day++;
+   }
+ 
PG_RETURN_INTERVAL_P(result);
  }
  
***
*** 2031,2036 
--- 2124,2140 
result-day -= wholemonth * DAYS_PER_MONTH;
result-month += wholemonth;
  
+   if (result-month  0  result-day  0)
+   {
+   result-day -= DAYS_PER_MONTH;
+   result-month++;
+   }
+   else if (result-month  0  result-day  0)
+   {
+   result-day += 

Re: [PATCHES] [SQL] Interval subtracting

2006-03-01 Thread Mark Dilger

Mark Dilger wrote:

Mark Dilger wrote:


Tom Lane wrote:


Milen A. Radev [EMAIL PROTECTED] writes:


Milorad Poluga напи�а:

SELECT '10 years 1 mons 1 days'::interval - '9 years 10 mons 15 
days'::interval

?column?--- 3 mons -14 days
Why not '2 mons  16 days' ? 






Please read the last paragraph in section 8.5.1.4 of the manual
(http://www.postgresql.org/docs/8.1/static/datatype-datetime.html#AEN4775) 


. It mentions the functions named justify_days and justify_hours
that could do what you need.





justify_days doesn't currently do anything with this result --- it
thinks its charter is only to reduce day components that are = 30 days.
However, I think a good case could be made that it should normalize
negative days too; that is, the invariant on its result should be
0 = days  30, not merely days  30.  Similarly for justify_hours.
Comments anyone?  Patch anyone?




Sure, if nobody objects to this change I can write the patch.

mark



I've modified the code and it now behaves as follows:

  select justify_days('3 months -12 days'::interval);
justify_days
  
   2 mons 18 days

  select justify_days('3 months -33 days'::interval);
   justify_days
  ---
   1 mon 27 days

  select justify_hours('3 months -33 days -12 hours'::interval);
 justify_hours
  ---
   3 mons -34 days +12:00:00

  select justify_days(justify_hours('3 months -33 days -12 
hours'::interval));

justify_days
  
   1 mon 26 days 12:00:00

  select justify_hours('-73 hours'::interval);
 justify_hours
  ---
   -4 days +23:00:00

  select justify_days('-62 days'::interval);
 justify_days
  --
   -3 mons +28 days


I find the last two results somewhat peculiar, as the new functionality 
pushes the negative values upwards (from hours to days, days to 
months).  Changing '-73 hours' to '-3 days -1 hour' might be more 
intuitive?  The '-4 days +23 hours' is however consistent with the 
behavior in the other cases.


Thoughts?  I will package this up into a patch fairly soon.

mark


The patch is attached.  Since the functionality is being intentionally changed, 
not surprisingly the regression tests for timestamp, timestamptz and horology 
failed.  The regression.diffs are also attached.


I intended to update the docs for justify_days and justify_hours, but the docs 
don't detail the behavior at a sufficient level for any change to be warranted.


mark
Index: src/backend/utils/adt/timestamp.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/timestamp.c,v
retrieving revision 1.160
diff --context=5 -r1.160 timestamp.c
*** src/backend/utils/adt/timestamp.c   22 Nov 2005 22:30:33 -  1.160
--- src/backend/utils/adt/timestamp.c   1 Mar 2006 18:35:36 -
***
*** 2003,2013 
TMODULO(result-time, wholeday, USECS_PER_DAY);
  #else
TMODULO(result-time, wholeday, (double) SECS_PER_DAY);
  #endif
result-day += wholeday;/* could overflow... */
! 
PG_RETURN_INTERVAL_P(result);
  }
  
  /*
   *interval_justify_days()
--- 2003,2021 
TMODULO(result-time, wholeday, USECS_PER_DAY);
  #else
TMODULO(result-time, wholeday, (double) SECS_PER_DAY);
  #endif
result-day += wholeday;/* could overflow... */
!   if (result-time  0)
!   {
! #ifdef HAVE_INT64_TIMESTAMP
!   result-time += USECS_PER_DAY;
! #else
!   result-time += (double) SECS_PER_DAY;
! #endif
!   result-day--;
!   }
PG_RETURN_INTERVAL_P(result);
  }
  
  /*
   *interval_justify_days()
***
*** 2028,2037 
--- 2036,2050 
result-time = span-time;
  
wholemonth = result-day / DAYS_PER_MONTH;
result-day -= wholemonth * DAYS_PER_MONTH;
result-month += wholemonth;
+   if (result-day  0)
+   {
+   result-day += DAYS_PER_MONTH;
+   result-month--;
+   }
  
PG_RETURN_INTERVAL_P(result);
  }
  
  /* timestamp_pl_interval()
*** ./expected/timestamp.outSat Jun 25 20:04:18 2005
--- ./results/timestamp.out Wed Mar  1 10:21:00 2006
***
*** 442,448 
  SELECT '' AS 54, d1 - timestamp without time zone '1997-01-02' AS diff
 FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';
   54 |  diff  
! +
  | @ 9863 days ago
  | @ 39 days 17 hours 32 mins 1 sec
  | @ 39 days 17 hours 32 mins 1 sec
--- 442,448 
  SELECT '' AS 54, d1 - timestamp without time zone '1997-01-02' AS diff
 FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';
   54 |   diff
! +---
  | @ 9863 days ago
  | @ 39 days 17 hours 32 mins