Re: [HACKERS] Fix number skipping in to_number

2017-09-26 Thread Nathan Wagner
On Mon, Sep 25, 2017 at 07:52:19PM +0100, Oliver Ford wrote:

> Thanks for your review. The issue is that Oracle throws errors on many
> more input cases than Postgres does, so making it exactly like Oracle
> could break a lot of existing users. E.g. to_number ('123,000', '999')
> returns '123' on Postgres, but throws an error on Oracle. So making it
> exactly Oracle-like could break existing users who might rely on the
> current behavior.

I wouldn't use to_number for anything other than oracle compatibility,
and then just so I didn't have to wade through all the ported oracle
code.  I would use a regex or some such to clean up the number, and then
cast the result.  For an input string of '123,000' I might do a
translate('123,000', ',', '')::integer or perhaps use regexp_replace().
Either way I would want a more positive decision as to what was valid or
not, based on the input.

> My view is that we shouldn't deliberately introduce errors in order to be
> exactly like Oracle if we don't currently and there's a sane use case for
> current behavior. Do you have any examples of results that are different
> between Oracle and Postgres, and you think the Oracle result makes more
> sense?

Not really, other than I think an error report might make more sense.
'123,000' doesn't really match the format of '999'.  If anything it
seems like we're guessing rather than validating input.  It is
surprising (to me at least) that

to_char(to_number('123,000', '999'), '999')

doesn't give us the original input (in the sense that identical formats
don't preserve the original string).  So I'm not sure the current
behavior is a sane use case, but perhaps more people are using
to_number() to get *some* numeric result, rather than for wanting it to
be like oracle.  I would generally prefer to throw an exception instead
of getting a number I wasn't expecting, but I can see cases where that
might not be the case.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix number skipping in to_number

2017-09-24 Thread Nathan Wagner
On Thu, Aug 17, 2017 at 12:33:02PM +0100, Oliver Ford wrote:
 
> Ok I've made that change in the attached v3. I'm not sure as I'm on
> en_US.UTF-8 locale too. Maybe something Windows specific?

This patch applies against master (8485a25a), compiles, and
passes a make check.

I tested both on my mac laptop, and my linux server.

If we want this patch, I'd say it's ready for committer.  We may want
(and I can't believe I'm saying this) more discussion as to exactly what
the strategy for to_number() (and friends) is.  Do we want to duplicate
Oracle's functionality, or do we want a similar function to do similar
things, without necessarily having a goal of identical behavior to
oracle?

For myself, I pretty much never use the to_date, to_number, or
to_timestamp functions except when porting oracle code.  I do use the
to_char functions on occasion.  If strftime were available, I probably
wouldn't use them.

I would commit this patch and update the TODO with a goal of making
to_number as Oracle compatible as is reasonable.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for geqo tweaks

2015-11-06 Thread Nathan Wagner
On Fri, Nov 06, 2015 at 11:19:00AM -0500, Tom Lane wrote:
> Nathan Wagner <nw...@hydaspes.if.org> writes:
> > I see you committed a modified version of my patch in commit
> > 59464bd6f928ad0da30502cbe9b54baec9ca2c69.
> 
> > You changed the tour[0] to be hardcoded to 1, but it should be any
> > of the possible gene numbers from 0 to remainder.
> 
> How so?  The intent is to replace the first iteration of the
> Fisher-Yates loop, not the old loop.  That iteration will certainly
> end by assigning 1 to tour[0], because it must choose j = i = 0.

You are correct.  I got confused between reading the original code, my
patch, and your modified patch.

I wonder why the algorithm bothers with the first iteration at all, in
the case of an initialized array, it would just swap the first element
with itself.  I must be missing something.  I'll need to do some more
reading.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for geqo tweaks

2015-11-06 Thread Nathan Wagner
On Fri, Nov 06, 2015 at 11:45:38AM -0500, Tom Lane wrote:

> However, really the whole argument is moot, because I notice that
> geqo_mutation() is only called in the "#ifdef CX" code path, which
> we don't use.

I suppose someone could turn it on via a compiler define.

> So there's little point in improving it.

No, probably not.

> (There's a fair amount of dead code in /geqo/, which I've never had
> the energy to clean up, but maybe we should do that sometime.  It
> seems unlikely that anyone will ever be interested in experimenting
> with the ifdef'ed-out code paths.)

I also note that in  src/backend/optimizer/path/allpaths.c there is a
join_search_hook variable apparently intended for plugins (extensions?)
to be able to control the search path optimizer.  And the geqo code is
AFAICT turned off by default anyway, so none of the code is used in
probably the vast majority of systems, with standard_join_search() being
called instead.

Would it be worth either of removing at least the non-ERX portions of
the geqo code, or removing the geqo code entirely (presumably with a
deprecation cycle) and moving it to an extension?  If there's any
interest, I can work up a patch for either or both.

There is only one test in the regression suite that turns on geqo that I
could find.  It's labeled "check for failure to generate a plan with
multiple degenerate IN clauses" in join.sql.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for geqo tweaks

2015-11-06 Thread Nathan Wagner
On Fri, Nov 06, 2015 at 02:16:41PM -0500, Tom Lane wrote:
> Uh, what?  It's not by any means turned off by default.
> 
> postgres=# select name,setting from pg_settings where name like '%geqo%';
> name | setting 
> -+-
>  geqo| on

[snip]

My day to make a fool of myself in public I guess.  You're right of
course.  I can only plead distraction by having too many projects in
mind at once and not focusing properly.  Sorry for taking up your time
on things I should have checked better.

> I'm inclined to think that removing all the ifdefd-out-by-default logic
> would be a fine thing to do, though.

I'll work up a patch.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for geqo tweaks

2015-11-05 Thread Nathan Wagner
On Wed, Nov 04, 2015 at 12:51:52PM -0500, Tom Lane wrote:
> Nathan Wagner <nw...@hydaspes.if.org> writes:
> > I have two patches to make the geqo initialization and mutation
> > slightly better.
> 
> > The first adjusts the mutation swaps to avoid having to re-pick
> > ties.  The second changes the initialization and shuffling algorithm
> > for the gene array to use an in-place Fisher-Yates shuffling
> > algorithm.
> 
> I took a quick look at this.
> 
> I'm not very impressed with the first patch: it might save a few
> geqo_randint() calls, but it seems to do so at the price of making the
> swap choices less random --- for instance it sure looks to me like the
> last array element is now less likely to participate in swaps than
> other elements.  Unless you can prove that actually the swapping is
> still unbiased, I'm inclined to reject this part.

If I have understood the original code correctly, we need to select two
different random integers between 0 and num_gene-1, inclusive.  That
happens to be num_gene possible results.

Having chosen the first one, which I will call "swap1", we now only have
num_gene-1 possible results, which need to range from either 0 to
swap1-1 or from swap1+1 to num_gene-1, which is num_gene-1 possible
results.  I treat this as a single range from 0 to num_gene-2 and
generate a number within that range, which I will call "swap2".

If swap2 is between 0 and swap1-1, it is in the first range, and no
adjustment is necessary.  If it is greater than or equal to swap1, then
it is in the second range.  However the generated swap2 in the second
range will be between swap1 and num_gene-2, whereas we need it to be
between swap1+1 and num_gene-1, so I add one to swap2, adjusting the
range to the needed range.

It would be equivalent to set swap2 to num_gene-1 in that case,
effectively remapping the first value to the last, but an increment was
more intuitive to me.

> As for the second part, I had to look up Fisher-Yates ;-) but after
> having read Wikipedia's entry about it I think this is a good change.
> The code's shorter and more efficient, and it should mathematically
> provide an equally-unbiased initial shuffle.  It could do with a
> better comment, and I'd be inclined to handle the first element
> outside the loop rather than uselessly computing geqo_randint(0,0),

I could do that.  It will make the code slightly harder to read.  I
wonder if it would be worth having geqo_randint() handle the special
case instead.

> Having said that, though, I believe that it's also probably a
> *different* initial shuffle, which may well mean that GEQO gives
> different plans in some cases.

Yes, I would expect it to be different in the general case.  I think my
commit message noted that, but perhaps it could be more explicit.

> That doesn't bother me as long as we only make the change in HEAD, but
> does anyone want to complain?

I don't see any need to backport either of these patches.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for geqo tweaks

2015-11-05 Thread Nathan Wagner
On Wed, Nov 04, 2015 at 12:51:52PM -0500, Tom Lane wrote:

> As for the second part, I had to look up Fisher-Yates ;-) but after
> having read Wikipedia's entry about it I think this is a good change.
> The code's shorter and more efficient, and it should mathematically
> provide an equally-unbiased initial shuffle.  It could do with a
> better comment, and I'd be inclined to handle the first element
> outside the loop rather than uselessly computing geqo_randint(0,0),
> but those are trivial changes.

I see you committed a modified version of my patch in commit
59464bd6f928ad0da30502cbe9b54baec9ca2c69.

You changed the tour[0] to be hardcoded to 1, but it should be any of
the possible gene numbers from 0 to remainder.  If you want to pull the
geqo_randint(0,0) out of the loop, it would be the last element, not the
first (i.e. where remainder == 0).

We might be able to just skip the last swap, and the loop could be

for (i=0; i < num_gene-1; i++) {

but I'd need to re-read the details of the Fisher-Yates algorithm to be
sure.  It may be that the last swap needs to happen for the shuffle to
be fully random.  In any case, tour[0] certainly shouldn't be hardcoded
to 1.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] September 2015 Commitfest

2015-10-31 Thread Nathan Wagner
On Sat, Oct 31, 2015 at 12:08:58PM -0400, Tom Lane wrote:
> Nathan Wagner <nw...@hydaspes.if.org> writes:
> > Second, it would be convenient if there were a make target that would
> > set up a test environment.  Effectively do what the 'make check' does,
> > but don't run the tests and leave the database up.  It should probably
> > drop you into a shell that has the paths set up as well.  Another
> > target should be available to shut it down.
> 
> As far as that goes, I don't think it's really the makefiles' place to
> establish a manual-testing convention.  What I do, and what I think
> most other longtimers do, is create test installations in nondefault
> places.

[snip description on how to set this up ]

> You could imagine putting something into the standard makefiles
> that did some subset of this, but I think it would be too rigid
> to be useful.

I think it would be very useful to just be able to tell the system "fire
this up for me so I can test it".  I don't think it needs to handle
every possible testing scenario, just making it easier to leave up the
test postmaster from make check would be very useful, at least to me.

> As an example, what if you wanted to compare the behaviors of both
> unmodified HEAD and your patched code?  It's not very hard to set up
> two temporary installations along the lines of the recipe I've just
> given, but I can't see the makefiles handling that.

They could pick up make or environment variables.  We already do that
for psql.  Something like

PGPORT=5495 PGPATH=~/pg95 make startit

or some such.  I'm not actually proposing this, I'm just noting how the
makefiles could handle it fairly easily.  All I'd really like is a way
to leave the database used for 'make check' running so I can do any
additional poking around by hand that I might want to do more easily.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] September 2015 Commitfest

2015-10-31 Thread Nathan Wagner
On Sat, Oct 31, 2015 at 08:03:59AM +0100, Robert Haas wrote:
> +1.  FWIW, I'm willing to review some patches for this CommitFest, but
> if the committers have to do first-round review as well as
> committer-review of every patch in the CommitFest, this is going to be
> long, ugly, and painful.  We need to have a substantial pool of
> non-committers involved in the review process so that at least some of
> the work gets spread out.

As a non-committer, let me offer my thoughts.

First, I would ask that every patch include a commit id that the patch
was generated against.  For example, I was looking at the "group command
option for psql" patch.  I created a branch off of master, but the patch
doesn't apply cleanly.  On further investigation, it looks like Adam
Brightwell noted this on September 21, but the patch hasn't been
updated.  If I knew which commit id the patch was created against, I
could create a branch from there and test the patch, but without, I need
to figure that out which is more work, or I need to re-create the patch,
which is also more work.

Second, it would be convenient if there were a make target that would
set up a test environment.  Effectively do what the 'make check' does,
but don't run the tests and leave the database up.  It should probably
drop you into a shell that has the paths set up as well.  Another
target should be available to shut it down.  So, what would be cool,
and make testing easier is if I could do a 'git checkout -b feature
abcdef' (or whatever the commit id is and branch name you want to use)
Then from there a

make
make check
make testrig

make testshutdown

These two would go a long way to making the process of actually
doing a review easier.

Back to the patch in question, so Mr Brightwell noted that the patch
doesn't apply against master.  Should someone then mark the patch as
waiting on author?  Is failing to apply against master a 'waiting on
author' cause?  Is the answer different if the patch author has supplied
a commit id that the patch was generated from?

There was then some further discussion on the interface, and what to do
with startup files, and nothing was really decided, and then the thread
petered out.  This potential reviewer is then left with the conclusion
that this patch really can't be reviewed, and it's not sure if it's even
wanted as is anyway.  So I move on to something else.  There was a bunch
of discussion by a bunch of committers, and no-one updated the status of
the patch in the commitfest, and there's no clear guidelines on what the
status should be in this case.

If I were needing to decide, I would say that the patch should either be
marked as "Waiting on Author" on the grounds that the patch doesn't
currently apply, or "Returned with feedback" on the grounds that there
was unaddressed feedback on the details of the patch, and it was noted
as a "proof of concept" when submitted anyway.  But I'm unwilling to
just change it to that without more clear definitions of the meaning of
each status.  A link to definitions and when the status should be
changed would help.

What is "ready for committer" anyway?  It's clearly not "a committer
will apply the patch", since things sit in that status without being
committed.  If I think the patch is good and should be applied, do I
mark it as ready for committer, and then later a committer will also
review the patch?  If so, is doing anything other than the sanity
checks, and possibly offering an opinion, on the patch even useful?

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add EXTRA_CFLAGS to configure

2015-10-28 Thread Nathan Wagner
On Wed, Oct 28, 2015 at 02:42:19PM +0100, Robert Haas wrote:
> On Wed, Oct 28, 2015 at 2:17 PM, Andres Freund  wrote:
> >> I use COPT for this purpose.
> >
> > Unless I miss something you can't just pass that to configure though,
> > right? I.e. it has to be passed to each make invocation?
> 
> What I do is:
> 
> echo COPT=-Wall -Werror > src/Makefile.custom

Make will pick up variables from the environment.  So unless the
makefile goes out of its way to circumvent it, you can just do

COPT=-Werror
export COPT

and then run your usual configure/compile cycle.  There's no
specific need to modify the makefiles or pass extra arguments
into make.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fortnight interval support

2015-10-27 Thread Nathan Wagner
On Tue, Oct 27, 2015 at 01:52:11PM +, Nathan Wagner wrote:
> On Mon, Oct 26, 2015 at 01:58:52PM -0400, Robert Haas wrote:
> > Aw, you're no fun.  select '1 fortnight'::interval => '14 days' would be 
> > cool.
> 
> Patch attached...

This isn't necessarily bad, but I observe that it would be difficult or
impossible to add fortnight support to intervals as an extension rather
than by modifying the scanner tables that the interval parser uses at
original compile time.  You might be able to override symbols with a C
extension, but parts of the parser are static (in the C sense), so you'd
need to override and duplicate a lot of the existing functions.  Of
course, a hookable interval parser is absurd in the first place.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fortnight interval support

2015-10-27 Thread Nathan Wagner
On Tue, Oct 27, 2015 at 12:04:55PM -0500, Merlin Moncure wrote:
> On Tue, Oct 27, 2015 at 8:52 AM, Nathan Wagner <nw...@hydaspes.if.org> wrote:
> > On Mon, Oct 26, 2015 at 01:58:52PM -0400, Robert Haas wrote:
> >> Aw, you're no fun.  select '1 fortnight'::interval => '14 days' would be 
> >> cool.
> >
> > Patch attached...

> This is very cool (you are 100% certain there are no performance
> impacts on current cases, right?)!   :-)

It passed the regression test.  It must be perfect :)

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fortnight interval support

2015-10-27 Thread Nathan Wagner
On Wed, Oct 28, 2015 at 08:17:25AM +1300, Gavin Flower wrote:
> You trying to get PostgreSQL banned in France???  :-)
> 
> When I was learning French many years ago, I was told that the French
> consider their fortnight to be 15 days!!!

What, it's a "fortnight", not a "quinzaine".

You have no idea how hard it was to resist updating the patch...

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] fortnight interval support

2015-10-27 Thread Nathan Wagner
On Mon, Oct 26, 2015 at 01:58:52PM -0400, Robert Haas wrote:
> Aw, you're no fun.  select '1 fortnight'::interval => '14 days' would be cool.

Patch attached...

:)

% psql -p 5433 -d template1 -h localhost
psql (9.4.5, server 9.6devel)
WARNING: psql major version 9.4, server major version 9.6.
 Some psql features might not work.
Type "help" for help.

template1=# select current_date;
date

 2015-10-27
(1 row)

template1=# select '1 fortnight'::interval;
 interval 
--
 14 days
(1 row)

template1=# select current_date + '1 fortnight'::interval;
  ?column?   
-
 2015-11-10 00:00:00
(1 row)

template1=# select current_date + '1.3 fortnight'::interval;
  ?column?   
-
 2015-11-14 04:48:00
(1 row)

template1=# select current_date + '1.3 fortnights'::interval;
  ?column?   
-
 2015-11-14 04:48:00
(1 row)

-- 
nw
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 926358e..2032fe0 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -186,6 +186,8 @@ static const datetkn deltatktbl[] = {
{DDECADE, UNITS, DTK_DECADE},   /* "decade" relative */
{"decades", UNITS, DTK_DECADE}, /* "decades" relative */
{"decs", UNITS, DTK_DECADE},/* "decades" relative */
+   {DFORTNIGHT, UNITS, DTK_FORTNIGHT}, /* "fortnights" relative */
+   {"fortnights", UNITS, DTK_FORTNIGHT}, /* "fortnights" relative */
{"h", UNITS, DTK_HOUR}, /* "hour" relative */
{DHOUR, UNITS, DTK_HOUR},   /* "hour" relative */
{"hours", UNITS, DTK_HOUR}, /* "hours" relative */
@@ -3281,6 +3283,12 @@ DecodeInterval(char **field, int *ftype, int nf, int 
range,
tmask = DTK_M(DAY);
break;
 
+   case DTK_FORTNIGHT:
+   tm->tm_mday += val * 14;
+   AdjustFractDays(fval, tm, fsec, 
14);
+   tmask = DTK_M(WEEK);
+   break;
+
case DTK_WEEK:
tm->tm_mday += val * 7;
AdjustFractDays(fval, tm, fsec, 
7);
diff --git a/src/include/utils/datetime.h b/src/include/utils/datetime.h
index e9a1ece..3641292 100644
--- a/src/include/utils/datetime.h
+++ b/src/include/utils/datetime.h
@@ -52,6 +52,7 @@ struct tzEntry;
 #define DHOUR  "hour"
 #define DDAY   "day"
 #define DWEEK  "week"
+#define DFORTNIGHT "fortnight"
 #define DMONTH "month"
 #define DQUARTER   "quarter"
 #define DYEAR  "year"
@@ -181,6 +182,7 @@ struct tzEntry;
 #define DTK_TZ_MINUTE  35
 #define DTK_ISOYEAR36
 #define DTK_ISODOW 37
+#define DTK_FORTNIGHT  38
 
 
 /*
diff --git a/src/test/regress/expected/interval.out 
b/src/test/regress/expected/interval.out
index c873a99..7a72f2a 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -40,6 +40,12 @@ SELECT INTERVAL '1.5 weeks' AS "Ten days twelve hours";
  10 days 12:00:00
 (1 row)
 
+SELECT INTERVAL '1 fortnight' AS "Fourteen days";
+ Fourteen days 
+---
+ 14 days
+(1 row)
+
 SELECT INTERVAL '1.5 months' AS "One month 15 days";
  One month 15 days 
 ---
diff --git a/src/test/regress/sql/interval.sql 
b/src/test/regress/sql/interval.sql
index 789c3de..285266a 100644
--- a/src/test/regress/sql/interval.sql
+++ b/src/test/regress/sql/interval.sql
@@ -12,6 +12,7 @@ SELECT INTERVAL '-08:00' AS "Eight hours";
 SELECT INTERVAL '-1 +02:03' AS "22 hours ago...";
 SELECT INTERVAL '-1 days +02:03' AS "22 hours ago...";
 SELECT INTERVAL '1.5 weeks' AS "Ten days twelve hours";
+SELECT INTERVAL '1 fortnight' AS "Fourteen days";
 SELECT INTERVAL '1.5 months' AS "One month 15 days";
 SELECT INTERVAL '10 years -11 month -12 days +13:14' AS "9 years...";
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bugs and bug tracking

2015-10-21 Thread Nathan Wagner
On Tue, Oct 20, 2015 at 10:39:55AM -0700, Joshua D. Drake wrote:
> So where are we at on this?

Well, I can't speak to where we are, but my system is up, running, and
seems to work well,  It even attracts a few visitors.

I have been meaning to write a triage interface, but I have been stuck
doing postgis work for an anthropology paper I am working on for the
last week.  I hope to be able to take a break from that tomorrow and
through the weekend.  I have read the various comments, and will also do
some simple scanning for phrases in emails and commit messages
indicating status changes which I will reflect on the main web page.
Once I write that code, I will email the list with what I have done.  If
people want to use it, or criticize it, they would be welcome to do so.
If not, well, the interface is useful for me in any event, so I will
probably maintain it for the forseeable future.

https://granicus.if.org/pgbugs/ for anyone who hasn't and wants to take a
look.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bugs and bug tracking

2015-10-08 Thread Nathan Wagner
On Wed, Oct 07, 2015 at 03:06:50PM -0400, Stephen Frost wrote:
> * Nathan Wagner (nw...@hydaspes.if.org) wrote:
> > I have added full text searching to my tracker.  I only index the first
> > 50 KB of each message.  There's apparently a one MB limit on that
> > anyway, which a few messages exceed.  I figure anything important is
> > probably in the first 50KB.  I could be wrong.  I could re-index fairly
> > easily.  It seems to work pretty well.
> 
> Note that we have FTS for the -bugs, and all the other, mailing lists..

True, but that finds emails.  The search I have finds bugs (well, bug reports
anyway).  Specifically, I have the following function:

create or replace function bugvector(bugid bigint)
returns tsvector language 'sql' as $$
select tsvagg(
setweight(to_tsvector(substr(body(msg), 1, 50*1024)), 'D')
||
setweight(to_tsvector(header_value(msg, 'Subject')), 'C')
)
from emails
where bug = $1
$$ strict;

which, as you can see, collects into one tsvector all the emails associated
with that particular bug.  So a search hit is for the whole bug.  There's
probably some search artifacts here.  I suspect a bug with a long email thread
will be ranked higher than a one with a short thread.  Perhaps that's ok
though.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bugs and bug tracking

2015-10-07 Thread Nathan Wagner
I have added full text searching to my tracker.  I only index the first
50 KB of each message.  There's apparently a one MB limit on that
anyway, which a few messages exceed.  I figure anything important is
probably in the first 50KB.  I could be wrong.  I could re-index fairly
easily.  It seems to work pretty well.

I have changed the 'stale' status to 90 days.  I have triaged most of
the remaining bugs with a message count > 1.  This is in a separate
file if it some other system wants to collect the work.

I am still working out exactly how to find multiple matching commits
along different branches.  I'm running git patch-id on every commit,
but that will take a while.

Later tonight I will implement some in-message-body update syntax I'm
also going to create (internally) a web page for the bugs that allows
editing the status from that page.  I will post the details on that
once I've tested it a bit.  I'll probably make a web page allowing
updates as well.

How often are the bugs mail list archives updated?  I've got a bunch of
posts to the bugs list in my inbox that aren't in the archive download.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bugs and bug tracking

2015-10-06 Thread Nathan Wagner
On Tue, Oct 06, 2015 at 01:17:48PM -0400, Bruce Momjian wrote:

> I do think we rushed to choose a tracker a little too quickly.

Have we chosen one?

> Let me explain, from a high level, what a new tracker will change in
> our workflow.

[snip]

I won't quote your whole message, which I essentially agree with.  Let
me say that the questions I have brought up have several purposes.

One, I think it's important to identify what exactly we're after.  I
hope my questions have help put some light on that.

Two, I think any attempt to tell the developers and committers that they
need to change their workflow to adapt to some system is bound to fail,
so, I have asked, just what changed would you all be willing to actually
*do*?  Tom Lane is pretty good at noting a bug number in his commit
messages, for example.  Would he be willing to modify that slightly to
make it easier to machine parse?  Would you be willing to add a bug
number to your commit messages?  I'm not asking for guarantees.
Actually I'm not really asking for anything, I'm just trying to figure
out what the parameters of a solution might be.  If the answer to that
is "no, I'm not willing to change anything at all", that's fine, it just
colors what might be done and how much automation I or someone else
might be able to write.

I think even with a bug tracker the default "ignore" behavior can still
be done.  In principle, we could even mark bugs as "unconfirmed" or
"logged" or something right away and only mark them as new or open or
something if they actually draw a reply.  We could even require a reply
from a committer if that was wanted.

If I haven't made it clear by now, I am trying to write a system that
requires the absolute minimal amount of change to the existing way of
doing things.  As I noted in my original email, I've put together a bug
tracker, not a ticket system.  If people would like to make some small
adjustments to make it easier to automate a bug tracker, that would be
great, but if not, that's fine too, it's no *worse* than what we already
have.  And if people really wanted a ticket system, it wouldn't be hard
to enhance a tracker.

> My point is that we have our current workflow not because we are
> idiots, but because it fit our workflow and resources best.  I am not
> sure if we have succeeded because of our current non-retain mode, or
> in spite of it.  It might be time to switch to a default-retain mode,
> especially since most other projects have that mode, but we should be
> clear what we are getting into.

I thinking having a bug tracker and retention vs non-retention are
orthogonal questions.  But I agree that knowing what might be involved
is a good idea.  I think perhaps one of the problems is that different
people want different things, so it's probably going to be hard to make
everyone happy.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bugs and bug tracking

2015-10-06 Thread Nathan Wagner
On Tue, Oct 06, 2015 at 10:57:42AM -0700, Josh Berkus wrote:

> On 10/06/2015 10:17 AM, Bruce Momjian wrote:

> > Therefore, our current default behavior is to ignore user reports,
> > unless someone takes an action to reply, record, or retain the email for
> > later review.  What a tracker does is to make the default user report be
> > _retained_, meaning we have to take action to _not_ retain a user report
> > as an open item.
> 
> Well, we can determine how that's handled.  There are bug trackers out
> there that automatically archive unconfirmed bug reports after a certain
> amount of time.  I'd personally recommend it.
> 
> Of course, that requires a bug tracker which can have an "unconfirmed"
> status.

This is essentially what I have done with the 'Stale' status.  Though
I have done at two years to be conservative, rather than 60 days,
which I think is entirely more reasonable.

> > Second, we have a mix of user reports.  Some bug reports are not bugs
> > and must be reclassified.  In other cases, uses ask questions via
> > non-tracked communicate channels, e.g. pgsql-general, but they are
> > really bugs.  So, to do this right, we need a way of marking tracked
> > bugs as not bugs, and a way of adding bugs that were reported in a
> > non-tracked manner.
> 
> Yeah, I was wondering about that.

I think I have suggested that there be a way to generate a bug id via
email.  Presumably someone could just copy that email address to make a
not-tracked discussion get a bug id.  If the system archived all the
lists (not hard) it would be possible to pull the other emails from the
thread into the bug (also not hard).  As for marking as 'not-a-bug'
this can easily be done via whatever mechanism might be used.
Something along the lines of:

Bug Status: not a bug

would probably work.  It's really whatever people are willing to
actually do.

> FWIW, when I talk about bugs which we lost track of, they're not
> generally unconfirmed bug reports.  Usually, it's stuff which a
> contributor replied to, but the bug was low-impact, circumstantial, and
> hard to reproduce, and came in during a busy period (like release time).
>  So I'd be perfectly OK with the idea that unconfirmed bugs hang around
> in the system for 60 days, then automatically convert to "stale" status.

My tracker would do this trivially if I changed the query to 60 days
instead of two years.

> Until we build up a team of volunteers for bug triage, we might have to
> do that.
> 
> Speaking of which ... this project is rich in skilled users who are
> involved in the community but don't code.  Bug triage is exactly the
> kind of thing very part-time community supporters can do, if we make it
> easy for them to do.

I can make it easy.  But that doesn't directly address two of the other
points:

1: Do we need any system for who is allowed to triage bugs?
2: Should an equivalent email be sent to the list?

Specifically with point number 2, suppose the above mechanism is
used.  When a triager marks a bug as (say) not a bug, should
the system just update the database, or should it actually
send a formatted email to the bugs list with the 'Bug Status: not a bug'
line (among others, presumably)?  I think it should send the email,
but I can see how that could be construed as junk.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] bugs and bug tracking

2015-10-06 Thread Nathan Wagner
So, in order to do some clean up and see how my pgbugs page
(https://granicus.if.org/pgbugs/) might actually work I've been going
through bugs and marking their status.  A lot of questions arise.

A lot of the reports aren't bugs at all, but requests for help.  My
guess is that the users either don't know where to ask or don't
understand the difference between a bug and not knowing how to do what
they want to do.  Perhaps a more thorough explaination on the submission
form would be useful.

What is the difference between a bug and a feature request?  Ok, I know
the difference, but do we want to treat them differently.  For example,
see bug 9457 (https://granicus.if.org/pgbugs/9457).  As Pavel Stehule
noted, this isn't a *bug* per se, but what should we do with it?  Close
it as 'not a bug'?  I don't like this because it's not really the same
as the other 'not a bug's.  Create another 'closed' status of 'feature
request'?  Except that if we decide to implement the feature, in some
sense it becomes a bug until we actually implement it.  Create an 'open'
status of 'feature request' to mark it until it is either implemented or
rejected.  At least then it's tracked.  This last choice is my
preference.

I conflate open bugs in the sense of 'not closed so we still need to do
something with the bug even if it is just closing it' and open bugs in
the sense of 'this seems to actually be a bug in postgres'.  I'm not
sure what terminology I should use.

I have lots of different types of 'not a bug'.

Not a bug, the use should have posted to a different list. (e.g. 13602)
Not a bug, probably user error, which is similar to the above.
Not a bug, but a bug in the libraries we use (e.g. openssl, 10184)
Not a bug, works as intended, i.e. the user didn't make a mistake, but
had an incorrect notion of what it was supposed to do. (11596)
Not a bug, but the user never got a reply.  That is, I decided
personally that this wasn't actually a bug.  (13367)
And bug 1000 is not a bug, system test.

Do we care about the difference between any of these?  I track them
differently in my update script, but they all get the same status in the
db.

Can a bug be 'fixed' if there's no actually identifiable commit that
fixes the bug?  See 13516, which Tom Lane claims was fixed in 9.1.  A
grep for 13516 of the git log for both master and REL9_1_STABLE don't
turn up anything.

I can't as yet figure out how to match up git commit messages to
identify every branch in which a fix was applied.  I could of course
load all of the commit messages into a table and compare them that way.

Should I mark as "open" (i.e. not "new) any report that has a response?
More than one response?  That would at least distinguish between bug
reports that at least someone, in theory, has taken a look at, and those
that haven't been addressed at all.

I have created the following statuses:

Fixed - bug is presumably fixed

Unreproduceable - we can't make the system demonstrate this error

Timed Out - the reporter was asked to provide more information and
didn't respond for a while.  I would probably suggest somewhere around a
month for this.  Should there be a 'waiting on feedback' to mark the
'pre timed out' phase?

Stale   5281 - the bug hasn't had any activity for >2 years, so just
close it.  If people want to troll through these to give them a better
status, that would probably be good, but there's still a thousand open
bugs newer than that.

Not Our Bug - looks like a bug, but it's not a bug in postgres.  What
exactly are our bugs?  Just what you'd get out of the release tarballs
or the git repo?  Or is there more?

Not a Bug - see above discussion

Won't Fix - this is arguably a bug, but for some reason we're not going
to fix it.  Perhaps we intentionally violate the standard, or it's a bug
against a version we don't support, or we're not going to backpatch it.

Open - this seems to be a bug, or at least we're talking about it and
it's not where we want to close it.  Note of course that "closing" a bug
just means it will show up somewhere else in my tracker, obviously it
doesn't affect the mailing list at all.

New - this hasn't been looked at enough for someone to change the status
to something better.

I don't have a "reopened" status.  I'm not sure what it means, other
than it used to be closed, but someone changed it back to open.  I don't
immediately see why we would want to distinguish between this and a
regular open bug, other than perhaps as a way of conflating status with
priority.  It's easy to make one though if people really want it.  I
probably have too many statuses already.

I will post later on my thoughts on how to control the system.  Are
people, in principle, willing to put magic incantations in their emails
and commit messages?  I'm not asking for a commitment to my tool here,
I'm just exploring what the bounds of people's, and committer's in
particular, willingness to adjust their workflow or message texts a bit
to make it easier to automate 

Re: [HACKERS] bugs and bug tracking

2015-10-06 Thread Nathan Wagner
On Tue, Oct 06, 2015 at 12:04:11PM -0500, Jaime Casanova wrote:

> I like how this page is looking now, good work.

Thank you.

> Now, AFAIU from previous mails part of the reason to have a bug
> tracker is to make easy to know when a bug was fixed. Which should be
> interpreted as: which versions this bug affected? and which minor
> versions on those branches fix the issue
> 
> for example bug # 13636 was reported for 9.4.4 but it existed in older
> branches so Tom fixed it in all active branches (ie:
> http://www.postgresql.org/message-id/e1zfjgx-0005lu...@gemulon.postgresql.org).
> is it possible (even if not yet implemented) to add that information?

It is possible.  I don't know yet how easy it will be to automate it for
all the back patches.  I think I can look for matching commit messages
but I haven't written any code yet.  It might be possible to infer
this information after the fact by looking at where on which branches
a commit fix was applied.

> also i like that we can search on bugs but we can filter by version.
> i'm just guessing that if someone looks for a bug he's probably
> worrying about the specific version he is using.

I'll probably get to adding filtering soon-ish.  Maybe even today.  I
haven't decided if I want to do that on the server side, or add some
javascript to let the user sort and filter whatever the page has already
returned.  I'm not really a web guy, so it takes me a while to figure
out what to do.

> having a bug tracker that allow us to not lose track of bugs is good,
> having a bug tracker that allow us to have the whole information on a
> bug is better, IMHO.

I agree.  It's just a matter of somehow automating the process.  I'm
under no illusions though that I have any control over things.  I'm
hoping that one or more of the committers will say "we'd like to do it
this way" and I'll work with that.  Personally, I'd like to see either
'[Fixes #12345]' anywhere in a commit message, or 'Fixes: #12345' at the
beginning of a line.  But it has to come from them.

Also, the version numbers are user reported and a bit of a mess.  I
don't think they could really be relied on as is for users trying to
find out if a bug affects their version.  Someone would have to update
that information, and communicate that update to the tracker.  The
general concensus seems to be that magic phrases at the beginning of a
line in a message body is the way to go.  I don't necessarily agree, but
any consistent system can be made to work.  This obviously applies to
any metadata, not just version numbers.

> > A lot of the reports aren't bugs at all, but requests for help.  My
> > guess is that the users either don't know where to ask or don't
> > understand the difference between a bug and not knowing how to do what
> > they want to do.  Perhaps a more thorough explaination on the submission
> > form would be useful.
> >
> 
> the real problem here is that fill the bug report doesn't force you to
> register in a ML, while asking for help in a ML will. and a lot of
> people don't want to register in a ML, they just want a specific
> question answered so i don't think any change in the form will avoid
> that.

True.  Perhaps we could provide another form for other lists.  Probably
tilting at windmills here, but it would be nice if we could cut down
on the amount of time taken up by "this isn't a bug, you should go ask
down the hall".

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-10-05 Thread Nathan Wagner
I don't have the original message for this thread, so I arbitrarily picked a
message to reply to.

Since what has been asked for is a bug *tracker*, and we already have a bugs
mailing list, I put together something.

I downloaded the archives for pgsql-bugs, and fed them into a database.  This
part was easy, since I have already written a pg backed usenet server and had
the code hand for storing and parsing out bits of rfc 2822 messages.

It's dirt simple.  If the system sees a message with 'Bug #(\d+)' in the
subject line, it creates an entry in a bugs table with that bug number (if
needed), and then marks the message as belonging to that bug.  If there seems
to be metadata about the bug in the format of the (unquoted)

Bug reference:
Logged by:  
Email address:
PostgreSQL version:
Operating system:
Description:
Details:

it pulls that out and puts it in the bugs table.  There's also an "open"
boolean in the table, defaulting to true.

The results can be found at https://granicus.if.org/pgbugs/

Ok.  So now we have a bug tracker, but...

Some open questions that I don't think have really been addressed, with my
commentary interspersed:

1: Can a bug be more than "open" or "closed"?

I think yes.  At least we probably want to know why a bug is closed.  Is it not
a bug at all, not our bug, a duplicate submission, a duplicate of another bug,
something we won't fix for some reason (e.g. a bug against version 7)

2: Who can declare a bug closed.

Ugh.  I'm going to close some of them if it seems obvious to me that they
should be closed.  But what if it's not obvious?  I could probably maintain it
to some extent, but I don't know how much time that would actually take.

Related to the next point, it probably makes sense to just close up front
bugs that are marked against unsupported pg versions, or haven't had
any activity for too long, perhaps two years.  Just closing bugs with no
mailing list activity for two years closes 5280 of 6376 bugs.

3: How far back should I actually import data from the bugs list?

I have imported each archived month from December of 1998.  It looks like the
bug sequence was started at 1000 in December of 2003.  Emails with no bug id in
the subject line don't get associated with any bug, they're in the DB bug not
really findable.

4: What should I do with emails that don't reference a bug id but seem to be
talking about a bug?

I suggest we do nothing with them as far as the bug tracker is concerned.  If
people want to mark their message as pertaining to a bug, they can put that in
the subject line.  However, I don't think a bug id can be assigned via email,
that is, I think you have to use a web form to create a bug report with a bug
id.  Presumably that could change if whoever runs the bug counter wants it to.

5: How can we use email to update the status of a bug?

I suggest using email headers to do this.  'X-PGBug-Fixed: ' and the
like.  I assume here that everyone who might want to do such a thing uses an
MUA that would allow this, and they know how.

6: Does there need to be any security on updating the status?

Probably not.  I don't think it's the sort of thing that would attract
malicious adjustments.  If I'm wrong, I'd need to rethink this.  I realize I'm
making security an afterthought, which makes my teeth itch, but I think layers
of security would make it much less likely to be actually adopted.

Just to be clear, this is both a work in progress and a proof of concept.  It's
slow, it's ugly.  I haven't created any indexes, written any css or javascript,
or implemented any caching.  I may work on that before you read this though.

Comments are welcome, and no, I don't really expect that this will be what gets
adopted, mainly I wanted to show that we can probably just build something
rather effective off our existing infrastructure, and I had Saturday free (as
of this writing, I've got maybe 5 hours into it total, albeit with lots of
code re-use from other projects).  There are some obvious todo items,
filtering and searching being the most salient.

Some data import issues:

March 10, 2003, bad Date header, looked like junk anyway, so I deleted it.

Time zone offsets of --0400 and -0500 (at least), I took these as being -0400
(or whathaveyou).

Date header of Sat, 31 May 2008 12:12:18 +1930, judging by the name on the
email, this is probably posted from Venezuela, which switched to time one -0430
in 2007, which could also be thought of as +1930 if you ignore the implied date
change.

And, by way of some statistics, since I've got the archives in a database:

Emails: 43870
Bugs: 6376
Distinct 'From' headers: 10643

The bugs have 3.5 messages each on average, with 2 being the most common
number, and 113 at the most, for bug 12990.  1284 bugs have only one message
associated with them.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!]

2015-10-04 Thread Nathan Wagner
On Sun, Oct 04, 2015 at 04:30:49PM -0700, Josh Berkus wrote:
> That would be the key part, wouldn't it?  Nice that you have [code to
> store and parse email messages].

Yeah.  It actually made most of the work pretty easy.  It's available
with a bunch of other code at https://pd.if.org/git/ if anyone wants it.
I did find a bug in my header processing though, so I'll need to commit
that fix.

> We'd also want a way to link a bug fix to a commit, and probably a way
> to give the bug a list of searchable keywords (and add to that list).

I've been thinking of hooking it up to the fti machinery and providing
a search box.  I've never really used fti before, so this might be a
good opportunity to learn it for real.

> > it probably makes sense to just close up front bugs that are marked
> > against unsupported pg versions, or haven't had any activity for too
> > long, perhaps two years.
 
> I'm reluctant to close all of those unexamined, since part of the
> purpose of this is to find bugs which were never fixed.  Probably we
> should organize a posse to comb trhough all of the old bugs and
> hand-close them.

I'm doing some of that as I poke at the bugs pages.  Perhaps it would
make sense to have a closed status of 'stale' or the like (perhaps
that's what you meant by 'timed out') which could be used to get bugs
out of the main list but still be marked as 'not human reviewed'.  These
could then be reviewed by the posse.

> Yeah, fixing this [email's without a bug id] would probably be tied to
> the possible change to mailman.  Unless someone already has a way to
> get majordomo to append a bug ID.

How are bug ids assigned now?  From the evidence, I assume there is a
sequence in a database that the web submission form queries to format a
resulting email to the bugs mailing list.  Do the mailing lists live on
the same server?  If so, perhaps it would be easy to assign a new bug id
to a new thread on -bugs.  But perhaps that's too aggressive in creating
bugs.

> > 5: How can we use email to update the status of a bug?
> > 
> > I suggest using email headers to do this.  'X-PGBug-Fixed:
> > ' and the like.  I assume here that everyone who might
> > want to do such a thing uses an MUA that would allow this, and they
> > know how.
> 
> I guess that depends on who we expect to use this, at least for
> closing stuff.

I could certainly support more than one mechanism.  A web interface for
those who would prefer such and emails would seem to be the basic
requirements.

> > 6: Does there need to be any security on updating the status?
> > 
> > Probably not.  I don't think it's the sort of thing that would
> > attract malicious adjustments.  If I'm wrong, I'd need to rethink
> > this.  I realize I'm making security an afterthought, which makes my
> > teeth itch, but I think layers of security would make it much less
> > likely to be actually adopted.
> 
> I think there needs to be some kind of administrative access which
> allows, for example, an issue to be closed so that it can't be
> reopened.

I guess technically we have that now since I'm the only one who can
close or open a bug in the db I've set up :)

Seriously though, I think it probably makes the most sense to tie the
system in with the existing pg community accounts if it goes that far.

> Anyway, I'm not convinced we want to reinvent this particular wheel, but
> if we do, you've done a yeoman's job.

Thank you.  (Assuming I've interpreted the phrase correctly, I'm not
familiar with it, and a web search was only semi-enlightening).

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!]

2015-10-04 Thread Nathan Wagner
I don't have the original message for this thread, so I arbitrarily picked a
message to reply to.

Since what has been asked for is a bug *tracker*, and we already have a bugs
mailing list, I put together something.

I downloaded the archives for pgsql-bugs, and fed them into a database.  This
part was easy, since I have already written a pg backed usenet server and had
the code hand for storing and parsing out bits of rfc 2822 messages.

It's dirt simple.  If the system sees a message with 'Bug #(\d+)' in the
subject line, it creates an entry in a bugs table with that bug number (if
needed), and then marks the message as belonging to that bug.  If there seems
to be metadata about the bug in the format of the (unquoted)

Bug reference:
Logged by:  
Email address:
PostgreSQL version:
Operating system:
Description:
Details:

it pulls that out and puts it in the bugs table.  There's also an "open"
boolean in the table, defaulting to true.

The results can be found at https://granicus.if.org/pgbugs/

Ok.  So now we have a bug tracker, but...

Some open questions that I don't think have really been addressed, with my
commentary interspersed:

1: Can a bug be more than "open" or "closed"?

I think yes.  At least we probably want to know why a bug is closed.  Is it not
a bug at all, not our bug, a duplicate submission, a duplicate of another bug,
something we won't fix for some reason (e.g. a bug against version 7)

2: Who can declare a bug closed.

Ugh.  I'm going to close some of them if it seems obvious to me that they
should be closed.  But what if it's not obvious?  I could probably maintain it
to some extent, but I don't know how much time that would actually take.

Related to the next point, it probably makes sense to just close up front
bugs that are marked against unsupported pg versions, or haven't had
any activity for too long, perhaps two years.  Just closing bugs with no
mailing list activity for two years closes 5280 of 6376 bugs.

3: How far back should I actually import data from the bugs list?

I have imported each archived month from December of 1998.  It looks like the
bug sequence was started at 1000 in December of 2003.  Emails with no bug id in
the subject line don't get associated with any bug, they're in the DB bug not
really findable.

4: What should I do with emails that don't reference a bug id but seem to be
talking about a bug?

I suggest we do nothing with them as far as the bug tracker is concerned.  If
people want to mark their message as pertaining to a bug, they can put that in
the subject line.  However, I don't think a bug id can be assigned via email,
that is, I think you have to use a web form to create a bug report with a bug
id.  Presumably that could change if whoever runs the bug counter wants it to.

5: How can we use email to update the status of a bug?

I suggest using email headers to do this.  'X-PGBug-Fixed: ' and the
like.  I assume here that everyone who might want to do such a thing uses an
MUA that would allow this, and they know how.

6: Does there need to be any security on updating the status?

Probably not.  I don't think it's the sort of thing that would attract
malicious adjustments.  If I'm wrong, I'd need to rethink this.  I realize I'm
making security an afterthought, which makes my teeth itch, but I think layers
of security would make it much less likely to be actually adopted.

Just to be clear, this is both a work in progress and a proof of concept.  It's
slow, it's ugly.  I haven't created any indexes, written any css or javascript,
or implemented any caching.  I may work on that before you read this though.

Comments are welcome, and no, I don't really expect that this will be what gets
adopted, mainly I wanted to show that we can probably just build something
rather effective off our existing infrastructure, and I had Saturday free (as
of this writing, I've got maybe 5 hours into it total, albeit with lots of
code re-use from other projects).  There are some obvious todo items,
filtering and searching being the most salient.

Some data import issues:

March 10, 2003, bad Date header, looked like junk anyway, so I deleted it.

Time zone offsets of --0400 and -0500 (at least), I took these as being -0400
(or whathaveyou).

Date header of Sat, 31 May 2008 12:12:18 +1930, judging by the name on the
email, this is probably posted from Venezuela, which switched to time one -0430
in 2007, which could also be thought of as +1930 if you ignore the implied date
change.

And, by way of some statistics, since I've got the archives in a database:

Emails: 43870
Bugs: 6376
Distinct 'From' headers: 10643

The bugs have 3.5 messages each on average, with 2 being the most common
number, and 113 at the most, for bug 12990.  1284 bugs have only one message
associated with them.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] patch for geqo tweaks

2015-09-08 Thread Nathan Wagner
I have two patches to make the geqo initialization and mutation
slightly better.

The first adjusts the mutation swaps to avoid having to re-pick
ties.  The second changes the initialization and shuffling algorithm
for the gene array to use an in-place Fisher-Yates shuffling
algorithm.

Diffs against commit 49124613f134b04594b1a5c46368eb0a5db16d4b
(i.e. tip of master as of when I re-made the diff).

On my system the patches pass a ./configure; make; make check

-- 
nw
>From c7f3c7cc37f943481b2358149210789be3d39ee9 Mon Sep 17 00:00:00 2001
From: Nathan Wagner <n...@hydaspes.if.org>
Date: Sun, 21 Sep 2014 09:30:01 +
Subject: [PATCH 1/2] cleanup geqo_mutation.c

Avoid a possible random number collision by choosing random
ranges better.  This will potentially change the output
of the algorithm since it avoids extra calls to geqo_randint().
---
 src/backend/optimizer/geqo/geqo_mutation.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/geqo/geqo_mutation.c 
b/src/backend/optimizer/geqo/geqo_mutation.c
index 1a06d49..c78bd2c 100644
--- a/src/backend/optimizer/geqo/geqo_mutation.c
+++ b/src/backend/optimizer/geqo/geqo_mutation.c
@@ -43,20 +43,18 @@ geqo_mutation(PlannerInfo *root, Gene *tour, int num_gene)
int num_swaps = geqo_randint(root, num_gene / 3, 0);
Genetemp;
 
-
while (num_swaps > 0)
{
swap1 = geqo_randint(root, num_gene - 1, 0);
-   swap2 = geqo_randint(root, num_gene - 1, 0);
+   swap2 = geqo_randint(root, num_gene - 2, 0);
 
-   while (swap1 == swap2)
-   swap2 = geqo_randint(root, num_gene - 1, 0);
+   if (swap2 >= swap1)
+   swap2++;
 
temp = tour[swap1];
tour[swap1] = tour[swap2];
tour[swap2] = temp;
 
-
num_swaps -= 1;
}
 }
-- 
2.5.0

>From 47979bb844eda3821f08c1f00afc081ee0a3a260 Mon Sep 17 00:00:00 2001
From: Nathan Wagner <n...@hydaspes.if.org>
Date: Sun, 21 Sep 2014 13:02:48 +
Subject: [PATCH 2/2] rewrote array shuffler for geqo_recombination

This simplifies the array shuffling by doing it inplace
and eliminating a memory allocation and array initialization.

The output is identical in the sense that it still randomizes
an array, but the actual shuffled array will be different.
---
 src/backend/optimizer/geqo/geqo_recombination.c | 32 +++--
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/src/backend/optimizer/geqo/geqo_recombination.c 
b/src/backend/optimizer/geqo/geqo_recombination.c
index 652fadc..b327776 100644
--- a/src/backend/optimizer/geqo/geqo_recombination.c
+++ b/src/backend/optimizer/geqo/geqo_recombination.c
@@ -37,31 +37,17 @@
 void
 init_tour(PlannerInfo *root, Gene *tour, int num_gene)
 {
-   Gene   *tmp;
-   int remainder;
-   int next,
-   i;
+   int i, j;
 
-   /* Fill a temp array with the IDs of all not-yet-visited cities */
-   tmp = (Gene *) palloc(num_gene * sizeof(Gene));
-
-   for (i = 0; i < num_gene; i++)
-   tmp[i] = (Gene) (i + 1);
-
-   remainder = num_gene - 1;
-
-   for (i = 0; i < num_gene; i++)
-   {
-   /* choose value between 0 and remainder inclusive */
-   next = geqo_randint(root, remainder, 0);
-   /* output that element of the tmp array */
-   tour[i] = tmp[next];
-   /* and delete it */
-   tmp[next] = tmp[remainder];
-   remainder--;
+   /* shuffle tour in-place, uses Fisher-Yates inside-out algorithm */
+   for (i=0; i < num_gene; i++) {
+   j = geqo_randint(root, i, 0);
+   /* we could skip the compare and just assign */
+   if (i != j) {
+   tour[i] = tour[j];
+   }
+   tour[j] = (Gene) i+1;
}
-
-   pfree(tmp);
 }
 
 /* alloc_city_table
-- 
2.5.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ossp-uuid Contrib Patch

2012-09-12 Thread Nathan Wagner
On Mon, Sep 10, 2012 at 04:23:00PM -0700, David E. Wheeler wrote:
 Well given that OSSP seems to be abandon ware (no activity since July
 2008), it might be time to dump it in favor of something else.

Perhaps this would be a good time to bring up my uuid code again.  I've
got a module for uuid support for postgres.  Originally written for 8.4
but I've kept it updated.  It's available at

https://pd.if.org/git/uuid.git

and for anonymous git cloning

git clone https://pd.f.org/git/uuid

The install directions are perhaps a bit lacking, but after the git
checkout:

cd uuid
make
cd postgres
make
sudo make install
psql -c 'create extension pduuid'

Installing to different schemas is supported also.  The postgres/README
file describes the supported features.  Briefly, my code supports uuid
generation of version 1, 3, 4, and 5 uuids, casting to and from numeric,
bit string, and bytea values, and provides a uuid_recent() function to
get the most recent  uuid generated in that backend.  The code is
entirely in the public domain.

At one point, it compiled on Windows and MacOS, but I haven't
tested that recently.  I don't think I have changed anything
that would affect compiling on those platforms.  IIRC, the
only platform dependent code is in figuring out a local
mac address and a sub-second timestamp.

Comments or suggestions welcome.

-- 
nw


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql line number reporting from stdin

2011-11-28 Thread Nathan Wagner

On Sat, 26 Nov 2011 22:36:15 +0200, Peter Eisentraut wrote:

There is a long-standing oddity in psql that running

psql -f foo.sql

returns error messages with file name and line number, like

psql:foo.sql:1: ERROR:  syntax error at or near foo

but running

psql  foo.sql does not.  I suggest we change the latter to print

psql:stdin:1: ERROR:  syntax error at or near foo

Other examples for the use of the spelling stdin in this context
include gcc and slonik.

Error messages printed in interactive mode will not be affected, of
course.

Patch attached.


No issue with the change itself, but the docs claim
that

the variant using the shell's input redirection is
(in theory) guaranteed to yield exactly the same output
you would have received had you entered everything by hand.

--
nw

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG 9.0 and standard_conforming_strings

2010-02-03 Thread Nathan Wagner
On Wed, 03 Feb 2010 14:41:13 -0500, Tom Lane t...@sss.pgh.pa.us wrote:

 Indeed it is, which is one of the reasons to be cautious with changing
 it.  We've been telling people to move away from \' for a long time,
 but actually flipping the switch that will make their apps insecure
 is not something to do on the spur of the moment.

AFAICT the switch was added in 8.2, and mentioned in the release notes
dated 2006-12-05.  The documentation for 8.2 says The default is
currently
off, causing PostgreSQL to have its historical behavior of treating
backslashes
as escape characters. The default will change to on in a future release
to improve compatibility with the standard.

So people have had three years of warning, which I would hardly
characterize
as spur of the moment.  If you want the old behavior, change the setting
to off.

I think that a major release point is exactly the right time to do this,
doing it at a minor release number is much less reasonable.

A question for those opposed to doing it now: how exactly do you propose
to
warn people that is different than the notice that it will be changed in
a future release that has been around for the last three years?

-- 
nw

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] uuid type for postgres

2005-09-07 Thread nathan wagner
On Wed, Sep 07, 2005 at 09:45:17AM -0700, josh@agliodbs.com wrote:

 I think the issue is portability.  Remember that this type needs to work on 
 Windows as well as all POSIX platforms and AIX.

I had forgotten about windows.  I'll see to what extent the library i'm
using is portable to windows.

  It would appear that linking against libraries is just fine, I don't
  see that uuid is any different.
 
 Is it portable to all platforms we support?

I don't know yet.  It could be made to be so though.  I don't have any way
to compile on windows, so someone would have to be willing to try that.
Same with AIX.  If someone with an AIX box could download and test
it out I would appreciate a compatibility report.

 How does it generate its machine identifier?

No idea.  Does it matter?  Not having to fret this kind of detail is
the advantage of using someone else's library.

 One of the differences between an add-in and core code is support for all 
 PostgreSQL platforms.

So would including my type in contrib be possible then?  To me, that
is perhaps more important than making it a core type, though i think
there should be a core UUID type as well.

I'm testing out elog().  I'll post a new version later today.  There really
won't be any new functionality, it's just code clean up.

-- 
Nathan Wagner

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] uuid type for postgres

2005-09-07 Thread nathan wagner
On Wed, Sep 07, 2005 at 09:48:08AM -0700, josh@agliodbs.com wrote:

 That's good, it gives users options.  And I can see why you don't want to 
 re-create the functionality in PG code, it's probably pretty large.

It would also be something else that would have to be maintained and
debugged.  Which i would be starting from scratch.  The OSSP
UUID library is at version 1.3.  From this I infer that he has spent
some time maintaining and debugging it.  The less duplication of
code and effort the better.

There may be a problem with using the library as a core type though,
due to windows portability concerns.  That said, it would probably
be easier to port the underlying library than to start from scratch.

 Well, IMHO, if the unguessability of UUIDs is an important part of your spec, 
 you have some design problems.

I was thinking of cryptographic applications.  Then, to use an
example from another domain, initial TCP sequence numbers should
be random (i.e. unguessable).  A problem with TCP perhaps.  It's
been a while since i've read over my copy of _Applied Cryptography_,
but I seem to recall that unguessable numbers were sometimes useful.

-- 
Nathan Wagner

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] uuid type for postgres

2005-09-07 Thread nathan wagner
I have made a new version, and made a web page for it.

http://granicus.if.org/~nw/uuid/

Given the statement that it won't be accepted for contrib or core
unless it compiles on windows, I guess I won't really have anything
further to say on the topic.  I don't have any way to compile on windows,
so it's not an obstacle I can readily address.

Comments for improvements are welcome.  Other sorts of comments on the code are
welcome as well.  I hope someone else will find it useful.

I won't post any further update/release notices to the list, they're
not really on topic, I posted these two so that folks could look at the code
i was proposing if they wished.

-- 
Nathan Wagner

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] uuid type for postgres

2005-09-06 Thread nathan wagner
---BeginMessage---
I have been in need of a uuid type and ran across the pguuid download
by Xiongjian (Mike) Wang.  This wasn't really useful to me for two
reasons: first, it is GPLed and I would prefer a more liberal license,
secondly, it didn't compile cleanly on Mac OS 10.3, due to lack of a
SIOCGETIFHWADDR (? i think, i can get the exact name if you want it)
ioctl() under darwin.

While I could dike out the code that calls it, that seems like a suboptimal
solution.  So after a bit of poking around the interweb i ran across
Ralf Engelschall's ossp uuid library.  This compiled with minimal
effort on mac os.  Some reading, and an evening later, i've made
a server plugin with supporting SQL that implements an 'ossp_uuid'
type.

Now i have four questions:

1: Is it feasible for this to be included in the contrib section of
the regular download?  The uuid library is a notice of copyright style
license, and I am willing to put my own code into the public domain.

2: Would just calling the type 'uuid' be better than 'ossp_uuid'?  It's
certainly a nicer name.

3: Would it be possible to include such a type as a postgres extension
to the usual SQL types.  It seems to me that having an officially
supported type would be better than a user contributed type on the grounds
that you could then rely on it being avaiable if postgres was.
In particular, installing it as an extension would require the cooperation
of the DBA, which may be infeasible in some environments.

-- 
Nathan Wagner
---End Message---


Re: [HACKERS] uuid type for postgres

2005-09-06 Thread nathan wagner
On Tue, Sep 06, 2005 at 11:38:57AM -0400, [EMAIL PROTECTED] wrote:
 
 There is pretty much zero chance of being accepted into contrib, much
 less core, if the code isn't pure BSD license.

Hmm.  Here is the copyright and license portion of the readme...

  COPYRIGHT AND LICENSE

  Copyright (c) 2004-2005 Ralf S. Engelschall [EMAIL PROTECTED]
  Copyright (c) 2004-2005 The OSSP Project http://www.ossp.org/

  This file is part of OSSP uuid, a library for the generation
  of UUIDs which can found at http://www.ossp.org/pkg/lib/uuid/

  Permission to use, copy, modify, and distribute this software for
  any purpose with or without fee is hereby granted, provided that
  the above copyright notice and this permission notice appear in all
  copies.

  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
  IN NO EVENT SHALL THE AUTHORS AND COPYRIGHT HOLDERS AND THEIR
  CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
  USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
  ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
  OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  SUCH DAMAGE.

Reading the postgres license from the faq...

PostgreSQL Data Base Management System

Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group Portions
Copyright (c) 1994-1996 Regents of the University of California

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement is
hereby granted, provided that the above copyright notice and this paragraph and
the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST
PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF
THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN AS IS BASIS, AND
THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

These two licenses appear to be functionally identical.  Both require
notice of the copyright to be included, and both allow use for any purpose.
The wording of the requirement is even nearly identical.

By way of disclaimer, I am not (yet) an attorney.  

 (We used to be a bit
 lax about that, but are trying to clean things up.)  Nathan's comment
 about starting with code that was sorta-BSD-with-advertising alarmed me.

Perhaps i was a bit lax in my wording.  I don't read the license
as requiring any advertising at run time, just as a requirement that
the copyright notice be kept with the source code.  That is, an
identical requirement to the one that postgres itself uses.

-- 
Nathan Wagner

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] uuid type for postgres

2005-09-06 Thread nathan wagner
On Tue, Sep 06, 2005 at 03:57:55PM -0400, [EMAIL PROTECTED] wrote:
 I agree with Josh on the UUID type, it gets abused far too often

Out of curiosity, how does it get abused?  It doesn't seem to me
that it would be any more prone to abuse than any other type.

 and (IMHO) isn't widely enough used to belong in the core.

Seems like a self-fulfilling cycle.  There isn't one in core
or contrib (pguuid is not in contrib, IIRC), and so one doesn't
get used.  Since no one uses one, it's not in core or contrib.

 Couldn't you just fix the 
 problem in pguuid rather than write a whole new type?

Not sure which you you're addressing here, but I can't fix the license
problem.  Nor can I readily fix the lack of a mac ioctl() to get
the hardware mac address.

In any case, from an economy of effort view, I'd much rather maintain
a postgres extension/interface to an existing uuid library, than fret
the details of writing a uuid library myself.  UUIDs themselves don't
really interest me, I just happen to need one for my application.

I'm basically done.  I'll tar it up and post a link to an alpha version in a
couple of hours.  As far as i can tell, it works fine.  I'm calling it alpha
because i'm not doing proper error checking, and i'm not sure how to log or
throw an error even if i detect one.  Also, I have only compiled it on
Mac OS 10.3, I'll want to test it on my linux box.

-- 
Nathan Wagner

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] uuid type for postgres

2005-09-06 Thread nathan wagner
As promised a link to the code.

http://granicus.if.org/~nw/ossp_pg_uuid-0.1.tar.gz

You'll also need Ralf Engelschall's uuid library, which mine is a postgres
interface to.  It's available at

ftp://ftp.ossp.org/pkg/lib/uuid/uuid-1.3.0.tar.gz

It probably has a few warts.  I'm mainly posting it for suggestions,
comments, and so we have something that isn't vaporware to argue about.

:)

-- 
Nathan Wagner

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] uuid type for postgres

2005-09-06 Thread nathan wagner
On Tue, Sep 06, 2005 at 05:54:34PM -0400, [EMAIL PROTECTED] wrote:
 
 One thing that is raising my own level of concern quite a bit is the
 apparent portability issues.

I can't speak to the portability in general, but there is a PORTING file
in the ossp uuid library that states

OSSP uuid was already written with maximum portability in mind, so
  there should be no great effort required to get it running on any Unix
  platform with a reasonable POSIX API. Additionally, the portability
  was tested by successfully building and running it on the following
  particular Unix platforms (syntax is cpu-os (compiler)):

  alpha-tru644.0 (cc) alpha-tru645.1 (gcc, cc) hppa-hpux11.11 (cc)
ia64-hpux11.23 (cc) ix86-debian2.2 (gcc, icc) ix86-debian3.0 (gcc)
ix86-debian3.1 (gcc) ix86-freebsd4.9 (gcc) ix86-freebsd5.2 (gcc, icc)
ix86-netbsd1.6 (gcc) ix86-qnx6.2 (gcc) ix86-solaris10 (gcc) ix86-unixware7.1.3
(cc) mips64-irix6.5 (gcc) sparc64-solaris8 (gcc, forte) sparc64-solaris9 (gcc)

On my end I managed to compile it with nothing more than a configure,
make, followed by a make install.

 Code that isn't completely portable is a huge maintainability problem; in
 particular, stuff that requires system-dependent behavior used nowhere
 else in Postgres is a real pain. It sounds like the UUID code expects to
 be able to get at the machine's MAC address,

If the mac address isn't available, I believe it falls back on using
a random 47 bit number with the 48th bit set to make the mac address
fall within the multicast mac numberspace.  You could also use a version
4 uuid, or derive a version 3 or 5 uuid from some other source.

The better answer though, is these sort of questions are exactly why
I would prefer to rely on someone else's library.  Just as I basically
trust that the folks maintaining postgres aren't going to munge my tables
and destroy my data if i mess up a transaction and roll it back,
because they've spent time thinking about just that sort of problem, I
also (having worked with the code a bit now) trust the UUID folks
to have thought about just how do we make a unique number without
centralized coordination?  base on the mac address?  what if we
don't have one?  or don't know it for some reason?  I assume here that
the answer they came up with wasn't oh, hell, just return a 42 then.

The point being, that other people have already written a better uuid
library than i am likely to, so, license permitting, let's use it.

 The bottom line is that we're willing to listen, but it's not by any
 means a slam dunk to get this into the distribution.

Fair enough.

Personally, I think it should be a core type, but would be quite happy
if it were in contrib.  At least that way it would save the next guy
from having to hunt around the net.

I guess i'm volunteering to maintain it in contrib.  I'm not certain
if i have the requisite knowledge to maintain it in the core.  While
I could acquire the familiarity if need be, for the next year and nine
months law school is going to take up the bulk of my free time.  And
of course I'll still need time to play around with my ticketing and gis
databases i'm developing.

-- 
Nathan Wagner

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] uuid type for postgres

2005-09-06 Thread nathan wagner
On Tue, Sep 06, 2005 at 06:40:27PM -0700, josh@agliodbs.com wrote:
 
 Sure.  What's a UUID, after all?   It's three pieces of information:
 1) A server or database instance identifier
 2) A table identifier
 3) A row identifier
 
 e.g.:
 chayote.sf.agliodbs.com | public.customers | 4271
 
 Generally, it's pretty easy to just grab these 3 pieces of information 
 separately and pass them with the data when you want to transmit between 
 servers.  If there are space/overhead considerations, you can put them 
 into a reversable hash.

Take a look at the version 3 or version 5 UUIDs.  They essentially do this.
The hash isn't reversable, but rather recreatable.  Seems that if it
were reversable, it would be compression, not a hash.  Anyway.

You're assuming though that you want to leak this information.  If you do,
or you don't care, go ahead with the v3 or v5 uuid creator.

Or use v1 if you want mac/time based uuids, or v4 for random uuids.

The convenient thing about using the uuid library is that all of these
mechanisms result in the same type, and are thus compatible.

 I've never understood the complex measures which 
 application developers take to create universal IDs.

Different applications have different goals.  Unguessability might
be important in some contexts.

-- 
Nathan Wagner

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] uuid type for postgres

2005-09-06 Thread nathan wagner
On Tue, Sep 06, 2005 at 08:40:08PM -0700, [EMAIL PROTECTED] wrote:
 
 - linking against libuuid is fine for a contrib/ extension, but no  
 good for a built-in type.  A real uuid would have to do a proper  
 independent implementation of uuid creation within pgsql.

Why?

I'm not sure what the default build instructions are, but i've got
the following on my linux box...

granicus:~% ldd /usr/bin/postgres
linux-gate.so.1 =  (0x00368000)
libpam.so.0 = /lib/libpam.so.0 (0x00311000)
libssl.so.5 = /lib/libssl.so.5 (0x009b3000)
libcrypto.so.5 = /lib/libcrypto.so.5 (0x00507000)
libkrb5.so.3 = /usr/lib/libkrb5.so.3 (0x003ab000)
libz.so.1 = /usr/lib/libz.so.1 (0x00c55000)
libreadline.so.5 = /usr/lib/libreadline.so.5 (0x00c6a000)
libtermcap.so.2 = /lib/libtermcap.so.2 (0x00dfa000)
libcrypt.so.1 = /lib/libcrypt.so.1 (0x009f7000)
libresolv.so.2 = /lib/libresolv.so.2 (0x0014e000)
libnsl.so.1 = /lib/libnsl.so.1 (0x00a62000)
libdl.so.2 = /lib/libdl.so.2 (0x00c4f000)
libm.so.6 = /lib/libm.so.6 (0x00c29000)
libc.so.6 = /lib/libc.so.6 (0x00afd000)
libcom_err.so.2 = /lib/libcom_err.so.2 (0x0038e000)
libgssapi_krb5.so.2 = /usr/lib/libgssapi_krb5.so.2 (0x0099a000)
libk5crypto.so.3 = /usr/lib/libk5crypto.so.3 (0x00483000)
/lib/ld-linux.so.2 (0x00adf000)
libkrb5support.so.0 = /usr/lib/libkrb5support.so.0 (0x0041f000)

Quite a list.  I wonder what readline is doing there.

It would appear that linking against libraries is just fine, I don't
see that uuid is any different.

 - we cannot snarf libuuid code, it is LGPL (though perhaps the author  
 would re-license. if that is the *only* objection, it is well worth  
 asking)

Not sure what uuid library you were using, but the one i used
is not LGPL.  I posted earlier the copyright and license portion of
the readme.  No relicensing would be necessary.

-- 
Nathan Wagner

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings