Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-10-06 Thread Mike Rylander
On Fri, Oct 6, 2017 at 1:22 PM, Paul A Jungwirth
 wrote:
> On Fri, Jul 22, 2016 at 4:15 AM, Anton Dignös  wrote:
>> We would like to contribute to PostgreSQL a solution that supports the query
>> processing of "at each time point". The basic idea is to offer two new
>> operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
>> ranges of tuples so that subsequent queries can use the usual grouping and
>> equality conditions to get the intended results.
>
> I just wanted to chime in and say that the work these people have done
> is *amazing*. I read two of their papers yesterday [1, 2], and if you
> are interested in temporal data, I encourage you to read them too. The
> first one is only 12 pages and quite readable. After that the second
> is easy because it covers a lot of the same ground but adds "scaling"
> of values when a tuple is split, and some other interesting points.
> Their contributions could be used to implement SQL:2011 syntax but go
> way beyond that.
>

I've also been following this feature with great interest, and would
definitely throw whatever tiny weight I have, sitting out here in
the peanut gallery, behind accepting the ALIGN and NORMALIZE syntax.
I estimate that about a third of the non-trivial queries in the
primary project I work on (and have, on Postgres, for the last 13+
years) would be simpler with support for the proposed syntax, and some
of the most complex business logic would be simplified nearly to the
point of triviality.

Anyway, that's my $0.02.

Thank you, Anton and Peter!

-- Mike




Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-02-11 Thread Mike Rylander
On Thu, Feb 11, 2016 at 8:11 AM, Teodor Sigaev <teo...@sigaev.ru> wrote:
>> I have attached a new version of the patch. It fixes error of operators
>> <->> and
>> %>:
>> - operator <->> did not pass the regression test in CentOS 32 bit (gcc
>> 4.4.7
>> 20120313).
>> - operator %> did not pass the regression test in FreeBSD 32 bit (gcc
>> 4.2.1
>> 20070831).
>>
>> It was because of variable optimization by gcc.
>
>
> Fixed with volatile modifier, right?
>
> I'm close to pushing these patches, but I still have doubts about the names,
> and I'd like to see comments from English speakers:
> 1 sml_limit GUC variable (options: similarity_limit, sml_threshold)
> 2 subword_similarity(). Actually, it finds most similar word (not
> substring!) from whole string. word_similarity? word_in_string_similarity?
>

At least for this English speaker, substring_similarity is not
confusing even if it's not internally accurate, but English is a
strange language.

Because I want the bike shed to be blue, how does
query_string_similarity sound instead?  If that's overly precise, then
word_similarity would be fine with me.
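For what it's worth, usage would look something like this (a sketch
only -- the spellings are placeholders, since the names are exactly
what's under discussion, and the table is hypothetical):

  SET pg_trgm.sml_limit = 0.5;
  SELECT t FROM test_trgm WHERE 'word' %> t;             -- thresholded
  SELECT t, word_similarity('word', t) FROM test_trgm;   -- raw score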

Thanks,

--
Mike Rylander




Re: [HACKERS] old bug in full text parser

2016-02-10 Thread Mike Rylander
On Wed, Feb 10, 2016 at 4:28 AM, Oleg Bartunov <obartu...@gmail.com> wrote:
> It  looks like there is a very old bug in full text parser (somebody pointed
> me on it), which appeared after moving tsearch2 into the core.  The problem
> is in how full text parser process hyphenated words. Our original idea was
> to report hyphenated word itself as well as its parts and ignore hyphen.
> That was how tsearch2 works.
>
> This behaviour was changed after moving tsearch2 into the core:
> 1. hyphen now reported by parser, which is useless.
> 2.  Hyphenated words with numbers ('4-dot', 'dot-4')  processed differently
> than ones with plain text words like 'four-dot', no hyphenated word itself
> reported.
>
> I think we should consider this as a bug and produce fix for all supported
> versions.
>

The Evergreen project has long depended on tsearch2 (both as an
extension and in-core FTS), and one thing we've struggled with is date
range parsing such as birth and death years for authors in the form of
1979-2014, for instance.  Strings like that end up being parsed as two
lexems, "1979" and "-2014".  We work around this by pre-normalizing
strings matching /(\d+)-(\d+)/ into two numbers separated by a space
instead of a hyphen, but if fixing this bug would remove the need for
such a preprocessing step it would be a great help to us.  Would such
strings be parsed "properly" into lexems of the form of "1979" and
"2014" with you proposed change?

Thanks!

--
Mike Rylander

> After  investigation we found this commit:
>
> commit 73e6f9d3b61995525785b2f4490b465fe860196b
> Author: Tom Lane <t...@sss.pgh.pa.us>
> Date:   Sat Oct 27 19:03:45 2007 +
>
> Change text search parsing rules for hyphenated words so that digit
> strings
> containing decimal points aren't considered part of a hyphenated word.
> Sync the hyphenated-word lookahead states with the subsequent
> part-by-part
> reparsing states so that we don't get different answers about how much
> text
> is part of the hyphenated word.  Per my gripe of a few days ago.
>
>
> 8.2.23
>
> select tok_type, description, token from ts_debug('dot-four');
>   tok_type   |          description          |  token
> -------------+-------------------------------+----------
>  lhword      | Latin hyphenated word         | dot-four
>  lpart_hword | Latin part of hyphenated word | dot
>  lpart_hword | Latin part of hyphenated word | four
> (3 rows)
>
> select tok_type, description, token from ts_debug('dot-4');
>   tok_type   |          description          | token
> -------------+-------------------------------+-------
>  hword       | Hyphenated word               | dot-4
>  lpart_hword | Latin part of hyphenated word | dot
>  uint        | Unsigned integer              | 4
> (3 rows)
>
> select tok_type, description, token from ts_debug('4-dot');
>  tok_type |   description    | token
> ----------+------------------+-------
>  uint     | Unsigned integer | 4
>  lword    | Latin word       | dot
> (2 rows)
>
> 8.3.23
>
> select alias, description, token from ts_debug('dot-four');
>      alias       |           description           |  token
> -----------------+---------------------------------+----------
>  asciihword      | Hyphenated word, all ASCII      | dot-four
>  hword_asciipart | Hyphenated word part, all ASCII | dot
>  blank           | Space symbols                   | -
>  hword_asciipart | Hyphenated word part, all ASCII | four
> (4 rows)
>
> select alias, description, token from ts_debug('dot-4');
>    alias   |   description   | token
> -----------+-----------------+-------
>  asciiword | Word, all ASCII | dot
>  int       | Signed integer  | -4
> (2 rows)
>
> select alias, description, token from ts_debug('4-dot');
>    alias   |   description    | token
> -----------+------------------+-------
>  uint      | Unsigned integer | 4
>  blank     | Space symbols    | -
>  asciiword | Word, all ASCII  | dot
> (3 rows)
>
>
> Regards,
> Oleg




Re: [HACKERS] xpath changes in the recent back branches

2015-03-04 Thread Mike Rylander
On Thu, Feb 19, 2015 at 5:53 AM, Marko Tiikkaja ma...@joh.to wrote:

 Hi,

 Commit 79af9a1d2668c9edc8171f03c39e7fed571eeb98 changed xpath handling
 with regard to namespaces, and it seems to be fixing an actual issue.
 However, it was also backpatched to all branches despite it breaking for
 example code like this:

 do $$
 declare
 _x xml;
 begin
 _x := (xpath('/x:Foo/x:Bar', xml '<Foo
 xmlns="teh:urn"><Bar><Baz>1</Baz><Bat>2</Bat></Bar></Foo>',
 array[['x','teh:urn']]))[1];
 raise notice '%', xpath('/Bar/Baz/text()', _x);
 raise notice '%', xpath('/Bar/Bat/text()', _x);
 end
 $$;

 The problem is that there's no way to write the code like this in such a
 way that it would work on both versions.  If I add the namespace, it's
 broken on 9.1.14.  Without it it's broken on 9.1.15.

 I'm now thinking of adding a workaround which strips namespaces, but that
 doesn't seem to be easy to do, even with PL/Perl.  Is there a better
 workaround here that I'm not seeing?


FWIW, I've been working around the bug fixed in that commit for ages by
spelling my xpath like this:

  xpath('/*[local-name()="Bar"]/*[local-name()="Baz"]/text()', data)

I've modularized my XML handling functions so the source of 'data' is
immaterial -- maybe it's a full document, maybe it's a fragment from a
previous xpath() call -- and the referenced commit is going to make correct
XPATH much more sane, readable, and maintainable.  I, for one, welcome it
wholeheartedly.

HTH,

--Mike


Re: [HACKERS] standby registration (was: is sync rep stalled?)

2010-10-04 Thread Mike Rylander
On Mon, Oct 4, 2010 at 3:25 PM, David Christensen da...@endpoint.com wrote:

 On Oct 4, 2010, at 2:02 PM, Robert Haas wrote:

 On Mon, Oct 4, 2010 at 1:57 PM, Markus Wanner mar...@bluegap.ch wrote:
 On 10/04/2010 05:20 PM, Robert Haas wrote:
 Quorum commit, even with configurable vote weights, can't handle a
 requirement that a particular commit be replicated to (A || B) && (C
 || D).

 Good point.

 Can the proposed standby registration configuration format cover such a
 requirement?

 Well, if you can name the standbys, there's no reason there couldn't
 be a parameter that takes a string that looks pretty much like the
 above.  There are, of course, some situations that could be handled
 more elegantly by quorum commit (any 3 of 5 available standbys) but
 the above is more general and not unreasonably longwinded for
 reasonable numbers of standbys.


 Is there any benefit to be had from having standby roles instead of
 individual names?  For instance, you could integrate this into quorum commit
 to express "3 of 5 reporting standbys", "1 berlin standby" and "1 tokyo
 standby" from a group of multiple per data center, or even just utilize role
 sizes of 1 if you wanted individual standbys to be named in this fashion.
 How this role could be provided on connect of the standby is more-or-less
 tangential to the specific registration issue.


Big +1 FWIW.

-- 
Mike Rylander



Re: [HACKERS] Issue: Deprecation of the XML2 module 'xml_is_well_formed' function

2010-06-28 Thread Mike Rylander
On Mon, Jun 28, 2010 at 11:08 AM, Mike Berrow mber...@gmail.com wrote:
 We need to make extensive use of the 'xml_is_well_formed' function provided
 by the XML2 module.
 Yet the documentation says that the xml2 module will be deprecated since
 XML syntax checking and XPath queries
 is covered by the XML-related functionality based on the SQL/XML standard in
 the core server from PostgreSQL 8.3 onwards.
 However, the core function XMLPARSE does not provide equivalent
 functionality since when it detects an invalid XML document,
 it throws an error rather than returning a truth value (which is what we
 need and currently have with the 'xml_is_well_formed' function).
 For example:
 select xml_is_well_formed('<br/><br2>');
  xml_is_well_formed
 --------------------
  f
 (1 row)
 select XMLPARSE( DOCUMENT '<br/><br2>' );
 ERROR:  invalid XML document
 DETAIL:  Entity: line 1: parser error : expected '>'
 <br/><br2>
           ^
 Entity: line 1: parser error : Extra content at the end of the document
 <br/><br2>
           ^
 Is there some way to use the new, core XML functionality to simply return a
 truth value in the way that we need?

You could do something like this (untested):

CREATE OR REPLACE FUNCTION my_xml_is_valid ( x TEXT ) RETURNS BOOL AS $$
BEGIN
  -- XMLPARSE(DOCUMENT ...) raises an error on malformed input ...
  PERFORM XMLPARSE( DOCUMENT x );
  RETURN TRUE;
EXCEPTION WHEN OTHERS THEN
  -- ... which we trap and report as not-well-formed
  RETURN FALSE;
END;
$$ LANGUAGE PLPGSQL;
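
Called like so (equally untested):

SELECT my_xml_is_valid('<br/><br2>');  -- f
SELECT my_xml_is_valid('<br/>');       -- t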

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] JSON manipulation functions

2010-05-14 Thread Mike Rylander
On Fri, May 14, 2010 at 1:15 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 13, 2010 at 9:47 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:

[snip]

 == array/object conversion ==

 The json_object function converts a tuple to a JSON object.  If there
 are duplicate column names, there will be duplicate keys in the
 resulting JSON object.

 json_object([content [AS name] [, ...]]) returns json

 Seems good.

 Likewise, the json_array function converts a tuple to a JSON array.
 Column names are ignored.

 json_array([content [AS name] [, ...]]) returns json

 I think this is pointless and should be omitted.


(I'm going to avoid the use of the term "object" here to reduce confusion.)

I disagree with the assertion that it's pointless, and I have a
specific use-case in mind for this function.  I have a system that
uses JSON arrays on the wire to encapsulate data, and both ends
understand the positional semantics of the elements.  Using JSON
arrays instead of JSON objects reduces the transfer size by 40-80%,
depending on how full the rows (or class instances) are and the data
types of the elements, simply by removing the redundant object keys.
This function would be extremely useful to me when creating or
persisting raw class instances of these sorts.
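
To make that concrete, here's a hypothetical round-trip using the
proposed signatures (output formatting approximate):

  SELECT json_object(1 AS id, 'Jane' AS name, true AS active);
   -- {"id":1,"name":"Jane","active":true}

  SELECT json_array(1 AS id, 'Jane' AS name, true AS active);
   -- [1,"Jane",true]

Multiply that saving by a few million rows on the wire and it adds up.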

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Mike Rylander
On Mon, May 10, 2010 at 6:03 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Robert Haas wrote:
 On Thu, May 6, 2010 at 2:47 PM, Josh Berkus j...@agliodbs.com wrote:
 Now that I've realized what the real problem is with max_standby_delay
 (namely, that inactivity on the master can use up the delay), I think
 we should do what Tom originally suggested here.  It's not as good as
 a really working max_standby_delay, but we're not going to have that
 for 9.0, and it's clearly better than a boolean.
 I guess I'm not clear on how what Tom proposed is fundamentally
 different from max_standby_delay = -1.  If there's enough concurrent
 queries, recovery would never catch up.

 If your workload is that the standby server is getting pounded with
 queries like crazy, then it's probably not that different: it will
 fall progressively further behind.  But I suspect many people will set
 up standby servers where most of the activity happens on the primary,
 but they run some reporting queries on the standby.  If you expect
 your reporting queries to finish in 10s, you could set the max delay
 to say 60s.  In the event that something gets wedged, recovery will
 eventually kill it and move on rather than just getting stuck forever.
  If the volume of queries is known not to be too high, it's reasonable
 to expect that a few good whacks will be enough to get things back on
 track.

 Yeah, I could live with that.

 A problem with using the name max_standby_delay for Tom's suggestion
 is that it sounds like a hard limit, which it isn't. But if we name it
 something like:

 # -1 = no timeout
 # 0 = kill conflicting queries immediately
 # > 0 = wait for N seconds, then kill query
 standby_conflict_timeout = -1

 it's more clear that the setting is a timeout for each *conflict*, and
 it's less surprising that the standby can fall indefinitely behind in
 the worst case. If we name the setting along those lines, I could live
 with that.

+1 from the peanut gallery.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] Proposal: Add JSON support

2010-04-03 Thread Mike Rylander
On Sat, Apr 3, 2010 at 8:59 AM, Joseph Adams joeyadams3.14...@gmail.com wrote:
 I've been wondering whether the JSON datatype should be strict or 
 conservative.

 For one, there's strict JSON (following the exact specification).
 Then there's more conservative JSON variants.  Some JSON parsers
 support comments, some support invalid number formats (e.g. '3.' or
 '+5'), etc..

 The consensus seems to be that JSON content should be stored verbatim
 (it should store the exact string the client sent to it), as is done
 with XML.  However, this notion is somewhat incompatible with "Be
 conservative in what you do; be liberal in what you accept from
 others" because we can't accept loose JSON, then spit out conservative
 JSON without messing with the content.

 Here's my idea: the datatype should only allow strict JSON, but there
 should be a function that accepts a liberal format, cleans it up to
 make it strict JSON, and converts it to JSON.  I think making strict
 JSON the default makes the most sense because:
  * Inputs to the database will most likely be coming from programs, not 
 humans.
  * Output is expected to be valid JSON and work anywhere JSON should work.
  * Strict JSON is what more people would expect, I'd think.

+1

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Mike Rylander
On Sun, Mar 28, 2010 at 7:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 Here's another thought. Given that JSON is actually specified to consist
 of a string of Unicode characters, what will we deliver to the client
 where the client encoding is, say Latin1? Will it actually be a legal
 JSON byte stream?

 No, it won't.  We will *not* be sending anything but latin1 in such a
 situation, and I really couldn't care less what the JSON spec says about
 it.  Delivering wrongly-encoded data to a client is a good recipe for
 all sorts of problems, since the client-side code is very unlikely to be
 expecting that.  A datatype doesn't get to make up its own mind whether
 to obey those rules.  Likewise, data on input had better match
 client_encoding, because it's otherwise going to fail the encoding
 checks long before a json datatype could have any say in the matter.

 While I've not read the spec, I wonder exactly what "consist of a string
 of Unicode characters" should actually be taken to mean.  Perhaps it
 only means that all the characters must be members of the Unicode set,
 not that the string can never be represented in any other encoding.
 There's more than one Unicode encoding anyway...

In practice, every parser/serializer I've used (including the one I
helped write) allows (and, often, forces) any non-ASCII character to
be encoded as \u followed by a string of four hex digits.

Whether it would be easy inside the backend, when generating JSON from
user data stored in tables that are not in a UTF-8 encoded cluster, to
convert to UTF-8, that's something else entirely.  If it /is/ easy and
safe, then it's just a matter of scanning for multi-byte sequences and
replacing those with their \u equivalents.  I have some simple and
fast code I could share, if it's needed, though I suspect it's not.
:)
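
For the curious, the idea is no more than this (a slow-but-simple SQL
sketch, not the code mentioned above, and BMP-only -- real code would
also need to emit surrogate pairs for code points above U+FFFF):

  CREATE OR REPLACE FUNCTION json_uescape(TEXT) RETURNS TEXT AS $$
    SELECT string_agg(
             CASE WHEN ascii(substr($1, i, 1)) > 127
                  THEN '\u' || lpad(to_hex(ascii(substr($1, i, 1))), 4, '0')
                  ELSE substr($1, i, 1)
             END, '' ORDER BY i)
      FROM generate_series(1, length($1)) AS i;
  $$ LANGUAGE SQL;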

UPDATE:  Thanks, Robert, for pointing to the RFC.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Mike Rylander
On Sun, Mar 28, 2010 at 8:33 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander mrylan...@gmail.com wrote:
 In practice, every parser/serializer I've used (including the one I
 helped write) allows (and, often, forces) any non-ASCII character to
 be encoded as \u followed by a string of four hex digits.

 Is it correct to say that the only feasible place where non-ASCII
 characters can be used is within string constants?

Yes.  That includes object property strings -- they are quoted string literals.

 If so, it might be
 reasonable to disallow characters with the high-bit set unless the
 server encoding is one of the flavors of Unicode of which the spec
 approves.  I'm tempted to think that when the server encoding is
 Unicode we really ought to allow Unicode characters natively, because
 turning a long string of two-byte wide chars into a long string of
 six-byte wide chars sounds pretty evil from a performance point of
 view.


+1

As an aside, \u-encoded (escaped) characters and native multi-byte
sequences (of any RFC-allowable Unicode encoding) are exactly
equivalent in JSON -- it's a storage and transmission format, and
doesn't prescribe the application-internal representation of the data.

If it's faster (which it almost certainly is) to not mangle the data
when it's all staying server side, that seems like a useful
optimization.  For output to the client, however, it would be useful
to provide a \u-escaping function, which (AIUI) should always be safe
regardless of client encoding.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] Probably badly timed suggestion: pl/perl calling style same as C style

2010-02-24 Thread Mike Rylander
On Wed, Feb 24, 2010 at 4:35 PM, Richard Huxton d...@archonet.com wrote:
 With plperl.on_init allowing the loading of modules, might there be some
 merit (and little cost) in allowing the same style of function-mapping as
 with C functions?

 CREATE FUNCTION add_one(integer) RETURNS integer
     AS 'DIRECTORY/funcs', 'add_one'
     LANGUAGE C STRICT;

 CREATE FUNCTION add_one(integer) RETURNS integer
     AS 'My::Package', 'add_one'
     LANGUAGE plperl STRICT;


+1, fwiw

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] remove contrib/xml2

2010-01-28 Thread Mike Rylander
On Thu, Jan 28, 2010 at 4:18 PM, Andrew Dunstan and...@dunslane.net wrote:

 Robert Haas wrote:

 There has been some more discussion lately of problems caused by
 contrib/xml2.

 http://archives.postgresql.org/pgsql-bugs/2010-01/msg00251.php
 http://archives.postgresql.org/pgsql-bugs/2010-01/msg00198.php

 I think we need to either (1) fix the bugs and update the
 documentation to remove the statement that this will be removed or (2)
 actually remove it.  Nobody seems interested in #1, so PFA a patch to
 do #2.  It also rips out all the libxslt stuff, which seems to exist
 only for the purpose of supporting contrib/xml2.

 The problem is that there are people who use the XSLT and xpath_table stuff
 on text data and so don't run into these bugs.

 I agree it's a mess but I don't think just abandoning the functionality is a
 good idea.

I'm one of those people.  :)

Expecting to see contrib/xml2 go away at some point, possibly without
replacements for xslt_process and xpath_table, I've been working on
some plpgsql and plperlu work-alikes targeted at TEXT columns, as the
xml2 versions do. I hope these (attached) will be of some help to
others.  Note, these are not the exact functions I use, they are
lightly edited to remove the use of wrappers I've created to paper
over the transition from xpath_nodeset() to core XPATH().
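
For the record, the flavor of these work-alikes is roughly this
(illustrative only -- the attached versions are more complete):

  CREATE OR REPLACE FUNCTION my_xpath_string(TEXT, TEXT) RETURNS TEXT AS $$
    -- $1 is the document as TEXT (as with xml2), $2 is the XPath
    SELECT array_to_string(
             ARRAY(SELECT unnest(xpath($2, $1::XML))::TEXT), '');
  $$ LANGUAGE SQL;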

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com


xml2-replacements.sql
Description: Binary data



Re: [HACKERS] xpath not a good replacement for xpath_string

2009-07-28 Thread Mike Rylander
On Tue, Jul 28, 2009 at 3:21 PM, pg...@mohawksoft.com wrote:
 Sorry to bring this up, I know you've been fighting about XML for a while.

 Currently, I am using XML2 functionality and have tried to get the newer
 XPath function to work similarly, but can't quite seem to do it.

 I think the current xpath function is too limited. (The docs said to post
 problems to hackers if I have an issue.)

 For instance, we have a web application that uses java with an XML class
 serializer/deserializer Xstream. It creates XML that looks like this:

 <com.company.local.myclass>
    <uuid>b5212259-a91f-4dca-a547-4fe89cf2f32c</uuid>
    <email>j...@somedomain.com</email>
 </com.company.local.myclass>

 My current strategy is to use xml2 as:

 select xpath_string(datum, E'/com\.company\.local\.myclass/uuid') as uuid
 from table;

 Which produces a usable:
 b5212259-a91f-4dca-a547-4fe89cf2f32c

 I have been trying to use xpath
 select xpath(E'/com\.company\.local\.myclass/uuid', XMLPARSE(CONTENT
 datum)) as uuid from table;

 Which produces an unusable:
 {<uuid>b5212259-a91f-4dca-a547-4fe89cf2f32c</uuid>}


How about:

SELECT (xpath(E'/com\.company\.local\.myclass/uuid/text()',
XMLPARSE(CONTENT datum)))[1] as uuid from table;

Not as clean, but it produces the same result as xpath_string().
Combined with array_to_string() you can collapse the array instead
of just grabbing the first element (in cases other than uuid, of
course).
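
For example (building on the query above):

SELECT array_to_string(
         xpath(E'/com\.company\.local\.myclass/uuid/text()',
               XMLPARSE(CONTENT datum)), ' ') AS uuid
  FROM table;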

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] xpath not a good replacement for xpath_string

2009-07-28 Thread Mike Rylander
On Tuesday, July 28, 2009,  pg...@mohawksoft.com wrote:
 Andrew Dunstan and...@dunslane.net wrote:

 in fact the desired functionality is present [...] You just need to
 use the text() function to get the contents of the node, and an
 array subscript to pull it out of the result array.

 I just took a quick look, and that didn't jump out at me from the
 documentation.  Perhaps there should be an example or two of how to
 get the equivalent functionality through the newer standard API, for
 those looking to migrate?

 Would it make sense to supply convenience SQL functions which map
 some of the old API to the new?

 The thing that perplexed me was that it was not obvious from the docs how,
 exactly, to get the functionality that was simple and straight forward in
 XML2.

 Another thing that is troubling is that more exotic types do not seem to
 be supported at all. For instance, in my example I used uuid, and if one
 substitutes uuid() for text() that doesn't work.


text() is an XPath function used to extract the text content of a
node, in this case the uuid element.

 The API is less intuitive than the previous incarnation and is, indeed,
 more difficult to use.

It may be easier to use for those not familiar with more advanced
XPath, but it also has non-standard default actions.  That being said,
I'd love to see wrapper functions that provide the older api but
leverage the core code.

--miker


-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Fwd: [HACKERS] tsvector extraction patch

2009-07-08 Thread Mike Rylander
Sorry, forgot to reply-all.


-- Forwarded message --
From: Mike Rylander mrylan...@gmail.com
Date: Wed, Jul 8, 2009 at 4:17 PM
Subject: Re: [HACKERS] tsvector extraction patch
To: Alvaro Herrera alvhe...@commandprompt.com


On Wed, Jul 8, 2009 at 3:38 PM, Alvaro
Herreraalvhe...@commandprompt.com wrote:
 Mike Rylander wrote:
 On Fri, Jul 3, 2009 at 3:49 AM, Hans-Juergen Schoenig --
 PostgreSQLpostg...@cybertec.at wrote:

  test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure
  this is a good patch'));
   lex   | rank
 --------+------
  good   |    8
  patch  |    9
  pretti |    3
  sure   |    4
  (4 rows)
 

 This looks very useful!  I wonder if providing a weight column would
 be relatively simple?  I think this would present problems with the
 cast-to-text[] idea that Peter suggests, though.

 Where would the weight come from?


From a tsvector column that has weights set via setweight().
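
For example, with the weights visible in the output:

  SELECT setweight(to_tsvector('english', 'a good patch'), 'A');
   -- 'good':2A 'patch':3A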

--
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] tsvector extraction patch

2009-07-06 Thread Mike Rylander
On Fri, Jul 3, 2009 at 3:49 AM, Hans-Juergen Schoenig --
PostgreSQLpostg...@cybertec.at wrote:
 hello,

 this patch has not made it through yesterday, so i am trying to send it
 again.
 i made a small patch which i found useful for my personal tasks.
 it would be nice to see this in 8.5. if not core then maybe contrib.
 it transforms a tsvector to table format which is really nice for text
 processing and comparison.

 test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure
 this is a good patch'));
   lex   | rank
 --------+------
 good   |    8
 patch  |    9
 pretti |    3
 sure   |    4
 (4 rows)


This looks very useful!  I wonder if providing a weight column would
be relatively simple?  I think this would present problems with the
cast-to-text[] idea that Peter suggests, though.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] sun blade 1000 donation

2009-05-27 Thread Mike Rylander
On Wed, May 27, 2009 at 10:00 PM, Josh Berkus j...@agliodbs.com wrote:
 Andy,

 I have a Sun blade 1000 that's just collecting dust now days.  I was
 wondering if there were any pg-hackers that could find use for it.

 Its dual UltraSPARC III 750 (I think) and has two 36? gig fiber channel
 scsi disks.

 It weighs a ton.

 I'd be happy to donate it to a good cause.

 Feh, as much as we need more servers, we're really limited in our ability to
 accept stuff which is large  high power consumption.

 Now, if we had a DSL line we could hook it to, I could see using it for the
 buildfarm; it would be interesting old HW / old Solaris for us.


Would you like an IPC instead?  Though building PG on it might take
longer than the average release cycle.  ;)

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-31 Thread Mike Rylander
On Fri, May 30, 2008 at 6:47 PM, Andreas 'ads' Scherbaum
[EMAIL PROTECTED] wrote:
 On Fri, 30 May 2008 17:05:57 -0400 Andrew Dunstan wrote:
 Andreas 'ads' Scherbaum wrote:
  On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
 
  Well, yes, but you do know about archive_timeout, right? No need to wait
  2 hours.
 
  Then you ship 16 MB binary stuff every 30 second or every minute but
  you only have some kbyte real data in the logfile. This must be taken
  into account, especially if you ship the logfile over the internet
  (means: no high-speed connection, maybe even pay-per-traffic) to the
  slave.

 Sure there's a price to pay. But that doesn't mean the facility doesn't
 exist. And I rather suspect that most of Josh's customers aren't too
 concerned about traffic charges or affected by such bandwidth
 restrictions. Certainly, none of my clients are, and they aren't in the
 giant class. Shipping a 16Mb file, particularly if compressed, every
 minute or so, is not such a huge problem for a great many commercial
 users, and even many domestic users.

 The real problem is not the 16 MB, the problem is: you can't compress
 this file. If the logfile is rotated it still contains all the
 old binary data which is not a good starter for compression.

Using bzip2 in my archive_command script, my WAL files are normally
compressed to between 2MB and 5MB, depending on the write load
(larger, and more of them, in the middle of the day).  bzip2
compression is more expensive and rotated WAL files are not
particularly compressible, to be sure, but due to (and given) the
nature of the data bzip2 works pretty well, and much better than gzip.
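
For the record, the archive_command amounts to something like this
(paths and flags illustrative, not my exact script; %p and %f are the
standard placeholders for the WAL file path and name):

  archive_command = 'bzip2 -9 -c %p > /wal_archive/%f.bz2'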


 So you may have some kB changes in the wal logfile every minute but you
 still copy 16 MB data. Sure, it's not so much - but if you rotate a
 logfile every minute this still transfers 16*60*24 = ~23 GB a day.


I archived 1965 logs yesterday on one instance of my app totalling
8.5GB ... not too bad, really.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [EMAIL PROTECTED]
 | web: http://www.esilibrary.com



Re: [HACKERS] Spoofing as the postmaster

2007-12-22 Thread Mike Rylander
On Dec 22, 2007 1:04 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  Wouldn't SSL work over Unix-domain sockets as well?  The API only deals with
  file descriptors.

 Hmm ... we've always thought of SSL as being primarily comm security
 and thus useless on a Unix socket, but the mutual authentication aspect
 could come in handy as an answer for this type of threat.  Anyone want
 to try this and see if it really works or not?

 Does OpenSSL have a mode where it only does mutual auth and not
 encryption?  The encryption would be wasted cycles in this scenario,
 so being able to turn it off would be nice.


[EMAIL PROTECTED]:~$ openssl ciphers -v 'NULL'
NULL-SHA                SSLv3 Kx=RSA     Au=RSA  Enc=None      Mac=SHA1
NULL-MD5                SSLv3 Kx=RSA     Au=RSA  Enc=None      Mac=MD5

I see no way to turn off the message digest, but maybe that's just an
added benefit.

--miker



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Mike Rylander
On 8/18/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 On Fri, Aug 17, 2007 at 04:06:15PM -0700, Josh Berkus wrote:
  Bruce,
 
   Oh, so you want the config inside each tsvector value. Interesting
   idea.
 
  Yeah, hasn't anyone suggested this before?  It seems like the obvious
  solution.  A TSvector constructed with en_US is NOT the same as a vector
  constructed with fr_FR and it's silly to pretend that they are comparable.

 Except that (as I understand Oleg) it even seems to make sense sometimes
 to compare tsvectors constructed with different configs -- so it might
 be important not to prevent this use case either. Oleg?

Configs are not simply about languages, they are also about stopword
lists and stemmers and parsers, and there's no reason to think that
one would be using only one configuration to create a single tsvector.

Different fields from within one document may require different
treatment.  Take for instance title, with stopwords included, and
body, with them removed.  Those two initial tsvectors can then be
concatenated together with different weights to provide a very rich,
and simple (relatively speaking) search infrastructure.
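
Something like this, in other words (configuration and column names
assumed for illustration):

  UPDATE docs SET fts =
    setweight(to_tsvector('title_cfg', title), 'A') ||
    setweight(to_tsvector('body_cfg',  body),  'D');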

--miker



Re: [HACKERS] Another idea for index-only scans

2007-08-15 Thread Mike Rylander
On 8/15/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 I have added another idea for index-only scans to the TODO list:

A third idea would be for a heap scan to check if all rows are visible
and if so set a per-table flag which can be checked by index scans.
Any change to the table would have to clear the flag.  To detect
changes during the heap scan a counter could be set at the start and
checked at the end --- if it is the same, the table has not been
modified --- any table change would increment the counter.

Perhaps this is naive (or discussed and discarded... if so, I couldn't
find it, but I apologize if that's the case), but wouldn't recording
the xid of non-readonly transactions, at commit time, and at the table
level, be equivalent to the flag and remove the need for a counter?
Readers could just check the last-modification-xid at the beginning
and end of their scans to test for heap stability.

I suppose that would require a write-exclusive lock on some metadata
for each modified table during each commit... so perhaps it's a
non-starter right there.

--miker



Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Mike Rylander
On 8/13/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 Heikki Linnakangas wrote:
  Bruce Momjian wrote:
   Heikki Linnakangas wrote:
   Removing the default configuration setting altogether removes the 2nd
   problem, but that's not good from a usability point of view. And it
   doesn't solve the general issue, you can still do things like:
   SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@
   to_tsquery('confB', 'query');
  
   True, but in that case you are specifically naming different
   configurations, so it is hopefully obvious you have a mismatch.
 
  There's many more subtle ways to do that. For example, filling a
  tsvector column using a DEFAULT clause. But then you sometimes fill it
  in the application instead, with a different configuration. Or if one of
  the function calls is buried in another user defined function.
 
  I don't think explicitly naming the configuration gives enough protection.

 Oh, wow, OK, well in that case the text search API isn't ready and we
 will have to hold this for 8.4.


I've been watching this thread with a mixture of dread and hope,
waiting to see where the developers' inclination will end up; whether
leaving a useful foot gun available will be allowed.

This is just my $0.02 as a fairly heavy user of the current tsearch2
code, but I sincerely hope you do not cripple the system by removing
the ability to store tsvectors built using arbitrary configurations in
a single column.  Yes, it can lead to unexpected results if you do not
know what you are doing, but if you have gone beyond building a single
tsearch2 configuration then you are required to know what you are
doing.  What's more, IMO the default configuration mechanism feels
very much like a CONSTRAINT, as Oleg suggests.  That point is one of
cognizance, where if one has gone to the trouble of setting up
multiple configurations and has learned enough to do so correctly,
then one necessarily understands the importance of the setting and can
use it (or not, and use explicit configurations) correctly.  The
default config lowers the bar to an acceptable level for beginners
that have no need of multiple configurations, and while I don't feel
too strongly, personally, about having a default, I think it is both
useful and helpful for new users -- it was for me.

Now, so this email isn't entirely complaining, and as a data point for
the discussion, I'll explain why I do not want to see tsearch2
crippled in the way suggested by Heikki and Bruce.

My application (http://open-ils.org, which runs 80% of the public
libraries in Georgia, USA, http://gapines.org and
http://georgialibraries.org/lib/pines.html) requires that I be able to
search a corpus of bibliographic records in a mix of languages, and
potentially with mixed stop-word rules, with one query.  I cannot know
ahead of time what languages will be used in the corpus and I cannot
restrict any one query to one language.  To accomplish this, the
record itself will be inspected inside an INSERT/UPDATE trigger to
determine the language and type, and use the correct configuration for
creating the tsvector.  This will obviously result in a mixed
tsvector column, but that's exactly what I need.  I can filter on
record language if the user happens to specify a query language (and
thus configuration), or simply rank the assumed (IP based, perhaps, or
browser preference based) preferred language higher, or one of a
hundred other things.  But I won't be able to do any of that if
tsvectors are required to have one and only one configuration per
column.
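
As a sketch, that trigger looks something like this (names are made up
and the language-detection logic is elided):

  CREATE OR REPLACE FUNCTION bib_fts_trigger() RETURNS TRIGGER AS $$
  BEGIN
    -- pick the configuration based on the record's detected language
    NEW.fts := to_tsvector(
                 (CASE NEW.lang WHEN 'fre' THEN 'french_cfg'
                                ELSE 'default_cfg' END)::regconfig,
                 NEW.value);
    RETURN NEW;
  END;
  $$ LANGUAGE PLPGSQL;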

Anyway, I felt I needed to provide some outside perspective to this,
as a user, since it seems that the external viewpoint (my particular
viewpoint, at least) was missing from the discussion.

Thanks, folks, for all the work on this so far!

--miker



Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Mike Rylander
On 8/14/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
 Mike Rylander wrote:
[snip]

 Don't you need to use the right configuration to parse the query into a
 tsquery as well?


Only if the user (or user agent) can supply enough information to move
away from the configured default of, say, en-US.  And even then, other
tsvector configurations need to be searched.  Configurations are not,
of course, just about language/stemming, but also stop-words and
thesauri.  If it were just languages it wouldn't be too big of a deal
(other than in-app DDL management...).

 What you have is basically the classic problem problem of representing
 inheritance in a relational model. You have a superclass, bibliographic
 record, and multiple subclasses, one per language with one extra field,
 the corpus in the right language. You've solved it by storing them all
 in one table, with an extra column (language) to denote which subclass
 the record is. Alternatively, you could solve it by using inherited
 tables, or having one table per language with a foreign key referencing
 the master table containing the other fields, or having a single table
 with one nullable field per configuration, and a check constraint to
 check that exactly one of them is not null.


Sorry, communication problem here ... I provided an oversimplified
example meant more to show the issues than to find alternate
solutions, though I appreciate you taking the time to consider the
problem.  If I only had to consider one delimiting facet per record
then it would be much simpler. But with the complication that some
fields have stop-word lists (and some not), some use thesauri (and
different ones, at that), and these fields (as extracted from the
records) and their configurations (stem? use a stop-word list? use
thesaurus x, y or z?) are user-defined ...

 As a thought experiment, let me present another, not text search related
 example that's isomorphic to your problem:


Unfortunately, the problem you describe is not quite the same as the
problem I need to solve.

Fortunately, the current incarnation of tsearch2 does a pretty good
job of solving the problem if you store everything in one place and
use the query-time environment to apply some adjustments to the
ranking of items.  I could still work around this problem by creating
inherited tables, one for each configuration on each index-providing
table but I /really/ hope to avoid that.  Creating new configurations
for local requirements doesn't require creating new tables (and the
associated management overhead in the app) today, something I'd really
like to avoid.  In fact, I'm starting to sweat just thinking about
what the planner would go through with the number tables needed for
the potential configurations in an installation that makes use of
multiple thesauri and a mix of stop-word lists across, say, 30
languages.  Such a dataset is not uncommon.

In any case, thanks again for taking the time to think about the
problem.  I still think having the ability to store any old tsvector I
happen to have hanging around in any column of the correct type is a
GoodThing(tm).  I see from Oleg's message down-thread that that's the
way things will be (the tsvector type doesn't know about columns, just
lexeme tuples).

--miker



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Mike Rylander
On 8/14/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
 Bruce Momjian wrote:

  Basically, the default GUC doesn't work because of:
 
error prone
if super-user only, non-super-user doesn't work on restore
if non-super-user, can cause mismatch (perhaps this is the best
  option), and restore still a problem (no storage of config in
  indexes or tables)

 I haven't really seen anyone else arguing about this.  I wonder whether
 you are being overly zealous about it.

I hate to just pile on Bruce, but as a production user I tend to
agree... let me shoot myself in the foot. :)

--miker



Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Mike Rylander
On 8/14/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
 Oleg Bartunov wrote:
  On Tue, 14 Aug 2007, Alvaro Herrera wrote:
 
  Oleg Bartunov wrote:
  On Thu, 9 Aug 2007, [EMAIL PROTECTED] wrote:
 
  Maybe I'm missing something, but it seems to me that the configuration
  is more attached to a column/index thatn to the whole database. If
  there's a default in an expression, I'd rather expect this default to be
  drawn from the index involved than from a global value (like a
  functional
  index does now).
 
  I'm tired to repeat - index itself doesn't know about configuration !
 
  Is there a way to change that?  For example store the configuration in a
  metapage or something?
 
  it's useless, in general, since you could use different configuration to
  build tsvector.

 Hmm, sorry, I think I just understood what this was about: so you mean
 that the configuration is really *per row* and not per index?  So I can
 store rows into an index using more than one configuration, and it will
 work?

Can and does, to great success.  :)

--miker



Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Mike Rylander
On 8/14/07, Gregory Stark [EMAIL PROTECTED] wrote:
 Mike Rylander [EMAIL PROTECTED] writes:

  My application (http://open-ils.org, which run 80% of the public
  libraries in Georgia, USA, http://gapines.org and
  http://georgialibraries.org/lib/pines.html) requires that I be able to
  search a corpus of bibliographic records in a mix of languages, and
  potentially with mixed stop-word rules, with one query.  I cannot know
  ahead of time what languages will be used in the corpus and I cannot
  restrict any one query to one language.  To accomplish this, the
  record itself will be inspected inside an INSERT/UPDATE trigger to
  determine the language and type, and use the correct configuration for
  creating the tsvector.  This will obviously result in a mixed
  tsvector column, but that's exactly what I need.  I can filter on
  record language if the user happens to specify a query language (and
  thus configuration), or simply rank the assumed (IP based, perhaps, or
  browser preference based) preferred language higher, or one of a
  hundred other things.  But I won't be able to do any of that if
  tsvectors are required to have one and only one configuration per
  column.
 
  Anyway, I felt I needed to provide some outside perspective to this,
  as a user, since it seems that the external viewpoint (my particular
  viewpoint, at least) was missing from the discussion.

 This is *extremely* useful. I think it's precisely what we've been missing so
 far. At least, what I've been missing.

 So the question is what exactly happens in this case? If I search for "the"
 does that mean it will ignore matches in English where that's a stop-word but
 find me books on tea in French? Is that what I should expect to happen? What
 if I search for "earl" and "the"? Does that find me French books on Earl Grey
 Tea but English books on all earls?

Oh dear ... you went and got me started...

Each field type has a different set of configurations for potenial
use.  Title and subject fields, for instance, do not (generally) use
stop-word lists, so a subject search for "the" will match any record
with the lexeme "the" in a subject field.  Title fields are a little
more complicated, because there is information in the bibliographic
record about how and when to skip leading articles, but generally
those are indexed as well for FTS purposes.  English LCSH subjects
generally don't have stop(like) words in them, so you'll probably just
get French tea records.  Title results would be a mix of earls and
French tea records (probably) and the correlation between the user's
preferred language (either chosen query lang or UI lang, whichever is
available) will help adjust the ranking, pushing what are likely to be
the more appropriate records to the top.

Note, however, that much of this multi-tsearch2-configuration setup is
not used in the implementation at http://gapines.org/ because, well,
there's not much need (read: demand from librarians) for that dataset
to support these more complex tricks.  It's basically all en-US and
exclude stop-words.  Other implementations are making more use of what
I describe above, including a (government mandated) French-English
bilingual institution who shall remain nameless for the time being...


 What happens if I use the same operator directly on the text column? Or
 perhaps it's not even possible to specify stop-words when operating on a text
 column? Should it be?

You mean with an expression index on a text column?  I haven't
considered using them for FTS.  It just feels easier and more flexible
to me to use an external tsvector column because of the fairly heavy
processing that goes into creating each tsvector value.  I may
re-evaluate that position now that CREATE INDEX CONCURRENTLY exists,
but I'm not developing with 8.2+ only features yet.  Once 8.3 is out
that may change.

Also, unless I misunderstand, you have to wrap the text column in the
function used to build the index.  For my purposes, that makes
building a generic FTS driver for my app (which, admittedly, only has
a Postgresql driver ;) ) more difficult than having a hidden extra
column.  Again, that could change if the benefits of CREATE INDEX
CONCURRENTLY end up outweighing simpler FTS driver code.
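
To illustrate what I mean by wrapping (names assumed):

  CREATE INDEX docs_fts ON docs
    USING gin (to_tsvector('english', body));

  -- every query must repeat the index expression verbatim to use it:
  SELECT * FROM docs
   WHERE to_tsvector('english', body) @@ to_tsquery('english', 'tea');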

--miker



Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-25 Thread Mike Rylander

On 6/25/07, Tom Lane [EMAIL PROTECTED] wrote:

Well, it's not hard at all to find chunks of English text that have
embedded bits of French, Spanish, or what-have-you, but that's not an
argument for trying to intermix the stemmers.  I doubt that such simple
bits of program could tell the language difference well enough to
determine which stemming rules to apply.



While I imagine that is probably true of many, if not most, my project
in particular would greatly benefit from the ability to mix stemmers.
I work with complex bibliographic data, which has language information
embedded within records.  This is not limited to the record level
either.  Individual fields within each bibliographic record can be in
different langauges.

Especially in countries where making software multi-lingual (such as
Canada (en_CA/fr_CA)) is a requirement for use in public institutions,
the ability to choose a stemmer and stop-word list at will for any
particular record will actually provide the exact behavior needed.
The obvious generalization from Canada would be to support any mix of
languages supported by tsearch2.

I can certainly understand the benefit of making the default
configuration a simple locale to language map, but there are
definitely uses for searching using different stemmers/stop-lists even
within the same corpus/index.  So, as a datapoint for the discussion,
I would ask that the option of multiple languages per DB locale not be
removed if it can be at all avoided.

Thanks for listening (and for all the great work on getting tsearch
into core! :) ...

--
Mike Rylander



Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-25 Thread Mike Rylander

On 6/25/07, Tom Lane [EMAIL PROTECTED] wrote:

Mike Rylander [EMAIL PROTECTED] writes:
 I can certainly understand the benefit of making the default
 configuration a simple locale to language map, but there are
 definitely uses for searching using different stemmers/stop-lists even
 within the same corpus/index.  So, as a datapoint for the discussion,
 I would ask that the option of multiple languages per DB locale not be
 removed if it can be at all avoided.

Nobody is proposing that --- the issue here is just how we set up the
default configuration.



Then I misunderstood.  Sorry for the noise, folks.

--
Mike Rylander



Re: [HACKERS] [PATCHES] xml2 contrib patch supporting default XML namespaces

2007-06-01 Thread Mike Rylander

On 6/1/07, Peter Eisentraut [EMAIL PROTECTED] wrote:

Mike Rylander wrote:
 I understand that XML support is planned and at least partially
 implemented for 8.3, but many production instances will be unable
 (or, in fact, unwilling) to upgrade to 8.3 for quite some time.
 Because this patch is completely backward compatible it can
 (theoretically) be included in future 8.1 and 8.2 releases, and for
 those of us that need more full XML support in the short term the
 upgrade of a contrib module is probably a very viable option -- it is
 for me, anyway.

8.3 contains XPath support which should cover the issue that this patch
addresses.  (Might wanna check.)  Since we're not going to put new
features into earlier releases, and contrib modules are not necessarily
source backward-compatible, I don't think this patch has a place in a
future PostgreSQL release.


I agree, assuming the upcoming XML support can handle default
(unprefixed) namespaces.


--
Mike Rylander



Re: [HACKERS] [PATCHES] xml2 contrib patch supporting default XML namespaces

2007-03-22 Thread Mike Rylander

On 3/22/07, Tom Lane [EMAIL PROTECTED] wrote:

Bruce Momjian [EMAIL PROTECTED] writes:
 Peter Eisentraut wrote:
 I was hoping that we're deprecating contrib/xml2, so I wouldn't add more
 features to it.

 Author states:

 I understand that XML support is planned and at least partially
 implemented for 8.3, but many production instances will be unable (or,
 in fact, unwilling) to upgrade to 8.3 for quite some time.  Because
 this patch is completely backward compatible it can (theoretically) be
 included in future 8.1 and 8.2 releases, and for those of us that need
 more full XML support in the short term the upgrade of a contrib
 module is probably a very viable option -- it is for me, anyway.

Well, it's not going to be put in future 8.1 or 8.2 releases, so the
above argument is not a reason to include it now.  What the author
should do if he wants to offer a new feature for past release branches
is to put up a project on pgfoundry.

regards, tom lane



Hmm.. OK.  Well, thank you all for clarifying that.  I thought
(perhaps only hoped?) that the bar was lower for contrib than for core
as far as features go, but it seems that assumption is incorrect.
I'll look at starting a pgfoundry project soon.

A related question, however:  Will the XML features being included in
8.3 support namespace prefix registration?  If not, handling arbitrary
XML via XPath that includes unprefixed (default) namespaces (for me
that is the majority of the XML I deal with, and no, I can't change
that) will have exactly the same problems using the new mechanisms as
with the current xml2 contrib module.  I ask because, based on the
design emails I've seen on -hackers, nothing surrounding explicit
support for said issue jumped out at me.
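
For concreteness, the kind of call I'm hoping for is an explicit
prefix-to-URI registration supplied with the query, along these lines
(a hypothetical sketch; the docs table and doc column are invented):

SELECT xpath('/x:html/x:head/x:title/text()',
             doc,
             ARRAY[ARRAY['x', 'http://www.w3.org/1999/xhtml']])
  FROM docs;

If the mapping argument can bind a prefix of my choosing to the
document's default namespace URI, my problem is solved.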

Thanks again.

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] [PATCHES] xml2 contrib patch supporting default XML namespaces

2007-03-18 Thread Mike Rylander

On 3/6/07, Mike Rylander [EMAIL PROTECTED] wrote:

On 3/6/07, Peter Eisentraut [EMAIL PROTECTED] wrote:
 Mike Rylander wrote:
  The patch adds support for default XML namespaces in xml2 by providing
  a mechanism for supplying a prefix to a named namespace URI.

 How does it support multiple namespaces in one document?

It supports one default (unprefixed) namespace URI per document, which
ISTM is the overwhelmingly common case (and the itch that I must
scratch).


I think there is some confusion about what the current xml2 contrib
module supports and what my patch adds.  The current code, as it
stands today, supports multiple namespaces just fine.  The only
requirement is that each namespace have a prefix, or else one is
forced to use the local-name() construct with every single node for
those nodes in unprefixed (default) namespaces.  This patch simply
adds support for registering a prefix for an unprefixed namespace,
which is an extremely common case in XML and causes the use of overly
verbose contortions when designing XPath expressions.  To illustrate
this, xml2 currently supports all of these statements:

SELECT xpath_nodeset('<x><y>foo</y></x>','/x/y');
SELECT xpath_nodeset('<x><a:y xmlns:a="uri:for:a">foo</a:y></x>','/x/a:y');
SELECT xpath_nodeset('<b:x xmlns:b="uri:for:b"><a:y
xmlns:a="uri:for:a">foo</a:y></b:x>','/b:x/a:y');

All number and manner of /prefixed/ namespaces work fine today.
However, in order to match an element or attribute with an unprefixed
namespace, the xpath becomes a study in overly verbose, human error
inducing repetition.  For instance, consider the extremely common case
of an xhtml document that does not use a prefix for the xhtml
namespace.  Using the xml2 contrib module as it stands today, without
my patch, using XPath to get the title of the document might look
something like this:

/*[local-name()="html"]/*[local-name()="head"]/*[local-name()="title"]

Now just imagine the XPath needed to get a portion of the body in a
nested div based on the existence of some other node ... the logic
gets lost in the noise simply because of the overhead of
namespace-qualifying the elements.

Namespaces were introduced in XML to address verbosity issues (among
other things), but as XPath was designed primarily as a language for
use inside XSLT (where namespace support is fully integrated) it
didn't get the treatment needed to handle unprefixed namespaces.  To
address /that/ issue, my patch allows the registration of a supplied
prefix for a supplied URI, which solves the common default namespace
problem in a completely backward compatible way.  The above example
XPath can now become:

/x:html/x:head/x:title

simply by supplying 2 more arguments to the _ns version of any of the
xpath_ functions available in xml2.  I challenge anyone to claim that
the [local-name()="foo"] variant is easier to read and less error-prone
than the second, namespace-prefixed variant.  They are exactly
equivalent, but the second (quite obviously) is Better(tm).
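
To make that concrete, a sketch of the _ns form (the argument order
shown here is illustrative only; the point is the two extra prefix/URI
arguments the patch adds):

SELECT xpath_nodeset_ns('<html xmlns="http://www.w3.org/1999/xhtml">...</html>',
                        '/x:html/x:head/x:title',
                        'x',
                        'http://www.w3.org/1999/xhtml');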

I understand that XML support is planned and at least partially
implemented for 8.3, but many production instances will be unable (or,
in fact, unwilling) to upgrade to 8.3 for quite some time.  Because
this patch is completely backward compatible it can (theoretically) be
included in future 8.1 and 8.2 releases, and for those of us that need
more full XML support in the short term the upgrade of a contrib
module is probably a very viable option -- it is for me, anyway.

So, to sum up, please let me know what I can do to increase the
chances of getting this patch included.  Alternatively, if my patch is
being vetoed, please let me know that too so that I can create a local
maintenance plan for this.

Thanks in advance.  I've attached the patch again for reference.

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org


xml2-namespaces.patch
Description: Binary data



Re: [HACKERS] CLUSTER and MVCC

2007-03-09 Thread Mike Rylander

On 3/9/07, Florian G. Pflug [EMAIL PROTECTED] wrote:

Heikki Linnakangas wrote:
 Csaba Nagy wrote:
 On Fri, 2007-03-09 at 14:00, Alvaro Herrera wrote:
 But I'm not really seeing the problem here.  Why isn't Csaba's problem
 fixed by the fact that HOT reduces the number of dead tuples in the
 first place?  If it does, then he no longer needs the CLUSTER
 workaround, or at least, he needs it to a much lesser extent.

 Is this actually true in the case of HOT + long running transactions ? I
 was supposing HOT has the same problems in the presence of long running
 transactions...

 It does, HOT won't help you here. A long-running transaction is just as
 much of a problem with HOT as without. Besides, I don't recall that
 you're doing updates in the first place.

Couldn't HOT in principle deal with this? Let's say you have two long-running
transactions, which see row versions A and D. While those transactions
are running, the row is constantly updated, leading to row versions B, C (before
the second long-running transaction started), D, E, F, ... Z.
Now, the versions B,C,E,F,...Z could be removed by HOT or vacuum, because they
are not currently visible, nor will they ever become visible because they are
already deleted.


Couldn't they (or at least one of them) become visible due to
SAVEPOINT rollback?



greetings, Florian Pflug





--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] [PATCHES] xml2 contrib patch supporting default XML namespaces

2007-03-06 Thread Mike Rylander

On 3/6/07, Peter Eisentraut [EMAIL PROTECTED] wrote:

Mike Rylander wrote:
 The patch adds support for default XML namespaces in xml2 by providing
 a mechanism for supplying a prefix to a named namespace URI.

How does it support multiple namespaces in one document?


It supports one default (unprefixed) namespace URI per document, which
ISTM is the overwhelmingly common case (and the itch that I must
scratch).

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] [PATCHES] xml2 contrib patch supporting default XML namespaces

2007-03-06 Thread Mike Rylander

On 3/6/07, Nikolay Samokhvalov [EMAIL PROTECTED] wrote:

On 3/6/07, Mike Rylander [EMAIL PROTECTED] wrote:
 Attatched you'll find a patch that I've been kicking around for a
 while that I'd like to propose for inclusion in 8.3.  I attempted to
 submit this through the original xml2 author (as far back as the  7.4
 days) but got no response.

 It's really fairly trivial, but I will be using the features it
 provides in production soon, so I'd like to see it applied against the
 contrib xml2 module.  The patch adds support for default XML
 namespaces in xml2 by providing a mechanism for supplying a prefix to
 a named namespace URI.  It then wraps the namespace-capable functions
 in backward-compatible equivalents so that old code will not break.

1) And what about non-default namespaces?


I'm not sure I understand.  If the namespace already has a prefix then
it works fine.  This patch simply gives a known non-prefixed namespace
URI a prefix so one can write XPath that looks like

 //marc:datafield[@tag='245']/marc:subfield[@code='a']

instead of

 //*[local-name()='datafield' and
@tag='245']/*[local-name()='subfield' and @code='a']

A little two node example is painful enough, now imagine a non-trivial
example with backtracking conditionals... :P


2) What if my XPath query has different prefix, that also should be
mapped to the same URI? (Not frequent case, but this really can occur
-- e.g. XML doc has prefix 'local' for URI='http://127.0.0.1', but
XPath should have 'loc' for the same URI.)



Both prefixes work fine as multiple prefixes can map to the same URI.


--
Best regards,
Nikolay




--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



[HACKERS] xml2 contrib patch supporting default XML namespaces

2007-03-05 Thread Mike Rylander

Attatched you'll find a patch that I've been kicking around for a
while that I'd like to propose for inclusion in 8.3.  I attempted to
submit this through the original xml2 author (as far back as the  7.4
days) but got no response.

It's really fairly trivial, but I will be using the features it
provides in production soon, so I'd like to see it applied against the
contrib xml2 module.  The patch adds support for default XML
namespaces in xml2 by providing a mechanism for supplying a prefix to
a named namespace URI.  It then wraps the namespace-capable functions
in backward-compatible equivalents so that old code will not break.

I have patched README.xml2, pgxml.sql.in and xpath.c against CVS HEAD
as of about 1 hour ago.  I have code that uses both the old
non-namespace-capable functions and the new functions, and all
function as intended in my environment.

Please let me know if there is any more I can/need-to do to help this
patch along!

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org


xml2-namespaces.patch
Description: Binary data



Re: [HACKERS] May, can, might

2007-01-30 Thread Mike Rylander

On 1/30/07, Gregory Stark [EMAIL PROTECTED] wrote:


Bruce Momjian [EMAIL PROTECTED] writes:

 (Who says were obsessive?)  :-)

I may not fall into your clever trap...


But you certainly can!

<cymbal_crash/>

(sorry...)



--
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com





--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Ontology on PostgreSQL - is there something?

2006-11-18 Thread Mike Rylander

On 11/18/06, Rodrigo Hjort [EMAIL PROTECTED] wrote:

Dear PG-hackers,

Based on the paper below [1], I ask: is there anyone working on, or already
tried to such native implementation on PostgreSQL? I didn't find anything
related on pgFoundry. There is also a presentation [2] related to the paper.

By Souripriya Das, Eugene Inseok Chong, George Eadon, Jagannathan
Srinivasan, Proceedings of the 30th VLDB Conference, Toronto, Canada.
Ontologies are increasingly being used to build applications that utilize
domain-specific knowledge. This paper addresses the problem of supporting
ontology-based semantic matching in RDBMS. The approach enables users to
reference ontology data directly from SQL using the semantic match
operators, thereby opening up possibilities of combining with other
operations such as joins as well as making the ontology-driven applications
easy to develop and efficient. In contrast, other approaches use RDBMS only
for storage of ontologies and querying of ontology data is typically done
via APIs. This paper presents the ontology-related functionality including
inferencing, discusses how it is implemented on top of Oracle RDBMS, and
illustrates the usage with several database applications.

[1]
http://www.cs.toronto.edu/vldb04/protected/eProceedings/contents/pdf/IND3P1.PDF
[2] http://www-scf.usc.edu/~csci586/ppt-2005/dguo.pps

Besides, what are your opinions on the subject?

Is it worthful to have such native implementation on PG, even as a modest
contrib module?
Based on your development know-how on the backend internals, how difficult
is it to develop this (3 operators + 1 index type)?


For simple situations of BT/NT ontology relationships, check out the
ltree contrib module*.  It allows building and indexing of directed
graphs and other network-like structures, and seems ideal for both
path storage and search.

[thinks some more]

If applied as a column per axis, ltree may be enough ... interesting
stuff in any case.

* http://www.sai.msu.su/~megera/postgres/gist/ltree/
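
For a quick feel of what BT/NT paths look like with ltree, a minimal
sketch (table and data are hypothetical):

CREATE TABLE concept (
    term text,
    path ltree
);
CREATE INDEX concept_path_idx ON concept USING gist (path);

-- narrower terms: everything filed under science.biology
SELECT term FROM concept WHERE path <@ 'science.biology';

-- broader terms: every ancestor of science.biology.genetics
SELECT term FROM concept WHERE path @> 'science.biology.genetics';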



--
Best regards,

Rodrigo Hjort
http://icewall.org/~hjort
CELEPAR - Cia de Informática do Paraná - Brasil
http://www.pr.gov.br





--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Support Parallel Query Execution in Executor

2006-04-07 Thread Mike Rylander
On 4/6/06, Qingqing Zhou [EMAIL PROTECTED] wrote:

 Jonah H. Harris [EMAIL PROTECTED] wrote
 
  Great work!  I had looked into this a little bit and came to the same
  ideas/problems you did, but none of them seemed insurmountable at all.
   I'd be interested in working with you on this if you'd like.
 

First, I want to second Jonah's enthusiasm.  This is very exciting!


 Yes, I am happy to work with anyone on the topic. The plan in mind is like
 this:
 (1) stabilize the master-slave seqscan: solve all the remaining problems;
 (2) parallelize the seqscan: AFAICS, this should not be very difficult based
 on 1, and may only need some scan-portion assignment;

This is really only a gut feeling for me (it can't be otherwise, since
we can't yet test), but I think parallelizing a single seqscan is
pretty much guaranteed to do nothing, because seqscans, especially on
large tables, are IO bound.

There was plan some time ago (during 8.0 beta, I think) to allow
multiple seqscans from different queries to join each other, such that
scans that begin later start scanning the table at the point, or just
behind the point, that the first running scan is already at.  That
plan would reduce IO contention, and buffer and OS cache thrashing, by
having multiple readers pull from the same hose.

I can't see how asking for more than one stream from the same file
would do anything but increase both cache thrashing and IO bandwidth
contention.  Am I missing something here?

 (3) add an indexscan or another one or two node types to the master-slave
 solution: this is in order to make the framework extensible;
 (4) parallelize these nodes - this will be a big chunk of the job;

Now that could be a _big_ win!  Especially if tablespaces are used to
balance commonly combined tables and indexes.

 (5) add a two-phase optimization to the server - we have to consider the
 partitioned table in this stage; yet another big chunk of the job;


Same here.  This would be a place where parallel seqscans of different
tables (instead of multi-headed scan of one table) could buy you a
lot, especially with proper tablespace use.

Thanks again, Qingqing, for the work on this.  I'm very excited about
where this could go. :)

 Regards,
 Qingqing






--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] First Aggregate Funtion?

2006-03-31 Thread Mike Rylander
On 3/31/06, Martijn van Oosterhout kleptog@svana.org wrote:
 On Fri, Mar 31, 2006 at 03:02:47PM -0600, Tony Caduto wrote:
  Has there ever been any talk of adding a first aggregate function?
  It would make porting from Oracle and Access much easier.
 
  Or is there something in the contrib modules that I might have missed?

 There are several oracle compatability modules:

 http://pgfoundry.org/projects/oracompat/
 http://pgfoundry.org/projects/orafce/

 I'm sure there's many more if you look...

If all you want is FIRST() and LAST() then:

-- Create a function that always returns the first non-NULL item
CREATE OR REPLACE FUNCTION public.first_agg ( anyelement, anyelement )
RETURNS anyelement AS $$
    SELECT CASE WHEN $1 IS NULL THEN $2 ELSE $1 END;
$$ LANGUAGE SQL STABLE;

-- And then wrap an aggregate around it
CREATE AGGREGATE public.first (
    sfunc    = public.first_agg,
    basetype = anyelement,
    stype    = anyelement
);

-- Create a function that always returns the last item
CREATE OR REPLACE FUNCTION public.last_agg ( anyelement, anyelement )
RETURNS anyelement AS $$
    SELECT $2;
$$ LANGUAGE SQL STABLE;

-- And then wrap an aggregate around it
CREATE AGGREGATE public.last (
    sfunc    = public.last_agg,
    basetype = anyelement,
    stype    = anyelement
);
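
Since an aggregate sees its rows in whatever order the plan delivers
them, feed it from an ordered subselect; a usage sketch (the people
table and its columns are hypothetical):

SELECT public.first(name), public.last(name)
  FROM (SELECT name FROM people ORDER BY id) AS ordered;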


Hope that helps!

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Surrogate keys (Was: enums)

2006-01-20 Thread Mike Rylander
On 1/19/06, Pollard, Mike [EMAIL PROTECTED] wrote:
 Martijn van Oosterhout wrote:

  Please provides natural keys for any of the following:
 
  - A Person
  - A phone call: (from,to,date,time,duration) is not enough
  - A physical address
  - A phone line: (phone numbers aren't unique over time)
  - An internet account: (usernames not unique over time either)

 Ahh, a challenge.  Hmm, not sure about all of them, but here goes:

 A Person - well, you could use a bit map of their fingerprints, or maybe
 their retinal scan.  Of course, that could change due to serious injury.
 Maybe some kind of representation of their DNA?

Unless the person in question happens to be a chimera (yes, they do exist).

;-)

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Surrogate keys

2006-01-20 Thread Mike Rylander
On 1/21/06, Christopher Browne [EMAIL PROTECTED] wrote:
  On 1/19/06, Pollard, Mike [EMAIL PROTECTED] wrote:
  Martijn van Oosterhout wrote:
 
   Please provides natural keys for any of the following:
  
   - A Person
   - A phone call: (from,to,date,time,duration) is not enough
   - A physical address
   - A phone line: (phone numbers aren't unique over time)
   - An internet account: (usernames not unique over time either)
 
  Ahh, a challenge.  Hmm, not sure about all of them, but here goes:
 
  A Person - well, you could use a bit map of their fingerprints, or maybe
  their retinal scan.  Of course, that could change due to serious injury.
  Maybe some kind of representation of their DNA?
 
  Unless the person in question happens to be a chimera (yes, they do exist).
 
  ;-)

 Yeah, I saw that episode of CSI!  :-)

Heh.  I didn't realize they did that already.  I was thinking of the
show "I Am My Own Twin" from the Discovery Health Channel.

Couldn't find a link from the official page, but:
http://www.globalspin.com/mt/archives/000547.html .

 --
 cbbrowne,@,gmail.com
 http://linuxdatabases.info/info/lsf.html
 Cat's motto: No matter what you've done wrong, always try to make it
 look like the dog did it. -- Unknown




--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Why don't we allow DNS names in pg_hba.conf?

2006-01-02 Thread Mike Rylander
On 1/2/06, Jon Jensen [EMAIL PROTECTED] wrote:
 On Sun, 1 Jan 2006, Andreas Pflug wrote:

  While I haven't tried it, I suspect that allowing a DNS host name
  would take little work (basically removing the AI_NUMERICHOST flag
  passed to getaddrinfo in hba.c).  There was once a good reason not
  to allow it: slow DNS lookups would lock up the postmaster.  But
  now that we do this work in an already-forked backend, with an overall
  timeout that would catch any indefinite blockage, I don't see a good
  reason why we shouldn't let people use DNS names.
 [snip]
 
  Routers/firewalls that allow DNS names will usually resolve them 
  immediately,
  and store the IP addresses.

 I think it's a great idea to make DNS names an option in pg_hba.conf.
 However, I think we're talking about two fairly different features here.

 (1) What Tom proposed is that we store the hostname and do a new DNS
 lookup for every connection. That could be useful in certain situations,
 but I wouldn't use it for a busy production server. The additional time
 for DNS lookups (even with a fast local caching nameserver) would not be a
 price I'd want to pay for the convenience. In a development environment,
 it could be just the ticket, though.

IMHO, this is the price you pay for using DNS.


 (2) As Andreas mentioned, firewalls commonly do the DNS lookups when they
 read their configuration, and store the IP addresses. If we did this,
 you'd presumably only have to reload the postmaster configuration to cause
 the DNS lookups to be done again, but until then they'd be static and as
 fast as using actual IP addresses.

 If both were implemented, I'm not sure how you'd indicate which method you
 want on any given pg_hba.conf line.

 If we were only to have one of these, I'd prefer (2). You could always set
 up a cron job to reload the postmaster config hourly or daily, to keep the
 DNS lookups from getting too stale.

If we're going to use an external facility, I'd vote for using it as a
black box via the API it was designed with and not doing any caching
tricks, especially if those tricks involve reloading the config file
to make changes visible that would be seen if the facility were used
correctly.  I guess I just see it as a correct vs. fast decision. 
If caching were to be used, expiry should be based on the TTL for the
DNS record, and not a reload of our config.
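
For illustration, the kind of entry being discussed would presumably
read something like this hypothetical pg_hba.conf line, with a hostname
standing where a CIDR address goes today:

host    all    all    db-client.example.com    md5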

 just my $0.02.


 Jon

 --
 Jon Jensen
 End Point Corporation
 http://www.endpoint.com/
 Software development with Interchange, Perl, PostgreSQL, Apache, Linux, ...




--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Reducing the overhead of NUMERIC data

2005-11-02 Thread Mike Rylander
On 11/2/05, Simon Riggs [EMAIL PROTECTED] wrote:
 On Tue, 2005-11-01 at 18:15 -0500, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   Anybody like to work out a piece of SQL to perform data profiling and
   derive the distribution of values with trailing zeroes?
 
  Don't forget leading zeroes.  And all-zero (we omit digits entirely in
  that case).  I don't think you can claim that zero isn't a common case.

 The question is: how common?

 For INTEGERs I would accept that many are often zero. For NUMERIC, these
 are seldom exactly zero, IMHO.

Seconded.  My INTEGER data does have a quite a few zeros but most of
my NUMERIC columns hold debits and credits.  Those are almost never
zero.
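
For anyone who wants to profile their own data along Simon's lines, a
rough sketch (the ledger table and amount column are hypothetical, and
this only separates exact zeros from values with no fractional digits):

SELECT sum(CASE WHEN amount = 0 THEN 1 ELSE 0 END) AS exact_zero,
       sum(CASE WHEN amount = trunc(amount) THEN 1 ELSE 0 END) AS no_fraction,
       count(*) AS total
  FROM ledger;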


 This is one of those issues where we need to run tests and take input.
 We cannot decide this sort of thing just by debate alone. So, I'll leave
 this as a less potentially fruitful line of enquiry.

 Best Regards, Simon Riggs





--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



[HACKERS] platform test

2005-10-31 Thread Mike Rylander
4x AMD Opteron (tm) Processor 852

   -

[EMAIL PROTECTED] /tmp/pgtestbuild/postgresql-8.1RC1 $ uname -a
Linux localhost 2.6.12-gentoo-r10 #1 SMP Fri Sep 9 09:43:22 EDT 2005
x86_64 AMD Opteron (tm) Processor 852 AuthenticAMD GNU/Linux



[EMAIL PROTECTED] /tmp/pgtestbuild/postgresql-8.1RC1 $ file
src/backend/postgres
src/backend/postgres: ELF 64-bit LSB executable, AMD x86-64, version 1
(SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs),
not stripped

   --


[EMAIL PROTECTED] /tmp/pgtestbuild/postgresql-8.1RC1 $ gcc --version
gcc (GCC) 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)

   --

[EMAIL PROTECTED] /tmp/pgtestbuild/postgresql-8.1RC1 $
src/bin/pg_config/pg_config
BINDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/pg_config
DOCDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/doc/postgresql
INCLUDEDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/include
PKGINCLUDEDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/include/postgresql
INCLUDEDIR-SERVER =
/tmp/pgtestbuild/postgresql-8.1RC1/src/bin/include/postgresql/server
LIBDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/lib
PKGLIBDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/lib/postgresql
LOCALEDIR =
MANDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/man
SHAREDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/share/postgresql
SYSCONFDIR = /tmp/pgtestbuild/postgresql-8.1RC1/src/bin/etc/postgresql
PGXS = 
/tmp/pgtestbuild/postgresql-8.1RC1/src/bin/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-perl' '--with-openssl'
'--enable-integer-datetimes' '--prefix=/tmp/pgtest/'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,/tmp/pgtest//lib
LDFLAGS_SL =
LIBS = -lpgport -lssl -lcrypto -lz -lreadline -lcrypt -lresolv -lnsl
-ldl -lm -lbsd
VERSION = PostgreSQL 8.1RC1

  --

==
 All 98 tests passed.
==

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] LDAP Authentication?

2005-10-10 Thread Mike Rylander
On 10/10/05, Magnus Hagander [EMAIL PROTECTED] wrote:
   You can do this today using PAM authentication, but this is not always
   possible. Notably it's never possible on Windows, and there are
   several unix platforms/distros that don't support it without a lot of
   work.

  Or you port PAM to Windows, and then everybody wins.

 Well, for one that's going to be a *lot* more work. I'm not even sure
 how many of the concepts would apply to win32, but then I don't really
 know PAM...


Most of the work has already been done:

http://pgina.xpasystems.com/

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Found small issue with OUT params

2005-09-30 Thread Mike Rylander
On 9/30/05, Jim C. Nasby [EMAIL PROTECTED] wrote:
 On Fri, Sep 30, 2005 at 10:53:22AM -0700, Joshua D. Drake wrote:
  
  So you might notice little performance hit bringing back a million rows,
  and most of these type of single OUT params functions only return one
  row/value anyway.
  There would be zero perceivable difference in performance regardless of
  the extra overhead for a single value/row.
 
  Sounds like we need a test case... up for it?

 If there is a performance difference my vote is that we bite the bullet
 for 8.1 and accept the performance hit rather than settle for
 sub-optimal behavior. Much easier to fix the performance penalty down
 the road than to fix the behavior.

What about just returning the single OUT value named by the parameter,
instead of special-casing single-OUT functions?  If I understand
correctly, Tom has just added a test to make single-OUT functions look
like RETURNS functions.  If that were removed then we'd have what, at
least by counting the responses on this thread, seems to be the
desired (and expected) behaviour.

Or I could just be misunderstanding the implementation again.

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Found small issue with OUT params

2005-09-29 Thread Mike Rylander
On 9/29/05, Tom Lane [EMAIL PROTECTED] wrote:
 Tony Caduto [EMAIL PROTECTED] writes:
  Please don't take this the wrong way, but don't you think even if a
  single param is declared as OUT it should return the name of the OUT param?

 Not really, because create function foo (in x int, out y float) is
 supposed to have the same external behavior as create function foo
 (in x int) returns float.  I agree it's a bit of a judgment call, but
 I do not see a case for changing it.


Just my $0.02, but that seems inconsistent.  In my mind, the
difference between functions with OUT params and functions that return
a RECORD (or a specific rowtype) is syntactic sugar.  I'm pretty sure
that this was used to explain the implementation when it was being
discussed, in fact.

Using that logic, a function with one OUT param would be the same as
a function returning a rowtype with only one column, and the one
column in such a rowtype certainly has a name of its own.
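
To make the comparison concrete, a minimal sketch of the case under
discussion (the function and its names are invented):

CREATE FUNCTION sqrt_out(IN x float8, OUT y float8) AS $$
    SELECT sqrt($1);
$$ LANGUAGE SQL;

SELECT * FROM sqrt_out(2.0);

Under the one-column-rowtype reading, the result column would be
labeled y rather than carrying the function's name.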

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] [PATCHES] Function's LEAST, GREATEST and DECODE (Oracle vararg polymorphic functions)

2005-06-24 Thread Mike Rylander
On 6/24/05, Tom Lane [EMAIL PROTECTED] wrote:
 [ moving to -hackers for a wider audience ]
 
 Today's issue: should the GREATEST/LEAST functions be strict (return
 null if any input is null) or not (return null only if all inputs are
 null, else return the largest/smallest of the non-null inputs)?
 

[snip]

 
   Please, if you think the Oracle way is good, correct it.
 
 I'm still favoring non-strict but it deserves more than two votes.
 Anybody else have an opinion?
 
 regards, tom lane
 

My $0.02: I'd prefer the non-strict version.
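
Spelled out, the non-strict reading I'm voting for behaves like this:

SELECT GREATEST(1, NULL, 3);  -- 3: NULL inputs are ignored
SELECT GREATEST(NULL, NULL);  -- NULL: all inputs are NULL

whereas the strict reading would return NULL for both.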

-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] bitmap scans, btree scans, and tid order

2005-05-16 Thread Mike Rylander
On 5/16/05, Tom Lane [EMAIL PROTECTED] wrote:
 regression=# explain analyze select * from tenk1 where unique1 between 100 and 1000;
                                                        QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on tenk1  (cost=9.58..381.53 rows=930 width=244) (actual time=6.185..18.034 rows=901 loops=1)
    Recheck Cond: ((unique1 >= 100) AND (unique1 <= 1000))
    ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..9.58 rows=930 width=0) (actual time=4.522..4.522 rows=901 loops=1)
          Index Cond: ((unique1 >= 100) AND (unique1 <= 1000))
  Total runtime: 23.784 ms
 (5 rows)

 regression=# explain analyze select * from tenk1 where unique2 between 100 and 1000;
                                                        QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------
  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..45.88 rows=805 width=244) (actual time=0.154..14.856 rows=901 loops=1)
    Index Cond: ((unique2 >= 100) AND (unique2 <= 1000))
  Total runtime: 20.331 ms
 (3 rows)
 

Tom (or anyone with some round tuits and CVS-tip savvy), if you have a
chance at some point would you post the non-bitmap version of the
query for tenk2 from above?  I'd be very interested to see if the
bitmap forced TID order fetch actually does help, or if it's
outweighed by the bitmap setup overhead.
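
If it helps, I believe the non-bitmap plan can be forced with the GUC
that went in alongside the bitmap scan code (a sketch):

set enable_bitmapscan = off;
explain analyze select * from tenk1 where unique1 between 100 and 1000;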

TIA

-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] FunctionCallN improvement.

2005-02-01 Thread Mike Rylander
On Tue, 01 Feb 2005 16:23:56 -0500, Tom Lane [EMAIL PROTECTED] wrote:
 a_ogawa [EMAIL PROTECTED] writes:
  I made the test program to measure the effect of this macro.
 
 Well, if we're going to be tense about this, let's actually be tense
 about it.  Your test program isn't a great model for what's going to
 happen in fmgr.c, because you've designed it so that Nargs cannot be
 known at compile time.  In the fmgr routines, Nargs is certainly a
 compile-time constant, and so implementations that can exploit that
 will have an advantage.
 

big snip

Here are some numbers for AMD64 (gcc -O2 -I
/opt/include/postgresql/server/ pg_test.c -o pg_test):

[EMAIL PROTECTED] miker $ time ./pg_test -memset 10
test MemSetLoop(2): 10

real1m15.896s
user1m15.881s
sys 0m0.006s
[EMAIL PROTECTED] miker $ time ./pg_test -origmacro 10
test OrigMacro(2): 10

real0m4.217s
user0m4.215s
sys 0m0.001s
[EMAIL PROTECTED] miker $ time ./pg_test -setmacro 10
test SetMacro(2): 10

real0m4.217s
user0m4.216s
sys 0m0.001s
[EMAIL PROTECTED] miker $ time ./pg_test -unrolled 10
test Unrolled(2): 10

real0m4.218s
user0m4.215s
sys 0m0.002s


and now with -O6:

[EMAIL PROTECTED] miker $ time ./pg_test -memset 10
test MemSetLoop(2): 10

real1m13.624s
user1m13.542s
sys 0m0.001s
[EMAIL PROTECTED] miker $ time ./pg_test -origmacro 10
test OrigMacro(2): 10

real0m2.929s
user0m2.926s
sys 0m0.001s
[EMAIL PROTECTED] miker $ time ./pg_test -setmacro 10
test SetMacro(2): 10

real0m2.929s
user0m2.926s
sys 0m0.000s
[EMAIL PROTECTED] miker $ time ./pg_test -unrolled 10
test Unrolled(2): 10

real0m2.510s
user0m2.508s
sys 0m0.001s


Now with NARGS = 5, -O2:

[EMAIL PROTECTED] miker $ time ./pg_test -memset 10
test MemSetLoop(5): 10

real1m15.204s
user1m15.175s
sys 0m0.002s
[EMAIL PROTECTED] miker $ time ./pg_test -origmacro 10
test OrigMacro(5): 10

real0m10.027s
user0m10.022s
sys 0m0.001s
[EMAIL PROTECTED] miker $ time ./pg_test -setmacro 10
test SetMacro(5): 10

real0m4.177s
user0m4.177s
sys 0m0.000s
[EMAIL PROTECTED] miker $ time ./pg_test -unrolled 10
test Unrolled(5): 10

real0m5.013s
user0m5.011s
sys 0m0.000s

And once more, with -O6:

[EMAIL PROTECTED] miker $ time ./pg_test -memset 10
test MemSetLoop(5): 10

real1m47.090s
user1m46.972s
sys 0m0.000s
[EMAIL PROTECTED] miker $ time ./pg_test -origmacro 10
test OrigMacro(5): 10

real0m8.367s
user0m8.358s
sys 0m0.000s
[EMAIL PROTECTED] miker $ time ./pg_test -setmacro 10
test SetMacro(5): 10

real0m3.349s
user0m3.345s
sys 0m0.000s
[EMAIL PROTECTED] miker $ time ./pg_test -unrolled 10
test Unrolled(5): 10

real0m3.347s
user0m3.343s
sys 0m0.000s


Hope the numbers help!

-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] FunctionCallN improvement.

2005-02-01 Thread Mike Rylander
Sorry, forgot the compiler version.

gcc (GCC) 3.3.4 20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6)


-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Implementing Bitmap Indexes

2005-01-29 Thread Mike Rylander
On Sat, 29 Jan 2005 19:41:20 +0200, Victor Y. Yegorov [EMAIL PROTECTED] wrote:
 * Paweł Niewiadomski [EMAIL PROTECTED] [29.01.2005 17:45]:
   I'd like to implement bitmap indexes and want your comments. Here is
   an essence of what I've found regarding bitmaps for the last month.
  
  Do you think it would be possible to work on it as a team?
 
 Yes, why not.
 
 But everything depends on the community; maybe bitmaps will be realized as a
 contrib or pgfoundry module. The only thing is --- I don't know if that is
 possible for indexes.

For on-disk bitmap indexes, yes.  I don't see any reason this couldn't
be done with GiST, perhaps even as a generalization of the index stuff
in the int_array contrib module.  But the bitmaps that Tom has been
advocating, the ones used to join two index scans, will require a new
planner Op.

As a side note, wouldn't the in-memory bitmaps pretty much kill the
need for multicolumn indexes?  It seems that they would be able to
join index scans on the same table, and then there would be no need
for industrial strength cross-column correlation stats.  The planner
would be able to choose a multi index scan based on multiple single
column stat entries and completely sidestep the need for precalculated
cross-column correlations.  Am I getting that right?

-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Implementing Bitmap Indexes

2005-01-29 Thread Mike Rylander
On Sun, 30 Jan 2005 11:07:59 +1100, Neil Conway [EMAIL PROTECTED] wrote:
 Mike Rylander wrote:
  For on-disk bitmap indexes, yes.  I don't see any reason this couldn't
  be done with GiST
 
 It might be possible to do it with GiST, but GiST is designed for
 implementing tree-structured indexes; I don't think it's the right tool
 for the job.

For the initial example where the index is implemented as a set of
unique keys from the table and a bitmap for each key, this would look
like a unique index, but with an extra datum at each index node to hold
the bitmap for that key.  If implemented that way, an augmented B-Tree
structure would work fine.  At least that's how I would imagine an
on-disk bitmap index would work.  I suppose that would make the index
much more efficient for high-cardinality values, no?

-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Implementing Bitmap Indexes

2005-01-29 Thread Mike Rylander
On Sun, 30 Jan 2005 12:15:20 +1100, Neil Conway [EMAIL PROTECTED] wrote:
 It might _work_, I just don't see the point. Given an attribute of a
 heap relation that has N distinct values and T tuples, you need to store
 
 - N bitmaps, each of T bits (before compression)
 - T ctids
 - a way to map from a bit in one of the bitmaps to a heap tuple
 - a way to decide which bitmap(s) to use for a given index scan
 
 I don't see why it's a win to organize this data in a tree. Why not
 store the ctids in a simple array? You then know that bit K of any
 bitmap refers to entry K of the ctid array. You'd also need some meta
 data to figure out which bitmap to use for a given scankey, but it
 should be pretty easy to do that efficiently.

OK, I think it just clicked.  I was seeing a tree for the initial
lookup to find the right bitmaps to scan.  Does that seem like too much
overhead for the first step?

So, pick the bitmap(s) based on the key, scan the bitmaps and combine
them based on the WHERE condition combination type, and as you find
matching bits you toss the ctids into a matching array.  Then it's a
fast ctid scan.  That it?  I'm very interested in this after reading a
bit (heh he) about bitmap indexes.  Here's how I'm visualizing it now:

For a query like SELECT * FROM table WHERE a IN (1,3) ...

Index on table.a looks like:

bitmaps
1 | 001001001001000
2 | 100000010100001
3 | 010110100010110

ctids
1 | {2,5,8,11}
2 | {0,7,9,14}
3 | {1,3,4,6,10,12,13}


The index scan would do a bitwise OR on bitmaps 1 and 3, find the
positions of the 1s, jump to those positions in the ctid array, and
bounce to the heap for the values.


-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] DBD::PgSPI 0.02

2004-12-06 Thread Mike Rylander
On Mon, 6 Dec 2004 00:27:18 -0500 (EST), [EMAIL PROTECTED]
[EMAIL PROTECTED] wrote:
 Hello,
 
 A short note that I've updated DBD::PgSPI version 0.02 to CPAN.
 
 There are no new features - but the code now expects (and works with)
 reasonably decent versions of perl (5.8.x) and pgsql (8.x).

Just so that you have some info, I've been using DBD::PgSPI with Pg
8.0 since beta 1.  The only restriction I've run into with the old
code is that it doesn't like the DBD 'do' method.  I have to use
execute/fetchX or selectX, but other than that it seems to work.  I'll
be grabbing the update to test soon.

 
 No warranty is given, this code compiles and 'scratches my itch'. If it
 happens to scratch yours, more the merrier.
 

Thanks for scratching your itch!  I'm sure you're merrier than you know. :)

 -alex
 
 


-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] DBD::PgSPI 0.02

2004-12-06 Thread Mike Rylander
On Mon, 6 Dec 2004 08:17:29 -0500 (EST), [EMAIL PROTECTED]
[EMAIL PROTECTED] wrote:
 On Mon, 6 Dec 2004, Mike Rylander wrote:
 
  Just so that you have some info, I've been using DBD::PgSPI with Pg 8.0
  since beta 1.  The only restriction I've run into with the old code is
  that it doesn't like the DBD 'do' method.  I have to use execute/fetchX
  or selectX, but other than that it seems to work.  I'll be grabbing the
  update to test soon.
 'do' seems to work for me. What happens for you?

With v. 0.01 the statement just doesn't seem to execute.  I know
that's odd, because it's supposed to be doing a prepare/execute
internally, but nothing happens.

 
 I put a version of code with a bit more fixes from comments onlist to
 www.pilosoft.com/PgSPI/DBD-PgSPI-0.03pre.tar.gz
 
 Please download and try it.
 

I'll grab it now, but I'm not sure when I'll have time to test it. 
I'm mired in Unicode work right now...  I'll get back to you as soon
as I've had a chance to work with it, though.

 -alex
 
 

Again, thanks for all your work on PgSPI.  I'll be moving all my
plperlu functions out to a module and using PgSPI to allow me to run
the code from my middleware layer as well as in-DB.  That way I only
need to write the logic once.

-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Error handling in plperl and pltcl

2004-12-01 Thread Mike Rylander
On Wed, 01 Dec 2004 10:29:17 +0100, Thomas Hallgren
[EMAIL PROTECTED] wrote:
 Richard Huxton wrote:
 
  But is the problem not that forgetting to use SAVEPOINT can get us in
  trouble with clearing up after an exception?
 
 I fail to see how that's different from forgetting to use pgtry instead
 of try.

I see that as a non-starter.  At least in the case of perl, we can
actually hide pgeval behind the standard eval.  If pgeval were
equivalent to, say, 'savepoint(foo); CORE::eval @_;' then the onus
is still on the developer to use 'eval', but that is a familiar
concept to defensive developers.

-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] patch: plpgsql - access records with rec.(expr)

2004-11-23 Thread Mike Rylander
Just to put in my $0.02, I would absolutely love to see this
functionality included in plpgsql.  With some extra error checking for
the known changing-datatype failure, and docs that mention that
limitation, I'd say this is a great extension to the language.
plpgsql feels quicker than the interpreted PLs, and it's far easier
than C to work with for writing triggers, so this patch makes plpgsql
a much more attractive target for general-purpose stored procs.  And
my gut feeling is that an EVALUATE statement would be significantly
slower than this.

In any case, thanks for the great work, Matt.  Please, CORE, include this one!

As an alternative, what would be the possibility of creating a new PL
as a contrib module, say PLPGSQL_NG, to move forward with extensions
like this and perhaps EVALUATE?
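
For reference, here is the sort of trigger the patch enables -- a
sketch using the proposed dynamic-access syntax, so it is not valid in
stock plpgsql, and the trigger itself is hypothetical:

CREATE OR REPLACE FUNCTION notice_change() RETURNS trigger AS $$
DECLARE
    colname text := TG_ARGV[0];
BEGIN
    -- dynamic field access via the proposed rec.(expr) syntax
    IF NEW.(colname) IS DISTINCT FROM OLD.(colname) THEN
        RAISE NOTICE 'column % changed', colname;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;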

-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer

On 23 Nov 2004 09:03:10 +, Matt [EMAIL PROTECTED] wrote:
   See your point. But what about NEW.($1)?
 
  I don't follow -- what do you mean?
 
 I want to be able to write a trigger function that accesses a
 column passed as an argument to the function in the row that caused the
 trigger. This is my use case.
 
 I guess that would actually written NEW.(TG_ARGV[1]).
 
  (BTW, I think my comment also applies to variables of type text and
  similar -- I think the patch would be a lot simpler if you just
  implement access to record fields by ordinal position, and don't
  implement access by field name.)
 
 Yes, it would be marginally simpler: I'd still have to call
 exec_eval_datum() on the variable and check whether it could be
 evaluated to an integer (trigger args are all text AFAIK). The only
 difference would be throwing an error if it wasn't, instead of making
 use of the value... and a slightly less readable 'create trigger'
 statement.
 
 It would be a good idea to check that the variable was either a constant
 or a trigger arg. This would stop the looping problem, since the type of
 the underlying field couldn't change.
 
 But I've somehow got the feeling that this sort of thing isn't the
 issue. The issue is whether we want to allow dynamic access to columns
 in any syntax at all. A simple yes or no would do :)
 
 Matt
 
 BTW: here's the original post adding the rec.(3) syntax to the TODO:
 http://archives.postgresql.org/pgsql-hackers/2003-07/msg00425.php
 here's someone else who tried something very similar:
 http://archives.postgresql.org/pgsql-hackers/2003-09/msg00533.php
 




Re: [HACKERS] Odd plpgsql behaviour

2004-11-15 Thread Mike Rylander
On Mon, 15 Nov 2004 15:12:24 +0800, Christopher Kings-Lynne
[EMAIL PROTECTED] wrote:
 On 7.4:
 
 This is what we wanted to do:
 
 IF TG_OP = 'INSERT' OR (TG_OP = 'UPDATE' AND NEW.name != OLD.name) THEN
   EXECUTE x;
 END IF;
 
 However, we had to write it like this:
 
 IF TG_OP = 'INSERT' THEN
   EXECUTE x;
 ELSIF TG_OP = 'UPDATE' AND NEW.name != OLD.name THEN
   EXECUTE x;
 END IF;
 
 Because in the first case it would complain that OLD.name wasn't
 defined, if the trigger was NOT an update.
 
 OK, but the second case works??!?!  Is this a weird peculiarity of the
 pl/pgsql lazy evaluation rules?  Why doesn't the first one work if the
 second one does?

IIRC, the reason for this is that the entire IF test is passed to the
SQL engine as a SELECT statement after replacing the TG_* identifiers
with their respective values.

Your first example is essentially

   IF (SELECT ((TG_OP = 'INSERT') OR (TG_OP = 'UPDATE' AND NEW.name !=
OLD.name)) IS TRUE) ...

In this case, since OLD.name does not exist during INSERT it cannot be
replaced.  Perhaps someone else can shed a little more light on this.

-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer



Re: [HACKERS] APR 1.0 released

2004-10-08 Thread Mike Rylander
A while back I was looking at the backend code in preparation for
beginning to look at parallelization techniques for PG ;)... My
thought was, instead of trying to parallelize each individual plan node
(multi-process sort, etc.), to look at creating worker
threads/processes for each plan node as a whole.  For example, take a
plan that looks like this:

                                                        QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------
  Subquery Scan metarecord_field_entry_view  (cost=5.32..4038.80 rows=21 width=112)
    ->  Append  (cost=5.32..4038.59 rows=21 width=112)
          ->  Subquery Scan "*SELECT* 1"  (cost=5.32..5.33 rows=1 width=74)
                ->  HashAggregate  (cost=5.32..5.32 rows=1 width=74)
                      ->  Index Scan using tmr_fe_field on metarecord_title_field_entry  (cost=0.00..5.31 rows=1 width=74)
                            Index Cond: (field = 'added_entry_author'::text)
                            Filter: ((field_class = 'title'::text) AND (value ~~* '% joe %'::text))
          ->  Subquery Scan "*SELECT* 2"  (cost=4031.02..4031.20 rows=18 width=62)
                ->  HashAggregate  (cost=4031.02..4031.02 rows=18 width=62)
                      ->  Seq Scan on metarecord_author_field_entry  (cost=0.00..4030.79 rows=18 width=62)
                            Filter: ((field_class = 'author'::text) AND (field = 'added_entry_author'::text) AND (value ~~* '% joe %'::text))
          ->  Subquery Scan "*SELECT* 3"  (cost=2.03..2.04 rows=1 width=81)
                ->  HashAggregate  (cost=2.03..2.03 rows=1 width=81)
                      ->  Index Scan using smr_fe_field on metarecord_subject_field_entry  (cost=0.00..2.02 rows=1 width=81)
                            Index Cond: (field = 'added_entry_author'::text)
                            Filter: ((field_class = 'subject'::text) AND (value ~~* '% joe %'::text))
          ->  Subquery Scan "*SELECT* 4"  (cost=0.01..0.02 rows=1 width=112)
                ->  HashAggregate  (cost=0.01..0.01 rows=1 width=112)
                      ->  Seq Scan on metarecord_misc_field_entry  (cost=0.00..0.00 rows=1 width=112)
                            Filter: ((field_class = 'misc'::text) AND (field = 'added_entry_author'::text) AND (value ~~* '% joe %'::text))

The optimizer would look at each node as it walked down the tree and
see that the 'Append' node has multiple peer child nodes.  It would look
at the cost estimate of each child node, and if that cost is greater
than the average cost across all the child nodes it would spin off a
worker thread/process to handle gathering that sub-resultset.

In any case, I've no time to even *start* looking into something like
that.  But even if I did, am I all wet?

--miker

On Fri, 8 Oct 2004 11:56:27 -0400 (EDT), Bruce Momjian
[EMAIL PROTECTED] wrote:
 Neil Conway wrote:
  Marc G. Fournier wrote:
   Do we have 'make backend thread safe' listed yet?  As I recall it, until
   that gets done, parallelization of anything was considered to be a
   relatively onerous task, no?
 
  ISTM there's no reason we couldn't parallelize query execution using the
  same IPC techniques that we use now. What would be the advantage of
  using threads instead?
 
 Separate processes.  Yes, we could do that too and the item mentions that.
 
 --
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
 




Re: [HACKERS] Mislabeled timestamp functions (was Re: [SQL] [NOVICE] date_trunc'd timestamp index possible?)

2004-10-03 Thread Mike Rylander
Not that my 2c is worth 1c, but I second this.  I'd rather initdb now
than get bitten by some catalog difference when I move my DB into
production. :)

--miker

On Sat, 02 Oct 2004 14:22:50 -0400, Tom Lane [EMAIL PROTECTED] wrote:
[...]
 
  I'd prefer if all users of 8.0 were guaranteed to have the same catalog.
 
 Well, there's something to be said for that viewpoint too.  Anyone else
 feel the same?
[...]



Re: [HACKERS] FYI: 8.0beta2 on Monday

2004-08-29 Thread Mike Rylander
Tom Lane wrote:

 Just so you know --- core has agreed that it's about time for beta2.
 If you've got any must fix issues, please get 'em in over the weekend.

Will we be looking at a re-initdb with beta2?  I didn't notice any changes
that would force it, but just to be clear...

 
 regards, tom lane
 

TIA

--miker



[HACKERS] nntp server craziness

2004-08-06 Thread Mike Rylander
Seems the NNTP server went wonky again... 

TIA!

-- 
Mike Rylander
[EMAIL PROTECTED]

  Indentation is a wonderful form of commentary from
  programmer to programmer, but its symbology is
  largely wasted on the computer. We don't tell poets
  how to format their poetry.
-- Larry Wall



Re: [HACKERS] Portals and nested transactions

2004-07-14 Thread Mike Rylander
Tom Lane wrote:

 Alvaro Herrera [EMAIL PROTECTED] writes:
 On Tue, Jul 13, 2004 at 04:57:06PM -0400, Tom Lane wrote:
 I've been thinking about what to do with cursors in subtransactions.
 
 So within this proposal, a query executed by normal means will get its
 resources saved in the transaction ResourceOwner?
 
 No, *all* queries are executed within portals.  The reason we need a
 transaction ResourceOwner is because query parsing/planning happens in
 advance of creating the portal, so we need someplace to keep track of
 resources acquired during that process.
 
 How is the unnamed portal affected by it?
 
 Same as the rest.
 
 I don't recall whether SPI creates actual portals, but we'd definitely
 want it to create a new ResourceOwner for queries it runs.
 
 On the other hand, some people supported the idea that v3 Bind portals
 should behave nontransactionally, while DECLARE portals should behave
 transactionally.  Maybe we could make that a property of the portal, or
 even a user-selectable property (where we would define a reasonable
 default behavior).
 
 This is certainly possible.  Whether it's a good idea needs further
 discussion...

I didn't want to be the first to speak up on this as I'm relatively new to
the group (so thank you Alvaro), but I would definitely prefer the option
of either trans or non-trans behavior.  I can see using the non-trans
behavior in a cursor-based FOR loop with a savepoint/subtrans allowing me
to fail on row x and continue on to row x+1 immediately.  Then, after
choosing trans-mode, I could implement a multi-strategy row processor.
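
As a sketch of what I mean (this leans on per-row subtransactions that
don't exist yet, and work_queue / process_row() are invented names):

CREATE OR REPLACE FUNCTION process_all() RETURNS VOID AS '
DECLARE
	rec RECORD;
BEGIN
	FOR rec IN SELECT * FROM work_queue LOOP
		BEGIN
			-- each row is attempted inside its own subtransaction
			PERFORM process_row(rec.id);
		EXCEPTION WHEN OTHERS THEN
			-- this row''s work is rolled back; carry on with row x+1
			RAISE NOTICE ''row % failed, continuing'', rec.id;
		END;
	END LOOP;
END;
' LANGUAGE 'plpgsql';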

Of course, just to be difficult, my ideal default would be:

 Q1 -- Portals close
 Q2 -- Portals do NOT roll back to previous state.

However, I do see the logical inconsistency in that.  But then again,
subtransactions/savepoints are not ACID, so it seems to be implementation
dependent.

 
 regards, tom lane
 

-- 
--miker



Re: [HACKERS] Portals and nested transactions

2004-07-14 Thread Mike Rylander
Mike Rylander wrote:

 Tom Lane wrote:
 
 Alvaro Herrera [EMAIL PROTECTED] writes:
 On Tue, Jul 13, 2004 at 04:57:06PM -0400, Tom Lane wrote:
 I've been thinking about what to do with cursors in subtransactions.
 
 So within this proposal, a query executed by normal means will get its
 resources saved in the transaction ResourceOwner?
 
 No, *all* queries are executed within portals.  The reason we need a
 transaction ResourceOwner is because query parsing/planning happens in
 advance of creating the portal, so we need someplace to keep track of
 resources acquired during that process.
 
 How is the unnamed portal affected by it?
 
 Same as the rest.
 
 I don't recall whether SPI creates actual portals, but we'd definitely
 want it to create a new ResourceOwner for queries it runs.
 
 On the other hand, some people supported the idea that v3 Bind portals
 should behave nontransactionally, while DECLARE portals should behave
 transactionally.  Maybe we could make that a property of the portal, or
 even a user-selectable property (where we would define a reasonable
 default behavior).
 
 This is certainly possible.  Whether it's a good idea needs further
 discussion...
 
 I didn't want to be the first to speak up on this as I'm relatively new to
 the group (so thank you Alvaro), but I would definitely prefer the option
 of either trans or non-trans behavior.  I can see using the non-trans
 behavior in a cursor based FOR loop with a savepoint/subtrans allowing me
 to fail on row x and continue on to row x+1 immediately.  Then, after
 choosing trans-mode, I could implement a multi-strategy row processor.
 
 Of course, just to be difficult, my ideal default would be:
 
  Q1 -- Portals close
  Q2 -- Portals do NOT roll back to previous state.
 
 However, I do see the logical inconsistency in that.  But then again,
 subtransactions/savepoints are not ACID, so it seems to be implementation
 dependent.
 

To make that a little more specific, something along the lines of:

DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
CURSOR [ { WITH | WITHOUT } HOLD ] FOR query
[ FOR { READ ONLY | UPDATE [ OF column [, ...] ] } ]
[ IN { LEXICAL | GLOBAL } SCOPE ]
^^^

... or some such... I think in perl. :)

 
 regards, tom lane
 
 

-- 
--miker



Re: [HACKERS] plperl (7.5)

2004-07-13 Thread Mike Rylander
posted & mailed

Tom Lane wrote:

 Alvaro Herrera [EMAIL PROTECTED] writes:
 On Sat, Jul 10, 2004 at 09:18:28PM -0700, elein wrote:
 The new plperl returns sets by having
 the function return an array.
 
 I think RETURN NEXT does the same thing anyway ... they just store
 tuples in a Tuplestore and then the whole thing is returned.  So the
 function actually doesn't return until the whole function is done.
 
 However, it's likely that the tuplestore infrastructure can deal
 comfortably with sets far larger than a Perl array can.  (For one thing,
 it will swap tuples out to a temp file on disk once the set size exceeds
 work_mem.)  I think elein's concern is justified, unless someone can
 produce a test case showing that plperl actually performs OK with a
 large result set.
 
 As a simple test for plpgsql's speed with such things, I tried
 
 create function seq(int) returns setof int as '
 begin
   for i in 1..$1 loop
 return next i;
   end loop;
 return;
 end' language plpgsql;
 
 regression=# \timing
 Timing is on.
 regression=# select count(*) from seq(10);
  count
 
  10
 (1 row)
 
 Time: 396.524 ms
 regression=# select count(*) from seq(100);
   count
 -
  100
 (1 row)
 
 Time: 3615.115 ms
 regression=# select count(*) from seq(1000);
   count
 --
  1000
 (1 row)
 
 Time: 40356.972 ms
 
 My Perl is too rusty to immediately whip out the equivalent incantation
 in plperl; would someone like to compare the timings on their own machine?
 

I don't have access to a machine with plperl installed, but it would be very
close to this:

create function seq(int) returns setof int as $$
my $count = shift;
my $ret = [];
for my $i ( 1 .. $count ) {
push @$ret, $i;
}
return $ret;
$$ language 'plperl';

... hmmm... the push line may need to be:

push @$ret, { val => $i };

Hope it helps!
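
One more thought, hedging a bit: if a later plperl grew a row-at-a-time
return_next() call (as plpgsql already has), the big-array problem would
go away, since rows could be spooled to the tuplestore as they are
produced instead of being held in Perl memory.  Something like:

create function seq(int) returns setof int as '
    my $count = shift;
    return_next($_) for 1 .. $count;
    return undef;
' language 'plperl';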

 regards, tom lane
 

-- 
--miker



Re: [HACKERS] Nested Transactions, Abort All

2004-07-13 Thread Mike Rylander
posted & mailed

Dennis Bjorklund wrote:

 On Sat, 10 Jul 2004, Mike Rylander wrote:
 
 They do, if only to make particular constructs easier to write.  This is
 an opinion, but for example an EXCEPTION framework for plpgsql would be
 easier to implement and use if it used the nested transactions rather
 than savepoint syntax:
 
 CREATE FUNCTION blah () RETURNS INT LANGUAGE PLPGSQL AS '
 BEGIN
 BEGIN NESTED;
 do some work...
 BEGIN NESTED;
 do other work...
 EXCEPTION WHEN SQLSTATE = already_exists THEN
 do alternate work with its own error checking...
 END NESTED;
 EXCEPTION WHEN SQLSTATE = fkey_violation THEN
 ROLLBACK NESTED;
 END NESTED;
 END;';
 
 I realize this can be done with nested savepoints and that is what the
 spec requires,
 
 Lets look at what it can look like:
 
 BEGIN
   SAVEPOINT nested;
   do some work...
   SAVEPOINT nested2;
   do other work...
   EXCEPTION WHEN SQLSTATE = already_exists THEN
   ROLLBACK TO SAVEPOINT nested2;
   do alternate work with its own error checking...
   RELEASE nested2;
   EXCEPTION WHEN SQLSTATE = fkey_violation THEN
   ROLLBACK TO SAVEPOINT nested;
   RELEASE nested;
 END;
 
 
 Now, in what way is this more complicated?

Only in that you need to define a name for each savepoint in order to create
the hierarchy.  And that is my point: savepoints impose more work on the
user to create a logical hierarchy, not that they cannot be used for
hierarchical structures.

 
 I'm not 100% sure how the exceptions that you used above work. Do they
 always roll back the transaction they are in? In one of the exceptions you
 did a rollback but not in the other. In my example I added a rollback in
 the first exception handler. Maybe you forgot it there?

That was just pseudo-code and wholly invented in my head, but based on an
earlier example of possible EXCEPTION syntax.  The idea is that when a
subtransaction is in an aborted state due to an error, the EXCEPTION clause
would implicitly roll back that subtransaction and open a new transaction
from its own block.  This EXCEPTION subtrans is only used in the case of an
error in the matching BEGIN NESTED block, and the two share the COMMIT
statement, syntactically speaking.  Think of it as a try { ... } catch
[type] { ... } finally { commit } type structure.
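
For comparison, here is how that try/catch shape could look if plpgsql
grew implicit per-block subtransactions.  This is a sketch of the idea,
not working syntax today, with plausible condition names standing in for
the already_exists / fkey_violation placeholders above:

CREATE FUNCTION blah () RETURNS INT LANGUAGE PLPGSQL AS '
BEGIN
	-- do some work...
	BEGIN
		-- do other work...
	EXCEPTION WHEN unique_violation THEN
		-- the inner block''s work is implicitly rolled back here;
		-- do alternate work with its own error checking...
		NULL;
	END;
	RETURN 1;
EXCEPTION WHEN foreign_key_violation THEN
	-- the outer block''s work is rolled back; report failure
	RETURN 0;
END;
';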

 
 In any case, I don't see this as any harder than your example.
 

It's not harder, per se, but it does impose a more difficult-to-maintain
syntax, IMHO.

  Savepoints have more possibilities, you can invalidate older savepoints
 than the last (with subtransactions you can only commit/rollback the
  last).
 
 This implies that savepoints are flat.  It won't be that way under the
 covers, but it does give that impression, and flat savepoint space is
 definitely suited to a different class of problems than nested
 transactions.
 
 First, my claim above was wrong. As Gavin pointed out in another mail, if
 one has savepoints p1 and p2 and releases p1 then p2 is also released.
 It's possible to implement both kinds of behaviour using Alvaro's work, but
 the standard demands the simpler one where p2 is also released.
 
 Now, about the flatness. Savepoints are not flat. They are sort of flat in
 a savepoint level. But, for example when you call a function you get a new
 savepoint level. I actually don't want to call it flat at all. The example
 above does not overwrite the savepoints nested and nested2 that might
 exist before the call, since this is a higher savepoint level.
 

OK, savepoints are not REALLY flat, but they are not hierarchically nested
either.  They are cumulative.  They can be used, as you showed above, in a
hierarchy, but as I said, they are not by their nature nested.

 I'm not sure exactly what it is that defines a new savepoint level, but a
 function call does and maybe some other things.
 

As for savepoint levels in functions, that is a scoping issue imposed by the
functions themselves, not by the savepoint syntax.  It would be nonsensical
to roll back to a savepoint outside a function, just as it would be
nonsensical to roll back the outer transaction from within the function.
Allowing either would cause undesired action at a distance and possibly
violate the A in ACID.  The way I see it, savepoint levels should be
specified by function calls, as you said, and by the transaction nesting
level.

 BTW, I would imagine that savepoints will be implemented as nested
 transactions with detachable labels... the label can move from a
 transaction to one of its descendants, and that outer (sub)transaction
 will be implicitly COMMITed with its parent.
 
 Yes. That's my view as well.
 

Well, at least we agree on that ;)

 Alvaro found

Re: [HACKERS] Nested Transactions, Abort All

2004-07-13 Thread Mike Rylander
posted & mailed

Dennis Bjorklund wrote:

 On Fri, 9 Jul 2004, Mike Rylander wrote:
 
 Nested transactions and savepoints serve two different purposes.  They
 have some overlap, but for the most part solve two distinct problems.
 
 Then show some examples that illustrate the difference. So far all
 examples shown that uses subtransactions could just as well have been
 written using savepoints.
 

After seeing some more snippets of the SQL2003 spec it seems that this is
true, and that the difference is more syntactic than functional.
This does not seem to be the case for Oracle (the other major
implementation that has been cited for SAVEPOINT syntax), as savepoints in
Oracle are not logically nested.  Note that I am going on the statements
from others on this list for this point...

 I don't agree that they have two different purposes.

They do, if only to make particular constructs easier to write.  This is an
opinion, but for example an EXCEPTION framework for plpgsql would be easier
to implement and use if it used nested-transaction rather than savepoint
syntax:

CREATE FUNCTION blah () RETURNS INT LANGUAGE PLPGSQL AS '
BEGIN
BEGIN NESTED;
do some work...
BEGIN NESTED;
do other work...
EXCEPTION WHEN SQLSTATE = already_exists THEN
do alternate work with its own error checking...
END NESTED;
EXCEPTION WHEN SQLSTATE = fkey_violation THEN
ROLLBACK NESTED;
END NESTED;
END;';

I realize this can be done with nested savepoints and that is what the spec
requires, but in other major implementations of savepoints this nested
exception handling would be more difficult to write.  Again, simply my
opinion.

 
 I don't think so, especially as there has been some talk of implementing
 savepoints as a subset of nested transactions.
 
 It is not a subset. It's the other way around. Nested transactions are a
 subset of savepoints

Perhaps I got my wires crossed a bit there.  And again, after looking at
some more of the SQL2003 spec this does seem to be the case.  I cry your
pardon! :)

 
 Savepoints have more possibilities, you can invalidate older savepoints
 than the last (with subtransactions you can only commit/rollback the
 last).

This implies that savepoints are flat.  It won't be that way under the
covers, but it does give that impression, and flat savepoint space is
definitely suited to a different class of problems than nested
transactions.

 If you don't use that then it's exactly the same as 
 subtransactions.
 

I don't see this.  Nested transactions present a hierarchical interface to the
user, savepoints don't, especially considering that those familiar with
PL/SQL know that savepoints are not nested.  Now, savepoints can be used IN
a hierarchy, but they do not DEFINE one as nested transactions do.

I look at it this way: Let's have both, and where a user wants a flat
transaction space, as that may suit the needs of the problem, they will use
SAVEPOINT syntax; if the user would prefer an explicit hierarchy they can
use nested transactions.  Everyone wins!
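
For the record, a minimal sketch of the flat-looking standard syntax
(table t is assumed to exist):

BEGIN;
INSERT INTO t VALUES (1);
SAVEPOINT p1;
INSERT INTO t VALUES (2);    -- the part we might regret
ROLLBACK TO SAVEPOINT p1;    -- undoes only the second insert
RELEASE SAVEPOINT p1;
COMMIT;                      -- the first insert survives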

 The only feature subtransactions have that savepoints doesn't is the
 lack of names. Every savepoint have a name. If we want an extension it
 could be to get the database to generate a fresh savepoint name. The
 client can of course also generate unique savepoint names if it want.

I don't think they can be compared like that, feature for feature.  Although
I agree with you that they provide extremely similar feature sets, they
present different interfaces to the user.  They may end up being backed by
the exact same code, but the syntax and logical structure will surely
differ, and when a user wants labeled rollback points they will use
savepoints.  When s/he wants hierarchical rollback points they will use the
nested-transaction syntax.

BTW, I would imagine that savepoints will be implemented as nested
transactions with detachable labels... the label can move from a
transaction to one of its descendants, and that outer (sub)transaction will
be implicitly COMMITed with its parent.

 
 That subtransactions do more than savepoints is just smoke and mirrors. So
 far there have been no example to validate that point of view, and I don't
 think there will be any. If anyone know of something that you can do with
 subtransactions and not with savepoints, please speak up.
 

You have opened my eyes to the fact that savepoints and nested transactions
can be used for most of the same problems, however I don't see this as a
one-or-the-other proposition.

Alvaro found it easier to implement nested transactions, so he forged ahead and
did it.  Now, because of good design or simple luck, we should be able to
implement savepoints fairly easily.  To me this is the best we could have
hoped for, as it means that not only will we support the entire SQL2003
spec WRT savepoints, we actually get to present a richer

Re: [HACKERS] Nested Transactions, Abort All

2004-07-09 Thread Mike Rylander
Dennis Bjorklund wrote:

 On Fri, 9 Jul 2004, Bruce Momjian wrote:
 
 I think we agreed on BEGIN NESTED/COMMIT NESTED, and START NESTED
 TRANSACTION and COMMIT NESTED TRANSACTION.
 
 Should I read this as pg will get its own implementation of sub
 transactions and not implement the almost equivalent standard (sql99)
 savepoint feature?
 
 Will we in the future see savepoints as well?

I'm not a core developer, but that is what it looks like.

 And when that happens, should
 we then recommend that people use the standard feature and stay away from
 the pg only feature?

Nested transactions and savepoints serve two different purposes.  They have
some overlap, but for the most part solve two distinct problems.

 
 Doesn't anyone but me think this is all backwards?
 

I don't think so, especially as there has been some talk of implementing
savepoints as a subset of nested transactions.

--miker




Re: [HACKERS] Quick question regarding tablespaces

2004-07-08 Thread Mike Rylander
On Thursday 01 July 2004 09:26 pm, Scott Marlowe wrote:
 On Thu, 2004-07-01 at 18:54, Gavin Sherry wrote:
  On Thu, 1 Jul 2004, Mike Rylander wrote:
   On Thursday 01 July 2004 06:43 pm, Gavin Sherry wrote:
Hi Mike,
   
In this release, unfortunately not.
  
   That's too bad, but it's not that urgent I suppose.
  
I had some idea early on of putting rand_page_cost in pg_tablespace
and having the planner have access to it for costing. I didn't
actually get around to it but. :-(
  
   Well, I haven't looked at the PG source before, but if you have some
   specific design ideas I would be glad to help out.  I'm just not sure
   where (or when, with the official release coming (sort of) soon) to
   start, but with some pointers I'll do what I can!
 
  Well, it won't be in 7.5. Feel free to start looking at how
  random_page_cost is used in cost_index(). It might be worthwhile introducing a
  per-tablespace performance factor so that we could say that the
  cost of fetching an index tuple from tablespace A is half that of
  fetching an index tuple from tablespace B. That idea might not actually
  turn out to be a very good one once I look at it closely though.

 How about having a per cluster / database / tablespace / table type
 setup that goes in a hierarchy, if they're there.  I.e. if the database
 doesn't have its own random_page_cost, it inherits from cluster, if a
 tablespace doesn't have one, it inherits from cluster-database, and so
 on to individual tables / indexes.

I was thinking of a purely tablespace-based random_page_cost, as that variable
is tied to the access time of a particular filesystem.

 It may be that it's easier to 
 implement for them all now while doing it for tablespaces.  Just
 wondering.  I'm a user, not a hacker, so I have no idea how much that
 idea makes any sense, but I would certainly love to be able to set an
 index to have a random_page_cost effect of 1.1 while the table it lives
 in is 1.3, the tablespace 1.4, and so on.  But not required, because it
 always inherits from the parent if it doesn't have one, like stats
 target.

I have been thinking about something along the lines of a 'user_cost_push'
index attribute.  This would default to 1 (if not set) and would be
multiplied against the cost of the index's plan node, nudging the planner
toward or away from the index in cases where it consistently makes the
wrong choice (due to funky stats, etc.).

Though perhaps I am just thinking around the problem.  I know there has been 
some pretty big work done on the stats collector recently.

--miker




Re: [HACKERS] Quick question regarding tablespaces

2004-07-08 Thread Mike Rylander
On Thursday 01 July 2004 08:54 pm, Gavin Sherry wrote:
 On Thu, 1 Jul 2004, Mike Rylander wrote:
  On Thursday 01 July 2004 06:43 pm, Gavin Sherry wrote:
   Hi Mike,
  
   In this release, unfortunately not.
 
  That's too bad, but it's not that urgent I suppose.
 
   I had some idea early on of putting rand_page_cost in pg_tablespace and
   having the planner have access to it for costing. I didn't actually get
   around to it but. :-(
 
  Well, I haven't looked at the PG source before, but if you have some
  specific design ideas I would be glad to help out.  I'm just not sure
  where (or when, with the official release coming (sort of) soon) to
  start, but with some pointers I'll do what I can!

 Well, it won't be in 7.5. Feel free to start looking at how
 random_page_cost is used in cost_index().

I will start looking there.

 It might be worthwhile introducing a per 
 tablespace performance factor so that we could say that the cost of
 fetching an index tuple from tablespace A is half that of fetching an
 index tuple from tablespace B.

As random_page_cost is tied directly to the performance of a filesystem, my
thought was to leave the setting from the config file as a cluster-wide (and
default-tablespace) setting that would be overridden by a tablespace-specific
setting... i.e.

ALTER TABLESPACE ... SET RANDOM PAGE COST x.x;

or even setting a scaling factor that would shift the global random page cost.
This scaling factor would be set on all tablespaces and would have a default
of 1.  Then it could be set lower (0.5 means that tablespace is 2 times
faster than the default tablespace, or global setting).  Is that more what
you were thinking?

 That idea might not actually turn out to be 
 a very good one once I look at it closely though.


If the latter is what you were thinking, I tend to agree.  But I think a 
direct setting for each tablespace would be a very big benefit.  At least I'm 
pretty sure I would use it :)
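
To put concrete (if invented) syntax on that, I picture a per-tablespace
option along these lines; the tablespace names are made up:

ALTER TABLESPACE fast_array SET (random_page_cost = 1.1);
ALTER TABLESPACE slow_raid  SET (random_page_cost = 2.0);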

--miker

 Gavin




[HACKERS] Quick question regarding tablespaces

2004-07-01 Thread Mike Rylander
Now that PG will have tablespaces I can stick my really high I/O data on a
Fibre Channel array, and save some money by putting the rest of it (also the
majority of it) on less expensive SCSI RAID sets.  Will I also be able to
tune individual tablespaces with the likes of random_page_cost?  Sorry if I
missed this somewhere...

TIA

--miker



Re: [HACKERS] MERGE-esque Functionality (was: Adding MERGE to the TODO list (resend with subject))

2004-05-14 Thread Mike Rylander
On Tuesday 11 May 2004 09:44 am, Bruce Momjian wrote:
[snip]
Bruce Momjian kirjutas E, 10.05.2004 kell 06:58:
 Added to TODO:

   * Add MERGE command that does UPDATE, or on failure, INSERT
   
[snip]

Hello all.

I have been lurking here for a bit and the MERGE topic (on [HACKERS]) caught
my eye, so I had a go at implementing a basic version of MERGE-on-INSERT in
pl/pgsql.  It is attached below, and any comments are welcome.  I find it
useful on status-type tables, though it is not very nice when there are
many clients (it takes table locks to avoid race conditions).

Hope someone will find it useful!

-miker

-- 
-- Merge on INSERT functionality for Postgres 7.3+
-- 
-- [EMAIL PROTECTED] / 5-14-04
-- 
-- CAVEAT EMPTOR: Uses table locks to avoid concurrency issues,
-- 		  so it WILL slow down heavily loaded tables.
-- 		  This effectively puts the table into
-- 		  TRANSACTION ISOLATION LEVEL SERIALIZABLE mode.
-- 

CREATE OR REPLACE FUNCTION add_merge_on_insert (
	TEXT,  -- table name
	TEXT,  -- key column
	TEXT[] -- column list to update on deduplication
 ) RETURNS TEXT
   RETURNS NULL ON NULL INPUT
   SECURITY INVOKER
   LANGUAGE 'plpgsql'
   AS '

DECLARE
	tablename	ALIAS FOR $1;
	keycol		ALIAS FOR $2;
	updatecols	ALIAS FOR $3;
	trig		TEXT;
	arraydims	TEXT;

BEGIN
	trig := \'
		CREATE FUNCTION \' || tablename || \'_merge_on_insert_proc () RETURNS TRIGGER AS \'\'
		DECLARE
			orig \' || quote_ident(tablename) || \'%ROWTYPE;
		BEGIN
			LOCK TABLE \' || quote_ident(tablename) || \' IN ROW EXCLUSIVE MODE;

			SELECT INTO orig * FROM  \' || quote_ident(tablename) || \' WHERE \' || quote_ident(keycol)|| \' = NEW.\' || quote_ident(keycol) || \';

			IF NOT FOUND THEN
RETURN NEW;
			END IF;

			UPDATE \' || quote_ident(tablename) || \' SET \'; 

	arraydims := array_dims(updatecols);
	FOR i IN 1 .. (substring(arraydims from (position(\':\' in arraydims) + 1 ) for ( position(\']\' in arraydims) - (position(\':\' in arraydims) + 1 ) )))::INT LOOP
		trig := trig || quote_ident(updatecols[i]) || \' = COALESCE( NEW.\' || quote_ident(updatecols[i]) || \', orig.\' || quote_ident(updatecols[i]) || \'), \';
	END LOOP;

	trig := substring( trig from 0 for (character_length(trig) - 1));

	trig := trig || \' WHERE \' || quote_ident(keycol)|| \' = NEW.\' || quote_ident(keycol) || \';
			RETURN NULL;
		END;
		\'\' LANGUAGE \'\'plpgsql\'\';
	\';

	EXECUTE trig;
	EXECUTE \'
		CREATE TRIGGER \' || tablename || \'_merge_on_insert_trig BEFORE INSERT
			ON \' || quote_ident(tablename) || \' FOR EACH ROW
			EXECUTE PROCEDURE \' || tablename || \'_merge_on_insert_proc ();
	\';

	RETURN \'FUNCTION \' || tablename || \'_merge_on_insert_proc (); TRIGGER \' || tablename || \'_merge_on_insert_trig;\';
END;

';

CREATE OR REPLACE FUNCTION remove_merge_on_insert (
	TEXT  -- table name
 ) RETURNS TEXT
   RETURNS NULL ON NULL INPUT
   SECURITY INVOKER
   LANGUAGE 'plpgsql'
   AS '

BEGIN
	EXECUTE \'DROP FUNCTION \' || $1 || \'_merge_on_insert_proc () CASCADE;\';
	RETURN \'FUNCTION \' || $1 || \'_merge_on_insert_proc (); TRIGGER \' || $1 || \'_merge_on_insert_trig;\';
END;

';
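
A hypothetical usage example against a status-type table (all table,
column, and value names below are invented):

CREATE TABLE host_status (
	hostname	TEXT PRIMARY KEY,
	status		TEXT,
	seen		TIMESTAMPTZ
);

-- install the merge trigger; status and seen get updated on a key match
SELECT add_merge_on_insert('host_status', 'hostname',
                           ARRAY['status', 'seen']);

INSERT INTO host_status VALUES ('db1', 'up',   now());  -- plain insert
INSERT INTO host_status VALUES ('db1', 'down', now());  -- merged: row updated

-- and to uninstall:
SELECT remove_merge_on_insert('host_status');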

