Re: [HACKERS] Invalidating dependent views and functions

2010-05-02 Thread Scott Bailey

Robert Haas wrote:

On Fri, Apr 30, 2010 at 3:33 AM, Scott Bailey arta...@comcast.net wrote:

Proposal: Add an invalid flag to pg_class. Invalid objects would be ignored
when doing dependency checks for DDL statements. And an exception would be
thrown when an invalid object is called.

This is similar to what Oracle does. And most Oracle tools can find and
recompile invalid objects with a statement like:
ALTER VIEW foo RECOMPILE;
ALTER PACKAGE bar RECOMPILE BODY;


Keep in mind that our implementation is apparently quite different
from Oracle's.  Of course I have no idea what they do under the hood,
but we don't even store the original text of the view.  Instead, we
store a parsed version of the view text that refers to the target
objects logically rather than by name.  That has some advantages; for
example, you can rename a column in some other table that the view
uses, and nothing breaks.  You can rename a whole table that is used
by the view, and nothing breaks.  Even if we added storage for the
text of the view, recompiling it might result in some fairly
astonishing behavior - you might suddenly be referring to tables or
columns that were quite different from the ones you originally
targeted, if the old ones were renamed out of the way and new,
eponymous ones were added.
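
For example, a quick illustration (throwaway names):

CREATE TABLE t (a int);
CREATE VIEW v AS SELECT a FROM t;
ALTER TABLE t RENAME COLUMN a TO b;   -- the view keeps working
ALTER TABLE t RENAME TO t2;           -- still fine
SELECT * FROM v;                      -- output column is still reported as "a"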

I'm familiar with the view-dependency-hell problem you mention, having
fought with it (successfully, I'm pleased to say, using a big Perl
script to manage things - and also - obligatory dig here - to work
around our lack of support for CREATE IF NOT EXISTS) on many
occasions, but I don't have any brilliant ideas about how to solve it.
 I would like to eventually support ALTER VIEW ... DROP COLUMN; note
that we do now support ADDING columns to a view using CREATE OR
REPLACE as long as all the new ones are at the end.  But neither of
those things is going to help with a case like yours, when you want to
change the type of the column.  I'm not really sure what to do about
that case.

...Robert



I've been using the source in information_schema.views rather than
storing the original source. Oracle does indeed store the original
source code for your objects. I don't know what they use to recompile.
But my inclination is that they use the original source. If you alter a
table/column name I believe it will invalidate any dependent views, which
will need to be manually edited before they will compile successfully.
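
For anyone wanting to do the same, something like this captures the current
definitions before you start dropping things:

SELECT table_schema, table_name, view_definition
  FROM information_schema.views
 WHERE table_schema NOT IN ('pg_catalog', 'information_schema');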

As for Oracle's approach being stupid and not user friendly, OK, maybe 
they could automatically try to recompile. But even a manual process is 
better than no help at all.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Invalidating dependent views and functions

2010-04-30 Thread Scott Bailey
I've been working on a brute force method of saving view and function 
source when changes to an underlying object force that object to be 
dropped. But I think there is a way for Postgres to handle this that 
wouldn't be too hard to implement and would be extremely useful for us 
users.


Problem: We need to change the last_name column of the people table from 
varchar(30) to varchar(50). You issue the alter table command only to be 
reminded that the people table is quite popular. It will likely be used 
in dozens of views and many of those views will have dependent views, 
and let's say there are some dependent functions too. And you have to 
drop all of them if you want to alter your column. Once they are dropped 
you can alter your column and then start digging through your source 
code repository to rebuild all of those views and functions that you 
just dropped.


Proposal: Add an invalid flag to pg_class. Invalid objects would be 
ignored when doing dependency checks for DDL statements. And an 
exception would be thrown when an invalid object is called.


This is similar to what Oracle does. And most Oracle tools can find and 
recompile invalid objects with a statement like:

ALTER VIEW foo RECOMPILE;
ALTER PACKAGE bar RECOMPILE BODY;

Oracle invalidates objects without warning. But maybe we could keep the 
current behavior and add an invalidate option.


ALTER TABLE people ALTER last_name TYPE VARCHAR(50);
-- Throw exception can not alter table with dependents

ALTER TABLE people ALTER last_name TYPE VARCHAR(50) INVALIDATE;
-- Alters column and invalidates any dependent objects
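
A DBA could then find everything that needs rebuilding with something like
this (relisvalid is just a hypothetical column name for the proposed flag):

SELECT n.nspname, c.relname, c.relkind
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE NOT c.relisvalid;  -- hypothetical column; does not exist today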

Is this a viable option?

Scott Bailey


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extended operator classes vs. type interfaces

2010-04-17 Thread Scott Bailey

Jeff Davis wrote:

On Fri, 2010-04-09 at 12:50 -0500, Kevin Grittner wrote:

I just thought that if you were adding more type information,
oriented around the types themselves rather than index AMs, some form
of inheritance might fit in gracefully.


There are already some specific proposals for inheritance in database
theory literature. For instance: Databases, Types, and the Relational
Model by C.J. Date addresses inheritance explicitly (and the appendices
have some interesting discussion).

I'm not sure how compatible it is with SQL, though; and I am not very
optimistic that we could accomplish such a restructuring of the type
system while maintaining a reasonable level of backwards compatibility.

Either way, I think it's a separate topic. Two types that are not
related by any subtype/supertype relationship (like strings and ints)
can conform to the same interface (total ordering); while the very same
type can conform to two different interfaces.

Regards,
Jeff Davis



Well I've been doing a lot of work with range abstract data types in 
Oracle lately. And I've got to say that the OO features in Oracle make 
it really nice. Of course it's Oracle, so it's like half-baked OO in 
COBOL syntax, lol. But I for one think it would be great if Postgres had 
object data types that had methods and could be subclassed.


For those not familiar with ADT's in Oracle, here's an example:

CREATE TYPE period AS OBJECT (
  beginning    DATE,
  ending   DATE,
  CONSTRUCTOR FUNCTION period (
self IN OUT NOCOPY period,
beginning DATE,
ending DATE
  )  RETURN SELF AS RESULT,
  -- config functions
  MEMBER FUNCTION granule RETURN INTERVAL DAY TO SECOND,
  MEMBER FUNCTION def_inc RETURN NUMBER,
  MEMBER FUNCTION range_union(p2 period) RETURN period
  ...
) NOT FINAL;

CREATE TYPE date_range UNDER period (
  OVERRIDING MEMBER FUNCTION granule RETURN INTERVAL DAY TO SECOND,
  OVERRIDING MEMBER FUNCTION def_inc RETURN NUMBER,
  ...
);
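
For comparison, about the closest you can get in Postgres today is a plain
composite type plus free-standing functions - a rough sketch only, with no
constructors, member functions or subtyping:

CREATE TYPE period AS (
  beginning  timestamptz,
  ending     timestamptz
);

-- free-standing function in place of a member function;
-- naive: assumes the two periods overlap
CREATE OR REPLACE FUNCTION range_union(p1 period, p2 period) RETURNS period AS
$$
  SELECT ROW(LEAST($1.beginning, $2.beginning),
             GREATEST($1.ending, $2.ending))::period;
$$ LANGUAGE 'sql' IMMUTABLE;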

Scott Bailey

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xpath improvement V2

2010-02-16 Thread Scott Bailey

Jan Urbański wrote:

Arie Bikker wrote:

Hi all,

I've combined the review suggestions of Jan Urbański, Scott Bailey, and
others.
This was a lot harder than I had foreseen, and I took my time to do it
the right way (hope you agree!).


Hi,

I see the patch has been marked as Returned with Feedback on the 6th
of February, I assume on grounds of prolonged silence about it. I
confess it was partly my fault, because soon after posting the review I
suddenly had to focus on other things.

I won't be able to review the new version in the next few days, and this
commitfest is closing anyway... However I would hate to see that patch
just disappear, as I think it's useful and you obviously invested some
work in it.

At this stage I would suggest moving it to the first 9.1 commitfest,
since it's a nice feature, but not one we should burden the committers
with this late in the development cycle.

Arie, care to add that last version of the patch to the 2010-Next
commitfest?

Cheers,
Jan


I would agree with Jan about just adding it to 9.1, except that it fixes 
several known bugs.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XQuery support

2010-02-16 Thread Scott Bailey

Andrew Dunstan wrote:



Matthias Brantner wrote:

I know this has been discussed several times and it seems the
conclusion was it's impossible if we would like to use existing XQuery
external modules (some for license reasons and some for technical
reasons).

So it seems the only way to support XQuery is developing our own
XQuery functionality from scratch. I'm wondering if other people reach
the same conclusion as me, or is it a totally impossible project?
  


 
Well xquilla on top of xerces should get us pretty much where we need 
to go. They are both under the Apache 2 license, which I believe 
would be ok. But they are C++ so I don't know if we could use them.
I found Zorba; it's C++ and Apache 2. But I don't know much about it. 
But it looks promising.

http://www.zorba-xquery.com/

Sounds like a great project.  Please, let us know 
(zorba-us...@lists.sourceforge.net) if you need any help or have 
questions with regard to plugging Zorba into PostgreSQL.  We would love 
to see this happen and help.



  


Well, maybe you can answer the questions I had last time I looked at it, 
namely:


XQuery is a whole other question. Adding another library dependency is 
something we try to avoid. Zorba http://www.zorba-xquery.com/ might 
work, but it appears to have its own impressive list of dependencies 
(why does it require both libxml2 and xerces-c? That looks a bit 
redundant.)


Even if we did implement XMLTABLE, I think I'd probably be inclined to 
start by limiting it to plain XPath, without the FLWOR stuff. I think 
that would satisfy the vast majority of needs, although you might feel 
differently. (Do a Google for XMLTABLE - every example I found uses 
plain XPath expressions.)
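
For reference, a typical XMLTABLE call restricted to plain XPath looks
roughly like this (a sketch only; table and column names made up):

SELECT x.id, x.name
  FROM docs d,
       XMLTABLE('/items/item' PASSING d.doc
                COLUMNS id   INTEGER     PATH '@id',
                        name VARCHAR(30) PATH 'name') AS x;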


A much more urgent need we have, IMNSHO, than for XQuery, is for XSLT 
support that actually works. But maybe that's just me.


I've been playing around with Zorba and xQilla today. I couldn't 
actually get Zorba to build on my system. It looks like the two share a 
common (or similar) C API though. Zorba also has Perl, Python, Ruby and 
PHP bindings (which I also couldn't get to work), and it would be nice to 
be able to use it from plperl, plpython and plruby as well.


If it were possible to drop the xerces dependency from Zorba and use 
libxml2 for parsing, that would be a strong case for it. The other big 
dependency for Zorba was ICU for converting character sets. I'm not 
familiar with that part of Postgres. But it seems to me that we would 
have our own libraries for that sort of thing.


I'm not a big fan of doing XSLT in the database. But xQilla would give 
you XSLT 2.0 support.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xpath improvement suggestion

2010-01-28 Thread Scott Bailey

Robert Haas wrote:

On Sun, Jan 17, 2010 at 11:33 AM, Jan Urbański wulc...@wulczer.org wrote:

[ detailed review ]


Arie,

Are you planning to submit an updated patch? If so, please do so soon.

Thanks,

...Robert


What is the time limit on this? I've been testing Arie's patch and I 
want to see it get in. I can make the changes Jan requested if Arie 
doesn't. How long should I give him?


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XQuery support

2010-01-28 Thread Scott Bailey

Tatsuo Ishii wrote:

Hi,

I know this has been discussed several times and it seems the
conclusion was it's impossible if we would like to use existing XQuery
external modules (some for license reasons and some for technical
reasons).

So it seems the only way to support XQuery is developing our own
XQuery functionality from scratch. I'm wondering if other people reach
the same conclusion as me, or is it a totally impossible project?
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Well xquilla on top of xerces should get us pretty much where we need to 
go. They are both under the Apache 2 license, which I believe would be 
ok. But they are C++ so I don't know if we could use them.


I found Zorba; it's C++ and Apache 2. But I don't know much about it. But 
it looks promising.

http://www.zorba-xquery.com/

Sedna is a complete db/xquery solution licensed under Apache 2 and 
written in C/C++. Perhaps we could use pieces of this. Better than 
starting from scratch.

http://modis.ispras.ru/sedna/index.html

Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: listagg aggregate

2010-01-24 Thread Scott Bailey



I don't think so. When we have a function with the same parameters and the
same behavior as some Oracle function, then I strongly prefer the Oracle
name. I don't see any benefit from a different name. It can only confuse
developers and add trouble for people porting applications.


Meh. If the name is terrible, we don't have to use it, and it's easy enough to 
create an alias in SQL for those who need it.


The corresponding function in Oracle is called wm_concat. In MySQL it's 
called group_concat. I don't believe DB2 or SQL Server have built-in 
equivalents. The Oracle name isn't really an option ('wm' stands for 
workspace manager).


I think listagg or string_agg would be the most appropriate names. Oh 
and before Oracle had wm_concat, Tom Kyte wrote a function called stragg 
that was pretty popular.
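
A minimal user-level sketch of such an aggregate (no ordering and a
hard-wired comma delimiter) would be something like:

CREATE OR REPLACE FUNCTION commacat(acc text, instr text) RETURNS text AS
$$
  SELECT CASE
           WHEN $2 IS NULL THEN $1        -- ignore NULL inputs
           WHEN $1 IS NULL THEN $2        -- first value seen
           ELSE $1 || ',' || $2
         END;
$$ LANGUAGE 'sql' IMMUTABLE;

CREATE AGGREGATE listagg (text) (
  sfunc = commacat,
  stype = text
);

-- e.g. SELECT listagg(relname) FROM pg_class;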


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xpath improvement suggestion

2010-01-12 Thread Scott Bailey

Arie Bikker wrote:

Peter Eisentraut wrote:

On ons, 2010-01-06 at 23:46 +0100, Arie Bikker wrote:
 

Hope this is the right attachment type (I'm new at this).
BTW, here are some nice examples:

- Get the number of attributes of the first childnode:

select (xpath('count(@*)', (xpath('*[1]', '<a b="c"><d e="f" g="j"/></a>'))[1]))[1];


- an alternative for xpath_exist('/a/d')
select (xpath('boolean(/a/d)', '<a b="c"><d e="f" g="j"/></a>'))[1];

- fixes bug 4206

select xpath('//text()', xmlparse(document '<?xml version="1.0"?><elem1><elem2>one</elem2><elem2>two</elem2><elem2>three</elem2><elem3 att="2"/></elem1>'));



- fixes bug 4294

select xpath('name(/my:a/*[last()])', '<a xmlns="http://myns.com/ns"><b>text1</b><c>text2</c></a>', 
ARRAY[ARRAY['my','http://myns.com/ns']]);


Instead of converting everything to text, there have been previous
suggestions to add functionx like xpath_string, xpath_number,
xpath_boolean that return the appropriate types from xpath.  This could
provide for better type safety and probably also more clarity.

In any case, please consider adding test cases like the above to the
regression tests in whatever patch comes out at the end.

  
As an addition these xpath_sometype functions have been mentioned and 
can be handy. But, considering that the xpath function itself is a 
generalized function, the user of this function might not have 
beforehand knowledge of the type of the result; the first argument of 
the call could be used in a dynamic fashion.
Coming back to the xpath_sometype functions - would these definitions 
be suitable?


boolean xpath_boolean(xpath, xml [, nsarray])
text xpath_string(xpath, xml [, nsarray])
int xpath_number(xpath, xml [,nsarray])

implementation can be done via an xpath_nonnode function defined as:
text xpath_nonnode(xpath, xml [,nsarray])
where each of the xpath_sometype functions simply interpret the text as 
its target type.

Is this the way to go?

kind regards,  Arie Bikker


Postgres' type system is MUCH more robust than anything in XPath/XML. 
And folks who use XML on a regular basis expect most XPath expressions 
to return a string anyway.


For instance, how many confused users do you think you'll get with 
something like:

SELECT xpath_boolean('boolean(/root/@bar)', '<root bar="false"/>')
-- evaluates to true

or

SELECT xpath_number('/root/@foo', '<root foo="42"/>')
--xpath will return the string '42' not a number unless you do something 
like:

SELECT xpath_number('number(/root/@foo)', '<root foo="42"/>')

I think we'd be much better off having a function like xpath_nonnode() or 
xpath_value() that returns text and lets the user handle the casting.
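
A minimal sketch of what I mean (xpath_value is just a hypothetical name):

CREATE OR REPLACE FUNCTION xpath_value(
  VARCHAR,
  XML
) RETURNS TEXT AS
$$
  -- first result of the xpath expression as text; the caller handles casting
  SELECT (xpath($1, $2))[1]::text;
$$ LANGUAGE 'sql' IMMUTABLE;

-- e.g. SELECT xpath_value('/root/@foo', '<root foo="42"/>')::numeric;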



Scott Bailey


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xpath improvement suggestion

2010-01-06 Thread Scott Bailey

Arie Bikker wrote:
Sorry for the previous NUUUB post, didn't know the mailing list doesn't 
support HTML ;(

Robert Haas wrote:

On Tue, Jan 5, 2010 at 6:09 PM, Arie Bikker a...@abikker.nl wrote:
 

Hi all,

Well I had to burn some midnight oil trying to figure out why a construct
like
SELECT xpath('name()','<a/>');
doesn't give the expected result. Kept getting an empty array:
 xpath
-------
 {}
instead of the expected {a}
BugID 4294 and the TODO item "better handling of XPath data types" pointed
in the right direction.
Within src/backend/utils/adt/xml.c in the function xpath the result of the
call to xmlXPathCompiledEval is not handled optimally. In fact, the result
is assumed to be a nodeset without consulting the ->type member of the
result. I've made some minor changes to xml.c to handle some non-nodeset
results of xmlXPathCompiledEval.
Essentially, the revised code makes an array of all the nodes in the
xpathobj result in case this is a nodeset, or an array with a single element
in case the result is a number/string/boolean. The problem cases mentioned in
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php now work
as expected.
Revision of the code involves:
- A switch statement to handle the result type of xmlXPathCompiledEval.
- an additional function xmlpathobjtoxmltype.

diff of the revised code with respect to the original is in the attached file.


kind regards, Arie Bikker



Hi,

Could you please resend this as a context diff and add it to our patch
management application?

http://wiki.postgresql.org/wiki/Submitting_a_Patch
https://commitfest.postgresql.org/action/commitfest_view/open

Thanks!

...Robert
  

Hope this is the right attachment type (I'm new at this).
BTW, here are some nice examples:

- Get the number of attributes of the first childnode:

select (xpath('count(@*)', (xpath('*[1]', '<a b="c"><d e="f" g="j"/></a>'))[1]))[1];


- an alternative for xpath_exist('/a/d')
select (xpath('boolean(/a/d)', '<a b="c"><d e="f" g="j"/></a>'))[1];

- fixes bug 4206

select xpath('//text()', xmlparse(document '<?xml version="1.0"?><elem1><elem2>one</elem2><elem2>two</elem2><elem2>three</elem2><elem3 att="2"/></elem1>'));



- fixes bug 4294

select xpath('name(/my:a/*[last()])', '<a xmlns="http://myns.com/ns"><b>text1</b><c>text2</c></a>', 
ARRAY[ARRAY['my','http://myns.com/ns']]);


Awesome! This really helps.

Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal: XML helper functions

2010-01-05 Thread Scott Bailey
One of the problems with shredding XML is that it is very kludgy to get a 
scalar value back from xpath. The xpath function always returns an array 
of XML. So for example, to extract a numeric value you need to:

1) use xpath to get the node
2) get the first element of the XML array
3) cast that to varchar
4) cast that to numeric

So I wrote the following function:

CREATE OR REPLACE FUNCTION xmlvalue(
  VARCHAR,
  XML
) RETURNS TEXT AS
$$
  SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$'
           THEN (xpath($1, $2))[1]
         WHEN $1 ~* '/text()$'
           THEN (xpath($1, $2))[1]
         WHEN $1 LIKE '%/'
           THEN (xpath($1 || 'text()', $2))[1]
         ELSE (xpath($1 || '/text()', $2))[1]
         END::text;
$$ LANGUAGE 'sql' IMMUTABLE;

It's pretty simple. It just does a check to see if you are extracting an 
attribute or an element, and if it's an element, it makes sure to get the text value.


So query that used to look like:

SELECT CAST(
  CAST(
(xpath('/foo/bar/text()', myxml))[1]
  AS varchar)
AS numeric) AS bar

now becomes:

SELECT xmlvalue('/foo/bar/text()', myxml)::numeric AS bar


The second function just checks that the xpath expression finds at least one 
node.


CREATE OR REPLACE FUNCTION xmlexists(
 VARCHAR,
 XML
) RETURNS BOOLEAN AS
$$
 SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0
 THEN true ELSE false END;
$$ LANGUAGE 'sql' IMMUTABLE;
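
Usage is then simply:

SELECT xmlexists('/foo/bar', '<foo><bar>1</bar></foo>');  -- true
SELECT xmlexists('/foo/baz', '<foo><bar>1</bar></foo>');  -- false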

On naming, SQL/XML specifies xmlexists and xmlcast. The latest DB2 provides 
xmlcast(), Oracle has an equivalent extractvalue() function, and MSSQL uses 
xml.value(). My xmlvalue does only part of what is required by xmlcast 
(it won't cast a scalar to xml).


So would these functions need to be rewritten in C in order to be accepted?

Regards,

Scott Bailey


Further reading:

http://scottrbailey.wordpress.com/2009/06/19/xml-parsing-postgres/
http://en.wikibooks.org/wiki/SQL_Dialects_Reference/SQL_XML
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions052.htm#i1131042
http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0023486.htm

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: XML helper functions

2010-01-05 Thread Scott Bailey

Merlin Moncure wrote:

On Tue, Jan 5, 2010 at 1:14 PM, Scott Bailey arta...@comcast.net wrote:

One of the problems with shredding XML is that it is very kludgy to get a
scalar value back from xpath. The xpath function always returns an array of
XML. So for example, to extract a numeric value you need to:
1) use xpath to get the node
2) get the first element of the XML array
3) cast that to varchar
4) cast that to numeric


I just happen to be dealing with XML right now as well and my initial
thought is that your suggestion doesn't buy you a whole lot: the root
problem IMO is not dealing with what xpath gives you but that there is
no DOMish representation of the xml document for you to query.  You
have to continually feed the entire document to xpath which is
absolutely not scalable (if it works the way I think it does --
haven't looked at the code).


No, typically you'll only be passing the xml for a single row, so what 
we end up doing in Postgres typically looks something like this:


SELECT xmlvalue('/row/@id', bitesizexml)::int AS id,
  xmlvalue('/row/@lat', bitesizexml)::numeric AS lat,
  xmlvalue('/row/@lon', bitesizexml)::numeric,
  xmlvalue('/row/comment', bitesizexml) AS cmt
FROM (
  SELECT unnest(xpath('/foo/row', mybigxmldoc)) AS bitesizexml
) sub

So only the one call has to work with the entire document. All the calls 
to xmlvalue are passed a much smaller node to work with.



 xpath is great for simple things but it's too texty and you need a
 more robust API to handle documents for serious parsing on the
 backend.  In the short term i'd advise doing work in another pl like
 perl.

This is basically the method used for Oracle too until they provided 
XMLTable functionality. They had a function xmlsequence that basically 
did the unnest(xpath()) part. Hopefully we'll get xmltable support soon.
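
From memory, the Oracle idiom looked roughly like this (table and column
names made up):

SELECT extractvalue(value(t), '/row/@id')      AS id,
       extractvalue(value(t), '/row/comment')  AS cmt
  FROM some_table x,
       TABLE(xmlsequence(extract(x.mybigxmldoc, '/foo/row'))) t;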


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: XML helper functions

2010-01-05 Thread Scott Bailey

Pavel Stehule wrote:

2010/1/5 Scott Bailey arta...@comcast.net:

One of the problems with shredding XML is that it is very kludgy to get a
scalar value back from xpath. The xpath function always returns an array of
XML. So for example, to extract a numeric value you need to:
1) use xpath to get the node
2) get the first element of the XML array
3) cast that to varchar
4) cast that to numeric

So I wrote the following function:

CREATE OR REPLACE FUNCTION xmlvalue(
  VARCHAR,
  XML
) RETURNS TEXT AS
$$
  SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$'
  THEN (xpath($1, $2))[1]
  WHEN $1 ~* '/text()$'
  THEN (xpath($1, $2))[1]
  WHEN $1 LIKE '%/'
  THEN (xpath($1 || 'text()', $2))[1]
  ELSE (xpath($1 || '/text()', $2))[1]
  END::text;
$$ LANGUAGE 'sql' IMMUTABLE;

It's pretty simple. It just does a check to see if you are extracting an
attribute or an element and if element, it makes sure to get the text value.

So query that used to look like:

SELECT CAST(
 CAST(
   (xpath('/foo/bar/text()', myxml))[1]
 AS varchar)
AS numeric) AS bar

now becomes:

SELECT xmlvalue('/foo/bar/text()', myxml)::numeric AS bar


Second function just checks that the xpath expression finds at least one
node.

CREATE OR REPLACE FUNCTION xmlexists(
 VARCHAR,
 XML
) RETURNS BOOLEAN AS
$$
 SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0
 THEN true ELSE false END;
$$ LANGUAGE 'sql' IMMUTABLE;

On naming, SQL/XML specifies xmlexists and xmlcast.


I am for SQL/XML naming convention.


Well I'm shying away from the name xmlcast because it is supposed to 
cast xml to scalar, scalar to xml and xml to xml. For instance these 
would all work on DB2:


SELECT xmlcast(null AS XML),
xmlcast(bar AS XML),
xmlcast(xmlquery('$x/baz/bar' PASSING foo.barxml AS x) AS VARCHAR(30))
FROM foo

But I just found that Oracle added xmlcast in 11g and it only does xml 
to scalar (and only number, varchar2 and date/time types). So maybe 
you're right.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: XML helper functions

2010-01-05 Thread Scott Bailey

Peter Eisentraut wrote:

On tis, 2010-01-05 at 10:14 -0800, Scott Bailey wrote:
One of the problems with shredding XML is that it is very kludgy to get a 
scalar value back from xpath. The xpath function always returns an array 
of XML. So for example, to extract a numeric value you need to:

1) use xpath to get the node
2) get the first element of the XML array
3) cast that to varchar
4) cast that to numeric


There has been talk about adding something like xpath_string,
xpath_number, xpath_boolean for fetching xpath expressions that don't
return nodesets.  I think that would fit your use case.


The first two sound very much like what I'm looking for. I'm unsure 
about the third. Is its purpose to extract the scalar value of an 
expression and cast to bool as the other two do, or is it to identify if 
the xpath expression returned any nodes like xmlexists?


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xpath improvement suggestion

2010-01-05 Thread Scott Bailey

Arie Bikker wrote:

Hi all,

Well I had to burn some midnight oil trying to figure out why a 
construct like

SELECT xpath('name()','<a/>');
doesn't give the expected result. Kept getting an empty array:
 xpath
-------
 {}
instead of the expected {a}
BugID 4294 and the TODO item "better handling of XPath data types" 
pointed in the right direction.
Within src/backend/utils/adt/xml.c in the function xpath the result of 
the call to xmlXPathCompiledEval is not handled optimally. In fact, the 
result is assumed to be a nodeset without consulting the ->type member 
of the result. I've made some minor changes to xml.c to handle some 
non-nodeset results of xmlXPathCompiledEval.
Essentially, the revised code makes an array of all the nodes in the 
xpathobj result in case this is a nodeset, or an array with a single 
element in case the result is a number/string/boolean. The problem cases 
mentioned in 
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php now 
work as expected.

Revision of the code involves:
- A switch statement to handle the result type of xmlXPathCompiledEval.
- an additional function xmlpathobjtoxmltype.

diff of the revised code with respect to the original is in the attached file.

kind regards, Arie Bikker


Well that's interesting. I was getting ready to dig into libxml2 to see 
what it would take to return scalar values from xpath expressions. Thanks.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-17 Thread Scott Bailey

Tom Lane wrote:

Dimitri Fontaine dfonta...@hi-media.com writes:

Tom Lane t...@sss.pgh.pa.us writes:

Hm, how would you do it with LATERAL?  The problem is not so much
composition as the need for a variable number of rounds of
composition.



Let's have a try at it:



select p2_member, array_accum(p1)
  from unnest(p2) as p2_member,
       lateral (select period_except(p1_member, p2_member)
                  from unnest(p1) p1_member) as x(p1);


I don't think that does it.  Maybe I misunderstand LATERAL, but what
that looks like to me is that each p1 will be separately filtered by
each p2, giving rise to a distinct element in the output.  What we
need is for each p1 to be filtered by *all* p2's, successively
(though in any order).

regards, tom lane


That approach will only work if you coalesce your inputs into 
non-contiguous sets (NCS) first. Overlapping ranges would break it in a 
hurry. In addition to two coalesce operations, period_except would be 
calculated 1000x for a pair of 100 element arrays. The original solution, 
while not short, was probably a little more elegant than Tom gave it credit 
for. In a single pass it pulls out only the data points needed to build 
the resultant NCS without making assumptions that the inputs were coalesced.


I think I'll still be able to do a single pass solution for continuous 
ranges. I just won't be able to do the coalesce operations inline with 
the set operations.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idea - new aggregates median, listagg

2009-12-16 Thread Scott Bailey

Thom Brown wrote:
2009/12/15 Pavel Stehule pavel.steh...@gmail.com 


Hello

I am looking at the new feature - the ORDER BY clause in aggregates - and I
think we are able to effectively implement some non-standard, but well-known,
aggregates.

a) function median - it is a relatively frequent request - with a usually
slow implementation

b) function listagg (it is an analogue of group_concat from MySQL) - it
should simplify report generation and some other things

What is your opinion? Do you like to see these functions in core?



I'm probably missing the point here, but when I originally saw MySQL's 
group_concat function, I found it odd that it featured ordering 
functionality.  Shouldn't the order be determined by the query itself?  
Otherwise it's almost as if it's separating the relationship between the 
result column and the resultset.


For xmlagg in particular, it is quite useful to be able to order the 
results. And sorting the query doesn't work for ordering the agg unless 
you do it in a subquery. Oracle has this functionality and it is quite 
handy. It would be nice to see listagg with the option to order as well.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-16 Thread Scott Bailey

Jeff Davis wrote:

On Sun, 2009-12-13 at 23:49 -0800, Scott Bailey wrote:
So basically I have an anyrange pseudo type with the functions prev, 
next, last, etc defined. So instead of hard coding range types, we would 
allow the user to define their own range types. Basically if we are able 
to determine the previous and next values of the base types we'd be able 
to define a range type. I'm envisioning in a manner much like defining 
an enum type.


After an off-list discussion with Scott, I think there may be a solution
here that works for everyone if we don't try so hard to unify the
implementation of discrete and continuous ranges. The API should be very
similar, of course, but the implementation doesn't need to be.

Continuous ranges absolutely require the following information: start,
end, and inclusivity information.

But discrete ranges can instead be stored by counting the number of
granules from the start point. For instance, it could be stored as:
start, num_granules.

That has a lot of benefits for discrete ranges of time. First of all, it
allows the algebra to work reasonably well for the days and months
part of the interval, so we can allow a granule of 1 day/week/month/year
for a timestamp range. For output of the range, we can then just
multiply the granule by the number of granules, and add that to the
start time; thus avoiding the incremental addition problem with date
math. I think this works reasonably well for timestamp/date ranges --
let me know if there is a problem here (aside from timestamptz, which I
address below).

Secondly, in the case of a timestamp range, we can use 7 bytes for
storing the number of granules rather than another full 8-byte
timestamp, leaving one byte for flags to represent NULL boundaries,
infinite boundaries, etc. For timestamps that would still mean that an
interval could be 2000 years long with '1 microsecond' granularity. For
dates, 3 bytes is sufficient for a date range 45000 years long with
granules of '1 day'. That means that we can get back down to a 16 byte
representation for timestamp ranges, or 8 byte representation for date
ranges. There are a few details, like infinite ranges, but those can be
pretty easily solved with flags as well.
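
A quick sanity check of those figures, as a couple of throwaway selects:

SELECT 2^56 / (1e6 * 86400 * 365.25) AS years_at_1us_in_7_bytes,   -- about 2283
       2^24 / 365.25                 AS years_at_1day_in_3_bytes;  -- about 45900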

There's one problem, and that's for timestamptz ranges with intervals
that include days and months. Timezone adjustments are just not
well-defined for that kind of granule (nor would it be particularly
useful even if it magically worked), so this would have to be blocked
somehow. I think that's a special case, and we could provide the user
with a nice error message telling the user to use a date or timestamp
range instead.

So, the idea is to default to a continuous range type, but if the user
supplies a granule, prior and next functions, and other necessary
details, then it becomes a discrete range type.

 * continuous ranges can still have everything that everyone wants, 
   including flags to indicate special values.
 * discrete range granule is specified explicitly, so it's not an 
   implementation detail

 * discrete ranges can have a compact representation
 * discrete ranges would still have room for flags to indicate special 
   values
 
Comments?


As I pointed out off-list, I think the granularity for timestamp range 
should be limited to hours and smaller. Anything larger is asking for 
trouble. And quite honestly if they wanted day granularity, they should 
use date range. Also, I think the granule should be the same type as is 
returned when subtracting two subtypes. So the granule of a date range should 
be int, not interval. And if a user wanted something with month 
granularity, perhaps an enum range of 'MM' values would be better.


Quite honestly the following 3 cases would probably meet 99% of needs:
CREATE TYPE period AS RANGE(timestamptz(0), interval '1 s');
CREATE TYPE period AS RANGE(timestamptz(3), interval '1 ms');
CREATE TYPE period AS RANGE(timestamptz, interval '1 us');

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-16 Thread Scott Bailey

Tom Lane wrote:

Jeff Davis pg...@j-davis.com writes:

[ hacky special-case representation for discrete timestamp ranges ]


I'm still not exactly clear on what the use-case is for discrete
timestamp ranges, and I wonder how many people are going to be happy
with a representation that can't handle a range that's open-ended
on the left.


They wouldn't. But the timestamp data would be the anchor, not 
necessarily the start point. As long as we allow ranges unbounded on both 
ends we'd be ok.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-16 Thread Scott Bailey
Ok, silly question here. But how do you determine the length of a 
continuous range? By definition length of [a, b) and (a, b] = b-a. But 
what about (a,b) and [a,b]? Are we saying that because they are 
continuous, the difference between values included in the range and 
those excluded is so infinitesimally small as not to matter? Thus 
length (a,b) == length [a,b] == length [a,b)? And if that is the case, 
does the inclusiveness of the range really even matter?


And can anyone point me to a reference for working with continuous 
ranges? Google just insists that I spelled contiguous wrong.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-16 Thread Scott Bailey

Tom Lane wrote:

Martijn van Oosterhout klep...@svana.org writes:

However, it does seem reasonable to allow people to restrict, either by
typmod or a check constraint the kinds of values that can be stored in
a particular column. Then an application can decide which way they want
their intervals to work and have the database enforce it.


Sure --- the range datatype should absolutely provide inquiry functions
that let you determine all the properties of a range, so something like
CHECK (is_open_on_right(col)) would work for that.  I'm of the opinion
that we must not usurp typmod for range behavior --- the right thing is
to pass that through to the contained type, just as we do with arrays.

(Note that a range over timestamp(0) would eliminate at least some of
the platform dependencies we've been arguing about.  I'm still quite
dubious that next timestamp is anything except evidence that you've
misformulated your problem, though.)

regards, tom lane


Well our work is based on over 15 years of temporal research (not by us) 
and numerous books from Snodgrass, Date and Celko; as well as partial 
implementations in other databases. So it's not like we took a blue pill 
this weekend and woke up with this harebrained idea.


I understand your concern. But I think the objections are based more on 
implementation details with float timestamp rather than conceptually.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-15 Thread Scott Bailey

Jeff Davis wrote:

On Tue, 2009-12-15 at 10:19 -0500, Tom Lane wrote:

I'm not sure that anyone has argued that.  I did suggest that there
might be a small list of types for which we should provide discrete
behavior (ie, with next/previous functions) and the rest could have
continuous behavior (without that assumption).  But I quite agree
that we want both types of ranges.


It seems like we're moving toward treating TIMESTAMP as continuous.

If I'm correct, continuous ranges always need two extra bits of storage
for the exclusivity. But for timestamps, that means 16 bytes (2 x 8-byte
timestamp) turns into 17 bytes, which is really more like 20 or 24 bytes
with alignment.

Considering that these are likely to be used for audit or history
tables, 8 bytes of waste (50%) seems excessive -- especially when
treating them as discrete seems to work pretty well, at least for the
int64 timestamps.


Would it be OK if we handled float timestamp ranges as continuous and 
int64 timestamps as discrete? You effectively lose the ability to build 
non-contiguous sets with continuous ranges, which is integral to the 
work I'm doing (union, intersect, coalesce and minus sets of ranges).


As for the extra bits, would it be better to just require continuous 
ranges to be either [] or [)? But I don't know which would be preferred. 
My inclination would be toward [), but Tom seemed to indicate that 
perhaps [] was the norm.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-15 Thread Scott Bailey

David Fetter wrote:

On Tue, Dec 15, 2009 at 11:31:05AM -0800, Scott Bailey wrote:

Jeff Davis wrote:

On Tue, 2009-12-15 at 10:19 -0500, Tom Lane wrote:

Would it be OK if we handled float timestamp ranges as continuous
and int64 timestamps discrete?


That sounds like a recipe for disaster.  Whatever timestamp ranges
are, float and int64 should be treated the same way so as not to get
surprises due to implementation details.


You effectively lose the ability to build non-contiguous sets with
continuous ranges. Which is integral to the work I'm doing (union,
intersect, coalesce and minus sets of ranges)

As for the extra bits, would it be better to just require continuous
ranges to be either [] or [)? But I don't know which would be
preferred. My inclination would be toward [), but Tom seemed to
indicate that perhaps [] was the norm.


[] makes certain operations--namely the important ones in
calendaring--impossible, or at least incredibly kludgy, to do.  I
think we ought to leave openness at each end up to the user,
independent of the underlying implementation details.

FWIW, I think it would be a good idea to treat timestamps as
continuous in all cases.


Ok, let me give an example of what we can do with the current 
implementations that would not be possible with timestamps if we 
implement as suggested. Jeff's implementation uses a 1 microsecond step 
size or granule. And my implementation uses an interval step size and 
can be configured database-wide; the default is 1 second.


The function below takes two period arrays that can have overlapping and 
adjacent elements. It subtracts all values in pa1 that intersect with 
values in pa2. So perhaps pa1 is all of your work shifts for the month 
and pa2 is a combination of your leave and holidays. The result is a 
coalesced non-contiguous set of the times you would actually be working. 
But to do this kind of thing you need to be able to determine prior, 
first, last and next. I need an implementation that can do this for 
timestamps and not just ints and dates.


CREATE OR REPLACE FUNCTION period_minus(
   pa1  IN period[],
   pa2  IN period[]
) RETURNS period[] AS
$$
SELECT array_agg(prd)
FROM (
SELECT period((t_in).start_time,
MIN((t_out).end_time)) AS prd
FROM (
SELECT DISTINCT first(p) AS start_time
FROM unnest($1) p
WHERE NOT contains($2, first(p))
AND NOT contains($1, prior(p))

UNION

SELECT DISTINCT next(p)
FROM unnest($2) p
WHERE contains($1, next(p))
AND NOT contains($2, next(p))
) t_in
JOIN (
SELECT next(p) AS end_time
FROM unnest($1) p
WHERE NOT contains($1, next(p))

UNION ALL

SELECT first(p)
FROM unnest($2) p
WHERE contains($1, first(p))
  AND NOT contains($2, prior(p))
) t_out ON t_in.start_time < t_out.end_time
GROUP BY t_in.start_time
ORDER BY t_in.start_time
) sub;
$$ LANGUAGE 'sql' IMMUTABLE STRICT;
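
For example (made-up data, and assuming the period(timestamptz, timestamptz)
constructor from the pgFoundry module):

SELECT period_minus(
         ARRAY[period('2009-12-01 08:00', '2009-12-01 17:00')],  -- the day's shift
         ARRAY[period('2009-12-01 12:00', '2009-12-01 13:00')]   -- leave
       );
-- roughly: the 08:00-12:00 and 13:00-17:00 portions of the shift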

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-15 Thread Scott Bailey

Tom Lane wrote:

Jeff Davis pg...@j-davis.com writes:

On Tue, 2009-12-15 at 11:49 -0800, David Fetter wrote:

FWIW, I think it would be a good idea to treat timestamps as
continuous in all cases.



I disagree. There is a lot of value in treating timestamp ranges as
discrete.



One big reason is that the ranges can be translated between the
different input/output forms, and there's a canonical form. As we know,
a huge amount of the value in an RDBMS is unifying data from multiple
applications with different conventions.


Actually, that is exactly one of the reasons why what you propose is
a *bad* idea.  You want to institutionalize application dependence on
a non-portable implementation detail, namely the granularity of machine
representation of what's in principle a continuous value.  That's one
of the fastest routes to non-unifiable data I can think of.


So, let's say one application uses (] and another uses [). If you are
mixing the data and returning it to the application, you want to be able
to provide the result according to its convention. You can't do that
with a continuous range.


The above is nonsense.  [1,2) and [1,2] are simply different objects.
A design that assumes that it is always possible to replace one by
the other is broken so badly it's not even worth discussing.


I don't hear anyone arguing that. But you should be able to convert 
between [1,2], [1,3), (0,3) and (0,2].



The only reason you'd have applications that fail to handle both open
and closed intervals would be if someone were to create an
implementation that didn't support both from the outset.  Which we
need not and should not do.


And things get more interesting: if you mix (] and [), then range_union
will produce () and range_intersect will produce []. So now you have all
four conventions floating around the same database.


Which is why it's a good idea to support all four...


I don't understand you then. Where do you suppose we would define the 
inclusiveness for the value? At the type level, at the column level, or 
at the value level? A design that allows values of different 
inclusiveness and offers no means to convert from one to another is 
worthless.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-15 Thread Scott Bailey

 If this were an amazingly

short and beautiful piece of code, it might support your argument,
but it's neither.


Well we can't all be arrogant brainiacs.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-15 Thread Scott Bailey

Tom Lane wrote:

I wrote:

The proposed problem is certainly soluble without any assumptions
of discreteness.


To be concrete, I think it could be approached like this:

Assume the datatype provides a built-in function

period_except(p1 period, p2 period) returns setof period

which can return zero, one, or two rows depending on the inputs:

no rows if p1 is completely contained in p2

one row if p1 partially overlaps p2, for example:

[1,4] except [3,5] returns [1,3)
[4,6] except [1,5) returns [5,6]

two rows if p1 properly contains p2, for example

[1,10] except [4,5] returns [1,4) and (5,10]
[1,10] except [9,10) returns [1,9) and [10,10]

and of course just p1 if p1 and p2 don't overlap at all.

Given such a function it's a simple matter of successively removing each
element of p2[] from the set representing the current members of p1[].
The way that I'd find most natural to code that is a loop, along the
lines of

foreach p2_member in unnest(p2) loop
  p1 := array(select period_except(p1_member, p2_member)
  from unnest(p1) p1_member);
end loop;

But maybe it can be done in a single SQL command.

As this example makes clear, when dealing with continuous intervals you
*must* admit both open and closed intervals, else you don't have a way
to represent the results of except.  Maybe part of the failure to
communicate here arises from your desire to try to avoid supporting both
kinds of intervals.  But I think you really have to do it if you want to
deal with data that hasn't got any natural granularity.

regards, tom lane


Alright well I'm going to calm down a bit and take a step back. Perhaps 
I'm just too close to the issue and not thinking outside of the box that 
I've built. Let me see if I can make everything work rather than arguing 
why it won't.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-14 Thread Scott Bailey

Martijn van Oosterhout wrote:

On Sun, Dec 13, 2009 at 11:49:53PM -0800, Scott Bailey wrote:
So basically I have an anyrange pseudo type with the functions prev,  
next, last, etc defined. So instead of hard coding range types, we would  
allow the user to define their own range types. Basically if we are able  
to determine the previous and next values of the base types we'd be able  
to define a range type. I'm envisioning in a manner much like defining  
an enum type.


I find it odd that you could define functions next() and prev() since
that assumes some kind of discretisation which simply does not exist for
most types I can think of.


Because intervals (mathematical not SQL) can be open or closed at each 
end point we need to know what the next and previous value would be at 
the specified granularity. And while you can do some operations without 
knowing this, there are many you can't. For instance you could not tell 
whether two [] or () ranges were adjacent, or be able to coalesce an 
array of ranges.



It would seem to me the real useful uses of ranges would be the
operations overlaps, disjoint, precedes, follows, etc, which could all
be defined on any well-ordered type (wherever greater-than and
less-than are well defined). No need to discretise anything.

Do you have any actual use case for a discretized range type for
timestamp?




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-14 Thread Scott Bailey

Tom Lane wrote:

Scott Bailey arta...@comcast.net writes:
Because intervals (mathematical not SQL) can be open or closed at each 
end point we need to know what the next and previous value would be at 
the specified granularity. And while you can do some operations without 
knowing this, there are many you can't. For instance you could not tell 
whether two [] or () ranges were adjacent, or be able to coalesce an 
array of ranges.


This statement seems to me to demonstrate that you don't actually
understand the concept of open and closed ranges.  It has nothing
whatsoever to do with assuming that the data type is discrete;
these concepts are perfectly well defined for the reals, for example.
What it is about is whether the inclusion conditions are < bound
or <= bound.


I won't address how you draw your conclusions here. But I find it 
'interesting' that you assume that I don't know what I'm talking about 
rather than assume you don't fully understand what I'm talking about.


Anyhow. For any given range there may be 4 combinations of values. Either 
the first value included in the range '[' or the last value preceding 
the start of the range '('; and the last value included in the range ']' 
or the first value following the end of the range ')'. We aren't going 
to store all four data points so we need to normalize into the most 
common form, a half-open interval [) and store just those two values. 
The first value included in the range and the first value after the end 
of our range.


So let's say you are using a numeric range to model the high and low 
values of stocks trading on a given day. Now imagine doing this with no 
concept of granularity. You will most likely be given a range [low, 
high] with inclusive end points. So how do you convert that to a 
closed-open interval so you can store it? Is 20.4201 the 
next value after 20.42? Probably not. You are going to want to define 
0.01 as the granularity for this (either type or column) so that 20.43 is.


Or again, are the ranges [14.0, 22.0] and [22.1, 29.0] adjacent? Maybe, 
maybe not. There is no way to tell w/o knowing the granularity. Perhaps 
the granularity is much finer than 0.1 and there are a billion values that are 
not included. Or perhaps the granularity is 0.1 and they are adjacent.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-14 Thread Scott Bailey

Tom Lane wrote:

Scott Bailey arta...@comcast.net writes:
So basically I have an anyrange pseudo type with the functions prev, 
next, last, etc defined. So instead of hard coding range types, we would 
allow the user to define their own range types. Basically if we are able 
to determine the previous and next values of the base types we'd be able 
to define a range type. I'm envisioning in a manner much like defining 
an enum type.


I think array types, not enums, would be a better model.


I was referring to the syntax for how the user actually defined an enum, 
not about its implementation. Basically what I was hoping to get out of 
this thread was whether it was better to allow the user to define their 
own range types by specifying the base type and possibly the granularity 
and default inclusiveness of the end points, or if we should just 
provide the types like period and intrange?


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types

2009-12-14 Thread Scott Bailey

Tom Lane wrote:

Jeff Davis pg...@j-davis.com writes:

On Mon, 2009-12-14 at 14:23 -0500, Tom Lane wrote:

I'd prefer not to leave it to the user to decide whether a type is
discrete or not.


I don't know how we can decide such a thing. Do you have any ideas? 


If the only interesting use-cases are ints and enums, maybe we could
just hard-wire it.


I think dates could be added to that list as well. But any 
implementation that doesn't do ranges of timestamptz is a non-starter as 
far as I'm concerned. Certainly int64 timestamps and numeric are doable. 
And Jeff's period implementation supports float timestamps. I never use 
float timestamps so I can only assume that he made it work.


Personally, I'd rather just see float timestamps go away. And if the 
range types never supported float or float timestamps, I'd be ok with that.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Range types

2009-12-13 Thread Scott Bailey
I had proposed a temporal contrib module earlier and you wanted to see 
support for many range types not just timestamptz. So I had an idea on 
how to implement this but I want to see if you guys thought it was a 
viable idea.


So basically I have an anyrange pseudo type with the functions prev, 
next, last, etc defined. So instead of hard coding range types, we would 
allow the user to define their own range types. Basically if we are able 
to determine the previous and next values of the base types we'd be able 
to define a range type. I'm envisioning in a manner much like defining 
an enum type.


CREATE TYPE periodtz AS RANGE (timestamptz, '1 microsecond'::interval);
CREATE TYPE numrange AS RANGE (numeric(8,2));
-- determine granularity from typmod
CREATE TYPE floatrange AS RANGE (float, '0.1'::float);

Or getting really crazy...
CREATE TYPE terms AS ENUM ('2000_F', '2000_W', '2000_S', '2000_Su'...
  '2010_F', '2010_W', '2010_S', '2010_Su');
CREATE TYPE termrange AS RANGE (terms);

So basically I have a pg_range table to store the base typeid, a text 
field for the granule value and the granule typeid.
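
Roughly something like this (column names purely illustrative, not an actual
catalog definition):

CREATE TABLE pg_range (
  rngtypid    oid,   -- the range type being defined
  rngsubtype  oid,   -- base type, e.g. timestamptz
  rnggranule  text,  -- granule value, e.g. '1 microsecond'
  rnggrantype oid    -- type of the granule, e.g. interval
);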


I doubt we would be able to get this in for the 8.5 release, especially 
since I'm still learning C and the Postgres internals. Jeff Davis is 
going to get something in before the next commit fest so we'll have some 
type of temporal/range support. But we wanted to see what direction the 
community felt we should go.


Scott Bailey

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xpath_table equivalent

2009-11-18 Thread Scott Bailey

Andrew Dunstan wrote:




I've been reading over the documentation to find an alternative to 
the deprecated xpath_table functionality. I think it may be a 
possibility but I'm not seeing a clear alternative.


Thanks,

Chris Graner


The standard is XMLTABLE and is implemented by both DB2 and Oracle, but 
is on our list of unimplemented features. I would love to see this 
implemented in Postgres. I recall it coming up here before. But I 
don't think it went beyond discussing which xquery library we could use.





Yes, Chris spoke to me about this last night and emailed me an example 
of what he needs today, and I've spent a couple of hours thinking 
about it. Not having a nice way of getting a recordset out of a piece of 
XML is actually quite a gap in our API.


The trouble is that XMLTABLE is a horrible grammatical mess, ISTM, and I 
don't much like the way xpath_table() works either. Passing a table name 
as text into a function is rather ugly.


I think we could do with a much simpler, albeit non-standard, API. 
Something like:


   xpathtable(source xml, rootnodes text, leaves variadic  text[]) 
returns setof record


But unless I'm mistaken we'd need the proposed LATERAL extension to make 
it iterate nicely over a table. Then we could possibly do something like:


   select x.bar, x.blurfl
   from foo f,
        lateral xpathtable(f.xmlfield, '//foo', 'bar',
                           'baz[1]/blurfl', '@is-some-property')
                as x(bar int, blurfl text, xmlprop bool)
   where f.otherfield or x.xmlprop;

cheers

andrew


I agree that the syntax of XMLTABLE is odd, but it's not demonstrably 
worse than xpathtable. If we are going to exert effort on this, why not 
do it in a standards-compliant way? Otherwise I'd suggest a stopgap of 
just adding some support functions to make it easier to extract a scalar 
value from a node. Something like what I did here:


http://scottrbailey.wordpress.com/2009/06/19/xml-parsing-postgres/
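
Just as a sketch of the kind of helper I mean (the name and details 
here are illustrative, not necessarily what's in that post):

CREATE OR REPLACE FUNCTION xpath_string(doc xml, path text)
RETURNS text AS $$
    -- xpath() returns xml[]; take the first match and cast it to text
    SELECT (xpath($2 || '/text()', $1))[1]::text;
$$ LANGUAGE sql IMMUTABLE;

-- usage: xpath_string(f.xmlfield, '//foo/bar') yields a scalar value
-- instead of an xml[] you have to unpack yourself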

The nice thing about XMLTABLE is that it adds xquery support. The 
majority of xquery engines seem to be written in Java; XQilla is 
C++. I'm not sure if our licensing is compatible, but I would love 
the irony of using Berkeley DB XML (formerly Sleepycat) now that it's 
owned by Oracle.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal - temporal contrib module

2009-10-29 Thread Scott Bailey
I would like to add a temporal contrib module. The most important piece 
would be adding a period data type and some support functions. Jeff 
Davis and I both have temporal projects on pgFoundry, and we've been 
collaborating for a while. But there are some areas we'd like to get 
some advice on.


Disk format - A period can be represented as [closed-closed], 
(open-open), [closed-open) or (open-closed] intervals. Right now we 
convert these to the most common form, closed-open, and store them as 
two timestamptz's.
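
As an illustration, with timestamptz and a one-microsecond granule, all 
four input forms collapse to the same closed-open representation on disk:

--   [t1, t2)  stored as  (t1,                 t2)
--   [t1, t2]  stored as  (t1,                 t2 + 1 microsecond)
--   (t1, t2)  stored as  (t1 + 1 microsecond, t2)
--   (t1, t2]  stored as  (t1 + 1 microsecond, t2 + 1 microsecond)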


Nulls - A common use case for periods is for modeling valid time. Often 
the end point is not known.  For instance, you know when an employee has 
been hired but the termination time typically wouldn't be known ahead of 
time. We can either represent these with a null end time or with 
infinity. But I'm not sure how to deal with them. Obviously we can test 
for containment and overlap. But what about length or set operations?


Non-contiguous Sets - A period defines a contiguous set of time. But 
many times we need to work with non-contiguous sets (work shifts in a 
week, bus schedules, etc).  Right now, I'm using period arrays. But 
period arrays can contain overlapping and adjacent periods. And we have 
no way to indicate that a period array has been coalesced into a 
non-contiguous set. And what indexing strategies could be used with 
non-contiguous sets?
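
To illustrate what I mean by coalescing, take a period array holding 
three work shifts, the first two overlapping:

--   raw:       [08:00, 12:00)  [11:00, 13:00)  [14:00, 17:00)
--   coalesced: [08:00, 13:00)  [14:00, 17:00)

The coalesced form is the canonical non-contiguous set, but today there 
is nothing in the array itself that says it has been coalesced.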


Temporal Keys - We need two types of temporal keys. A primary key 
(exclusion) type prevents overlap, so someone isn't in two places at the 
same time. And a foreign key (inclusion) type lets us check that the 
valid time of a child is contained within the valid time of the parent. 
Jeff is working on the former, but there is no easy way to do the latter.
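
For the first case, a rough sketch of what I have in mind, built on the 
operator-exclusion-constraint work Jeff is doing (the syntax here is 
illustrative only and I'm assuming the period type's overlaps operator &&):

CREATE TABLE assignment (
    person_id   int,
    valid_time  period,
    -- no two rows for the same person may have overlapping valid times
    EXCLUDE USING gist (person_id WITH =, valid_time WITH &&)
);

(In practice the plain int column would also need a GiST opclass, e.g. 
from btree_gist.)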



There is actually a lot of theory out there but very few 
implementations. Although not an official standard, we try to follow the 
TSQL2 spec pretty closely. Further reading:


Developing Time-Oriented Database Applications - Snodgrass
http://www.cs.arizona.edu/~rts/tdbbook.pdf

TSQL2 spec ftp://ftp.cs.arizona.edu/tsql/tsql2/spec.pdf

Temporal Data and the Relational Model - Date et al
http://books.google.com/books?isbn=1558608559

Dozens of publications
http://timecenter.cs.aau.dk/pub.htm


Regards,

Scott Bailey

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal - temporal contrib module

2009-10-29 Thread Scott Bailey



I would like to add a temporal contrib module. The most important piece
would be adding a period data type and some support functions. Jeff
Davis and I both have temporal projects on pgFoundry, and we've been
collaborating for a while.


I presume you're going to need some backend support and possibly new
syntax for some of the operations, right? That seems more urgent to
discuss than the possible inclusion into contrib.


Jeff Davis is already working on solving these issues for 8.5. But 
rather than wait until 8.6 or later to get a period data type added to 
core, I felt it was important to get the period type out in front of 
people to start using and testing. Plus we wanted to gauge interest from 
the community. Should we forge ahead and try to become the first 
general-purpose database with temporal support? Or should we wait 
another 20 years and see if an official specification materializes?



I'm very pleased to see people working on temporal issues, BTW! I used
to work on a database that did a lot of temporal operations, but the
DBMS didn't have any temporal data types or operations so we had to use
a lot of triggers etc. to achieve that, and it didn't perform well.


Nulls - A common use case for periods is for modeling valid time. Often
the end point is not known.  For instance, you know when an employee has
been hired but the termination time typically wouldn't be known ahead of
time. We can either represent these with a null end time or with
infinity. But I'm not sure how to deal with them. Obviously we can test
for containment and overlap. But what about length or set operations?


Hmm. Infinity feels like a better match. The behavior of length and set
operations falls out of that naturally. For example, length of a period
with an infinite beginning or end is infinite. For set operations, for
example the intersection of [123, infinity] and [100, 160] would be
[123, 160].


Two different answers from two respondents. And is there a conceptual 
difference between NULL and +/- infinity? Nothing lasts forever. So when 
would it make sense to use one versus the other? So in the example I gave



Non-contiguous Sets - A period defines a contiguous set of time. But
many times we need to work with non-contiguous sets (work shifts in a
week, bus schedules, etc).  Right now, I'm using period arrays. But
period arrays can contain overlapping and adjacent periods. And we have
no way to indicate that a period array has been coalesced into a
non-contiguous set. And what indexing strategies could be used with
non-contiguous sets?


I'd stick to your current definition that a period is a contiguous set
of time. A non-contiguous set consists of multiple contiguous periods,
so it can be represented as multiple rows in a table.


That's pretty much my sentiment exactly. But Jeff wanted to be sure 
that we didn't make a decision now that would limit its usefulness later.



Temporal Keys - We need two types of temporal keys. A primary key 
(exclusion) type prevents overlap, so someone isn't in two places at the 
same time. And a foreign key (inclusion) type lets us check that the 
valid time of a child is contained within the valid time of the parent. 
Jeff is working on the former, but there is no easy way to do the latter.


I'm very excited about this. Foreign keys don't seem that hard, you'll
need foreign key triggers like we have today, but check for within
instead of equal.
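
A rough sketch of what such a trigger-based containment check might look 
like (assuming a period containment operator @>; the table and column 
names are purely illustrative):

CREATE OR REPLACE FUNCTION child_valid_time_check() RETURNS trigger AS $$
BEGIN
    -- the child's valid time must lie within its parent's valid time
    IF NOT EXISTS (
        SELECT 1
          FROM parent p
         WHERE p.id = NEW.parent_id
           AND p.valid_time @> NEW.valid_time)
    THEN
        RAISE EXCEPTION 'valid_time of child is not contained in its parent';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER child_valid_time_fk
    BEFORE INSERT OR UPDATE ON child
    FOR EACH ROW EXECUTE PROCEDURE child_valid_time_check();

(A complete implementation would also need triggers on the parent side, 
just like ordinary foreign key checks.)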


Temporal Data and the Relational Model - Date et al
http://books.google.com/books?isbn=1558608559


+1 for the approach in this book. I'm not familiar enough with the TSQL2
spec to say whether it follows it.

It should also be kept in mind that although this class of problems is 
generally thought of as temporal, IOW dealing with time, the same
approach works with ranges of integers or any other datatype with a
well-defined sort order. It would be nice if the temporal data type
would allow that too.


The period concept relates very closely to mathematical intervals. (In 
fact, I would argue that the SQL interval should actually be named 
period and the SQL period should be named interval, so that they match 
their mathematical counterparts.) My primary concern is timestamp intervals, 
but I see no reason the exact same concepts wouldn't apply to intervals 
of integers, floats, dates, etc.
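
For example, with a hypothetical intrange type, the same operations 
would read naturally (the operator names here are illustrative):

--   intrange(1, 10) && intrange(5, 20)   -- overlaps?  true
--   intrange(1, 10) @> 7                 -- contains an element?  true
--   intrange(1, 10) * intrange(5, 20)    -- intersection: intrange(5, 10)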


And actually there is a fair amount of overlap with spatial, the main 
difference being the number of dimensions. But the concepts of overlap, 
containment, and set operations like union and intersection are the same.


Scott Bailey

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xpath_table equivalent

2009-10-27 Thread Scott Bailey

Chris Graner wrote:

Hello,

I've been reading over the documentation to find an alternative to the 
deprecated xpath_table functionality. I think it may be a possibility 
but I'm not seeing a clear alternative.


Thanks,

Chris Graner


The standard is XMLTABLE, which is implemented by both DB2 and Oracle 
but is on our list of unimplemented features. I would love to see this 
implemented in Postgres. I recall it coming up here before, but I don't 
think it went beyond discussing which xquery library we could use.


Scott Bailey

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers