Re: [HACKERS] Invalidating dependent views and functions
Robert Haas wrote: On Fri, Apr 30, 2010 at 3:33 AM, Scott Bailey arta...@comcast.net wrote: Proposal: Add an invalid flag to pg_class. Invalid objects would be ignored when doing dependency checks for DDL statements. And an exception would be thrown when an invalid object is called. This is similar to what Oracle does. And most Oracle tools can find and compile invalid objects with a statement like: ALTER VIEW foo RECOMPILE; ALTER PACKAGE bar RECOMPILE BODY; Keep in mind that our implementation is apparently quite different from Oracle's. Of course I have no idea what they do under the hood, but we don't even store the original text of the view. Instead, we store a parsed version of the view text that refers to the target objects logically rather than by name. That has some advantages; for example, you can rename a column in some other table that the view uses, and nothing breaks. You can rename a whole table that is used by the view, and nothing breaks. Even if we added storage for the text of the view, recompiling it might result in some fairly astonishing behavior - you might suddenly be referring to tables or columns that were quite different from the ones you originally targeted, if the old ones were renamed out of the way and new, eponymous ones were added. I'm familiar with the view-dependency-hell problem you mention, having fought with it (successfully, I'm pleased to say, using a big Perl script to manage things - and also - obligatory dig here - to work around our lack of support for CREATE IF NOT EXISTS) on many occasions, but I don't have any brilliant ideas about how to solve it. I would like to eventually support ALTER VIEW ... DROP COLUMN; note that we do now support ADDING columns to a view using CREATE OR REPLACE as long as all the new ones are at the end. But neither of those things is going to help with a case like yours, when you want to change the type of the column. I'm not really sure what to do about that case.
...Robert I've been using the source in information_schema.views rather than storing the original source. Oracle does indeed store the original source code for your objects. I don't know what they use to recompile, but my inclination is that they use the original source. If you alter a table/column name, I believe it will invalidate any dependent views, which will need to be manually edited before they will compile successfully. As for Oracle's approach being stupid and not user friendly, OK, maybe they could automatically try to recompile. But even a manual process is better than no help at all. Scott -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Invalidating dependent views and functions
I've been working on a brute force method of saving view and function source when changes to an underlying object force that object to be dropped. But I think there is a way for Postgres to handle this that wouldn't be too hard to implement and would be extremely useful for us users. Problem: We need to change the last_name column of the people table from varchar(30) to varchar(50). You issue the alter table command only to be reminded that the people table is quite popular. It will likely be used in dozens of views, and many of those views will have dependent views, and let's say there are some dependent functions too. And you have to drop all of them if you want to alter your column. Once they are dropped you can alter your column and then start digging through your source code repository to rebuild all of those views and functions that you just dropped. Proposal: Add an invalid flag to pg_class. Invalid objects would be ignored when doing dependency checks for DDL statements. And an exception would be thrown when an invalid object is called. This is similar to what Oracle does. And most Oracle tools can find and compile invalid objects with a statement like: ALTER VIEW foo RECOMPILE; ALTER PACKAGE bar RECOMPILE BODY; Oracle invalidates objects without warning. But maybe we could keep the current behavior and add an invalidate option. ALTER TABLE people ALTER last_name VARCHAR(50); -- Throw exception: cannot alter table with dependents ALTER TABLE people ALTER last_name VARCHAR(50) INVALIDATE; -- Alters column and invalidates any dependent objects Is this a viable option? Scott Bailey
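To make the pain concrete, the manual workaround that the proposed INVALIDATE option would replace looks roughly like this today (the object names here are hypothetical, for illustration only):

```sql
-- Every dependent object must be dropped before the column can change.
BEGIN;
DROP VIEW people_summary;     -- ...plus dozens more views and functions
ALTER TABLE people ALTER COLUMN last_name TYPE varchar(50);
-- Then everything gets recreated from the source code repository:
CREATE VIEW people_summary AS
    SELECT person_id, last_name FROM people;
COMMIT;
```

With the proposed flag, only the ALTER TABLE ... INVALIDATE statement and a later recompile of each flagged object would be needed.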
Re: [HACKERS] extended operator classes vs. type interfaces
Jeff Davis wrote: On Fri, 2010-04-09 at 12:50 -0500, Kevin Grittner wrote: I just thought that if you were adding more type information, oriented around the types themselves rather than index AMs, some form of inheritance might fit in gracefully. There are already some specific proposals for inheritance in database theory literature. For instance: Databases, Types, and the Relational Model by C.J. Date addresses inheritance explicitly (and the appendices have some interesting discussion). I'm not sure how compatible it is with SQL, though; and I am not very optimistic that we could accomplish such a restructuring of the type system while maintaining a reasonable level of backwards compatibility. Either way, I think it's a separate topic. Two types that are not related by any subtype/supertype relationship (like strings and ints) can conform to the same interface (total ordering); while the very same type can conform to two different interfaces. Regards, Jeff Davis Well, I've been doing a lot of work with range abstract data types in Oracle lately. And I've got to say that the OO features in Oracle make it really nice. Of course it's Oracle, so it's like a half-baked OO in COBOL syntax, lol. But I for one think it would be great if Postgres had object data types that had methods and could be subclassed. For those not familiar with ADTs in Oracle, here's an example: CREATE TYPE period AS OBJECT ( beginning DATE, ending DATE, CONSTRUCTOR FUNCTION period ( self IN OUT NOCOPY period, beginning DATE, ending DATE ) RETURN SELF AS RESULT, -- config functions MEMBER FUNCTION granule RETURN INTERVAL DAY TO SECOND, MEMBER FUNCTION def_inc RETURN NUMBER, MEMBER FUNCTION range_union(p2 period) RETURN period ... ) NOT FINAL; CREATE TYPE date_range UNDER period ( OVERRIDING MEMBER FUNCTION granule RETURN INTERVAL DAY TO SECOND, OVERRIDING MEMBER FUNCTION def_inc RETURN NUMBER, ...
); Scott Bailey
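As a point of comparison, the closest thing Postgres offers today is a composite type plus ordinary functions taking it as their first argument - no inheritance and no overriding. A minimal sketch, with all names invented for illustration:

```sql
CREATE TYPE period AS (beginning timestamptz, ending timestamptz);

-- Stand-in for a MEMBER FUNCTION; there is no method dispatch here.
CREATE FUNCTION granule(period) RETURNS interval AS
$$ SELECT interval '1 second' $$ LANGUAGE sql IMMUTABLE;

CREATE FUNCTION range_union(period, period) RETURNS period AS
$$ SELECT ROW(least(($1).beginning, ($2).beginning),
              greatest(($1).ending, ($2).ending))::period $$
LANGUAGE sql IMMUTABLE;

-- "Method" calls are just function calls:
SELECT granule(ROW('2010-01-01','2010-01-02')::period);
```

Subtyping like date_range UNDER period has no equivalent in this sketch; a new composite type would need its own copies of the functions.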
Re: [HACKERS] xpath improvement V2
Jan Urbański wrote: Arie Bikker wrote: Hi all, I've combined the review suggestions of Jan Urbański, Scott Bailey, and others. This was a lot harder than I had foreseen, and I took my time to do it the right way (hope you agree!). Hi, I see the patch has been marked as Returned with Feedback on the 6th of February, I assume on grounds of prolonged silence about it. I confess it was partly my fault, because soon after posting the review I suddenly had to focus on other things. I won't be able to review the new version in the next few days, and this commitfest is closing anyway... However I would hate to see that patch just disappear, as I think it's useful and you obviously invested some work in it. At this stage I would suggest moving it to the first 9.1 commitfest, since it's a nice feature, but not one we should burden the committers with this late in the development cycle. Arie, care to add that last version of the patch to the 2010-Next commitfest? Cheers, Jan I would agree w/ Jan about just adding to 9.1 except that it fixes several known bugs. Scott
Re: [HACKERS] XQuery support
Andrew Dunstan wrote: Matthias Brantner wrote: I know this has been discussed several times and it seems the conclusion was it's impossible if we would like to use existing XQuery external modules (some for license reasons and some for technical reasons). So it seems the only way to support XQuery is developing our own XQuery functionality from scratch. I'm wondering if other people reach the same conclusion as me, or is it a totally impossible project? Well, XQilla on top of Xerces should get us pretty much where we need to go. They are both under the Apache 2 license, which I believe would be OK. But they are C++ so I don't know if we could use them. I found Zorba; it's C++ and Apache 2. I don't know much about it, but it looks promising. http://www.zorba-xquery.com/ Sounds like a great project. Please, let us know (zorba-us...@lists.sourceforge.net) if you need any help or have questions with regard to plugging Zorba into PostgreSQL. We would love to see this happen and help. Well, maybe you can answer the questions I had last time I looked at it, namely: XQuery is a whole other question. Adding another library dependency is something we try to avoid. Zorba http://www.zorba-xquery.com/ might work, but it appears to have its own impressive list of dependencies (why does it require both libxml2 and xerces-c? That looks a bit redundant.) Even if we did implement XMLTABLE, I think I'd probably be inclined to start by limiting it to plain XPath, without the FLWOR stuff. I think that would satisfy the vast majority of needs, although you might feel differently. (Do a Google search for XMLTABLE - every example I found uses plain XPath expressions.) A much more urgent need we have, IMNSHO, than for XQuery, is for XSLT support that actually works. But maybe that's just me. I've been playing around with Zorba and XQilla today. I couldn't actually get Zorba to build on my system. It looks like the two share a common (or similar) C API though.
Zorba also has Perl, Python, Ruby, and PHP bindings (which I also couldn't get to work), but it would be nice to be able to use it from plperl, plpython, and plruby as well. If it were possible to drop the Xerces dependency from Zorba and use libxml2 for parsing, that would be a strong case for it. The other big dependency for Zorba was ICU for converting character sets. I'm not familiar with that part of Postgres, but it seems to me that we would have our own libraries for that sort of thing. I'm not a big fan of doing XSLT in the database. But XQilla would give you XSLT 2.0 support. Scott
Re: [HACKERS] xpath improvement suggestion
Robert Haas wrote: On Sun, Jan 17, 2010 at 11:33 AM, Jan Urbański wulc...@wulczer.org wrote: [ detailed review ] Arie, Are you planning to submit an updated patch? If so, please do so soon. Thanks, ...Robert What is the time limit on this? I've been testing Arie's patch and I want to see it get in. I can make the changes Jan requested if Arie doesn't. How long should I give him? Scott
Re: [HACKERS] XQuery support
Tatsuo Ishii wrote: Hi, I know this has been discussed several times and it seems the conclusion was it's impossible if we would like to use existing XQuery external modules (some for license reasons and some for technical reasons). So it seems the only way to support XQuery is developing our own XQuery functionality from scratch. I'm wondering if other people reach the same conclusion as me, or is it a totally impossible project? -- Tatsuo Ishii SRA OSS, Inc. Japan Well, XQilla on top of Xerces should get us pretty much where we need to go. They are both under the Apache 2 license, which I believe would be OK. But they are C++ so I don't know if we could use them. I found Zorba; it's C++ and Apache 2. I don't know much about it, but it looks promising. http://www.zorba-xquery.com/ Sedna is a complete db/xquery solution licensed under Apache 2 and written in C/C++. Perhaps we could use pieces of this. Better than starting from scratch. http://modis.ispras.ru/sedna/index.html Scott
Re: [HACKERS] Review: listagg aggregate
I don't think so. When we have a function with the same parameters and the same behavior as some Oracle function, then I strongly prefer the Oracle name. I don't see any benefit from a different name. It can only confuse developers and adds trouble for people porting applications. Meh. If the name is terrible, we don't have to use it, and it's easy enough to create an alias in SQL for those who need it. The corresponding function in Oracle is called wm_concat. In MySQL it's called group_concat. I don't believe DB2 or SQL Server have built-in equivalents. The Oracle name isn't really an option ('wm' stands for workspace manager). I think listagg or string_agg would be the most appropriate names. Oh, and before Oracle had wm_concat, Tom Kyte wrote a function called stragg that was pretty popular. Scott
Re: [HACKERS] xpath improvement suggestion
Arie Bikker wrote: Peter Eisentraut wrote: On ons, 2010-01-06 at 23:46 +0100, Arie Bikker wrote: Hope this is the right attachment type (I'm new at this) BTW. here are some nice examples: - Get the number of attributes of the first childnode: select (xpath('count(@*)',(xpath('*[1]','<a b="c"><d e="f" g="j"/></a>'))[1]))[1]; - an alternative for xpath_exist('/a/d'): select (xpath('boolean(/a/d)','<a b="c"><d e="f" g="j"/></a>'))[1]; - fixes bug 4206: select xpath('//text()',xmlparse(document '<?xml version="1.0"?><elem1><elem2>one</elem2><elem2>two</elem2><elem2>three</elem2><elem3 att="2"/></elem1>')); - fixes bug 4294: select xpath('name(/my:a/*[last()])', '<a xmlns="http://myns.com/ns"><b>text1</b><c>text2</c></a>', ARRAY[ARRAY['my','http://myns.com/ns']]); Instead of converting everything to text, there have been previous suggestions to add functions like xpath_string, xpath_number, xpath_boolean that return the appropriate types from xpath. This could provide for better type safety and probably also more clarity. In any case, please consider adding test cases like the above to the regression tests in whatever patch comes out at the end. As an addition these xpath_sometype functions have been mentioned and can be handy. But, considering that the xpath function itself is a generalized function, the user of this function might not have beforehand knowledge of the type of the result; the first argument of the call could be used in a dynamic fashion. Coming back to the xpath_sometype functions - would these definitions be suitable? boolean xpath_boolean(xpath, xml [, nsarray]) text xpath_string(xpath, xml [, nsarray]) int xpath_number(xpath, xml [, nsarray]) Implementation can be done via an xpath_nonnode function defined as: text xpath_nonnode(xpath, xml [, nsarray]) where each of the xpath_sometype functions simply interprets the text as its target type. Is this the way to go? kind regards, Arie Bikker Postgres' type system is MUCH more robust than anything in XPath/XML.
And folks who use XML on a regular basis expect most XPath expressions to return a string anyway. For instance, how many confused users do you think you'll get with something like: SELECT xpath_boolean('boolean(/root/@bar)', '<root bar="false"/>') -- evaluates to true or SELECT xpath_number('/root/@foo', '<root foo="42"/>') -- xpath will return the string '42', not a number, unless you do something like: SELECT xpath_number('number(/root/@foo)', '<root foo="42"/>') I think we'd be much better off having a function like xpath_nonnode() or xpath_value() that returns text and letting the user handle the casting. Scott Bailey
Re: [HACKERS] xpath improvement suggestion
Arie Bikker wrote: Sorry for the previous NUUUB post, didn't know the mailing list doesn't support html ;( Robert Haas wrote: On Tue, Jan 5, 2010 at 6:09 PM, Arie Bikker a...@abikker.nl wrote: Hi all, Well I had to burn some midnight oil trying to figure out why a construct like SELECT xpath('name()','<a/>'); doesn't give the expected result. Kept getting an empty array: {} instead of the expected {a} BugID 4294 and the TODO item better handling of XPath data types pointed in the right direction. Within src/backend/utils/adt/xml.c, in the function xpath, the result of the call to xmlXPathCompiledEval is not handled optimally. In fact, the result is assumed to be a nodeset without consulting the ->type member of the result. I've made some minor changes to xml.c to handle some non-nodeset results of xmlXPathCompiledEval. Essentially, the revised code makes an array of all the nodes in the xpathobj result in case this is a nodeset, or an array with a single element in case the result is a number/string/boolean. The problem cases mentioned in http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php now work as expected. Revision of the code involves: - A switch statement to handle the result type of xmlXPathCompiledEval. - An additional function xmlpathobjtoxmltype. A diff of the revised code with respect to the original is in the attached file. kind regards, Arie Bikker Hi, Could you please resend this as a context diff and add it to our patch management application? http://wiki.postgresql.org/wiki/Submitting_a_Patch https://commitfest.postgresql.org/action/commitfest_view/open Thanks! ...Robert Hope this is the right attachment type (I'm new at this) BTW.
here are some nice examples: - Get the number of attributes of the first childnode: select (xpath('count(@*)',(xpath('*[1]','<a b="c"><d e="f" g="j"/></a>'))[1]))[1]; - an alternative for xpath_exist('/a/d'): select (xpath('boolean(/a/d)','<a b="c"><d e="f" g="j"/></a>'))[1]; - fixes bug 4206: select xpath('//text()',xmlparse(document '<?xml version="1.0"?><elem1><elem2>one</elem2><elem2>two</elem2><elem2>three</elem2><elem3 att="2"/></elem1>')); - fixes bug 4294: select xpath('name(/my:a/*[last()])', '<a xmlns="http://myns.com/ns"><b>text1</b><c>text2</c></a>', ARRAY[ARRAY['my','http://myns.com/ns']]); Awesome! This really helps. Scott
[HACKERS] Proposal: XML helper functions
One of the problems with shredding XML is that it is very kludgy to get a scalar value back from xpath. The xpath function always returns an array of XML. So for example, to extract a numeric value you need to: 1) use xpath to get the node 2) get the first element of the XML array 3) cast that to varchar 4) cast that to numeric So I wrote the following function: CREATE OR REPLACE FUNCTION xmlvalue( VARCHAR, XML ) RETURNS TEXT AS $$ SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$' THEN (xpath($1, $2))[1] WHEN $1 ~* '/text()$' THEN (xpath($1, $2))[1] WHEN $1 LIKE '%/' THEN (xpath($1 || 'text()', $2))[1] ELSE (xpath($1 || '/text()', $2))[1] END::text; $$ LANGUAGE 'sql' IMMUTABLE; It's pretty simple. It just does a check to see if you are extracting an attribute or an element, and if element, it makes sure to get the text value. So a query that used to look like: SELECT CAST( CAST( (xpath('/foo/bar/text()', myxml))[1] AS varchar) AS numeric) AS bar now becomes: SELECT xmlvalue('/foo/bar/text()', myxml)::numeric AS bar The second function just checks that the xpath expression finds at least one node. CREATE OR REPLACE FUNCTION xmlexists( VARCHAR, XML ) RETURNS BOOLEAN AS $$ SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0 THEN true ELSE false END; $$ LANGUAGE 'sql' IMMUTABLE; On naming, SQL/XML specifies xmlexists and xmlcast. The latest DB2 provides xmlcast(), Oracle has the equivalent extractvalue() function, and MSSQL uses xml.value(). The xmlvalue here does only part of what is required by xmlcast (it won't cast scalar to xml). So would these functions need to be rewritten in C in order to be accepted?
Regards, Scott Bailey Further reading: http://scottrbailey.wordpress.com/2009/06/19/xml-parsing-postgres/ http://en.wikibooks.org/wiki/SQL_Dialects_Reference/SQL_XML http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions052.htm#i1131042 http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0023486.htm
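A quick usage sketch of the two helpers, against a made-up single-element document (assuming the functions above have been created as defined):

```sql
-- Scalar extraction: one cast instead of the unwrap/double-cast dance.
SELECT xmlvalue('/book/price', '<book><price>9.99</price></book>'::xml)::numeric;

-- The raw form it wraps, for comparison:
SELECT ((xpath('/book/price/text()',
               '<book><price>9.99</price></book>'::xml))[1])::text::numeric;

-- Node-existence test; no isbn element here, so this yields false:
SELECT xmlexists('/book/isbn', '<book><price>9.99</price></book>'::xml);
```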
Re: [HACKERS] Proposal: XML helper functions
Merlin Moncure wrote: On Tue, Jan 5, 2010 at 1:14 PM, Scott Bailey arta...@comcast.net wrote: One of the problems with shredding XML is that it is very kludgy to get a scalar value back from xpath. The xpath function always returns an array of XML. So for example, to extract a numeric value you need to: 1) use xpath to get the node 2) get the first element of the XML array 3) cast that to varchar 4) cast that to numeric I just happen to be dealing with XML right now as well, and my initial thought is that your suggestion doesn't buy you a whole lot: the root problem IMO is not dealing with what xpath gives you but that there is no DOMish representation of the xml document for you to query. You have to continually feed the entire document to xpath, which is absolutely not scalable (if it works the way I think it does -- haven't looked at the code). No, typically you'll only be passing the xml for a single row, so what we end up doing in Postgres typically looks something like this: SELECT xmlvalue('/row/@id', bitesizexml)::int AS id, xmlvalue('/row/@lat', bitesizexml)::numeric AS lat, xmlvalue('/row/@lon', bitesizexml)::numeric, xmlvalue('/row/comment', bitesizexml) AS cmt FROM ( SELECT unnest(xpath('/foo/row', mybigxmldoc)) AS bitesizexml ) sub So only the one call has to work with the entire document. All the calls to xmlvalue are passed a much smaller node to work with. xpath is great for simple things but it's too texty, and you need a more robust API to handle documents for serious parsing on the backend. In the short term I'd advise doing the work in another PL like Perl. This is basically the method used for Oracle too until they provided XMLTable functionality. They had a function xmlsequence that basically did the unnest(xpath()) part. Hopefully we'll get xmltable support soon. Scott
Re: [HACKERS] Proposal: XML helper functions
Pavel Stehule wrote: 2010/1/5 Scott Bailey arta...@comcast.net: One of the problems with shredding XML is that it is very kludgy to get a scalar value back from xpath. The xpath function always returns an array of XML. So for example, to extract a numeric value you need to: 1) use xpath to get the node 2) get the first element of the XML array 3) cast that to varchar 4) cast that to numeric So I wrote the following function: CREATE OR REPLACE FUNCTION xmlvalue( VARCHAR, XML ) RETURNS TEXT AS $$ SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$' THEN (xpath($1, $2))[1] WHEN $1 ~* '/text()$' THEN (xpath($1, $2))[1] WHEN $1 LIKE '%/' THEN (xpath($1 || 'text()', $2))[1] ELSE (xpath($1 || '/text()', $2))[1] END::text; $$ LANGUAGE 'sql' IMMUTABLE; It's pretty simple. It just does a check to see if you are extracting an attribute or an element, and if element, it makes sure to get the text value. So a query that used to look like: SELECT CAST( CAST( (xpath('/foo/bar/text()', myxml))[1] AS varchar) AS numeric) AS bar now becomes: SELECT xmlvalue('/foo/bar/text()', myxml)::numeric AS bar The second function just checks that the xpath expression finds at least one node. CREATE OR REPLACE FUNCTION xmlexists( VARCHAR, XML ) RETURNS BOOLEAN AS $$ SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0 THEN true ELSE false END; $$ LANGUAGE 'sql' IMMUTABLE; On naming, SQL/XML specifies xmlexists and xmlcast. I am for the SQL/XML naming convention. Well, I'm shying away from the name xmlcast because it is supposed to cast xml to scalar, scalar to xml, and xml to xml. For instance, these would all work on DB2: SELECT xmlcast(null AS XML), xmlcast(bar AS XML), xmlcast(xmlquery('$x/baz/bar' PASSING foo.barxml AS x) AS VARCHAR(30)) FROM foo But I just found that Oracle added xmlcast in 11g and it only does xml to scalar (and only number, varchar2, and date/time types). So maybe you're right.
Scott
Re: [HACKERS] Proposal: XML helper functions
Peter Eisentraut wrote: On tis, 2010-01-05 at 10:14 -0800, Scott Bailey wrote: One of the problems with shredding XML is that it is very kludgy to get a scalar value back from xpath. The xpath function always returns an array of XML. So for example, to extract a numeric value you need to: 1) use xpath to get the node 2) get the first element of the XML array 3) cast that to varchar 4) cast that to numeric There has been talk about adding something like xpath_string, xpath_number, xpath_boolean for fetching xpath expressions that don't return nodesets. I think that would fit your use case. The first two sound very much like what I'm looking for. I'm unsure about the third. Is its purpose to extract the scalar value of an expression and cast to bool as the other two do, or is it to identify whether the xpath expression returned any nodes, like xmlexists? Scott
Re: [HACKERS] xpath improvement suggestion
Arie Bikker wrote: Hi all, Well I had to burn some midnight oil trying to figure out why a construct like SELECT xpath('name()','<a/>'); doesn't give the expected result. Kept getting an empty array: {} instead of the expected {a} BugID 4294 and the TODO item better handling of XPath data types pointed in the right direction. Within src/backend/utils/adt/xml.c, in the function xpath, the result of the call to xmlXPathCompiledEval is not handled optimally. In fact, the result is assumed to be a nodeset without consulting the ->type member of the result. I've made some minor changes to xml.c to handle some non-nodeset results of xmlXPathCompiledEval. Essentially, the revised code makes an array of all the nodes in the xpathobj result in case this is a nodeset, or an array with a single element in case the result is a number/string/boolean. The problem cases mentioned in http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php now work as expected. Revision of the code involves: - A switch statement to handle the result type of xmlXPathCompiledEval. - An additional function xmlpathobjtoxmltype. A diff of the revised code with respect to the original is in the attached file. kind regards, Arie Bikker Well that's interesting. I was getting ready to dig into libxml2 to see what it would take to return scalar values from xpath expressions. Thanks. Scott
Re: [HACKERS] Range types
Tom Lane wrote: Dimitri Fontaine dfonta...@hi-media.com writes: Tom Lane t...@sss.pgh.pa.us writes: Hm, how would you do it with LATERAL? The problem is not so much composition as the need for a variable number of rounds of composition. Let's have a try at it: select p2_member, array_accum(p1) from unnest(p2) as p2_member lateral (select period_except(p1_member, p2_member) from unnest(p1) p1_member) as x(p1); I don't think that does it. Maybe I misunderstand LATERAL, but what that looks like to me is that each p1 will be separately filtered by each p2, giving rise to a distinct element in the output. What we need is for each p1 to be filtered by *all* p2's, successively (though in any order). regards, tom lane That approach will only work if you coalesce your inputs into non-contiguous sets (NCS) first. Overlapping ranges would break it in a hurry. In addition to two coalesce operations, period_except would be calculated 1000x for a pair of 100-element arrays. The original solution, while not short, was probably a little more elegant than Tom gave it credit for. In a single pass it pulls out only the data points needed to build the resultant NCS without making assumptions that the inputs were coalesced. I think I'll still be able to do a single-pass solution for continuous ranges. I just won't be able to do the coalesce operations inline with the set operations. Scott
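The coalesce step mentioned above - merging overlapping periods into a non-contiguous set before doing any set arithmetic - can be sketched with window functions (the table and column names here are hypothetical):

```sql
-- Merge overlapping (s, e) periods into disjoint "islands".
SELECT min(s) AS s, max(e) AS e
FROM (
    SELECT s, e, sum(is_new_island) OVER (ORDER BY s, e) AS island
    FROM (
        SELECT s, e,
               -- A row starts a new island when it begins after every
               -- earlier period has ended (first row: NULL max -> ELSE 0).
               CASE WHEN s > max(e) OVER (ORDER BY s, e
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND 1 PRECEDING)
                    THEN 1 ELSE 0 END AS is_new_island
        FROM periods
    ) marked
) grouped
GROUP BY island
ORDER BY 1;
```

This takes extra passes over each input set, which is why folding the coalescing inline with the set operations, as the single-pass solution did, is attractive.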
Re: [HACKERS] idea - new aggregates median, listagg
Thom Brown wrote: 2009/12/15 Pavel Stehule pavel.steh...@gmail.com Hello I am looking at a new feature - the ORDER clause in aggregates - and I think we are able to effectively implement some non-standard but well-known aggregates. a) function median - it is a relatively frequent request - with a usually slow implementation b) function listagg (it is an analogy of group_concat from MySQL) - it should simplify report generating and some other things What is your opinion? Would you like to see these functions in core? I'm probably missing the point here, but when I originally saw MySQL's group_concat function, I found it odd that it featured ordering functionality. Shouldn't the order be determined by the query itself? Otherwise it's almost as if it's separating the relationship between the result column and the resultset. For xmlagg in particular, it is quite useful to be able to order the results. And sorting the query doesn't work for ordering the agg unless you do it in a subquery. Oracle has this functionality and it is quite handy. It would be nice to see listagg with the option to order as well. Scott
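For reference, the aggregate-level ordering under discussion here is the syntax PostgreSQL later shipped (string_agg with ORDER BY inside the call, from 9.0 on); the table and column names below are made up for illustration:

```sql
-- The aggregate carries its own ordering, independent of the query's ORDER BY:
SELECT dept, string_agg(name, ', ' ORDER BY name) AS members
FROM employees
GROUP BY dept;
```

Without this, the only reliable way to control the aggregate's input order is a pre-sorted subquery, as Scott notes.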
Re: [HACKERS] Range types
Jeff Davis wrote: On Sun, 2009-12-13 at 23:49 -0800, Scott Bailey wrote: So basically I have an anyrange pseudo type with the functions prev, next, last, etc defined. So instead of hard coding range types, we would allow the user to define their own range types. Basically if we are able to determine the previous and next values of the base types we'd be able to define a range type. I'm envisioning it in a manner much like defining an enum type. After an off-list discussion with Scott, I think there may be a solution here that works for everyone if we don't try so hard to unify the implementation of discrete and continuous ranges. The API should be very similar, of course, but the implementation doesn't need to be. Continuous ranges absolutely require the following information: start, end, and inclusivity information. But discrete ranges can instead be stored by counting the number of granules from the start point. For instance, it could be stored as: start, num_granules. That has a lot of benefits for discrete ranges of time. First of all, it allows the algebra to work reasonably well for the days and months part of the interval, so we can allow a granule of 1 day/week/month/year for a timestamp range. For output of the range, we can then just multiply the granule by the number of granules, and add that to the start time; thus avoiding the incremental addition problem with date math. I think this works reasonably well for timestamp/date ranges -- let me know if there is a problem here (aside from timestamptz, which I address below). Secondly, in the case of a timestamp range, we can use 7 bytes for storing the number of granules rather than another full 8-byte timestamp, leaving one byte for flags to represent NULL boundaries, infinite boundaries, etc. For timestamps that would still mean that an interval could be 2000 years long with '1 microsecond' granularity. For dates, 3 bytes is sufficient for a date range 45000 years long with granules of '1 day'.
That means that we can get back down to a 16-byte representation for timestamp ranges, or an 8-byte representation for date ranges. There are a few details, like infinite ranges, but those can be pretty easily solved with flags as well. There's one problem, and that's for timestamptz ranges with intervals that include days and months. Timezone adjustments are just not well-defined for that kind of granule (nor would it be particularly useful even if it magically worked), so this would have to be blocked somehow. I think that's a special case, and we could provide the user with a nice error message telling the user to use a date or timestamp range instead. So, the idea is to default to a continuous range type, but if the user supplies a granule, prior and next functions, and other necessary details, then it becomes a discrete range type.
* continuous ranges can still have everything that everyone wants, including flags to indicate special values
* the discrete range granule is specified explicitly, so it's not an implementation detail
* discrete ranges can have a compact representation
* discrete ranges would still have room for flags to indicate special values
Comments? As I pointed out off-list, I think the granularity for a timestamp range should be limited to hours and smaller. Anything larger is asking for trouble. And quite honestly, if they wanted day granularity, they should use a date range. Also, I think the granule should be the same type as is returned when subtracting two subtypes. So the granule of a date range should be an int, not an interval. And if a user wanted something with month granularity, perhaps an enum range of 'MM' would be better.
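The incremental-addition problem with date math mentioned above is easy to demonstrate in stock Postgres: repeatedly adding a month granule drifts once a short month truncates the day, while multiplying the granule by the number of granules first, as the proposed discrete representation would do on output, stays anchored to the start point:

```sql
-- adding '1 month' twice: Jan 31 -> Feb 28 -> Mar 28 (the day-of-month is lost)
SELECT date '2009-01-31' + interval '1 month' + interval '1 month';
-- multiplying first: Jan 31 + 2 months -> Mar 31
SELECT date '2009-01-31' + 2 * interval '1 month';
```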
Quite honestly, the following 3 cases would probably meet 99% of needs:

CREATE TYPE period AS RANGE (timestamptz(0), interval '1 s');
CREATE TYPE period AS RANGE (timestamptz(3), interval '1 ms');
CREATE TYPE period AS RANGE (timestamptz, interval '1 us');

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range types
Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: [ hacky special-case representation for discrete timestamp ranges ] I'm still not exactly clear on what the use-case is for discrete timestamp ranges, and I wonder how many people are going to be happy with a representation that can't handle a range that's open-ended on the left. They wouldn't. But the timestamp data would be the anchor, not necessarily the start point. As long as we allow ranges unbounded on both ends we'd be OK.
Re: [HACKERS] Range types
Ok, silly question here. But how do you determine the length of a continuous range? By definition the length of [a, b) and (a, b] = b-a. But what about (a,b) and [a,b]? Are we saying that because they are continuous, the difference between values included in the range and those excluded is so infinitesimally small as not to matter? Thus length (a,b) == length [a,b] == length [a,b)? And if that is the case, does the inclusiveness of the range really even matter? And can anyone point me to a reference for working with continuous ranges? Google just insists that I spelled contiguous wrong. Scott
Re: [HACKERS] Range types
Tom Lane wrote: Martijn van Oosterhout klep...@svana.org writes: However, it does seem reasonable to allow people to restrict, either by typmod or a check constraint, the kinds of values that can be stored in a particular column. Then an application can decide which way they want their intervals to work and have the database enforce it. Sure --- the range datatype should absolutely provide inquiry functions that let you determine all the properties of a range, so something like CHECK (is_open_on_right(col)) would work for that. I'm of the opinion that we must not usurp typmod for range behavior --- the right thing is to pass that through to the contained type, just as we do with arrays. (Note that a range over timestamp(0) would eliminate at least some of the platform dependencies we've been arguing about. I'm still quite dubious that next timestamp is anything except evidence that you've misformulated your problem, though.) regards, tom lane Well, our work is based on over 15 years of temporal research (not by us) and numerous books from Snodgrass, Date and Celko, as well as partial implementations in other databases. So it's not like we took a blue pill this weekend and woke up with this hare-brained idea. I understand your concern. But I think the objections are based more on implementation details of float timestamps than on the concept itself.
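A sketch of what Tom's suggestion could look like in use, assuming the proposed period type ships inquiry functions named along the lines of is_open_on_left()/is_open_on_right() (names hypothetical):

```sql
CREATE TABLE reservation (
    room   int,
    during period NOT NULL,
    -- the application settles on closed-open intervals and the database enforces it
    CONSTRAINT during_is_closed_open
        CHECK (NOT is_open_on_left(during) AND is_open_on_right(during))
);
```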
Re: [HACKERS] Range types
Jeff Davis wrote: On Tue, 2009-12-15 at 10:19 -0500, Tom Lane wrote: I'm not sure that anyone has argued that. I did suggest that there might be a small list of types for which we should provide discrete behavior (ie, with next/previous functions) and the rest could have continuous behavior (without that assumption). But I quite agree that we want both types of ranges. It seems like we're moving toward treating TIMESTAMP as continuous. If I'm correct, continuous ranges always need two extra bits of storage for the exclusivity. But for timestamps, that means 16 bytes (2 x 8-byte timestamp) turns into 17 bytes, which is really more like 20 or 24 bytes with alignment. Considering that these are likely to be used for audit or history tables, 8 bytes of waste (50%) seems excessive -- especially when treating them as discrete seems to work pretty well, at least for the int64 timestamps. Would it be OK if we handled float timestamp ranges as continuous and int64 timestamps as discrete? You effectively lose the ability to build non-contiguous sets with continuous ranges, which is integral to the work I'm doing (union, intersect, coalesce and minus sets of ranges). As for the extra bits, would it be better to just require continuous ranges to be either [] or [)? But I don't know which would be preferred. My inclination would be toward [), but Tom seemed to indicate that perhaps [] was the norm. Scott
Re: [HACKERS] Range types
David Fetter wrote: On Tue, Dec 15, 2009 at 11:31:05AM -0800, Scott Bailey wrote: Jeff Davis wrote: On Tue, 2009-12-15 at 10:19 -0500, Tom Lane wrote: Would it be OK if we handled float timestamp ranges as continuous and int64 timestamps discrete? That sounds like a recipe for disaster. Whatever timestamp ranges are, float and int64 should be treated the same way so as not to get surprises due to implementation details. You effectively lose the ability to build non-contiguous sets with continuous ranges. Which is integral to the work I'm doing (union, intersect, coalesce and minus sets of ranges) As for the extra bits, would it be better to just require continuous ranges to be either [] or [)? But I don't know which would be preferred. My inclination would be toward [), but Tom seemed to indicate that perhaps [] was the norm. [] makes certain operations--namely the important ones in calendaring--impossible, or at least incredibly kludgy, to do. I think we ought to leave openness at each end up to the user, independent of the underlying implementation details. FWIW, I think it would be a good idea to treat timestamps as continuous in all cases. Ok, let me give an example of what we can do with the current implementations that would not be possible with timestamps if we implement as suggested. Jeff's implementation uses a 1 microsecond step size or granule. And my implementation uses an interval step size and can be configured database wide, but default is 1 second. The function below takes two period arrays that can have overlapping and adjacent elements. It subtracts all values in pa1 that intersect with values in pa2. So perhaps pa1 is all of your work shifts for the month and pa2 is a combination of your leave and holidays. The result is a coalesced non-contiguous set of the times you would actually be working. But to do this kind of thing you need to be able to determine prior, first, last and next. 
I need an implementation that can do this for timestamps and not just ints and dates.

CREATE OR REPLACE FUNCTION period_minus(
    pa1 IN period[],
    pa2 IN period[]
) RETURNS period[] AS $$
    SELECT array_agg(prd)
    FROM (
        SELECT period((t_in).start_time, MIN((t_out).end_time)) AS prd
        FROM (
            SELECT DISTINCT first(p) AS start_time
            FROM unnest($1) p
            WHERE NOT contains($2, first(p))
              AND NOT contains($1, prior(p))
            UNION
            SELECT DISTINCT next(p)
            FROM unnest($2) p
            WHERE contains($1, next(p))
              AND NOT contains($2, next(p))
        ) t_in
        JOIN (
            SELECT next(p) AS end_time
            FROM unnest($1) p
            WHERE NOT contains($1, next(p))
            UNION ALL
            SELECT first(p)
            FROM unnest($2) p
            WHERE contains($1, first(p))
              AND NOT contains($2, prior(p))
        ) t_out ON t_in.start_time < t_out.end_time
        GROUP BY t_in.start_time
        ORDER BY t_in.start_time
    ) sub;
$$ LANGUAGE 'sql' IMMUTABLE STRICT;
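For illustration, a call might look like this, assuming a period(start, end) constructor from the pgFoundry module; subtracting a lunch break and a full-day holiday from two shifts should return the first shift split in two and the second shift removed entirely:

```sql
SELECT period_minus(
    ARRAY[period('2009-12-14 08:00', '2009-12-14 17:00'),
          period('2009-12-15 08:00', '2009-12-15 17:00')],  -- work shifts
    ARRAY[period('2009-12-14 12:00', '2009-12-14 13:00'),   -- lunch
          period('2009-12-15 00:00', '2009-12-16 00:00')]   -- holiday
);
```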
Re: [HACKERS] Range types
Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: On Tue, 2009-12-15 at 11:49 -0800, David Fetter wrote: FWIW, I think it would be a good idea to treat timestamps as continuous in all cases. I disagree. There is a lot of value in treating timestamp ranges as discrete. One big reason is that the ranges can be translated between the different input/output forms, and there's a canonical form. As we know, a huge amount of the value in an RDBMS is unifying data from multiple applications with different conventions. Actually, that is exactly one of the reasons why what you propose is a *bad* idea. You want to institutionalize application dependence on a non-portable implementation detail, namely the granularity of machine representation of what's in principle a continuous value. That's one of the fastest routes to non-unifiable data I can think of. So, let's say one application uses (] and another uses [). If you are mixing the data and returning it to the application, you want to be able to provide the result according to its convention. You can't do that with a continuous range. The above is nonsense. [1,2) and [1,2] are simply different objects. A design that assumes that it is always possible to replace one by the other is broken so badly it's not even worth discussing. I don't hear anyone arguing that. But you should be able to convert between [1,2], [1,3), (0,3) and (0,2]. The only reason you'd have applications that fail to handle both open and closed intervals would be if someone were to create an implementation that didn't support both from the outset. Which we need not and should not do. And things get more interesting: if you mix (] and [), then range_union will produce () and range_intersect will produce []. So now you have all four conventions floating around the same database. Which is why it's a good idea to support all four... I don't understand you then. Where do you suppose we would define the inclusiveness for the value? 
At the type level, at the column level, or at the value level? A design that allows values of different inclusiveness and offers no means to convert from one to another is worthless.
Re: [HACKERS] Range types
If this were an amazingly short and beautiful piece of code, it might support your argument, but it's neither. Well we can't all be arrogant brainiacs.
Re: [HACKERS] Range types
Tom Lane wrote: I wrote: The proposed problem is certainly soluble without any assumptions of discreteness. To be concrete, I think it could be approached like this: Assume the datatype provides a built-in function period_except(p1 period, p2 period) returns setof period which can return zero, one, or two rows depending on the inputs:

no rows, if p1 is completely contained in p2
one row, if p1 partially overlaps p2; for example: [1,4] except [3,5] returns [1,3), and [4,6] except [1,5) returns [5,6]
two rows, if p1 properly contains p2; for example: [1,10] except [4,5] returns [1,4) and (5,10], and [1,10] except [9,10) returns [1,9) and [10,10]

and of course just p1 if p1 and p2 don't overlap at all. Given such a function it's a simple matter of successively removing each element of p2[] from the set representing the current members of p1[]. The way that I'd find most natural to code that is a loop, along the lines of

    foreach p2_member in unnest(p2) loop
        p1 := array(select period_except(p1_member, p2_member)
                    from unnest(p1) p1_member);
    end loop;

But maybe it can be done in a single SQL command. As this example makes clear, when dealing with continuous intervals you *must* admit both open and closed intervals, else you don't have a way to represent the results of except. Maybe part of the failure to communicate here arises from your desire to try to avoid supporting both kinds of intervals. But I think you really have to do it if you want to deal with data that hasn't got any natural granularity. regards, tom lane Alright, well I'm going to calm down a bit and take a step back. Perhaps I'm just too close to the issue and not thinking outside of the box that I've built. Let me see if I can make everything work rather than arguing why it won't. Scott
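Tom's loop transliterates almost directly into PL/pgSQL of that era; this sketch assumes the built-in period_except(period, period) RETURNS SETOF period he describes, plus the period type itself:

```sql
CREATE OR REPLACE FUNCTION period_minus(p1 period[], p2 period[])
RETURNS period[] AS $$
BEGIN
    -- successively remove each element of p2 from the current members of p1
    FOR i IN array_lower(p2, 1) .. array_upper(p2, 1) LOOP
        p1 := ARRAY(SELECT period_except(p1_member, p2[i])
                    FROM unnest(p1) p1_member);
    END LOOP;
    RETURN p1;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
```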
Re: [HACKERS] Range types
Martijn van Oosterhout wrote: On Sun, Dec 13, 2009 at 11:49:53PM -0800, Scott Bailey wrote: So basically I have an anyrange pseudo type with the functions prev, next, last, etc defined. So instead of hard coding range types, we would allow the user to define their own range types. Basically if we are able to determine the previous and next values of the base types we'd be able to define a range type. I'm envisioning it in a manner much like defining an enum type. I find it odd that you could define functions next() and prev() since that assumes some kind of discretisation which simply does not exist for most types I can think of. Because intervals (mathematical, not SQL) can be open or closed at each end point, we need to know what the next and previous value would be at the specified granularity. And while you can do some operations without knowing this, there are many you can't. For instance, you could not tell whether two [] or () ranges were adjacent, or be able to coalesce an array of ranges. It would seem to me the real useful uses of ranges would be the operations overlaps, disjoint, precedes, follows, etc, which could all be defined on any well-ordered type (wherever greater-than and less-than are well defined). No need to discretise anything. Do you have any actual use case for a discretized range type for timestamp?
Re: [HACKERS] Range types
Tom Lane wrote: Scott Bailey arta...@comcast.net writes: Because intervals (mathematical, not SQL) can be open or closed at each end point, we need to know what the next and previous value would be at the specified granularity. And while you can do some operations without knowing this, there are many you can't. For instance, you could not tell whether two [] or () ranges were adjacent, or be able to coalesce an array of ranges. This statement seems to me to demonstrate that you don't actually understand the concept of open and closed ranges. It has nothing whatsoever to do with assuming that the data type is discrete; these concepts are perfectly well defined for the reals, for example. What it is about is whether the inclusion conditions are < bound or <= bound. I won't address how you draw your conclusions here. But I find it 'interesting' that you assume that I don't know what I'm talking about rather than assume you don't fully understand what I'm talking about. Anyhow. For any given range you may have 4 combinations of values: either the first value included in the range '[' or the last value preceding the start of the range '('; and either the last value included in the range ']' or the first value following the end of the range ')'. We aren't going to store all four data points, so we need to normalize into the most common form, a half-open interval [), and store just those two values: the first value included in the range and the first value after the end of our range. So let's say you are using a numeric range to model the high and low values of stocks trading on a given day. Now imagine doing this with no concept of granularity. You will most likely be given a range [low, high] with inclusive end points. So how do you convert that to a closed-open interval so you can store it? Is 20.4201 the next value after 20.42? Probably not. You are going to want to define 0.01 as the granularity for this (either type or column) so that 20.43 is.
Or again, are the ranges [14.0, 22.0] and [22.1, 29.0] adjacent? Maybe, maybe not. There is no way to tell without knowing the granularity. Perhaps the granularity is much finer and there are a billion values between them that are not included. Or perhaps the granularity is 0.1 and they are adjacent. Scott
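A sketch of the normalization this implies, using the hypothetical numrange from earlier in the thread: with a declared granule of 0.01, an inclusive high bound converts to an exclusive one by adding one granule, so both spellings denote the same stored value:

```sql
-- entered as inclusive bounds [low, high]
SELECT numrange(18.90, 20.42, '[]');
-- normalized for storage to the canonical closed-open form: [low, high + granule)
SELECT numrange(18.90, 20.43, '[)');
```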
Re: [HACKERS] Range types
Tom Lane wrote: Scott Bailey arta...@comcast.net writes: So basically I have an anyrange pseudo type with the functions prev, next, last, etc defined. So instead of hard coding range types, we would allow the user to define their own range types. Basically if we are able to determine the previous and next values of the base types we'd be able to define a range type. I'm envisioning it in a manner much like defining an enum type. I think array types, not enums, would be a better model. I was referring to the syntax for how the user actually defines an enum, not to its implementation. Basically, what I was hoping to get out of this thread was whether it was better to allow the user to define their own range types by specifying the base type and possibly the granularity and default inclusiveness of the end points, or if we should just provide the types like period and intrange. Scott
Re: [HACKERS] Range types
Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: On Mon, 2009-12-14 at 14:23 -0500, Tom Lane wrote: I'd prefer not to leave it to the user to decide whether a type is discrete or not. I don't know how we can decide such a thing. Do you have any ideas? If the only interesting use-cases are ints and enums, maybe we could just hard-wire it. I think dates could be added to that list as well. But any implementation that doesn't do ranges of timestamptz is a non-starter as far as I'm concerned. Certainly int64 timestamps and numeric are doable. And Jeff's period implementation supports float timestamps. I never use float timestamps, so I can only assume that he made it work. Personally, I'd rather just see float timestamps go away. And if the range types never supported float or float timestamps, I'd be OK with that. Scott
[HACKERS] Range types
I had proposed a temporal contrib module earlier and you wanted to see support for many range types, not just timestamptz. So I had an idea on how to implement this, but I want to see if you guys thought it was a viable idea. So basically I have an anyrange pseudo type with the functions prev, next, last, etc defined. So instead of hard coding range types, we would allow the user to define their own range types. Basically if we are able to determine the previous and next values of the base types we'd be able to define a range type. I'm envisioning it in a manner much like defining an enum type.

CREATE TYPE periodtz AS RANGE (timestamptz, '1 microsecond'::interval);
CREATE TYPE numrange AS RANGE (numeric(8,2)); -- determine granularity from typmod
CREATE TYPE floatrange AS RANGE (float, '0.1'::float);

Or getting really crazy...

CREATE TYPE terms AS ENUM ('2000_F', '2000_W', '2000_S', '2000_Su'... '2010_F', '2010_W', '2010_S', '2010_Su');
CREATE TYPE termrange AS RANGE (terms);

So basically I have a pg_range table to store the base typeid, a text field for the granule value and the granule typeid. I doubt we would be able to get this in for the 8.5 release, especially since I'm still learning C and the Postgres internals. Jeff Davis is going to get something in before the next commit fest, so we'll have some type of temporal/range support. But we wanted to see what direction the community felt we should go. Scott Bailey
Re: [HACKERS] xpath_table equivalent
Andrew Dunstan wrote: I've been reading over the documentation to find an alternative to the deprecated xpath_table functionality. I think it may be a possibility but I'm not seeing a clear alternative. Thanks, Chris Graner The standard is XMLTABLE and is implemented by both DB2 and Oracle, but it is on our list of unimplemented features. I would love to see this implemented in Postgres. I recall it coming up here before. But I don't think it went beyond discussing which xquery library we could use. Yes, Chris spoke to me about this last night and emailed me an example of what he needs today, and I've spent a couple of hours thinking about it. Not having a nice way of getting a recordset out of a piece of XML is actually quite a gap in our API. The trouble is that XMLTABLE is a horrible grammatical mess, ISTM, and I don't much like the way xpath_table() works either. Passing a table name as text into a function is rather ugly. I think we could do with a much simpler, albeit non-standard, API. Something like: xpathtable(source xml, rootnodes text, leaves variadic text[]) returns setof record But unless I'm mistaken we'd need the proposed LATERAL extension to make it iterate nicely over a table. Then we could possibly do something like:

select x.bar, x.blurfl
from foo f,
     lateral xpathtable(f.xmlfield, '//foo', 'bar', 'baz[1]/blurfl', '@is-some-property')
         as x(bar int, blurfl text, xmlprop bool)
where f.otherfield or x.xmlprop;

cheers andrew I agree that the syntax of XMLTABLE is odd. But not demonstrably worse than xpathtable. If we are going to exert effort on it, why not do it in a standards-compliant way? Otherwise I'd suggest a stop gap of just adding some support functions to make it easier to extract a scalar value from a node. Something like what I did here: http://scottrbailey.wordpress.com/2009/06/19/xml-parsing-postgres/ The nice thing about XMLTABLE is that it adds xquery support. The majority of xquery engines seem to be written in Java; XQilla is C++.
I'm not sure if our licensing is compatible, but I would love the irony of using Berkeley DB XML (formerly Sleepycat) now that it's owned by Oracle. Scott
[HACKERS] Proposal - temporal contrib module
I would like to add a temporal contrib module. The most important piece would be adding a period data type and some support functions. Jeff Davis and I both have temporal projects on pgFoundry, and we've been collaborating for a while. But there are some areas we'd like to get some advice on. Disk format - A period can be represented as [closed-closed], (open-open), [closed-open) or (open-closed] intervals. Right now we convert these to the most common form, closed-open, and store as two timestamptz's. Nulls - A common use case for periods is for modeling valid time. Often the end point is not known. For instance, you know when an employee has been hired but the termination time typically wouldn't be known ahead of time. We can either represent these with a null end time or with infinity. But I'm not sure how to deal with them. Obviously we can test for containment and overlap. But what about length or set operations? Non-contiguous Sets - A period defines a contiguous set of time. But many times we need to work with non-contiguous sets (work shifts in a week, bus schedules, etc). Right now, I'm using period arrays. But period arrays can contain overlapping and adjacent periods. And we have no way to indicate that a period array has been coalesced into a non-contiguous set. And what indexing strategies could be used with non-contiguous sets? Temporal Keys - We need two types of temporal keys. A primary key, exclusion type prevents overlap so someone isn't in two places at the same time. And a foreign key, inclusion type so we can check that the valid time of a child is contained within the valid time of the parent. Jeff is working on the former, but there is no easy way to do the latter. There is actually a lot of theory out there but very few implementations. Although not an official standard, we try to follow the TSQL2 spec pretty closely.
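For the primary-key half, Jeff's exclusion-constraint work would allow a declarative spelling roughly like the following (assuming a gist opclass for period and an && overlap operator); the foreign-key, inclusion half has no equivalent syntax, which is the gap described above:

```sql
CREATE TABLE assignment (
    employee   int,
    valid_time period,
    -- temporal primary key: no employee may have two overlapping rows
    EXCLUDE USING gist (employee WITH =, valid_time WITH &&)
);
```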
Further reading:
Developing Time-Oriented Database Applications - Snodgrass: http://www.cs.arizona.edu/~rts/tdbbook.pdf
TSQL2 spec: ftp://ftp.cs.arizona.edu/tsql/tsql2/spec.pdf
Temporal Data and the Relational Model - Date et al: http://books.google.com/books?isbn=1558608559
Dozens of publications: http://timecenter.cs.aau.dk/pub.htm
Regards, Scott Bailey
Re: [HACKERS] Proposal - temporal contrib module
I would like to add a temporal contrib module. The most important piece would be adding a period data type and some support functions. Jeff Davis and I both have temporal projects on pgFoundry, and we've been collaborating for a while. I presume you're going to need some backend support and possibly new syntax for some of the operations, right? That seems more urgent to discuss than the possible inclusion into contrib. Jeff Davis is already working on solving these issues for 8.5. But rather than wait until 8.6 or later to get a period data type added to core, I felt it was important to get the period type out in front of people to start using and testing. Plus we wanted to gauge interest from the community. Should we forge ahead and try to become the first general purpose database with support for temporal databases? Or should we wait another 20 years and see if an official specification materializes? I'm very pleased to see people working on temporal issues, BTW! I used to work on a database that did a lot of temporal operations, but the DBMS didn't have any temporal data types or operations so we had to use a lot of triggers etc. to achieve that, and it didn't perform well. Nulls - A common use case for periods is for modeling valid time. Often the end point is not known. For instance, you know when an employee has been hired but the termination time typically wouldn't be known ahead of time. We can either represent these with a null end time or with infinity. But I'm not sure how to deal with them. Obviously we can test for containment and overlap. But what about length or set operations? Hmm. Infinity feels like a better match. The behavior of length and set operations falls out of that naturally. For example, length of a period with an infinite beginning or end is infinite. For set operations, for example the intersection of [123, infinity] and [100, 160] would be [123, 160]. Two different answers from two respondents. 
And is there a conceptual difference between NULL and +/- infinity? Nothing lasts forever. So when would it make sense to use one versus the other in the example I gave? Non-contiguous Sets - A period defines a contiguous set of time. But many times we need to work with non-contiguous sets (work shifts in a week, bus schedules, etc). Right now, I'm using period arrays. But period arrays can contain overlapping and adjacent periods. And we have no way to indicate that a period array has been coalesced into a non-contiguous set. And what indexing strategies could be used with non-contiguous sets? I'd stick to your current definition that a period is a contiguous set of time. A non-contiguous set consists of multiple contiguous periods, so it can be represented as multiple rows in a table. That's pretty much my sentiments exactly. But Jeff wanted to be sure that we didn't make a decision now that would limit its usefulness later. Temporal Keys - We need two types of temporal keys. A primary key, exclusion type prevents overlap so someone isn't in two places at the same time. And a foreign key, inclusion type so we can check that the valid time of a child is contained within the valid time of the parent. Jeff is working on the former, but there is no easy way to do the latter. I'm very excited about this. Foreign keys don't seem that hard; you'll need foreign key triggers like we have today, but check for within instead of equal. Temporal Data and the Relational Model - Date et al http://books.google.com/books?isbn=1558608559 +1 for the approach in this book. I'm not familiar enough with the TSQL2 spec to say whether it follows it. It should also be kept in mind that although this class of problems is generally thought of as temporal issues, IOW dealing with time, the same approach works with ranges of integers or any other datatype with a well-defined sort order. It would be nice if the temporal data type would allow that too.
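The within-instead-of-equal foreign-key check could be prototyped today as a constraint trigger; everything here is hypothetical except the general trigger machinery, and contains(period, period) is assumed from the contrib module:

```sql
CREATE FUNCTION check_valid_time_fk() RETURNS trigger AS $$
BEGIN
    IF NOT EXISTS (
        SELECT 1 FROM parent p
        WHERE p.id = NEW.parent_id
          AND contains(p.valid_time, NEW.valid_time)  -- child within parent
    ) THEN
        RAISE EXCEPTION 'valid_time of child row escapes its parent';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER child_valid_time_fk
    AFTER INSERT OR UPDATE ON child
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW EXECUTE PROCEDURE check_valid_time_fk();
```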
The period concept relates very closely to mathematical intervals. (In fact, I would argue that the SQL interval should actually be named period and the SQL period should be named interval so they matched their mathematical counterparts.) My primary concern is timestamp intervals, but I see no reason the exact same concepts wouldn't apply to intervals of integers, floats, dates, etc. And actually there is a fair amount of overlap with spatial. The main difference being the number of dimensions. But the concepts of overlap, containment, and set operations like union and intersection are the same. Scott Bailey
Re: [HACKERS] xpath_table equivalent
Chris Graner wrote: Hello, I've been reading over the documentation to find an alternative to the deprecated xpath_table functionality. I think it may be a possibility but I'm not seeing a clear alternative. Thanks, Chris Graner The standard is XMLTABLE and is implemented by both DB2 and Oracle, but it is on our list of unimplemented features. I would love to see this implemented in Postgres. I recall it coming up here before. But I don't think it went beyond discussing which xquery library we could use. Scott Bailey