I recall not long ago a discussion about regular expressions in a query that hit on this exact topic but don't think it was ever resolved so I am giving it a go again...

Here is my query (keep in mind that I am just experimenting now so don't worry about the fact that I am using nested substring function calls):

SELECT
    referrer,
    substring(referrer FROM '^([^\\/]*\\/\\/[^\\/]*)(\\/)?'),
    substring(referrer FROM '^[^\\/]*\\/\\/www\\.google\\.[^\\/]*\\/[^\\?]*\\?(.*)$'),
    substring(substring(referrer FROM '^[^\\/]*\\/\\/www\\.google\\.[^\\/]*\\/[^\\?]*\\?(.*)$') FROM '((%&q=)|(q=))#"[^&]*#"((&%)|())' FOR '#')
FROM one_hour_air.web_page_view
WHERE referrer ~ '^[^\\/]*\\/\\/(www.google\\.[^\\/]*)\\/'

What I get is:

referrer
substring
substring_1
substring_2

http://www.google.ca/search?q=one+hour+heating&hl=en
http://www.google.ca
q=one+hour+heating&hl=en
q=

http://www.google.com/search?hl=en&q=One+hour+heating+and+Air
http://www.google.com
hl=en&q=One+hour+heating+and+Air
hl=en&q=

What I expected for substring_2 was (respectively):

one+hour+heating

One+hour+heating+and+Air

I thought by using the FOR '#' I could specify exactly what part of the _expression_ I would get but it still grabs the first (...) of the pattern.  At least that is what the documentation in seciton 9.7.2 at http://www.postgresql.org/docs/8.1/static/functions-matching.html led me to believe.  How can I get the part of the string I am really after without using one nested substring after another?

Thanks,
Aaron Bono

==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
==================================================================

Reply via email to