Here is my query (keep in mind that I am just experimenting now so don't worry about the fact that I am using nested substring function calls):
SELECT
    referrer,
    substring(referrer FROM '^([^\\/]*\\/\\/[^\\/]*)(\\/)?'),
    substring(referrer FROM '^[^\\/]*\\/\\/www\\.google\\.[^\\/]*\\/[^\\?]*\\?(.*)$'),
    substring(
        substring(referrer FROM '^[^\\/]*\\/\\/www\\.google\\.[^\\/]*\\/[^\\?]*\\?(.*)$')
        FROM '((%&q=)|(q=))#"[^&]*#"((&%)|())' FOR '#'
    )
FROM one_hour_air.web_page_view
WHERE referrer ~ '^[^\\/]*\\/\\/(www.google\\.[^\\/]*)\\/'
What I get is:
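referrer                                                      | substring             | substring_1                      | substring_2
--------------------------------------------------------------+-----------------------+----------------------------------+------------
http://www.google.ca/search?q=one+hour+heating&hl=en          | http://www.google.ca  | q=one+hour+heating&hl=en         | q=
http://www.google.com/search?hl=en&q=One+hour+heating+and+Air | http://www.google.com | hl=en&q=One+hour+heating+and+Air | hl=en&q=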
What I expected for substring_2 was (respectively):
one+hour+heating
One+hour+heating+and+Air
Based on the documentation in section 9.7.2 at http://www.postgresql.org/docs/8.1/static/functions-matching.html, I thought that by using FOR '#' I could specify exactly which part of the expression I would get back, namely the part between the two #" markers. Instead it still grabs the first (...) group of the pattern. How can I get the part of the string I am really after without nesting one substring call inside another?
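For anyone who wants to reproduce this without my table, here is a reduced version of the last column. It is only a sketch: the input literal is the substring_1 value from the first row of my output, pasted in by hand, so it assumes the inner substring call really returns exactly that text.

```sql
-- Reduced test case: same pattern and escape character as the nested call,
-- with the inner substring() replaced by a hand-copied literal (an assumption).
SELECT substring(
    'q=one+hour+heating&hl=en'
    FROM '((%&q=)|(q=))#"[^&]*#"((&%)|())'
    FOR '#'
);
```

I was hoping this would give me one+hour+heating.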
Thanks,
Aaron Bono
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
==================================================================