Re: [SQL] Extracting hostname from URI column

2007-09-18 Thread John Summerfield
Paul Lambert wrote: Paul Lambert wrote: chester c young wrote: I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. substring( href from '.*://\([^/]*)' ); Ok, your solution looks better than mine... but I

Re: [SQL] Extracting hostname from URI column

2007-09-17 Thread Andrew Sullivan
On Mon, Sep 17, 2007 at 12:18:56AM +0100, Gregory Stark wrote: Not all countries break up their national tld space into sections like .co.uk or .com.au. Canadian domains can be bought directly under .ca like amazon.ca. I think you'll have to store a specific list of tlds and how deep you

Re: [SQL] Extracting hostname from URI column

2007-09-16 Thread Paul Lambert
Paul Lambert wrote: chester c young wrote: I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. substring( href from '.*://\([^/]*)' ); Ok, your solution looks better than mine... but I have no idea how to

Re: [SQL] Extracting hostname from URI column

2007-09-16 Thread Gregory Stark
Paul Lambert [EMAIL PROTECTED] writes: What would be the best way to do something like that? I assume it won't be a simple regex like the above due to the country codes on the end of some domains. My thought is look at the last portion of the domain, if it's 2 characters long then assume it's

Re: [SQL] Extracting hostname from URI column

2007-09-12 Thread Alvaro Herrera
Paul Lambert wrote: Andrej Ricnik-Bay wrote: Plain regex The key are the parenthesis () ... basically it will omit ANYTHING + two slashes at the beginning of a string. Then it will match everything BUT a slash, and as much of that as possible since regex are greedy by default (hence

[SQL] Extracting hostname from URI column

2007-09-11 Thread ogjunk-pgjedan
Hi, I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. This is what I'm trying, but it clearly doesn't do the job. = select substr(href, position('://' in href)+3, position('://' in href)+3+position('/' in

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread chester c young
I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. substring( href from '.*://\([^/]*)' ); Pinpoint customers who

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread Paul Lambert
[EMAIL PROTECTED] wrote: Hi, I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. This is what I'm trying, but it clearly doesn't do the job. = select substr(href, position('://' in href)+3, position('://' in

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread Paul Lambert
chester c young wrote: I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. substring( href from '.*://\([^/]*)' ); Ok, your solution looks better than mine... but I have no idea how to interpret that, time to

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread Paul Lambert
Andrej Ricnik-Bay wrote: Plain regex The key are the parenthesis () ... basically it will omit ANYTHING + two slashes at the beginning of a string. Then it will match everything BUT a slash, and as much of that as possible since regex are greedy by default (hence the host name he was

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread ogjunk-pgjedan
] To: [EMAIL PROTECTED] Cc: sql pgsql pgsql-sql@postgresql.org Sent: Tuesday, September 11, 2007 8:42:46 PM Subject: Re: [SQL] Extracting hostname from URI column I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread chester c young
I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. substring( href from '.*://\([^/]*)' ); typo: no backslash in front of left paren substring( href from '.*://([^/]*)' ) match up thru // within

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread Paul Lambert
[EMAIL PROTECTED] wrote: And what I'd like is something that would give me the counts for the number of occurrences of each unique hostname. Something much like `uniq -c'. Can anyone tell me how that's done or where I should look for info? (I'm not sure what to look for, that's the

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread chester c young
And what I'd like is something that would give me the counts for the number of occurrences of each unique hostname. Something much like `uniq -c'. Can anyone tell me how that's done or where I should look for info? (I'm not sure what to look for, that's the problem). select substring(

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread ogjunk-pgjedan
! Otis - Original Message From: [EMAIL PROTECTED] [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Sent: Tuesday, September 11, 2007 11:16:15 PM Subject: Re: [SQL] Extracting hostname from URI column Hi, Thanks, perfect! (though I'll have to look into the regex warning): = select substring

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread Paul Lambert
Paul Lambert wrote: Just use distinct... test=# select distinct count(*),substring( href from '.*://([^/]*)' ) as domain from url group by domain order by domain; OK so distinct was redundant there... it gives the same results without it. AutoDRS=# select count(*) as occurances,substring(