On Oct 29, 2009, at 4:00am, Eran Zinman wrote:
Hello everyone,
I've created a plugin for Nutch 1.0 that extends the parser.
This plugin extracts several kinds of information from the document DOM.
In some cases I need to extract the "href" of a certain link. The link in
the DOM is still relative, as it was originally written in the HTML
document, so for example it might be a link with an href of "/music".
My question is: how can I turn this link into an absolute URL, for
example turn "/music" into "http://www.example.com/music"?

    new URL(baseUrl, relativeString)

will return the full URL, leaving aside a few minor edge cases.

The baseUrl will be the URL of the containing document, or the value
of the (potentially relative) Location: response header field if it
exists, or the value of the <base> tag in the <head> element, if that
exists.
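
As a rough illustration of the constructor above, here is a minimal sketch using only java.net.URL. The class name LinkResolver, the helper toAbsolute, and the example document URL are made up for this example and are not part of Nutch; how you obtain the document URL (or a <base> value) inside your plugin is left out.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of Nutch: resolves an href against a base URL.
class LinkResolver {

    // documentUrl is assumed to be the base URL you have already chosen
    // (the page URL, or a <base> value if the page declares one).
    static String toAbsolute(String documentUrl, String href) {
        try {
            URL base = new URL(documentUrl);
            // The two-argument constructor resolves href relative to base;
            // an href that is already absolute is returned unchanged.
            return new URL(base, href).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(
                "Cannot resolve " + href + " against " + documentUrl, e);
        }
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/artists/index.html";
        System.out.println(toAbsolute(page, "/music"));      // http://www.example.com/music
        System.out.println(toAbsolute(page, "albums.html")); // http://www.example.com/artists/albums.html
    }
}
```

Wrapping the checked MalformedURLException is just a convenience for the sketch; in a parser plugin you may prefer to skip or log links that fail to resolve.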
-- Ken
--------------------------
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-210-6378