I've never looked at this bit of code, but for the example
you provided with <embed src="...">, shouldn't the
line be:
+ linkParams.put("embed", new LinkParams("embed","src", 0));
not
+ linkParams.put("embed", new LinkParams("embed","source", 0));
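
Here is a rough standalone sketch of what I mean, using only the JDK's DOM classes rather than Nutch itself. The class name EmbedLinkSketch, the LINK_ATTRS table, and extractLinks are all made up for illustration; they are not Nutch's LinkParams API, which I haven't checked. I've also dropped the "&rel=1" part of the URLs so the snippet parses as XML. It assumes the link table is matched against an attribute on the element itself, in which case "source" would never match an <embed src="...">. Note too that <object> has no "movie" attribute in your example: the URL sits in the value attribute of a child <param name="movie">, so that case probably needs separate handling.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EmbedLinkSketch {

    // Hypothetical tag -> attribute table; a stand-in, not Nutch's LinkParams.
    static final Map<String, String> LINK_ATTRS = new LinkedHashMap<>();
    static {
        LINK_ATTRS.put("embed", "src"); // the attribute is "src", not "source"
    }

    // Collects link URLs from a well-formed snippet.
    static List<String> extractLinks(String xml) {
        List<String> links = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Case 1: the URL is an attribute on the element itself.
            for (Map.Entry<String, String> e : LINK_ATTRS.entrySet()) {
                NodeList nodes = doc.getElementsByTagName(e.getKey());
                for (int i = 0; i < nodes.getLength(); i++) {
                    String url = ((Element) nodes.item(i)).getAttribute(e.getValue());
                    if (!url.isEmpty()) {
                        links.add(url);
                    }
                }
            }

            // Case 2: the URL is in <param name="movie" value="...">,
            // not in an attribute of <object>.
            NodeList params = doc.getElementsByTagName("param");
            for (int i = 0; i < params.getLength(); i++) {
                Element p = (Element) params.item(i);
                if ("movie".equals(p.getAttribute("name"))) {
                    String url = p.getAttribute("value");
                    if (!url.isEmpty()) {
                        links.add(url);
                    }
                }
            }
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse snippet", ex);
        }
        return links;
    }

    public static void main(String[] args) {
        // Simplified version of the markup from the original message.
        String snippet = "<object width=\"425\" height=\"355\">"
                + "<param name=\"movie\" value=\"http://www.youtube.com/v/8iYRjK2KSps\"></param>"
                + "<embed src=\"http://www.youtube.com/v/8iYRjK2KSps\"></embed>"
                + "</object>";
        for (String link : extractLinks(snippet)) {
            System.out.println(link);
        }
    }
}
```

If you change "src" to "source" in LINK_ATTRS, the embed link disappears from the output, which is the behaviour I'd expect from your patch if the real code works along similar lines.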
Howie
> From: [EMAIL PROTECTED]
> To: [email protected]; [email protected]
> Subject: Extracting Embedded Outlinks
> Date: Wed, 23 Apr 2008 11:45:40 -0400
>
> I'm trying to extract outlinks to embedded youtube videos encoded as
> below, using a post Nutch 0.9 system.
>
> <object width="425" height="355"><param name="movie"
> value="http://www.youtube.com/v/8iYRjK2KSps&rel=1"></param><param
> name="wmode" value="transparent"></param><embed
> src="http://www.youtube.com/v/8iYRjK2KSps&rel=1"
> type="application/x-shockwave-flash" wmode="transparent" width="425"
> height="355"></embed></object>
>
> <embed src="http://www.youtube.com/v/A1_GQ-K7P_w&rel=" width="425"
> height="355" type="application/x-shockwave-flash"
> wmode="transparent"></embed>
>
> I modified DOMContentUtils.java as follows:
>
> public void setConf(Configuration conf) {
> + System.out.println("setting linkparams conf");
> this.conf = conf;
> linkParams.clear();
> linkParams.put("a", new LinkParams("a", "href", 1));
> + linkParams.put("embed", new LinkParams("embed","source", 0));
> + linkParams.put("object", new LinkParams("object", "movie", 2));
> linkParams.put("area", new LinkParams("area", "href", 0));
> if (conf.getBoolean("parser.html.form.use_action", false)) {
> linkParams.put("form", new LinkParams("form", "action", 1));
> }
> linkParams.put("frame", new LinkParams("frame", "src", 0));
> linkParams.put("iframe", new LinkParams("iframe", "src", 0));
> linkParams.put("script", new LinkParams("script", "src", 0));
> linkParams.put("link", new LinkParams("link", "href", 0));
> linkParams.put("img", new LinkParams("img", "src", 0));
> }
>
> But nothing happens. These links are always ignored. In fact, the
> print statement never prints.
>
> How can I extract these outlinks?
>
> Brian
>
>
> --
> Brian Ulicny
> bulicny at alum dot mit dot edu
> home: 781-721-5746
> fax: 360-361-5746
>
>