I've never looked at this bit of code, but for the example
you provided with <embed src="...">, shouldn't the
code be:

   + linkParams.put("embed", new LinkParams("embed","src", 0));

not

   + linkParams.put("embed", new LinkParams("embed","source", 0));
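
To illustrate why the attribute name matters, here is a minimal sketch. It
assumes the table is used to look up, per tag, the attribute that holds the
URL; I haven't checked that against the real DOMContentUtils. The class
EmbedLinkSketch, the LINK_ATTR map and linkTarget() are hypothetical
stand-ins, not Nutch code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a tag -> attribute lookup; NOT the Nutch class.
class EmbedLinkSketch {

    // Maps an element name to the attribute assumed to hold its URL.
    static final Map<String, String> LINK_ATTR = new HashMap<>();
    static {
        LINK_ATTR.put("a", "href");
        // With "source" here instead, <embed src="..."> would never match.
        LINK_ATTR.put("embed", "src");
    }

    // Returns the link target for an element, or null when the tag is not
    // configured or the configured attribute is absent on the element.
    static String linkTarget(String tag, Map<String, String> attrs) {
        String attrName = LINK_ATTR.get(tag.toLowerCase());
        return attrName == null ? null : attrs.get(attrName);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("src", "http://www.youtube.com/v/8iYRjK2KSps&rel=1");
        attrs.put("type", "application/x-shockwave-flash");
        System.out.println(linkTarget("embed", attrs));
    }
}
```

If the real lookup behaves like this, an entry keyed on "source" would find
nothing on an <embed> element, because the URL sits in "src".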

Howie


> From: [EMAIL PROTECTED]
> To: [email protected]; [email protected]
> Subject: Extracting Embedded Outlinks
> Date: Wed, 23 Apr 2008 11:45:40 -0400
> 
> I'm trying to extract outlinks to embedded youtube videos encoded as
> below, using a post Nutch 0.9 system.
>  
> <object width="425" height="355"><param name="movie"
> value="http://www.youtube.com/v/8iYRjK2KSps&rel=1"></param><param
> name="wmode" value="transparent"></param><embed
> src="http://www.youtube.com/v/8iYRjK2KSps&rel=1"
> type="application/x-shockwave-flash" wmode="transparent" width="425"
> height="355"></embed></object>
> 
> <embed src="http://www.youtube.com/v/A1_GQ-K7P_w&amp;rel=" width="425"
> height="355" type="application/x-shockwave-flash"
> wmode="transparent"></embed>
> 
> I modified DOMContentUtils.java as follows:
> 
>   public void setConf(Configuration conf) {
>    + System.out.println("setting linkparams conf");
>     this.conf = conf;
>     linkParams.clear();
>     linkParams.put("a", new LinkParams("a", "href", 1));
>    + linkParams.put("embed", new LinkParams("embed","source", 0));
>    + linkParams.put("object", new LinkParams("object", "movie", 2));
>     linkParams.put("area", new LinkParams("area", "href", 0));
>     if (conf.getBoolean("parser.html.form.use_action", false)) {
>       linkParams.put("form", new LinkParams("form", "action", 1));
>     }
>     linkParams.put("frame", new LinkParams("frame", "src", 0));
>     linkParams.put("iframe", new LinkParams("iframe", "src", 0));
>     linkParams.put("script", new LinkParams("script", "src", 0));
>     linkParams.put("link", new LinkParams("link", "href", 0));
>     linkParams.put("img", new LinkParams("img", "src", 0));
>   }
> 
> But nothing happens.  These links are always ignored.  In fact, the
> print statement never prints.
> 
> How can I extract these outlinks?
> 
> Brian
> 
> 
> -- 
>   Brian Ulicny
>   bulicny at alum dot mit dot edu
>   home: 781-721-5746
>   fax: 360-361-5746
> 
> 
