OK, I got it. Thanks.

I think I might use Tika to do the extraction. The page is HTML, so I
will need some tokenizing and a regular expression to pull the price out
of the markup. Any suggestions for that?
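
To make the question concrete, here is a rough sketch of what I have in mind. It is in Python with only the standard library, not Tika or Nutch code, and it assumes the price always sits inside a <span class="value"> element like the one you quoted. The names PriceSpanParser and extract_prices are my own placeholders, and I have not checked how the rest of the site is marked up.

```python
import re
from html.parser import HTMLParser

# Matches a dollar amount such as "$1,110" or "$1110" (whole dollars only).
PRICE_RE = re.compile(r"\$\s*(\d+(?:,\d{3})*)")


class PriceSpanParser(HTMLParser):
    """Collects the text inside every <span class="value"> element.

    Assumption: the price lives in a span whose class list contains
    "value", as in the snippet quoted in this thread.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while we are inside a "value" span
        self.chunks = []  # text pieces of the span being read
        self.values = []  # finished text of each "value" span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth > 0:
            # Nested span (e.g. the icon): track it so its closing tag
            # does not end the outer "value" span early.
            self.depth += 1
        elif "value" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.values.append("".join(self.chunks))

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_prices(html):
    """Return the dollar amounts found in "value" spans as integers."""
    parser = PriceSpanParser()
    parser.feed(html)
    parser.close()
    prices = []
    for text in parser.values:
        match = PRICE_RE.search(text)
        if match:
            prices.append(int(match.group(1).replace(",", "")))
    return prices


if __name__ == "__main__":
    sample = '<span class="value"><span class="icon"></span>$1,110</span>'
    print(extract_prices(sample))
```

The idea is to let an HTML parser find the element first and apply the regular expression only to its text, so the pattern never has to cope with tags or attributes.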

Thanks,

On Wed, Jul 20, 2011 at 12:13 AM, Gora Mohanty <[email protected]> wrote:

> On Wed, Jul 20, 2011 at 12:12 PM, Cheng Li <[email protected]> wrote:
> > Hi ,
> >
> >    I want to extract the price data (here the price is $1110) from
> >
> http://www.kbb.com/volkswagen/jetta/1991-volkswagen-jetta/gl-sedan-2d/?vehicleid=11638&intent=buy-used&pricetype=private-party&condition=good
> .
> >
> >  But in the page's source code, I cannot find any information about the
> > price of $1110. How should I extract the price data from this page?
>
> Haven't tried crawling the site with Nutch, but the price is in the source
> code. Do a "View Source" in your browser, and search for 1,110 (there
> is a comma in there). I see
> <span class="value"><span class="icon"></span>$1,110</span>
>
> Regards,
> Gora
>



-- 
Cheng Li
