Thanks very much!

In other words, I now only need to put
http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=en_US&id=10110&from=ePortal_NewsDetail_FromHome
and
http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
in url.txt?
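
A minimal sketch of how such a seed file could be generated, assuming Python is available and that url.txt is a plain text file with one URL per line. The helper names with_locale and seed_lines are hypothetical; only the URLs, the request_locale parameter and the two locale values come from this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Page that is already crawled, as quoted in this thread.
BASE_URL = (
    "http://app02.laopdr.gov.la/ePortal/news/detail.action"
    "?id=10110&from=ePortal_NewsDetail_FromHome"
)
# Locale values seen in the two URLs that are not being indexed.
LOCALES = ["en_US", "lo_LA"]


def with_locale(url, locale):
    """Return url with request_locale set to locale (hypothetical helper)."""
    parts = urlsplit(url)
    # Drop any existing request_locale so the parameter is never duplicated.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "request_locale"]
    query = urlencode([("request_locale", locale)] + params)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def seed_lines(url, locales):
    """Build one seed URL per locale (hypothetical helper)."""
    return [with_locale(url, locale) for locale in locales]


if __name__ == "__main__":
    # One URL per line, with no trailing characters after each URL.
    with open("url.txt", "w", encoding="utf-8") as fh:
        fh.write("\n".join(seed_lines(BASE_URL, LOCALES)) + "\n")
```

Writing the file this way also keeps stray characters such as an encoded newline (%0A) out of the seed URLs.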


2009/3/20 yanky young <yanky.yo...@gmail.com>

> I think my guess is right. I just looked at the source of that page.
>
> Those two URLs are generated by a JavaScript function:
>
> function jump(lan)
>
> In this case, Nutch might not be smart enough to recognize this kind of
> generated URL.
>
> But if you generate these two links on the server side, so that the
> URLs in the web pages are static links, then Nutch can crawl them as
> usual.
>
> good luck
>
> yanky
>
>
> 2009/3/20 陈琛 <kylin.chc...@gmail.com>
>
> > Thanks.
> >
> > You can open
> >
> > http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10110&from=ePortal_NewsDetail_FromHome
> >
> > and look at the upper right corner: there are two translation links,
> > and they lead to those two URLs.
> >
> > So I am worried.
> > 2009/3/20 yanky young <yanky.yo...@gmail.com>
> >
> > > That should work, but it seems odd. Nutch starts from the seed URL
> > > you give it and follows links from there, so the crawled pages
> > > actually form a tree whose root node is the seed URL. If you cannot
> > > reach those two URLs from the seed URL yourself, Nutch cannot either.
> > >
> > > yanky
> > >
> > >
> > > 2009/3/20 陈琛 <kylin.chc...@gmail.com>
> > >
> > > > Thanks.
> > > > The URL is http://www.laopdr.gov.la/...
> > > > depth 15 topN 1200 ...
> > > >
> > > > It seems I must put
> > > >
> > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
> > > >
> > > > in the urls directory.
> > > >
> > > >
> > > >
> > > > 2009/3/19 yanky young <yanky.yo...@gmail.com>
> > > >
> > > > > Hi,
> > > > >
> > > > > I guess the URLs you mentioned all point to the same JSP or
> > > > > servlet; apparently they all begin with
> > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action
> > > > > and the only difference is the request_locale parameter. I have no
> > > > > idea how these two URLs with different request_locale values are
> > > > > generated, but I guess Nutch simply never sees the request_locale
> > > > > parameter, because it may be added by JavaScript or by the backend
> > > > > content management system. Maybe you can write these links into a
> > > > > page that Nutch can crawl. The point is that these links must
> > > > > appear somewhere in the pages of your website. If they do not,
> > > > > Nutch cannot find them.
> > > > >
> > > > > good luck
> > > > >
> > > > > yanky
> > > > >
> > > > >
> > > > >
> > > > > 2009/3/19 陈琛 <kylin.chc...@gmail.com>
> > > > >
> > > > > > Please help me, it is urgent and important. Thanks.
> > > > > >
> > > > > > ---------- Forwarded message ----------
> > > > > > From: 陈琛 <kylin.chc...@gmail.com>
> > > > > > Date: 2009/3/19
> > > > > > Subject: index web
> > > > > > To: nutch-user@lucene.apache.org
> > > > > >
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I can get URLs like this one indexed:
> > > > > >
> > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10110&from=ePortal_NewsDetail_FromHome
> > > > > >
> > > > > > but I cannot get URLs like these indexed:
> > > > > >
> > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=en_US&id=10110&from=ePortal_NewsDetail_FromHome
> > > > > >
> > > > > > and
> > > > > >
> > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
> > > > > >
> > > > > > Why are they not indexed?
> > > > > > Is there any difference between these pages?
> > > > > >
> > > > > > Please note the "request_locale=" parameter.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > >
> > > >
> > >
> >
>
