Hi:

Because they are actually the same page, you can only find one of them. Here
is what I see when I use wget to fetch http://app02.laopdr.gov.la/:

C:\Documents and Settings\yanky>wget http://app02.laopdr.gov.la
--2009-03-03 23:41:19--  http://app02.laopdr.gov.la/
Resolving app02.laopdr.gov.la... 203.110.66.105
Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://app02.laopdr.gov.la/ePortal [following]
--2009-03-03 23:41:20--  http://app02.laopdr.gov.la/ePortal
Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://app02.laopdr.gov.la/ePortal/ [following]
--2009-03-03 23:41:20--  http://app02.laopdr.gov.la/ePortal/
Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US [following]
--2009-03-03 23:41:21--  http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `home.act...@request_locale=en_us'

As you can see, after several 302 redirects, http://app02.laopdr.gov.la
ends up at
http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US.
So when Nutch fetches http://app02.laopdr.gov.la, it actually fetches that
final URL, and in the end only the content of
http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US is
fetched and indexed.
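
To make the redirect chain concrete, here is a minimal Python sketch. It is
not Nutch's actual code; it only replays the three 302 hops from the wget log
above. The names REDIRECTS, resolve and index_by_final_url are made up for
this illustration, and the idea that an index keeps one entry per final URL
is an assumption about what you are seeing, not a description of Nutch
internals.

```python
HOME = "http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US"

# Redirect table copied from the wget log: each key answered with a 302
# pointing at its value.
REDIRECTS = {
    "http://app02.laopdr.gov.la/": "http://app02.laopdr.gov.la/ePortal",
    "http://app02.laopdr.gov.la/ePortal": "http://app02.laopdr.gov.la/ePortal/",
    "http://app02.laopdr.gov.la/ePortal/": HOME,
}


def resolve(url, redirects, max_hops=10):
    """Follow redirects from url; return (final_url, list_of_urls_visited)."""
    chain = [url]
    while url in redirects:
        hops_taken = len(chain) - 1
        if hops_taken >= max_hops:
            raise RuntimeError("too many redirects")
        url = redirects[url]
        chain.append(url)
    return url, chain


def index_by_final_url(seeds, redirects):
    """Group seed URLs by the URL they finally resolve to (hypothetical index)."""
    index = {}
    for seed in seeds:
        final, _ = resolve(seed, redirects)
        index.setdefault(final, []).append(seed)
    return index


if __name__ == "__main__":
    final, chain = resolve("http://app02.laopdr.gov.la/", REDIRECTS)
    for hop in chain:
        print(hop)
    # Both the site root and the home.action URL end up under a single key.
    print(index_by_final_url(["http://app02.laopdr.gov.la/", HOME], REDIRECTS))
```

Under these assumptions, the site root and the home.action URL collapse into
a single entry, which matches the behaviour of finding only one of them.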

This has nothing to do with dynamic pages. It is about how Nutch processes
the 302 status.

good luck

yanky

2009/3/4 Yves Yu <[email protected]>

> Thank you for your answer.
> It seems strange to me, because http://app02.laopdr.gov.la/ is the same as
> http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> but I cannot find it.
>
> You can see a few frames such as "Hot Event" and "Business" in
> http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> When I search for a few words copied from these frames, I cannot find this
> homepage, but Nutch can find the page behind "more>>" with the same words.
>
> I can see both http://app02.laopdr.gov.la/  and
> http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> in my fetch log, but I just cannot find the page.
>
> I suspect dynamic pages are the cause... is that reasonable?
>
> 2009/3/3 yanky young <[email protected]>
>
> > Hi:
> >
> > Why do you think Nutch can't find
> > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US ?
> >
> > Actually http://app02.laopdr.gov.la/ is the same page as
> > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> >
> > If you find http://app02.laopdr.gov.la in your log, the page you
> > mentioned must have been downloaded.
> >
> > good luck
> >
> > yanky
> >
> > 2009/3/3 Yves Yu <[email protected]>
> >
> > > Hi, all,
> > >
> > > I've run into a problem and need help. Thank you in advance.
> > > I added
> > > http://app02.laopdr.gov.la/
> > > to urls.txt
> > >
> > > nutch can find
> > >
> > > http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10109&from=ePortal_NewsDetail_FromHome
> > >
> > > but nutch cannot find
> > >
> > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > >
> > > Does anybody have any idea?
> > >
> > > Yves
> > >
> >
>