Hi,
Here is another question, if you don't mind.
For example, in
http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10109&from=ePortal_NewsDetail_FromHome
there is the phrase "The summit will provide a good opportunity". I can find
this page by searching for the word "good", but if I search for other words,
e.g. "opportunity" or "good opportunity", I find nothing.
Why?
Yves
2009/3/4 yanky young <[email protected]>
Hi:
Because they are actually the same page, you can only find one. Here is what
I see when I use wget to fetch http://app02.laopdr.gov.la/:
C:\Documents and Settings\yanky>wget http://app02.laopdr.gov.la
--2009-03-03 23:41:19-- http://app02.laopdr.gov.la/
Resolving app02.laopdr.gov.la... 203.110.66.105
Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://app02.laopdr.gov.la/ePortal [following]
--2009-03-03 23:41:20-- http://app02.laopdr.gov.la/ePortal
Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://app02.laopdr.gov.la/ePortal/ [following]
--2009-03-03 23:41:20-- http://app02.laopdr.gov.la/ePortal/
Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US [following]
--2009-03-03 23:41:21-- http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `home.act...@request_locale=en_us'
As you can see, after several 302 redirects,
http://app02.laopdr.gov.la arrives at
http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
So when Nutch fetches http://app02.laopdr.gov.la, it actually fetches
http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
In the end, only the page content of
http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
is fetched and indexed.
That has nothing to do with dynamic pages. It is about how Nutch
processes the 302 status.
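
To make the redirect chain concrete, here is a minimal Python sketch that replays the hops from the wget log above and reports the final URL. It is only an illustration, not Nutch's code: the `REDIRECTS` table is copied by hand from the log, and the names `follow_redirects` and `max_hops` are made up for this example.

```python
from typing import Dict, List

# Redirects copied from the wget log (each 302 response and its Location header).
# This table is an assumption based on that single run, not a live lookup.
REDIRECTS: Dict[str, str] = {
    "http://app02.laopdr.gov.la/": "http://app02.laopdr.gov.la/ePortal",
    "http://app02.laopdr.gov.la/ePortal": "http://app02.laopdr.gov.la/ePortal/",
    "http://app02.laopdr.gov.la/ePortal/":
        "http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US",
}


def follow_redirects(url: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Return every URL visited, ending at the first one that does not redirect."""
    chain = [url]
    while chain[-1] in redirects:
        if len(chain) - 1 >= max_hops:
            raise RuntimeError("too many redirects")
        next_url = redirects[chain[-1]]
        if next_url in chain:
            raise RuntimeError("redirect loop detected")
        chain.append(next_url)
    return chain


chain = follow_redirects("http://app02.laopdr.gov.la/", REDIRECTS)
final_url = chain[-1]
print("\n -> ".join(chain))
print("content comes from:", final_url)
```

All four URLs in the chain lead to one document, which is why a search returns a single hit for it rather than one per URL.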
good luck
yanky
2009/3/4 Yves Yu <[email protected]>
Thank you for your answer.
I find it strange, because http://app02.laopdr.gov.la/ is the same as
http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
but I cannot find it in the search results.
You can see a few frames such as "Hot Event" and "Business" on
http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
When I search for a few words copied from these frames, I cannot find this
homepage, but Nutch can find the page behind the "more>>" link with the
same words.
I can see both http://app02.laopdr.gov.la/ and
http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
in my fetch log, but I just cannot find the page.
I suspect dynamic pages are the cause... is that reasonable?
2009/3/3 yanky young <[email protected]>
Hi:
Why do you think Nutch can't find
http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US ?
Actually, http://app02.laopdr.gov.la/ is the same page as
http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
If you find http://app02.laopdr.gov.la in your log, the page you mentioned
must have been downloaded.
good luck
yanky
2009/3/3 Yves Yu <[email protected]>
Hi, all,
I have run into a problem and need help. Thank you in advance.
I added
http://app02.laopdr.gov.la/
to urls.txt.
Nutch can find
http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10109&from=ePortal_NewsDetail_FromHome
but Nutch cannot find
http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
Does anybody have any idea?
Yves