Sorry, I have no idea about this one. My guess is that something goes wrong in the nutch indexing process and some words are being dropped from the index, but I don't know why. Hopefully someone else can answer your question.
good luck

yanky

2009/3/4 Yves Yu <[email protected]>

> Hi,
>
> And there is another question, if you don't mind ~~)
> For example, in
>
> http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10109&from=ePortal_NewsDetail_FromHome
>
> there is a phrase "The summit will provide a good opportunity". I can find
> this page by the word "good", but if I add words to the search, e.g. search
> "opportunity" or "good opportunity", I find nothing.
>
> Why?
>
> Yves
>
>
> 2009/3/4 yanky young <[email protected]>
>
> > Hi:
> >
> > Because they are actually the same page, you can only find one. Here is
> > what I see when I use wget to fetch http://app02.laopdr.gov.la/:
> >
> > C:\Documents and Settings\yanky>wget http://app02.laopdr.gov.la
> > --2009-03-03 23:41:19-- http://app02.laopdr.gov.la/
> > Resolving app02.laopdr.gov.la... 203.110.66.105
> > Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
> > HTTP request sent, awaiting response... 302 Moved Temporarily
> > Location: http://app02.laopdr.gov.la/ePortal [following]
> > --2009-03-03 23:41:20-- http://app02.laopdr.gov.la/ePortal
> > Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
> > HTTP request sent, awaiting response... 302 Moved Temporarily
> > Location: http://app02.laopdr.gov.la/ePortal/ [following]
> > --2009-03-03 23:41:20-- http://app02.laopdr.gov.la/ePortal/
> > Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
> > HTTP request sent, awaiting response... 302 Moved Temporarily
> > Location: http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US [following]
> > --2009-03-03 23:41:21-- http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
> > HTTP request sent, awaiting response... 200 OK
> > Length: unspecified [text/html]
> > Saving to: `home.act...@request_locale=en_us'
> >
> > You can see that, through several steps of 302 status,
> > http://app02.laopdr.gov.la arrives at
> > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US,
> > so when nutch fetches http://app02.laopdr.gov.la, it actually fetches
> > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US,
> > so in the end only the page content of
> > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > is fetched and indexed.
> >
> > That doesn't have anything to do with dynamic pages. It is about how
> > nutch processes 302 status.
> >
> > good luck
> >
> > yanky
> >
> > 2009/3/4 Yves Yu <[email protected]>
> >
> > > Thank you for your answer.
> > > It feels strange to me, because http://app02.laopdr.gov.la/ is just the same as
> > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > but I cannot find it.
> > >
> > > You can see a few frames such as "Hot Event", "Businees" in
> > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > When I copy a few words from these frames, I cannot find this homepage,
> > > but nutch can find the page behind "more>>" with the same words.
> > >
> > > I can see both http://app02.laopdr.gov.la/ and
> > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > in my fetch log, but I just cannot find the page.
> > >
> > > I suspect dynamic pages are the cause... is that reasonable?
> > >
> > > 2009/3/3 yanky young <[email protected]>
> > >
> > > > Hi:
> > > >
> > > > Why do you think nutch can't find
> > > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > >
> > > > Actually http://app02.laopdr.gov.la/ is the same page as
> > > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > >
> > > > If you find http://app02.laopdr.gov.la in your log, the page you
> > > > mentioned must have been downloaded.
> > > >
> > > > good luck
> > > >
> > > > yanky
> > > >
> > > > 2009/3/3 Yves Yu <[email protected]>
> > > >
> > > > > Hi, all,
> > > > >
> > > > > I've run into a situation and need help; thank you in advance.
> > > > > I added
> > > > > http://app02.laopdr.gov.la/
> > > > > into urls.txt
> > > > >
> > > > > nutch can find
> > > > >
> > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10109&from=ePortal_NewsDetail_FromHome
> > > > >
> > > > > but nutch cannot find
> > > > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > > >
> > > > > Does anybody have any idea?
> > > > >
> > > > > Yves
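
For anyone who wants to trace the 302 chain from the wget transcript in this thread without touching the network, here is a minimal Python sketch. The names `follow_redirects`, `fetch_from_transcript` and `WGET_TRANSCRIPT` are made up for illustration, and the table only replays the status codes and Location headers that wget printed. This is not how nutch itself handles redirects; it just shows which URL ends up holding the content.

```python
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(start_url, fetch, max_hops=10):
    """Return every URL visited, ending at the first non-redirect response.

    `fetch` is a hypothetical callback: fetch(url) -> (status, location).
    """
    chain = [start_url]
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return chain
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        if url in chain:
            raise ValueError("redirect loop at " + url)
        chain.append(url)
    raise ValueError("too many redirects")


# Responses copied from the wget transcript (status, Location header).
HOME = "http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US"
WGET_TRANSCRIPT = {
    "http://app02.laopdr.gov.la/": (302, "http://app02.laopdr.gov.la/ePortal"),
    "http://app02.laopdr.gov.la/ePortal": (302, "http://app02.laopdr.gov.la/ePortal/"),
    "http://app02.laopdr.gov.la/ePortal/": (302, HOME),
    HOME: (200, None),
}


def fetch_from_transcript(url):
    """Stand-in for a real HTTP request; looks the answer up in the table."""
    return WGET_TRANSCRIPT[url]


chain = follow_redirects("http://app02.laopdr.gov.la/", fetch_from_transcript)
for hop in chain:
    print(hop)
```

The last URL printed is the one whose content is actually downloaded, which matches the explanation earlier in the thread that only the home.action page gets fetched and indexed.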
