Sorry, I have no idea about this one. My guess is that some words are being
dropped during the Nutch indexing process, but I don't know why that would
happen. Hopefully someone else can answer your question.

good luck

yanky


2009/3/4 Yves Yu <[email protected]>

> Hi,
>
> And here is another question, if you don't mind ~~)
> For example,
>
> in
>
> http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10109&from=ePortal_NewsDetail_FromHome
>
> there is a phrase "The summit will provide a good opportunity". I can find
> this page by searching for the word "good", but if I search for other
> words, e.g. "opportunity" or "good opportunity", I find nothing.
>
> why?
>
> Yves
>
>
> 2009/3/4 yanky young <[email protected]>
>
> > Hi:
> >
> > Because they are actually the same page, you can only find one. Here is
> > what I see when I use wget to fetch http://app02.laopdr.gov.la/:
> >
> > C:\Documents and Settings\yanky>wget http://app02.laopdr.gov.la
> > --2009-03-03 23:41:19--  http://app02.laopdr.gov.la/
> > Resolving app02.laopdr.gov.la... 203.110.66.105
> > Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
> > HTTP request sent, awaiting response... 302 Moved Temporarily
> > Location: http://app02.laopdr.gov.la/ePortal [following]
> > --2009-03-03 23:41:20--  http://app02.laopdr.gov.la/ePortal
> > Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
> > HTTP request sent, awaiting response... 302 Moved Temporarily
> > Location: http://app02.laopdr.gov.la/ePortal/ [following]
> > --2009-03-03 23:41:20--  http://app02.laopdr.gov.la/ePortal/
> > Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
> > HTTP request sent, awaiting response... 302 Moved Temporarily
> > Location: http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US [following]
> > --2009-03-03 23:41:21--  http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > Connecting to app02.laopdr.gov.la|203.110.66.105|:80... connected.
> > HTTP request sent, awaiting response... 200 OK
> > Length: unspecified [text/html]
> > Saving to: `home.act...@request_locale=en_us'
> >
> > As you can see, after several 302 redirects http://app02.laopdr.gov.la
> > arrives at
> > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > So when Nutch fetches http://app02.laopdr.gov.la, it actually fetches
> > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > and in the end only the page content of
> > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > is fetched and indexed.
> >
> > That doesn't have anything to do with dynamic pages. It is about how
> > Nutch processes the 302 status.
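
To picture the redirect chain from the wget log above, here is a minimal
Python sketch. It is only an illustration, not Nutch's fetcher code: the
RESPONSES table and the resolve_redirects function are hypothetical names,
and the table simply replays the hops from the log so the sketch runs
without touching the network.

```python
from urllib.parse import urljoin

# Hops copied from the wget log above. The table stands in for real HTTP
# responses (assumption: nothing else about the server is known).
RESPONSES = {
    "http://app02.laopdr.gov.la/": (302, "http://app02.laopdr.gov.la/ePortal"),
    "http://app02.laopdr.gov.la/ePortal": (302, "http://app02.laopdr.gov.la/ePortal/"),
    "http://app02.laopdr.gov.la/ePortal/": (
        302,
        "http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US",
    ),
    "http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US": (200, None),
}

REDIRECT_CODES = (301, 302, 303, 307, 308)


def resolve_redirects(start_url, responses, max_hops=10):
    """Return the URLs visited, ending at the first non-redirect response."""
    chain = [start_url]
    url = start_url
    # max_hops bounds how many responses are examined before giving up.
    for _ in range(max_hops):
        status, location = responses[url]
        if status not in REDIRECT_CODES or location is None:
            return chain
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        chain.append(url)
    raise RuntimeError("too many redirects from " + start_url)


chain = resolve_redirects("http://app02.laopdr.gov.la/", RESPONSES)
for hop in chain:
    print(hop)
```

The printed chain has four URLs and ends at the home.action page, which is
the only one of them that returns actual content (200 OK).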
> >
> > good luck
> >
> > yanky
> >
> > 2009/3/4 Yves Yu <[email protected]>
> >
> > > Thank you for your answer.
> > > It seems strange to me, because http://app02.laopdr.gov.la/ is the same as
> > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > but I cannot find it.
> > >
> > > You can see a few frames such as "Hot Event" and "Business" in
> > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > When I copy a few words from these frames and search for them, I cannot
> > > find this homepage, but Nutch can find the page behind the "more>>"
> > > link with the same words.
> > >
> > > I can see both http://app02.laopdr.gov.la/ and
> > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > in my fetch log, but I just cannot find the page.
> > >
> > > I suspect this has to do with dynamic pages... is that reasonable?
> > >
> > > 2009/3/3 yanky young <[email protected]>
> > >
> > > > Hi:
> > > >
> > > > Why do you think Nutch can't find
> > > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > >
> > > > Actually http://app02.laopdr.gov.la/ is the same page as
> > > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > >
> > > > If you find http://app02.laopdr.gov.la in your log, the page you
> > > > mentioned must have been downloaded.
> > > >
> > > > good luck
> > > >
> > > > yanky
> > > >
> > > > 2009/3/3 Yves Yu <[email protected]>
> > > >
> > > > > Hi, all,
> > > > >
> > > > > I have run into a problem and need help. Thank you in advance.
> > > > > I added
> > > > > http://app02.laopdr.gov.la/
> > > > > to urls.txt
> > > > >
> > > > > nutch can find
> > > > >
> > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10109&from=ePortal_NewsDetail_FromHome
> > > > >
> > > > > but nutch cannot find
> > > > >
> > > > > http://app02.laopdr.gov.la/ePortal/home/home.action?request_locale=en_US
> > > > >
> > > > > Does anybody have any idea?
> > > > >
> > > > > Yves
> > > > >
> > > >
> > >
> >
>
