I read a lot of posts regarding redirected URLs but didn't find a solution!

> From: mbel...@msn.com
> To: nutch-user@lucene.apache.org; mille...@gmail.com
> Subject: RE: Content of redirected urls empty
> Date: Tue, 9 Mar 2010 16:59:05 +0000
> 
> Hi,
> 
> I don't know if you found a few minutes to look at my problem :)
> 
> But I want to explain it again, maybe it wasn't clear:
> 
> I have HTTP pages redirected to HTTPS (same URL, only the scheme differs):
> 
> HTTP://page1.com redirected to HTTPS://page1.com
> 
> The content of my HTTP page is empty.
> The content of my HTTPS page is not empty.
> 
> In my segment I found both URLs (HTTP and HTTPS), and the content of the
> HTTPS page is not empty,
> 
> but in my index I found the HTTP one, with the empty content.
> 
> Is there a way to tell Nutch to index the URL with the non-empty
> content? Or why doesn't Nutch index the target URL rather than the
> empty (origin) one??
> 
> thx a lot
> 
> > From: mbel...@msn.com
> > To: nutch-user@lucene.apache.org
> > Subject: RE: Content of redirected urls empty
> > Date: Mon, 8 Mar 2010 17:08:06 +0000
> > 
> > 
> > I'm sorry... I just checked twice, and in my index I have the original URL,
> > which is the HTTP one with the empty content... but it doesn't index the
> > HTTPS one. And I'm using a Solr index.
> > thx
> > 
> > 
> > 
> > > From: mbel...@msn.com
> > > To: nutch-user@lucene.apache.org
> > > Subject: RE: Content of redirected urls empty
> > > Date: Mon, 8 Mar 2010 17:01:34 +0000
> > > 
> > > Hi, I've just dumped my segments and found that I have both URLs: the
> > > original one (HTTP) with empty content, and the redirect target, i.e. the
> > > destination URL (HTTPS), with NON-EMPTY content!
> > > 
> > > But in my search I found only the HTTPS URL, with empty content!!
> > > Logically, the content of the HTTPS URL is not empty!
> > > It's just mixing the HTTPS URL with the content of the HTTP one.
> > > 
> > > 
> > > Our redirect is done by the Java code response.sendRedirect(…), so it seems
> > > to be an HTTP redirect, right??
> > > 
> > > thx for helping me :)
> > > 
> > > 
> > > > Date: Mon, 8 Mar 2010 15:51:34 +0100
> > > > From: a...@getopt.org
> > > > To: nutch-user@lucene.apache.org
> > > > Subject: Re: Content of redirected urls empty
> > > > 
> > > > On 2010-03-08 14:55, BELLINI ADAM wrote:
> > > > >
> > > > >
> > > > > Does anyone have any ideas, guys??
> > > > >
> > > > >
> > > > >> From: mbel...@msn.com
> > > > >> To: nutch-user@lucene.apache.org
> > > > >> Subject: Content of redirected urls empty
> > > > >> Date: Fri, 5 Mar 2010 22:01:05 +0000
> > > > >>
> > > > >>
> > > > >>
> > > > >> Hi,
> > > > >> the content of my redirected URLs is empty... but they still have the
> > > > >> other metadata...
> > > > >> I have HTTP URLs that are redirected to HTTPS.
> > > > >> In my index I find the HTTP URL, but with empty content...
> > > > >> Could you explain it, please?
> > > > 
> > > > There are two ways to redirect: one is with the protocol (an HTTP 3xx
> > > > status with a Location header), and the other is with content (either
> > > > meta refresh, or JavaScript).
> > > > 
> > > > When you dump the segment, is there really no content for the
> > > > redirected URL?
> > > > 
> > > > 
> > > > -- 
> > > > Best regards,
> > > > Andrzej Bialecki     <><
> > > >   ___. ___ ___ ___ _ _   __________________________________
> > > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > > http://www.sigram.com  Contact: info at sigram dot com
> > > > 
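Andrzej's distinction between protocol redirects and content redirects can be made concrete with a small classifier. The sketch below is hypothetical: the class `RedirectClassifier`, its `Kind` enum and `classify` method are not part of Nutch. It assumes you already have the fetched status code, the `Location` header and the page body, treats a 3xx status with a `Location` header as a protocol redirect, and treats an HTML `<meta http-equiv="refresh">` tag as a content redirect; JavaScript redirects are not detected. For reference, the Servlet API's `HttpServletResponse.sendRedirect(...)` sets status 302 (Found) together with a `Location` header, so a redirect issued that way falls in the protocol case.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper, NOT part of Nutch: classifies how a fetched page
 * redirects, using the two cases from the thread.
 *   PROTOCOL     - HTTP 3xx status plus a Location header
 *                  (what HttpServletResponse.sendRedirect(...) produces: 302 + Location)
 *   META_REFRESH - an HTML page carrying a meta refresh tag
 * JavaScript redirects are not detected by this sketch.
 */
public class RedirectClassifier {

    public enum Kind { PROTOCOL, META_REFRESH, NONE }

    // Simple heuristic regex, not a real HTML parser; matches e.g.
    // <meta http-equiv="refresh" content="0; url=https://page1.com">
    private static final Pattern META_REFRESH = Pattern.compile(
            "<meta[^>]+http-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*>",
            Pattern.CASE_INSENSITIVE);

    public static Kind classify(int status, String location, String body) {
        // Protocol-level redirect: a 3xx status that names a target in Location.
        boolean is3xx = status >= 300 && status < 400;
        if (is3xx && location != null && !location.isEmpty()) {
            return Kind.PROTOCOL;
        }
        // Content-level redirect: the HTML itself asks the client to refresh.
        if (body != null && META_REFRESH.matcher(body).find()) {
            return Kind.META_REFRESH;
        }
        return Kind.NONE;
    }

    public static void main(String[] args) {
        // Case 1: redirect issued with response.sendRedirect("https://page1.com")
        System.out.println(classify(302, "https://page1.com", ""));
        // Case 2: page that redirects through its content
        System.out.println(classify(200, null,
                "<html><head><meta http-equiv=\"refresh\" "
                + "content=\"0; url=https://page1.com\"></head></html>"));
        // Case 3: ordinary page, no redirect
        System.out.println(classify(200, null, "<html><body>hello</body></html>"));
    }
}
```

Running `main` prints `PROTOCOL`, `META_REFRESH` and `NONE`, one per line. The regex is deliberately simple; a crawler would normally rely on its HTML parser to find meta refresh tags rather than on a pattern like this.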