Andrzej, thanks for the explanation.
How can I distinguish these redirect pages from the 'normal' ones (with
content)? Is there some status or flag? (With parseData.getStatus() I get
success(1,0) for both redirect and normal pages.) Can I use the HTTP
response code, and if so, how can I get it (I don't see it in
Tomislav Poljak wrote:
Hi, I have been reading data from Nutch segments and came across
pages/records with empty parse text. So I looked into this further and
manually fetched the data for these URLs. Many of them are redirect pages,
but they are stored in the Nutch segment as pages (with metadata but empty