What exactly gets truncated: 'http://' or 'sId=386'? Or something inside the URL?

Just inject http://business.verizon.net/ ... Nutch should find the rest...

I believe Nutch doesn't have any limit on URL length, although some Web
servers are limited to 4000...
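
To see which end of the URL is lost, one option is to compare the original URL with the one found in the dump. The snippet below is only a rough sketch: first_difference is a made-up helper name, and the example.com URL and the 30-char cut are placeholders, not values taken from Nutch or from the real dump.

```python
def first_difference(original: str, dumped: str) -> int:
    """Return the index where the two strings first differ, or -1 if identical."""
    for i, (a, b) in enumerate(zip(original, dumped)):
        if a != b:
            return i
    # No mismatch in the shared part: one string is a prefix of the other.
    if len(original) != len(dumped):
        return min(len(original), len(dumped))
    return -1


# Placeholder values for illustration only (not the real URL or dump content).
original = "http://example.com/path?a=1&productsId=386"
dumped = original[:30]  # pretend the dump lost the tail of the URL

pos = first_difference(original, dumped)
print("original length:", len(original))
print("dumped length:  ", len(dumped))
if pos >= 0:
    print("first difference at index", pos)
    print("missing or changed part:", original[pos:])
```

If the reported index equals the length of the dumped URL, the tail was cut off; a smaller index means something inside the URL was changed instead.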


> http://business.verizon.net/SMBPortalWeb/appmanager/SMBPortal/smb?_pageLabel=SMBPortal_page_main_marketplace&_nfpb=true&_windowLabel=MarketPlacePFController_1&MarketPlacePFController_1_actionOverride=%252Fpageflows%252Fverizon%252Fsmb%252Fportal%252FmarketPlacePF%252FgetProductDetails&MarketPlacePFController_1productsId=386
> 
> Thanks/Regards,
> Parvez
> 
> 
> 
> On Tue, Sep 1, 2009 at 4:43 PM, Fuad Efendi <[email protected]> wrote:
> 
> > > I opened the part-00000 file in the dump folder, and there is only ONE
> > > URL in it, and it has been truncated to 318 chars.
> > > How do I make Nutch consider URLs longer than 318 chars?
> >
> > Please provide an original (untruncated) sample of such a URL.
> > Thanks
> >
> >
> >
> >
> >

