Thanks. The limit was set to about 65 KB!!!
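
For anyone hitting the same errors, here is a minimal override sketch. It assumes Nutch 1.x, where http.content.limit defaults to 65536 bytes. Larger PDF and DOC files are therefore truncated at fetch time and then fail to parse. The property goes inside the <configuration> element of conf/nutch-site.xml. A value of -1 disables truncation entirely. A large positive byte count keeps a ceiling instead; the choice of value is up to you.

```xml
<!-- conf/nutch-site.xml: place inside the existing <configuration> element -->
<property>
  <name>http.content.limit</name>
  <!-- -1 means no truncation; a positive value is a byte cap (default 65536) -->
  <value>-1</value>
</property>
```

The truncated content is already stored in the existing segments. After changing the setting, the affected URLs most likely need to be fetched again, not just re-parsed.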

On Tue, Oct 4, 2011 at 9:34 PM, Markus Jelsma <[email protected]> wrote:

> Check your http.content.limit. I can parse at least one of your files
> correctly.
>
> > Hi,
> > I want to crawl with this seed:
> > http://shce.sums.ac.ir/articles/farsi.html
> >
> > but when fetching reaches the PDF and DOC files, I get errors like
> > these:
> >
> > --------------------------------------------------------------------------------
> > ParseSegment: starting at 2011-10-04 21:08:05
> > ParseSegment: segment: crawl-2/segments/20111004210620
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/department/technical/pdf/brca.doc: failed(2,0): Your file contains 124 sectors, but the initial DIFAT array at index 0 referenced block # 151. This isn't allowed and your file is corrupt
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/asthmprevention.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/bicycle_safety.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca2.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca3.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca4.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca5.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/cancerrisks.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/cellphonehazard.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/chol.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/coronarydisprevention.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/diabetescontrol.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/diabeteshandouts.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/farsidiabetes.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/quitsession.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/thalassemia3.pdf: failed(2,0): null
> > ParseSegment: finished at 2011-10-04 21:08:07, elapsed: 00:00:02
> > -------------------------------------------------------------------
> > Can anyone help me?
>
