Check your http.content.limit setting; the fetched files may be getting
truncated before parsing. I can at least parse one of your files
correctly.
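
For reference, here is a minimal sketch of how that limit could be raised
in conf/nutch-site.xml. By default http.content.limit is 65536 bytes and
anything longer is cut off, which tends to make the PDF and DOC parsers
fail. A negative value turns truncation off; the -1 below is only an
example value, so pick whatever limit suits your crawl.

```xml
<!-- conf/nutch-site.xml: overrides the default from nutch-default.xml -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Example value only: -1 disables truncation. A positive number
         is a byte limit, e.g. 10485760 for roughly 10 MB. -->
    <value>-1</value>
  </property>
</configuration>
```

The content already stored in your segment stays truncated, so you would
need to fetch those URLs again before re-running the parse.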

> Hi
> I want to crawl with this seed:
> http://shce.sums.ac.ir/articles/farsi.html
> 
> but when the fetch reaches the PDF and DOC files, I get
> errors like these:
> ---------------------------------------------------------------------------
> ParseSegment: starting at 2011-10-04 21:08:05
> ParseSegment: segment: crawl-2/segments/20111004210620
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/department/technical/pdf/brca.doc: failed(2,0): Your file contains 124 sectors, but the initial DIFAT array at index 0 referenced block # 151. This isn't allowed and your file is corrupt
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/asthmprevention.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/bicycle_safety.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca2.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca3.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca4.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca5.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/cancerrisks.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/cellphonehazard.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/chol.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/coronarydisprevention.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/diabetescontrol.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/diabeteshandouts.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/farsidiabetes.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/quitsession.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/thalassemia3.pdf: failed(2,0): null
> ParseSegment: finished at 2011-10-04 21:08:07, elapsed: 00:00:02
> -------------------------------------------------------------------
> Can anyone help me?
