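Check your http.content.limit setting; I can parse at least one of your files correctly. That property caps how many bytes of each document Nutch downloads (65536 by default), so larger PDF and DOC files are most likely being truncated before the parsers see them. A negative value, e.g. -1, disables truncation.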
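To show why a size cap breaks these parsers, here is a minimal Java sketch. It models the documented semantics of http.content.limit: a non-negative limit truncates longer content, and a negative limit disables truncation. It then shows that a PDF cut at 65536 bytes loses its trailing %%EOF marker. The class and helper names (TruncationCheck, fakePdf and so on) are hypothetical, the "PDF" is synthetic filler rather than a real document, and the 1024-byte tail search is a common heuristic, not the check Nutch or its parsers actually perform.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Nutch.
public class TruncationCheck {
    // Nutch's shipped default for http.content.limit, in bytes.
    static final long DEFAULT_CONTENT_LIMIT = 65536;

    // Models the documented rule: a non-negative limit truncates content
    // longer than it; a negative limit disables truncation entirely.
    static boolean wouldBeTruncated(long contentLength, long limit) {
        return limit >= 0 && contentLength > limit;
    }

    // Mimics what a capped fetch keeps: only the first `limit` bytes.
    static byte[] truncate(byte[] data, long limit) {
        if (!wouldBeTruncated(data.length, limit)) {
            return data;
        }
        return Arrays.copyOf(data, (int) limit);
    }

    // Heuristic completeness test (an assumption, not a real parser):
    // "%PDF-" header at the start and "%%EOF" within the last 1024 bytes.
    static boolean looksLikeCompletePdf(byte[] data) {
        byte[] header = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < header.length) {
            return false;
        }
        for (int i = 0; i < header.length; i++) {
            if (data[i] != header[i]) {
                return false;
            }
        }
        int from = Math.max(0, data.length - 1024);
        String tail = new String(data, from, data.length - from, StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    // Hypothetical helper: synthetic PDF-shaped bytes (header, space filler, trailer marker).
    static byte[] fakePdf(int size) {
        byte[] head = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);
        if (size < head.length + tail.length) {
            throw new IllegalArgumentException("size too small for header and trailer");
        }
        byte[] out = new byte[size];
        Arrays.fill(out, (byte) ' ');
        System.arraycopy(head, 0, out, 0, head.length);
        System.arraycopy(tail, 0, out, size - tail.length, tail.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] pdf = fakePdf(100_000);
        System.out.printf("original: %d bytes, complete=%b%n", pdf.length, looksLikeCompletePdf(pdf));
        // Compare the default cap with "no truncation".
        long[] limits = {DEFAULT_CONTENT_LIMIT, -1};
        for (long limit : limits) {
            byte[] kept = truncate(pdf, limit);
            System.out.printf("limit=%d: kept %d bytes, complete=%b%n",
                    limit, kept.length, looksLikeCompletePdf(kept));
        }
    }
}
```

With the default cap the synthetic 100000-byte file keeps only its first 65536 bytes and no longer looks complete. With -1 it is passed through untouched, which is the situation the configuration change is meant to produce.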
> Hi,
> I want to crawl with this seed:
> http://shce.sums.ac.ir/articles/farsi.html
>
> but when the fetch operation reaches the PDF and DOC files, it gives me errors like these:
> --------------------------------------------------------------------------------
> ParseSegment: starting at 2011-10-04 21:08:05
> ParseSegment: segment: crawl-2/segments/20111004210620
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/department/technical/pdf/brca.doc: failed(2,0): Your file contains 124 sectors, but the initial DIFAT array at index 0 referenced block # 151. This isn't allowed and your file is corrupt
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/asthmprevention.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/bicycle_safety.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca2.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca3.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca4.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca5.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/cancerrisks.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/cellphonehazard.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/chol.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/coronarydisprevention.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/diabetescontrol.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/diabeteshandouts.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/farsidiabetes.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/quitsession.pdf: failed(2,0): null
> Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/thalassemia3.pdf: failed(2,0): null
> ParseSegment: finished at 2011-10-04 21:08:07, elapsed: 00:00:02
> -------------------------------------------------------------------
> Can anyone help me?

