Thanks, that was it. The limit was about 65 KB!

On Tue, Oct 4, 2011 at 9:34 PM, Markus Jelsma wrote:
> Check your http.content.limit; I can parse at least one of your files
> correctly.
>
> > Hi,
> >
> > I want to crawl with this seed:
> > http://shce.sums.ac.ir/articles/farsi.html
> >
> > but when the crawl reaches the PDF and DOC files, parsing fails with
> > errors like these:
> >
> > --------------------------------------------------------------------------------
> > ParseSegment: starting at 2011-10-04 21:08:05
> > ParseSegment: segment: crawl-2/segments/20111004210620
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/department/technical/pdf/brca.doc: failed(2,0): Your file contains 124 sectors, but the initial DIFAT array at index 0 referenced block # 151. This isn't allowed and your file is corrupt
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/asthmprevention.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/bicycle_safety.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca2.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca3.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca4.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/ca5.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/cancerrisks.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/cellphonehazard.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/chol.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/coronarydisprevention.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/diabetescontrol.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/diabeteshandouts.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/farsidiabetes.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/quitsession.pdf: failed(2,0): null
> > Error parsing: http://shce.sums.ac.ir/icarusplus/export/sites/shce/download/thalassemia3.pdf: failed(2,0): null
> > ParseSegment: finished at 2011-10-04 21:08:07, elapsed: 00:00:02
> > -------------------------------------------------------------------
> >
> > Can anyone help me?
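For anyone hitting the same symptom: Nutch's http.content.limit defaults to 65536 bytes, and content longer than that is truncated before parsing. A PDF or DOC file cut off at that point is structurally incomplete, which is why the parsers report "null" or a corrupt DIFAT array. Setting the property to -1 (no truncation) or to a larger byte count in conf/nutch-site.xml avoids it. Below is a minimal Python sketch, not part of Nutch, for spotting such files in content you have saved yourself. The helper names and the 1024-byte tail window for the PDF %%EOF check are assumptions chosen for illustration; the EOF test is a heuristic only.

```python
# Hypothetical helpers (not Nutch APIs) for spotting content that was
# likely cut off by a fetcher size limit such as http.content.limit.

# Assumed limit: Nutch's default http.content.limit of 65536 bytes.
DEFAULT_CONTENT_LIMIT = 65536


def looks_truncated(content: bytes, limit: int = DEFAULT_CONTENT_LIMIT) -> bool:
    """Return True if the content length reached the fetch limit.

    Truncated content is exactly `limit` bytes long, so a body that
    reaches the limit is suspect. A negative limit (e.g. -1) means
    no truncation was applied.
    """
    if limit < 0:
        return False
    return len(content) >= limit


def pdf_missing_eof(content: bytes, tail: int = 1024) -> bool:
    """Heuristic: a complete PDF normally has a %%EOF marker near its end.

    The 1024-byte tail window is an assumption; files with trailing
    junk after %%EOF may need a larger window.
    """
    return b"%%EOF" not in content[-tail:]


if __name__ == "__main__":
    complete = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n%%EOF\n"
    # Simulate a large PDF truncated at the default limit.
    cut = (b"%PDF-1.4\n" + b"x" * 70000)[:DEFAULT_CONTENT_LIMIT]
    print(looks_truncated(complete), pdf_missing_eof(complete))
    print(looks_truncated(cut), pdf_missing_eof(cut))
```

Checking the length first is the cheap test; the %%EOF check only helps confirm that a PDF is actually incomplete rather than coincidentally 64 KB long.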

