Re: fetching pdfs from our website

2017-08-10 Thread Sebastian Nagel
Hi David, there are a couple of options to configure how links are followed by the crawler, esp. db.max.outlinks.per.page and db.ignore.external.links. Is the white space in the URLs intended? > https://assets0.mysite.com/asset /DB_product.pdf >>> +https://assets.*. mysite.com/asset URLs normall
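
To illustrate why the white space matters, here is a minimal sketch in Python (not Nutch code). It assumes a URL filter rule is a `+` or `-` sign followed by a regular expression that is searched for anywhere in the URL; the helper name `rule_accepts` and the cleaned-up pattern are hypothetical, and the sample URL is the one from the thread with the space removed.

```python
import re


def rule_accepts(rule: str, url: str) -> bool:
    """Return True if a '+regex' rule matches the URL (hypothetical helper).

    Assumption: the leading '+' marks an include rule and the rest of the
    line is a regular expression searched for anywhere in the URL.
    """
    if not rule.startswith("+"):
        return False
    pattern = rule[1:]
    return re.search(pattern, url) is not None


# Rule as quoted in the thread, including the stray space before "mysite".
rule_with_space = "+https://assets.*. mysite.com/asset"

# Hypothetical cleaned-up rule: no space, literal dots escaped.
rule_clean = r"+https://assets.*\.mysite\.com/asset"

# URL from the thread with the white space removed.
url = "https://assets0.mysite.com/asset/DB_product.pdf"

print(rule_accepts(rule_with_space, url))
print(rule_accepts(rule_clean, url))
```

Under these assumptions the rule containing the space cannot match, because the pattern requires a literal space that the URL does not contain, while the cleaned-up rule does match.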

Re: fetching pdfs from our website

2017-08-09 Thread d.ku...@technisat.de
Hey Sebastian, thanks a lot. I already increased it to around 65MB. All our PDFs are about 3 to 8 MB in size. Any other suggestions? ;) Thanks David > On 09.08.2017 at 18:50, Sebastian Nagel wrote: > > Hi David, > > for PDFs you usually need to increase the following property: > > > http.conten

Re: fetching pdfs from our website

2017-08-09 Thread Sebastian Nagel
Hi David, for PDFs you usually need to increase the following property: name: http.content.limit, value: 65536, description: The length limit for downloaded content using the http protocol, in bytes. If this value is nonnegative (>=0), content longer than it will be truncated; otherwise, no truncation at all. Do
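
The truncation rule in that description can be sketched as follows. This is a Python illustration of the stated semantics only, not the crawler's implementation; the function name `apply_content_limit` is hypothetical.

```python
def apply_content_limit(content: bytes, limit: int) -> bytes:
    """Apply the rule from the property description (hypothetical helper).

    A nonnegative limit truncates content longer than the limit;
    a negative limit means no truncation at all.
    """
    if limit >= 0:
        return content[:limit]
    return content


# A stand-in for a 3 MB PDF body (the thread mentions PDFs of 3 to 8 MB).
pdf_body = b"\x00" * (3 * 1024 * 1024)

# With the quoted value of 65536 bytes, the download is cut short.
print(len(apply_content_limit(pdf_body, 65536)))

# With a negative limit, the full body is kept.
print(len(apply_content_limit(pdf_body, -1)) == len(pdf_body))
```

A PDF truncated this way is generally incomplete, which is why the limit needs to exceed the size of the largest file to be fetched.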