[jira] [Created] (NUTCH-2406) Sum up constants, make minor changes

2017-08-08 Thread kenneth mcfarland (JIRA)
kenneth mcfarland created NUTCH-2406: Summary: Sum up constants, make minor changes Key: NUTCH-2406 URL: https://issues.apache.org/jira/browse/NUTCH-2406 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-2406) Sum up constants, make minor changes

2017-08-08 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118039#comment-16118039 ] ASF GitHub Bot commented on NUTCH-2406: --- kpm1985 opened a new pull request #210: NUTCH-2406 Minor

fetching pdfs from our website

2017-08-08 Thread d.ku...@technisat.de
Hey, we are currently on Nutch 2.3.1 and use it to crawl our websites. One of our main goals is to get all the PDFs on our website crawled. -> Links on the different websites look like: https://assets0.mysite.com/asset /DB_product.pdf I tried different things: In the configuration I removed ever