I want to use Solr to index a Markdown website. The source files are plain Markdown, but they are served as HTML (by markserv).
Here's what I did:

    docker run --name solr -d -p 8983:8983 -t solr
    docker exec -it --user=solr solr bin/solr create_core -c handbook

Then, to crawl the site:

    quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes md
    /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra.franz.com:9091/index.md
    SimplePostTool version 5.0.0
    Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
    Entering auto mode. Indexing pages with content-types corresponding to file endings md
    SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
    Entering recursive mode, depth=10, delay=0s
    Entering crawl at level 0 (1 links total, 1 new)
    Exception in thread "main" java.lang.NullPointerException
        at org.apache.solr.util.SimplePostTool$PageFetcher.readPageFromUrl(SimplePostTool.java:1138)
        at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:603)
        at org.apache.solr.util.SimplePostTool.postWebPages(SimplePostTool.java:563)
        at org.apache.solr.util.SimplePostTool.doWebMode(SimplePostTool.java:365)
        at org.apache.solr.util.SimplePostTool.execute(SimplePostTool.java:187)
        at org.apache.solr.util.SimplePostTool.main(SimplePostTool.java:172)
    quadra[git:master]$

Any ideas on what I did wrong?

Thanks.

Kevin