Hi Sebastian,

On 14 September 2016 at 15:20, Sebastian Nagel <[email protected]> wrote:
> Should have the same effect than indexing with -deleteGone.
> If you are using Nutch 1.12 also have a look at this bug which
> could be the reason for your problem:
> https://issues.apache.org/jira/browse/NUTCH-2269
> Do you see similar errors in the logs?

2016-09-13 04:38:04,391 INFO solr.SolrIndexWriter - Indexing 177 documents
2016-09-13 04:38:41,017 INFO solr.SolrMappingReader - source: appKey dest: appKey
2016-09-13 04:38:41,030 INFO solr.SolrMappingReader - source: access dest: access
2016-09-13 04:38:41,030 INFO solr.SolrMappingReader - source: content dest: content
2016-09-13 04:38:41,030 INFO solr.SolrMappingReader - source: endtime dest: endtime
2016-09-13 04:38:41,030 INFO solr.SolrMappingReader - source: keywords dest: keywords
2016-09-13 04:38:41,030 INFO solr.SolrMappingReader - source: site dest: site
2016-09-13 04:38:41,030 INFO solr.SolrMappingReader - source: title dest: title
2016-09-13 04:38:41,031 INFO solr.SolrMappingReader - source: tstamp dest: changed
2016-09-13 04:38:41,031 INFO solr.SolrMappingReader - source: tstamp dest: created
2016-09-13 04:38:41,031 INFO solr.SolrMappingReader - source: siteHash dest: siteHash
2016-09-13 04:38:41,031 INFO solr.SolrMappingReader - source: uid dest: uid
2016-09-13 04:38:41,031 INFO solr.SolrMappingReader - source: type dest: type
2016-09-13 04:38:41,031 INFO solr.SolrMappingReader - source: site dest: nutchSite_stringS
2016-09-13 04:38:41,031 INFO solr.SolrMappingReader - source: host dest: nutchHost_stringS
2016-09-13 04:41:22,120 INFO indexer.IndexingJob - Indexer: finished at 2016-09-13 04:41:22, elapsed: 00:03:34
2016-09-13 04:41:30,489 INFO indexer.CleaningJob - CleaningJob: starting at 2016-09-13 04:41:30
2016-09-13 04:41:32,047 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-09-13 04:41:35,680 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2016-09-13 04:41:35,759 INFO solr.SolrMappingReader - source: appKey dest: appKey
2016-09-13 04:41:35,759 INFO solr.SolrMappingReader - source: access dest: access
2016-09-13 04:41:35,759 INFO solr.SolrMappingReader - source: content dest: content
2016-09-13 04:41:35,759 INFO solr.SolrMappingReader - source: endtime dest: endtime
2016-09-13 04:41:35,759 INFO solr.SolrMappingReader - source: keywords dest: keywords
2016-09-13 04:41:35,759 INFO solr.SolrMappingReader - source: site dest: site
2016-09-13 04:41:35,759 INFO solr.SolrMappingReader - source: title dest: title
2016-09-13 04:41:35,760 INFO solr.SolrMappingReader - source: tstamp dest: changed
2016-09-13 04:41:35,760 INFO solr.SolrMappingReader - source: tstamp dest: created
2016-09-13 04:41:35,760 INFO solr.SolrMappingReader - source: siteHash dest: siteHash
2016-09-13 04:41:35,760 INFO solr.SolrMappingReader - source: uid dest: uid
2016-09-13 04:41:35,760 INFO solr.SolrMappingReader - source: type dest: type
2016-09-13 04:41:35,760 INFO solr.SolrMappingReader - source: site dest: nutchSite_stringS
2016-09-13 04:41:35,760 INFO solr.SolrMappingReader - source: host dest: nutchHost_stringS
2016-09-13 04:41:36,541 INFO indexer.CleaningJob - CleaningJob: deleted a total of 2 documents
2016-09-13 04:41:36,545 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2016-09-13 04:41:37,313 INFO indexer.CleaningJob - CleaningJob: finished at 2016-09-13 04:41:37, elapsed: 00:00:06
2016-09-13 04:41:38,857 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

It claims to have deleted 2 documents, but there are plenty of 404 pages still in the index.

I think it's quite an old version of Nutch.
There is a lib/apache-nutch-1.8.jar file :-)

-- 
Kind regards,

Jigal van Hemert | Developer

Langesteijn 124
3342LG Hendrik-Ido-Ambacht

T. +31 (0)78 635 1200
F. +31 (0)848 34 9697
KvK. 23 09 28 65

[email protected]
www.alternet.nl
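
P.S. To compare runs, the indexed and deleted counts can be pulled out of the log with a small script. This is only a minimal sketch: it assumes the two message formats seen in the log excerpt above ("Indexing N documents" and "CleaningJob: deleted a total of N documents"). The function name `summarize` and the two-line sample text are illustrative and not part of Nutch.

```python
import re

# Two lines copied from the log excerpt above, used as sample input.
SAMPLE_LOG = """\
2016-09-13 04:38:04,391 INFO solr.SolrIndexWriter - Indexing 177 documents
2016-09-13 04:41:36,541 INFO indexer.CleaningJob - CleaningJob: deleted a total of 2 documents
"""

# Patterns assume the message wording shown in the excerpt.
INDEXED = re.compile(r"SolrIndexWriter - Indexing (\d+) documents")
DELETED = re.compile(r"CleaningJob: deleted a total of (\d+) documents")


def summarize(log_text):
    """Sum the indexed and deleted document counts reported in log_text."""
    indexed = sum(int(m.group(1)) for m in INDEXED.finditer(log_text))
    deleted = sum(int(m.group(1)) for m in DELETED.finditer(log_text))
    return {"indexed": indexed, "deleted": deleted}


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

Run against the full log file instead of the sample, this would make it easy to see whether the deleted count changes from one crawl cycle to the next.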

