I'm not sure how to get some of the data from a crawled PDF document into my Solr index. When I run the parsechecker tool I can see the date I need as an attribute of the Parse Metadata (date=2018-08-06T14:14:00Z), but I'm not sure how to configure solrindex-mapping.xml so that it maps to a Solr field.
I tried adding the mapping below, but it didn't work:

    <field dest="date" source="date"/>

Below is an example of the parsechecker output, showing the date attribute in the Parse Metadata:

    --------- ParseData ---------
    Version: 5
    Status: success(1,0)
    Title: XXXXXXX
    Outlinks: 1
      outlink: toUrl: https://xxx.zzz anchor:

    Content Metadata:
      Server=Microsoft-IIS/7.5
      Connection=close
      Last-Modified=Mon, 06 Aug 2018 15:16:28 GMT
      Date=Wed, 13 Feb 2019 10:36:52 GMT
      nutch.crawl.score=0.0
      nutch.fetch.time=1550054216537
      Cache-Control=no-cache, no-store
      ETag="8727b79f5faf0086a80c86df4cbbac12"
      Content-Disposition=inline; filename=xxxxx.pdf"
      X-AspNet-Version=4.0.30319
      Content-Length=81903
      Content-Type=application/pdf
      X-Powered-By=ASP.NET

    Parse Metadata:
      date=2018-08-06T14:14:00Z
      pdf:PDFVersion=1.5
      xmp:CreatorTool=Microsoft Office Word
      access_permission:modify_annotations=true
      access_permission:can_print_degraded=true
      dc:creator=XXXXX
      dcterms:created=2018-08-06T14:14:00Z
      Last-Modified=2018-08-06T14:14:00Z
      dcterms:modified=2018-08-06T14:14:00Z
      dc:format=application/pdf; version=1.5
      Last-Save-Date=2018-08-06T14:14:00Z
      access_permission:fill_in_form=true
      meta:save-date=2018-08-06T14:14:00Z
      pdf:encrypted=false
      dc:title=xxxxxxxx
      modified=2018-08-06T14:14:00Z
      Content-Type=application/pdf
      creator=XXXXXX
      meta:author=XXXXX
      meta:creation-date=2018-08-06T14:14:00Z
      created=Mon Aug 06 15:14:00 BST 2018
      access_permission:extract_for_accessibility=true
      access_permission:assemble_document=true
      xmpTPg:NPages=7
      Creation-Date=2018-08-06T14:14:00Z
      access_permission:extract_content=true
      access_permission:can_print=true
      Author=XXXXXX
      producer=Aspose.Words for .NET 16.2.0.0
      access_permission:can_modify=true

--
Tom Potter
Software Developer, Orange Bus
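For reference, one direction that may be relevant: solrindex-mapping.xml only renames fields that already exist on the indexed document, so a parse metadata key such as date probably has to be copied onto the document first. The sketch below assumes Nutch's index-metadata plugin and its index.parse.md property are available in the version in use. The plugin.includes value shown is a placeholder, not the poster's actual plugin list, and none of this has been checked against this crawl.

```xml
<!-- nutch-site.xml (sketch under the assumptions above) -->
<property>
  <name>plugin.includes</name>
  <!-- Placeholder list: keep your own plugins and make sure "metadata"
       appears in the index-(...) group so index-metadata is loaded. -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
  <name>index.parse.md</name>
  <!-- Comma-separated parse metadata keys to copy onto the indexed document. -->
  <value>date</value>
</property>
```

With the key copied onto the document, the existing <field dest="date" source="date"/> mapping would have a source field to read from. The Solr schema would also need a matching date field. If another indexing plugin already writes a field named date, mapping to a different dest name may avoid a clash.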