Hi, I am new to Solr and am trying to index some XML documents using DIH and XPath, but I cannot get it to work. The import reports success, but no documents are added to the index. I do not know what I am doing wrong.
This is my data config XML file:

<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <entity name="nytxmldir" rootEntity="false" datasource="null"
            processor="FileListEntityProcessor" fileName=".*\.xml"
            recursive="true" baseDir="/home/farhan/Downloads/nytxml">
      <entity name="nytxml" pk="id" datasource="nytxmldir"
              url="${nytxmldir.fileAbsolutePath}"
              processor="XPathEntityProcessor" forEach="/ntif"
              transformer="RegexTransformer">
        <field column="id" xpath="/ntif/head/docdata/doc-id/@id-string"/>
        <field column="title" xpath="/ntif/head/title"/>
        <field column="paragraph" xpath="/ntif/body/body.content/block[@class='full_text']/p"/>
      </entity>
    </entity>
  </document>
</dataConfig>

This is one of my XML documents:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE nitf SYSTEM "http://www.nitf.org/IPTC/NITF/3.3/specification/dtd/nitf-3-3.dtd">
<nitf change.date="June 10, 2005" change.time="19:30" version="-//IPTC//DTD NITF 3.3//EN">
  <head>
    <title>Paid Notice: Deaths BRADLEY, CAROL L.</title>
    <meta content="dn010107" name="slug"/>
    <meta content="1" name="publication_day_of_month"/>
    <meta content="1" name="publication_month"/>
    <meta content="2007" name="publication_year"/>
    <meta content="Monday" name="publication_day_of_week"/>
    <meta content="Classified" name="dsk"/>
    <meta content="7" name="print_page_number"/>
    <meta content="B" name="print_section"/>
    <meta content="3" name="print_column"/>
    <meta content="Paid Death Notices" name="online_sections"/>
    <docdata>
      <doc-id id-string="1815719"/>
      <doc.copyright holder="The New York Times" year="2007"/>
      <identified-content>
        <person class="indexing_service">BRADLEY, CAROL L.</person>
        <classifier class="online_producer" type="types_of_material">Paid Death Notice</classifier>
        <classifier class="online_producer" type="taxonomic_classifier">Top/Classifieds/Paid Death Notices</classifier>
      </identified-content>
    </docdata>
    <pubdata date.publication="20070101T000000"
             ex-ref="http://query.nytimes.com/gst/fullpage.html?res=9B06E1DE1E3AF932A35752C0A9619C8B63"
             item-length="49" name="The New York Times" unit-of-measure="word"/>
  </head>
  <body>
    <body.head>
      <hedline>
        <hl1>Paid Notice: Deaths BRADLEY, CAROL L.</hl1>
      </hedline>
    </body.head>
    <body.content>
      <block class="lead_paragraph">
        <p>BRADLEY--Carol L., 84, of Tinton Falls, NJ died peacefully at Seabrook Village on December 27. Beloved wife of Floyd (Pete) Bradley, Jr.; loving mother of Steven, Floyd and Lynette Bradley; adored grandmother of Victoria Kent and Camilla, William and Melissa Bradley; caring stepgrandmother of Matthew and Charlton Field.</p>
      </block>
      <block class="full_text">
        <p>BRADLEY--Carol L., 84, of Tinton Falls, NJ died peacefully at Seabrook Village on December 27. Beloved wife of Floyd (Pete) Bradley, Jr.; loving mother of Steven, Floyd and Lynette Bradley; adored grandmother of Victoria Kent and Camilla, William and Melissa Bradley; caring stepgrandmother of Matthew and Charlton Field.</p>
      </block>
    </body.content>
  </body>
</nitf>

I am really stumped as to why it is not working. I know DIH does not support full XPath syntax, but according to the wiki it does support the limited XPath subset that I am using. I have also read on various forums that people suggest using Groovy or XSLT, neither of which I am familiar with.

I hope someone can help me.

Thanks,
Farhan
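
P.S. In case it helps narrow things down, here is a minimal sketch of a check that can be run outside Solr. It assumes Python's standard xml.etree.ElementTree and is not part of DIH; the helper names first_step and root_matches and the tiny sample document are hypothetical. It compares a document's root element name with the first step of a forEach path, since XPathEntityProcessor only produces rows for nodes that forEach matches.

```python
import xml.etree.ElementTree as ET


def first_step(xpath):
    """Return the first element name of an absolute path such as '/a/b'."""
    return xpath.strip("/").split("/")[0]


def root_matches(xml_text, for_each):
    """Return True if the document's root tag equals the first step of for_each."""
    root = ET.fromstring(xml_text)
    return root.tag == first_step(for_each)


# Hypothetical, heavily shortened stand-in for one of the files.
sample = "<nitf><head><title>t</title></head></nitf>"

# Compare the root element against two candidate forEach values.
print(root_matches(sample, "/ntif"))
print(root_matches(sample, "/nitf"))
```

The same function could be pointed at the text of a real file and the forEach value copied from the data config.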