The document you tried to index has an "id" but not a "fake_id". Because "fake_id" is your index's uniqueKey, you have to include it in every document you index. The most likely fix is to use a Transformer to generate a "fake_id". You might get away with changing this:
<field column="fake_id" name="fake_id" meta="true" /> to this: <field column="fake_id" name="id" meta="true" /> This assumes, of course, for these pdf documents the "fake_id" should always be the same as the "id". James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: anarchos78 [mailto:rigasathanasio...@hotmail.com] Sent: Friday, May 11, 2012 12:32 PM To: solr-user@lucene.apache.org Subject: RE: Indexing data from pdf I have included the extras and I am getting the following: *From Solr:* <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">2</int></lst> <lst name="initArgs"> <lst name="defaults"> <str name="config">data-config.xml</str> </lst> </lst> <str name="command">full-import</str> <str name="status">idle</str> <str name="importResponse"/> <lst name="statusMessages"> <str name="Total Requests made to DataSource">0</str> <str name="Total Rows Fetched">2</str> <str name="Total Documents Skipped">0</str> <str name="Full Dump Started">2012-05-11 20:21:50</str> <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str> <str name="Committed">2012-05-11 20:21:51</str> <str name="Total Documents Processed">0</str><str name="Total Documents Failed">1</str> <str name="Time taken">0:0:1.284</str></lst><str name="WARNING">This response format is experimental. It is likely to change in the future.</str> </response> *The log file:* org.apache.solr.handler.dataimport.SolrWriter upload WARNING: Error creating document : SolrInputDocument[{id=id(1.0)={1}, biog=biog(1.0)={Dinos Michailidis Dinos Michailidis (1355 or 1356 – 1418) was a medieval Egyptian writer and mathematician born in a village in the Nile Delta. He is the author of Subh al-a 'sha, a fourteen volume encyclopedia in Arabic, which included a section on cryptology. 
This information was attributed to Taj ad-Din Ali ibn ad-Duraihim ben Muhammad ath-Tha 'alibi al-Mausili who lived from 1312 to 1361, but whose writings on cryptology have been lost. The list of ciphers in this work included both substitution and transposition, and for the first time, a cipher with multiple substitutions for each plaintext letter. Also traced to Ibn al-Duraihim is an exposition on and worked example of cryptanalysis, including the use of tables of letter frequencies and sets of letters which can not occur together in one word. }, model=model(1.0)={patata}}]
org.apache.solr.common.SolrException: [doc=null] missing required field: fake_id
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:355)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
        at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:66)
        at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:723)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)

*The data-config.xml:*

<?xml version="1.0" encoding="utf-8"?>
<dataConfig>
  <dataSource type="BinFileDataSource" name="binary" />
  <document>
    <entity name="f" dataSource="binary" rootEntity="false"
processor="FileListEntityProcessor" baseDir="/solr/solr/docu/" fileName=".*pdf" recursive="true"> <entity name="tika" processor="TikaEntityProcessor" url="${f.fileAbsolutePath}" format="text"> <field column="id" name="id" meta="true" /> <field column="fake_id" name="fake_id" meta="true" /> <field column="model" name="model" meta="true" /> <field column="text" name="biog" /> </entity> </entity> </document> </dataConfig> *The schema.xml (fields):* <fields> <field name="id" type="string" indexed="true" stored="true" /> <field name="fake_id" type="string" indexed="true" stored="true" /> <field name="model" type="text_en" indexed="true" stored="true" /> <field name="firstname" type="text_en" indexed="true" stored="true"/> <field name="lastname" type="text_en" indexed="true" stored="true"/> <field name="title" type="text_en" indexed="true" stored="true"/> <field name="biog" type="text_en" indexed="true" stored="true"/> </fields> <uniqueKey>fake_id</uniqueKey> <defaultSearchField>text</defaultSearchField> What is going wrong now? I have included all the required fields in the schema.xml. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-data-from-pdf-tp3979876p3980571.html Sent from the Solr - User mailing list archive at Nabble.com.