I'm attempting to index my data, which consists of autogenerated HTML documents whose source is TEI (XML).

I created the collection with

sudo su - solr -c "/opt/solr/bin/solr create -c mbepp -n data_driven_schema_configs"

I'm indexing with

find . -mindepth 2 -not -name "person_*.*" -not -name "place_*.*" -name "*.html" | xargs /opt/solr/bin/post -c mbepp

We are playing fast and loose with some of the metadata, especially the date. Here is an excerpt that is causing a problem:

<meta name="description" content="I am the way, the truth, and the life" />
<meta name="date" content="Unknown" />
<meta name="dc.date.created" content="Unknown" />

Most documents have a correctly formatted date string, and I would like to keep that data searchable on the date field.
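To gauge how many files are affected, a quick check can list the generated files whose `date` meta tag does not begin with a digit, which catches `Unknown` as well as an empty value. This is only a sketch: `list_bad_dates` is a hypothetical helper name, it reuses the same `find` filters as the indexing command, and it assumes the tag is written exactly as `<meta name="date" content="...">`, as in the excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list generated HTML files whose <meta name="date">
# content does not begin with a digit (e.g. content="Unknown" or content="").
# Uses the same find filters as the find | xargs bin/post command.
list_bad_dates() {
  find "${1:-.}" -mindepth 2 -not -name "person_*.*" -not -name "place_*.*" \
    -name "*.html" -exec grep -lE '<meta name="date" content="[^0-9]' {} +
}
```

Run `list_bad_dates` from the same directory you index from (or pass that directory as the first argument); it prints one path per affected file.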

I'm getting an error when indexing any file that has a non-standard date, and the entire file is skipped:

POSTing file L02164.html (text/html) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/mbepp/update/extract?resource.name=%2Fhome%2Fscott%2Fworkspace%2Fmbel-work%2Ftei2html%2Fbuild%2Fweb%2F.%2FL02164%2FL02164.html&literal.id=%2Fhome%2Fscott%2Fworkspace%2Fmbel-work%2Ftei2html%2Fbuild%2Fweb%2F.%2FL02164%2FL02164.html
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">4</int></lst><lst name="error"><str name="msg">Invalid Date String:'Unknown'</str><int name="code">400</int></lst>
</response>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/mbepp/update/extract?resource.name=%2Fhome%2Fscott%2Fworkspace%2Fmbel-work%2Ftei2html%2Fbuild%2Fweb%2F.%2FL02164%2FL02164.html&literal.id=%2Fhome%2Fscott%2Fworkspace%2Fmbel-work%2Ftei2html%2Fbuild%2Fweb%2F.%2FL02164%2FL02164.html

I realize it is complaining because the date string doesn't match the date format expected by the data_driven_schema_configs schema. How can I get Solr to accept (or skip) the non-standard date strings while still indexing the correctly formatted ones?
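One client-side workaround, sketched here rather than offered as a Solr-side fix, is to drop the `date` and `dc.date.created` meta tags whose content is not a `YYYY-MM-DD` date before posting. The well-formed dates are still indexed and the `Unknown` values never reach Solr. This assumes each `<meta>` tag sits on its own line, as in the excerpt; `strip_bad_dates` and `clean_tree` are hypothetical names, and the date pattern is an assumption to adjust to whatever formats the good dates actually use.

```shell
#!/usr/bin/env bash
# Hypothetical filter (stdin -> stdout): delete <meta name="date"> and
# <meta name="dc.date.created"> lines unless their content starts with
# a YYYY-MM-DD date. Assumes one <meta> tag per line.
strip_bad_dates() {
  sed -E '/<meta name="(date|dc\.date\.created)"/{
/content="[0-9]{4}-[0-9]{2}-[0-9]{2}/!d
}'
}

# Hypothetical usage: rewrite the generated files in place before running
# find | xargs bin/post, so the document ids (file paths) do not change.
clean_tree() {
  find "${1:-.}" -mindepth 2 -name '*.html' -print0 |
    while IFS= read -r -d '' f; do
      strip_bad_dates < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

Because the HTML is regenerated from the TEI sources anyway, running `clean_tree` over the build output and then re-running the original indexing command leaves the file paths, and therefore the `literal.id` values shown in the log, unchanged.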

thanks,

Scott

--
We must learn to live together as brothers or perish together as fools.
Martin Luther King Jr.
