I'm attempting to index my data, which consists of autogenerated HTML
documents whose source is TEI (XML) documents.
I created the collection with:
sudo su - solr -c "/opt/solr/bin/solr create -c mbepp -n data_driven_schema_configs"
I'm indexing with:
find . -mindepth 2 -not -name "person_*.*" -not -name "place_*.*" -name "*.html" | xargs /opt/solr/bin/post -c mbepp
We are playing fast and loose with some of the metadata, especially the
date. Here is an excerpt that is causing a problem:
<meta name="description" content="I am the way, the truth, and the life" />
<meta name="date" content="Unknown" />
<meta name="dc.date.created" content="Unknown" />
Most documents have a correctly formatted date string and I would like
to keep that data available for search on the date field.
I'm getting an error when indexing any file that has a non-standard date,
and Solr skips the entire file:
POSTing file L02164.html (text/html) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for
url:
http://localhost:8983/solr/mbepp/update/extract?resource.name=%2Fhome%2Fscott%2Fworkspace%2Fmbel-work%2Ftei2html%2Fbuild%2Fweb%2F.%2FL02164%2FL02164.html&literal.id=%2Fhome%2Fscott%2Fworkspace%2Fmbel-work%2Ftei2html%2Fbuild%2Fweb%2F.%2FL02164%2FL02164.html
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int
name="QTime">4</int></lst><lst name="error"><str name="msg">Invalid Date
String:'Unknown'</str><int name="code">400</int></lst>
</response>
SimplePostTool: WARNING: IOException while reading response:
java.io.IOException: Server returned HTTP response code: 400 for URL:
http://localhost:8983/solr/mbepp/update/extract?resource.name=%2Fhome%2Fscott%2Fworkspace%2Fmbel-work%2Ftei2html%2Fbuild%2Fweb%2F.%2FL02164%2FL02164.html&literal.id=%2Fhome%2Fscott%2Fworkspace%2Fmbel-work%2Ftei2html%2Fbuild%2Fweb%2F.%2FL02164%2FL02164.html
I realize it is complaining because the date string doesn't match what
the data_driven_schema_configs schema expects for that field. How can I
get it to accept the non-standard date strings while still treating the
correctly formatted ones as dates?
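In case it helps anyone reproduce this, here is a rough sketch for listing
the affected files before posting. It reuses the same find filters as my
indexing command. The function name list_unknown_dates is just something I
made up for illustration, and the pattern assumes the meta tags look exactly
like the excerpt above (same attribute order and quoting), so treat it as a
starting point rather than a complete check.

```shell
#!/bin/sh
# Sketch: list HTML files whose date metadata is the placeholder "Unknown".
# list_unknown_dates is a hypothetical helper name, not part of Solr.
# Assumes the <meta> tags are written exactly as in the excerpt above.
list_unknown_dates() {
    # $1 is the directory to scan (defaults to the current directory).
    find "${1:-.}" -mindepth 2 \
        -not -name "person_*.*" -not -name "place_*.*" \
        -name "*.html" \
        -exec grep -lE '<meta name="(date|dc\.date\.created)" content="Unknown"' {} +
}
```

Running list_unknown_dates . from the build directory should print one path
per file that carries the placeholder date in either meta tag.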
thanks,
Scott