On Apr 12, 2006, at 12:36 AM, Chris Hostetter wrote:
: > Is there a way to find out what string is being written (perhaps
: > modify the code to catch that particular exception and display the
: > string)
:
: I know it's a bunch of text I culled from pages like this:
:
: <http://www.purl.org/swinburnearchive/txt/aicatlnt00>
:
: (it'll redirect)
I got a flat 404.
to pinpoint the exact text, i would start by changing the start/rows
params so that you get one doc at a time until you find one that causes
the error .. then change your fl to just be the id and one other field,
and try each of the field names until you find the one with the data
that caused the problem.
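A rough sketch of that narrowing-down procedure in Java. It assumes a select handler at a hypothetical http://localhost:8983/solr/select URL and a caller-supplied check (`fails`) that fetches a URL and reports whether the response triggers the error. The class and method names are made up for illustration and are not taken from the actual client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class NarrowDown {
    // Hypothetical base URL; adjust to the real Solr instance.
    static final String BASE = "http://localhost:8983/solr/select";

    /** Builds a URL that returns one doc at offset `start`; `fl` is added only if fields are given. */
    static String oneDocUrl(String query, int start, String... fields) {
        String url = BASE + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&start=" + start + "&rows=1";
        return fields.length == 0 ? url : url + "&fl=" + String.join(",", fields);
    }

    /** Step 1: walk the results one doc at a time; return the first failing offset, or -1. */
    static int firstFailingDoc(String query, int numDocs, Predicate<String> fails) {
        for (int start = 0; start < numDocs; start++) {
            if (fails.test(oneDocUrl(query, start))) {
                return start;
            }
        }
        return -1;
    }

    /** Step 2: for the failing doc, request the id plus one other field at a time. */
    static List<String> failingFields(String query, int start,
                                      List<String> candidates, Predicate<String> fails) {
        List<String> bad = new ArrayList<>();
        for (String field : candidates) {
            if (fails.test(oneDocUrl(query, start, "id", field))) {
                bad.add(field);
            }
        }
        return bad;
    }
}
```

The `fails` predicate is left to the caller on purpose: it could issue the request and catch the exception, or just look for an error status in the response.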
my hunch is that when POSTing the doc, the wrong charset (or char
encoding, i always get them confused) was used by Jetty, so a corrupt
string was indexed, and it wasn't obvious until it was displayed.
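To illustrate how a charset mismatch alone can corrupt a string, here is a minimal Java sketch. It does not involve Jetty or Solr at all; it only encodes text with one charset and decodes the same bytes with another. The sample text and the class name are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {
    /** Encodes text with one charset, then decodes the same bytes with another. */
    static String transcode(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e

        // UTF-8 bytes misread as ISO-8859-1: the two bytes 0xC3 0xA9 that
        // encode the accented e come back as two separate characters.
        String mojibake = transcode(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mojibake.equals("caf\u00c3\u00a9"));

        // ISO-8859-1 bytes misread as UTF-8: the lone byte 0xE9 is not valid
        // UTF-8, so the decoder substitutes the replacement character U+FFFD.
        String replaced = transcode(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(replaced.indexOf('\uFFFD') >= 0);
    }
}
```

Either kind of damage is already baked into the string by the time it is indexed, which would fit with the problem only showing up when the stored value is displayed.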
It's this, sorry for the previous bad URL:
<http://www.letrs.indiana.edu/swinburne/txt/aicatlnt00.txt>
I suspect your charset diagnosis is right. My Java client is using
HttpClient to read the data from that URL, add it to a field, and
then send it on to Solr. There is plenty of room for the
charset/encoding handling to go awry along the way.
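One way to take the guesswork out of that pipeline is to make the charset explicit at both ends: decode the fetched bytes with whatever the source server declares, then re-encode as UTF-8 for the POST. The sketch below uses only the JDK and deliberately leaves out the HTTP calls. The class name, the helper names and the fallback charset are assumptions for illustration, not the actual client code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsets {
    /** Extracts the charset parameter from a Content-Type value, or returns the fallback. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" parameter.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    /** Decodes the fetched bytes with the source charset and re-encodes them as UTF-8. */
    static byte[] toUtf8(byte[] fetched, Charset source) {
        return new String(fetched, source).getBytes(StandardCharsets.UTF_8);
    }
}
```

With something like this, the bytes sent on to Solr are always UTF-8, and the request's Content-Type header can say so explicitly (for example `text/xml; charset=UTF-8`) rather than leaving the receiving side to guess.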
Erik