On Apr 12, 2006, at 12:36 AM, Chris Hostetter wrote:


: > Is there a way to find out what string is being written (perhaps
: > modify the code to catch that particular exception and display the
: > string)
:
: I know it's a bunch of text I culled from pages like this:
:
:       <http://www.purl.org/swinburnearchive/txt/aicatlnt00>
:
: (it'll redirect)

I got a flat 404.

To pinpoint the exact text, I would start by changing the start/rows
params so that you get one doc at a time until you find one that causes
the error. Then change your fl to just be the id and one other field,
and try each of the field names until you find the one with the data
that caused the problem.
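
A rough sketch of that narrowing-down loop, in Java since that is what the client is written in. It only builds the request URLs; the base URL, the query, and the field names below are made up for illustration, and only the q, start, rows and fl parameters come from the suggestion above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class OneDocAtATime {
    // Build a select URL that returns exactly one doc (rows=1) at the given
    // offset, restricted to the id plus one other field via fl.
    static String docUrl(String base, String query, int start, String field) {
        try {
            return base
                + "?q=" + URLEncoder.encode(query, "UTF-8")
                + "&start=" + start
                + "&rows=1"
                + "&fl=" + URLEncoder.encode("id," + field, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical values: adjust to the real Solr URL, query and schema.
        String base = "http://localhost:8983/solr/select";
        String[] fields = {"title", "text"};
        for (int start = 0; start < 3; start++) {
            for (String field : fields) {
                System.out.println(docUrl(base, "swinburne", start, field));
            }
        }
    }
}
```

Fetching each printed URL in turn should show which doc, and then which field, triggers the error.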


My hunch is that when POSTing the doc, the wrong charset (or char
encoding, I always get them confused) was used by Jetty, so a corrupt
string was indexed, and it wasn't obvious until it was displayed.

It's this, sorry for the previous bad URL:

        <http://www.letrs.indiana.edu/swinburne/txt/aicatlnt00.txt>

I suspect your charset diagnosis is right. My Java client is using
HttpClient to read the data from that URL, add it to a field, and then
send it on to Solr. There are plenty of places along that path for the
charset/encoding to go awry.
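
To illustrate the kind of mismatch I mean, here is a minimal sketch using only the JDK (not HttpClient or Solr). The class and method names are made up for the example. It assumes the bytes on the wire are UTF-8 and shows what happens when they are decoded as ISO-8859-1 instead; I have not confirmed which charsets are actually involved in my case.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {
    // Decode raw response bytes with an explicitly chosen charset,
    // rather than relying on whatever default a library picks.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Wrong charset: the two-byte UTF-8 sequence becomes two separate chars.
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);
        // Matching charset: the text round-trips unchanged.
        String right = decode(utf8, StandardCharsets.UTF_8);

        System.out.println(wrong.equals(original)); // false
        System.out.println(right.equals(original)); // true
        System.out.println(wrong.length());         // 5, not 4
    }
}
```

If something like this happens between reading the page and posting to Solr, the corrupt string gets indexed without any complaint at that point.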

        Erik
