Re: Can't render html entities when adding documents

Aaron Suggs Tue, 19 Jun 2007 20:38:21 -0700

I was getting the same XmlPullParserException from Solr while using
solr-ruby to index HTML.


I solved it by running the text through the html_escape() method in
ERB::Util before submitting it to Solr.
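
As a rough illustration (a sketch, not part of the console session
below), html_escape turns the ampersand into &amp;, so the XML parser
sees literal text instead of an entity reference it does not know. The
variable name is only for illustration:

```ruby
require 'erb'

# html_escape rewrites &, <, > and " so the string is safe inside XML/HTML.
# Here the leading ampersand of the entity becomes &amp;.
escaped = ERB::Util.html_escape('&nbsp;')
puts escaped   # prints &amp;nbsp;
```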

In the console, the following generates the XmlPullParserException in
solr, which manifests itself as a Net::HTTPFatalError in solr-ruby:

 Solr::Connection.new('http://localhost:8083/solr', :autocommit =>
:on).add(:id => 1, :value_t => '&nbsp;')
Net::HTTPFatalError: 500...XmlPullParserException...

But escaping the characters with html_escape (aliased as the h() method
by default) works like a charm:

 require 'erb'
 include ERB::Util
 Solr::Connection.new('http://localhost:8083/solr', :autocommit =>
:on).add(:id => 1, :value_t => h('&nbsp;'))
=> true

Subsequently, searching for strings like 'nbsp' returns hits on those
escaped entities, which may or may not be what you want:

Solr::Connection.new(SOLR_URL, :autocommit => :on).query('value_t:nbsp').hits

=> [{"score"=>10.771498, "id"=>1, "value_t"=>"&nbsp;"}]

If you don't want searches for 'nbsp' to return all documents with
escaped non-breaking spaces, the solution lies in defining a new
field type in solr/conf/schema.xml.

-Aaron Suggs

On 6/19/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 6/19/07, Thiago Jackiw <[EMAIL PROTECTED]> wrote:
> There's something funky with solr-ruby's xml processing when adding
> documents, but I don't really know what it is yet. It can't process
> html entities at all, not even an html blank space "&nbsp;":

nbsp is not a default XML entity.
Try replacing it with &#160;

-Yonik
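
The replacement suggested above can be sketched in Ruby roughly as
follows. This is only an illustration under stated assumptions: the
numeric_entities helper and the NAMED_TO_NUMERIC table are hypothetical
names, not part of solr-ruby, and the table lists just a few entities.

```ruby
# Hypothetical lookup table: HTML entity name => Unicode code point.
# Extend it with whatever named entities appear in your documents.
NAMED_TO_NUMERIC = {
  'nbsp'   => 160,
  'copy'   => 169,
  'eacute' => 233
}.freeze

# Replace known named entities with numeric character references,
# which any XML parser accepts. Unknown names, including the five
# XML built-ins such as &amp;, are left untouched.
def numeric_entities(text)
  text.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |match|
    code = NAMED_TO_NUMERIC[Regexp.last_match(1)]
    code ? format('&#%d;', code) : match
  end
end

puts numeric_entities('caf&eacute;&nbsp;bar')   # prints caf&#233;&#160;bar
```

With this approach the field ends up holding the actual character
(for example U+00A0) rather than the escaped text, so a search for
'nbsp' should not match it.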
