Re: Can't render html entities when adding documents

Erik Hatcher Wed, 20 Jun 2007 03:23:09 -0700

Thiago,

I'll have to look late this week/weekend if I get a chance then, but how did acts_as_solr create the XML passed to Solr? I think you used my original hack for that communication, which used REXML, right? solr-ruby now supports both REXML and libxml2 - and I've found that libxml2 does things properly whereas REXML was screwing things up.

I suspect we can come up with a simple test case that shows where things are wacky. If you can submit one of those I'll be glad to look into this as soon as I can (this weekend at the earliest).


        Erik


On Jun 20, 2007, at 2:06 AM, Thiago Jackiw wrote:

Replying to my own post, I just tried Solr 1.2 with the two previous
versions of acts_as_solr and it worked great, so I'm pretty sure this
is a solr-ruby issue. I'll do some more testing with the way solr-ruby
adds documents to Solr.

--
Thiago Jackiw
acts_as_solr => http://acts-as-solr.railsfreaks.com


On 6/19/07, Thiago Jackiw <[EMAIL PROTECTED]> wrote:

What's interesting is that in the previous versions of acts_as_solr
(without solr-ruby) the html entities were getting indexed fine
without passing through ERB's html_escape method. That's what I did as
a fast fix before starting this thread.

Did anything change in Solr 1.2 in regards to xml parsing? And I guess
I should try the previous version of the acts_as_solr plugin with Solr
1.2 to see if I get the same error.

--
Thiago Jackiw
acts_as_solr => http://acts-as-solr.railsfreaks.com


On 6/19/07, Aaron Suggs <[EMAIL PROTECTED]> wrote:

> I was getting the same XmlPullParserException from solr while using
> solr-ruby to index HTML.
>
> I solved things by running text through the html_escape() method in
> ERB::Util before submitting to Solr.
>
> In the console, the following generates the XmlPullParserException in
> solr, which manifests itself as a Net::HTTPFatalError in solr-ruby:
>
>   Solr::Connection.new('http://localhost:8083/solr', :autocommit => :on).
>     add(:id => 1, :value_t => '&nbsp;')
> Net::HTTPFatalError: 500...XmlPullParserException...
>
> But escaping the characters with html_escape (aliased as the h()
> method by default) works like a charm:
>
>   require 'erb'
>   include ERB::Util
>   Solr::Connection.new('http://localhost:8083/solr', :autocommit => :on).
>     add(:id => 1, :value_t => h('&nbsp;'))
> => true
>
> Subsequently, searching for strings like 'nbsp' returns hits on those
> escaped entities, which may or may not be what you want:
>
> >> Solr::Connection.new(SOLR_URL, :autocommit => :on).query('value_t:nbsp').hits
> => [{"score"=>10.771498, "id"=>1, "value_t"=>"&nbsp;"}]
>
> If you don't want searches for 'nbsp' to return all documents with
> escaped non-breaking spaces, the solution lies in defining some new
> fieldtype in solr/conf/schema.xml
>
> -Aaron Suggs
>
> On 6/19/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > On 6/19/07, Thiago Jackiw <[EMAIL PROTECTED]> wrote:
> > > There's something funky with solr-ruby's xml processing when adding
> > > documents, but I don't really know what it is yet. It can't process
> > > html entities at all, not even an html blank space "&nbsp;":
> >
> > nbsp is not a default XML entity.
> > Try replacing it with &#160;
> >
> > -Yonik
> >
>
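
The two workarounds quoted above (Yonik's numeric reference and Aaron's
html_escape) can be compared on plain strings. What follows is only a
minimal sketch: the numeric_entities helper and its three-entry entity
table are hypothetical names made up for illustration, not part of
solr-ruby or acts_as_solr, and nothing here talks to Solr. Whether a
numeric reference reaches Solr intact still depends on how the client
library serializes the field value.

```ruby
require 'erb'

# Hypothetical, deliberately tiny table of HTML named entities and their
# code points. XML itself predefines only amp, lt, gt, quot and apos, so
# those are not listed and are left untouched.
HTML_ENTITY_CODEPOINTS = {
  'nbsp'   => 160,
  'copy'   => 169,
  'eacute' => 233
}.freeze

# Replace known HTML named entities with numeric character references,
# e.g. "&nbsp;" becomes "&#160;". Unknown names are returned unchanged.
def numeric_entities(text)
  text.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |match|
    code = HTML_ENTITY_CODEPOINTS[Regexp.last_match(1)]
    code ? format('&#%d;', code) : match
  end
end

# Yonik's approach: keep the character, expressed as a numeric reference.
puts numeric_entities('a&nbsp;b')

# Aaron's approach: escape the ampersand, so the literal text "&nbsp;"
# is what gets indexed (and what a search for 'nbsp' will match).
puts ERB::Util.html_escape('a&nbsp;b')
```

The first call keeps the non-breaking space as a character reference; the
second turns the entity into literal text, which is why searches for
'nbsp' return hits in Aaron's example.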
