Re: Highlighting problems with HTML tagged fields

2006-07-28 Thread Yonik Seeley
On 7/28/06, Andrew May <[EMAIL PROTECTED]> wrote: Because I don't want the tags indexed I'm using a modified version of the "text" field type that uses the HTMLStripWhitespaceTokenizerFactory instead of the normal WhitespaceTokenizerFactory. HTMLStripWhitespaceTokenizerFactory works in two pha
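Yonik's explanation is cut off above. To show why a markup-stripping tokenizer has to be careful about highlighting, here is a minimal Java sketch. It is not Solr's HTMLStripWhitespaceTokenizerFactory, and the class and method names are hypothetical. It works in two phases. First it blanks out every tag with spaces instead of deleting it. Then it splits on whitespace. Because the stripped text keeps the original length, each token's start/end offsets still point at the right characters in the stored, tagged field value. A highlighter needs those offsets to put its markup in the right place.

```java
import java.util.ArrayList;
import java.util.List;

class HtmlStripSketch {
    // Phase 1: replace every character inside <...> with a space, so the stripped
    // text is exactly as long as the original and offsets still line up.
    static String blankOutTags(String html) {
        StringBuilder sb = new StringBuilder(html.length());
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') inTag = true;
            sb.append(inTag ? ' ' : c);
            if (c == '>') inTag = false;
        }
        return sb.toString();
    }

    // Phase 2: split on whitespace, recording [start, end) offsets into the ORIGINAL string.
    static List<int[]> tokenOffsets(String html) {
        String text = blankOutTags(html);
        List<int[]> offsets = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= text.length(); i++) {
            boolean ws = i == text.length() || Character.isWhitespace(text.charAt(i));
            if (!ws && start < 0) {
                start = i;                        // a token begins
            } else if (ws && start >= 0) {
                offsets.add(new int[] {start, i}); // a token ends
                start = -1;
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        String title = "dating of mylonitic <i>fabrics</i> in NW Iberia";
        for (int[] o : tokenOffsets(title)) {
            System.out.println(title.substring(o[0], o[1]) + " " + o[0] + "-" + o[1]);
        }
    }
}
```

Blanking tags is only one option. A stripper that deletes markup has to keep a map from stripped offsets back to the original text instead. The sketch also ignores entities such as `&amp;`, comments, and CDATA sections.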

Highlighting problems with HTML tagged fields

2006-07-28 Thread Andrew May
Hi, I'm indexing some content that contains HTML markup, and this seems to throw off the highlighting somehow. Example title field: 40Ar/39Ar laserprobe dating of mylonitic fabrics in a polyorogenic terrane of NW Iberia. If I search for title:fabrics and turn highlighting on, the highlight

Re: Doc add limit

2006-07-28 Thread Yonik Seeley
Does anyone know if the following table is still valid for HttpURLConnection: http://www.innovation.ch/java/HTTPClient/urlcon_vs_httpclient.html If so, there are a couple of advantages to using HTTPClient with Solr: - direct streaming to/from socket (could be important for very large requests/res
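On the "direct streaming" point, the JDK documentation says HttpURLConnection buffers the whole request body in memory by default. It streams directly only when chunked or fixed-length streaming mode is enabled. Below is a minimal sketch of turning on chunked mode for a large Solr update. The URL, chunk size, and helper name are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class StreamingPostSketch {
    // Without a streaming mode, HttpURLConnection buffers the entire request body
    // so it can compute Content-Length. Chunked mode writes it to the socket as it goes.
    static HttpURLConnection openChunkedPost(String url, int chunkSize) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setChunkedStreamingMode(chunkSize);
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            return conn; // not connected yet: that happens on getOutputStream()
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical local Solr update URL; nothing is sent until a body is written.
        HttpURLConnection conn = openChunkedPost("http://localhost:8983/solr/update", 8192);
        System.out.println(conn.getRequestMethod() + " " + conn.getURL());
    }
}
```

Chunked mode requires a server that accepts chunked HTTP/1.1 request bodies. When the body size is known in advance, `setFixedLengthStreamingMode` is the alternative.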

Re: Doc add limit

2006-07-28 Thread Andrew May
I'm using HttpClient for indexing and searching and it seems to work well. You can either POST files directly (only works in 3.1 alpha, use InputStreamRequestEntity in 3.0): PostMethod post = new PostMethod(solrUrl); post.setRequestEntity(new FileRequestEntity(file, "application/xml"
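If you want to avoid the HttpClient dependency, here is a hedged JDK-only sketch of the same idea. It streams an XML file from disk to an update URL and reads back the response. The helper name is hypothetical, and the code assumes Java 11 or later (`readAllBytes`, `Path.of`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PostFileSketch {
    // Streams the file to the server without buffering it in memory,
    // then returns the response body as a UTF-8 string.
    static String postXmlFile(String solrUrl, Path file) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(solrUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setFixedLengthStreamingMode(Files.size(file)); // length known up front
        conn.setRequestProperty("Content-Type", "application/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out);
        }
        // Throws IOException for HTTP error statuses (4xx/5xx).
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: update URL, args[1]: path to an XML file of <add> commands
        System.out.println(postXmlFile(args[0], Path.of(args[1])));
    }
}
```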

Re: Re: Doc add limit

2006-07-28 Thread Bertrand Delacretaz
On 7/28/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: ...Getting all the little details of connection handling correct can be tough... it's probably a good idea if we work toward common client libraries so everyone doesn't have to reinvent them Jakarta's HttpClient [1] is IMHO a good base fo

Re: Doc add limit

2006-07-28 Thread sangraal aiken
Yeah that code is pretty bare bones... I'm still in the initial testing stage. You're right it definitely needs some more thorough work. I did try removing all the conn.disconnect(); statements and there was no change. I'm going to give the Java Client code you sent me yesterday a shot and see

Re: Doc add limit

2006-07-28 Thread Yonik Seeley
It may be some sort of weird interaction with persistent connections and timeouts (both client and server have connection timeouts I assume). Does anything change if you remove your .disconnect() call (it shouldn't be needed)? Do you ever see any exceptions in the client side? The code you show
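On the persistent-connection question: the JDK can reuse a keep-alive socket once a response body has been read to the end and its stream closed. An explicit `disconnect()` is therefore usually unnecessary, and it may close a socket that could otherwise have been reused. Below is a minimal sketch of a helper (hypothetical name) that always drains the body. When the server returns an HTTP error status, it reads `getErrorStream()`, because that body has to be consumed too.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

class ResponseDrainSketch {
    // Reads the whole response body and closes the stream so the underlying
    // connection can go back to the JDK's keep-alive cache. For HTTP error
    // statuses the body comes from getErrorStream(), which may be null.
    static String readBody(HttpURLConnection conn) throws IOException {
        int status = conn.getResponseCode();
        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (in == null) {
            return "";
        }
        try (InputStream body = in) {
            return new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```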

Re: Doc add limit

2006-07-28 Thread sangraal aiken
Sure, the method that does all the work updating Solr is the doUpdate(String s) method in the GanjaUpdate class I'm pasting below. It's hanging when I try to read the response... the last output I receive in my log is Got Reader... -- package com.iceninetech.solr.update; import com.icen
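When a client blocks while reading the response, as `doUpdate(String s)` does here, setting timeouts at least turns an indefinite hang into a visible error. HttpURLConnection treats a timeout of 0 as infinite. With a read timeout set, a stalled read throws `java.net.SocketTimeoutException`. This does not fix the underlying cause. The helper name and the millisecond values are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class TimeoutSketch {
    // A timeout of 0 means "wait forever". Setting both timeouts makes a stalled
    // connect or read fail with java.net.SocketTimeoutException instead of hanging.
    static HttpURLConnection openWithTimeouts(String url, int connectMillis, int readMillis) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectMillis);
            conn.setReadTimeout(readMillis);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical Solr update URL; 5 s to connect, 30 s for each read.
        HttpURLConnection conn = openWithTimeouts("http://localhost:8983/solr/update", 5000, 30000);
        System.out.println(conn.getConnectTimeout() + " / " + conn.getReadTimeout());
    }
}
```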

Re: Problem with well-formed XML docs

2006-07-28 Thread Chris Hostetter
Andre, which Appserver are you using to run Solr? ... there have been several reports of bugs with the way Jetty deals with the XML escaped output produced by Solr, particularly when non-ASCII characters are involved. If you are using a version of Jetty, have you tried using a build more rec

Re: Own Similarity Class in Solr

2006-07-28 Thread Chris Hostetter
: I have a field "searchname" with a boost of "3.0" during the document.add. Another field "text" is a copyField of several entries, index time "field boosts", "document boosts", and document length are all factored into the "fieldNorm" value at indexing time -- so if you want to use field bo
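To make the fieldNorm point concrete: in Lucene's DefaultSimilarity of this era, the length norm is 1/sqrt(numTerms). The value stored as the norm is the document boost times the field boost times that length norm. Lucene then encodes the result in a single byte, so small differences are lost. The sketch below skips that encoding step, and its names are hypothetical.

```java
class FieldNormSketch {
    // Length norm as documented for DefaultSimilarity: shorter fields score higher.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Index-time norm: document boost x field boost x length norm.
    // (Lucene additionally squeezes this into one byte; not modeled here.)
    static float fieldNorm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // e.g. a 4-term "searchname" boosted 3.0 vs. an unboosted 16-term field
        System.out.println(fieldNorm(1.0f, 3.0f, 4));
        System.out.println(fieldNorm(1.0f, 1.0f, 16));
    }
}
```

The three factors are multiplied into one stored value. Overriding `lengthNorm` in a custom Similarity (for example, returning 1.0f so that length is ignored) therefore leaves only the boosts in the norm. Norms are computed at indexing time, so such a change only affects documents indexed after it.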

Problem with well-formed XML docs

2006-07-28 Thread Andre Basse
Hi all, I have imported some XML documents to Solr. However when I do a query for certain documents I get the following error message in the browser: XML Parsing Error: not well-formed Location: http://192.168.32.128:8983/solr/select/?stylesheet=&q=cat%0D%0A&version=2.1&start=0&rows=10&indent=o
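One way to narrow this down outside the browser is to save the raw response from the failing /solr/select URL to a file and check it with the JDK's XML parser. The parser reports the position of the first error. A byte sequence that is not valid UTF-8 shows up as an encoding error rather than a markup error, which would point toward a character-encoding problem such as the one Chris describes. The class name below is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedSketch {
    // Returns null if the bytes are well-formed XML, otherwise a description of the first error.
    static String checkWellFormed(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors without printing them
            builder.parse(new ByteArrayInputStream(xml));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            // Reading a byte array cannot fail for I/O reasons, so this is a decoding
            // problem, e.g. bytes that are not valid in the document's encoding.
            return "encoding error: " + e.getMessage();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a saved copy of the raw /solr/select response
        String problem = checkWellFormed(Files.readAllBytes(Path.of(args[0])));
        System.out.println(problem == null ? "well-formed" : problem);
    }
}
```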