To highlight a field, Solr needs some extra Lucene data for it: term vectors, with positions and offsets. If these are not configured for the field in the schema, Solr has to re-analyze the stored text of every matching document to highlight it, which is slow for large fields. If you want faster highlighting, add term vectors to the field in the schema and reindex. Here is the grand map of such things:
http://wiki.apache.org/solr/FieldOptionsByUseCase

On Tue, Jun 29, 2010 at 6:29 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> What are your actual highlighting requirements? you could try
> things like maxAnalyzedChars, requireFieldMatch, etc....
>
> http://wiki.apache.org/solr/HighlightingParameters
> has a good list, but you've probably already seen that page....
>
> Best
> Erick
>
> On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam <ps...@mac.com> wrote:
>
>> To follow up, I've found that my queries are very fast (even with &fq=),
>> until I add &hl=true. What can I do to speed up highlighting? Should I
>> consider injecting a line at a time, rather than the entire file as a field?
>>
>>
>> -Pete
>>
>> On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:
>>
>> > Thanks for everyone's help - I have this working now, but sometimes
>> > the queries are incredibly slow!! For example,
>> > <int name="QTime">461360</int>. Also, I had to bump up the min/max RAM
>> > size to 1GB/3.5GB for things to inject without throwing heap memory
>> > errors. However, my data set is very small! 36 text files, for a
>> > total of 113MB. (It will grow to many TB, but for now, this is a
>> > test). The largest file is 34MB.
>> >
>> > Therefore, I'm sure I'm doing something wrong :-) Here's my config:
>> >
>> > -----------------------------------------------------------------------
>> >
>> > For the schema.xml, <types> is all default. For fields, here are the
>> > only lines that aren't commented out:
>> >
>> > <field name="id" type="string" indexed="true" stored="true" required="true" />
>> > <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
>> > <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
>> > <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
>> > <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
>> > <dynamicField name="*" type="ignored" multiValued="true" />
>> >
>> > ... then, for the rest:
>> >
>> > <uniqueKey>id</uniqueKey>
>> >
>> > <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>> > <defaultSearchField>body</defaultSearchField>
>> >
>> > <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
>> > <solrQueryParser defaultOperator="AND"/>
>> >
>> > -----------------------------------------------------------------------
>> >
>> > Invoking: java -Xmx3584M -Xms1024M -jar start.jar
>> >
>> > -----------------------------------------------------------------------
>> >
>> > Injecting:
>> >
>> > #!/bin/sh
>> >
>> > J=0
>> > for i in `find . -name \*.txt`; do
>> >   (( J++ ))
>> >   curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" -F "myfi...@$i";
>> > done;
>> >
>> > echo "------------- Committing"
>> > curl "http://localhost:8983/solr/update/extract?commit=true"
>> >
>> > -----------------------------------------------------------------------
>> >
>> > Searching:
>> >
>> > http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
>> >
>> > -Pete
>> >
>> > On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:
>> >
>> >> try adding &hl.fl=text
>> >> to specify your highlight field. I don't understand why you're only
>> >> getting the ID field back though. Do note that the highlighting
>> >> is after the docs, related by the ID.
>> >>
>> >> Try a (non highlighting) query of just * to verify that you're
>> >> pointing at the index you think you are. It's possible that
>> >> you've modified a different index with SolrJ than your web
>> >> server is pointing at.
>> >>
>> >> Also, SOLR has no way of knowing you've modified your index
>> >> with SolrJ, so it may not be automatically reopening an
>> >> IndexReader so your recent changes may not be visible
>> >> until you force the SOLR reader to reopen.
>> >>
>> >> HTH
>> >> Erick
>> >>
>> >> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <ps...@mac.com> wrote:
>> >>
>> >>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>> >>>
>> >>>>> 1) I can get my docs in the index, but when I search, it
>> >>>>> returns the entire document. I'd love to have it only
>> >>>>> return the line (or two) around the search term.
>> >>>>
>> >>>> Solr can generate Google-like snippets as you describe.
>> >>>> http://wiki.apache.org/solr/HighlightingParameters
>> >>>
>> >>> Here's how I commit my documents:
>> >>>
>> >>> J=0;
>> >>> for i in `find . -name \*.txt`; do
>> >>>   (( J++ ))
>> >>>   curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" -F "myfi...@$i";
>> >>> done;
>> >>>
>> >>> echo "------------- Committing"
>> >>> curl "http://localhost:8983/solr/update/extract?commit=true"
>> >>>
>> >>>
>> >>> Then, I try to query using
>> >>> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
>> >>> but I only get back the document ID rather than the snippet:
>> >>>
>> >>> <doc>
>> >>> <float name="score">0.05030759</float>
>> >>> <arr name="content_type">
>> >>> <str>text/plain</str>
>> >>> </arr>
>> >>> <str name="id">doc16</str>
>> >>> </doc>
>> >>>
>> >>> I'm using the schema.xml from the "lucid imagination: Indexing text and
>> >>> html files" tutorial.
>> >>>
>> >>> -Pete
>> >>>

-- 
Lance Norskog
goks...@gmail.com
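
For reference, the term-vector suggestion at the top of this message might look roughly like the following in schema.xml. This is only a sketch: it takes the "body" field definition quoted in the thread and adds the three field attributes Solr uses for term vectors (termVectors, termPositions, termOffsets). Whether this is enough to make highlighting fast on files as large as 34MB is an assumption that would need testing.

```xml
<!-- Sketch only: the "body" field from the quoted schema.xml, with term
     vectors (plus positions and offsets) enabled so the highlighter can
     use them instead of re-analyzing the stored text. -->
<field name="body" type="text" indexed="true" stored="true" multiValued="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Term vectors are written at index time, so the documents would have to be reindexed after this change, and the index will grow. The search from the thread could then name the highlight field explicitly, for example http://localhost:8983/solr/select?q=testing&hl=true&hl.fl=body&hl.snippets=5&fl=id,score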