Wow, thanks Lance - it's really fast now! The last piece of the puzzle is setting up a nice front-end. Are there any pre-built front-ends available that mimic Google (for example) and support facets?

-Peter

On Jun 29, 2010, at 9:04 PM, Lance Norskog wrote:

> To highlight a field, Solr needs some extra Lucene values. If these
> are not configured for the field in the schema, Solr has to re-analyze
> the field to highlight it. If you want faster highlighting, you have
> to add term vectors to the schema. Here is the grand map of such
> things:
>
> http://wiki.apache.org/solr/FieldOptionsByUseCase
>
> On Tue, Jun 29, 2010 at 6:29 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>> What are you actual highlighting requirements? you could try
>> things like maxAnalyzedChars, requireFieldMatch, etc....
>>
>> http://wiki.apache.org/solr/HighlightingParameters
>> has a good list, but you've probably already seen that page....
>>
>> Best
>> Erick
>>
>> On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam <ps...@mac.com> wrote:
>>
>>> To follow up, I've found that my queries are very fast (even with &fq=),
>>> until I add &hl=true. What can I do to speed up highlighting? Should I
>>> consider injecting a line at a time, rather than the entire file as a field?
>>>
>>> -Pete
>>>
>>> On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:
>>>
>>>> Thanks for everyone's help - I have this working now, but sometimes the
>>>> queries are incredibly slow!! For example, <int name="QTime">461360</int>.
>>>> Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to
>>>> inject without throwing heap memory errors. However, my data set is very
>>>> small! 36 text files, for a total of 113MB. (It will grow to many TB, but
>>>> for now, this is a test). The largest file is 34MB.
>>>>
>>>> Therefore, I'm sure I'm doing something wrong :-) Here's my config:
>>>>
>>>> -----------------------------------------------------------------------------------------------
>>>>
>>>> For the schema.xml, <types> is all default. For fields, here are the
>>>> only lines that aren't commented out:
>>>>
>>>> <field name="id" type="string" indexed="true" stored="true" required="true" />
>>>> <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
>>>> <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
>>>> <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
>>>> <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
>>>> <dynamicField name="*" type="ignored" multiValued="true" />
>>>>
>>>> ... then, for the rest:
>>>>
>>>> <uniqueKey>id</uniqueKey>
>>>>
>>>> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>>>> <defaultSearchField>body</defaultSearchField>
>>>>
>>>> <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
>>>> <solrQueryParser defaultOperator="AND"/>
>>>>
>>>> -----------------------------------------------------------------------------------------------
>>>>
>>>> Invoking: java -Xmx3584M -Xms1024M -jar start.jar
>>>>
>>>> -----------------------------------------------------------------------------------------------
>>>>
>>>> Injecting:
>>>>
>>>> #!/bin/sh
>>>>
>>>> J=0
>>>> for i in `find . -name \*.txt`; do
>>>>   (( J++ ))
>>>>   curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" -F "myfi...@$i";
>>>> done;
>>>>
>>>> echo "------------- Committing"
>>>> curl "http://localhost:8983/solr/update/extract?commit=true"
>>>>
>>>> -----------------------------------------------------------------------------------------------
>>>>
>>>> Searching:
>>>>
>>>> http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
>>>>
>>>> -Pete
>>>>
>>>> On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:
>>>>
>>>>> try adding &hl.fl=text
>>>>> to specify your highlight field. I don't understand why you're only
>>>>> getting the ID field back though. Do note that the highlighting
>>>>> is after the docs, related by the ID.
>>>>>
>>>>> Try a (non highlighting) query of just * to verify that you're
>>>>> pointing at the index you think you are. It's possible that
>>>>> you've modified a different index with SolrJ than your web
>>>>> server is pointing at.
>>>>>
>>>>> Also, SOLR has no way of knowing you're modified your index
>>>>> with SolrJ, so it may not be automatically reopening an
>>>>> IndexReader so your recent changes may not be visible
>>>>> until you force the SOLR reader to reopen.
>>>>>
>>>>> HTH
>>>>> Erick
>>>>>
>>>>> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <ps...@mac.com> wrote:
>>>>>
>>>>>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>>>>>>
>>>>>>>> 1) I can get my docs in the index, but when I search, it
>>>>>>>> returns the entire document. I'd love to have it only
>>>>>>>> return the line (or two) around the search term.
>>>>>>>
>>>>>>> Solr can generate Google-like snippets as you describe.
>>>>>>> http://wiki.apache.org/solr/HighlightingParameters
>>>>>>
>>>>>> Here's how I commit my documents:
>>>>>>
>>>>>> J=0;
>>>>>> for i in `find . -name \*.txt`; do
>>>>>>   (( J++ ))
>>>>>>   curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" -F "myfi...@$i";
>>>>>> done;
>>>>>>
>>>>>> echo "------------- Committing"
>>>>>> curl "http://localhost:8983/solr/update/extract?commit=true"
>>>>>>
>>>>>> Then, I try to query using
>>>>>> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
>>>>>> but I only get back the document ID rather than the snippet:
>>>>>>
>>>>>> <doc>
>>>>>>   <float name="score">0.05030759</float>
>>>>>>   <arr name="content_type">
>>>>>>     <str>text/plain</str>
>>>>>>   </arr>
>>>>>>   <str name="id">doc16</str>
>>>>>> </doc>
>>>>>>
>>>>>> I'm using the schema.xml from the "lucid imagination: Indexing text and
>>>>>> html files" tutorial.
>>>>>>
>>>>>> -Pete
>
> --
> Lance Norskog
> goks...@gmail.com
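
For reference, Lance's suggestion quoted above (adding term vectors so Solr does not have to re-analyze the field when highlighting) might look roughly like this for the body field from the schema quoted earlier in the thread. This is only a sketch, not a tested configuration: termVectors, termPositions and termOffsets are standard Solr field attributes, but whether all three are needed here is an assumption, and the documents would have to be re-indexed after the schema change.

```xml
<!-- Sketch only: the "body" field from the quoted schema.xml, with term
     vector options added. Enabling all three attributes is an assumption
     about what this setup needs; re-index after changing the schema. -->
<field name="body" type="text" indexed="true" stored="true" multiValued="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Term vectors make the index larger, so this trades disk space for highlighting speed; the maxAnalyzedChars parameter Erick mentions is another knob to try if very large files are still slow.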