To follow up, I've found that my queries are very fast (even with &fq=), until I add &hl=true. What can I do to speed up highlighting? Should I consider injecting a line at a time, rather than the entire file as a field?
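The line-at-a-time idea could be sketched roughly as below: instead of sending each file through the extract handler as one huge "body" value, emit one small Solr document per line. The function name lines_to_solr_xml, the id scheme (file name plus line number), and posting to the stock XML update handler at /solr/update are all assumptions for illustration, not something confirmed in this thread. Whether this actually makes highlighting faster would need to be measured.

```shell
#!/bin/sh
# Sketch only: one Solr <doc> per non-blank line of a text file.
# Assumed (hypothetical) id scheme: <file name>_<line number>.

# lines_to_solr_xml FILE
# Prints an <add> message containing one <doc> per non-blank line of FILE.
lines_to_solr_xml() {
    awk -v file="$1" '
        function esc(s) {               # escape XML special characters
            gsub(/&/, "\\&amp;", s)     # must run first so later entities survive
            gsub(/</, "\\&lt;", s)
            gsub(/>/, "\\&gt;", s)
            return s
        }
        BEGIN { print "<add>" }
        NF {                            # skip empty and whitespace-only lines
            printf "<doc><field name=\"id\">%s_%d</field>", esc(file), NR
            printf "<field name=\"body\">%s</field></doc>\n", esc($0)
        }
        END { print "</add>" }
    ' "$1"
}

# Possible usage against a local Solr (left commented out so the sketch
# runs without a server):
#
# for i in `find . -name \*.txt`; do
#     lines_to_solr_xml "$i" |
#         curl "http://localhost:8983/solr/update" \
#              -H "Content-Type: text/xml" --data-binary @-
# done
# curl "http://localhost:8983/solr/update" \
#      -H "Content-Type: text/xml" --data-binary "<commit/>"
```

With documents this small, each hit is a single line, so the highlighter has very little stored text to work through per document. The trade-off is many more documents in the index and the loss of the file as a single searchable unit.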
-Pete

On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:

> Thanks for everyone's help - I have this working now, but sometimes the
> queries are incredibly slow!! For example, <int name="QTime">461360</int>.
> Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to inject
> without throwing heap memory errors. However, my data set is very small! 36
> text files, for a total of 113MB. (It will grow to many TB, but for now,
> this is a test). The largest file is 34MB.
>
> Therefore, I'm sure I'm doing something wrong :-) Here's my config:
>
> -----------------------------------------------------------------------------------------------
>
> For the schema.xml, <types> is all default. For fields, here are the only
> lines that aren't commented out:
>
> <field name="id" type="string" indexed="true" stored="true" required="true" />
> <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
> <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
> <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
> <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
> <dynamicField name="*" type="ignored" multiValued="true" />
>
> ... then, for the rest:
>
> <uniqueKey>id</uniqueKey>
>
> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
> <defaultSearchField>body</defaultSearchField>
>
> <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
> <solrQueryParser defaultOperator="AND"/>
>
> -----------------------------------------------------------------------------------------------
>
> Invoking: java -Xmx3584M -Xms1024M -jar start.jar
>
> -----------------------------------------------------------------------------------------------
>
> Injecting:
>
> #!/bin/sh
>
> J=0
> for i in `find . -name \*.txt`; do
>   (( J++ ))
>   curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" -F "myfi...@$i";
> done;
>
> echo "------------- Committing"
> curl "http://localhost:8983/solr/update/extract?commit=true"
>
> -----------------------------------------------------------------------------------------------
>
> Searching:
>
> http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
>
> -Pete
>
> On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:
>
>> try adding &hl.fl=text
>> to specify your highlight field. I don't understand why you're only
>> getting the ID field back though. Do note that the highlighting
>> is after the docs, related by the ID.
>>
>> Try a (non highlighting) query of just * to verify that you're
>> pointing at the index you think you are. It's possible that
>> you've modified a different index with SolrJ than your web
>> server is pointing at.
>>
>> Also, SOLR has no way of knowing you've modified your index
>> with SolrJ, so it may not be automatically reopening an
>> IndexReader so your recent changes may not be visible
>> until you force the SOLR reader to reopen.
>>
>> HTH
>> Erick
>>
>> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <ps...@mac.com> wrote:
>>
>>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>>>
>>>>> 1) I can get my docs in the index, but when I search, it
>>>>> returns the entire document. I'd love to have it only
>>>>> return the line (or two) around the search term.
>>>>
>>>> Solr can generate Google-like snippets as you describe.
>>>> http://wiki.apache.org/solr/HighlightingParameters
>>>
>>> Here's how I commit my documents:
>>>
>>> J=0;
>>> for i in `find . -name \*.txt`; do
>>>   (( J++ ))
>>>   curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" -F "myfi...@$i";
>>> done;
>>>
>>> echo "------------- Committing"
>>> curl "http://localhost:8983/solr/update/extract?commit=true"
>>>
>>> Then, I try to query using
>>> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
>>> but I only get back the document ID rather than the snippet:
>>>
>>> <doc>
>>>   <float name="score">0.05030759</float>
>>>   <arr name="content_type">
>>>     <str>text/plain</str>
>>>   </arr>
>>>   <str name="id">doc16</str>
>>> </doc>
>>>
>>> I'm using the schema.xml from the "lucid imagination: Indexing text and
>>> html files" tutorial.
>>>
>>> -Pete