Re: Very basic questions: Indexing text - working, but slow!

Lance Norskog Tue, 29 Jun 2010 21:05:05 -0700

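To highlight a field, Solr needs extra per-field data from Lucene: term
vectors, ideally with positions and offsets. If these are not configured
for the field in the schema, Solr has to re-analyze the stored text of
each returned document at query time to highlight it, which is slow for
very large fields. If you want faster highlighting, you have to add term
vectors to the field in the schema and reindex. Here is the grand map of
such things: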


http://wiki.apache.org/solr/FieldOptionsByUseCase
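
As a rough sketch of what that could look like for the body field from
the schema quoted below: the termVectors, termPositions, and termOffsets
attributes are standard Solr field options, but applying all three to
this particular field is a suggestion that has not been tried against
this data set. Everything else on the line is copied from the original
field definition.

```xml
<!-- schema.xml: the "body" field from the quoted config, with term vectors enabled.
     Only the three term* attributes are new (an untested suggestion).
     Existing documents must be reindexed before the highlighter can use them. -->
<field name="body" type="text" indexed="true" stored="true" multiValued="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Storing term vectors makes the index larger, so it is probably worth
enabling them only on the fields that are actually highlighted.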

On Tue, Jun 29, 2010 at 6:29 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> What are your actual highlighting requirements? You could try
> things like hl.maxAnalyzedChars, hl.requireFieldMatch, etc.
>
> http://wiki.apache.org/solr/HighlightingParameters
> has a good list, but you've probably already seen that page....
>
> Best
> Erick
>
> On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam <ps...@mac.com> wrote:
>
>> To follow up, I've found that my queries are very fast (even with &fq=),
>> until I add &hl=true.  What can I do to speed up highlighting?  Should I
>> consider injecting a line at a time, rather than the entire file as a field?
>>
>>
>> -Pete
>>
>> On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:
>>
>> > Thanks for everyone's help - I have this working now, but sometimes the
>> > queries are incredibly slow!!  For example, <int name="QTime">461360</int>.
>> > Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to
>> > inject without throwing heap memory errors.  However, my data set is very
>> > small!  36 text files, for a total of 113MB.  (It will grow to many TB, but
>> > for now, this is a test).  The largest file is 34MB.
>> >
>> > Therefore, I'm sure I'm doing something wrong :-)  Here's my config:
>> >
>> >
>> -----------------------------------------------------------------------------------------------
>> >
>> > For the schema.xml, <types> is all default.  For fields, here are the
>> > only lines that aren't commented out:
>> >
>> >   <field name="id" type="string" indexed="true" stored="true" required="true" />
>> >   <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
>> >   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
>> >   <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
>> >   <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
>> >   <dynamicField name="*" type="ignored" multiValued="true" />
>> >
>> > ... then, for the rest:
>> >
>> > <uniqueKey>id</uniqueKey>
>> >
>> > <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>> > <defaultSearchField>body</defaultSearchField>
>> >
>> > <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
>> > <solrQueryParser defaultOperator="AND"/>
>> >
>> >
>> >
>> -----------------------------------------------------------------------------------------------
>> >
>> >
>> > Invoking:  java -Xmx3584M -Xms1024M -jar start.jar
>> >
>> >
>> >
>> -----------------------------------------------------------------------------------------------
>> >
>> >
>> > Injecting:
>> >
>> > #!/bin/sh
>> >
>> > J=0
>> > for i in `find . -name \*.txt`; do
>> >       (( J++ ))
>> >       curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" -F "myfi...@$i";
>> > done;
>> >
>> >
>> > echo "------------- Committing"
>> > curl "http://localhost:8983/solr/update/extract?commit=true";
>> >
>> >
>> >
>> -----------------------------------------------------------------------------------------------
>> >
>> >
>> > Searching:
>> >
>> >
>> > http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
>> >
>> >
>> >
>> >
>> >
>> > -Pete
>> >
>> > On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:
>> >
>> >> try adding &hl.fl=text
>> >> to specify your highlight field. I don't understand why you're only
>> >> getting the ID field back though. Do note that the highlighting
>> >> is after the docs, related by the ID.
>> >>
>> >> Try a (non highlighting) query of just * to verify that you're
>> >> pointing at the index you think you are. It's possible that
>> >> you've modified a different index with SolrJ than your web
>> >> server is pointing at.
>> >>
>> >> Also, SOLR has no way of knowing you've modified your index
>> >> with SolrJ, so it may not be automatically reopening an
>> >> IndexReader so your recent changes may not be visible
>> >> until you force the SOLR reader to reopen.
>> >>
>> >> HTH
>> >> Erick
>> >>
>> >> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <ps...@mac.com> wrote:
>> >>
>> >>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>> >>>
>> >>>>> 1) I can get my docs in the index, but when I search, it
>> >>>>> returns the entire document.  I'd love to have it only
>> >>>>> return the line (or two) around the search term.
>> >>>>
>> >>>> Solr can generate Google-like snippets as you describe.
>> >>>> http://wiki.apache.org/solr/HighlightingParameters
>> >>>
>> >>> Here's how I commit my documents:
>> >>>
>> >>> J=0;
>> >>> for i in `find . -name \*.txt`; do
>> >>>      (( J++ ))
>> >>>      curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" -F "myfi...@$i";
>> >>> done;
>> >>>
>> >>> echo "------------- Committing"
>> >>> curl "http://localhost:8983/solr/update/extract?commit=true";
>> >>>
>> >>>
>> >>> Then, I try to query using
>> >>>
>> >>> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
>> >>> but I only get back the document ID rather than the snippet:
>> >>>
>> >>> <doc>
>> >>> <float name="score">0.05030759</float>
>> >>> <arr name="content_type">
>> >>> <str>text/plain</str>
>> >>> </arr>
>> >>> <str name="id">doc16</str>
>> >>> </doc>
>> >>>
>> >>> I'm using the schema.xml from the "lucid imagination: Indexing text and
>> >>> html files" tutorial.
>> >>>
>> >>>
>> >>>
>> >>> -Pete
>> >>>
>> >
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com
