Re: Very basic questions: Indexing text - working, but slow!

Erick Erickson Tue, 29 Jun 2010 18:30:00 -0700

What are your actual highlighting requirements? You could try
things like maxAnalyzedChars, requireFieldMatch, etc.


http://wiki.apache.org/solr/HighlightingParameters
has a good list, but you've probably already seen that page.
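
As a rough sketch of what trying those parameters might look like: the
helper name build_hl_url, the HL_MAX_CHARS and HL_FRAGSIZE variables, and
the values 51200 and 100 are illustrative assumptions, not tuned
recommendations. hl.fl=body assumes the stored text field is called "body",
as in the schema quoted below. It only builds and prints the URL, so it can
be inspected before being sent to a server.

```shell
#!/bin/sh
# Sketch: build a /select URL that bounds the work the highlighter does.
# hl.maxAnalyzedChars caps how many characters of the field are analyzed,
# hl.requireFieldMatch only highlights terms matched in the highlighted field,
# hl.fl names the field to highlight explicitly.
# Host, port, and all numeric values below are assumptions for illustration.

SOLR="http://localhost:8983/solr"

build_hl_url() {
    # $1 = query term (assumed to be URL-safe already; no encoding is done)
    printf '%s/select?q=%s&fl=id,score&hl=true&hl.fl=body&hl.snippets=5&hl.maxAnalyzedChars=%s&hl.requireFieldMatch=true&hl.fragsize=%s\n' \
        "$SOLR" "$1" "${HL_MAX_CHARS:-51200}" "${HL_FRAGSIZE:-100}"
}

# Print the URL for the query used in this thread.
build_hl_url testing
```

To run it against a live server, something like
curl "$(build_hl_url testing)" should work, and comparing QTime while
varying HL_MAX_CHARS would show whether the analyzed length is what is
costing you.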

Best
Erick

On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam <ps...@mac.com> wrote:

> To follow up, I've found that my queries are very fast (even with &fq=),
> until I add &hl=true.  What can I do to speed up highlighting?  Should I
> consider injecting a line at a time, rather than the entire file as a field?
>
>
> -Pete
>
> On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:
>
> > Thanks for everyone's help - I have this working now, but sometimes the
> > queries are incredibly slow!!  For example, <int name="QTime">461360</int>.
> > Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to
> > inject without throwing heap memory errors.  However, my data set is very
> > small!  36 text files, for a total of 113MB.  (It will grow to many TB, but
> > for now, this is a test).  The largest file is 34MB.
> >
> > Therefore, I'm sure I'm doing something wrong :-)  Here's my config:
> >
> >
> -----------------------------------------------------------------------------------------------
> >
> > For the schema.xml, <types> is all default.  For fields, here are the
> > only lines that aren't commented out:
> >
> >   <field name="id" type="string" indexed="true" stored="true" required="true" />
> >   <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
> >   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
> >   <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
> >   <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
> >   <dynamicField name="*" type="ignored" multiValued="true" />
> >
> > ... then, for the rest:
> >
> > <uniqueKey>id</uniqueKey>
> >
> > <!-- field for the QueryParser to use when an explicit fieldname is absent -->
> > <defaultSearchField>body</defaultSearchField>
> >
> > <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
> > <solrQueryParser defaultOperator="AND"/>
> >
> >
> >
> -----------------------------------------------------------------------------------------------
> >
> >
> > Invoking:  java -Xmx3584M -Xms1024M -jar start.jar
> >
> >
> >
> -----------------------------------------------------------------------------------------------
> >
> >
> > Injecting:
> >
> > #!/bin/sh
> >
> > J=0
> > for i in `find . -name \*.txt`; do
> >       (( J++ ))
> >       curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body"; -F "myfi...@$i";
> > done;
> >
> >
> > echo "------------- Committing"
> > curl "http://localhost:8983/solr/update/extract?commit=true";
> >
> >
> >
> -----------------------------------------------------------------------------------------------
> >
> >
> > Searching:
> >
> >
> http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
> >
> >
> >
> >
> >
> > -Pete
> >
> > On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:
> >
> >> try adding &hl.fl=text
> >> to specify your highlight field. I don't understand why you're only
> >> getting the ID field back though. Do note that the highlighting
> >> is after the docs, related by the ID.
> >>
> >> Try a (non highlighting) query of just * to verify that you're
> >> pointing at the index you think you are. It's possible that
> >> you've modified a different index with SolrJ than your web
> >> server is pointing at.
> >>
> >> Also, SOLR has no way of knowing you've modified your index
> >> with SolrJ, so it may not be automatically reopening an
> >> IndexReader so your recent changes may not be visible
> >> until you force the SOLR reader to reopen.
> >>
> >> HTH
> >> Erick
> >>
> >> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <ps...@mac.com> wrote:
> >>
> >>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
> >>>
> >>>>> 1) I can get my docs in the index, but when I search, it
> >>>>> returns the entire document.  I'd love to have it only
> >>>>> return the line (or two) around the search term.
> >>>>
> >>>> Solr can generate Google-like snippets as you describe.
> >>>> http://wiki.apache.org/solr/HighlightingParameters
> >>>
> >>> Here's how I commit my documents:
> >>>
> >>> J=0;
> >>> for i in `find . -name \*.txt`; do
> >>>      (( J++ ))
> >>>      curl "http://localhost:8983/solr/update/extract?literal.id=doc$J";
> >>> -F "myfi...@$i";
> >>> done;
> >>>
> >>> echo "------------- Committing"
> >>> curl "http://localhost:8983/solr/update/extract?commit=true";
> >>>
> >>>
> >>> Then, I try to query using
> >>>
> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
> >>> but I only get back the document ID rather than the snippet:
> >>>
> >>> <doc>
> >>> <float name="score">0.05030759</float>
> >>> <arr name="content_type">
> >>> <str>text/plain</str>
> >>> </arr>
> >>> <str name="id">doc16</str>
> >>> </doc>
> >>>
> >>> I'm using the schema.xml from the "lucid imagination: Indexing text and
> >>> html files" tutorial.
> >>>
> >>>
> >>>
> >>> -Pete
> >>>
> >
>
>
