Re: Very basic questions: Indexing text - working, but slow!

Peter Spam Tue, 29 Jun 2010 18:12:16 -0700

To follow up, I've found that my queries are very fast (even with &fq=), until 
I add &hl=true.  What can I do to speed up highlighting?  Should I consider 
injecting a line at a time, rather than the entire file as a field?
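If you do try injecting a line at a time, here is a rough sketch of one way to produce the documents. It has not been tested against this setup. It assumes UTF-8 text without control characters and posts to the standard XML /update handler. The lines_to_solr_xml helper and the id scheme (<prefix>-L<line number>) are hypothetical. Each non-blank line becomes its own document in the body field, so the stored text behind a hit is one short line rather than an entire file.

```shell
#!/bin/sh
# Hypothetical helper: turn each non-blank line of a text file into its
# own Solr document (Solr XML update format). The id scheme
# "<prefix>-L<line number>" is an assumption; "body" is the text field
# from the schema in this thread.
lines_to_solr_xml() {
    file=$1     # input text file
    prefix=$2   # must be unique per file, e.g. doc7
    awk -v prefix="$prefix" '
        # Escape the characters that are special in XML text
        # ("&" first, so the entities added afterwards stay intact).
        function esc(s) {
            gsub(/&/, "\\&amp;", s)
            gsub(/</, "\\&lt;", s)
            gsub(/>/, "\\&gt;", s)
            return s
        }
        BEGIN { print "<add>" }
        length($0) == 0 { next }   # skip blank lines
        {
            printf "<doc><field name=\"id\">%s-L%d</field>", prefix, NR
            printf "<field name=\"body\">%s</field></doc>\n", esc($0)
        }
        END { print "</add>" }
    ' "$file"
}
# Example (then commit as usual):
#   lines_to_solr_xml big.txt doc7 |
#     curl "http://localhost:8983/solr/update" \
#          -H 'Content-Type: text/xml; charset=utf-8' --data-binary @-
```

Line numbers come from awk's NR, so skipped blank lines leave gaps in the ids. Each id still points back to the original file and line.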



-Pete

On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:

> Thanks for everyone's help - I have this working now, but sometimes the 
> queries are incredibly slow!!  For example, <int name="QTime">461360</int>
> (QTime is in milliseconds, so that is roughly 7.7 minutes).
> Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to inject 
> without throwing heap memory errors.  However, my data set is very small!  36 
> text files, for a total of 113MB.  (It will grow to many TB, but for now, 
> this is a test).  The largest file is 34MB.
> 
> Therefore, I'm sure I'm doing something wrong :-)  Here's my config:
> 
> -----------------------------------------------------------------------------------------------
> 
> For the schema.xml, <types> is all default.  For fields, here are the only 
> lines that aren't commented out:
> 
>   <field name="id" type="string" indexed="true" stored="true" required="true" />
>   <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
>   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
>   <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
>   <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
>   <dynamicField name="*" type="ignored" multiValued="true" />
> 
> ... then, for the rest:
> 
> <uniqueKey>id</uniqueKey>
> 
> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
> <defaultSearchField>body</defaultSearchField>
> 
> <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
> <solrQueryParser defaultOperator="AND"/>
> 
> 
> -----------------------------------------------------------------------------------------------
> 
> 
> Invoking:  java -Xmx3584M -Xms1024M -jar start.jar
> 
> 
> -----------------------------------------------------------------------------------------------
> 
> 
> Injecting:
> 
> #!/bin/sh
> 
> J=0
> for i in `find . -name \*.txt`; do 
>       J=$((J + 1))
>       curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" \
>            -F "myfi...@$i"
> done;
> 
> 
> echo "------------- Committing"
> curl "http://localhost:8983/solr/update/extract?commit=true";
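Here is a slightly more defensive version of the loop above, as a sketch. The index_txt_files name, the SOLR variable and the DRY_RUN switch are hypothetical, and it should run under a plain POSIX /bin/sh. File names are read one per line, so paths containing spaces survive. The multipart form-field name "file" is an arbitrary choice (an assumption, not taken from the original script).

```shell
#!/bin/sh
# Hypothetical variant of the injection loop. File names are read one
# per line (paths with spaces work; names containing newlines do not).
# DRY_RUN=1 prints each curl command instead of running it. Document
# ids depend on find's output order, so they may change between runs.
SOLR=${SOLR:-http://localhost:8983/solr}   # assumed Solr base URL

index_txt_files() {
    dir=$1
    find "$dir" -type f -name '*.txt' | {
        j=0
        while IFS= read -r f; do
            j=$((j + 1))
            url="$SOLR/update/extract?literal.id=doc$j&fmap.content=body"
            if [ "${DRY_RUN:-0}" = 1 ]; then
                printf 'curl "%s" -F "file=@%s"\n' "$url" "$f"
            else
                # "file" is an arbitrary multipart field name (assumption)
                curl "$url" -F "file=@$f" || printf 'failed: %s\n' "$f" >&2
            fi
        done
    }
    # Commit once at the end so every document becomes visible together.
    commit="$SOLR/update/extract?commit=true"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'curl "%s"\n' "$commit"
    else
        curl "$commit"
    fi
}
# Example: DRY_RUN=1 index_txt_files .   (review the commands first)
```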
> 
> 
> -----------------------------------------------------------------------------------------------
> 
> 
> Searching:
> 
> http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
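Further down, Erick suggests naming the highlight field explicitly. With this schema that means adding hl.fl=body to the search above. For large files it may also be worth experimenting with hl.maxAnalyzedChars, which caps how many characters of a field the highlighter examines. See the HighlightingParameters wiki page for its default. The helper name and SOLR variable below are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper that rebuilds the search URL with an explicit
# hl.fl=body, plus an optional hl.maxAnalyzedChars cap. The term is
# pasted in verbatim, so it must already be URL-safe (a single word
# such as "testing" is fine).
SOLR=${SOLR:-http://localhost:8983/solr}   # assumed Solr base URL

highlight_url() {
    term=$1
    max_chars=${2:-}   # optional: value for hl.maxAnalyzedChars
    url="$SOLR/select?q=$term&fl=id,score&hl=true&hl.fl=body"
    url="$url&hl.snippets=5&hl.mergeContiguous=true"
    if [ -n "$max_chars" ]; then
        url="$url&hl.maxAnalyzedChars=$max_chars"
    fi
    printf '%s\n' "$url"
}
# Example: curl "$(highlight_url testing 20000)"
```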
> 
> 
> 
> 
> 
> -Pete
> 
> On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:
> 
>> try adding &hl.fl=text
>> to specify your highlight field. I don't understand why you're only
>> getting the ID field back though. Note that the highlighting section
>> comes after the docs in the response, keyed by document ID.
>> 
>> Try a (non-highlighting) query of just *:* to verify that you're
>> pointing at the index you think you are. It's possible that
>> you've modified a different index with SolrJ than your web
>> server is pointing at.
>> 
>> Also, SOLR has no way of knowing you've modified your index
>> with SolrJ, so it may not be automatically reopening an
>> IndexReader so your recent changes may not be visible
>> until you force the SOLR reader to reopen.
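The two checks above might be scripted along these lines. The helper names and base URL are hypothetical. A match-all *:* query with rows=0 shows how many documents the server's index holds. A commit on the plain /update handler asks Solr to open a new searcher.

```shell
#!/bin/sh
# Hypothetical helpers for the two checks described above.
SOLR=${SOLR:-http://localhost:8983/solr}   # assumed Solr base URL

# Match-all query without highlighting; rows=0 means the response
# carries only numFound, i.e. how many documents this index holds.
count_url() {
    printf '%s/select?q=*:*&rows=0\n' "$SOLR"
}

# A commit makes Solr open a new searcher, after which recently
# committed changes should be visible to queries.
commit_url() {
    printf '%s/update?commit=true\n' "$SOLR"
}
# Example: curl "$(commit_url)"; curl "$(count_url)"
```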
>> 
>> HTH
>> Erick
>> 
>> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <ps...@mac.com> wrote:
>> 
>>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>>> 
>>>>> 1) I can get my docs in the index, but when I search, it
>>>>> returns the entire document.  I'd love to have it only
>>>>> return the line (or two) around the search term.
>>>> 
>>>> Solr can generate Google-like snippets as you describe.
>>>> http://wiki.apache.org/solr/HighlightingParameters
>>> 
>>> Here's how I commit my documents:
>>> 
>>> J=0;
>>> for i in `find . -name \*.txt`; do
>>>      J=$((J + 1))
>>>      curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" \
>>>           -F "myfi...@$i"
>>> done;
>>> 
>>> echo "------------- Committing"
>>> curl "http://localhost:8983/solr/update/extract?commit=true";
>>> 
>>> 
>>> Then, I try to query using
>>> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
>>> but I only get back the document ID rather than the snippet:
>>> 
>>> <doc>
>>> <float name="score">0.05030759</float>
>>> <arr name="content_type">
>>> <str>text/plain</str>
>>> </arr>
>>> <str name="id">doc16</str>
>>> </doc>
>>> 
>>> I'm using the schema.xml from the Lucid Imagination "Indexing text and
>>> html files" tutorial.
>>> 
>>> 
>>> 
>>> -Pete
>>> 
>
