To follow up, I've found that my queries are very fast (even with &fq=)
until I add &hl=true. What can I do to speed up highlighting? Should I
consider injecting a line at a time, rather than the entire file as a
field?
-Pete
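A minimal sketch of the "smaller documents" idea raised above, assuming fixed-size line chunks are an acceptable unit (the helper name split_into_chunks, the chunk size, and the output directory are all hypothetical, and nothing here has been measured against this data set). It only splits a file; each resulting chunk would then be posted as its own document with its own literal.id:

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: split one text file into
# chunks of N lines so each chunk can be indexed as a separate, much
# smaller document.
# usage: split_into_chunks FILE LINES_PER_CHUNK OUTDIR
split_into_chunks() {
    file=$1
    lines=$2
    outdir=$3
    mkdir -p "$outdir" || return 1
    base=$(basename "$file" .txt)
    # split names the pieces <prefix>aa, <prefix>ab, ...
    split -l "$lines" "$file" "$outdir/$base.part."
}
```

For example, split_into_chunks big.txt 1000 chunks would leave chunks/big.part.aa, chunks/big.part.ab, and so on, each at most 1000 lines long.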
On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:
Thanks for everyone's help - I have this working now, but sometimes the
queries are incredibly slow!! For example, <int name="QTime">461360</int>.
Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to
inject without throwing heap memory errors. However, my data set is very
small! 36 text files, for a total of 113MB. (It will grow to many TB, but
for now, this is a test). The largest file is 34MB.
Therefore, I'm sure I'm doing something wrong :-) Here's my config:
-----------------------------------------------------------------------------------------------
For the schema.xml, <types> is all default. For fields, here are the
only lines that aren't commented out:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
<field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="*" type="ignored" multiValued="true" />
... then, for the rest:
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>body</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND"/>
-----------------------------------------------------------------------------------------------
Invoking: java -Xmx3584M -Xms1024M -jar start.jar
-----------------------------------------------------------------------------------------------
Injecting:
#!/bin/sh
J=0
for i in `find . -name \*.txt`; do
  # POSIX arithmetic, so the counter also works when sh is not bash
  J=$((J + 1))
  curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" \
    -F "myfi...@$i";
done;
echo "------------- Committing"
curl "http://localhost:8983/solr/update/extract?commit=true"
-----------------------------------------------------------------------------------------------
Searching:
http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
-Pete
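As a sketch of one thing to try against the query above: hl.fl and hl.maxAnalyzedChars are standard Solr highlighting parameters, and with this schema the highlighted field would be body. The helper name hl_query_url and the cap value are assumptions for illustration, and whether this helps with 34MB stored fields is untested here:

```shell
#!/bin/sh
# Hypothetical helper: build a select URL that limits highlighting to one
# field and caps how many characters per document the highlighter analyzes.
# The query term is inserted as-is, so it must already be URL-safe.
# usage: hl_query_url QUERY FIELD MAX_CHARS
hl_query_url() {
    printf 'http://localhost:8983/solr/select?q=%s&fl=id,score&hl=true&hl.fl=%s&hl.snippets=5&hl.maxAnalyzedChars=%s\n' \
        "$1" "$2" "$3"
}
```

Running curl "$(hl_query_url testing body 51200)" and comparing QTime with and without the cap would show whether the highlighter itself is the bottleneck.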
On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:
Try adding &hl.fl=text to specify your highlight field. I don't
understand why you're only getting the ID field back, though. Do note
that the highlighting section comes after the docs in the response, and
is related to them by the ID.
Try a (non highlighting) query of just * to verify that you're
pointing at the index you think you are. It's possible that
you've modified a different index with SolrJ than your web
server is pointing at.
Also, SOLR has no way of knowing you've modified your index
with SolrJ, so it may not be automatically reopening an
IndexReader. Your recent changes may not be visible
until you force the SOLR reader to reopen.
HTH
Erick
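The two checks above could be sketched roughly as follows, assuming the default example URL used elsewhere in this thread. The function names are hypothetical; *:* is the standard match-all query, and posting <commit/> to the update handler makes Solr open a new searcher:

```shell
#!/bin/sh
# Base URL is an assumption taken from the commands in this thread.
SOLR=${SOLR:-http://localhost:8983/solr}

# URL for a match-all query with no highlighting; numFound in the
# response shows how many documents this server's index really holds.
match_all_url() {
    printf '%s/select?q=*:*&rows=0\n' "$SOLR"
}

# Send an explicit commit so the server reopens its view of the index.
force_commit() {
    curl -s "$SOLR/update" -H 'Content-Type: text/xml' --data-binary '<commit/>'
}
```

Typical use would be force_commit followed by curl "$(match_all_url)", then checking that numFound matches the number of files injected.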
On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <ps...@mac.com> wrote:
On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
1) I can get my docs in the index, but when I search, it
returns the entire document. I'd love to have it only
return the line (or two) around the search term.
Solr can generate Google-like snippets as you describe.
http://wiki.apache.org/solr/HighlightingParameters
Here's how I commit my documents:
J=0;
for i in `find . -name \*.txt`; do
  (( J++ ))
  curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" \
    -F "myfi...@$i";
done;
echo "------------- Committing"
curl "http://localhost:8983/solr/update/extract?commit=true"
Then, I try to query using
http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
but I only get back the document ID rather than the snippet:
<doc>
<float name="score">0.05030759</float>
<arr name="content_type">
<str>text/plain</str>
</arr>
<str name="id">doc16</str>
</doc>
I'm using the schema.xml from the "lucid imagination: Indexing text and
html files" tutorial.
-Pete