Hi.
There seem to be several options for implementing an
autocomplete/autosuggestion feature with Solr. I am trying to
summarize those possibilities together with their advantages and
disadvantages. It would be really nice to read some of your opinions.
* Using N-Gram filter + text field query
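As a rough sketch of that first option (type and analyzer names assumed,
parameters from the 1.4-era EdgeNGramFilterFactory), an edge n-gram field
type for prefix suggestions might look like:

```xml
<!-- Sketch: index each term's prefixes so a plain term query matches
     partially typed input; query side deliberately skips the n-gram step -->
<fieldType name="text_autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, indexing "shirt" produces "s", "sh", "shi", ..., so querying the
field with the user's partial input is a simple term lookup.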
Below are the reasons why I thought it wouldn't be feasible to have
pre-filtered results with filter queries. Please comment.
Since I can't pin down the direct business requirements due to a
confidentiality contract with the client, I'll mock up the scenario using an
example.
- There is a parent entity called Quiz,
Hi.
Is there any way to make the following scheme work:
I have many documents, each has a random field to enable random
sorting, and I have a weight field.
I want to get random results, but documents with a bigger weight should
appear more frequently.
Is that possible?
Thanks in advance.
I am using the Solr 4.0 API
to search an index (made using Solr 1.4). I am
getting the error "Invalid version (expected 2, but 1) or the
data is not in 'javabin' format". Can anyone help me fix this
problem?
You need to use SolrJ version 1.4, which is compatible with your index.
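Alternatively, a sketch of a client-side workaround (SolrJ 1.4-era API; the
URL is illustrative): switch the response parser away from javabin so the
wire-format version mismatch is avoided.

```java
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

// Fragment, not a full program: requires the SolrJ jars on the classpath.
CommonsHttpSolrServer server =
    new CommonsHttpSolrServer("http://localhost:8983/solr");
// Use the XML parser instead of the default javabin parser
server.setParser(new XMLResponseParser());
```

XML parsing is slower than javabin, so matching the SolrJ version to the
server remains the cleaner fix.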
--- On Mon, 3/14/11, Denis Kuzmenok forward...@ukr.net wrote:
From: Denis Kuzmenok forward...@ukr.net
Subject: Solr sorting
To: solr-user@lucene.apache.org
Date: Monday, March 14, 2011, 10:23 AM
Hi.
Is there any way to make such scheme working:
I have many documents, each
has a
Hi,
we use an external file field configured as dynamic field. The dynamic
field name (and so the name of the provided file) may contain spaces.
But currently it is not possible to query for such fields. The
following query results in a ParseException:
q=val:(experience_foo\ bar)
--- On Mon, 3/14/11, Denis Kuzmenok forward...@ukr.net wrote:
From: Denis Kuzmenok forward...@ukr.net
Subject: Re: Solr sorting
To: Ahmet Arslan solr-user@lucene.apache.org
Date: Monday, March 14, 2011, 12:24 PM
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Sunday, March 13, 2011 6:25 PM
To: solr-user@lucene.apache.org; andy.ne...@gmail.com
Subject: Re: Results driving me nuts!
Hi All
Is there any way to drop term vectors from already built index file.
Regards
Ahsan Iqbal
Hi guys,
I have master-slave replication enabled. The slave is replicating every 3
minutes and I encounter problems while I'm performing a full-import
command on the master (which takes about 7 minutes).
The slave replicates a partial index, about 200k documents out of 700k.
After the next replication the full index is
You need to reindex.
On Monday 14 March 2011 14:04:00 Ahsan |qbal wrote:
Hi All
Is there any way to drop term vectors from already built index file.
Regards
Ahsan Iqbal
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Do you commit too often? Slaves won't replicate while the master is indexing if
you don't send commits. Can you commit only once the indexing finishes?
On Monday 14 March 2011 14:04:51 lame wrote:
Hi guys,
I have master slave replication enabled. Slave is replicating every 3
minutes and I
I don't commit at all; we use the DataImporter. But I have a feeling that
it could be done by DIH (is autocommit possible?).
2011/3/14 Markus Jelsma markus.jel...@openindex.io:
Do you commit to often? Slaves won't replicate if while master is indexing if
you don't send commits. Can you only
In solrconfig there might be an autocommit section enabled.
On Monday 14 March 2011 14:18:42 lame wrote:
I don't commit at all we use Dataimporter, but I have a feeling that
it could be done by DIH (autocommit is it possible)?
2011/3/14 Markus Jelsma markus.jel...@openindex.io:
Do you
Thanks a lot Glen and Yonik... That's a very convincing explanation...
--
View this message in context:
http://lucene.472066.n3.nabble.com/Using-Solr-over-Lucene-effects-performance-tp2666909p2676015.html
Sent from the Solr - User mailing list archive at Nabble.com.
It looks like it (we don't have an autocommit section in
solr.DirectUpdateHandler2; is ramBufferSizeMB responsible for
that?):
<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>10</mergeFactor>
  <ramBufferSizeMB>320</ramBufferSizeMB>
  <maxMergeDocs>2147483647</maxMergeDocs>
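For what it's worth, ramBufferSizeMB only controls when the index writer
flushes buffered documents to disk, not when a commit happens. An autocommit
section in solrconfig.xml would look roughly like this (the thresholds here
are placeholders, not recommendations):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>  <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```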
It's not easy if you have lots of facet values (in my case, it can even be
up to a million), and there is no built-in way in Solr to get this. I
have been told that some of the faceting strategies (there are actually
several in use in Solr, depending on your parameters and the nature of your
data)
On 3/13/2011 6:24 PM, Ahmet Arslan wrote:
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm
I can see that the one with 5 matches is longer than the other. Shorter
documents are favored in Solr/Lucene by the length normalization factor.
Is
Hi,
we have 3 solr cores, each of them is running a delta-import every 2
minutes on
a MySQL database.
We've noticed a significant increase in MySQL queries per second since we
started the delta updates.
Before that, the database server received between 50 and 100 queries per
second,
You can use omitNorms=true for any given field. Length normalization will be
disabled and index-time boosting will no longer be available.
Term frequencies can also be disabled by setting
omitTermFreqAndPositions=true for any given field. Omitting TF can be very
useful if you need an easy
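In schema.xml, both switches are per-field attributes; a sketch (the field
name and type here are made up):

```xml
<!-- Disable length norms and term frequencies/positions for this field -->
<field name="title" type="text" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true"/>
```

Note that omitting positions also disables phrase queries on that field.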
These settings don't affect a commit. But maxPendingDeletes might; I'm
unsure. If you commit on the master and slaves are configured to replicate on
commit, they should all have the same index version.
On Monday 14 March 2011 14:42:51 lame wrote:
It looks like (we don't have autocommit
Robert,
that may depend heavily on your (sub-)entities, and how you built your queries.
Perhaps http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
would help you - as said, depending on your config
Regards
Stefan
2011/3/14 Robert Gründler rob...@dubture.com:
Hi,
we
You could use the clean=false parameter trick and then just use query. That
would reduce the queries by half for deltas.
Bill Bell
Sent from mobile
On Mar 14, 2011, at 8:57 AM, Stefan Matheis matheis.ste...@googlemail.com
wrote:
Robert,
that may extremly depend on your (sub-)entities, and how
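The clean=false trick above would look something like this (host and core
name are illustrative). A full-import with clean=false runs only the main
query of each entity without wiping the index first, so the separate
deltaQuery/deltaImportQuery round-trips are avoided:

```
http://master:8983/solr/core0/dataimport?command=full-import&clean=false
```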
It doesn't necessarily need to go through an XSLT, but the idea remains the
same. I want to have the highest scores first, no matter which result they
match with.
So if the results are like this:
<lst name="moreLikeThis">
  <result name="3" numFound="2" start="0" maxScore="0.439">
    <doc>
      <float
Hi all,
I have a field in my schema called boost_score. I would like to set it up so
that if I pass in a certain flag, each document score is boosted by the
number in boost_score.
For example if I use:
http://localhost/solr/search/?q=dog
I would get search results like normal. But if I use:
See boosting documents by function query. This way you can use document's
boost_score field to affect the final score.
http://wiki.apache.org/solr/FunctionQuery
On Monday 14 March 2011 16:40:42 Brian Lamb wrote:
Hi all,
I have a field in my schema called boost_score. I would like to set it
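The function-query boost mentioned in the reply could be wired to a flag in
the client. A sketch, using the URLs from the question (parameters would need
URL-encoding in practice):

```
# normal search
http://localhost/solr/search/?q=dog

# multiplicative: scale each document's score by its boost_score field
http://localhost/solr/search/?q={!boost b=boost_score}dog

# or, with dismax, an additive boost function
http://localhost/solr/search/?q=dog&defType=dismax&bf=boost_score
```

The {!boost} form multiplies scores, while bf adds the function value, so
they behave differently when boost_score varies over a wide range.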
Hi,
In Solr 1.4.1 we don't have a feature to disable the automatic generation of
phrase queries. The phrase queries are generated because of the word delimiter
filter I use. The problem is, I cannot use the qs parameter in DisMax to allow
slop for these generated phrase queries, because I require a
I am still stuck at the same point.
Looking here and there, I could read that the memory limit (heap space) may
need to be increased to -Xms512M -Xmx512M when launching the
java -jar start.jar
command. But on my VPS I've been forced to set the Xmx limit to at most
-Xmx400M, since at higher values
Aha. Yeah, I've read the documentation several times, but still find
myself confused.
But do I understand this right now:
if I do omitNorms=true, but still leave term freq and positions at the
default (i.e., NOT omitTermFreqAndPositions=true) ... then a
document with more occurrences of a
Ahum, one option would of course not work: copyField-ing them to a field with
positions, because the phrase query is executed on the fields specified in qf
(not pf). And since I need tf=1 in qf, it wouldn't work.
I guess extending DefaultSimilarity is the best option; this way I still have
position
On Monday 14 March 2011 17:27:05 Jonathan Rochkind wrote:
Aha. Yeah, I've read the documentation several times,but still find
myself confused.
But do I understand this right now:
If I do omitNorms=true, but still leave term freq and positions in
default case (ie, NOT
+1 on some kind of simple performance framework that would allow comparing Solr
vs Lucene. Any chance the Lucene benchmark programs in contrib could be
adapted to read Solr config information?
BTW: You probably want to empty the OS cache in addition to restarting Solr
between each run if the
We also have commits from the application (besides full import) - maybe
that is the case.
If you don't have any other ideas I'll probably try reindexing the second
core, then swap cores and run a delta import (to import documents added
in the meantime).
2011/3/14 Markus Jelsma markus.jel...@openindex.io:
Yes, commits from the application will interfere indeed. If your business
scenario allows for using always optimized indices you might choose to only
replicate on optimize.
On Monday 14 March 2011 18:45:15 lame wrote:
We have also commits from application (besides full import) - maybe
that is
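The replicate-on-optimize setup suggested above is configured on the master's
ReplicationHandler; a sketch (handler name as conventionally used):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- only publish a new index version after an optimize,
         so intermediate commits from the app are not replicated -->
    <str name="replicateAfter">optimize</str>
  </lst>
</requestHandler>
```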
Hello everyone,
First of all here is our Solr setup:
- Solr nightly build 986158
- Running solr inside the default jetty comes with solr build
- 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb of RAM)
- Index replicated (on optimize) to slaves via Solr Replication
- Size of
Hi Doğacan,
Are you, at some point, running out of heap space? In my experience, that's
the common cause of increased load and excessively high response times (or
timeouts).
Cheers,
Hello everyone,
First of all here is our Solr setup:
- Solr nightly build 986158
- Running solr inside
Hello,
2011/3/14 Markus Jelsma markus.jel...@openindex.io
Hi Doğacan,
Are you, at some point, running out of heap space? In my experience, that's
the common cause of increased load and excessivly high response times (or
time
outs).
How much of a heap size would be enough? Our index size
Hello,
2011/3/14 Markus Jelsma markus.jel...@openindex.io
Hi Doğacan,
Are you, at some point, running out of heap space? In my experience,
that's the common cause of increased load and excessivly high response
times (or time
outs).
How much of a heap size would be enough? Our
I know this thread is old, but I encountered the exact same problem and
couldn't figure out what's wrong. I'm using DIH for SQL Server. Please let
me know. And the link that you provided seems to no longer exist.
Thanks,
Ram.
I've definitely had cases in 1.4.1 where, even though I didn't have an
OOM error, Solr was being weirdly slow, and increasing the JVM heap size
fixed it. I can't explain why it happened, or exactly how you'd know
this was going on; I didn't see anything odd in the logs to indicate it, I
just
Hello again,
2011/3/14 Markus Jelsma markus.jel...@openindex.io
Hello,
2011/3/14 Markus Jelsma markus.jel...@openindex.io
Hi Doğacan,
Are you, at some point, running out of heap space? In my experience,
that's the common cause of increased load and excessivly high response
Nope, no OOM errors.
That's a good start!
Insanity count is 0 and fieldCache has 12 entries. We do use some boosting
functions.
Btw, I am monitoring output via JConsole with 8 GB of RAM, and it still goes
up to 8 GB every 20 seconds or so; GC runs, and it falls back down to 1 GB.
Hmm, maybe the garbage
It's actually, as I understand it, expected JVM behavior to see the heap
rise close to its limit before it gets GC'd; that's how Java GC
works. Whether that should happen every 20 seconds or what, I don't know.
Another option is setting better JVM garbage collection arguments, so GC
You might also want to add the following switches for your GC log.
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps
  -XX:+PrintGCDetails -Xloggc:/var/log/tomcat6/gc.log
  -XX:+PrintGCApplicationConcurrentTime
  -XX:+PrintGCApplicationStoppedTime"
Also, what JVM version are you using and
That depends on your GC settings and generation sizes. And instead of
UseParallelGC, you'd better use UseParNewGC in combination with CMS.
See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
It's actually, as I understand it, expected JVM behavior to see the heap
rise to close to it's
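Spelled out as JVM options, the CMS + ParNew combination suggested above
might look like this (the generation sizes are placeholders to tune against
your own GC logs, not recommendations):

```
# CMS for the old generation, ParNew for the young generation
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:NewSize=256m -XX:MaxNewSize=256m"
```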
I like field collapsing because that way my suggestions give phrase
results (i.e. the suggestion starts with what the user has typed so far),
and thus I limit suggestions to be in the order of the words typed. I
think that looks better for our retail-oriented site. I populate the
index with
Robert, thanks for your answer. What Solr version do you use? 4.0?
As mentioned in my other post here, I tried to patch 1.4 to use
field collapsing, but couldn't get it to work (it compiled fine, but
the collapse parameters seem to be completely ignored).
2011/3/14 Robert Petersen rober...@buy.com:
Hello fellow SOLRers,
Within my custom query component, I wish to obtain an instance of the
analyzer for a given named field.
Is there a schema object I can access?
thanks in advance
paul
I am doing this very differently. We are on Solr 1.4.0 and I accomplish the
collapsing in my wrapper layer. I have written a layer of code around Solr: an
indexer on one end and a search service wrapping Solr on the other end. I
manually collapse the field in my code. I keep both a
Note that due to the 'raw' nature of my source data, I also have to heavily
filter my data before collapsing it. I don't want to suggest garbage
phrases just because a lot of people searched for them. We store auxiliary
data in the index to filter on when performing the grouping.
Hello,
2011/3/14 Markus Jelsma markus.jel...@openindex.io
That depends on your GC settings and generation sizes. And, instead of
UseParallelGC you'd better use UseParNewGC in combination with CMS.
JConsole now shows a different profile output but load is still high and
performance is still
Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your system
suffer from http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock
?
I'm not sure; I haven't seen a similar issue in a sharded environment,
probably because it was a controlled environment.
Hello,
See how Lucid Enterprise does it... A bit differently.
On 3/14/11 12:14 AM, Kai Schlamp kai.schl...@googlemail.com wrote:
Hi.
There seems to be several options for implementing an
autocomplete/autosuggestions feature with Solr. I am trying to
summarize those possibilities together with their
Turn off all autocommitting..
On 3/14/11 7:04 AM, lame l...@o2.pl wrote:
Hi guys,
I have master slave replication enabled. Slave is replicating every 3
minutes and I encourage problems while I'm performing full import
command on master (which takes about 7 minutes).
Slave repliacates partial
2011/3/14 Markus Jelsma markus.jel...@openindex.io
Mmm. SearchHander.handleRequestBody takes care of sharding. Could your
system
suffer from
http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock
?
We increased thread limit (which was 1 before) but it did not help.
Anyway,
Does it make sense to apply WordDelimiterFilterFactory to non-English
languages, such as Spanish? What about Asian languages?
The following are the typical use cases for WordDelimiterFilterFactory. Are
1, 2, 3, and 4 applicable to all Western languages (including Spanish)? For
Asian languages, is
Within my custom query-component, I wish to obtain an
instance of the analyzer for a given named field.
Is a schema object I can access?
public void process(ResponseBuilder rb) throws IOException {
    Map<String,FieldType> map = rb.req.getSchema().getFieldTypes();
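Going one step further, a sketch of pulling the analyzer for one specific
field via its type, rather than iterating the whole field-type map (the field
name "myfield" is made up; requires the Solr libs on the classpath):

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.schema.IndexSchema;

// inside the custom SearchComponent:
public void process(ResponseBuilder rb) throws IOException {
    IndexSchema schema = rb.req.getSchema();
    // getFieldType() resolves dynamic fields as well
    Analyzer analyzer = schema.getFieldType("myfield").getAnalyzer();
    // ... use the field's analyzer here ...
}
```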
Does it make sense to apply
WordDelimiterFilterFactory to non-english
language, such as spanish?
Yes, it makes sense. WDF is especially good for product names, like i-phone,
iphone4, etc.
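A typical analyzer chain for such product names might look like the
following sketch (type name made up; attribute values are the common
defaults shown on the wiki, not a recommendation):

```xml
<fieldType name="text_products" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- "i-phone" -> "i", "phone" (+ "iphone" via catenateWords);
         "iphone4" -> "iphone", "4" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```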
Can you provide more details? Or a link?
--- On Mon, 3/14/11, Bill Bell billnb...@gmail.com wrote:
See how Lucid Enterprise does it... A
bit differently.
On 3/14/11 12:14 AM, Kai Schlamp kai.schl...@googlemail.com
wrote:
Hi.
There seems to be several options for implementing an
http://lucidworks.lucidimagination.com/display/LWEUG/Spell+Checking+and+Automatic+Completion+of+User+Queries
For Auto-Complete, find the following section in the solrconfig.xml file
for the collection:
<!-- Auto-Complete component -->
<searchComponent name="autocomplete">
Like many people, I don't use Solr as my primary data store. Not all of my
data needs to be searchable, and for simple and fast retrieval I store it in
a database (Cassandra in my case). Actually I don't have this all built up
yet, but my intention is that whenever new data is entered, it be added to my
@Robert: That sounds interesting and very flexible, but also like a
lot of work. This approach also doesn't seem to allow querying Solr
directly via Ajax ... one of the big benefits, in my opinion, of
using Solr.
@Bill: There are some things I don't like about the Suggester
component. It
Look at Solandra. Solr + Cassandra.
On 3/14/11 9:38 PM, onlinespend...@gmail.com onlinespend...@gmail.com
wrote:
Like many people, Solr is not my primary data store. Not all of my data
need
be searchable and for simple and fast retrieval I store it in a database
(Cassandra in my case). Actually