Re: How to Create a weighted function (dismax or otherwise)
I am trying to create a feature that allows search results to be ranked by the formula sum(weight1 * text relevance score, weight2 * price). weight1 and weight2 are numeric values that can be changed to influence the search results. I am sending the following query params to the Solr instance for searching: q=red&defType=dismax&qf=10^name+2^price

The correct syntax for qf and pf is fieldName^boostFactor, i.e., qf=name^10 price^2. However, your query is a word, so it won't match in the price field; I assume the price field is numeric. You can simulate sum(weight1 * text relevance score, weight2 * price) with the bf parameter and FunctionQueries:

q=red&defType=edismax&qf=name&bf=product(price,w1/w2)

http://wiki.apache.org/solr/FunctionQuery
http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29
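A quick sketch of composing that suggested query as a URL, in Python. The host, core, and the concrete weight values here are hypothetical; only the parameter names come from the thread.

```python
from urllib.parse import urlencode

# Hypothetical weights chosen by the caller; the thread folds the
# ratio w1/w2 into the bf function.
w1, w2 = 1.0, 2.0
params = {
    "q": "red",
    "defType": "edismax",
    "qf": "name",
    "bf": f"product(price,{w1 / w2})",  # weight ratio as a plain number
}
# Hypothetical local Solr endpoint:
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

urlencode takes care of escaping the parentheses and comma in the function query, which is easy to get wrong when pasting URLs by hand.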
Re: Solr - search queries not returning results
I believe I am missing something very elementary. The following query returns zero hits: http://localhost:8983/solr/core0/select/?q=testabc

With this URL, you are hitting the request handler defined with default="true" in your core0/conf/solrconfig.xml.

However, using solritas, it finds many results: http://localhost:8983/solr/core0/itas?q=testabc

With this one, you are hitting the one registered as <requestHandler name="/itas">.

Do you have any idea what the issue may be?

Probably they have different default parameters configured, for example (e)dismax versus the lucene query parser. The lucene query parser searches for testabc in your default field; dismax searches it in all of the fields defined in the qf parameter. You can see the full parameter list by appending &echoParams=all to your search URL.
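The lucene-parser-vs-dismax difference can be sketched in a few lines of Python. This is a toy illustration only; the field names and the whitespace "tokenization" are mine, not Solr's actual matching logic.

```python
# Hypothetical document: the term only occurs outside the default field.
doc = {"text": "body copy without the term", "title": "testabc lives here"}

def lucene_like(term, doc, default_field="text"):
    # lucene query parser: unqualified terms hit only the default field
    return term in doc[default_field].split()

def dismax_like(term, doc, qf=("text", "title")):
    # dismax: the term is tried against every field listed in qf
    return any(term in doc[field].split() for field in qf)

print(lucene_like("testabc", doc))   # no hit in the default field
print(dismax_like("testabc", doc))   # hit via the title field in qf
```

Same document, same term, different answers, which is exactly why the two request handlers disagree.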
Re: Using RAMDirectoryFactory in Master/Slave setup
...Using RAMDirectory really does not help performance...

I kind of agree, but in my experience with Lucene there are cases where RAMDirectory helps a lot, with all its drawbacks (huge heap and gc() tuning). We had very good experience with MMAP on average, but moving to RAMDirectory with a properly tuned gc() reduced 95% of the slow performers in the upper range of response times (e.g. the slowest 5% of queries). On average it made practically no difference. Maybe this is mitigated by better warm-up on Solr than our hand-tuned warm-up, maybe not; I do not really know.

With MMAP you need a really smart warm-up to beat IO quirks; for RAMDir you need to tune gc(). Choose your poison :) I argue that in some cases it is very hard to tame IO quirks (e.g. disk is a shared resource, and you never know what is really going on in a shared app setup!). Then, look at what happens on a major merge, and all these efforts with the native Linux directory to somehow get a grip on that... If you have spare RAM, you are probably safer with RAMDirectory. From the theoretical perspective, in the ideal case RAM ought to be faster than disk (and more expensive); if this is not the case, we did something wrong.

I have a feeling that the work Mike is doing with in-memory codecs (FST term dictionary, pulsing codec & co.) in Lucene 4, native directory features... will make RAMDirectory really obsolete for production setups.

Cheers, eks

On Wed, Jun 29, 2011 at 6:00 AM, Lance Norskog goks...@gmail.com wrote: Using RAMDirectory really does not help performance. Java garbage collection has to work around all of the memory taken by the segments. It works out that Solr works better (for most indexes) without using the RAMDirectory.

On Sun, Jun 26, 2011 at 2:07 PM, nipunb ni...@walmartlabs.com wrote: PS: Sorry if this is a repost, I was unable to see my message in the mailing list - this may have been due to my outgoing email being different from the one I used to subscribe to the list with.
Overview – Trying to evaluate whether keeping the index in memory using RAMDirectoryFactory can help query performance. I am trying to perform the indexing on the master using solr.StandardDirectoryFactory and make those indexes accessible to the slave using solr.RAMDirectoryFactory.

Details: We have set up Solr in a master/slave environment. The index is built on the master and then replicated to slaves, which are used to serve the queries. The replication is done using the built-in Java replication in Solr.

On the master, in the indexDefaults of solrconfig.xml we have:
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>

On the slave, I tried to use the following in the indexDefaults:
<directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/>

My slave shows no data for any queries. In solrconfig.xml it is mentioned that replication doesn't work when using RAMDirectoryFactory; however, this (https://issues.apache.org/jira/browse/SOLR-1379) mentions that you can use it to have the index on disk and then load it into memory. To test the sanity of my set-up, I changed solrconfig.xml on the slave to the following and replicated:
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>

I was able to see the results. Shouldn't RAMDirectoryFactory be used for reading the index from disk into memory? Any help/pointers in the right direction would be appreciated. Thanks!
--
View this message in context: http://lucene.472066.n3.nabble.com/Using-RAMDirectoryFactory-in-Master-Slave-setup-tp3111792p3111792.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Regex replacement not working!
Hi, I have this bunch of lines in my schema.xml that should do a replacement, but it doesn't work!

<fieldType name="salary_max_text" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([0-9]+k?[.,]?[0-9]*).*?([0-9]+k?[.,]?[0-9]*)" replacement="$2"/>
  </analyzer>
</fieldType>

I need it to extract only the numbers from some other string. The strings can be anything: only letters (so it should be replaced with an empty string), or letters + numbers. The numbers can be in one of these formats:

17000 -- ok
17,000 -- should be replaced with 17000
17.000 -- should be replaced with 17000
17k -- should be replaced with 17000

How can I accomplish this?
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3120748.html Sent from the Solr - User mailing list archive at Nabble.com.
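Before wiring a pattern like this into a charFilter, it can help to poke at it outside Solr. Here is a sketch in Python's re module (close enough to Java's regex engine for this pattern); the sample strings are from later in this thread.

```python
import re

# the two-capture-group pattern from the schema above
pattern = re.compile(r"([0-9]+k?[.,]?[0-9]*).*?([0-9]+k?[.,]?[0-9]*)")

# two-number strings behave as hoped: group 1 is the first number ($1),
# group 2 the second ($2)
m = pattern.search("£22000 - £25000 per annum + benefits")
print(m.group(1), m.group(2))

# single-number strings are the trap: BOTH groups must match something,
# so the engine backtracks and carves one number into two odd pieces
m2 = pattern.search("17k")
print(m2.groups())
```

Testing a few edge cases like this outside Solr narrows down whether a problem is in the regex itself or in how Solr applies it.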
Re: Using RAMDirectoryFactory in Master/Slave setup
On Wed, 2011-06-29 at 09:35 +0200, eks dev wrote: In MMAP, you need to have really smart warm up (MMAP) to beat IO quirks, for RAMDir you need to tune gc(), choose your poison :) Other alternatives are operating system RAM disks (avoids the GC problem) and using SSDs (nearly the same performance as RAM).
Re: Using RAMDirectoryFactory in Master/Slave setup
Sure, SSD or RAM disks fix these IO problems. Anyhow, I can really see no alternative to some in-memory index for slaves, especially for low-latency master-slave apps (a high commit rate is a problem). Having the possibility to run slaves in memory that are slurping updates from the master seems to me like a preferred method (you need no twiddling with the OS; just CPU and RAM is what you need for your slaves: run a slave and point it to the master).

I assume that update propagation times could be better by having some sexy ReadOnlySlaveRAMDirectorySlurpingUpdatesFromTheMaster that does reload() directly from the master (maybe even uncommitted, somehow NRT-ish). Point being, lower update latency than the current 1-5 minutes (wiki-recommended values) is not going to be possible with the current master-slave solution, due to the nature of it (commit to disk on master, copy delta to slave disk, reload...). This is a lot of ping pong... ES and Solandra are by nature better suited if you need update propagation in the seconds range.

It is just thinking aloud, and slightly off-topic... Solr/Lucene as it is today rocks anyhow.

On Wed, Jun 29, 2011 at 10:55 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: On Wed, 2011-06-29 at 09:35 +0200, eks dev wrote: In MMAP, you need to have really smart warm up (MMAP) to beat IO quirks, for RAMDir you need to tune gc(), choose your poison :) Other alternatives are operating system RAM disks (avoids the GC problem) and using SSDs (nearly the same performance as RAM).
filters effect on search results
Hi, when I query for elegant in Solr I get results for elegance too.

I used these filters for index analysis: WhitespaceTokenizerFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, SynonymFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory, ReversedWildcardFilterFactory

and for query analysis: WhitespaceTokenizerFactory, SynonymFilterFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory

I want to know which filter is affecting my search results.
-
Thanks & Regards, Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/filters-effect-on-search-results-tp3120968p3120968.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
Hi, I have this bunch of lines in my schema.xml that should do a replacement but it doesn't work!

<fieldType name="salary_max_text" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([0-9]+k?[.,]?[0-9]*).*?([0-9]+k?[.,]?[0-9]*)" replacement="$2"/>
  </analyzer>
</fieldType>

charFilter definitions should go above the tokenizer definition, i.e.:

<analyzer>
  <charFilter .../>
  <tokenizer .../>
  <filter .../>
</analyzer>
Re: filters effect on search results
Hi, when I query for elegant in Solr I get results for elegance too. I used these filters for index analysis: WhitespaceTokenizerFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, SynonymFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory, ReversedWildcardFilterFactory; and for query analysis: WhitespaceTokenizerFactory, SynonymFilterFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory. I want to know which filter is affecting my search results.

It is EnglishPorterFilterFactory; you can verify it from the admin/analysis.jsp page.
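Why a stemming filter makes elegant match elegance can be shown with a toy suffix-stripper. This is NOT the real Porter algorithm, just an illustration of the principle that both words get reduced to the same indexed term.

```python
def toy_stem(word):
    """Crude illustration only: strip a few derivational suffixes,
    far simpler than the actual Porter stemmer."""
    for suffix in ("ance", "ant", "ence", "ent"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

# both collapse to the same term, so a query for one finds the other
print(toy_stem("elegant"), toy_stem("elegance"))
```

Since the index only stores the stemmed form, the original distinction between the two words is gone by the time a query arrives.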
Re: Regex replacement not working!
<fieldType name="salary_min_text" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

<fieldType name="salary_max_text" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

This is the final version of my schema part, but what I get is this:

<doc>
  <float name="score">1.0</float>
  <str name="salary">Negotiable</str>
  <str name="salary_max">Negotiable</str>
  <str name="salary_min">Negotiable</str>
</doc>
<doc>
  <float name="score">1.0</float>
  <str name="salary">£7 to £8 per hour</str>
  <str name="salary_max">£7 to £8 per hour</str>
  <str name="salary_min">£7 to £8 per hour</str>
</doc>
<doc>
  <float name="score">1.0</float>
  <str name="salary">£125 to £150 per day</str>
  <str name="salary_max">£125 to £150 per day</str>
  <str name="salary_min">£125 to £150 per day</str>
</doc>

which is not what I'm expecting... The regular expression works in http://www.fileformat.info/tool/regex.htm without any problem.
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121055.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Fuzzy Query Param
Which version of Solr (Lucene) are you using? Recent versions of Lucene accept ~N with N > 1 to be an edit distance, i.e. foobar~2 matches any term that's <= 2 edits away from foobar.

Mike McCandless http://blog.mikemccandless.com

On Tue, Jun 28, 2011 at 11:00 PM, entdeveloper cameron.develo...@gmail.com wrote: According to the docs on Lucene query syntax: Starting with Lucene 1.9 an additional (optional) parameter can specify the required similarity. The value is between 0 and 1; with a value closer to 1, only terms with a higher similarity will be matched. I was messing around with this and started doing queries with values greater than 1, and it seemed to be doing something. However, I haven't been able to find any documentation on this. What happens when specifying a fuzzy query with a value > 1? tiger~2 animal~3
--
View this message in context: http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3120235.html Sent from the Solr - User mailing list archive at Nabble.com.
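The ~N edit-distance semantics can be sanity-checked against a plain Levenshtein implementation. This is only a sketch of the distance itself; Lucene's FuzzyQuery uses much faster automata internally, and its exact treatment of transpositions varies by version.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance:
    minimum number of single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

# foobar~2 would match any term within distance 2 of "foobar":
print(edit_distance("foobar", "fuobar"))  # one substitution
print(edit_distance("foobar", "fxxbar"))  # two substitutions
```

So under the newer syntax, tiger~2 matches terms reachable from "tiger" in at most two edits, rather than interpreting 2 as a similarity score.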
How to disable Phonetic search
I am using Solr 1.4. When I search for the keyword ansys I get a lot of posts, but when I search for ansys NOT ansi I get nothing. I guess it's because of phonetic search: ansys is converted into ansi (which is the NOT keyword) and nothing returns. How do I handle this kind of problem?
--
Thanks and Regards, Mohammad Shariq
Re: Regex replacement not working!
<fieldType name="salary_min_text" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

<fieldType name="salary_max_text" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*" replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

This is the final version of my schema part, but what I get is this:

<doc>
  <float name="score">1.0</float>
  <str name="salary">Negotiable</str>
  <str name="salary_max">Negotiable</str>
  <str name="salary_min">Negotiable</str>
</doc>
<doc>
  <float name="score">1.0</float>
  <str name="salary">£7 to £8 per hour</str>
  <str name="salary_max">£7 to £8 per hour</str>
  <str name="salary_min">£7 to £8 per hour</str>
</doc>
<doc>
  <float name="score">1.0</float>
  <str name="salary">£125 to £150 per day</str>
  <str name="salary_max">£125 to £150 per day</str>
  <str name="salary_min">£125 to £150 per day</str>
</doc>

which is not what I'm expecting...
The regular expression works in http://www.fileformat.info/tool/regex.htm without any problem.

I am not good with regular expressions, but the response always contains the untouched/un-analyzed (stored) version of the fields. You can visually test your fieldType/regex on the admin/analysis.jsp page; it shows the indexed terms step by step.
Re: How to disable Phonetic search
I am using Solr 1.4. When I search for the keyword ansys I get a lot of posts, but when I search for ansys NOT ansi I get nothing. I guess it's because of phonetic search: ansys is converted into ansi (which is the NOT keyword) and nothing returns. How do I handle this kind of problem?

Find and remove occurrences of solr.PhoneticFilterFactory from your schema.xml file.
Re: conditionally update document on unique id
On Wed, Jun 29, 2011 at 2:01 AM, eks dev eks...@yahoo.co.uk wrote: Quick question: is there a way with Solr to conditionally update a document on unique id? Meaning the default add behavior if the id is not already in the index, and not touching the index if it is already there. Deletes are not important (no sync issues). I am asking because I noticed that with deduplication turned on, index files get modified even if I update the same documents again (same signatures). I am facing a very high dupe rate (40-50%), and the setup is going to be master-slave with a high commit rate (the requirement is to reduce propagation latency for updates). Having unnecessary index modifications is going to waste effort shipping the same information again and again. If there is no standard way, what would be the fastest way to check whether a Term exists in the index from an UpdateRequestProcessor?

I'd suggest that you use the searcher's getDocSet with a TermQuery. Use SolrQueryRequest#getSearcher so you don't need to worry about ref counting, e.g.:

req.getSearcher().getDocSet(new TermQuery(new Term(signatureField, sigString))).size();

I intend to extend SignatureUpdateProcessor to prevent a document from propagating down the chain if this happens. Would that be a way to deal with it? I repeat, there are no deletes to cause headaches with synchronization.

Yes, that should be fine.
--
Regards, Shalin Shekhar Mangar.
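The conditional-add-by-signature idea can be sketched outside Solr. This is a toy model, not Solr's actual API: a dict lookup stands in for the TermQuery against the signature field, and md5-of-content stands in for the SignatureUpdateProcessor signature.

```python
import hashlib

index = {}  # signature -> document; stands in for the Lucene index

def signature(doc):
    # toy stand-in for a dedup signature over the document content
    return hashlib.md5(doc["text"].encode("utf-8")).hexdigest()

def conditional_add(doc):
    """Add the document only if its signature is not already indexed."""
    sig = signature(doc)
    if sig in index:
        return False          # duplicate: leave the index untouched
    index[sig] = doc
    return True

print(conditional_add({"text": "hello world"}))  # first add
print(conditional_add({"text": "hello world"}))  # dupe, skipped
```

The point of skipping before the add reaches the index is exactly what the thread is after: identical re-submissions never dirty the segments, so there is nothing new to replicate.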
Re: filters effect on search results
Indeed, I find the Porter stemmer to be too 'aggressive' for my taste. I prefer the EnglishMinimalStemFilterFactory, with the caveat that it depends on your data set.

Cheers, François

On Jun 29, 2011, at 6:21 AM, Ahmet Arslan wrote: Hi, when I query for elegant in Solr I get results for elegance too. I used these filters for index analysis: WhitespaceTokenizerFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, SynonymFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory, ReversedWildcardFilterFactory; and for query analysis: WhitespaceTokenizerFactory, SynonymFilterFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, EnglishPorterFilterFactory, RemoveDuplicatesTokenFilterFactory. I want to know which filter is affecting my search results. It is EnglishPorterFilterFactory; you can verify it from the admin/analysis.jsp page.
Encoding problem while indexing
I am working on indexing Arabic documents containing Arabic diacritics and dotless characters (old Arabic characters). I am using the Apache Tomcat server, and I am using my modified version of the AraMorph analyzer as the Arabic analyzer. I managed, in the development environment, to normalize the Arabic diacritics and dotless characters (same concept as in solr.ArabicNormalizationFilterFactory), and I can verify that the analyzer is working fine and that I get the correct stem for Arabic words. The input text file for testing has UTF-8 encoding.

When I build the AraMorph jar file and place it under the Solr lib, the diacritics and the dotless characters split the word. I made sure that server.xml contains URIEncoding="UTF-8". I also made sure that the text being sent to Solr using SolrJ is UTF-8 encoded, for example:

solr.addBean(new Doc(4, new String("حِباًَ".getBytes("UTF8"))));

but nothing is working. I tried the analyze link on the Solr admin for both indexing and querying, and both show that the Arabic word is split if a diacritic or dotless character is found. Do you have any idea what the problem might be?

Schema snippet:

<fieldType name="text" class="solr.TextField">
  <analyzer type="index" class="gpl.pierrick.brihaye.aramorph.lucene.ArabicNormalizeStemmer"/>
  <analyzer type="query" class="gpl.pierrick.brihaye.aramorph.lucene.ArabicNormalizeStemmer"/>
</fieldType>

I also added the following parameter to the JVM: -Dfile.encoding=UTF-8

Thanks, engy
Re: Regex replacement not working!
Index Analyzer:

org.apache.solr.analysis.KeywordTokenizerFactory {luceneMatchVersion=LUCENE_31}
  position: 1
  term text: £22000 - £25000 per annum + benefits
  startOffset: 0
  endOffset: 36

org.apache.solr.analysis.PatternReplaceFilterFactory {replacement=$2, pattern=[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*, luceneMatchVersion=LUCENE_31}
  position: 1
  term text: 25000
  startOffset: 0
  endOffset: 36

This is my output for the field salary_max; it seems to be working from the admin jsp interface.
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121353.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
Index Analyzer:

org.apache.solr.analysis.KeywordTokenizerFactory {luceneMatchVersion=LUCENE_31}
  position: 1
  term text: £22000 - £25000 per annum + benefits
  startOffset: 0
  endOffset: 36

org.apache.solr.analysis.PatternReplaceFilterFactory {replacement=$2, pattern=[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*, luceneMatchVersion=LUCENE_31}
  position: 1
  term text: 25000
  startOffset: 0
  endOffset: 36

This is my output for the field salary_max; it seems to be working from the admin jsp interface.

That's good to know. If you explain your final goal in detail, users can give better pointers.
Re: Regex replacement not working!
I have the string "You may earn 25k dollars per week" stored in the field salary. I'm using 2 copyFields, salary_min and salary_max, with source salary, and these 2 data types: salary is text, salary_min is salary_min_text, salary_max is salary_max_text. So I was expecting this: Solr updates its index; Solr copies the value from salary to salary_min and applies the regex to the value; Solr copies the value from salary to salary_max and applies the regex to the value. But it's not working: it copies the value from one field to another, but the filter isn't applied, even though it's working as you could see.
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121386.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
I have the string "You may earn 25k dollars per week" stored in the field salary. I'm using 2 copyFields, salary_min and salary_max, with source salary, and these 2 data types: salary is text, salary_min is salary_min_text, salary_max is salary_max_text. So I was expecting this: Solr updates its index; Solr copies the value from salary to salary_min and applies the regex to the value; Solr copies the value from salary to salary_max and applies the regex to the value. But it's not working: it copies the value from one field to another, but the filter isn't applied, even though it's working as you could see.

Okay, that makes sense. copyField just copies the content; it has nothing to do with analyzers. Two solutions come to my mind:

1) If you are using the data import handler, I think (I am not good with regex) you can use the regex transformer to populate these two fields: http://wiki.apache.org/solr/DataImportHandler#RegexTransformer

2) If not, you can populate these two fields in a custom UpdateRequestProcessor. There is an example to modify and start from here: http://wiki.apache.org/solr/UpdateRequestProcessor
what is solr clustering component
I just went through the Solr wiki page for clustering, but I am not getting what the benefit of using clustering is. Can anyone tell me what clustering actually is and what its use in indexing and searching is? Does it affect search results? Please reply.
-
Thanks & Regards, Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/what-is-solr-clustering-component-tp3121484p3121484.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
OK, but I'm not applying the filtering on the copyFields. This is how my schema looks:

<field name="salary" type="text" indexed="true" stored="true" />
<field name="salary_min" type="salary_min_text" indexed="true" stored="true" />
<field name="salary_max" type="salary_max_text" indexed="true" stored="true" />

<copyField source="salary" dest="salary_min" />
<copyField source="salary" dest="salary_max" />

and the two data types defined before. That's why I thought I could first use copyField to copy the value, then index them with my two data types' filtering...
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121497.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: filters effect on search results
The admin/analysis.jsp page shows RemoveDuplicatesTokenFilterFactory, ReversedWildcardFilterFactory, EnglishPorterFilterFactory.
-
Thanks & Regards, Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/filters-effect-on-search-results-tp3120968p3121506.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to disable Phonetic search
I was using SnowballPorterFilterFactory for stemming, and that stemmer was stemming the words. I added the keyword ansys to the protwords.txt file. Now the stemming is not happening for ansys, and it's OK now.

On 29 June 2011 17:12, Ahmet Arslan iori...@yahoo.com wrote: I am using Solr 1.4. When I search for the keyword ansys I get a lot of posts, but when I search for ansys NOT ansi I get nothing. I guess it's because of phonetic search: ansys is converted into ansi (which is the NOT keyword) and nothing returns. How do I handle this kind of problem? Find and remove occurrences of solr.PhoneticFilterFactory from your schema.xml file.
--
Thanks and Regards, Mohammad Shariq
Re: Regex replacement not working!
Hi Samuele,

It's not clear to me whether your goal is to search on that field (for example, salary_min:[100 TO 200]) or to show the transformed field to the user (i.e. you want the result of the regex replacement to be included in the search results). If your goal is to show the results to the user, then (as Ahmet said in a previous mail) it won't work, because the content of the documents is stored verbatim; the analysis only affects the way documents are searched. If your goal is to search, could you please show us the query that you're using to test the use case?

Thanks!
Juan

On Wed, Jun 29, 2011 at 10:02 AM, samuele.mattiuzzo samum...@gmail.com wrote: OK, but I'm not applying the filtering on the copyFields. This is how my schema looks:

<field name="salary" type="text" indexed="true" stored="true" />
<field name="salary_min" type="salary_min_text" indexed="true" stored="true" />
<field name="salary_max" type="salary_max_text" indexed="true" stored="true" />

<copyField source="salary" dest="salary_min" />
<copyField source="salary" dest="salary_max" />

and the two data types defined before. That's why I thought I could first use copyField to copy the value, then index them with my two data types' filtering...
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121497.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
On 29.06.2011 12:30, samuele.mattiuzzo wrote:
<fieldType name="salary_min_text" class="solr.TextField"> <analyzer type="index"> ... this is the final version of my schema part, but what I get is this: <doc> <float name="score">1.0</float> <str name="salary">Negotiable</str> <str name="salary_max">Negotiable</str> <str name="salary_min">Negotiable</str> </doc> ...

The mistake is that you assume the filter is applied to the stored result. This is not true. Index filters only affect the index (as the name says), not the stored contents. Therefore, if you have copyFields that are stored, they'll always return the same value as the original field. Try inspecting your index data with Luke or the admin console; then you'll see whether your regex applies.

Greetings, Kuli
Re: Regex replacement not working!
My goal is/was storing the transformed value in the field, and I get that I have to create my own update handler. I was trying to query with salary_min:[100 TO 200] and it's actually working... Since I just need it for searching, I'll stay with this solution.

Is the [100 TO 200] range a performance killer? I remember reading something about that somewhere, but cannot find it again...
--
View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121625.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr - search queries not returning results
Thanks to both of you, I understand now and am getting the expected results. Cheers!

On Wed, Jun 29, 2011 at 2:21 AM, Ahmet Arslan iori...@yahoo.com wrote: I believe I am missing something very elementary. The following query returns zero hits: http://localhost:8983/solr/core0/select/?q=testabc With this URL, you are hitting the request handler defined with default="true" in your core0/conf/solrconfig.xml. However, using solritas, it finds many results: http://localhost:8983/solr/core0/itas?q=testabc With this one, you are hitting the one registered as <requestHandler name="/itas">. Do you have any idea what the issue may be? Probably they have different default parameters configured, for example (e)dismax versus the lucene query parser. The lucene query parser searches for testabc in your default field; dismax searches it in all of the fields defined in the qf parameter. You can see the full parameter list by appending &echoParams=all to your search URL.
Re: Regex replacement not working!
My goal is/was storing the transformed value in the field, and I get that I have to create my own update handler. I was trying to query with salary_min:[100 TO 200] and it's actually working... Since I just need it for searching, I'll stay with this solution. Is the [100 TO 200] range a performance killer? I remember reading something about that somewhere, but cannot find it again...

Please be aware that your range query is working on strings, so it will return unwanted results: string sorting and integer sorting are different. If you are after range queries, you need to define the salary_min and salary_max fields as trie-based types (tint, tdouble, etc.) and populate them with the update processor or on the client side.
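The client-side option can be sketched in Python: extract numeric (min, max) values from the free-text salary before sending the document to Solr, so the trie-typed fields receive real integers. The helper name, field names, and format assumptions here are mine, not from the thread.

```python
import re

def parse_salary_range(text):
    """Pull (min, max) integers out of a free-text salary string.
    Handles 17000 / 17,000 / 17.000 / 17k; returns None when the
    string has no numbers at all (e.g. 'Negotiable')."""
    values = []
    for tok in re.findall(r"\d+(?:[.,]\d{3})?k?", text.lower()):
        if tok.endswith("k"):
            values.append(int(float(tok[:-1]) * 1000))
        else:
            # treat '.' and ',' as thousands separators
            values.append(int(tok.replace(",", "").replace(".", "")))
    if not values:
        return None
    return min(values), max(values)

# populate the numeric fields before indexing the document
doc = {"salary": "£125 to £150 per day"}
rng = parse_salary_range(doc["salary"])
if rng:
    doc["salary_min"], doc["salary_max"] = rng
print(doc)
```

Documents with no parseable salary simply omit the numeric fields, so a range query like salary_min:[100 TO 200] skips them instead of matching "Negotiable" lexicographically.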
Re: what is solr clustering component
I just went through the Solr wiki page for clustering, but I am not getting what the benefit of using clustering is. Can anyone tell me what clustering actually is and what its use in indexing and searching is? Does it affect search results? Please reply.

It is for search result clustering. Try the demo with the query word jaguar: http://search.carrot2.org/stable/search It generates clusters and labels (on the left).
Re: Regex replacement not working!
OK, last question on the UpdateProcessor: can you please give me the steps to implement my own? I mean, I can put my custom processor in Solr's code, and then what? I don't understand how I have to change solrconfig.xml and how I can bind that to the updater I just wrote, and also I don't understand how I have to change schema.xml. I'm sorry for this question, but I started working on Solr 5 days ago and for some things I really need a lot of documentation, and this isn't fully covered anywhere -- View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121743.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
OK, last question on the UpdateProcessor: can you please give me the steps to implement my own? I mean, I can put my custom processor in Solr's code, and then what? I don't understand how I have to change solrconfig.xml and how I can bind that to the updater I just wrote, and also I don't understand how I have to change schema.xml. I'm sorry for this question, but I started working on Solr 5 days ago and for some things I really need a lot of documentation, and this isn't fully covered anywhere. Implementing a conditional copyField example is a good place to start; you can use it as a template. You don't need to modify the Solr source code for this: write your class, compile it, and put the resulting jar into the solrHome/lib directory. How to register your new update processor in solrconfig.xml is explained here: http://wiki.apache.org/solr/SolrPlugins#UpdateRequestProcessorFactory
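Concretely, the registration the wiki describes amounts to a solrconfig.xml fragment along these lines; the chain name and the com.example factory class are hypothetical placeholders for your own processor, and depending on version the parameter selecting the chain is update.chain (3.x) or update.processor (1.4):

```xml
<!-- Custom chain: your processor runs first, then the stock
     Log/Run processors actually apply the update. -->
<updateRequestProcessorChain name="mychain">
  <processor class="com.example.ConditionalCopyProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Point the update handler at the chain by default -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">mychain</str>
  </lst>
</requestHandler>
```

No schema.xml change is required for the processor itself; the schema only needs to define whatever fields your processor writes to.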
Re: Regex replacement not working!
I have had the same problems with regex, and I went with the regular pattern replace filter rather than the charfilter. Only when I added it to the very end of the chain would it work... I am on Solr 3.2. I have also noticed that the HTML strip filter factory is not working either: when I dump the field that it's supposed to be working on, all the hyperlinks and everything that you would expect to be stripped are still present. Adam On Wed, Jun 29, 2011 at 10:04 AM, samuele.mattiuzzo samum...@gmail.com wrote: ok, last question on the UpdateProcessor: can you please give me the steps to implement my own? i mean, i can push my custom processor in solr's code, and then what? i don't understand how i have to change the solrconf.xml and how can i bind that to the updater i just wrote and also i don't understand how i do have to change the schema.xml i'm sorry for this question, but i started working on solr 5 days ago and for some things i really need a lot of documentation, and this isn't fully covered anywhere -- View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121743.html Sent from the Solr - User mailing list archive at Nabble.com.
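For comparison, this is the shape of an analyzer chain with the pattern-replace token filter placed last, as described above (the type name and regex are placeholders). One caveat worth noting, as an observation rather than something confirmed in this thread: analysis only changes the indexed tokens, never the stored value, so dumping a stored field will always show the raw input with hyperlinks intact even when HTML stripping is working at index time.

```xml
<fieldType name="text_clean" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- char filters run on the raw input, before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- token-filter version of pattern replace, at the end of the chain -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^a-z0-9]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

The analysis page of the admin UI shows the token stream after each stage, which is a quicker way to verify the filter than dumping the stored field.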
Re: Regex replacement not working!
Too bad it is still in TODO; that's why I was asking for some tips on writing, compiling, registration, calling... -- View this message in context: http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121856.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex replacement not working!
Too bad it is still in TODO; that's why I was asking for some tips on writing, compiling, registration, calling... Here is general information about how to customize Solr via plugins: http://wiki.apache.org/solr/SolrPlugins Here is the registration and a code example: http://wiki.apache.org/solr/UpdateRequestProcessor
Solr 3.2 filter cache warming taking longer than 1.4.1
I have noticed a significant difference in filter cache warming times on my shards between 3.2 and 1.4.1. What can I do to troubleshoot this? Please let me know what additional information you might need to look deeper; I know this isn't enough. It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15 seconds to do an autowarm count of 4 on 3.2. The only explicit warming query is *:*, sorted descending by post_date, a tlong field containing a UNIX timestamp, precisionStep 16. The indexes are not entirely identical, but the new one did evolve from the old one. Perhaps one of the experts might spot something that makes for much slower filter cache warming, or some way to look deeper if this seems wrong? Is there a way to see the search URL bits that populated the cache? Index differences: The new index has four extra small fields, is no longer removing stopwords, and has omitTermFreqAndPositions enabled on a significant number of fields. Most of the fields are tokenized text, and now more than half of those don't have tf and tp enabled. Naturally, the largest text field, where most of the matches happen, still does have them enabled. To increase reindex speed, the new index has a termIndexInterval of 1024; the old one is at the default of 128. In terms of raw size, the new index is less than one percent larger than the old one. The old shards average out to 17.22GB, the new ones to 17.41GB. Here's an overview of the differences for each type of file (comparing the huge optimized segment only, not the handful of tiny ones since), on the index with the largest size gap, old value listed first:
fdt: 6317180127 / 6055634923 (4.1% decrease)
fdx: 76447972 / 75647412 (1% decrease)
fnm: 382 / 338 (44 bytes! woohoo!)
frq: 2828400926 / 2873249038 (1.5% increase)
nrm: 28367782 / 38223988 (35% increase)
prx: 2449154203 / 2684249069 (9.5% increase)
tii: 1686298 / 13329832 (790% increase)
tis: 923045932 / 999294109 (8% increase)
tvd: 18910972 / 19111840 (1% increase)
tvf: 5867309063 / 5640332282 (3.9% decrease)
tvx: 151294820 / 152895940 (1% increase)
The tii and nrm files are the only ones that saw a significant size increase, but the tii file is MUCH bigger. Thanks, Shawn
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2 and then run some of the queries to see if you can figure out which are slower? Do any of the queries have stopwords in fields where you now index those? If so, that could entirely account for the difference. -Yonik http://www.lucidimagination.com On Wed, Jun 29, 2011 at 10:59 AM, Shawn Heisey s...@elyograg.org wrote: I have noticed a significant difference in filter cache warming times on my shards between 3.2 and 1.4.1. What can I do to troubleshoot this? Please let me know what additional information you might need to look deeper. I know this isn't enough. It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15 seconds to do an autowarm count of 4 on 3.2. The only explicit warming query is *:*, sorted descending by post_date, a tlong field containing a UNIX timestamp, precisionStep 16. The indexes are not entirely identical, but the new one did evolve from the old one. Perhaps one of the experts might spot something that makes for much slower filter cache warming, or some way to look deeper if this seems wrong? Is there a way to see the search URL bits that populated the cache? Index differences: The new index has four extra small fields, is no longer removing stopwords, and has omitTermFreqAndPositions enabled on a significant number of fields. Most of the fields are tokenized text, and now more than half of those don't have tf and tp enabled. Naturally the largest text field where most of the matches happen still does have them enabled. To increase reindex speed, the new index has a termIndexInterval of 1024, the old one is at the default of 128. In terms of raw size, the new index is less than one percent larger than the old one. The old shards average out to 17.22GB, the new ones to 17.41GB. 
Here's an overview of the differences of each type of file (comparing the huge optimized segment only, not the handful of tiny ones since) on one the index with the largest size gap, old value listed first: fdt: 6317180127/6055634923 (4.1% decrease) fdx: 76447972/75647412 (1% decrease) fnm: 382, 338 (44 bytes! woohoo!) frq: 2828400926/2873249038 (1.5% increase) nrm: 28367782/38223988 (35% increase) prx: 2449154203/2684249069 (9.5% increase) tii: 1686298/13329832 (790% increase) tis: 923045932/999294109 (8% increase) tvd: 18910972/19111840 (1% increase) tvf: 5867309063/5640332282 (3.9% decrease) tvx: 151294820/152895940 (1% increase) The tii and nrm files are the only ones that saw a significant size increase, but the tii file is MUCH bigger. Thanks, Shawn
Solr just 'hangs' under load test - ideas?
Hi, all. I'm hoping someone has some thoughts here. We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the getLuceneVersion() calls, but use luceneMatchVersion directly). We're running in a Tomcat instance, 64-bit Java. CATALINA_OPTS are: -Xmx7168m -Xms7168m -XX:MaxPermSize=256M We're running 2 Solr cores with the same schema. We use SolrJ to run our searches from a Java app running in JBoss. JBoss, Tomcat, and the Solr index folders are all on the same server. In case it's relevant, we're using JMeter as a load test harness. We're running on Solaris, a 16-processor box with 48GB physical memory. I've run a successful load test at a 100-user load (at that rate there are about 5-10 Solr searches/second), and Solr search responses were coming in under 100ms. When I tried to ramp up, as far as I can tell, Solr is just hanging. (We have some logging statements around the SolrJ calls - just before, we log how long our query construction takes, then we run the SolrJ query and log the search times. We're getting a number of the query construction logs, but no corresponding search time logs.) Symptoms: The Tomcat and JBoss processes show as well under 1% CPU, and they are still the top processes. CPU states show around 99% idle. RES usage for the two Java processes is around 3GB each. LWP under 120 for each. STATE just shows as sleep. JBoss is still 'alive', as I can get into a piece of software that talks to our JBoss app to get data. We set things up to use log4j logging for Solr - the log isn't showing any errors or exceptions. We're not indexing - just searching. Back in January, we did load testing on a prototype, and had no problems (though that was Solr 1.4 at the time). It ramped up beautifully - bottlenecks were our apps, not Solr. What I'm benchmarking now is a descendant of that prototyping - a bit more complex on searches and more fields in the schema, but same basic search logic as far as SolrJ usage. Any ideas? What else to look at?
Ringing any bells? I can send more details if anyone wants specifics... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.comhttp://www.sirsidynix.com/
[Announce] Solr 3.2 with RankingAlgorithm NRT capability, very high performance 1428 tps
Hi! I would like to announce that Solr 3.2 with RankingAlgorithm now has Near Real Time (NRT) capability. The NRT performance is very high: 1428 documents/sec [MBArtists 390k index]. The NRT functionality allows you to add documents without the IndexSearchers being closed or caches being cleared. A commit is not needed with the document update. Searches can run concurrently with document updates. No changes are needed except for enabling NRT through solrconfig.xml. A new visible attribute has been introduced that allows one to tune the visibility of a document added to the index. The default is 150ms. This can be set to 0, enabling documents to become visible for searches as soon as they are added. The visible attribute is added as below: <realtime visible="150">true</realtime> With the visible attribute at 200ms, the performance is about 1428 TPS (document adds) on a dual-core Intel system with a 2GB heap, with searches in parallel. I have a wiki page that describes NRT performance in detail; it can be accessed here: http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver3.2 You can download Solr 3.2 with RankingAlgorithm (NRT version) from here: http://solr-ra.tgels.com I would like to invite you to give this version a try, as the performance is very high, comparable to the default load. Regards, - Nagendra Nagarajayya http://solr-ra.tgels.com http://rankingalgorithm.tgels.com
Field Value Highlighting
Hi, I need help in figuring out the right configuration to perform highlighting in Solr. I can retrieve the matching documents plus the highlighted matches. I've used another tool called dtSearch, which would return the offset positions of the field value to highlight. I've tried a few different configurations, but it appears that Solr returns the actual matched documents plus a section called highlighting, with snippets (which can be configured to have a length of 'X'). I was wondering if there is a way to retrieve just the actual documents with highlighted values, or a way to retrieve the offset positions of the field values so that I can perform the highlighting myself. I am using the SolrNet client to integrate with Solr. I've also tweaked the configs and used the web admin interface to test highlighting, but have not yet been successful. Thank you in advance. Z
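One possibility worth trying (a suggestion on my part, not something confirmed in this thread): the standard highlighter can return the whole field value as a single highlighted fragment when the fragment size is set to zero, which avoids reconstructing offsets client-side:

```
http://localhost:8983/solr/select?q=engine&hl=true&hl.fl=text&hl.fragsize=0&hl.snippets=1
```

With hl.fragsize=0, the snippet in the highlighting section is the entire field value with the match markup embedded, so the highlighted text can be displayed directly in place of the stored field.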
Re: Solr just 'hangs' under load test - ideas?
Can you get a thread dump to see what is hanging? -Yonik http://www.lucidimagination.com On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford bob.sandif...@sirsidynix.com wrote: Hi, all. I'm hoping someone has some thoughts here. We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the getLuceneVersion() calls, but use luceneMatchVersion directly). We're running in a Tomcat instance, 64 bit Java. CATALINA_OPTS are: -Xmx7168m -Xms7168m -XX:MaxPermSize=256M We're running 2 Solr cores, with the same schema. We use SolrJ to run our searches from a Java app running in JBoss. JBoss, Tomcat, and the Solr Index folders are all on the same server. In case it's relevant, we're using JMeter as a load test harness. We're running on Solaris, a 16 processor box with 48GB physical memory. I've run a successful load test at a 100 user load (at that rate there are about 5-10 solr searches / second), and solr search responses were coming in under 100ms. When I tried to ramp up, as far as I can tell, Solr is just hanging. (We have some logging statements around the SolrJ calls - just before, we log how long our query construction takes, then we run the SolrJ query and log the search times. We're getting a number of the query construction logs, but no corresponding search time logs). Symptoms: The Tomcat and JBoss processes show as well under 1% CPU, and they are still the top processes. CPU states show around 99% idle. RES usage for the two Java processes around 3GB each. LWP under 120 for each. STATE just shows as sleep. JBoss is still 'alive', as I can get into a piece of software that talks to our JBoss app to get data. We set things up to use log4j logging for Solr - the log isn't showing any errors or exceptions. We're not indexing - just searching. Back in January, we did load testing on a prototype, and had no problems (though that was Solr 1.4 at the time). It ramped up beautifully - bottle necks were our apps, not Solr. 
What I'm benchmarking now is a descendent of that prototyping - a bit more complex on searches and more fields in the schema, but same basic search logic as far as SolrJ usage. Any ideas? What else to look at? Ringing any bells? I can send more details if anyone wants specifics... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.comhttp://www.sirsidynix.com/
CopyField into another CopyField?
In solr, is it possible to 'chain' copyfields so that you can copy the value of one into another? Example: <field name="title" ... /> <field name="author" ... /> <field name="name" ... /> <field name="autocomplete" ... /> <field name="ac_spellcheck" ... /> <copyField source="title" dest="autocomplete" /> <copyField source="author" dest="autocomplete" /> <copyField source="name" dest="autocomplete" /> <copyField source="autocomplete" dest="ac_spellcheck" /> Point being, every time I add a new field to the autocomplete, I want it to automatically also be added to ac_spellcheck without having to do it twice. -- View this message in context: http://lucene.472066.n3.nabble.com/CopyField-into-another-CopyField-tp3122408p3122408.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Fuzzy Query Param
I'm using Solr trunk. If it's Levenshtein/edit distance, that's great; that's what I want. It just didn't seem to be officially documented anywhere, so I wanted to find out for sure. Thanks for confirming. -- View this message in context: http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3122418.html Sent from the Solr - User mailing list archive at Nabble.com.
Sorting by value of field
Hi, Say I have a field named type in multiple documents, which can be either type:bike, type:boat, type:car, or type:van, and I want to order a search to give me documents in the following order: type:car, type:van, type:boat, type:bike. Is there a way I can do this just using the sort parameter? Thanks
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 6/29/2011 9:17 AM, Yonik Seeley wrote: Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2 and then run some of the queries to see if you can figure out which are slower? Do any of the queries have stopwords in fields where you now index those? If so, that could entirely account for the difference. The query cache warms very quickly, it's the filter cache that's taking forever. I'm not intimately familiar with what is being put in our filter queries by our webapp, but I'd be a little surprised if there are stopwords there. A quick grep through solr logs (when I've turned it up to INFO) for the really common ones didn't reveal any. People do type them in fairly frequently, but they go into q= ... fq values are constructed internally, not from what a user types, and as far as I know, they involve fields that have never had stopwords removed. I will do some experimentation with your suggestions. Thanks, Shawn
RE: Sorting by value of field
You could try adding a new int field (like typeSort) that has the desired sort values. So when adding a document with type:car, also add typeSort:1; when adding type:van, also add typeSort:2; etc. Then you could do sort=typeSort asc to get them in your desired order. I think this is also possible with custom function queries, but I've never done that. -Michael
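A sketch of the indexing side of this workaround, in standard Solr XML update format (the car=1, van=2, boat=3, bike=4 mapping follows the desired order from the question):

```xml
<add>
  <doc>
    <field name="type">car</field>
    <field name="typeSort">1</field>
  </doc>
  <doc>
    <field name="type">van</field>
    <field name="typeSort">2</field>
  </doc>
  <doc>
    <field name="type">boat</field>
    <field name="typeSort">3</field>
  </doc>
  <doc>
    <field name="type">bike</field>
    <field name="typeSort">4</field>
  </doc>
</add>
```

Querying with sort=typeSort asc then returns the documents in car, van, boat, bike order.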
Methods for preserving text entities?
We have some text entities in fields to index (and search) like so: <field name="text">Solr is a really &myword; search engine!</field> I would like to preserve/protect &myword; and not resolve it in the indexing or search results. What sort of methods have people used? I realize the results are returned in XML format, so preserving these text entities may be hard. Are people replacing the character or doing something else? Thanks in advance!
Re: Default schema - 'keywords' not multivalued
On 06/28/2011 12:04 PM, Chris Hostetter wrote: : I'm streaming over the document content (presumably via tika) and its : gathering the document's metadata which includes the keywords metadata field. : Since I'm also passing that field from the DB to the REST call as a list (as : you suggested) there is a collision because the keywords field is single : valued. : : I can change this behavior using a copy field. What I wanted to know is if : there was a specific reason the default schema defined a field like keywords : single valued so I could make sure I wasn't missing something before I changed : things. That file is just an example; you're absolutely free to change it to meet your use case. I'm not very familiar with Tika, but based on the comment in the example config... <!-- Common metadata fields, named specifically to match up with SolrCell metadata when parsing rich documents such as Word, PDF. Some fields are multiValued only because Tika currently may return multiple values for them. --> ...I suspect it was intentional that that field is *not* multiValued (I guess Tika always returns a single delimited value?), but if you have multiple discrete values you want to send for your DB-backed data, there is no downside to changing that. : While I'm at it, I'd REALLY like to know how to use DIH to index the metadata : from the database while simultaneously streaming over the document content and : indexing it. I've never quite figured it out yet but I have to believe it is : a possibility. There's a TikaEntityProcessor that can be used to have Tika crunch the data that comes from an entity and extract out specific fields, and it can be used in combination with a JdbcDataSource and a BinFileDataSource so that a field in your db data specifies the name of a file on disk to use as the TikaEntity -- but I've personally never tried it. Here's a simple example someone posted last year that they got working...
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html -Hoss Thanks Hoss, I'll just change the schema then. The problem with TikaEntityProcessor is this installation is still running v1.4.1 so I'll need to upgrade. Any short and sweet instructions for upgrading to 3.2? I have a pretty straight forward Tomcat install, would just dropping in the new war suffice? - Tod
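The JdbcDataSource + BinFileDataSource + TikaEntityProcessor combination Hoss describes would look roughly like this in DIH's data-config.xml. The driver, connection URL, table, column names, and target field names below are all assumptions for illustration, not values from the thread:

```xml
<dataConfig>
  <!-- DB rows carry the metadata plus a path to the file on disk -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/docs" user="reader"/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity name="meta" dataSource="db"
            query="SELECT id, title, filepath FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <!-- Tika parses the file named by the outer entity's row -->
      <entity name="file" processor="TikaEntityProcessor"
              dataSource="bin" url="${meta.filepath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The nested entity runs once per database row, so each indexed document ends up with the DB metadata and the Tika-extracted body together.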
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 6/29/2011 11:27 AM, Shawn Heisey wrote: On 6/29/2011 9:17 AM, Yonik Seeley wrote: Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2 and then run some of the queries to see if you can figure out which are slower? Do any of the queries have stopwords in fields where you now index those? If so, that could entirely account for the difference. The query cache warms very quickly, it's the filter cache that's taking forever. I'm not intimately familiar with what is being put in our filter queries by our webapp, but I'd be a little surprised if there are stopwords there. A quick grep through solr logs (when I've turned it up to INFO) for the really common ones didn't reveal any. People do type them in fairly frequently, but they go into q= ... fq values are constructed internally, not from what a user types, and as far as I know, they involve fields that have never had stopwords removed. I should add that this happens only after the index has had at least a few hundred queries, when deletes are committed. The delete process runs every ten minutes, and checks for document presence before issuing the delete, which avoids unnecessary commits. Just now, three of the six shards had documents deleted, and they took 29.07, 27.57, and 28.66 seconds to warm. The 1.4.1 counterpart to the 29.07 second one only took 4.78 seconds, and it did twice as many autowarm queries. I know it's not my single *:* sorted warming query (firstSearcher and newSearcher), because on solr startup with either version, warm time is 0.01 seconds. I have useColdSearcher set to false. Thanks, Shawn
Re: Methods for preserving text entities?
Ah, I think I suddenly answered my own question, but I'd appreciate further insight if you have it. I converted the & in &myword; to an &amp; so it looks like this: <field name="text">Solr is a really &amp;myword; search engine!</field> On Wed, Jun 29, 2011 at 12:40 PM, Walter Closenfleight walter.p.closenflei...@gmail.com wrote: We have some text entities in fields to index (and search) like so: <field name="text">Solr is a really &myword; search engine!</field> I would like to preserve/protect &myword; and not resolve it in the indexing or search results. What sort of methods have people used? I realize the results are returned in XML format, so preserving these text entities may be hard. Are people replacing the character or doing something else? Thanks in advance!
Re: How to Create a weighted function (dismax or otherwise)
Are there any best practices or preferred ways to accomplish what I am trying to do? Do the params for defType, qf and bf belong in a Solr request handler? Is it possible to have the weights as variables so they can be tweaked until we find the optimum balance in showing our results? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-Create-a-weighted-function-dismax-or-otherwise-tp3119977p3122630.html Sent from the Solr - User mailing list archive at Nabble.com.
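One common pattern (a sketch, not something confirmed in this thread) is to put defType, qf, and bf into a request handler's defaults in solrconfig.xml; defaults can still be overridden per request, which gives you a place to tweak the weights without changing client code. The handler name and the 0.2 weight below are illustrative:

```xml
<requestHandler name="/weighted" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- text relevance weight lives in qf's boost -->
    <str name="qf">name^10</str>
    <!-- price contribution, scaled by an experimentally chosen factor -->
    <str name="bf">product(price,0.2)</str>
  </lst>
</requestHandler>
```

A request such as /weighted?q=red&bf=product(price,0.5) would then override just the price weight for that query while keeping the other defaults.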
RE: Solr just 'hangs' under load test - ideas?
OK - I figured it out. It's not solr at all (and I'm not really surprised). In the prototype benchmarks, we used a different instance of tomcat than we're using for production load tests. Our prototype tomcat instance had no maxThreads value set, so was using the default value of 200. The production tomcat environment has a maxThreads value of 15 - we were just running out of threads and getting connection refused exceptions thrown when we ramped up the Solr hits past a certain level. Thanks for considering, Yonik (and any others waiting to see any reply I made)... (As others have said - this listserv is great!) Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Wednesday, June 29, 2011 12:18 PM To: solr-user@lucene.apache.org Subject: Re: Solr just 'hangs' under load test - ideas? Can you get a thread dump to see what is hanging? -Yonik http://www.lucidimagination.com On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford bob.sandif...@sirsidynix.com wrote: Hi, all. I'm hoping someone has some thoughts here. We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the getLuceneVersion() calls, but use luceneMatchVersion directly). We're running in a Tomcat instance, 64 bit Java. CATALINA_OPTS are: -Xmx7168m -Xms7168m -XX:MaxPermSize=256M We're running 2 Solr cores, with the same schema. We use SolrJ to run our searches from a Java app running in JBoss. JBoss, Tomcat, and the Solr Index folders are all on the same server. In case it's relevant, we're using JMeter as a load test harness. We're running on Solaris, a 16 processor box with 48GB physical memory. I've run a successful load test at a 100 user load (at that rate there are about 5-10 solr searches / second), and solr search responses were coming in under 100ms. 
When I tried to ramp up, as far as I can tell, Solr is just hanging. (We have some logging statements around the SolrJ calls - just before, we log how long our query construction takes, then we run the SolrJ query and log the search times. We're getting a number of the query construction logs, but no corresponding search time logs). Symptoms: The Tomcat and JBoss processes show as well under 1% CPU, and they are still the top processes. CPU states show around 99% idle. RES usage for the two Java processes around 3GB each. LWP under 120 for each. STATE just shows as sleep. JBoss is still 'alive', as I can get into a piece of software that talks to our JBoss app to get data. We set things up to use log4j logging for Solr - the log isn't showing any errors or exceptions. We're not indexing - just searching. Back in January, we did load testing on a prototype, and had no problems (though that was Solr 1.4 at the time). It ramped up beautifully - bottle necks were our apps, not Solr. What I'm benchmarking now is a descendent of that prototyping - a bit more complex on searches and more fields in the schema, but same basic search logic as far as SolrJ usage. Any ideas? What else to look at? Ringing any bells? I can send more details if anyone wants specifics... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.comhttp://www.sirsidynix.com/
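For anyone hitting the same wall: the setting in question is maxThreads on the HTTP Connector in Tomcat's conf/server.xml (the other attributes below are just Tomcat's stock example values):

```xml
<!-- maxThreads caps concurrent request-processing threads;
     Tomcat's default is 200, while this instance had it set to 15 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="200"
           redirectPort="8443"/>
```

When the pool is exhausted, new connections queue or are refused while the server otherwise sits idle, which matches the near-0% CPU symptoms described above.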
Re: Custom Query Processing
Anything is an option, but I think I found another way. I am going to add a new SearchComponent which reads some additional query parameters and builds the appropriate filter. On Tue, Jun 28, 2011 at 2:07 PM, Dmitry Kan dmitry@gmail.com wrote: You should modify the SolrCore for this, if I'm not mistaken. Would extending LuceneQParserPlugin (solr 1.4) be an option for you? On Tue, Jun 28, 2011 at 12:25 AM, Jamie Johnson jej2...@gmail.com wrote: I have a need to take an incoming solr query and apply some additional constraints to it on the Solr end. Our previous implementation used a QueryWrapperFilter along with some custom code to build a new Filter from the query provided. How can we plug this filter into Solr? -- Regards, Dmitry Kan
Looking for Custom Highlighting guidance
I have a schema with a text field and a text_phonetic field and would like to perform highlighting on them in such a way that the tokens that match are combined. What would be a reasonable way to accomplish this?
Re: Default schema - 'keywords' not multivalued
: The problem with TikaEntityProcessor is this installation is still running : v1.4.1 so I'll need to upgrade. : : Any short and sweet instructions for upgrading to 3.2? I have a pretty : straight forward Tomcat install, would just dropping in the new war suffice? It should be fairly straightforward; check the instructions in CHANGES.txt for any potential gotchas. I posted a write-up a while back on upgrading from 1.4 to 3.1 from a user perspective: http://www.lucidimagination.com/blog/2011/04/01/solr-powered-isfdb-part-8/ -Hoss
Strip Punctuation From Field
From all I've read, using something like PatternReplaceFilterFactory allows you to replace or remove text in the index, but is there anything similar that allows manipulation of the stored text of the associated field? For example, if I pulled a status from Twitter like Hi, this is a #hashtag. I would like to remove the # from that string and use the result for both the index and the field value that is returned from a query, i.e., Hi, this is a hashtag.
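An analyzer-side PatternReplaceFilterFactory only rewrites the indexed tokens; the stored value Solr returns is the raw input, so the '#' would have to be stripped before the document reaches the index (client-side or in an update processor). A minimal client-side sketch of the transformation itself, in plain Java (the string and pattern are just this thread's example, not an official API):

```java
public class StripHash {
    public static void main(String[] args) {
        String status = "Hi, this is a #hashtag.";
        // Same pattern/replacement idea as PatternReplaceFilterFactory,
        // but applied before the document is sent to Solr, so the stored
        // value (and therefore the query response) is also cleaned.
        String cleaned = status.replaceAll("#", "");
        System.out.println(cleaned); // Hi, this is a hashtag.
    }
}
```

Doing the same transformation in a custom UpdateRequestProcessor keeps the cleanup server-side so every client benefits from it.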
Re: Looking for Custom Highlighting guidance
Does the phonetic analysis preserve the offsets of the original text field? If so, you should probably be able to hack up FastVectorHighlighter to do what you want. -Mike On 06/29/2011 02:22 PM, Jamie Johnson wrote: I have a schema with a text field and a text_phonetic field and would like to perform highlighting on them in such a way that the tokens that match are combined. What would be a reasonable way to accomplish this?
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote: Just now, three of the six shards had documents deleted, and they took 29.07, 27.57, and 28.66 seconds to warm. The 1.4.1 counterpart to the 29.07 second one only took 4.78 seconds, and it did twice as many autowarm queries. Can you post the logs at the INFO level that covers the warming period? -Yonik http://www.lucidimagination.com
Writing SolrPlugin example
Is there a Solr plugin example similar to Nutch's (http://wiki.apache.org/nutch/WritingPluginExample)? All I found was the SolrPlugins wiki page (http://wiki.apache.org/solr/SolrPlugins), but it didn't have any example code. It would be helpful if there were a concrete example that explains how to write, compile, and build a custom plugin. Thanks, Ravi
Re: conditionally update document on unique id
Thanks Shalin! Would you not expect req.getSearcher().docFreq(t) to be slightly faster? Or maybe even req.getSearcher().getFirstMatch(t) != -1? Which one should be faster, and are there any known side effects? On Wed, Jun 29, 2011 at 1:45 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Jun 29, 2011 at 2:01 AM, eks dev eks...@yahoo.co.uk wrote: Quick question: is there a way with Solr to conditionally update a document on its unique id? Meaning: default add behavior if the id is not already in the index, and *not* touching the index if it is already there. Deletes are not important (no sync issues). I am asking because I noticed that with deduplication turned on, index files get modified even if I update the same documents again (same signatures). I am facing a very high dupe rate (40-50%), and the setup is going to be master-slave with a high commit rate (the requirement is to reduce propagation latency for updates). Having unnecessary index modifications is going to waste effort shipping the same information again and again. If there is no standard way, what would be the fastest way to check if a Term exists in the index from an UpdateRequestProcessor? I'd suggest that you use the searcher's getDocSet with a TermQuery. Use SolrQueryRequest#getSearcher so you don't need to worry about ref counting. e.g. req.getSearcher().getDocSet(new TermQuery(new Term(signatureField, sigString))).size(); I intend to extend SignatureUpdateProcessor to prevent a document from propagating down the chain if this happens. Would that be a way to deal with it? I repeat, there are no deletes to make headaches with synchronization. Yes, that should be fine. -- Regards, Shalin Shekhar Mangar.
Re: conditionally update document on unique id
On Wed, Jun 29, 2011 at 4:32 PM, eks dev eks...@googlemail.com wrote: req.getSearcher().getFirstMatch(t) != -1; Yep, this is currently the fastest option we have. -Yonik http://www.lucidimagination.com
Re: conditionally update document on unique id
Hi Yonik, as this recommendation comes from you, I am not going to test it; you are well known as a speed junkie ;) While we are there (in SignatureUpdateProcessor), why is this code not moved to the constructor, but remains in processAdd? ... Signature sig = (Signature) req.getCore().getResourceLoader().newInstance(signatureClass); sig.init(params); ... Should we be expecting on-the-fly signatureClass/params changes? I am still not all that familiar with Solr life cycles... might be a stupid question. Thanks, eks On Wed, Jun 29, 2011 at 10:36 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Jun 29, 2011 at 4:32 PM, eks dev eks...@googlemail.com wrote: req.getSearcher().getFirstMatch(t) != -1; Yep, this is currently the fastest option we have. -Yonik http://www.lucidimagination.com
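For what it's worth, the skip-duplicates idea in this thread can be sketched in plain Java, with a Set standing in for the index's term dictionary (the class and method names here are made up for illustration; in Solr the existence check would be the req.getSearcher().getFirstMatch(t) != -1 call discussed above):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for a dedup-aware update processor: documents whose
// signature is already "indexed" are dropped before reaching the chain.
public class SignatureDeduper {
    private final Set<String> indexedSignatures = new HashSet<>();
    private final List<String> accepted = new ArrayList<>();

    // Returns true if the doc was passed down the chain, false if skipped.
    public boolean processAdd(String docId, String signature) {
        if (!indexedSignatures.add(signature)) {
            return false; // signature already present: do not touch the index
        }
        accepted.add(docId);
        return true;
    }

    public List<String> accepted() {
        return accepted;
    }

    public static void main(String[] args) {
        SignatureDeduper dedup = new SignatureDeduper();
        dedup.processAdd("doc1", "sigA");
        dedup.processAdd("doc2", "sigA"); // duplicate signature, skipped
        dedup.processAdd("doc3", "sigB");
        System.out.println(dedup.accepted()); // [doc1, doc3]
    }
}
```

With a 40-50% dupe rate, dropping the duplicate adds before they reach the index avoids rewriting (and replicating) segments that carry no new information.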
Re: After the query component has the results, can I do more filtering on them?
bump -- View this message in context: http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3123502.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting by value of field
Thanks, yes this is the workaround I am currently doing. Still wondering if the sort method can be used alone. On 29 June 2011 18:34, Michael Ryan mr...@moreover.com wrote: You could try adding a new int field (like typeSort) that has the desired sort values. So when adding a document with type:car, also add typeSort:1; when adding type:van, also add typeSort:2; etc. Then you could do sort=typeSort asc to get them in your desired order. I think this is also possible with custom function queries, but I've never done that. -Michael
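Michael's suggestion amounts to materializing the custom order as a sortable integer at index time. A minimal stand-alone sketch of the idea (the type names and ranks are just the example's; in Solr the mapping would happen when you build each document's typeSort field):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Assign each type its desired rank once, at "index time",
// then sorting on that rank reproduces the custom order.
public class TypeSort {
    static final Map<String, Integer> TYPE_SORT = new HashMap<>();
    static {
        TYPE_SORT.put("car", 1);
        TYPE_SORT.put("van", 2);
        TYPE_SORT.put("truck", 3);
    }

    // Equivalent of sort=typeSort asc: order by the mapped rank.
    public static String[] sortByType(String[] types) {
        String[] out = types.clone();
        Arrays.sort(out, Comparator.comparingInt(
                (String t) -> TYPE_SORT.getOrDefault(t, Integer.MAX_VALUE)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                sortByType(new String[]{"van", "truck", "car"})));
        // [car, van, truck]
    }
}
```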
Re: After the query component has the results, can I do more filtering on them?
So I made a custom search component which runs right after the query component, and this custom component updates the score of each document based on some things (and no, I definitely can't use existing components). I didn't see any easy way to just update the score, so what I currently do is something like this:

DocList docList = rb.getResults().docList;
float[] scores = new float[docList.size()];
int[] docs = new int[docList.size()];
int docCounter = 0;
float maxScore = 0;
DocIterator iter = docList.iterator();
while (iter.hasNext()) {
    int userId = iter.nextDoc();
    float score = userIdsToScore.get(userId);
    scores[docCounter] = score;
    docs[docCounter] = userId;
    docCounter++;
    if (maxScore < score) {
        maxScore = score;
    }
}
docList = new DocSlice(0, docCounter, docs, scores, 0, maxScore);

My userIdsToScore hashtable is how I'm determining the new score. There are a few other things I'm doing, but this is the gist. I'm also not sure how to go about sorting this... but basically my question is, is this how I should be updating the score of the documents? This way you are just updating the scores cosmetically, i.e. they are not sorted by score anymore. Plus, with this approach you can only process start + rows documents at maximum; obtaining the whole result set is not an option. If you have some mapping like userIdsToScore, maybe you can use ExternalFileField combined with FunctionQueries to influence the score.
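To illustrate the "cosmetic scores" point: overwriting the score array does not reorder the hits, so after re-scoring, the doc ids also have to be re-sorted by the new scores. A plain-Java sketch with hypothetical ids and scores (no Solr classes involved):

```java
import java.util.Arrays;
import java.util.Comparator;

public class Rescore {
    // Re-sorts doc ids by their new scores, descending, the way
    // the result list would normally be ordered. Returns the ids
    // in the new order.
    public static int[] sortByScoreDesc(int[] docs, float[] scores) {
        Integer[] order = new Integer[docs.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Negate so that higher scores sort first.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        int[] sorted = new int[docs.length];
        for (int i = 0; i < order.length; i++) sorted[i] = docs[order[i]];
        return sorted;
    }

    public static void main(String[] args) {
        int[] docs = {10, 20, 30};
        float[] newScores = {0.2f, 0.9f, 0.5f}; // scores after the component ran
        System.out.println(Arrays.toString(sortByScoreDesc(docs, newScores)));
        // [20, 30, 10]
    }
}
```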
Building a facet search filter frontend in XSLT
Hi all, I am looking for some help in building a front-end facet filter using XSLT. The code I use is: http://pastebin.com/xVv9La9j On the image attached, the checkbox should be selected. (You clicked and submitted the facet form; the URL changed.) I can use xsl:if, but there's nothing in the XML that I can use to test before outputting the input checkbox. Has anyone done a similar thing? I haven't seen any examples building a facet search filter frontend in XSLT; the example.xsl that comes with Solr is pretty basic. Are there any other examples in XSLT implementing facet filters around? Thanks, Filype
Re: How to Create a weighted function (dismax or otherwise)
Are there any best practices or preferred ways to accomplish what I am trying? People usually prefer multiplicative boosting, but in your case you want additive boosting. Dismax's bf is additive. There is also the _val_ hook. http://wiki.apache.org/solr/SolrQuerySyntax Do the params for defType, qf and bf belong in a solr request handler? They can be defined in the defaults section of a request handler, as well as passed via query parameters: q=test&pf=...&qf=... Is it possible to have the weights as variables so they can be tweaked till we find the optimum balance in showing our results? Yes, you can try different settings on the fly using query parameters.
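Since the weights end up as plain query parameters, they can be tweaked per request without touching solrconfig.xml. A small sketch that assembles such a URL (the host, field names and the single price-weight parameter are just illustrative assumptions, following the qf/bf shape used earlier in this thread):

```java
public class WeightedQuery {
    // Builds a dismax query URL where the additive price boost
    // carries a tunable weight, passed in per request.
    public static String buildUrl(String q, double priceWeight) {
        return "http://localhost:8983/solr/select?q=" + q
                + "&defType=dismax"
                + "&qf=name^10"
                + "&bf=product(price," + priceWeight + ")";
    }

    public static void main(String[] args) {
        // Try different weights on the fly until the balance looks right.
        System.out.println(buildUrl("red", 0.5));
    }
}
```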
Re: Building a facet search filter frontend in XSLT
Hi Filype, in the response you should have a list of fq arguments, something like:

<arr name="fq">
  <str>field:facetValue</str>
  <str>field:FacetValue</str>
</arr>

Use this to set your inputs to be selected/checked. On 29 June 2011 23:54, Filype Pereira pereira.fil...@gmail.com wrote: Hi all, I am looking for some help in building a front-end facet filter using XSLT. The code I use is: http://pastebin.com/xVv9La9j On the image attached, the checkbox should be selected. (You clicked and submitted the facet form; the URL changed.) I can use xsl:if, but there's nothing in the XML that I can use to test before outputting the input checkbox. Has anyone done a similar thing? I haven't seen any examples building a facet search filter frontend in XSLT; the example.xsl that comes with Solr is pretty basic. Are there any other examples in XSLT implementing facet filters around? Thanks, Filype
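A hedged sketch of what that xsl:if test could look like in the stylesheet (the element names assume Solr's standard XML response format where fq values are echoed back; the template and parameter names are made up, so adapt them to your form):

```xml
<!-- Mark a facet checkbox as checked when its filter is already
     present among the fq values echoed back in the response. -->
<xsl:template name="facet-checkbox">
  <xsl:param name="field"/>
  <xsl:param name="value"/>
  <input type="checkbox" name="fq" value="{$field}:{$value}">
    <xsl:if test="//arr[@name='fq']/str[. = concat($field, ':', $value)]">
      <xsl:attribute name="checked">checked</xsl:attribute>
    </xsl:if>
  </input>
</xsl:template>
```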
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On Wed, Jun 29, 2011 at 3:28 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote: Just now, three of the six shards had documents deleted, and they took 29.07, 27.57, and 28.66 seconds to warm. The 1.4.1 counterpart to the 29.07 second one only took 4.78 seconds, and it did twice as many autowarm queries. Can you post the logs at the INFO level that covers the warming period? OK, your filter queries have hundreds of terms in them (and that means hundreds of term lookups, which use the term index). Thus, your termIndexInterval change is the leading suspect for the slowdown. A termIndexInterval of 1024 means that a term lookup will seek to the closest 1024th term and then call next() until the desired term is found. Hence instead of calling next() an average of 64 times internally, it's now 512 times. Of course there is still a mystery about why your tii (which is the term index) would be so much bigger instead of smaller... -Yonik http://www.lucidimagination.com
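Yonik's arithmetic is easy to check: with an interval of N the lookup seeks to the nearest indexed term and then scans forward, so on average it calls next() N/2 times per lookup. A toy model of that cost (ignoring the seek itself):

```java
public class TermIndexCost {
    // Average next() calls for a single term lookup under a given
    // termIndexInterval: after seeking to the nearest indexed term,
    // the scan covers half the interval on average.
    public static int avgNextCalls(int termIndexInterval) {
        return termIndexInterval / 2;
    }

    public static void main(String[] args) {
        System.out.println(avgNextCalls(128));  // default interval -> 64
        System.out.println(avgNextCalls(1024)); // raised interval -> 512
    }
}
```

With filter queries containing hundreds of terms, that 8x-per-lookup difference multiplies across every term, which matches the observed warming slowdown.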
Multicore clustering setup problem
I had set up the clusteringComponent in solrconfig.xml for my first core. It has been working fine, and now I want to get my next core working. I set up the second core with the clustering component so that I could use it, use solritas properly, etc., but Solr did not like the solrconfig.xml changes for the second core. I'm getting this error when Solr is started or when I hit a Solr-related URL: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent' Should the clusteringComponent be set up in a shared configuration file somehow, or is there something else I am doing wrong? Thanks in advance!
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 6/29/2011 7:50 PM, Yonik Seeley wrote: OK, your filter queries have hundreds of terms in them (and that means hundreds of term lookups, which use the term index). Thus, your termIndexInterval change is the leading suspect for the slowdown. A termIndexInterval of 1024 means that a term lookup will seek to the closest 1024th term and then call next() until the desired term is found. Hence instead of calling next() an average of 64 times internally, it's now 512 times. Of course there is still a mystery about why your tii (which is the term index) would be so much bigger instead of smaller... It turns out I got the two indexes backwards; the smaller one was the new index. I may have mixed up the indexes on some of the other files too, but they weren't much different, so I'm not going to try to figure out where any mistakes might be. Earlier in the afternoon I figured this out, removed termIndexInterval from my config, and rebuilt the index. I had originally put this in to speed up indexing. The evidence I had available at the time told me that this goal was accomplished, but the rebuild actually went faster without the statement. Warming times are now averaging under 10 seconds even with the warmup count back up to 8. This is still slower than I would like, but it is a major improvement. Even more important, I understand what happened. I was thinking perhaps I might actually decrease the termIndexInterval value below the default of 128. I know from reading the Hathi Trust blog that memory usage for the tii file is much more than the size of the file would indicate, but if I increase it from 13MB to 26MB, it probably would still be OK. Are any index intervals for the other Lucene files configurable in a similar manner? I know that messing too much with the defaults can make things much worse, so I would be very careful with any adjustments, and try to fully understand why any performance gain or loss occurred. Thanks, Shawn
Re: what is solr clustering component
Thanks iorixxx, I changed my configuration to include clustering in search results. In my XML-format search results I got a clusters tag; to show these clusters in the search results, do I need to parse this XML? And my second question: does clustering affect indexes? - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/what-is-solr-clustering-component-tp3121484p3124627.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: what is solr clustering component
And my second question: does clustering affect indexes? No, it doesn't. Clustering is performed only on the search results produced by Solr; it doesn't change anything in the index. Cheers, Staszek
Re: Multicore clustering setup problem
Hi, Can you post the full stack trace? I'd need to know if it's really org.apache.solr.handler.clustering.ClusteringComponent that's missing or some other class ClusteringComponent depends on. Cheers, Staszek On Thu, Jun 30, 2011 at 04:19, Walter Closenfleight walter.p.closenflei...@gmail.com wrote: I had set up the clusteringComponent in solrconfig.xml for my first core. It has been working fine and now I want to get my next core working. I set up the second core with the clustering component so that I could use it, use solritas properly, etc. but Solr did not like the solrconfig.xml changes for the second core. I'm getting this error when Solr is started or when I hit a Solr related URL: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent' Should the clusteringComponent be set up in a shared configuration file somehow or is there something else I am doing wrong? Thanks in advance!