Re: How to Create a weighted function (dismax or otherwise)

2011-06-29 Thread Ahmet Arslan
 I am trying to create a feature that
 allows search results to be displayed by
 this formula sum(weight1*text relevance score, weight2 *
 price). weight1 and
 weight2 are numeric values that can be changed to influence
 the search
 results.
 
 I am sending the following query params to the Solr
 instance for searching.
 
 q=red
 defType=dismax
 qf=10^name+2^price

The correct syntax for qf and pf is fieldName^boostFactor, i.e.,
qf=name^10 price^2

However, your query is a word, so it won't match in the price field. I assume the
price field is numeric.

You can simulate sum(weight1 * text relevance score, weight2 * price) with the
bf parameter and FunctionQueries.

q=red&defType=edismax&qf=name&bf=product(price,w1/w2)

http://wiki.apache.org/solr/FunctionQuery
http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29
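
As a concrete illustration, with example weights w1=10 and w2=2 (values chosen
here for illustration only, not from the original question), the full parameter
set might look like:

q=red&defType=edismax&qf=name&bf=product(price,5)

where 5 is w1/w2 computed up front, since bf expects a concrete function query.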



Re: Solr - search queries not returning results

2011-06-29 Thread Ahmet Arslan

 I believe I am missing something very elementary. The
 following query
 returns zero hits:
 
 http://localhost:8983/solr/core0/select/?q=testabc

With this URL, you are hitting the RequestHandler defined as <requestHandler
name="..." default="true"/> in your core0/conf/solrconfig.xml.

 However, using solritas, it finds many results:
 
 http://localhost:8983/solr/core0/itas?q=testabc

With this one, you are hitting the one registered as <requestHandler
name="/itas">.

 Do you have any idea what the issue may be?

Probably they have different default parameters configured. 

For example, (e)dismax versus the lucene query parser: the lucene query parser
searches for testabc in your default field, while dismax searches it in all of the
fields defined in the qf parameter.

You can see the full parameter list by appending &echoParams=all to your search
URL.
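
For example, reusing the URL from the question:

http://localhost:8983/solr/core0/select/?q=testabc&echoParams=all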


Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-29 Thread eks dev
...Using RAMDirectory really does not help performance...

I kind of agree, but in my experience with Lucene there are cases
where RAMDirectory helps a lot, with all its drawbacks (huge heap and
gc() tuning).

We had a very good experience with MMAP on average, but moving to
RAMDirectory with a properly tuned gc() eliminated 95% of the slow
performers in the upper range of response times (i.e. the slowest 5% of
queries). On average it made practically no difference.
Maybe this is mitigated by better warm-up in Solr than our hand-tuned
warm-up, maybe not; I do not really know.

With MMAP you need really smart warm-up to beat IO quirks; for RAMDir
you need to tune gc(). Choose your poison :)

I argue that in some cases it is very hard to tame IO quirks (e.g. IO is a
shared resource; you never know what is really going on in a shared app
setup!). Just look at what happens on a major merge, and all these
efforts with the native Linux directory to somehow get a grip on that...
If you have spare RAM, you are probably safer with RAMDirectory.

From a theoretical perspective, in the ideal case RAM ought to be
faster than disk (and more expensive). If this is not the case, we did
something wrong. I have a feeling that the work Mike is doing with
in-memory codecs (FST term dictionary, pulsing codec & co) in Lucene 4,
native directory features, etc. will make RAMDirectory truly obsolete
for production setups.


Cheers,
eks




On Wed, Jun 29, 2011 at 6:00 AM, Lance Norskog goks...@gmail.com wrote:
 Using RAMDirectory really does not help performance. Java garbage
 collection has to work around all of the memory taken by the segments.
 It works out that Solr works better (for most indexes) without using
 the RAMDirectory.



 On Sun, Jun 26, 2011 at 2:07 PM, nipunb ni...@walmartlabs.com wrote:
 PS: Sorry if this is a repost; I was unable to see my message in the mailing
 list. This may have been due to my outgoing email address being different from
 the one I used to subscribe to the list with.

 Overview – Trying to evaluate whether keeping the index in memory using
 RAMDirectoryFactory can help query performance. I am trying to perform the
 indexing on the master using solr.StandardDirectoryFactory and make those
 indexes accessible to the slave using solr.RAMDirectoryFactory.

 Details:
 We have set up Solr in a master/slave environment. The index is built on the
 master and then replicated to slaves, which are used to serve the queries.
 The replication is done using the built-in Java replication in Solr.
 On the master, in the indexDefaults section of solrconfig.xml we have
 <directoryFactory name="DirectoryFactory"
        class="solr.StandardDirectoryFactory"/>

 On the slave, I tried to use the following in the indexDefaults

 <directoryFactory name="DirectoryFactory"
         class="solr.RAMDirectoryFactory"/>

 My slave shows no data for any queries. In solrconfig.xml it is mentioned
 that replication doesn't work when using RAMDirectoryFactory; however, this
 (https://issues.apache.org/jira/browse/SOLR-1379) mentions that you can use
 it to have the index on disk and then load it into memory.

 To test the sanity of my set-up, I changed solrconfig.xml on the slave to the
 following and replicated:
 <directoryFactory name="DirectoryFactory"
        class="solr.StandardDirectoryFactory"/>
 I was able to see the results.

 Shouldn’t RAMDirectoryFactory be usable for reading the index from disk into
 memory?

 Any help/pointers in the right direction would be appreciated.

 Thanks!

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Using-RAMDirectoryFactory-in-Master-Slave-setup-tp3111792p3111792.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Lance Norskog
 goks...@gmail.com



Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
Hi, I have this bunch of lines in my schema.xml that should do a replacement,
but it doesn't work!

<fieldType name="salary_max_text" class="solr.TextField"
    omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="([0-9]+k?[.,]?[0-9]*).*?([0-9]+k?[.,]?[0-9]*)" replacement="$2"/>
  </analyzer>
</fieldType>


I need it to extract only the numbers from some other string. The strings
can be anything: only letters (in which case it should be replaced with an
empty string), or letters + numbers. The numbers can be in one of these formats:

17000 -- ok
17,000 -- should be replaced with 17000
17.000 -- should be replaced with 17000
17k -- should be replaced with 17000

How can I accomplish this?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3120748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-29 Thread Toke Eskildsen
On Wed, 2011-06-29 at 09:35 +0200, eks dev wrote:
 In MMAP, you need to have really smart warm up (MMAP) to beat IO
 quirks, for RAMDir  you need to tune gc(), choose your poison :)

Other alternatives are operating system RAM disks (avoids the GC
problem) and using SSDs (nearly the same performance as RAM).



Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-29 Thread eks dev
Sure, SSDs or RAM disks fix these problems with IO.

Anyhow, I can really see no alternative to some kind of in-memory index for
slaves, especially for low-latency master-slave apps (a high commit rate
is a problem).

Having the possibility to run slaves in memory that slurp updates
from the master seems to me like a preferred method (no twiddling with
the OS; CPU and RAM are all you need for your slaves: run a slave and
point it at the master). I assume that update propagation times could be
better with some sexy ReadOnlySlaveRAMDirectorySlurpingUpdatesFromTheMaster
that does reload() directly from the master (maybe even uncommitted,
somehow NRT-ish).

The point being, lower update latency than the current 1-5 minutes (wiki
recommended values) is not going to be possible with the current
master-slave solution, due to the nature of it (commit to disk on
master, copy delta to slave disk, reload...). This is a lot of ping
pong... ES and Solandra are by nature better suited if you need update
propagation in the seconds range.

This is just thinking aloud, and slightly off-topic... Solr/Lucene as it
is today rocks anyhow.



On Wed, Jun 29, 2011 at 10:55 AM, Toke Eskildsen t...@statsbiblioteket.dk 
wrote:
 On Wed, 2011-06-29 at 09:35 +0200, eks dev wrote:
 In MMAP, you need to have really smart warm up (MMAP) to beat IO
 quirks, for RAMDir  you need to tune gc(), choose your poison :)

 Other alternatives are operating system RAM disks (avoids the GC
 problem) and using SSDs (nearly the same performance as RAM).





filters effect on search results

2011-06-29 Thread Romi
Hi, when I query for "elegant" in Solr I get results for "elegance" too.

*I used these filters for index analysis:*
WhitespaceTokenizerFactory 
StopFilterFactory 
WordDelimiterFilterFactory
LowerCaseFilterFactory 
SynonymFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
ReversedWildcardFilterFactory 

*and for query analysis:*

WhitespaceTokenizerFactory
SynonymFilterFactory
StopFilterFactory
WordDelimiterFilterFactory 
LowerCaseFilterFactory 
EnglishPorterFilterFactory 
RemoveDuplicatesTokenFilterFactory 

I want to know which filter is affecting my search results.

-
Thanks  Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/filters-effect-on-search-results-tp3120968p3120968.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan

 Hi, I have this bunch of lines in my
 schema.xml that should do a replacement,
 but it doesn't work!

     <fieldType name="salary_max_text"
         class="solr.TextField"
         omitNorms="true">
       <analyzer type="index">
         <tokenizer class="solr.StandardTokenizerFactory"/>
         <charFilter class="solr.PatternReplaceCharFilterFactory"
             pattern="([0-9]+k?[.,]?[0-9]*).*?([0-9]+k?[.,]?[0-9]*)"
             replacement="$2"/>
       </analyzer>
     </fieldType>

charFilter definitions should be above the tokenizer definition,
i.e.:

<analyzer>
  <charFilter .../>
  <tokenizer .../>
  <filter .../>
</analyzer>


Re: filters effect on search results

2011-06-29 Thread Ahmet Arslan
 Hi, when I query for "elegant" in
 Solr I get results for "elegance" too.
 
 *I used these filters for index analysis:*
 WhitespaceTokenizerFactory 
 StopFilterFactory 
 WordDelimiterFilterFactory
 LowerCaseFilterFactory 
 SynonymFilterFactory
 EnglishPorterFilterFactory
 RemoveDuplicatesTokenFilterFactory
 ReversedWildcardFilterFactory 
 
 *and for query analysis:*

 WhitespaceTokenizerFactory
 SynonymFilterFactory
 StopFilterFactory
 WordDelimiterFilterFactory 
 LowerCaseFilterFactory 
 EnglishPorterFilterFactory 
 RemoveDuplicatesTokenFilterFactory 
 
 I want to know which filter is affecting my search results.
 

It is EnglishPorterFilterFactory; you can verify it on the admin/analysis.jsp
page.


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
<fieldType name="salary_min_text" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
        replacement="$1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
        replacement="$1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

<fieldType name="salary_max_text" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
        replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
        replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

This is the final version of my schema part, but what I get is this:


<doc>
  <float name="score">1.0</float>
  <str name="salary">Negotiable</str>
  <str name="salary_max">Negotiable</str>
  <str name="salary_min">Negotiable</str>
</doc>
<doc>
  <float name="score">1.0</float>
  <str name="salary">£7 to £8 per hour</str>
  <str name="salary_max">£7 to £8 per hour</str>
  <str name="salary_min">£7 to £8 per hour</str>
</doc>
<doc>
  <float name="score">1.0</float>
  <str name="salary">£125 to £150 per day</str>
  <str name="salary_max">£125 to £150 per day</str>
  <str name="salary_min">£125 to £150 per day</str>
</doc>

which is not what I'm expecting... The regular expression works at
http://www.fileformat.info/tool/regex.htm without any problem.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121055.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fuzzy Query Param

2011-06-29 Thread Michael McCandless
Which version of Solr (Lucene) are you using?

Recent versions of Lucene now accept ~N > 1 to be an edit distance. I.e.,
foobar~2 matches any term that's <= 2 edit distance away from foobar.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 28, 2011 at 11:00 PM, entdeveloper
cameron.develo...@gmail.com wrote:
 According to the docs on lucene query syntax:

 Starting with Lucene 1.9 an additional (optional) parameter can specify the
 required similarity. The value is between 0 and 1, with a value closer to 1
 only terms with a higher similarity will be matched.

 I was messing around with this and started doing queries with values greater
 than 1 and it seemed to be doing something. However I haven't been able to
 find any documentation on this.

 What happens when specifying a fuzzy query with a value > 1?

 tiger~2
 animal~3

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3120235.html
 Sent from the Solr - User mailing list archive at Nabble.com.



How to disable Phonetic search

2011-06-29 Thread Mohammad Shariq
I am using Solr 1.4.
When I search for the keyword "ansys" I get a lot of posts,
but when I search for "ansys NOT ansi" I get nothing.
I guess it's because of phonetic search: "ansys" is converted into "ansi"
(which is the NOT'ed keyword) and nothing returns.

How do I handle this kind of problem?

-- 
Thanks and Regards
Mohammad Shariq


Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
     <fieldType name="salary_min_text" class="solr.TextField">
       <analyzer type="index">
         <charFilter class="solr.PatternReplaceCharFilterFactory"
             pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
             replacement="$1"/>
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="solr.TrimFilterFactory" />
       </analyzer>
       <analyzer type="query">
         <charFilter class="solr.PatternReplaceCharFilterFactory"
             pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
             replacement="$1"/>
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="solr.TrimFilterFactory" />
       </analyzer>
     </fieldType>

     <fieldType name="salary_max_text" class="solr.TextField">
       <analyzer type="index">
         <charFilter class="solr.PatternReplaceCharFilterFactory"
             pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
             replacement="$2"/>
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="solr.TrimFilterFactory" />
       </analyzer>
       <analyzer type="query">
         <charFilter class="solr.PatternReplaceCharFilterFactory"
             pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
             replacement="$2"/>
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="solr.TrimFilterFactory" />
       </analyzer>
     </fieldType>
 
 This is the final version of my schema part, but what I
 get is this:


 <doc>
 <float name="score">1.0</float>
 <str name="salary">Negotiable</str>
 <str name="salary_max">Negotiable</str>
 <str name="salary_min">Negotiable</str>
 </doc>
 <doc>
 <float name="score">1.0</float>
 <str name="salary">£7 to £8 per hour</str>
 <str name="salary_max">£7 to £8 per hour</str>
 <str name="salary_min">£7 to £8 per hour</str>
 </doc>
 <doc>
 <float name="score">1.0</float>
 <str name="salary">£125 to £150 per day</str>
 <str name="salary_max">£125 to £150 per day</str>
 <str name="salary_min">£125 to £150 per day</str>
 </doc>

 which is not what I'm expecting... The regular expression
 works at http://www.fileformat.info/tool/regex.htm
 without any problem.

I am not good with regular expressions, but the response always contains the
untouched/un-analyzed version of fields. You can visually test your
fieldType/regex on the admin/analysis.jsp page. It shows the indexed terms step
by step.


Re: How to disable Phonetic search

2011-06-29 Thread Ahmet Arslan
 I am using Solr 1.4.
 When I search for the keyword "ansys" I get a lot of posts,
 but when I search for "ansys NOT ansi" I get nothing.
 I guess it's because of phonetic search: "ansys" is
 converted into "ansi"
 (which is the NOT'ed keyword) and nothing returns.

 How do I handle this kind of problem?

Find and remove occurrences of solr.PhoneticFilterFactory from your 
schema.xml file.


Re: conditionally update document on unique id

2011-06-29 Thread Shalin Shekhar Mangar
On Wed, Jun 29, 2011 at 2:01 AM, eks dev eks...@yahoo.co.uk wrote:

 Quick question,
 Is there a way with Solr to conditionally update a document on a unique
 id? Meaning: default add behavior if the id is not already in the index, and
 *not* touching the index if it is already there.

 Deletes are not important (no sync issues).

 I am asking because I noticed that with deduplication turned on,
 index files get modified even if I update the same documents again
 (same signatures).
 I am facing a very high dupe rate (40-50%), and the setup is going to be
 master-slave with a high commit rate (the requirement is to reduce
 propagation latency for updates). Unnecessary index
 modifications are going to waste effort shipping the same information
 again and again.

 If there is no standard way, what would be the fastest way to check whether a
 Term exists in the index from an UpdateRequestProcessor?


I'd suggest that you use the searcher's getDocSet with a TermQuery.

Use the SolrQueryRequest#getSearcher so you don't need to worry about ref
counting.

e.g. req.getSearcher().getDocSet(new TermQuery(new Term(signatureField,
sigString))).size();



 I intend to extend SignatureUpdateProcessor to prevent a document from
 propagating down the chain if this happens.
 Would that be a way to deal with it? I repeat, there are no deletes to
 cause headaches with synchronization.


Yes, that should be fine.

-- 
Regards,
Shalin Shekhar Mangar.
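
Putting the two suggestions above together, a minimal sketch of such a processor
might look like the following (the class name and signature field name are
hypothetical, the SolrQueryRequest would be handed in by the factory's
getInstance(), and error handling is omitted):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class SkipExistingSignatureProcessor extends UpdateRequestProcessor {
  private final SolrQueryRequest req;
  private final String signatureField = "signature"; // hypothetical field name

  public SkipExistingSignatureProcessor(SolrQueryRequest req,
                                        UpdateRequestProcessor next) {
    super(next);
    this.req = req;
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    Object sig = cmd.getSolrInputDocument().getFieldValue(signatureField);
    if (sig != null) {
      // ask the current searcher whether this signature is already indexed
      int hits = req.getSearcher()
          .getDocSet(new TermQuery(new Term(signatureField, sig.toString())))
          .size();
      if (hits > 0) {
        return; // already indexed: drop the add instead of passing it down the chain
      }
    }
    super.processAdd(cmd);
  }
}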


Re: filters effect on search results

2011-06-29 Thread François Schiettecatte
Indeed, I find the Porter stemmer to be too 'aggressive' for my taste; I prefer
the EnglishMinimalStemFilterFactory, with the caveat that it depends on your
data set.

Cheers

François

On Jun 29, 2011, at 6:21 AM, Ahmet Arslan wrote:

 Hi, when i query for elegant in
 solr i get results for elegance too. 
 
 *I used these filters for index analysis:*
 WhitespaceTokenizerFactory 
 StopFilterFactory 
 WordDelimiterFilterFactory
 LowerCaseFilterFactory 
 SynonymFilterFactory
 EnglishPorterFilterFactory
 RemoveDuplicatesTokenFilterFactory
 ReversedWildcardFilterFactory 
 
 *and for query analysis:*

 WhitespaceTokenizerFactory
 SynonymFilterFactory
 StopFilterFactory
 WordDelimiterFilterFactory 
 LowerCaseFilterFactory 
 EnglishPorterFilterFactory 
 RemoveDuplicatesTokenFilterFactory 
 
 I want to know which filter is affecting my search results.
 
 
 It is EnglishPorterFilterFactory, you can verify it from admin/analysis.jsp 
 page.



Encoding problem while indexing

2011-06-29 Thread Engy Morsy
I am working on indexing Arabic documents containing Arabic diacritics and
dotless characters (old Arabic characters). I am using the Apache Tomcat server,
and I am using my modified version of the AraMorph analyzer as the Arabic
analyzer. I managed, in the development environment, to normalize the Arabic
diacritics and dotless characters (the same concept as in
solr.ArabicNormalizationFilterFactory), and I can verify that the analyzer is
working fine and I get the correct stem for Arabic words. The input text file
for testing has UTF-8 encoding.

When I build the AraMorph jar file and place it under the Solr lib, the
diacritics and the dotless characters split the word. I made sure that
server.xml contains URIEncoding="UTF-8".

I also made sure that the text being sent to Solr using SolrJ is UTF-8 encoded,
for example: solr.addBean(new Doc(4, new String("حِباًَ".getBytes("UTF8"))));

but nothing is working.

I tried the analysis link on the Solr admin for both indexing and querying,
and both show that the Arabic word is split if a diacritic or dotless
character is found.

Do you have any idea what might be the problem?


schema snippet:

<fieldType name="text" class="solr.TextField">
  <analyzer type="index"
      class="gpl.pierrick.brihaye.aramorph.lucene.ArabicNormalizeStemmer"/>
  <analyzer type="query"
      class="gpl.pierrick.brihaye.aramorph.lucene.ArabicNormalizeStemmer"/>
</fieldType>

I also added the following parameter to the JVM: -Dfile.encoding=UTF-8

Thanks,
engy


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
Index Analyzer
org.apache.solr.analysis.KeywordTokenizerFactory
{luceneMatchVersion=LUCENE_31}
position     1
term text    £22000 - £25000 per annum + benefits
startOffset  0
endOffset    36


org.apache.solr.analysis.PatternReplaceFilterFactory {replacement=$2,
pattern=[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*,
luceneMatchVersion=LUCENE_31}
position     1
term text    25000
startOffset  0
endOffset    36


This is my output for the field salary_max; it seems to be working from the
admin JSP interface.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121353.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
 Index Analyzer
 org.apache.solr.analysis.KeywordTokenizerFactory
 {luceneMatchVersion=LUCENE_31}
 position    1
 term text    £22000 - £25000 per annum +
 benefits
 startOffset    0
 endOffset    36
 
 
 org.apache.solr.analysis.PatternReplaceFilterFactory
 {replacement=$2,
 pattern=[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*,
 luceneMatchVersion=LUCENE_31}
 position    1
 term text    25000
 startOffset    0
 endOffset    36
 
 
 this is my output for the field salary_max, it seems to be
 working from the
 admin jsp interface

That's good to know. If you explain your final goal in detail, users can give 
better pointers.


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
I have the string "You may earn 25k dollars per week" stored in the field
salary.

I'm using 2 copyFields, salary_min and salary_max, with source salary
and those 2 datatypes:

salary is text
salary_min is salary_min_text
salary_max is salary_max_text

So, I was expecting this:

solr updates its index
solr copies the value from salary to salary_min and applies the regex to
the value
solr copies the value from salary to salary_max and applies the regex to
the value


But it's not working: it copies the value from one field to another, but the
filter isn't applied, even though it works on its own, as you could see.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121386.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
 i have the string You may earn 25k
 dollars per week stored in the field
 salary
 
 i'm using 2 copyfields salary_min and salary_max with
 source in salary
 with those 2 datatypes 
 
 salary is text
 salary_min is salary_min_text
 salary_max is salary_max_text
 
 so, i was expecting this:
 
 solr updates its index
 solr copies the value from salary to salary_min and applies
 the value with
 the regex
 solr copies the value from salary to salary_max and applies
 the value with
 the regex
 
 
 but it's not working, it copies the value from one field to
 another, but the
 filter isn't applied, even if it's working as you could
 see

Okay, that makes sense. copyField just copies the content; it has nothing to do
with analyzers. Two solutions come to my mind.

1) If you are using the DataImportHandler, I think (I am not good with regex)
you can use the regex transformer to populate these two fields.

http://wiki.apache.org/solr/DataImportHandler#RegexTransformer

2) If not, you can populate these two fields in a custom
UpdateRequestProcessor. There is an example to modify and to start from here:

http://wiki.apache.org/solr/UpdateRequestProcessor


what is solr clustering component

2011-06-29 Thread Romi
I just went through the Solr wiki page for clustering, but I am not getting what
the benefit of using clustering is. Can anyone tell me what clustering actually
is and what its use is in indexing and searching?
Does it affect search results?
Please reply.


-
Thanks  Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-is-solr-clustering-component-tp3121484p3121484.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
OK, but I'm not applying the filtering on the copyfields.
This is how my schema looks:


<field name="salary" type="text" indexed="true" stored="true" />
<field name="salary_min" type="salary_min_text" indexed="true" stored="true"
/>
<field name="salary_max" type="salary_max_text" indexed="true" stored="true"
/>

<copyField source="salary" dest="salary_min" />
<copyField source="salary" dest="salary_max" />

and the two datatypes defined before. That's why I thought I could first use
copyField to copy the value, then index them with my two datatypes'
filtering...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121497.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: filters effect on search results

2011-06-29 Thread Romi
The admin/analysis.jsp page shows RemoveDuplicatesTokenFilterFactory,
ReversedWildcardFilterFactory, and EnglishPorterFilterFactory.

-
Thanks  Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/filters-effect-on-search-results-tp3120968p3121506.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to disable Phonetic search

2011-06-29 Thread Mohammad Shariq
I was using SnowballPorterFilterFactory for stemming, and that stemmer was
stemming the words.
I added the keyword "ansys" to the protwords.txt file.
Now the stemming is not happening for "ansys", and it's OK now.

On 29 June 2011 17:12, Ahmet Arslan iori...@yahoo.com wrote:

  I am using solr1.4
  When I search for keyword ansys I get lot of posts.
  but when I search for ansys NOT ansi I get nothing.
  I guess its because of Phonetic search, ansys is
  converted into ansi (
  that is NOT keyword) and nothing returns.
 
  How to handle this kind of problem.

 Find and remove occurrences of solr.PhoneticFilterFactory from your
 schema.xml file.




-- 
Thanks and Regards
Mohammad Shariq


Re: Regex replacement not working!

2011-06-29 Thread Juan Grande
Hi Samuele,

It's not clear to me whether your goal is to search on that field (for example,
salary_min:[100 TO 200]) or to show the transformed field to
the user (i.e., you want the result of the regex replacement to be included in
the search results).

If your goal is to show the results to the user, then (as Ahmet said in a
previous mail) it won't work, because the content of the documents is stored
verbatim. The analysis only affects the way that documents are searched.

If your goal is to search, could you please show us the query that you're
using to test the use case?

Thanks!

*Juan*



On Wed, Jun 29, 2011 at 10:02 AM, samuele.mattiuzzo samum...@gmail.comwrote:

 OK, but I'm not applying the filtering on the copyfields.
 This is how my schema looks:


 <field name="salary" type="text" indexed="true" stored="true" />
 <field name="salary_min" type="salary_min_text" indexed="true"
 stored="true"
 />
 <field name="salary_max" type="salary_max_text" indexed="true"
 stored="true"
 />

 <copyField source="salary" dest="salary_min" />
 <copyField source="salary" dest="salary_max" />

 and the two datatypes defined before. That's why I thought I could first use
 copyField to copy the value, then index them with my two datatypes'
 filtering...

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121497.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Regex replacement not working!

2011-06-29 Thread Michael Kuhlmann
On 29.06.2011 12:30, samuele.mattiuzzo wrote:
 <fieldType name="salary_min_text" class="solr.TextField">
   <analyzer type="index">
...

 This is the final version of my schema part, but what I get is this:


 <doc>
 <float name="score">1.0</float>
 <str name="salary">Negotiable</str>
 <str name="salary_max">Negotiable</str>
 <str name="salary_min">Negotiable</str>
 </doc>
...


The mistake is that you assume that the filter is applied to the stored result.
This is not true. Index filters only affect the index (as the name
says), not the stored contents.

Therefore, if you have copyFields that are stored, they'll always return
the same value as the original field.

Try inspecting your index data with Luke or the admin console. Then
you'll see whether your regex applies.

Greetings,
Kuli


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
My goal is/was storing the value into the field, and I get that I have to create
my own update handler.

I was trying a query with salary_min:[100 TO 200] and it's actually
working... Since I just need it for search, I'll stay with this solution.

Is the [100 TO 200] a performance killer? I remember reading something
around, but cannot find it again...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121625.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - search queries not returning results

2011-06-29 Thread Walter Closenfleight
Thanks to both of you, I understand now and am now getting the expected
results.

Cheers!

On Wed, Jun 29, 2011 at 2:21 AM, Ahmet Arslan iori...@yahoo.com wrote:


  I believe I am missing something very elementary. The
  following query
  returns zero hits:
 
  http://localhost:8983/solr/core0/select/?q=testabc

 With this URL, you are hitting the RequestHandler defined as
 <requestHandler name="..." default="true"/> in your
 core0/conf/solrconfig.xml.

  However, using solritas, it finds many results:
 
  http://localhost:8983/solr/core0/itas?q=testabc

 With this one, you are hitting the one registered as <requestHandler
 name="/itas">.

  Do you have any idea what the issue may be?

 Probably they have different default parameters configured.

 For example, (e)dismax versus the lucene query parser: the lucene query parser
 searches for testabc in your default field, while dismax searches it in all of
 the fields defined in the qf parameter.

 You can see the full parameter list by appending &echoParams=all to your
 search URL.



Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
 My goal is/was storing the value into the field, and I get that I have to
 create my own update handler.

 I was trying a query with salary_min:[100 TO 200] and it's actually
 working... Since I just need it for search, I'll stay with this solution.

 Is the [100 TO 200] a performance killer? I remember reading something
 around, but cannot find it again...

Please be aware that the range query is working on strings; it will return
unwanted results. String sorting and integer sorting are different.

If you are after range queries, you need to define your salary_min and
salary_max fields as trie-based types (tint, tdouble, etc.), and populate them
with the update processor or on the client side.
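
For instance, on a string field the range salary_min:[100 TO 200] also matches a
document whose value is 1000, because "100" <= "1000" <= "200" in lexicographic
order; on a tint field the comparison is numeric, so 1000 correctly falls outside
the range.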


Re: what is solr clustering component

2011-06-29 Thread Ahmet Arslan

 I just went through solr wiki page
 for clustering. But i am not getting what
 is the benefit of using clustering. Can anyone tell me what
 is actually
 clusering and what its use in indexing and searching.
 does it effect search results??
 Please reply

It is for search result clustering. Try the demo with the query word "jaguar":
http://search.carrot2.org/stable/search

It generates clusters and labels (on the left).



Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
OK, last question on the UpdateProcessor: can you please give me the steps to
implement my own?
I mean, I can push my custom processor into Solr's code, and then what?
I don't understand how I have to change solrconfig.xml and how I can bind
that to the updater I just wrote,
and also I don't understand how I have to change the schema.xml.

I'm sorry for this question, but I started working on Solr 5 days ago, and
for some things I really need a lot of documentation, and this isn't fully
covered anywhere.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121743.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
 ok, last question on the
 UpdateProcessor: can you please give me the steps to
 implement my own?
 i mean, i can push my custom processor in solr's code, and
 then what?
 i don't understand how i have to change the solrconf.xml
 and how can i bind
 that to the updater i just wrotea
 and also i don't understand how i do have to change the
 schema.xml
 
 i'm sorry for this question, but i started working on solr
 5 days ago and
 for some things i really need a lot of documentation, and
 this isn't fully
 covered anywhere

The conditional copyField example is a good place to start; you can use it as a
template.

You don't need to modify the Solr source code for this. You can write your
class, compile it, and put the resulting jar into the solrHome/lib directory.
How to register your new update processor in solrconfig.xml is explained here:

http://wiki.apache.org/solr/SolrPlugins#UpdateRequestProcessorFactory
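
As a rough sketch of what such a processor could look like for the salary case
(all class and field names are hypothetical, and the number extraction is
deliberately simplified to plain digit runs after stripping commas, so the "k"
and dot variants discussed earlier would still need extra normalization):

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class SalaryFieldsProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      private final Pattern num = Pattern.compile("[0-9]+");

      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object salary = doc.getFieldValue("salary");
        if (salary != null) {
          Matcher m = num.matcher(salary.toString().replace(",", ""));
          Integer min = null, max = null;
          while (m.find()) { // scan every number in the text
            int v = Integer.parseInt(m.group());
            if (min == null || v < min) min = v;
            if (max == null || v > max) max = v;
          }
          if (min != null) {
            doc.setField("salary_min", min); // numeric (e.g. tint) fields
            doc.setField("salary_max", max);
          }
        }
        super.processAdd(cmd);
      }
    };
  }
}

The factory would then be registered in an updateRequestProcessorChain in
solrconfig.xml and referenced from the update handler, as described on the
SolrPlugins wiki page above.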


Re: Regex replacement not working!

2011-06-29 Thread Adam Estrada
I have had the same problems with regex and I went with the regular pattern
replace filter rather than the charfilter. When I added it to the very end
of the chain, only then would it work...I am on Solr 3.2. I have also
noticed that the HTML filter factory is not working either. When I dump the
field that it's supposed to be working on, all the hyperlinks and everything
that you would expect to be stripped are still present.

Adam

On Wed, Jun 29, 2011 at 10:04 AM, samuele.mattiuzzo samum...@gmail.comwrote:

 ok, last question on the UpdateProcessor: can you please give me the steps
 to
 implement my own?
 i mean, i can push my custom processor in solr's code, and then what?
 i don't understand how i have to change the solrconf.xml and how can i bind
 that to the updater i just wrotea
 and also i don't understand how i do have to change the schema.xml

 i'm sorry for this question, but i started working on solr 5 days ago and
 for some things i really need a lot of documentation, and this isn't fully
 covered anywhere

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121743.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
Too bad it is still a TODO; that's why I was asking for some tips on
writing, compiling, registering, calling...


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
 too bad it is still in todo, that's
 why i was asking some for some tips on
 writing, compiling, registration, calling...

Here is general information about how to customize solr via plugins.
http://wiki.apache.org/solr/SolrPlugins

Here is the registration and code example.
http://wiki.apache.org/solr/UpdateRequestProcessor


Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Shawn Heisey
I have noticed a significant difference in filter cache warming times on 
my shards between 3.2 and 1.4.1.  What can I do to troubleshoot this?  
Please let me know what additional information you might need to look 
deeper.  I know this isn't enough.


It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15 
seconds to do an autowarm count of 4 on 3.2.  The only explicit warming 
query is *:*, sorted descending by post_date, a tlong field containing a 
UNIX timestamp, precisionStep 16.  The indexes are not entirely 
identical, but the new one did evolve from the old one.  Perhaps one of 
the experts might spot something that makes for much slower filter cache 
warming, or some way to look deeper if this seems wrong?  Is there a way 
to see the search URL bits that populated the cache?


Index differences: The new index has four extra small fields, is no 
longer removing stopwords, and has omitTermFreqAndPositions enabled on a 
significant number of fields.  Most of the fields are tokenized text, 
and now more than half of those don't have tf and tp enabled.  Naturally 
the largest text field where most of the matches happen still does have 
them enabled.


To increase reindex speed, the new index has a termIndexInterval of 
1024, the old one is at the default of 128.  In terms of raw size, the 
new index is less than one percent larger than the old one.  The old 
shards average out to 17.22GB, the new ones to 17.41GB.  Here's an 
overview of the differences of each type of file (comparing the huge 
optimized segment only, not the handful of tiny ones since) on the
index with the largest size gap, old value listed first:


fdt: 6317180127/6055634923 (4.1% decrease)
fdx: 76447972/75647412 (1% decrease)
fnm: 382, 338 (44 bytes!  woohoo!)
frq: 2828400926/2873249038 (1.5% increase)
nrm: 28367782/38223988 (35% increase)
prx: 2449154203/2684249069 (9.5% increase)
tii: 1686298/13329832 (790% increase)  
tis: 923045932/999294109 (8% increase)
tvd: 18910972/19111840 (1% increase)
tvf: 5867309063/5640332282 (3.9% decrease)
tvx: 151294820/152895940 (1% increase)

The tii and nrm files are the only ones that saw a significant size 
increase, but the tii file is MUCH bigger.


Thanks,
Shawn



Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Yonik Seeley
Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2
and then run some of the queries to see if you can figure out which are slower?

Do any of the queries have stopwords in fields where you now index
those?  If so, that could entirely account for the difference.

-Yonik
http://www.lucidimagination.com

On Wed, Jun 29, 2011 at 10:59 AM, Shawn Heisey s...@elyograg.org wrote:
 I have noticed a significant difference in filter cache warming times on my
 shards between 3.2 and 1.4.1.  What can I do to troubleshoot this?  Please
 let me know what additional information you might need to look deeper.  I
 know this isn't enough.

 It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15
 seconds to do an autowarm count of 4 on 3.2.  The only explicit warming
 query is *:*, sorted descending by post_date, a tlong field containing a
 UNIX timestamp, precisionStep 16.  The indexes are not entirely identical,
 but the new one did evolve from the old one.  Perhaps one of the experts
 might spot something that makes for much slower filter cache warming, or
 some way to look deeper if this seems wrong?  Is there a way to see the
 search URL bits that populated the cache?

 Index differences: The new index has four extra small fields, is no longer
 removing stopwords, and has omitTermFreqAndPositions enabled on a
 significant number of fields.  Most of the fields are tokenized text, and
 now more than half of those don't have tf and tp enabled.  Naturally the
 largest text field where most of the matches happen still does have them
 enabled.

 To increase reindex speed, the new index has a termIndexInterval of 1024,
 the old one is at the default of 128.  In terms of raw size, the new index
 is less than one percent larger than the old one.  The old shards average
 out to 17.22GB, the new ones to 17.41GB.  Here's an overview of the
 differences of each type of file (comparing the huge optimized segment only,
 not the handful of tiny ones since) on one the index with the largest size
 gap, old value listed first:

 fdt: 6317180127/6055634923 (4.1% decrease)
 fdx: 76447972/75647412 (1% decrease)
 fnm: 382, 338 (44 bytes!  woohoo!)
 frq: 2828400926/2873249038 (1.5% increase)
 nrm: 28367782/38223988 (35% increase)
 prx: 2449154203/2684249069 (9.5% increase)
 tii: 1686298/13329832 (790% increase)  
 tis: 923045932/999294109 (8% increase)
 tvd: 18910972/19111840 (1% increase)
 tvf: 5867309063/5640332282 (3.9% decrease)
 tvx: 151294820/152895940 (1% increase)

 The tii and nrm files are the only ones that saw a significant size
 increase, but the tii file is MUCH bigger.

 Thanks,
 Shawn




Solr just 'hangs' under load test - ideas?

2011-06-29 Thread Bob Sandiford
Hi, all.

I'm hoping someone has some thoughts here.

We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the 
getLuceneVersion() calls, but use luceneMatchVersion directly).

We're running in a Tomcat instance, 64 bit Java.  CATALINA_OPTS are: -Xmx7168m 
-Xms7168m -XX:MaxPermSize=256M

We're running 2 Solr cores, with the same schema.

We use SolrJ to run our searches from a Java app running in JBoss.

JBoss, Tomcat, and the Solr Index folders are all on the same server.

In case it's relevant, we're using JMeter as a load test harness.

We're running on Solaris, a 16 processor box with 48GB physical memory.

I've run a successful load test at a 100 user load (at that rate there are 
about 5-10 solr searches / second), and solr search responses were coming in 
under 100ms.

When I tried to ramp up, as far as I can tell, Solr is just hanging.  (We have 
some logging statements around the SolrJ calls - just before, we log how long 
our query construction takes, then we run the SolrJ query and log the search 
times.  We're getting a number of the query construction logs, but no 
corresponding search time logs).

Symptoms:
The Tomcat and JBoss processes show as well under 1% CPU, and they are still 
the top processes.  CPU states show around 99% idle.   RES usage for the two 
Java processes around 3GB each.  LWP under 120 for each.  STATE just shows as 
sleep.  JBoss is still 'alive', as I can get into a piece of software that 
talks to our JBoss app to get data.

We set things up to use log4j logging for Solr - the log isn't showing any 
errors or exceptions.

We're not indexing - just searching.

Back in January, we did load testing on a prototype, and had no problems 
(though that was Solr 1.4 at the time). It ramped up beautifully - bottlenecks 
were our apps, not Solr. What I'm benchmarking now is a descendant of 
that prototype - a bit more complex on searches and with more fields in the 
schema, but the same basic search logic as far as SolrJ usage goes.

Any ideas?  What else to look at?  Ringing any bells?

I can send more details if anyone wants specifics...

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.comhttp://www.sirsidynix.com/



[Announce] Solr 3.2 with RankingAlgorithm NRT capability, very high performance 1428 tps

2011-06-29 Thread Nagendra Nagarajayya

Hi!

I would like to announce that Solr 3.2 with RankingAlgorithm now has Near Real 
Time capability. The NRT performance is very high, 1428 
documents/sec [MBArtists 390k index]. The NRT functionality allows you 
to add documents without the IndexSearchers being closed or caches being 
cleared. A commit is not needed with the document update. Searches can 
run concurrently with document updates. No changes are needed except for 
enabling NRT through solrconfig.xml.


A new "visible" attribute has been introduced that allows one to tune the 
visibility of a document added to the index. The default is 150ms. This 
can be set to 0, enabling documents to become visible to searches as 
soon as they are added. The visible attribute is added as below:


<realtime visible="150">true</realtime>

With the visible attribute at 200ms, the performance is about 1428 TPS 
(document adds) on a dual-core Intel system with a 2GB heap, with searches 
running in parallel.


I have a wiki page that describes NRT performance in detail and can be 
accessed from here:


http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver3.2

You can download Solr 3.2 with RankingAlgorithm (NRT version) from here:

http://solr-ra.tgels.com


I would like to invite you to give this version a try as the performance 
is very high, comparable to the default load.


Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.com
http://rankingalgorithm.tgels.com



Field Value Highlighting

2011-06-29 Thread zarni aung
Hi,

I need help in figuring out the right configuration to perform highlighting
in Solr.  I can retrieve the matching documents plus the highlighted
matches.

I've used another tool called DTSearch, which would return the offset
positions of the field value to highlight. I've tried a few different
configurations, but it appears that Solr returns the actual matched documents
plus a section called "highlighting" with snippets (which can be configured to
have a length of X). I was wondering if there is a way to retrieve just the
actual documents with highlighted values, or a way to retrieve the offset
positions of the field values so that I can perform the highlighting myself.

I am using the SolrNet client to integrate with Solr. I've also tweaked the
configs and used the web admin interface to test highlighting, but have not yet
been successful.

Thank you in advance.

Z


Re: Solr just 'hangs' under load test - ideas?

2011-06-29 Thread Yonik Seeley
Can you get a thread dump to see what is hanging?

-Yonik
http://www.lucidimagination.com

On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford
bob.sandif...@sirsidynix.com wrote:
 Hi, all.

 I'm hoping someone has some thoughts here.

 We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the 
 getLuceneVersion() calls, but use luceneMatchVersion directly).

 We're running in a Tomcat instance, 64 bit Java.  CATALINA_OPTS are: 
 -Xmx7168m -Xms7168m -XX:MaxPermSize=256M

 We're running 2 Solr cores, with the same schema.

 We use SolrJ to run our searches from a Java app running in JBoss.

 JBoss, Tomcat, and the Solr Index folders are all on the same server.

 In case it's relevant, we're using JMeter as a load test harness.

 We're running on Solaris, a 16 processor box with 48GB physical memory.

 I've run a successful load test at a 100 user load (at that rate there are 
 about 5-10 solr searches / second), and solr search responses were coming in 
 under 100ms.

 When I tried to ramp up, as far as I can tell, Solr is just hanging.  (We 
 have some logging statements around the SolrJ calls - just before, we log how 
 long our query construction takes, then we run the SolrJ query and log the 
 search times.  We're getting a number of the query construction logs, but no 
 corresponding search time logs).

 Symptoms:
 The Tomcat and JBoss processes show as well under 1% CPU, and they are still 
 the top processes.  CPU states show around 99% idle.   RES usage for the two 
 Java processes around 3GB each.  LWP under 120 for each.  STATE just shows as 
 sleep.  JBoss is still 'alive', as I can get into a piece of software that 
 talks to our JBoss app to get data.

 We set things up to use log4j logging for Solr - the log isn't showing any 
 errors or exceptions.

 We're not indexing - just searching.

 Back in January, we did load testing on a prototype, and had no problems 
 (though that was Solr 1.4 at the time).  It ramped up beautifully - bottle 
 necks were our apps, not Solr.  What I'm benchmarking now is a descendent of 
 that prototyping - a bit more complex on searches and more fields in the 
 schema, but same basic search logic as far as SolrJ usage.

 Any ideas?  What else to look at?  Ringing any bells?

 I can send more details if anyone wants specifics...

 Bob Sandiford | Lead Software Engineer | SirsiDynix
 P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
 www.sirsidynix.comhttp://www.sirsidynix.com/




CopyField into another CopyField?

2011-06-29 Thread entdeveloper
In solr, is it possible to 'chain' copyfields so that you can copy the value
of one into another?

Example:

<field name="title" ... />
<field name="author" ... />
<field name="name" ... />
<field name="autocomplete" ... />
<field name="ac_spellcheck" ... />


<copyField source="title" dest="autocomplete" />
<copyField source="author" dest="autocomplete" />
<copyField source="name" dest="autocomplete" />
<copyField source="autocomplete" dest="ac_spellcheck" />

Point being, every time I add a new field to the autocomplete, I want it to
automatically also be added to ac_spellcheck without having to do it twice.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/CopyField-into-another-CopyField-tp3122408p3122408.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fuzzy Query Param

2011-06-29 Thread entdeveloper
I'm using Solr trunk. 

If it's Levenshtein/edit distance, that's great; that's what I want. It just
didn't seem to be officially documented anywhere, so I wanted to find out for
sure. Thanks for confirming.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3122418.html
Sent from the Solr - User mailing list archive at Nabble.com.


Sorting by value of field

2011-06-29 Thread Judioo
Hi

Say I have a field "type" in multiple documents which can be either
type:bike
type:boat
type:car
type:van


and I want to order a search to give me documents in the following order

type:car
type:van
type:boat
type:bike

Is there a way I can do this just using the sort method?

Thanks


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Shawn Heisey

On 6/29/2011 9:17 AM, Yonik Seeley wrote:

Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2
and then run some of the queries to see if you can figure out which are slower?

Do any of the queries have stopwords in fields where you now index
those?  If so, that could entirely account for the difference.


The query cache warms very quickly, it's the filter cache that's taking 
forever.  I'm not intimately familiar with what is being put in our 
filter queries by our webapp, but I'd be a little surprised if there are 
stopwords there.  A quick grep through solr logs (when I've turned it up 
to INFO) for the really common ones didn't reveal any.  People do type 
them in fairly frequently, but they go into q= ... fq values are 
constructed internally, not from what a user types, and as far as I 
know, they involve fields that have never had stopwords removed.


I will do some experimentation with your suggestions.

Thanks,
Shawn



RE: Sorting by value of field

2011-06-29 Thread Michael Ryan
You could try adding a new int field (like typeSort) that has the desired 
sort values. So when adding a document with type:car, also add typeSort:1; when 
adding type:van, also add typeSort:2; etc. Then you could do sort=typeSort 
asc to get them in your desired order.

I think this is also possible with custom function queries, but I've never done 
that.

-Michael
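
As a small illustrative sketch of the indexing side with SolrJ (the field names
and the mapping are assumptions for this example):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

public class TypeSortExample {
  // desired order: car, van, boat, bike
  private static final Map<String, Integer> TYPE_ORDER = new HashMap<String, Integer>();
  static {
    TYPE_ORDER.put("car", 1);
    TYPE_ORDER.put("van", 2);
    TYPE_ORDER.put("boat", 3);
    TYPE_ORDER.put("bike", 4);
  }

  public static SolrInputDocument buildDoc(String id, String type) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", id);
    doc.addField("type", type);
    doc.addField("typeSort", TYPE_ORDER.get(type)); // then query with sort=typeSort asc
    return doc;
  }
}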


Methods for preserving text entities?

2011-06-29 Thread Walter Closenfleight
We have some text entities in fields to index (and search) like so:

<field name="text">Solr is a really &myword; search engine!</field>

I would like to preserve/protect &myword; and not resolve it in the indexing
or search results.

What sort of methods have people used? I realize the results are returned in
XML format, so preserving these text entities may be hard. Are people
replacing the & character or doing something else?

Thanks in advance!


Re: Default schema - 'keywords' not multivalued

2011-06-29 Thread Tod

On 06/28/2011 12:04 PM, Chris Hostetter wrote:


: I'm streaming over the document content (presumably via tika) and its
: gathering the document's metadata which includes the keywords metadata field.
: Since I'm also passing that field from the DB to the REST call as a list (as
: you suggested) there is a collision because the keywords field is single
: valued.
:
: I can change this behavior using a copy field.  What I wanted to know is if
: there was a specific reason the default schema defined a field like keywords
: single valued so I could make sure I wasn't missing something before I changed
: things.

That file is just an example, you're absolutely free to change it to meet
your use case.

I'm not very familiar with Tika, but based on the comment in the example
config...

<!-- Common metadata fields, named specifically to match up with
  SolrCell metadata when parsing rich documents such as Word, PDF.
  Some fields are multiValued only because Tika currently may return
  multiple values for them.
-->

...i suspect it was intentional that that field is *not* multiValued (i
guess Tika always returns a single delimited value?) but if you have
multiple discrete values you want to send for your DB-backed data there is
no downside to changing that.

: While I'm at it, I'd REALLY like to know how to use DIH to index the metadata
: from the database while simultaneously streaming over the document content and
: indexing it.  I've never quite figured it out yet but I have to believe it is
: a possibility.

There's a TikaEntityProcessor that can be used to have Tika crunch the
data that comes from an entity and extract out specific fields, and it
can be used in combination with a JdbcDataSource and a BinFileDataSource
so that a field in your db data specifies the name of a file on disk to
use as the TikaEntity -- but i've personally never tried it

Here's a simple example someone posted last year that they got working...

http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html



-Hoss



Thanks Hoss, I'll just change the schema then.

The problem with TikaEntityProcessor is that this installation is still 
running v1.4.1, so I'll need to upgrade.


Any short and sweet instructions for upgrading to 3.2? I have a pretty 
straightforward Tomcat install; would just dropping in the new war suffice?



- Tod


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Shawn Heisey

On 6/29/2011 11:27 AM, Shawn Heisey wrote:

On 6/29/2011 9:17 AM, Yonik Seeley wrote:
Hmmm, you could comment out the query and filter caches on both 1.4.1 
and 3.2
and then run some of the queries to see if you can figure out which 
are slower?


Do any of the queries have stopwords in fields where you now index
those?  If so, that could entirely account for the difference.


The query cache warms very quickly; it's the filter cache that's 
taking forever.  I'm not intimately familiar with what is being put in 
our filter queries by our webapp, but I'd be a little surprised if 
there are stopwords there.  A quick grep through solr logs (when I've 
turned it up to INFO) for the really common ones didn't reveal any.  
People do type them in fairly frequently, but they go into q= ... fq 
values are constructed internally, not from what a user types, and as 
far as I know, they involve fields that have never had stopwords removed.


I should add that this happens only after the index has had at least a 
few hundred queries, when deletes are committed.  The delete process 
runs every ten minutes, and checks for document presence before issuing 
the delete, which avoids unnecessary commits.


Just now, three of the six shards had documents deleted, and they took 
29.07, 27.57, and 28.66 seconds to warm.  The 1.4.1 counterpart to the 
29.07 second one only took 4.78 seconds, and it did twice as many 
autowarm queries.  I know it's not my single *:* sorted warming query 
(firstSearcher and newSearcher), because on solr startup with either 
version, warm time is 0.01 seconds.  I have useColdSearcher set to false.


Thanks,
Shawn



Re: Methods for preserving text entities?

2011-06-29 Thread Walter Closenfleight
Ah, I think I just answered my own question, but I'd appreciate further
insight if you have it. I converted the & in &myword; to an &amp; so it
looks like this:

 <field name="text">Solr is a really &amp;myword; search engine!</field>



On Wed, Jun 29, 2011 at 12:40 PM, Walter Closenfleight 
walter.p.closenflei...@gmail.com wrote:

 We have some text entities in fields to index (and search) like so:

 <field name="text">Solr is a really &myword; search engine!</field>

 I would like to preserve/protect &myword; and not resolve it in the
 indexing or search results.

 What sort of methods have people used? I realize the results are returned
 in XML format, so preserving these text entities may be hard. Are people
 replacing the & character or doing something else?

 Thanks in advance!



Re: How to Create a weighted function (dismax or otherwise)

2011-06-29 Thread aster
Are there any best practices or preferred ways to accomplish what I am
trying to do? 

Do the params for defType, qf and bf belong in a Solr request handler? 

Is it possible to have the weights as variables so they can be tweaked until
we find the optimum balance in showing our results?

Thanks!



RE: Solr just 'hangs' under load test - ideas?

2011-06-29 Thread Bob Sandiford
OK - I figured it out.  It's not Solr at all (and I'm not really surprised).

In the prototype benchmarks, we used a different instance of tomcat than we're 
using for production load tests.  Our prototype tomcat instance had no 
maxThreads value set, so was using the default value of 200.  The production 
tomcat environment has a maxThreads value of 15 - we were just running out of 
threads and getting connection refused exceptions thrown when we ramped up the 
Solr hits past a certain level.

Thanks for considering, Yonik (and any others waiting to see any reply I 
made)...

(As others have said - this listserv is great!)

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
 Seeley
 Sent: Wednesday, June 29, 2011 12:18 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr just 'hangs' under load test - ideas?
 
 Can you get a thread dump to see what is hanging?
 
 -Yonik
 http://www.lucidimagination.com
 
 On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford
 bob.sandif...@sirsidynix.com wrote:
  Hi, all.
 
  I'm hoping someone has some thoughts here.
 
  We're running Solr 3.1 (with the patch for SolrQueryParser.java to
 not do the getLuceneVersion() calls, but use luceneMatchVersion
 directly).
 
  We're running in a Tomcat instance, 64 bit Java.  CATALINA_OPTS are:
 -Xmx7168m -Xms7168m -XX:MaxPermSize=256M
 
  We're running 2 Solr cores, with the same schema.
 
  We use SolrJ to run our searches from a Java app running in JBoss.
 
  JBoss, Tomcat, and the Solr Index folders are all on the same server.
 
  In case it's relevant, we're using JMeter as a load test harness.
 
  We're running on Solaris, a 16 processor box with 48GB physical
 memory.
 
  I've run a successful load test at a 100 user load (at that rate
 there are about 5-10 solr searches / second), and solr search responses
 were coming in under 100ms.
 
  When I tried to ramp up, as far as I can tell, Solr is just hanging.
  (We have some logging statements around the SolrJ calls - just before,
 we log how long our query construction takes, then we run the SolrJ
 query and log the search times.  We're getting a number of the query
 construction logs, but no corresponding search time logs).
 
  Symptoms:
  The Tomcat and JBoss processes show as well under 1% CPU, and they
 are still the top processes.  CPU states show around 99% idle.   RES
 usage for the two Java processes around 3GB each.  LWP under 120 for
 each.  STATE just shows as sleep.  JBoss is still 'alive', as I can get
 into a piece of software that talks to our JBoss app to get data.
 
  We set things up to use log4j logging for Solr - the log isn't
 showing any errors or exceptions.
 
  We're not indexing - just searching.
 
  Back in January, we did load testing on a prototype, and had no
 problems (though that was Solr 1.4 at the time).  It ramped up
 beautifully - bottle necks were our apps, not Solr.  What I'm
 benchmarking now is a descendent of that prototyping - a bit more
 complex on searches and more fields in the schema, but same basic
 search logic as far as SolrJ usage.
 
  Any ideas?  What else to look at?  Ringing any bells?
 
  I can send more details if anyone wants specifics...
 
  Bob Sandiford | Lead Software Engineer | SirsiDynix
  P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
  www.sirsidynix.com
 
 




Re: Custom Query Processing

2011-06-29 Thread Jamie Johnson
Anything is an option, but I think I found another way.  I am going to add a
new SearchComponent that reads some additional query parameters and builds
the appropriate filter.
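A rough sketch of that idea (not the actual code; "myfilter" is an invented parameter name). Registering the component via last-components makes its prepare() run after QueryComponent has parsed any fq params, so the filter list already exists:

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.lucene.search.Query;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;
  import org.apache.solr.search.QParser;

  public class ExtraFilterComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      String extra = rb.req.getParams().get("myfilter");  // invented param name
      if (extra == null) return;
      try {
        // Parse the parameter and attach it as a filter, so it constrains
        // the result set just like an fq would.
        Query q = QParser.getParser(extra, null, rb.req).getQuery();
        List<Query> filters = rb.getFilters();
        if (filters == null) {
          filters = new ArrayList<Query>();
          rb.setFilters(filters);
        }
        filters.add(q);
      } catch (Exception e) {
        throw new IOException("could not parse myfilter: " + e);
      }
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      // nothing to do here; the filter was attached in prepare()
    }

    @Override
    public String getDescription() { return "adds a filter from a custom parameter"; }
    @Override
    public String getSource() { return ""; }
    @Override
    public String getSourceId() { return ""; }
    @Override
    public String getVersion() { return ""; }
  }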


On Tue, Jun 28, 2011 at 2:07 PM, Dmitry Kan dmitry@gmail.com wrote:

 You should modify the SolrCore for this, if I'm not mistaken.

 Would extending LuceneQParserPlugin (solr 1.4) be an option for you?

 On Tue, Jun 28, 2011 at 12:25 AM, Jamie Johnson jej2...@gmail.com wrote:

  I have a need to take an incoming solr query and apply some additional
  constraints to it on the Solr end.  Our previous implementation used a
  QueryWrapperFilter along with some custom code to build a new Filter from
  the query provided.  How can we plug this filter into Solr?
 



 --
 Regards,

 Dmitry Kan



Looking for Custom Highlighting guidance

2011-06-29 Thread Jamie Johnson
I have a schema with a text field and a text_phonetic field and would like
to perform highlighting on them in such a way that the tokens that match are
combined.  What would be a reasonable way to accomplish this?


Re: Default schema - 'keywords' not multivalued

2011-06-29 Thread Chris Hostetter

: The problem with TikaEntityProcessor is that this installation is still
: running v1.4.1, so I'll need to upgrade.
: 
: Any short and sweet instructions for upgrading to 3.2?  I have a pretty
: straightforward Tomcat install; would just dropping in the new war suffice?

It should be fairly straightforward; check the instructions in 
CHANGES.txt for any potential gotchas.

I posted a write-up a while back on upgrading from 1.4 to 3.1 from a user 
perspective...

http://www.lucidimagination.com/blog/2011/04/01/solr-powered-isfdb-part-8/



-Hoss


Strip Punctuation From Field

2011-06-29 Thread Curtis Wilde
From all I've read, using something like PatternReplaceFilterFactory allows
you to replace / remove text in an index, but is there anything similar that
allows manipulation of the text in the associated stored field? For example, if I
pulled a status from Twitter like "Hi, this is a #hashtag." I would like to
remove the # from that string and use it for both the index and also the
field value that is returned from a query, i.e., "Hi, this is a hashtag."


Re: Looking for Custom Highlighting guidance

2011-06-29 Thread Mike Sokolov

Does the phonetic analysis preserve the offsets of the original text field?

If so, you should probably be able to hack up FastVectorHighlighter to 
do what you want.


-Mike

On 06/29/2011 02:22 PM, Jamie Johnson wrote:

I have a schema with a text field and a text_phonetic field and would like
to perform highlighting on them in such a way that the tokens that match are
combined.  What would be a reasonable way to accomplish this?

   


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Yonik Seeley
On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote:
 Just now, three of the six shards had documents deleted, and they took
 29.07, 27.57, and 28.66 seconds to warm.  The 1.4.1 counterpart to the 29.07
 second one only took 4.78 seconds, and it did twice as many autowarm
 queries.

Can you post the logs at the INFO level that cover the warming period?

-Yonik
http://www.lucidimagination.com


Writing SolrPlugin example

2011-06-29 Thread Ravi Prakash
Is there a Solr plugin example similar to Nutch's (
http://wiki.apache.org/nutch/WritingPluginExample)? I found the SolrPlugins
(http://wiki.apache.org/solr/SolrPlugins) wiki page, but it didn't have any
example code. It would be helpful if there was a concrete example that
explained how to write, compile and build a custom plugin.

Thanks,
Ravi


Re: conditionally update document on unique id

2011-06-29 Thread eks dev
Thanks Shalin!

Would you not expect

req.getSearcher().docFreq(t);

to be slightly faster? Or maybe even

req.getSearcher().getFirstMatch(t) != -1;

Which one should be faster? Any known side effects?




On Wed, Jun 29, 2011 at 1:45 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Wed, Jun 29, 2011 at 2:01 AM, eks dev eks...@yahoo.co.uk wrote:

 Quick question,
 Is there a way with Solr to conditionally update a document on unique
 id? Meaning: default add behavior if the id is not already in the index, and
 *not* touching the index if it is already there.

 Deletes are not important (no sync issues).

 I am asking because I noticed with deduplication turned on,
 index-files get modified even if I update the same documents again
 (same signatures).
 I am facing a very high dupe rate (40-50%), and the setup is going to be
 master-slave with a high commit rate (the requirement is to reduce
 propagation latency for updates). Having unnecessary index
 modifications is going to waste effort shipping the same information
 again and again.

 if there is no standard way, what would be the fastest way to check if
 Term exists in index from UpdateRequestProcessor?


 I'd suggest that you use the searcher's getDocSet with a TermQuery.

 Use the SolrQueryRequest#getSearcher so you don't need to worry about ref
 counting.

 e.g. req.getSearcher().getDocSet(new TermQuery(new Term(signatureField,
 sigString))).size();



 I intend to extend SignatureUpdateProcessor to prevent a document from
 propagating down the chain if this happens. Would that be a way to deal
 with it? I repeat, there are no deletes to cause synchronization
 headaches.


 Yes, that should be fine.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: conditionally update document on unique id

2011-06-29 Thread Yonik Seeley
On Wed, Jun 29, 2011 at 4:32 PM, eks dev eks...@googlemail.com wrote:
 req.getSearcher().getFirstMatch(t) != -1;

Yep, this is currently the fastest option we have.

-Yonik
http://www.lucidimagination.com
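A minimal sketch of an UpdateRequestProcessor built around that check, assuming the signature has already been computed and placed on the document by an earlier processor in the chain, and that the factory hands its SolrQueryRequest to the processor; the field name "signature" is an invented placeholder:

  import java.io.IOException;
  import org.apache.lucene.index.Term;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;

  public class SkipExistingProcessor extends UpdateRequestProcessor {
    private final SolrQueryRequest req;

    public SkipExistingProcessor(SolrQueryRequest req, UpdateRequestProcessor next) {
      super(next);
      this.req = req;
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      // "signature" stands in for whatever field holds the dedupe hash
      Object sig = cmd.getSolrInputDocument().getFieldValue("signature");
      if (sig != null) {
        Term t = new Term("signature", sig.toString());
        // getFirstMatch returns an internal docid, or -1 if the term is absent
        if (req.getSearcher().getFirstMatch(t) != -1) {
          return;  // already indexed: stop the chain, leave the index untouched
        }
      }
      super.processAdd(cmd);  // otherwise pass the add down the chain
    }
  }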


Re: conditionally update document on unique id

2011-06-29 Thread eks dev
Hi Yonik,
as this recommendation comes from you, I am not going to test it, you
are well known as a speed junkie ;)

When we are there (in SignatureUpdateProcessor), why is this code not
moved to the constructor, but remains in processAdd

...
Signature sig = (Signature)
req.getCore().getResourceLoader().newInstance(signatureClass);
sig.init(params);
...
Should we be expecting on the fly signatureClass changes / params? I
am still not all that familiar with solr life cycles... might be
stupid question.

Thanks,
eks


On Wed, Jun 29, 2011 at 10:36 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Wed, Jun 29, 2011 at 4:32 PM, eks dev eks...@googlemail.com wrote:
 req.getSearcher().getFirstMatch(t) != -1;

 Yep, this is currently the fastest option we have.

 -Yonik
 http://www.lucidimagination.com



Re: After the query component has the results, can I do more filtering on them?

2011-06-29 Thread arian487
bump



Re: Sorting by value of field

2011-06-29 Thread Judioo
Thanks,
Yes, this is the workaround I am currently using.
Still wondering if the sort method can be used alone.




On 29 June 2011 18:34, Michael Ryan mr...@moreover.com wrote:

 You could try adding a new int field (like typeSort) that has the desired
 sort values. So when adding a document with type:car, also add typeSort:1;
 when adding type:van, also add typeSort:2; etc. Then you could do
 sort=typeSort asc to get them in your desired order.

 I think this is also possible with custom function queries, but I've never
 done that.

 -Michael



Re: After the query component has the results, can I do more filtering on them?

2011-06-29 Thread Ahmet Arslan
 So I made a custom search component which runs right after the query
 component, and this custom component will update the score of each
 document based on some things (and no, I definitely can't use existing
 components).  I didn't see any easy way to just update the score, so
 what I currently do is something like this:

     DocList docList = rb.getResults().docList;
     float[] scores = new float[docList.size()];
     int[] docs = new int[docList.size()];
     int docCounter = 0;
     int maxScore = 0;

     DocIterator iter = docList.iterator();
     while (iter.hasNext()) {
         int userId = iter.nextDoc();
         int score = userIdsToScore.get(userId);

         scores[docCounter] = score;
         docs[docCounter] = userId;
         docCounter++;

         if (maxScore < score) {
             maxScore = score;
         }
     }
     docList = new DocSlice(0, docCounter, docs, scores, 0, maxScore);

 my userIdsToScore hashtable is how I'm determining the new score.  There
 are a few other things I'm doing but this is the gist.  I'm also not sure
 how to go about sorting this... but basically my question is, is this how
 I should be updating the score of the documents?

This way you are just updating the scores cosmetically, i.e., the documents are 
no longer sorted by score. Plus, with this approach you can only process at most 
start + rows documents; obtaining the whole result set is not an option.

If you have a mapping like userIdsToScore, maybe you can use 
ExternalFileField combined with FunctionQueries to influence the score.
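A sketch of that setup (field and file names invented; keyField must be your unique key field, and valType must name a float type from your schema):

  <!-- schema.xml -->
  <fieldType name="file" class="solr.ExternalFileField" keyField="id"
             defVal="0" stored="false" indexed="false" valType="pfloat"/>
  <field name="userScore" type="file"/>

The values live outside the index, in a file named external_userScore in the data directory, one key=value line per document (e.g. user1=9.5). The field can then be referenced from function queries, e.g. bf=userScore with dismax, and the file is re-read when a new searcher is opened, so the scores can be refreshed without reindexing.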


Building a facet search filter frontend in XSLT

2011-06-29 Thread Filype Pereira
Hi all,

I am looking for some help in building a front end facet filter using XSLT.

The code I use is: http://pastebin.com/xVv9La9j

On the image attached, the checkbox should be selected. (You clicked and
submitted the facet form; the URL changed.)

I can use xsl:if, but there's nothing in the XML that I can test against
before outputting the input checkbox.

Has anyone done anything similar?

I haven't seen any examples of building a facet search filter frontend in XSLT;
the example.xsl that comes with Solr is pretty basic. Are there any other
XSLT examples around that implement facet filters?

Thanks,

Filype


Re: How to Create a weighted function (dismax or otherwise)

2011-06-29 Thread Ahmet Arslan
 Are there any best practices or preferred ways to accomplish what I am
 trying to do?

People usually prefer multiplicative boosting. But in your case you want 
additive boosting. Dismax's bf is additive. 

There is also _val_ hook. http://wiki.apache.org/solr/SolrQuerySyntax

 Do the params for defType, qf and bf belong in a Solr request handler?

They can be defined in the defaults section of a request handler as well as via query 
parameters, e.g. q=test&pf=...&qf=...
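For example, a handler with such defaults could look like this in solrconfig.xml (the handler name and weights are invented):

  <requestHandler name="/weighted" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">name^10</str>
      <str name="bf">product(price,0.5)</str>
    </lst>
  </requestHandler>

Anything set in defaults can still be overridden per request, e.g. &bf=product(price,0.3).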

 Is it possible to have the weights as variables so they can be tweaked until
 we find the optimum balance in showing our results?

Yes, you can try different settings on the fly using query parameters.




Re: Building a facet search filter frontend in XSLT

2011-06-29 Thread lee carroll
Hi Filype,

in the response you should have a list of fq arguments, something like:

<arr name="fq">
  <str>field:facetValue</str>
  <str>field:FacetValue</str>
</arr>

use this to set your inputs to be selected / checked
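An untested sketch of that test; $field and $value stand for whatever facet field and value the checkbox represents, a single fq is echoed as a str rather than an arr, and the params appear in the response only when echoParams is enabled:

  <xsl:variable name="fqs"
      select="/response/lst[@name='responseHeader']/lst[@name='params']/arr[@name='fq']/str
            | /response/lst[@name='responseHeader']/lst[@name='params']/str[@name='fq']"/>

  <input type="checkbox" name="fq" value="{concat($field, ':', $value)}">
    <xsl:if test="$fqs = concat($field, ':', $value)">
      <xsl:attribute name="checked">checked</xsl:attribute>
    </xsl:if>
  </input>

In XPath 1.0, comparing a node-set to a string is true if any node matches, so the xsl:if fires whenever the checkbox's field:value pair appears among the fq parameters.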



On 29 June 2011 23:54, Filype Pereira pereira.fil...@gmail.com wrote:
 Hi all,
 I am looking for some help in building a front end facet filter using XSLT.
 The code I use is: http://pastebin.com/xVv9La9j
 On the image attached, the checkbox should be selected. (You clicked and
 submitted the facet form; the URL changed.)
 I can use xsl:if, but there's nothing in the XML that I can test against
 before outputting the input checkbox.
 Has anyone done anything similar?
 I haven't seen any examples of building a facet search filter frontend in XSLT;
 the example.xsl that comes with Solr is pretty basic. Are there any other
 XSLT examples around that implement facet filters?
 Thanks,
 Filype



Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Yonik Seeley
On Wed, Jun 29, 2011 at 3:28 PM, Yonik Seeley
yo...@lucidimagination.com wrote:

 On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote:
  Just now, three of the six shards had documents deleted, and they took
  29.07, 27.57, and 28.66 seconds to warm.  The 1.4.1 counterpart to the 29.07
  second one only took 4.78 seconds, and it did twice as many autowarm
  queries.

 Can you post the logs at the INFO level that covers the warming period?

OK, your filter queries have hundreds of terms in them (and that means
hundreds of term lookups, which use the term index).
Thus, your termIndexInterval change is the leading suspect for the
slowdown.  A termIndexInterval of 1024 means that
a term lookup will seek to the closest 1024th term and then call
next() until the desired term is found.  Hence instead of calling
next() an average of 64 times internally, it's now 512 times.
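For reference, the interval is set in the indexDefaults section of solrconfig.xml; 128 is the default, and each doubling roughly doubles the average cost of a term lookup:

  <indexDefaults>
    <termIndexInterval>128</termIndexInterval>
  </indexDefaults>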

Of course there is still a mystery about why your tii (which is the
term index) would be so much bigger instead of smaller...

-Yonik
http://www.lucidimagination.com


Multicore clustering setup problem

2011-06-29 Thread Walter Closenfleight
I had set up the clusteringComponent in solrconfig.xml for my first core. It
has been working fine, and now I want to get my next core working. I set up
the second core with the clustering component so that I could use it, use
solritas properly, etc., but Solr did not like the solrconfig.xml changes for
the second core. I'm getting this error when Solr is started or when I hit a
Solr-related URL:

SEVERE: org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.clustering.ClusteringComponent'

Should the clusteringComponent be set up in a shared configuration file
somehow or is there something else I am doing wrong?

Thanks in advance!


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Shawn Heisey

On 6/29/2011 7:50 PM, Yonik Seeley wrote:

OK, your filter queries have hundreds of terms in them (and that means
hundreds of term lookups, which use the term index).
Thus, your termIndexInterval change is the leading suspect for the
slowdown.  A termIndexInterval of 1024 means that
a term lookup will seek to the closest 1024th term and then call
next() until the desired term is found.  Hence instead of calling
next()
an average of 64 times internally, it's now 512 times.

Of course there is still a mystery about why your tii (which is the
term index) would be so much bigger instead of smaller...


It turns out I got the two indexes backwards; the smaller one was the 
new index.  I may have mixed up the indexes on some of the other files 
too, but they weren't much different, so I'm not going to try to figure 
out where any mistakes might be.


Earlier in the afternoon I figured this out, removed termIndexInterval 
from my config, and rebuilt the index.  I had originally put this in to 
speed up indexing.  The evidence I had available at the time told me 
that this goal was accomplished, but the rebuild actually went faster 
without the statement.  Warming times are now averaging under 10 seconds 
even with the warmup count back up to 8.  This is still slower than I 
would like, but it is a major improvement.  Even more important, I 
understand what happened.


I was thinking perhaps I might actually decrease the termIndexInterval 
value below the default of 128.  I know from reading the Hathi Trust 
blog that memory usage for the tii file is much more than the size of 
the file would indicate, but if I increase it from 13MB to 26MB, it 
probably would still be OK.


Are any index intervals for the other Lucene files configurable in a 
similar manner?  I know that screwing too much with the defaults can 
make things much worse, so I would be very careful with any adjustments, 
and try to fully understand why any performance gain or loss occurred.


Thanks,
Shawn



Re: what is solr clustering component

2011-06-29 Thread Romi
Thanks iorixxx, I changed my configuration to include clustering in search
results. In my XML-format search results I got a clusters tag; to show
these clusters in the search results, do I need to parse this XML?
And my second question: does clustering affect the indexes?

-
Thanks & Regards
Romi


Re: what is solr clustering component

2011-06-29 Thread Stanislaw Osinski

 And my second question: does clustering affect the indexes?


No, it doesn't. Clustering is performed only on the search results produced
by Solr; it doesn't change anything in the index.

Cheers,

Staszek


Re: Multicore clustering setup problem

2011-06-29 Thread Stanislaw Osinski
Hi,

Can you post the full stack trace? I'd need to know if it's
really org.apache.solr.handler.clustering.ClusteringComponent that's missing
or some other class that ClusteringComponent depends on.

Cheers,

Staszek

On Thu, Jun 30, 2011 at 04:19, Walter Closenfleight 
walter.p.closenflei...@gmail.com wrote:

 I had set up the clusteringComponent in solrconfig.xml for my first core.
 It has been working fine and now I want to get my next core working. I set up
 the second core with the clustering component so that I could use it, use
 solritas properly, etc. but Solr did not like the solrconfig.xml changes
 for the second core. I'm getting this error when Solr is started or when I hit
 a Solr related URL:

 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'org.apache.solr.handler.clustering.ClusteringComponent'

 Should the clusteringComponent be set up in a shared configuration file
 somehow or is there something else I am doing wrong?

 Thanks in advance!