Performance issues with facets and filter query exclusions

2014-07-18 Thread Hayden Muhl
I was doing some performance testing on facet queries and I noticed
something odd. Most queries tended to be under 500 ms, but every so often
the query time jumped to something like 5000 ms.

q=*:*&fq={!tag=productBrandId}productBrandId:(156 1227)&facet.field={!ex=productBrandId}productBrandId&facet=true

I noticed that the drop in performance happened any time I had a filter
query tag match up with a facet exclusion. If I had a query where the tags
and exclusions differ, like the following...

q=*:*&fq={!tag=foo}foo:123&facet.field={!ex=bar}bar&facet=true

... then performance was fine. Any time a tag and ex parameter matched,
I would see the drop in performance.

I worked around this by constructing individual queries for each facet I
wanted to construct and not using the filter query exclusion feature at
all. Running the multiple separate facet queries ended up being much faster
than using the filter query exclusion feature.
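The two approaches can be sketched roughly as follows (field names are taken from the examples in this thread; the helper names and the particular set of filters are hypothetical):

```python
# Sketch of the workaround described above: instead of one request that
# uses {!tag=...}/{!ex=...}, issue one plain facet request per field, with
# that field's own filter left out entirely.
from urllib.parse import urlencode

def tagged_query(filters, facet_fields):
    """Single request using filter tagging and facet exclusion."""
    params = [("q", "*:*"), ("facet", "true")]
    for field, value in filters.items():
        params.append(("fq", "{!tag=%s}%s:(%s)" % (field, field, value)))
    for field in facet_fields:
        params.append(("facet.field", "{!ex=%s}%s" % (field, field)))
    return urlencode(params)

def workaround_queries(filters, facet_fields):
    """One request per facet field, omitting that field's own filter."""
    queries = []
    for facet_field in facet_fields:
        params = [("q", "*:*"), ("facet", "true"),
                  ("facet.field", facet_field)]
        for field, value in filters.items():
            if field != facet_field:  # drop the excluded filter entirely
                params.append(("fq", "%s:(%s)" % (field, value)))
        queries.append(urlencode(params))
    return queries

filters = {"productBrandId": "156 1227"}
print(tagged_query(filters, ["productBrandId"]))
print(workaround_queries(filters, ["productBrandId"]))
```

Each workaround query is an ordinary cached-filter request, which is why running several of them can still beat one exclusion query.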

I wasn't able to find anything on the bug tracker about this. Does anyone
have any hints about what could be causing this? I'm on Solr 4.4 and have
not tested newer versions, so I don't know if this problem has been
addressed.

- Hayden


Re: Performance issues with facets and filter query exclusions

2014-07-18 Thread Hayden Muhl
That query is representative of some of the queries in my test, but I
didn't notice any correlation between using the match all docs query and
poor query performance. Here's another example of a query that took longer
than expected.

qt=en&q=dress green leather&fq=userId:(383)&fq={!tag=productRetailerId}productRetailerId:(83 644)&fq={!tag=productCanonicalColorId}productCanonicalColorId:(16 7 13)&facet.field={!ex=productRetailerId}productRetailerId&facet=true&facet.mincount=1&facet.limit=100

This query took over five seconds. Here I'm just doing one facet on the
field productRetailerId. For the actual search results, Solr will have to
do an intersection of four queries: "dress green leather", userId:(...),
productRetailerId:(...), and productCanonicalColorId:(...). For the
facet, it will have to compute an intersection of the same queries,
excluding the productRetailerId:(...) query.

To your point about the match all docs query, there are plenty of examples
which ran quickly with a match all docs query. I've put together a Google
spreadsheet with some of my test results.

https://docs.google.com/spreadsheets/d/149k6_CM6JuGMbqhZIfiJetTxDxXcWdKeGU6FomjwO9Y/edit?usp=sharing

I ran another test with some simplified facet queries. In these examples, I
only did one facet at a time, and never faceted on a field I was running a
filter query on. These are examples of queries I would run to get the same
functionality as filter query exclusion.

https://docs.google.com/spreadsheets/d/1xzS2sbb6btyvydD6Q5X8ecD82DE92Pls-DbK2nwdTvc/edit?usp=sharing

Most of these queries run in under 100 ms, but even the slowest tend to be
under 500 ms. I can reproduce the functionality of the five second query at
the beginning of this email by running two of these simplified queries.

There are examples in my first spreadsheet where a filter exclusion is
happening and the query performs just fine. However, it seems that all slow
queries have a filter exclusion, and no queries without a filter exclusion
have query times longer than a second.

For reference, all these tests were done on a non-optimized core with about
80 million records, and no indexing happening. Each of the spreadsheets
represents performance on a warmed core. I warmed the core by running the
test for about a minute before gathering this data. The spreadsheets are
output from Solr Meter. I can post logs if that's easier to look at.

- Hayden


On Fri, Jul 18, 2014 at 11:48 AM, Yonik Seeley yo...@heliosearch.com
wrote:

 On Fri, Jul 18, 2014 at 2:10 PM, Hayden Muhl haydenm...@gmail.com wrote:
  I was doing some performance testing on facet queries and I noticed
  something odd. Most queries tended to be under 500 ms, but every so often
  the query time jumped to something like 5000 ms.
 
  q=*:*&fq={!tag=productBrandId}productBrandId:(156 1227)&facet.field={!ex=productBrandId}productBrandId&facet=true
 
  I noticed that the drop in performance happened any time I had a filter
  query tag match up with a facet exclusion.

 Is this an actual query that took a long time, or just an example?
 My guess is that q is actually much more expensive.

 If a filter is excluded, the base DocSet for faceting must be re-computed.
 This involves intersecting all the DocSets for the other filters not
 excluded (which should all be cached) with the DocSet of the query
 (which won't be cached and will need to be generated).  That last step
 can be expensive, depending on the query.

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data
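The re-computation Yonik describes can be modeled roughly like this, with plain Python sets standing in for Solr's DocSets (the doc ids and filter names are made up for illustration):

```python
# Toy model of faceting with a filter exclusion. With no exclusion, the
# base DocSet is the cached result of q AND all fq's. When one fq is
# excluded, the base must be rebuilt: re-run the main query (not cached,
# potentially expensive) and intersect it with the remaining filters
# (cached, cheap). That re-run of q is the costly step.
def base_docset(query_docs, filter_docs, exclude=None):
    base = set(query_docs)              # re-running q: the expensive part
    for name, docs in filter_docs.items():
        if name == exclude:
            continue                    # the tagged/excluded filter is skipped
        base &= docs                    # cached filter DocSets: cheap
    return base

query_docs = {1, 2, 3, 4, 5, 6}
filters = {
    "productRetailerId": {2, 3, 4},
    "productCanonicalColorId": {3, 4, 5},
}
normal = base_docset(query_docs, filters)
for_facet = base_docset(query_docs, filters, exclude="productRetailerId")
print(normal, for_facet)
```

In real Solr the cost is dominated by regenerating the query DocSet over the whole index, which the set model cannot show, but the control flow is the same.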



Re: Strategies for effective prefix queries?

2014-07-16 Thread Hayden Muhl
A copy field does not address my problem, and this has nothing to do with
stored fields. This is a query parsing problem, not an indexing problem.

Here's the use case.

If someone has a username like "bob-smith", I would like it to match
prefixes of "bo" and "sm". I tokenize the username into the tokens "bob"
and "smith". Everything is fine so far.

If someone enters "bo sm" as a search string, I would like "bob-smith" to
be one of the results. The query to do this is straightforward:
username:bo* username:sm*. Here's the problem. In order to construct that
query, I have to tokenize the search string "bo sm" **on the client**. I
don't want to reimplement tokenization on the client. Is there any way to
give Solr the string "bo sm", have Solr do the tokenization, then treat
each token like a prefix?
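The client-side workaround in question looks roughly like this (the regex is only an approximation of Solr's tokenizer, which is exactly the problem being described):

```python
import re

def prefix_query(field, raw):
    """Poor man's tokenizer: split on non-alphanumerics, make each token a prefix."""
    tokens = [t for t in re.split(r"[^a-zA-Z0-9]+", raw.lower()) if t]
    return " ".join("%s:%s*" % (field, t) for t in tokens)

print(prefix_query("username", "bo sm"))       # username:bo* username:sm*
print(prefix_query("username", "bob-smith"))   # username:bob* username:smith*
```

As soon as the real field analysis does anything beyond splitting on punctuation (case folding aside), this client-side copy silently drifts out of sync with the index.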


On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 So copyField it to another and apply alternative processing there. Use
 eDismax to search both. No need to store the copied field, just index it.

 Regards,
  Alex
 On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote:

  Both fields? There is only one field here: username.
 
 
  On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch 
 arafa...@gmail.com
  
  wrote:
 
   Search against both fields (one split, one not split)? Keep original
   and tokenized form? I am doing something similar with class name
   autocompletes here:
  
  
 
 https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
  
   Regards,
  Alex.
   Personal: http://www.outerthoughts.com/ and @arafalov
   Solr resources: http://www.solr-start.com/ and @solrstart
   Solr popularizers community:
 https://www.linkedin.com/groups?gid=6713853
  
  
   On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com
  wrote:
I'm working on using Solr for autocompleting usernames. I'm running
  into
   a
problem with the wildcard queries (e.g. username:al*).
   
We are tokenizing usernames so that a username like solr-user will
 be
tokenized into solr and user, and will match both sol and use
prefixes. The problem is when we get solr-u as a prefix, I'm having
  to
split that up on the client side before I construct a query
   username:solr*
username:u*. I'm basically using a regex as a poor man's tokenizer.
   
Is there a better way to approach this? Is there a way to tell Solr
 to
tokenize a string and use the parts as prefixes?
   
- Hayden
  
 



Re: Strategies for effective prefix queries?

2014-07-16 Thread Hayden Muhl
Thank you Jorge. I didn't know about that filter. It's just what I was
looking for.

- Hayden


On Wed, Jul 16, 2014 at 4:35 PM, Jorge Luis Betancourt Gonzalez 
jlbetanco...@uci.cu wrote:

 Perhaps what you’re trying to do could be addressed by using the
 EdgeNGramFilterFactory filter? For query suggestions I’m using a very
 similar approach, this is an extract of the configuration I’m using:

 <tokenizer class="solr.StandardTokenizerFactory"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
 generateNumberParts="1" catenateWords="0" catenateNumbers="0"
 catenateAll="0" splitOnCaseChange="1"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.EdgeNGramFilterFactory" maxGramSize="10"
 minGramSize="1"/>

 Basically this allows you to get partial matches from any part of the
 string. Let's say the field gets this content at index time: "A brown
 fox"; this document will be matched by the query "bro", for instance. My
 personal recommendation is to use this in a separate field that gets
 populated through a copyField; this way you could apply different boosts.
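What the EdgeNGramFilterFactory does at index time can be sketched in a few lines (a plain-Python approximation, not the Lucene implementation):

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Emit the leading min_gram..max_gram character prefixes of a token."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

# Indexing "brown" stores every prefix as its own term, so a query for the
# literal term "bro" matches -- no wildcard needed at query time.
print(edge_ngrams("brown"))   # ['b', 'br', 'bro', 'brow', 'brown']
```

Because every prefix is indexed as a real term, the query side just runs the raw user input through the normal analyzer, which is what removes the need for client-side tokenization.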

 Greetings,

 On Jul 16, 2014, at 2:00 PM, Hayden Muhl haydenm...@gmail.com wrote:

  A copy field does not address my problem, and this has nothing to do with
  stored fields. This is a query parsing problem, not an indexing problem.
 
  Here's the use case.
 
  If someone has a username like bob-smith, I would like it to match
  prefixes of bo and sm. I tokenize the username into the tokens bob
  and smith. Everything is fine so far.
 
  If someone enters bo sm as a search string, I would like bob-smith to
  be one of the results. The query to do this is straight forward,
  username:bo* username:sm*. Here's the problem. In order to construct
 that
  query, I have to tokenize the search string bo sm **on the client**. I
  don't want to reimplement tokenization on the client. Is there any way to
  give Solr the string bo sm, have Solr do the tokenization, then treat
  each token like a prefix?
 
 
  On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch 
 arafa...@gmail.com
  wrote:
 
  So copyField it to another and apply alternative processing there. Use
  eDismax to search both. No need to store the copied field, just index
 it.
 
  Regards,
  Alex
  On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote:
 
  Both fields? There is only one field here: username.
 
 
  On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch 
  arafa...@gmail.com
 
  wrote:
 
  Search against both fields (one split, one not split)? Keep original
  and tokenized form? I am doing something similar with class name
  autocompletes here:
 
 
 
 
 https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
 
  Regards,
Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources: http://www.solr-start.com/ and @solrstart
  Solr popularizers community:
  https://www.linkedin.com/groups?gid=6713853
 
 
  On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com
  wrote:
  I'm working on using Solr for autocompleting usernames. I'm running
  into
  a
  problem with the wildcard queries (e.g. username:al*).
 
  We are tokenizing usernames so that a username like solr-user will
  be
  tokenized into solr and user, and will match both sol and use
  prefixes. The problem is when we get solr-u as a prefix, I'm having
  to
  split that up on the client side before I construct a query
  username:solr*
  username:u*. I'm basically using a regex as a poor man's tokenizer.
 
  Is there a better way to approach this? Is there a way to tell Solr
  to
  tokenize a string and use the parts as prefixes?
 
  - Hayden
 
 
 




Re: Strategies for effective prefix queries?

2014-07-15 Thread Hayden Muhl
Both fields? There is only one field here: username.


On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Search against both fields (one split, one not split)? Keep original
 and tokenized form? I am doing something similar with class name
 autocompletes here:

 https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com wrote:
  I'm working on using Solr for autocompleting usernames. I'm running into
 a
  problem with the wildcard queries (e.g. username:al*).
 
  We are tokenizing usernames so that a username like solr-user will be
  tokenized into solr and user, and will match both sol and use
  prefixes. The problem is when we get solr-u as a prefix, I'm having to
  split that up on the client side before I construct a query
 username:solr*
  username:u*. I'm basically using a regex as a poor man's tokenizer.
 
  Is there a better way to approach this? Is there a way to tell Solr to
  tokenize a string and use the parts as prefixes?
 
  - Hayden



Strategies for effective prefix queries?

2014-07-14 Thread Hayden Muhl
I'm working on using Solr for autocompleting usernames. I'm running into a
problem with the wildcard queries (e.g. username:al*).

We are tokenizing usernames so that a username like "solr-user" will be
tokenized into "solr" and "user", and will match both "sol" and "use"
prefixes. The problem is when we get "solr-u" as a prefix: I'm having to
split that up on the client side before I construct the query
username:solr* username:u*. I'm basically using a regex as a poor man's
tokenizer.

Is there a better way to approach this? Is there a way to tell Solr to
tokenize a string and use the parts as prefixes?

- Hayden


Wildcard searches and tokenization

2014-03-03 Thread Hayden Muhl
I'm working on a user name autocomplete feature, and am having some issues
with the way we are tokenizing user names.

We're using the StandardTokenizerFactory to tokenize user names, so
"foo-bar" gets split into two tokens. We take input from the user and use
it as a prefix to search on the user name. This means wildcard searches of
fo* and ba* both return "foo-bar", which is what we want.

We have a problem when someone types in "foo-b" as a prefix. I would like
to split this into "foo" and "b", then use each as a prefix in a wildcard
search. Is there an easy way to tell Solr, "Tokenize this, then do a prefix
search"?

I've written at least one QParserPlugin, so that's an option. Hopefully
there's an easier way I'm unaware of.

- Hayden


Re: java.lang.LinkageError when using custom filters in multiple cores

2013-09-23 Thread Hayden Muhl
Upgraded to 4.4.0, and that seems to have fixed it.

The transition was mostly painless once I realized that the interface to
the AbstractAnalysisFactory had changed between 4.2 and 4.3.

Thanks.

- Hayden


On Sat, Sep 21, 2013 at 3:28 AM, Alexandre Rafalovitch
arafa...@gmail.comwrote:

 Did you try latest solr? There was a library loading bug with multiple
 cores. Not a perfect match to your description but close enough.

 Regards,
 Alex
 On 21 Sep 2013 02:28, Hayden Muhl haydenm...@gmail.com wrote:

  I have two cores favorite and user running in the same Tomcat
 instance.
  In each of these cores I have identical field types text_en, text_de,
  text_fr, and text_ja. These fields use some custom token filters I've
  written. Everything was going smoothly when I only had the favorite
 core.
  When I added the user core, I started getting java.lang.LinkageErrors
  being thrown when I start up Tomcat. The error always happens with one of
  the classes I've written, but it's unpredictable which class the
  classloader chokes on.
 
  Here's the really strange part. I comment out the text_* fields in the
  user core and the errors go away (makes sense). I add text_en back in,
 no
  error (OK). I add text_fr back in, no error (OK). I add text_de back
  in, and I get the error (ah ha!). I comment text_de out again, and I
  still get the same error (wtf?).
 
  I also put a break point at
 
 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:424),
  and when I load everything one at a time, I don't get any errors.
 
  I'm running Tomcat 5.5.28, Java version 1.6.0_39 and Solr 4.2.0. I'm
  running this all within Eclipse 1.5.1 on a mac. I have not tested this
 on a
  production-like system yet.
 
  Here's an example stack trace. In this case it was one of my Japanese
  filters, but other times it will choke on my synonym filter, or my
 compound
  word filter. The specific class it fails on doesn't seem to be relevant.
 
  SEVERE: null:java.lang.LinkageError: loader (instance of
   org/apache/catalina/loader/WebappClassLoader): attempted  duplicate
 class
  definition for name: com/shopstyle/solrx/KatakanaVuFilterFactory
  at java.lang.ClassLoader.defineClass1(Native Method)
  at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
  at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
  at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
  at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
  at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
  at
 
 
 org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:904)
  at
 
 
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1353)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:295)
  at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:249)
  at
 
 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:424)
  at
 
 
 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:462)
  at
 
 
 org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:89)
  at
 
 
 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
  at
 
 
 org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:392)
  at
 
 
 org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:86)
  at
 
 
 org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
  at
 
 
 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:373)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:121)
  at
 
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1018)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:680)
 
  - Hayden
 



java.lang.LinkageError when using custom filters in multiple cores

2013-09-20 Thread Hayden Muhl
I have two cores, "favorite" and "user", running in the same Tomcat
instance. In each of these cores I have identical field types text_en,
text_de, text_fr, and text_ja. These field types use some custom token
filters I've written. Everything was going smoothly when I only had the
"favorite" core. When I added the "user" core, I started getting
java.lang.LinkageErrors thrown when I start up Tomcat. The error always
happens with one of the classes I've written, but it's unpredictable which
class the classloader chokes on.

Here's the really strange part. I comment out the text_* fields in the
"user" core and the errors go away (makes sense). I add text_en back in,
no error (OK). I add text_fr back in, no error (OK). I add text_de back
in, and I get the error (ah ha!). I comment text_de out again, and I
still get the same error (wtf?).

I also put a break point at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:424),
and when I load everything one at a time, I don't get any errors.

I'm running Tomcat 5.5.28, Java version 1.6.0_39 and Solr 4.2.0. I'm
running this all within Eclipse 1.5.1 on a mac. I have not tested this on a
production-like system yet.

Here's an example stack trace. In this case it was one of my Japanese
filters, but other times it will choke on my synonym filter, or my compound
word filter. The specific class it fails on doesn't seem to be relevant.

SEVERE: null:java.lang.LinkageError: loader (instance of
org/apache/catalina/loader/WebappClassLoader): attempted duplicate class
definition for name: com/shopstyle/solrx/KatakanaVuFilterFactory
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at
org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:904)
at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1353)
at java.lang.ClassLoader.loadClass(ClassLoader.java:295)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:424)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:462)
at
org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:89)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
at
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:392)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:86)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:373)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:121)
at
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1018)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)

- Hayden


PositionLengthAttribute - Does it do anything at all?

2013-04-18 Thread Hayden Muhl
I've been playing around with the PositionLengthAttribute for a few days,
and it doesn't seem to have any effect at all.

I'm aware that position length is not stored in the index, as explained in
this blog post.

http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html

However, even when used at query time it doesn't seem to do anything. Let's
take the following token stream as an example.

text: he
posInc: 1
posLen: 1

text: cannot
posInc: 1
posLen: 2

text: can
posInc: 0
posLen: 1

text: not
posInc: 1
posLen: 1

text: help
posInc: 1
posLen: 1

If we were to construct this graph of tokens, it should match the phrases
"he can not help" and "he cannot help". According to my testing, it will
match the phrases "he can not help" and "he cannot not help", because the
position length is entirely ignored and treated as if it is always 1.
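The behavior described can be reproduced with a toy model that assigns each token an absolute position from posInc alone and drops posLen, which appears to be what phrase matching effectively does (the matching function here is an illustration, not Lucene's algorithm):

```python
# Tokens as (text, posInc, posLen), taken from the stream above. Positions
# are accumulated from posInc only; posLen is ignored on purpose, to model
# the observed behavior. A phrase matches if its words can occupy
# consecutive positions.
stream = [("he", 1, 1), ("cannot", 1, 2), ("can", 0, 1),
          ("not", 1, 1), ("help", 1, 1)]

positions = {}
pos = -1
for text, pos_inc, _pos_len in stream:   # _pos_len deliberately unused
    pos += pos_inc
    positions.setdefault(text, set()).add(pos)

def phrase_matches(words):
    """True if some assignment of consecutive positions covers the phrase."""
    def extend(i, prev):
        if i == len(words):
            return True
        return any(p == prev + 1 and extend(i + 1, p)
                   for p in positions.get(words[i], ()))
    return any(extend(1, p) for p in positions.get(words[0], ()))

print(phrase_matches("he can not help".split()))      # True
print(phrase_matches("he cannot not help".split()))   # True (the bug)
print(phrase_matches("he cannot help".split()))       # False: posLen=2 lost
```

With posLen honored, "cannot" would span two positions, "he cannot help" would match, and "he cannot not help" would not; ignoring posLen inverts exactly those two results.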

Am I misunderstanding how these attributes work?

- Hayden


Re: What to expect when testing Japanese search index

2013-03-23 Thread Hayden Muhl
A search for a single character will only return hits if that character
makes up a whole word, and only if the tokenizer recognizes that character
as a word. It's just like in other languages, where a search for p won't
return documents with the word apple.

If I were you, I would go into the Solr admin UI and start playing around
with the analysis tool. You can paste a phrase in there and it will show
you what tokens that phrase will be broken into. I think that will give you
a better understanding of why you are getting these search results.

You also don't mention which version of Solr you are using. Can you also
include the definition of your text_ja field type?

- Hayden


On Thu, Mar 21, 2013 at 7:01 AM, Van Tassell, Kristian 
kristian.vantass...@siemens.com wrote:

 I’m trying to set up our search index to handle Japanese data, and while
 some searches yield results, others do not. This is especially true the
 smaller the search term.

 For example, searching for this term: 更

 Yields no results even though I know it appears in the text. I understand
 that this character alone may not be a full word without further context,
 and thus, perhaps it should not return a hit(?).

 What about putting a star after it? 更*

 Should that return hits? I had been using the text_ja boilerplate setup,
 but wonder if a bigram (text_cjk) may work better for my non-Japanese
 speaking testing phase. Thanks in advance for any insight!




Global .properties file for all Solr cores?

2013-02-08 Thread Hayden Muhl
I've read the documentation about how you can configure a Solr core with a
properties file. Is there any way to specify a properties file that will
apply to all cores running on a server?

Here's my scenario. I have a Solr setup with two cores, "foo" and "bar".
I want to enable replication using properties, as suggested on the wiki.

http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

I would like my master/slave settings to apply to all cores on a box, but I
would still like to have separate solrcore.properties files so that other
properties can be set per core. In other words, I would like a setup like
this, with three files.

#solr.properties
# These properties should apply to all cores on a box
enable.master=true
enable.slave=false

#foo.solrcore.properties
# These properties only apply to core foo
filterCache.size=16384

#bar.solrcore.properties
# These properties only apply to core bar
filterCache.size=2048

What I'm trying to avoid is having to duplicate the global values across
all solrcore.properties files.
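Absent built-in support for a global file, the duplication could be avoided with a small deploy-time merge step along these lines (file names follow the layout above; the script itself is entirely hypothetical):

```python
# Merge a shared solr.properties into each per-core solrcore.properties at
# deploy time, so global values live in one place. Per-core values win on
# conflict. Shown on strings here; a real script would read/write files.
def read_props(text):
    """Minimal .properties parser: key=value lines, '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

def merged(global_text, core_text):
    props = read_props(global_text)
    props.update(read_props(core_text))   # per-core overrides global
    return "\n".join("%s=%s" % kv for kv in sorted(props.items()))

global_props = "enable.master=true\nenable.slave=false\n"
foo_props = "filterCache.size=16384\n"
print(merged(global_props, foo_props))
```

The merge runs once per core at deploy time, so Solr itself only ever sees one ordinary solrcore.properties per core and no CATALINA_OPTS tricks are needed.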

I've looked into having a .properties file that applies to the whole
context, but we are running Tomcat, which does not make this easy. It seems
the only way to do this with Tomcat is with the CATALINA_OPTS environment
variable, and I would rather duplicate values across solrcore.properties
files than use CATALINA_OPTS.

- Hayden