Different behavior (bug?) for RegExTransformer in Solr5

2015-05-26 Thread Carrie Coy
I'm experimenting with Solr5 (5.1.0 1672403 - timpotter - 2015-04-09 
10:37:54).  In my custom DIH, I use a RegExTransformer to load several 
columns, which may or may not be present.  If present, the regexp 
matches and the data loads correctly in both Solr4 and 5. If not present 
and the regexp fails, the column is empty in Solr 4.   But in Solr5 it 
contains the original string to be matched.


In other words, in Solr 5.1.0, when the regex fails to match (so there is 
no 'replaceWith' value to produce), the column appears to revert to the 
original source string.


Example:

Column 'data' contains:   column1:xxx,column3:yyy

DIH regexp:
<field column="column1" regex="^.*column1:(.*?),.*$" replaceWith="$1" sourceColName="data" />
<field column="column2" regex="^.*column2:(.*?),.*$" replaceWith="$1" sourceColName="data" />
<field column="column3" regex="^.*column3:(.*?),.*$" replaceWith="$1" sourceColName="data" />


solr4:
column1: xxx
column2:
column3: yyy

solr5:
column1: xxx
column2: column1:xxx,column3:yyy
column3: yyy
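
A minimal standalone sketch of the difference, using plain java.util.regex 
rather than DIH internals (class and variable names are illustrative only): 
Solr 4 effectively treated a non-matching regex as "no value for this 
column", while 5.1 appears to fall back to the source string.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGuardSketch {
    public static void main(String[] args) {
        String data = "column1:xxx,column3:yyy";
        Matcher m = Pattern.compile("^.*column2:(.*?),.*$").matcher(data);
        // Solr 4 behavior: no match means the column stays empty.
        // Solr 5.1 behavior (as reported above): no match leaves the
        // original source string in the column.
        String column2 = m.matches() ? m.group(1) : null;
        System.out.println(column2); // prints: null
    }
}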


Re: lukeall.jar for Solr4r?

2012-11-06 Thread Carrie Coy
Thank you very much for taking the time to do this.   This version is 
able to read the index files, but there is at least one issue:


The home screen reports "ERROR: can't count terms per field" and this 
exception is thrown:


java.util.NoSuchElementException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1098)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at org.getopt.luke.IndexInfo.countTerms(IndexInfo.java:64)
at org.getopt.luke.IndexInfo.getNumTerms(IndexInfo.java:109)
at org.getopt.luke.Luke$3.run(Luke.java:1165)
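
For reference, the top of the trace is the pattern of calling next() on an 
iterator over an empty collection; a minimal sketch (assuming countTerms 
walks an empty, unmodifiable set of field names) reproduces the same frames:

import java.util.Collections;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeMap;

public class NextOnEmptySketch {
    public static void main(String[] args) {
        // An empty TreeMap key set wrapped unmodifiable, as the trace suggests.
        Set<String> fields = Collections.unmodifiableSet(
                new TreeMap<String, Integer>().keySet());
        Iterator<String> it = fields.iterator();
        it.next(); // throws java.util.NoSuchElementException from TreeMap's iterator
    }
}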


On 11/05/2012 05:08 PM, Shawn Heisey wrote:

On 11/5/2012 2:52 PM, Shawn Heisey wrote:
No idea whether I did it right, or even whether it works.  All my 
indexes are either 3.5 or 4.1-SNAPSHOT, so I can't actually test it.  
You can get to the resulting jar and my patch against the 
luke-4.0.0-ALPHA source:


https://dl.dropbox.com/u/97770508/luke-4.0.0-unofficial.patch
https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial.jar

If you have an immediate need for 4.0.0 support in Luke, please try 
it out and let me know whether it works.  If it doesn't work, or when 
the official luke 4.0.0 is released, I will remove those files from 
my dropbox.


I just realized that the version I uploaded there was compiled with 
java 1.7.0_09.  I don't know if this is actually a problem, but just 
in case, I re-did the compile on a machine with 1.6.0_29.  The 
filename referenced above now points to this version and I have 
included a file that indicates its java7 origins:


https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial-java7.jar

Thanks,
Shawn



Re: lukeall.jar for Solr4r?

2012-11-05 Thread Carrie Coy
I checked out luke-src-4.0.0-ALPHA.tgz, the most recent I could find, 
and compiled it, but I still get the error "Format version not supported 
(resource 
MMapIndexInput(path=/var/lib/tomcat6/solr/apache-solr-4.0.0-/core1/data/index/_7.tvx)): 
1 needs to be between 0 and 0".


Can anyone post a luke.jar capable of reading 4.0 indexes?

On 10/27/2012 09:17 PM, Lance Norskog wrote:

Aha! Andrzej has not built a 4.0 release version. You need to check out the 
source and compile your own.

http://code.google.com/p/luke/downloads/list

- Original Message -
| From: Carrie Coy <c...@ssww.com>
| To: solr-user@lucene.apache.org
| Sent: Friday, October 26, 2012 7:33:45 AM
| Subject: lukeall.jar for Solr4r?
|
| Where can I get a copy of Luke capable of reading Solr4 indexes?  My
| lukeall-4.0.0-ALPHA.jar no longer works.
|
| Thx,
| Carrie Coy
|


lukeall.jar for Solr4r?

2012-10-26 Thread Carrie Coy
Where can I get a copy of Luke capable of reading Solr4 indexes?  My 
lukeall-4.0.0-ALPHA.jar no longer works.


Thx,
Carrie Coy


Re: UnsupportedOperationException: ExternalFileField (SOLVED)

2012-10-25 Thread Carrie Coy
The problem seems to have been caused by my failure to completely remove 
the existing index files when I switched the inStock field from an 
indexed boolean field to an externally maintained one. After I removed 
everything and re-indexed from scratch, the error went away.
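
For anyone hitting the same trace: ExternalFileField.write() is unsupported 
(that is the UnsupportedOperationException below), so the external values 
can be consumed through function queries but never returned as a stored 
field. A minimal SolrJ 4.x sketch, with a hypothetical core URL and the 
boost function copied from the solrconfig.xml excerpt below:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class InStockBoostSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr/core1");
        SolrQuery q = new SolrQuery("paint");
        q.set("defType", "edismax");
        q.set("boost", "if(inStock,10,1)"); // values come from external_inStock
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults().getNumFound());
    }
}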


On 10/24/2012 08:57 PM, Carrie Coy wrote:
(Solr4) I'm getting the following error trying to use 
ExternalFileField to maintain an inStock flag.   Any idea what I'm 
doing wrong?


schema.xml:
<field name="inStock" type="file" />
<fieldType name="file" keyField="id" defVal="1" stored="false" indexed="false" class="solr.ExternalFileField" valType="float"/>


-rw-r--r-- 1 tomcat tomcat 100434 Oct 24 20:07 external_inStock:
YM0600=1
YM0544=1
YM0505=1

solrconfig.xml:
<str name="boost">if(inStock,10,1)</str>


SEVERE: null:java.lang.UnsupportedOperationException
at org.apache.solr.schema.ExternalFileField.write(ExternalFileField.java:85)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:130)
at org.apache.solr.response.JSONWriter.writeSolrDocument(JSONResponseWriter.java:355)
at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:275)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:172)
at org.apache.solr.response.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:154)
at org.apache.solr.response.PHPWriter.writeNamedList(PHPResponseWriter.java:54)
at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:91)
at org.apache.solr.response.PHPResponseWriter.write(PHPResponseWriter.java:36)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:411)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:289)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)


UnsupportedOperationException: ExternalFileField

2012-10-24 Thread Carrie Coy
(Solr4) I'm getting the following error trying to use ExternalFileField 
to maintain an inStock flag.   Any idea what I'm doing wrong?


schema.xml:
<field name="inStock" type="file" />
<fieldType name="file" keyField="id" defVal="1" stored="false" indexed="false" class="solr.ExternalFileField" valType="float"/>


-rw-r--r-- 1 tomcat tomcat 100434 Oct 24 20:07 external_inStock:
YM0600=1
YM0544=1
YM0505=1

solrconfig.xml:
<str name="boost">if(inStock,10,1)</str>


SEVERE: null:java.lang.UnsupportedOperationException
at org.apache.solr.schema.ExternalFileField.write(ExternalFileField.java:85)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:130)
at org.apache.solr.response.JSONWriter.writeSolrDocument(JSONResponseWriter.java:355)
at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:275)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:172)
at org.apache.solr.response.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:154)
at org.apache.solr.response.PHPWriter.writeNamedList(PHPResponseWriter.java:54)
at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:91)
at org.apache.solr.response.PHPResponseWriter.write(PHPResponseWriter.java:36)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:411)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:289)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)


WordBreak spell correction makes split terms optional?

2012-10-02 Thread Carrie Coy
The user query "design your ownbinoculars" is corrected by the 
'wordbreak' dictionary to:

<str name="querystring">design your (own binoculars)</str>

Where are the parentheses coming from?  Can I strip them with a 
post-processing filter?   The parentheses make the terms optional, so, 
while the first match is excellent, the rest are irrelevant.
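
If a post-processing step is acceptable, a minimal client-side sketch 
(assuming the collation string has already been read from the spellcheck 
response before re-querying):

public class StripCollationParens {
    public static void main(String[] args) {
        String collation = "design your (own binoculars)";
        // Drop the grouping parentheses before re-issuing the query.
        String cleaned = collation.replaceAll("[()]", "");
        System.out.println(cleaned); // prints: design your own binoculars
    }
}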


Thx,
Carrie Coy





omit tf using per-field CustomSimilarity?

2012-09-24 Thread Carrie Coy
I'm trying to configure per-field similarity to disregard term frequency 
(omitTf) in a 'title' field, following the example docs without success: 
my custom similarity doesn't seem to have any effect on 'tf'. Is the 
NoTfSimilarity class below written correctly? Any advice is much appreciated.


my schema.xml:

<field name="title" type="text_custom_sim" indexed="true" stored="true" omitNorms="true" termVectors="true" />

<similarity class="solr.SchemaSimilarityFactory"/>
<fieldType name="text_custom_sim" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <similarity class="com.ssww.NoTfSimilarityFactory" />
    ...
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <similarity class="com.ssww.NoTfSimilarityFactory" />
    ...
  </analyzer>
</fieldType>


NoTfSimilarityFactory.java:

package com.ssww;

import org.apache.lucene.search.similarities.Similarity;
import org.apache.solr.schema.SimilarityFactory;

public class NoTfSimilarityFactory extends SimilarityFactory {
    @Override
    public Similarity getSimilarity() {
        return new NoTfSimilarity();
    }
}


NoTfSimilarity.java:

package com.ssww;

import org.apache.lucene.search.similarities.DefaultSimilarity;

public final class NoTfSimilarity extends DefaultSimilarity {
    public float tf(int i) {
        return 1;
    }
}

These two files are in a jar in the lib directory of this core. Here are 
the results of a search for "paint" with custom and default similarity:


Indexed with per-field NoTfSimilarity:

284.5441 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product of:
  280.5598 = (MATCH) sum of:
    280.5598 = (MATCH) max of:
      280.5598 = (MATCH) weight(title:paint^8.0 in 48) [], result of:
        280.5598 = score(doc=48,freq=2.0 = termFreq=2.0), product of:
          39.83825 = queryWeight, product of:
            8.0 = boost
            4.979781 = idf(docFreq=187, maxDocs=10059)
            1.0 = queryNorm
          7.042474 = fieldWeight in 48, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.979781 = idf(docFreq=187, maxDocs=10059)
            1.0 = fieldNorm(doc=48)
      18.217428 = (MATCH) weight(search_keywords:paint in 48) [], result of:
        18.217428 = score(doc=48,freq=1.0 = termFreq=1.0), product of:
          4.268188 = queryWeight, product of:
            4.268188 = idf(docFreq=382, maxDocs=10059)
            1.0 = queryNorm
          4.268188 = fieldWeight in 48, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            4.268188 = idf(docFreq=382, maxDocs=10059)
            1.0 = fieldNorm(doc=48)
      7.725952 = (MATCH) weight(description:paint^0.5 in 48) [], result of:
        7.725952 = score(doc=48,freq=2.0 = termFreq=2.0), product of:
          1.6527361 = queryWeight, product of:
            0.5 = boost
            3.3054721 = idf(docFreq=1002, maxDocs=10059)
            1.0 = queryNorm
          4.6746435 = fieldWeight in 48, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.3054721 = idf(docFreq=1002, maxDocs=10059)
            1.0 = fieldNorm(doc=48)
      106.50396 = (MATCH) weight(nosyn:paint^5.0 in 48) [], result of:
        106.50396 = score(doc=48,freq=4.0 = termFreq=4.0), product of:
          16.317472 = queryWeight, product of:
            5.0 = boost
            3.2634945 = idf(docFreq=1045, maxDocs=10059)
            1.0 = queryNorm
          6.526989 = fieldWeight in 48, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.2634945 = idf(docFreq=1045, maxDocs=10059)
            1.0 = fieldNorm(doc=48)
  1.0142012 = scale(int(page_views)=18,toMin=1.0,toMax=3.0,fromMin=0.0,fromMax=2535.0)


Indexed with DefaultSimilarity:

7.630908 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product of:
  7.524058 = (MATCH) sum of:
    7.524058 = (MATCH) max of:
      7.524058 = (MATCH) weight(title:paint^8.0 in 3504) [DefaultSimilarity], result of:
        7.524058 = fieldWeight in 3504, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.3203125 = idf(docFreq=197, maxDocs=14892)
          1.0 = fieldNorm(doc=3504)
      0.5091842 = (MATCH) weight(search_keywords:paint in 3504) [DefaultSimilarity], result of:
        0.5091842 = score(doc=3504,freq=1.0 = termFreq=1.0), product of:
  

Solved: Re: omit tf using per-field CustomSimilarity?

2012-09-24 Thread Carrie Coy
My problem was that I specified the per-field similarity class INSIDE 
the analyzer instead of outside it.


<fieldType ...>
  <analyzer ...>...</analyzer>
  <similarity .../>
</fieldType>

On 09/24/2012 02:56 PM, Carrie Coy wrote:
I'm trying to configure per-field similarity to disregard term frequency 
(omitTf) in a 'title' field, following the example docs without success: 
my custom similarity doesn't seem to have any effect on 'tf'. Is the 
NoTfSimilarity class below written correctly? Any advice is much appreciated.


my schema.xml:

<field name="title" type="text_custom_sim" indexed="true" stored="true" omitNorms="true" termVectors="true" />

<similarity class="solr.SchemaSimilarityFactory"/>
<fieldType name="text_custom_sim" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <similarity class="com.ssww.NoTfSimilarityFactory" />
    ...
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <similarity class="com.ssww.NoTfSimilarityFactory" />
    ...
  </analyzer>
</fieldType>


NoTfSimilarityFactory.java:

package com.ssww;

import org.apache.lucene.search.similarities.Similarity;
import org.apache.solr.schema.SimilarityFactory;

public class NoTfSimilarityFactory extends SimilarityFactory {
    @Override
    public Similarity getSimilarity() {
        return new NoTfSimilarity();
    }
}


NoTfSimilarity.java:

package com.ssww;

import org.apache.lucene.search.similarities.DefaultSimilarity;

public final class NoTfSimilarity extends DefaultSimilarity {
    public float tf(int i) {
        return 1;
    }
}

These two files are in a jar in the lib directory of this core. Here are 
the results of a search for "paint" with custom and default similarity:


Indexed with per-field NoTfSimilarity:

284.5441 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product of:
  280.5598 = (MATCH) sum of:
    280.5598 = (MATCH) max of:
      280.5598 = (MATCH) weight(title:paint^8.0 in 48) [], result of:
        280.5598 = score(doc=48,freq=2.0 = termFreq=2.0), product of:
          39.83825 = queryWeight, product of:
            8.0 = boost
            4.979781 = idf(docFreq=187, maxDocs=10059)
            1.0 = queryNorm
          7.042474 = fieldWeight in 48, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.979781 = idf(docFreq=187, maxDocs=10059)
            1.0 = fieldNorm(doc=48)
      18.217428 = (MATCH) weight(search_keywords:paint in 48) [], result of:
        18.217428 = score(doc=48,freq=1.0 = termFreq=1.0), product of:
          4.268188 = queryWeight, product of:
            4.268188 = idf(docFreq=382, maxDocs=10059)
            1.0 = queryNorm
          4.268188 = fieldWeight in 48, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            4.268188 = idf(docFreq=382, maxDocs=10059)
            1.0 = fieldNorm(doc=48)
      7.725952 = (MATCH) weight(description:paint^0.5 in 48) [], result of:
        7.725952 = score(doc=48,freq=2.0 = termFreq=2.0), product of:
          1.6527361 = queryWeight, product of:
            0.5 = boost
            3.3054721 = idf(docFreq=1002, maxDocs=10059)
            1.0 = queryNorm
          4.6746435 = fieldWeight in 48, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.3054721 = idf(docFreq=1002, maxDocs=10059)
            1.0 = fieldNorm(doc=48)
      106.50396 = (MATCH) weight(nosyn:paint^5.0 in 48) [], result of:
        106.50396 = score(doc=48,freq=4.0 = termFreq=4.0), product of:
          16.317472 = queryWeight, product of:
            5.0 = boost
            3.2634945 = idf(docFreq=1045, maxDocs=10059)
            1.0 = queryNorm
          6.526989 = fieldWeight in 48, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.2634945 = idf(docFreq=1045, maxDocs=10059)
            1.0 = fieldNorm(doc=48)
  1.0142012 = scale(int(page_views)=18,toMin=1.0,toMax=3.0,fromMin=0.0,fromMax=2535.0)



Indexed with DefaultSimilarity:

7.630908 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product of:
  7.524058 = (MATCH) sum of:
    7.524058 = (MATCH) max of:
      7.524058 = (MATCH) weight(title:paint^8.0 in 3504) [DefaultSimilarity], result of:
        7.524058 = fieldWeight in 3504, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.3203125 = idf(docFreq=197, maxDocs=14892)
          1.0

Conditionally apply synonyms?

2012-09-19 Thread Carrie Coy
Is there an existing TokenFilterFactory that can conditionally insert 
synonyms based on a given document attribute, say "category"? Some 
synonyms only make sense in context: "bats" in Sports is different from 
"bats" in Party and Novelty.


It seems the synonyms.txt file would need an additional column that 
could be checked against the document attribute prior to appending synonyms:


#synonyms            category
post,pole            sports
wheel,caster         furniture
pat,paddy,patrick    holiday

Is anything like this possible without writing a custom TokenFilterFactory?


Re: Conditionally apply synonyms?

2012-09-19 Thread Carrie Coy
The latter: the document (e.g. a product) has a category, and the synonyms 
would be applied at index time: sports-related "bat" synonyms to 
baseball bats, and Halloween-themed "bat" synonyms to scary bats, 
for example.



On 09/19/2012 05:08 PM, Erick Erickson wrote:

Not that I know of, synonyms are an all-or-nothing on a field.

But how would you indicate the context at index time as opposed to
query time? Especially at query time, there's very little in the way of
context to figure out what the category was.

Or were you thinking that the document had a category and applying
this only at index time?

Best
Erick

On Wed, Sep 19, 2012 at 3:23 PM, Carrie Coy <c...@ssww.com> wrote:

Is there an existing TokenFilterFactory that can conditionally insert
synonyms based on a given document attribute, say "category"? Some synonyms
only make sense in context: "bats" in Sports is different from "bats" in
Party and Novelty.

It seems the synonyms.txt file would need an additional column that could be
checked against the document attribute prior to appending synonyms:

#synonyms            category
post,pole            sports
wheel,caster         furniture
pat,paddy,patrick    holiday

Is anything like this possible without writing a custom TokenFilterFactory?
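
One index-time alternative that sidesteps a conditional TokenFilterFactory 
entirely (a sketch only; the field names are hypothetical): copy the text 
into a per-category field from an UpdateRequestProcessor, and give each 
such field its own SynonymFilterFactory with a category-specific synonyms 
file. Queries then search the category fields alongside the main one.

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class CategorySynonymRouter extends UpdateRequestProcessor {
    public CategorySynonymRouter(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object category = doc.getFieldValue("category");
        Object title = doc.getFieldValue("title");
        if (category != null && title != null) {
            // e.g. title_syn_sports is a schema field whose analyzer loads
            // synonyms_sports.txt; title_syn_holiday loads synonyms_holiday.txt
            doc.addField("title_syn_" + category.toString().toLowerCase(), title);
        }
        super.processAdd(cmd);
    }
}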


local param invariant for q is ignored?

2012-08-31 Thread Carrie Coy
(solr4-beta) I'm trying to follow the instructions in this article: 
http://searchhub.org/dev/2011/06/20/solr-powered-isfdb-part-10/ to apply 
a custom sort order to search results:


Essentially, it involves creating a new qq parameter, and substituting 
it into the original q parameter as a local param via an invariant, as 
below.


My RequestHandler looks like this:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>

    <!-- VelocityResponseWriter settings -->
    <lst name="invariants">
      <str name="q">{!boost b=page_views v=$qq}</str>
    </lst>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="qq">*:*</str>
  </lst>
</requestHandler>

I triple-checked the small modifications to these files:

velocity/VM_global_library.vm.
#macro(q)qq=$!{esc.url($params.get('qq'))}#end

velocity/query.vm
<span>#annTitle("Add the query using the qq= parameter")Find: <input type="text" id="q" name="qq" value="$!esc.html($params.get('qq'))"/> <input type="submit" id="querySubmit"/> <input type="reset"/></span>




But instead of creating a query like this:
http://localhost:8080/solr/ssww/browse?q={!boost b=page_views v=$qq}&qq=glue   (which returns great results)


It creates this:
http://localhost:8080/solr/ssww/browse?qq=glue   (no results)

From the query debugging:

<lst name="params">
  <str name="qq">glue</str>
  <str name="v.template">browse</str>
  <str name="qf">title^2 description^0.5 cat^0.5 bullets^0.5 search_keywords^0.3</str>
  <str name="wt">xml</str>
  <str name="rows">10</str>
  <str name="defType">edismax</str>
  <str name="invariants">{q={!boost b=page_views v=$qq}}</str>
  <str name="debug">true</str>
  <str name="wt">xml</str>
  <str name="qq">glue</str>

<lst name="debug">
  <null name="rawquerystring"/>
  <null name="querystring"/>
  <str name="parsedquery"></str>
  <str name="parsedquery_toString"></str>
  <lst name="explain"/>
  <str name="QParser">ExtendedDismaxQParser</str>
  <null name="altquerystring"/>
  <null name="boostfuncs"/>

Does anybody know what I might have done wrong?






Re: local param invariant for q is ignored? (solved)

2012-08-31 Thread Carrie Coy
Of course I saw my error within seconds of pressing send.   The 
invariants block should appear outside the defaults block in the 
RequestHandler.
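
For reference, a minimal SolrJ 4.x sketch (hypothetical core URL) of the 
working request from the original post, i.e. the parameters the invariant 
is meant to supply server-side so that clients only send qq:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class BoostedBrowseSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr/ssww");
        SolrQuery query = new SolrQuery();
        query.setRequestHandler("/browse");
        query.set("q", "{!boost b=page_views v=$qq}"); // supplied by the invariant
        query.set("qq", "glue");                       // supplied by the search form
        System.out.println(solr.query(query).getResults().getNumFound());
    }
}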


On 08/31/2012 04:25 PM, Carrie Coy wrote:
(solr4-beta) I'm trying to follow the instructions in this article: 
http://searchhub.org/dev/2011/06/20/solr-powered-isfdb-part-10/ to 
apply a custom sort order to search results:


Essentially, it involves creating a new qq parameter, and substituting 
it into the original q parameter as a local param via an invariant, as 
below.


My RequestHandler looks like this:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>

    <!-- VelocityResponseWriter settings -->
    <lst name="invariants">
      <str name="q">{!boost b=page_views v=$qq}</str>
    </lst>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="qq">*:*</str>
  </lst>
</requestHandler>

I triple-checked the small modifications to these files:

velocity/VM_global_library.vm.
#macro(q)qq=$!{esc.url($params.get('qq'))}#end

velocity/query.vm
<span>#annTitle("Add the query using the qq= parameter")Find: <input type="text" id="q" name="qq" value="$!esc.html($params.get('qq'))"/> <input type="submit" id="querySubmit"/> <input type="reset"/></span>




But instead of creating a query like this:
http://localhost:8080/solr/ssww/browse?q={!boost b=page_views v=$qq}&qq=glue   (which returns great results)


It creates this:
http://localhost:8080/solr/ssww/browse?qq=glue   (no results)

From the query debugging:

<lst name="params">
  <str name="qq">glue</str>
  <str name="v.template">browse</str>
  <str name="qf">title^2 description^0.5 cat^0.5 bullets^0.5 search_keywords^0.3</str>
  <str name="wt">xml</str>
  <str name="rows">10</str>
  <str name="defType">edismax</str>
  <str name="invariants">{q={!boost b=page_views v=$qq}}</str>
  <str name="debug">true</str>
  <str name="wt">xml</str>
  <str name="qq">glue</str>

<lst name="debug">
  <null name="rawquerystring"/>
  <null name="querystring"/>
  <str name="parsedquery"></str>
  <str name="parsedquery_toString"></str>
  <lst name="explain"/>
  <str name="QParser">ExtendedDismaxQParser</str>
  <null name="altquerystring"/>
  <null name="boostfuncs"/>

Does anybody know what I might have done wrong?






Re: More debugging DIH - URLDataSource (solved)

2012-08-28 Thread Carrie Coy
Thank you for these suggestions.   The real problem was incorrect syntax 
for the primary key column in data-config.xml.   Once I corrected that, 
the data loaded fine.


Wrong:

<field column="part_code" name="id"
       xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']"
       regex="/^PRODUCT:.*\((.*?)\)$/" replaceWith="$1"/>

Right:

<field column="id"
       xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']"
       regex="/^PRODUCT:.*\((.*?)\)$/" replaceWith="$1"/>
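
A quick standalone check (plain java.util.regex; the DIH attribute value 
wraps the pattern in slashes, the bare pattern is used here) that the 
PAGE_NAME expression really extracts the part code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartCodeRegexCheck {
    public static void main(String[] args) {
        String pageName = "PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)";
        Matcher m = Pattern.compile("^PRODUCT:.*\\((.*?)\\)$").matcher(pageName);
        if (m.matches()) {
            System.out.println(m.group(1)); // prints: W4537
        }
    }
}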



On 08/25/2012 08:52 PM, Lance Norskog wrote:

About XPaths: the XPath engine supports only a limited range of xpaths. The doc
says that your paths are covered.

About logs: You only have the RegexTransformer listed. You need to add
LogTransformer to the transformer list:
http://wiki.apache.org/solr/DataImportHandler#LogTransformer

Having xml entity codes in the url string seems right. Can you verify
the url that goes to the remote site? Can you read the logs at the
remote site? Can you run this code through a proxy and watch the data?

On Fri, Aug 24, 2012 at 1:34 PM, Carrie Coy <c...@ssww.com> wrote:

I'm trying to write a DIH to incorporate page view metrics from an XML feed
into our index.   The DIH makes a single request, and updates 0 documents.
I set log level to finest for the entire dataimport section, but I still
can't tell what's wrong.  I suspect the XPath.
http://localhost:8080/solr/core1/admin/dataimport.jsp?handler=/dataimport
returns 404.  Any suggestions on how I can debug this?

solr-spec: 4.0.0.2012.08.06.22.50.47


The XML data:

<?xml version='1.0' encoding='UTF-8'?>
<ReportDataResponse>
  <Data>
    <Rows>
      <Row rowKey="P#PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)#N/A#5516196614" rowActionAvailability="0 0 0">
        <Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)</Value>
        <Value columnId="PAGE_VIEWS" comparisonSpecifier="A">2388</Value>
      </Row>
      <Row rowKey="P#PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850) (BE9000)#N/A#5521976460" rowActionAvailability="0 0 0">
        <Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850) (BE9000)</Value>
        <Value columnId="PAGE_VIEWS" comparisonSpecifier="A">1313</Value>
      </Row>
    </Rows>
  </Data>
</ReportDataResponse>

My DIH:

<dataConfig>
  <dataSource name="coremetrics"
              type="URLDataSource"
              encoding="UTF-8"
              connectionTimeout="5000"
              readTimeout="1"/>

  <document>
    <entity name="coremetrics"
            dataSource="coremetrics"
            pk="id"
            url="https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=**&amp;username=&amp;format=XML&amp;userAuthKey=&amp;language=en_US&amp;viewID=9475540&amp;period_a=M20110930"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/ReportDataResponse/Data/Rows/Row"
            logLevel="fine"
            transformer="RegexTransformer">

      <field column="part_code" name="id"
             xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']"
             regex="/^PRODUCT:.*\((.*?)\)$/" replaceWith="$1"/>
      <field column="page_views"
             xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_VIEWS']" />
    </entity>
  </document>
</dataConfig>

This little test perl script correctly extracts the data:

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new(filename => 'cm.xml');
my $nodeset = $xp->find('/ReportDataResponse/Data/Rows/Row');
foreach my $node ($nodeset->get_nodelist) {
    my $page_name  = $node->findvalue('Value[@columnId="PAGE_NAME"]');
    my $page_views = $node->findvalue('Value[@columnId="PAGE_VIEWS"]');
    $page_name =~ s/^PRODUCT:.*\((.*?)\)$/$1/;
}

 From logs:

INFO: Loading DIH Configuration: data-config.xml
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter
loadDataConfig
INFO: Data Configuration loaded successfully
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import}
status=0 QTime=2
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Aug 24, 2012 3:53:10 PM
org.apache.solr.handler.dataimport.SimplePropertiesWriter
readIndexerProperties
INFO: Read dataimport.properties
Aug 24, 2012 3:53:10 PM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [ssww] REMOVING ALL DOCUMENTS FROM INDEX
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.URLDataSource
getData
FINE: Accessing URL:
https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=*&username=***&format=XML&userAuthKey=**&language=en_US&viewID=9475540&period_a=M20110930
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
QTime=0
Aug 24, 2012 3:53:12 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport 

More debugging DIH - URLDataSource

2012-08-24 Thread Carrie Coy
I'm trying to write a DIH to incorporate page view metrics from an XML 
feed into our index.   The DIH makes a single request, and updates 0 
documents.  I set log level to finest for the entire dataimport 
section, but I still can't tell what's wrong.  I suspect the XPath.   
http://localhost:8080/solr/core1/admin/dataimport.jsp?handler=/dataimport returns 
404.  Any suggestions on how I can debug this?


solr-spec: 4.0.0.2012.08.06.22.50.47


The XML data:

<?xml version='1.0' encoding='UTF-8'?>
<ReportDataResponse>
  <Data>
    <Rows>
      <Row rowKey="P#PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)#N/A#5516196614" rowActionAvailability="0 0 0">
        <Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)</Value>
        <Value columnId="PAGE_VIEWS" comparisonSpecifier="A">2388</Value>
      </Row>
      <Row rowKey="P#PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850) (BE9000)#N/A#5521976460" rowActionAvailability="0 0 0">
        <Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850) (BE9000)</Value>
        <Value columnId="PAGE_VIEWS" comparisonSpecifier="A">1313</Value>
      </Row>
    </Rows>
  </Data>
</ReportDataResponse>

My DIH:

<dataConfig>
  <dataSource name="coremetrics"
              type="URLDataSource"
              encoding="UTF-8"
              connectionTimeout="5000"
              readTimeout="1"/>

  <document>
    <entity name="coremetrics"
            dataSource="coremetrics"
            pk="id"
            url="https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=**&amp;username=&amp;format=XML&amp;userAuthKey=&amp;language=en_US&amp;viewID=9475540&amp;period_a=M20110930"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/ReportDataResponse/Data/Rows/Row"
            logLevel="fine"
            transformer="RegexTransformer">

      <field column="part_code" name="id"
             xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']"
             regex="/^PRODUCT:.*\((.*?)\)$/" replaceWith="$1"/>
      <field column="page_views"
             xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_VIEWS']" />
    </entity>
  </document>
</dataConfig>

This little test perl script correctly extracts the data:

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new(filename => 'cm.xml');
my $nodeset = $xp->find('/ReportDataResponse/Data/Rows/Row');
foreach my $node ($nodeset->get_nodelist) {
    my $page_name  = $node->findvalue('Value[@columnId="PAGE_NAME"]');
    my $page_views = $node->findvalue('Value[@columnId="PAGE_VIEWS"]');
    $page_name =~ s/^PRODUCT:.*\((.*?)\)$/$1/;
}

From logs:

INFO: Loading DIH Configuration: data-config.xml
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter 
loadDataConfig

INFO: Data Configuration loaded successfully
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import} 
status=0 QTime=2
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter 
doFullImport

INFO: Starting Full Import
Aug 24, 2012 3:53:10 PM 
org.apache.solr.handler.dataimport.SimplePropertiesWriter 
readIndexerProperties

INFO: Read dataimport.properties
Aug 24, 2012 3:53:10 PM org.apache.solr.update.DirectUpdateHandler2 
deleteAll

INFO: [ssww] REMOVING ALL DOCUMENTS FROM INDEX
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.URLDataSource 
getData
FINE: Accessing URL: 
https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=*&username=***&format=XML&userAuthKey=**&language=en_US&viewID=9475540&period_a=M20110930

Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:12 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=1

Aug 24, 2012 3:53:14 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=1

Aug 24, 2012 3:53:16 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:18 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:20 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:22 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:24 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:27 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport 

Shingle and PositionFilterFactory question

2012-08-20 Thread Carrie Coy
I am trying to use shingles and a position filter to make a query for 
"foot print", for example, match either "foot print" or "footprint". 
From the docs (http://wiki.apache.org/solr/PositionFilter): using the 
PositionFilter in combination "makes it possible to make all shingles 
synonyms of each other."


I've configured my analyzer like this:
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.ShingleFilterFactory" minShingleSize="2"
          maxShingleSize="2" outputUnigrams="true"
          outputUnigramsIfNoShingles="false" tokenSeparator=""/>
  <filter class="solr.PositionFilterFactory"/>
</analyzer>

user query: "foot print"

Without PositionFilterFactory, parsed query: +(((title:foot) (title:print))~2) (title:(foot footprint) print)

With PositionFilterFactory, parsed query: +(((title:foot) (title:print))~2) ()


Why, when I add PositionFilterFactory into the mix, is the "footprint" 
shingle omitted?


Output of analysis:

WT

text   raw_bytes         start  end  position  type
foot   [66 6f 6f 74]     0      4    1         word
print  [70 72 69 6e 74]  5      10   2         word

SF

text       raw_bytes                     start  end  positionLength  type     position
foot       [66 6f 6f 74]                 0      4    1               word     1
footprint  [66 6f 6f 74 70 72 69 6e 74]  0      10   2               shingle  1
print      [70 72 69 6e 74]              5      10   1               word     2


Thanks,
Carrie Coy









WordBreakSolrSpellChecker ignores MinBreakWordLength?

2012-06-28 Thread Carrie Coy
I set MinBreakWordLength = 3 thinking it would prevent 
WordBreakSolrSpellChecker from suggesting corrections made up of 
subwords shorter than 3 characters, but I still get suggestions like this:


query: Touch N' Match
suggestion: (t o u ch) 'n (m a t ch)

Can someone help me understand why?  Here is the relevant portion of 
solrconfig.xml:


<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">15</str>
<str name="spellcheck.maxCollationTries">100</str>
<str name="spellcheck.alternativeTermCount">4</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.MinBreakWordLength">3</str>
<str name="spellcheck.maxChanges">3</str>



Re: Solved: WordBreakSolrSpellChecker ignores MinBreakWordLength?

2012-06-28 Thread Carrie Coy
Thanks! The combination of these two suggestions (relocating the 
wordbreak parameters to the spellchecker configuration and correcting 
the spelling of the parameter to minBreakLength) fixed the problem I 
was having.


On 06/28/2012 10:22 AM, Dyer, James wrote:

Carrie,

Try taking the wordbreak parameters out of the request handler configuration 
and instead put them in the spellchecker configuration. You also need to 
remove the "spellcheck." prefix. Also, the correct spelling for this 
parameter is minBreakLength. Here's an example.

<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">{your field name here}</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">3</int>
  <int name="minBreakLength">3</int>
</lst>

All of the parameters in the following source file go in the spellchecker 
configuration like this:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/spelling/WordBreakSolrSpellChecker.java

Descriptions of each of these parameters can be found in this source file:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/suggest/src/java/org/apache/lucene/search/spell/WordBreakSpellChecker.java

Let me know if this works out for you.  Any more feedback you can provide on 
the newer spellcheck features you're using is appreciated.  Thanks.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Carrie Coy [mailto:c...@ssww.com]
Sent: Thursday, June 28, 2012 8:20 AM
To: solr-user@lucene.apache.org
Subject: WordBreakSolrSpellChecker ignores MinBreakWordLength?

I set MinBreakWordLength = 3 thinking it would prevent
WordBreakSolrSpellChecker from suggesting corrections made up of
subwords shorter than 3 characters, but I still get suggestions like this:

query: Touch N' Match
suggestion: (t o u ch) 'n (m a t ch)

Can someone help me understand why?  Here is the relevant portion of
solrconfig.xml:

<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">15</str>
<str name="spellcheck.maxCollationTries">100</str>
<str name="spellcheck.alternativeTermCount">4</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.MinBreakWordLength">3</str>
<str name="spellcheck.maxChanges">3</str>



Re: WordBreak and default dictionary crash Solr

2012-06-18 Thread Carrie Coy

On 06/15/2012 05:16 PM, Dyer, James wrote:

I'm pretty sure you've found a bug here.  Could you tell me whether you're 
using a build from Trunk or Solr_4x ?  Also, do you know the svn revision or 
the Jenkins build # (or timestamp) you're working from?
I continued to see the problem after updating to the version below 
(previously I was running a version built on 06-09):


solr-spec: 4.0.0.2012.06.16.10.22.10
solr-impl: 4.0-2012-06-16_10-02-16 1350899 - hudson - 2012-06-16 10:22:10


Could you try using DirectSolrSpellChecker instead of IndexBasedSpellChecker 
for your "default" dictionary?


Switching to DirectSolrSpellChecker appears to fix the problem: a query 
with 2 misspellings, one from each dictionary, does not crash Solr and 
is correctly spell-checked.


Thanks!

Carrie Coy


WordBreak and default dictionary crash Solr

2012-06-15 Thread Carrie Coy

Is this a configuration problem or a bug?

We use two dictionaries, default (spellcheckerFreq) and 
solr.WordBreakSolrSpellChecker. When a query contains two misspellings, 
one corrected by the default dictionary and the other corrected by the 
wordbreak dictionary ("strawberryn shortcake"), Solr crashes with the 
error below. It doesn't matter which dictionary is checked first.


java.lang.NullPointerException
at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:566)
at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1555)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)


Multiple errors corrected by the SAME dictionary (either wordbreak or 
default) do not crash Solr. Here is an excerpt from our solrconfig.xml:


<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">1</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">spellcheckerFreq</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    ...
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.count">3</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
  </lst>
</requestHandler>





PorterStemmerTokenizerFactory ?

2012-06-07 Thread Carrie Coy
I've read different suggestions on how to handle cases where synonyms 
are used and there are multiple versions of the original word that need 
to point to the same set of synonyms (responsibility, responsibilities, 
obligation, duty).

The approach that seems most logical is to configure a 
SynonymFilterFactory to use a custom TokenizerFactory that stems 
synonyms by calling out to the PorterStemmer.


Does anyone know if a PorterStemmerTokenizerFactory already exists 
somewhere?
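
A common workaround, if no such factory turns up, is to pre-stem the 
synonyms file offline with the same Porter stemmer the field uses, so 
inflected variants collapse to one key before SynonymFilterFactory ever 
loads them. A minimal sketch (Lucene 4.x; file handling omitted):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StemSynonymEntry {
    // Stem every whitespace-separated term in one synonyms.txt entry.
    static String stem(String entry) throws Exception {
        TokenStream ts = new PorterStemFilter(
                new WhitespaceTokenizer(Version.LUCENE_40, new StringReader(entry)));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        StringBuilder out = new StringBuilder();
        ts.reset();
        while (ts.incrementToken()) {
            if (out.length() > 0) out.append(' ');
            out.append(term.toString());
        }
        ts.end();
        ts.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Both inflections collapse to the same stem.
        System.out.println(stem("responsibility"));
        System.out.println(stem("responsibilities"));
    }
}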


Thank you.
Carrie Coy