RE: Ensuring stable timestamp ordering
Dennis Gearon [gear...@sbcglobal.net] wrote: how about a timestamp with a GUID appended to the end of it? Since long (8 bytes) is the largest atomic type supported by Java, this would have to be represented as a String (or rather a BytesRef) and would take up 4 + 32 bytes, plus 2 * 4 bytes for the internal BytesRef attributes, plus some extra overhead. That is quite a large memory penalty to ensure unique timestamps.
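The trade-off described above can be sketched in plain Java: a zero-padded millisecond timestamp with a UUID appended stays sortable by time but costs roughly 56 characters per key. The class and method names here are illustrative, not from any Lucene or Solr API.

```java
import java.util.UUID;

public class TimestampKey {
    // Build a sortable key: zero-padded millisecond timestamp plus a UUID
    // as a tie-breaker. %019d pads to the full width of a positive long,
    // so string order matches numeric order of the timestamp prefix.
    static String uniqueKey(long timestampMillis, UUID uuid) {
        return String.format("%019d-%s", timestampMillis, uuid);
    }

    public static void main(String[] args) {
        // Fixed inputs so the result is reproducible.
        String key = uniqueKey(1288700000000L,
                UUID.fromString("123e4567-e89b-12d3-a456-426614174000"));
        System.out.println(key);
        System.out.println(key.length()); // 19 digits + '-' + 36-char UUID
    }
}
```

At 56 characters per key (before BytesRef overhead), the memory cost the post estimates is easy to see.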
using HebMorph
Hi, I'm trying to use HebMorph, a new Hebrew analyzer: http://github.com/itaifrenkel/HebMorph/tree/master/java/ The instructions say: 1. Download the code from here: http://github.com/synhershko/HebMorph/tree/master/java/ 2. Use the hebmorph ant script http://github.com/synhershko/HebMorph/blob/master/java/hebmorph/build.xml to build the hebmorph project. 3. Use the lucene.hebrew ant script http://github.com/synhershko/HebMorph/blob/master/java/lucene.hebrew/build.xml to build the lucene.hebrew project. 4. Copy both jar files to the solr/lib folder. 5. Edit your solr/conf/schema.xml file to use the analyzer you choose to use. I've installed the Solr package under Ubuntu Lucid. I've completed steps 1-3. Where do I put the jar files? How do I make Solr use the analyzer? Thanks
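For step 5, a minimal sketch of what the schema change might look like. The analyzer class name below is a placeholder, not confirmed — check which analyzer classes the HebMorph jars actually provide; the jars themselves go in the lib directory under your Solr home (or wherever a `<lib>` directive in solrconfig.xml points), followed by a restart.

```
<!-- solr/conf/schema.xml; the analyzer class name is a placeholder -->
<fieldType name="text_he" class="solr.TextField">
  <analyzer class="com.example.hebmorph.HebrewAnalyzer"/>
</fieldType>

<field name="body_he" type="text_he" indexed="true" stored="true"/>
```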
Solr MySQL Adding new column to table
Hello Techies, I am new to Solr; I am using it with MySQL. Suppose I have a table called person in MySQL with two columns, name and age, and I have configured MySQL in Solr. Now I have added a new column to the person table called phoneNumber. Is it possible for Solr to recognize the new column dynamically, i.e. without changing the old configuration? Thanks in advance, Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table-tp1826759p1826759.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr MySQL Adding new column to table
You have to change the old configuration for the newly added field. Or you can use the dynamic fields concept. Go through the link http://wiki.apache.org/solr/SchemaXml -Original Message- From: nitin.vanaku...@gmail.com [via Lucene] Sent: Tuesday, November 2, 2010 4:50am To: sivaprasad sivaprasa...@echidnainc.com Subject: Solr MySQL Adding new column to table -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table-tp1826759p1826792.html Sent from the Solr - User mailing list archive at Nabble.com.
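The dynamic-fields idea from the SchemaXml page can be sketched like this (the `*_s` suffix pattern is just an example): one catch-all declaration accepts any incoming field whose name matches, so new columns don't each need a schema edit.

```
<!-- schema.xml: any incoming field ending in _s is accepted as a string -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```

A newly added column such as phoneNumber would then be fed to Solr under a matching name, e.g. phoneNumber_s.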
Re: Phrase Query Problem?
On 11/1/2010 11:14 PM, Ken Stanley wrote: On Mon, Nov 1, 2010 at 10:26 PM, Tod listac...@gmail.com wrote: I have a number of fields I need to do an exact match on. I've defined them as 'string' in my schema.xml. I've noticed that I get back query results that don't have all of the words I'm using to search with. For example: q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json Should, with an exact match, return only one entry, but it returns five, some of which don't have any of the fields I've specified. I've tried this both with and without quotes. What could I be doing wrong? Thanks - Tod Tod, Without knowing your exact field definition, my first guess would be your first boolean query; because it is not quoted, what SOLR typically does is to transform that type of query into something like (assuming your uniqueKey is id): (mykeywords:Compliance id:With id:Conduct id:Standards). If you do (mykeywords:"Compliance+With+Conduct+Standards") you might see different (better?) results. Otherwise, append debugQuery=on to your URL and you can see exactly how SOLR is parsing your query. If none of that helps, what is your field definition in your schema.xml? 
- Ken The field definition is:
<field name="mykeywords" type="string" indexed="true" stored="true" multiValued="true"/>
The request:
select?q=(((mykeywords:Compliance+With+Attorney+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&fl=mykeywords&start=0&indent=true&wt=json&debugQuery=on
The response looks like this:
"responseHeader":{
  "status":0,
  "QTime":8,
  "params":{
    "wt":"json",
    "q":"(((mykeywords:Compliance With Attorney Conduct Standards)OR(mykeywords:All)OR(mykeywords:ALL)))",
    "start":"0",
    "indent":"true",
    "fl":"mykeywords",
    "debugQuery":"on"}},
"response":{"numFound":6,"start":0,"docs":[
  {"mykeywords":["Compliance With Attorney Conduct Standards"]},
  {"mykeywords":["Anti-Bribery","Bribes"]},
  {"mykeywords":["Marketing Guidelines","Marketing"]},
  {},
  {"mykeywords":["Anti-Bribery","Due Diligence"]},
  {"mykeywords":["Anti-Bribery","AntiBribery"]}]},
"debug":{
  "rawquerystring":"(((mykeywords:Compliance With Attorney Conduct Standards)OR(mykeywords:All)OR(mykeywords:ALL)))",
  "querystring":"(((mykeywords:Compliance With Attorney Conduct Standards)OR(mykeywords:All)OR(mykeywords:ALL)))",
  "parsedquery":"(mykeywords:Compliance text:attorney text:conduct text:standard) mykeywords:All mykeywords:ALL",
  "parsedquery_toString":"(mykeywords:Compliance text:attorney text:conduct text:standard) mykeywords:All mykeywords:ALL",
  "explain":{ ...
As you mentioned, looking at the parsed query, it's breaking the request up on word boundaries rather than treating the entire phrase as one term. The goal is to return only the very first entry. Any ideas? Thanks - Tod
RE: Solr MySQL Adding new column to table
Hi Sivaprasad, first of all thanks for your kind response. I went through that link. If I use the dynamicField concept, I still need to alter the query in data-config.xml, right? thanks Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table-tp1826759p1826865.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr MySQL Adding new column to table
Not if you use 'SELECT * FROM person' Ephraim Ofir -Original Message- From: nitin.vanaku...@gmail.com [mailto:nitin.vanaku...@gmail.com] Sent: Tuesday, November 02, 2010 11:19 AM To: solr-user@lucene.apache.org Subject: RE: Solr MySQL Adding new column to table Hi Sivaprasad, first of all thanks for your kind response. i gone through that link, if i use the dynamicField concept,still i need to alter the query in data-config.xml right! thanks Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table -tp1826759p1826865.html Sent from the Solr - User mailing list archive at Nabble.com.
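As a sketch, the DataImportHandler entity would then look roughly like this (user/password are placeholders; the JDBC URL matches the one in the log later in this thread). SELECT * picks up newly added columns without editing the query, though each column still needs a matching — possibly dynamic — field in schema.xml to actually be indexed:

```
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/example"
              user="..." password="..."/>
  <document>
    <!-- SELECT * means a new column like phoneNumber flows through
         with no change to this file -->
    <entity name="person" query="SELECT * FROM person"/>
  </document>
</dataConfig>
```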
Dynamically create new core
Hi, I have a requirement to dynamically create new (master) cores. Each core should have a replicated slave core. I am working with Java and using SolrJ as my Solr client. I came across the CoreAdminRequest class and it looks like the way to go. CoreAdminRequest.createCore("NewCore1", "NewCore1", solrServer); creates a new core programmatically. Also, for the newly created core, I want to use an existing solrconfig.xml and modify certain parameters. Can I achieve this using SolrJ? Are there any better approaches for the requirement? Thanks for any pointers,
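Under the hood, CoreAdminRequest talks to the CoreAdmin handler over HTTP, and the CREATE action also accepts config and schema parameters for pointing a new core at existing files. A minimal sketch that only builds the URL — the base URL assumes the usual example port, and actually sending the request (via SolrJ or any HTTP client) is omitted:

```java
import java.net.URLEncoder;

public class CoreAdminUrl {
    // Build the CoreAdmin CREATE call that CoreAdminRequest.createCore
    // issues over HTTP; sending it is left to SolrJ or an HTTP client.
    static String createCoreUrl(String solrBase, String name, String instanceDir) throws Exception {
        return solrBase + "/admin/cores?action=CREATE"
                + "&name=" + URLEncoder.encode(name, "UTF-8")
                + "&instanceDir=" + URLEncoder.encode(instanceDir, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createCoreUrl("http://localhost:8983/solr", "NewCore1", "NewCore1"));
    }
}
```

Appending &config=... and &schema=... to point every new core at a shared solrconfig.xml is one route to reusing an existing configuration.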
RE: Solr MySQL Adding new column to table
OK, I have one more issue. I am getting the following exception; can you please elaborate on it?
INFO: Creating a connection for entity person with URL: jdbc:mysql://localhost:3306/example
Nov 2, 2010 3:34:11 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 250
Nov 2, 2010 3:34:11 PM org.apache.solr.handler.dataimport.SolrWriter upload
WARNING: Error creating document : SolrInputDocument[{eage=eage(1.0)={28}, ename=ename(1.0)={shree}, eid=eid(1.0)={1}}]
org.apache.solr.common.SolrException: Document is missing uniqueKey field id
    at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:115)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:230)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Nov 2, 2010 3:34:11 PM org.apache.solr.handler.dataimport.SolrWriter upload
WARNING: Error creating document : SolrInputDocument[{eage=eage(1.0)={29}, ename=ename(1.0)={ramesh}, eid=eid(1.0)={2}}]
org.apache.solr.common.SolrException: Document is missing uniqueKey field id
    at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:115)
    ... (same stack trace as above)
Nov 2, 2010 3:34:11 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
Nov 2, 2010 3:34:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table-tp1826759p1827093.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr MySQL Adding new column to table
Your uniqueKey field is defined as id (in schema.xml) and your query doesn't return an id field. Ephraim Ofir -Original Message- From: nitin.vanaku...@gmail.com [mailto:nitin.vanaku...@gmail.com] Sent: Tuesday, November 02, 2010 12:10 PM To: solr-user@lucene.apache.org Subject: RE: Solr MySQL Adding new column to table
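Given that diagnosis, one fix (a sketch; column names taken from the log above) is to alias the table's key column in the DIH query so it arrives under the schema's uniqueKey name. Alternatively, change `<uniqueKey>` in schema.xml to eid and declare that field.

```
<!-- data-config.xml: expose eid under the uniqueKey name "id" -->
<entity name="person" query="SELECT eid AS id, ename, eage FROM person"/>
```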
Re: How to use polish stemmer - Stempel - in schema.xml?
Thank you Bernd! I couldn't make it run though. Here is my problem:
1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a directive: <lib path="../lib/stempel-1.0.jar" />
3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is a fieldType:
(...)
<!-- Polish -->
<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="org.getopt.stempel.lucene.StempelFilter" />
    <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" protected="protwords.txt" /> -->
  </analyzer>
</fieldType>
(...)
4. The jar file is loaded but I got an error:
SEVERE: Could not start SOLR. Check solr/home property
java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
(...)
5. A different class gave me that one:
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.getopt.solr.analysis.StempelTokenFilterFactory'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
(...)
Question is: how to make <fieldType/> and <filter/> work with that Stempel? :) Cheers, Jakub Godawa.
2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de: Hi Jakub, I have ported the KStemmer for use in the most recent Solr trunk version. My stemmer is located in the lib directory of Solr (solr/lib/KStemmer-2.00.jar) because it belongs to Solr. Write it as a FilterFactory and use it as a Filter like:
<filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
This is how my fieldType looks:
<fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
Regards, Bernd
Am 28.10.2010 14:56, schrieb Jakub Godawa: Hi! There is a Polish stemmer http://www.getopt.org/stempel/ and I have problems connecting it with Solr 1.4.1. Questions: 1. Where EXACTLY do I put the stempel-1.0.jar file? 2. How do I register the file, so I can build a fieldType like: <fieldType name="text_pl" class="solr.TextField"> <analyzer class="org.getopt.solr.analysis.StempelTokenFilterFactory"/> </fieldType> 3. Is that the right approach to make it work? Thanks for verbose explanation, Jakub.
Re: Phrase Query Problem?
That's not the response I get when I try your query, so I suspect something's not quite right with your test... But you could also try putting parentheses around the words, like mykeywords:(Compliance+With+Conduct+Standards) Best Erick On Tue, Nov 2, 2010 at 5:25 AM, Tod listac...@gmail.com wrote: I have a number of fields I need to do an exact match on. I've defined them as 'string' in my schema.xml. (...)
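Since the debug output in the thread shows the unquoted words falling into the default text field, quoting the phrase is what keeps it attached to mykeywords. A small stdlib sketch of building such a request URL (field and parameter names as in the thread; URLEncoder handles the quotes and spaces):

```java
import java.net.URLEncoder;

public class PhraseQueryUrl {
    public static void main(String[] args) throws Exception {
        // The double quotes make the query parser treat the words as one
        // phrase against mykeywords instead of scattering them to the
        // default search field.
        String q = "mykeywords:\"Compliance With Attorney Conduct Standards\" OR mykeywords:All";
        String url = "select?q=" + URLEncoder.encode(q, "UTF-8")
                + "&fl=mykeywords&wt=json&debugQuery=on";
        System.out.println(url);
    }
}
```

Against a string field, which is not tokenized, only the exact phrase value matches.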
how to get TermVectorComponent using xml , vs. SOLR-949
Hi all, This seems like a basic question: what's the best way to get TermVectorComponent results from the Solr XML response? SolrJ does not include TermVectorComponent in its API; the SOLR-949 patch adds this ability, but after 2 years it's still not in the mainline (and doesn't patch cleanly to the current 1.4 head). I'm new to Solr and familiar with SolrJ, but not with the best means for getting/parsing the raw XML. (Typically I find the DTD and write code to parse the DOM using the DTD. In this case I've seen a few examples, but nothing definitive.) Our team would rather use out-of-the-box Solr than manually apply patches and worry about consistency during upgrades... Thanks in advance, will
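Absent SOLR-949, parsing the raw XML with the JDK's DOM parser is one workable route. The fragment below is hand-made in the generic named-list (`<lst>`/`<int>`) shape Solr's XML writer uses; the exact nesting of a real termVectors section is an assumption here and should be checked against an actual response.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TermVectorParse {
    // Hand-made sample in Solr's generic named-list XML shape; verify the
    // real nesting of the termVectors section against your own response.
    static final String SAMPLE =
        "<lst name=\"termVectors\">" +
        "  <lst name=\"doc-0\">" +
        "    <lst name=\"includes\">" +
        "      <lst name=\"solr\"><int name=\"tf\">3</int></lst>" +
        "      <lst name=\"lucene\"><int name=\"tf\">1</int></lst>" +
        "    </lst>" +
        "  </lst>" +
        "</lst>";

    // Collect one "term tf=N" line per <int name="tf"> element.
    static String termFreqs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        StringBuilder out = new StringBuilder();
        NodeList ints = doc.getElementsByTagName("int");
        for (int i = 0; i < ints.getLength(); i++) {
            Element tf = (Element) ints.item(i);
            Element term = (Element) tf.getParentNode(); // enclosing <lst name="term">
            out.append(term.getAttribute("name")).append(" tf=")
               .append(tf.getTextContent()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(termFreqs(SAMPLE));
    }
}
```

The same traversal works whether the response is fetched by hand or pulled out of SolrJ's raw response.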
Re: Disk usage per-field
Hi, I am currently benchmarking a Solr index with different fields to see the impact on its size, search speed, etc. A feature to find the disk usage per field of the index would be really handy and save me a lot of time. Do we have any updates on this? Has anyone tried writing custom code for it? - Muneeb -- View this message in context: http://lucene.472066.n3.nabble.com/Disk-usage-per-field-tp934765p1827739.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to use polish stemmer - Stempel - in schema.xml?
Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib. What is still a problem is that even though the file is loaded: 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader I am not able to use the FilterFactory... maybe I am attempting it in a wrong way? Cheers, Jakub Godawa. 2010/11/2 Erick Erickson erickerick...@gmail.com: The polish stemmer jar file needs to be findable by Solr; if you copy it to solr_home/lib and restart Solr you should be set. Alternatively, you can add another lib directive to the solrconfig.xml file (there are several examples in that file already). I'm a little confused about not being able to find TokenFilter, is that still a problem? HTH Erick On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa jakub.god...@gmail.com wrote: Thank you Bernd! I couldn't make it run though. (...)
Re: How to use polish stemmer - Stempel - in schema.xml?
Hi Jakub, if you unzip your stempel-1.0.jar, do you have the required directory structure and file in there? org/getopt/stempel/lucene/StempelFilter.class Regards, Bernd Am 02.11.2010 13:54, schrieb Jakub Godawa: Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib. (...)
Re: Problem with phrase matches in Solr
I will. Thanks Darren -Moazzam On Mon, Nov 1, 2010 at 1:15 PM, dar...@ontrenet.com wrote: Take a look at term proximity and phrase queries: http://wiki.apache.org/solr/SolrRelevancyCookbook Hey guys, I have a Solr index where I store information about experts from various fields. The thing is, when I search for channel marketing I get people that have the word channel or marketing in their data. I only want people who have that entire phrase in their bio. (I copy the contents of bio to the default search field, which is text.) How can I make sure that exact phrase matching works while the search is agile enough that partial searches match too (like uni matching university, etc. - this works, but not phrase matching)? I hope I was able to properly explain my problem. If not, please let me know. Thanks in advance, Moazzam
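One way to get both behaviors, sketched with standard Solr 1.4 factories (the field names here are illustrative): keep the tokenized text field for quoted phrase queries like "channel marketing", and copy the bio into an edge n-gram field so prefixes like uni still match university.

```
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index "university" as uni, univ, unive, ... so prefixes match -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="bio_prefix" type="text_prefix" indexed="true" stored="false"/>
<copyField source="bio" dest="bio_prefix"/>
```

Quoting the phrase in the query (e.g. q=text:"channel marketing") is what turns the whitespace-separated words into a single phrase match.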
Re: How to use polish stemmer - Stempel - in schema.xml?
This is what stempel-1.0.jar consists of after jar -xf:
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
org/: egothor getopt
org/egothor: stemmer
org/egothor/stemmer: Cell.class Diff.class Gener.class MultiTrie2.class Optimizer2.class Reduce.class Row.class TestAll.class TestLoad.class Trie$StrEnum.class Compile.class DiffIt.class Lift.class MultiTrie.class Optimizer.class Reduce$Remap.class Stock.class Test.class Trie.class
org/getopt: stempel
org/getopt/stempel: Benchmark.class lucene Stemmer.class
org/getopt/stempel/lucene: StempelAnalyzer.class StempelFilter.class
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
META-INF/: MANIFEST.MF
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
res: tables
res/tables: readme.txt stemmer_1000.out stemmer_100.out stemmer_2000.out stemmer_200.out stemmer_500.out stemmer_700.out

2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
Hi Jakub, if you unzip your stempel-1.0.jar, do you have the required directory structure and file in there? org/getopt/stempel/lucene/StempelFilter.class
Regards, Bernd

Am 02.11.2010 13:54, schrieb Jakub Godawa:
Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib. What is still a problem is that even though the files are loaded:
2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader
I am not able to use the FilterFactory... maybe I am attempting it in a wrong way?
Cheers, Jakub Godawa.

2010/11/2 Erick Erickson erickerick...@gmail.com:
The Polish stemmer jar file needs to be findable by Solr; if you copy it to solr_home/lib and restart Solr you should be set. Alternatively, you can add another lib directive to the solrconfig.xml file (there are several examples in that file already). I'm a little confused about not being able to find TokenFilter, is that still a problem?
HTH
Erick

On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa jakub.god...@gmail.com wrote:
Thank you Bernd! I couldn't make it run though. Here is my problem:
1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a directive:
<lib path="../lib/stempel-1.0.jar" />
3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is a fieldType:
(...)
<!-- Polish -->
<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="org.getopt.stempel.lucene.StempelFilter" />
    <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" protected="protwords.txt" /> -->
  </analyzer>
</fieldType>
(...)
4. The jar file is loaded, but I got an error:
SEVERE: Could not start SOLR. Check solr/home property
java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
(...)
5. A different class gave me this one:
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.getopt.solr.analysis.StempelTokenFilterFactory'
 at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
 at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
(...)
The question is: how do I make <fieldType /> and <filter /> work with Stempel? :)
Cheers, Jakub Godawa.

2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
Hi Jakub, I have ported the KStemmer for use in the most recent Solr trunk version. My stemmer is located in the lib directory of Solr (solr/lib/KStemmer-2.00.jar) because it belongs to Solr.
Re: Phrase Query Problem?
On Tue, Nov 2, 2010 at 8:19 AM, Erick Erickson erickerick...@gmail.comwrote: That's not the response I get when I try your query, so I suspect something's not quite right with your test... But you could also try putting parentheses around the words, like mykeywords:(Compliance+With+Conduct+Standards) Best Erick I agree with Erick, your query string showed quotes, but your parsed query did not. Using quotes, or parenthesis, would pretty much leave your query alone. There is one exception that I've found: if you use a stopword analyzer, any stop words would be converted to ? in the parsed query. So if you absolutely need every single word to match, regardless, you cannot use a field type that uses the stop word analyzer. For example, I have two dynamic field definitions: df_text_* that does the default text transformations (including stop words), and df_text_exact_* that does nothing (field type is string). When I run the query df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America, the following is shown as my query/parsed query when debugQuery is on: str name=rawquerystring df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America /str str name=querystring df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America /str str name=parsedquery df_text_exact_company_name:Bank of America PhraseQuery(df_text_company_name:bank ? america) /str str name=parsedquery_toString df_text_exact_company_name:Bank of America df_text_company_name:bank ? america /str The difference is subtle, but important. If I were to do df_text_company_name:Bank and America, I would still match Bank of America. These are things that you should keep in mind when you are creating fields for your indices. A useful tool for seeing what SOLR does to your query terms is the Analysis tool found in the admin panel. 
You can do an analysis on either a specific field, or by a field type, and you will see a breakdown by Analyzer for either the index, query, or both of any query that you put in. This would definitely be useful when trying to determine why SOLR might return what it does. - Ken
Re: How to use polish stemmer - Stempel - in schema.xml?
So you call org.getopt.solr.analysis.StempelTokenFilterFactory. In this case I would assume a file StempelTokenFilterFactory.class in your directory org/getopt/solr/analysis/, and a class which extends BaseTokenFilterFactory, right?
...
public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware {
...

Am 02.11.2010 14:20, schrieb Jakub Godawa:
This is what stempel-1.0.jar consists of after jar -xf:
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
org/: egothor getopt
org/egothor: stemmer
org/egothor/stemmer: Cell.class Diff.class Gener.class MultiTrie2.class Optimizer2.class Reduce.class Row.class TestAll.class TestLoad.class Trie$StrEnum.class Compile.class DiffIt.class Lift.class MultiTrie.class Optimizer.class Reduce$Remap.class Stock.class Test.class Trie.class
org/getopt: stempel
org/getopt/stempel: Benchmark.class lucene Stemmer.class
org/getopt/stempel/lucene: StempelAnalyzer.class StempelFilter.class
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
META-INF/: MANIFEST.MF
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
res: tables
res/tables: readme.txt stemmer_1000.out stemmer_100.out stemmer_2000.out stemmer_200.out stemmer_500.out stemmer_700.out

2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
Hi Jakub, if you unzip your stempel-1.0.jar, do you have the required directory structure and file in there? org/getopt/stempel/lucene/StempelFilter.class
Regards, Bernd

Am 02.11.2010 13:54, schrieb Jakub Godawa:
Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib. What is still a problem is that even though the files are loaded:
2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader
I am not able to use the FilterFactory... maybe I am attempting it in a wrong way?
Cheers, Jakub Godawa.
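For readers following this thread: Bernd's point above is that a filter referenced from schema.xml must be a factory class, while stempel-1.0.jar only ships the raw Lucene filter (org.getopt.stempel.lucene.StempelFilter). A minimal, untested sketch of such a wrapper is below; the package and class name mirror the one Jakub tried, the exact StempelFilter constructor arguments must be checked against the stempel sources, and the Solr/Lucene jars must be on the compile classpath.

```java
package org.getopt.solr.analysis;

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;
import org.getopt.stempel.lucene.StempelFilter;

public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
    // Wrap the raw Lucene filter so Solr can instantiate it from schema.xml.
    // NOTE: verify StempelFilter's real constructor signature in the stempel
    // sources; it may require a Stemmer instance and/or a minimum term length.
    public TokenStream create(TokenStream input) {
        return new StempelFilter(input);
    }
}
```

Compiled into its own jar and placed next to stempel-1.0.jar in the Solr lib directory, a class like this could then be referenced from the schema as a filter factory, which is what the commented-out line in Jakub's fieldType attempts.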
Re: How to use polish stemmer - Stempel - in schema.xml?
Sorry, I am not Java programmer at all. I would appreciate more verbose (or step by step) help. 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de: So you call org.getopt.solr.analysis.StempelTokenFilterFactory. In this case I would assume a file StempelTokenFilterFactory.class in your directory org/getopt/solr/analysis/. And a class which extends the BaseTokenFilterFactory rigth? ... public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware { ... Am 02.11.2010 14:20, schrieb Jakub Godawa: This is what stempel-1.0.jar consist of after jar -xf: jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/ org/: egothor getopt org/egothor: stemmer org/egothor/stemmer: Cell.class Diff.class Gener.class MultiTrie2.class Optimizer2.class Reduce.class Row.class TestAll.class TestLoad.class Trie$StrEnum.class Compile.class DiffIt.class Lift.class MultiTrie.class Optimizer.class Reduce$Remap.class Stock.class Test.class Trie.class org/getopt: stempel org/getopt/stempel: Benchmark.class lucene Stemmer.class org/getopt/stempel/lucene: StempelAnalyzer.class StempelFilter.class jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/ META-INF/: MANIFEST.MF jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res res: tables res/tables: readme.txt stemmer_1000.out stemmer_100.out stemmer_2000.out stemmer_200.out stemmer_500.out stemmer_700.out 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de: Hi Jakub, if you unzip your stempel-1.0.jar do you have the required directory structure and file in there? org/getopt/stempel/lucene/StempelFilter.class Regards, Bernd Am 02.11.2010 13:54, schrieb Jakub Godawa: Erick I've put the jar files like that before. 
Highlighting and maxBooleanClauses limit
By default, the solrconfig.xml has maxBooleanClauses set to 1024, which in my opinion should be more than enough clauses in general. Recently, we have been noticing errors in our Catalina log:
SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 2048.
As a temporary (and quick) workaround, we tried to increase maxBooleanClauses to 2048, but are still experiencing problems hitting the limit. The full error (including the query run before the error) is:
INFO: [bizjournals] webapp=/solr path=/select/ params={facet=true&sort=df_date_published+asc&hl=true&version=2.2&facet.field=facet_type&facet.field=facet_author&facet.field=facet_arr_industries&fq=df_date_published:[*+TO+NOW]&hl.requireFieldMatch=true&hl.fragsize=75&facet.mincount=1&indent=on&hl.fl=df_text_content&wt=xml&rows=25&hl.snippets=2&hl.maxAlternateFieldLength=150&start=0&q=(df_text_blog_name:farm+bill)+OR+((df_text_headline:[*+TO+*]+AND+df_date_published:[*+TO+NOW])+AND+((df_text_author:farm+bill)+OR+(df_text_content:farm+bill)+OR+(df_text_headline:farm+bill)+OR+(df_text_blog_name:farm+bill)))&hl.alternateField=df_text_content&hl.usePhraseHighlighter=true} hits=269 status=500 QTime=729
Nov 2, 2010 4:10:09 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 2048 at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:153) at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:144) at org.apache.lucene.search.MultiTermQuery$ScoringBooleanQueryRewrite.rewrite(MultiTermQuery.java:110) at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:382) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:178) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111) at
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414) at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216) at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:619) I've noticed in the stack trace that this exception occurs when trying to build the query for the highlighting; I've confirmed this by copying the params and changing hl=true to hl=false. Unfortunately, when using debugQuery=on, I do not see any details on what is going on with the highlighting portion of the query (after artificially increasing the maxBooleanClauses so the query will run). With all of that said, my question(s) to the list are: Is there a way to determine how exactly the highlighter is building its query (i.e., some sort of highlighting debug setting)? Is the behavior of highlighting in SOLR intended to be held to the same restrictions (maxBooleanClauses) as the query
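For reference, the limit discussed in this thread is raised in solrconfig.xml; the value 2048 here simply mirrors the one from the log, and the element lives inside the query section:

```xml
<!-- solrconfig.xml: raise the global BooleanQuery clause limit -->
<query>
  <maxBooleanClauses>2048</maxBooleanClauses>
</query>
```

Because this setting backs a static Lucene variable, the highest value configured across cores is the one that takes effect process-wide.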
Slave replication with custom dataDir
Hey guys, I have 2 instances of Solr running, one as a master, one as a slave. Both have <dataDir>/var/lib/solr/data</dataDir>. The master works fine; the slave dies with a huge set of stack traces. The Solr wiki says that replication must match the dataDir if it's custom, but how do I actually set that?
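For anyone searching the archives with the same question: the data directory is set in solrconfig.xml (the path below is the one from this message):

```xml
<!-- solrconfig.xml: custom index data directory -->
<dataDir>/var/lib/solr/data</dataDir>
```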
Re: Slave replication with custom dataDir
This is a log dump; please be aware that this only appears in my log if I have the following enabled in config:
<dataDir>/var/lib/solr/data</dataDir>
... snip ...
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://10.1.2.196:8080/solr/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>
Log output:
03/11/2010 1:23:47 AM org.apache.solr.servlet.SolrDispatchFilter init SEVERE: Could not start SOLR. Check solr/home property java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.<clinit>(MultiThreadedHttpConnectionManager.java:70) at org.apache.solr.handler.SnapPuller.createHttpClient(SnapPuller.java:110) at org.apache.solr.handler.SnapPuller.init(SnapPuller.java:138) at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:775) at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:486) at org.apache.solr.core.SolrCore.init(SolrCore.java:589) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3838) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4488) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637) at
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053) at org.apache.catalina.core.StandardHost.start(StandardHost.java:722) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at org.apache.catalina.core.StandardService.start(StandardService.java:516) at org.apache.catalina.core.StandardServer.start(StandardServer.java:710) at org.apache.catalina.startup.Catalina.start(Catalina.java:593) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1484) at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1329) ... 
35 more 03/11/2010 1:23:47 AM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener sending requests to searc...@689e8c34 main 03/11/2010 1:23:47 AM org.apache.solr.common.SolrException log SEVERE: java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.clinit(MultiThreadedHttpConnectionManager.java:70) at org.apache.solr.handler.SnapPuller.createHttpClient(SnapPuller.java:110) at org.apache.solr.handler.SnapPuller.init(SnapPuller.java:138) at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:775) at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:486) at org.apache.solr.core.SolrCore.init(SolrCore.java:589) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422) at
Query question
I can't seem to find the right formula for this. I have a need to build a query where one of the fields should boost the score, but not affect the query if there isn't a match. For example, if I have documents with restaurants, name, address, cuisine, description, etc. I want to search on, say, Romantic AND View AND city:Chicago if city is in fact Chicago it should score higher, but if city is not Chicago (or even if it's missing the city field), but matches the other query parameters it should still come back in the results. Is something like this possible? It's kind of like q=(some query) optional boost if field:value. Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Query-question-tp1828367p1828367.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query question
I think you'll find the dismax request handler helpful in general, it supports more flexible query wrangling like that. With the dismax request handler, I think the bq (boost query) parameter will do what you need, eg: bq=city:Chicago^5.0 The ^5.0 is how much boost you want, you can play around with it to see what works well for your use cases. http://wiki.apache.org/solr/DisMaxQParserPlugin kenf_nc wrote: I can't seem to find the right formula for this. I have a need to build a query where one of the fields should boost the score, but not affect the query if there isn't a match. For example, if I have documents with restaurants, name, address, cuisine, description, etc. I want to search on, say, Romantic AND View AND city:Chicago if city is in fact Chicago it should score higher, but if city is not Chicago (or even if it's missing the city field), but matches the other query parameters it should still come back in the results. Is something like this possible? It's kind of like q=(some query) optional boost if field:value. Thanks, Ken
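Putting the pieces of the reply above together, a request along these lines (the field names are illustrative, not from the original schema) keeps city:Chicago out of the required clauses while still boosting documents that match it:

```
q=Romantic View
&defType=dismax
&qf=name description cuisine
&bq=city:Chicago^5.0
```

With dismax, q supplies the required terms searched across the qf fields, while bq only contributes to the score, which is exactly the "optional boost" behavior the question asks for.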
Re: Phrase Query Problem?
On 11/2/2010 9:21 AM, Ken Stanley wrote: On Tue, Nov 2, 2010 at 8:19 AM, Erick Ericksonerickerick...@gmail.comwrote: That's not the response I get when I try your query, so I suspect something's not quite right with your test... But you could also try putting parentheses around the words, like mykeywords:(Compliance+With+Conduct+Standards) Best Erick I agree with Erick, your query string showed quotes, but your parsed query did not. Using quotes, or parenthesis, would pretty much leave your query alone. There is one exception that I've found: if you use a stopword analyzer, any stop words would be converted to ? in the parsed query. So if you absolutely need every single word to match, regardless, you cannot use a field type that uses the stop word analyzer. For example, I have two dynamic field definitions: df_text_* that does the default text transformations (including stop words), and df_text_exact_* that does nothing (field type is string). When I run the query df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America, the following is shown as my query/parsed query when debugQuery is on: str name=rawquerystring df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America /str str name=querystring df_text_exact_company_name:Bank of America OR df_text_company_name:Bank of America /str str name=parsedquery df_text_exact_company_name:Bank of America PhraseQuery(df_text_company_name:bank ? america) /str str name=parsedquery_toString df_text_exact_company_name:Bank of America df_text_company_name:bank ? america /str The difference is subtle, but important. If I were to do df_text_company_name:Bank and America, I would still match Bank of America. These are things that you should keep in mind when you are creating fields for your indices. A useful tool for seeing what SOLR does to your query terms is the Analysis tool found in the admin panel. 
You can do an analysis on either a specific field, or by a field type, and you will see a breakdown by Analyzer for either the index, query, or both of any query that you put in. This would definitely be useful when trying to determine why SOLR might return what it does. - Ken What it turned out to be was escaping the spaces. q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL))) became q=(((mykeywords:Compliance\+With\+Conduct\+Standards)OR(mykeywords:All)OR(mykeywords:ALL))) If I tried q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL))) ... it didn't work. Once I removed the quotes and escaped spaces it worked as expected. This seems odd since I would have expected the quotes to have triggered a phrase query. Thanks for your help. - Tod
Re: Query question
Do you want something like (Romantic AND View) OR city:Chicago^10? Best Erick On Tue, Nov 2, 2010 at 10:45 AM, kenf_nc ken.fos...@realestate.com wrote: I can't seem to find the right formula for this. I have a need to build a query where one of the fields should boost the score, but not affect the query if there isn't a match. For example, if I have documents with restaurants, name, address, cuisine, description, etc. I want to search on, say, Romantic AND View AND city:Chicago if city is in fact Chicago it should score higher, but if city is not Chicago (or even if it's missing the city field), but matches the other query parameters it should still come back in the results. Is something like this possible? It's kind of like q=(some query) optional boost if field:value. Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Query-question-tp1828367p1828367.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dynamically create new core
To create the core, the folder with the confs must already exist and has to be placed in the proper place (inside the Solr home). Once you run the create-core action, the core will be added to solr.xml and dynamically loaded. -- View this message in context: http://lucene.472066.n3.nabble.com/Dynamically-create-new-core-tp1827097p1828560.html Sent from the Solr - User mailing list archive at Nabble.com.
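A sketch of the create-core call described above, assuming the default example port and core name; the instance directory must already exist under the Solr home with a conf/ folder inside it:

```shell
# CoreAdmin CREATE action: "newcore" is a placeholder name
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=newcore&instanceDir=newcore'
```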
Re: Highlighting and maxBooleanClauses limit
(10/11/02 23:14), Ken Stanley wrote: I've noticed in the stack trace that this exception occurs when trying to build the query for the highlighting; I've confirmed this by copying the params and changing hl=true to hl=false. Unfortunately, when using debugQuery=on, I do not see any details on what is going on with the highlighting portion of the query (after artificially increasing maxBooleanClauses so the query will run). With all of that said, my question(s) to the list are: Is there a way to determine how exactly the highlighter is building its query (i.e., some sort of highlighting debug setting)? Basically, I think the highlighter uses the main query but tries to rewrite it before highlighting. Is the behavior of highlighting in SOLR intended to be held to the same restrictions (maxBooleanClauses) as the query parser (even though the highlighting query is built internally)? I think so, because maxBooleanClauses is a static variable. I saw your stack trace and glanced at the highlighter source; my assumption is that the highlighter tried to rewrite (expand) your range queries into a boolean query, even though you set requireFieldMatch to true. Can you try to query without the range query? If the problem goes away, I think it is a highlighter bug. The highlighter should skip the range query when the user sets requireFieldMatch to true, because your range query is on another field. If so, please open a JIRA issue. Koji -- http://www.rondhuit.com/en/
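For anyone hitting this limit: maxBooleanClauses is configured in solrconfig.xml under the query section, and because it is a static Lucene setting it applies JVM-wide (the largest value among loaded cores wins). A sketch of the stock entry:

```xml
<query>
  <!-- Maximum number of clauses allowed in a BooleanQuery; static, so JVM-wide -->
  <maxBooleanClauses>1024</maxBooleanClauses>
</query>
```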
Re: Phrase Query Problem?
Indeed, something doesn't seem right about that; quotes are for phrases, you are right, and I get confused even thinking about what happens when you try to escape spaces like that. I think there's something odd going on with your URI-escaping in general. Here's what the string should actually look like for mykeywords:"Compliance With Conduct Standards", when put into a URI: mykeywords%3A%22Compliance+With+Conduct+Standards%22 You really ought to escape the colon and the double quotes too, to follow the URI spec. If you weren't escaping the double quotes, that could explain your issue. And I seriously don't understand what putting a backslash in the URI accomplishes in this case; it confuses me trying to understand what's going on there, and personally I never like it when I just try random things until something I don't understand works. Tod wrote: On 11/2/2010 9:21 AM, Ken Stanley wrote: On Tue, Nov 2, 2010 at 8:19 AM, Erick Erickson erickerick...@gmail.com wrote: That's not the response I get when I try your query, so I suspect something's not quite right with your test... But you could also try putting parentheses around the words, like mykeywords:(Compliance+With+Conduct+Standards) Best Erick I agree with Erick, your query string showed quotes, but your parsed query did not. Using quotes, or parentheses, would pretty much leave your query alone. There is one exception that I've found: if you use a stopword analyzer, any stop words would be converted to ? in the parsed query. So if you absolutely need every single word to match, regardless, you cannot use a field type that uses the stop word analyzer. For example, I have two dynamic field definitions: df_text_* that does the default text transformations (including stop words), and df_text_exact_* that does nothing (field type is string).
When I run the query df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank of America", the following is shown as my query/parsed query when debugQuery is on: <str name="rawquerystring">df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank of America"</str> <str name="querystring">df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank of America"</str> <str name="parsedquery">df_text_exact_company_name:Bank of America PhraseQuery(df_text_company_name:"bank ? america")</str> <str name="parsedquery_toString">df_text_exact_company_name:Bank of America df_text_company_name:"bank ? america"</str> The difference is subtle, but important. If I were to do df_text_company_name:"Bank and America", I would still match "Bank of America". These are things that you should keep in mind when you are creating fields for your indices. A useful tool for seeing what SOLR does to your query terms is the Analysis tool found in the admin panel. You can do an analysis on either a specific field, or by a field type, and you will see a breakdown by Analyzer for either the index, query, or both of any query that you put in. This would definitely be useful when trying to determine why SOLR might return what it does. - Ken What it turned out to be was escaping the spaces. q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL))) became q=(((mykeywords:Compliance\+With\+Conduct\+Standards)OR(mykeywords:All)OR(mykeywords:ALL))) If I tried q=(((mykeywords:"Compliance+With+Conduct+Standards")OR(mykeywords:All)OR(mykeywords:ALL))) ... it didn't work. Once I removed the quotes and escaped the spaces it worked as expected. This seems odd since I would have expected the quotes to have triggered a phrase query. Thanks for your help. - Tod
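As an editorial aside, the confusion above usually comes from hand-building the URI. A small illustration (Python is used here purely to demonstrate the encoding; it is not part of the original thread) of what a quoted phrase query should look like once percent-encoded:

```python
from urllib.parse import quote_plus

# A phrase query as the Solr query parser should receive it after decoding.
q = 'mykeywords:"Compliance With Conduct Standards"'

# quote_plus percent-encodes the colon and double quotes and turns spaces
# into '+', which the servlet container decodes back on the Solr side.
encoded = quote_plus(q)
print(encoded)  # mykeywords%3A%22Compliance+With+Conduct+Standards%22
```

Backslash-escaping spaces inside the URI, by contrast, sends literal backslashes to the query parser, which is why the results seemed surprising.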
Re: Query question
Jonathan, Dismax is something I've been meaning to look into, and bq does seem to fit the bill, although I'm worried about this line in the wiki: "TODO: That latter part is deprecated behavior but still works. It can be problematic so avoid it." It still seems to be the closest to what I want, however, so I'll play with it. Erick, that query would return all restaurants in Chicago, whether they matched Romantic View or not. Although the scores should sort relevant results to the top, the results would still contain a lot of things I wasn't interested in. -- View this message in context: http://lucene.472066.n3.nabble.com/Query-question-tp1828367p1828639.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query question
Don't worry about that line. It just means that one particular kind of 'default' behavior in bq shouldn't be relied upon, if you don't entirely understand that behavior they're saying is deprecated (as I don't either!) anyway, don't worry about it, just supply an explicit boost in your bq. bq isn't going anywhere, it is a stable and well-used part of dismax. kenf_nc wrote: Jonathan, Dismax is something I've been meaning to look into, and bq does seem to fit the bill, although I'm worried about this line in the wiki :TODO: That latter part is deprecated behavior but still works. It can be problematic so avoid it. It still seems to be the closest to what I want however so I'll play with it. Erick, that query would return all restaurants in Chicago, whether they matched Romantic View or not. Although the scores should sort relevant results to the top, the results would still contain a lot of things I wasn't interested in.
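For concreteness, a hedged sketch of the dismax request being discussed (the field names come from Ken's restaurant example; the boost value is arbitrary):

```text
q=Romantic View
&defType=dismax
&qf=name description cuisine
&bq=city:Chicago^10
```

Documents matching Romantic and View still come back regardless of city; those whose city field is Chicago simply score higher via the explicit boost on bq.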
Re: Highlighting and maxBooleanClauses limit
Hmm, I'm not sure it's the highlighter alone. Depending on the query it can also get triggered by the spellcheck component. See below what happens with maxBooleanClauses = 16.

HTTP ERROR: 500 maxClauseCount is set to 16
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 16
    at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:153)
    at org.apache.lucene.search.spell.SpellChecker.add(SpellChecker.java:329)
    at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:260)
    at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:140)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:140)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

On Tuesday 02 November 2010 16:26:00 Koji Sekiguchi wrote: (10/11/02 23:14), Ken Stanley wrote: I've noticed in the stack trace that this exception occurs when trying to build the query for the highlighting; I've confirmed this by copying the params and changing hl=true to hl=false. Unfortunately, when using debugQuery=on, I do not see any details on what is going on with the highlighting portion of the query (after artificially increasing the maxBooleanClauses so the query will run). With all of that said, my question(s) to the list are: Is there a way to determine how exactly the highlighter is building its query (i.e., some sort of highlighting debug setting)? Basically I think highlighter uses main query, but try to rewrite it before highlighting. Is the behavior of highlighting in SOLR intended to be held to the same restrictions (maxBooleanClauses) as the query parser (even though the highlighting query is built internally)? I think so because maxBooleanClauses is a static variable. I saw your stack trace and glance at highlighter source, my assumption is - highlighter tried to rewrite (expand) your range queries to boolean query, even if you set requireFieldMatch to true. Can you try to query without the range query? If the problem goes away, I think it is highlighter bug. Highlighter should skip the range query when user set requireFieldMatch to true, because your range query is for another field. If so, please open a jira issue.
Koji -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
Re: Highlighting and maxBooleanClauses limit
On Tue, Nov 2, 2010 at 11:26 AM, Koji Sekiguchi k...@r.email.ne.jp wrote: (10/11/02 23:14), Ken Stanley wrote: I've noticed in the stack trace that this exception occurs when trying to build the query for the highlighting; I've confirmed this by copying the params and changing hl=true to hl=false. Unfortunately, when using debugQuery=on, I do not see any details on what is going on with the highlighting portion of the query (after artificially increasing the maxBooleanClauses so the query will run). With all of that said, my question(s) to the list are: Is there a way to determine how exactly the highlighter is building its query (i.e., some sort of highlighting debug setting)? Basically I think highlighter uses main query, but try to rewrite it before highlighting. Is the behavior of highlighting in SOLR intended to be held to the same restrictions (maxBooleanClauses) as the query parser (even though the highlighting query is built internally)? I think so because maxBooleanClauses is a static variable. I saw your stack trace and glance at highlighter source, my assumption is - highlighter tried to rewrite (expand) your range queries to boolean query, even if you set requireFieldMatch to true. Can you try to query without the range query? If the problem goes away, I think it is highlighter bug. Highlighter should skip the range query when user set requireFieldMatch to true, because your range query is for another field. If so, please open a jira issue. Koji -- http://www.rondhuit.com/en/ Koji, that is most excellent. Thank you for pointing out that the range queries were causing the highlighter to exceed the maxBooleanClauses. Once I removed them from my main query (and moved them into separate filter queries), SOLR and highlighting worked as I expected them to work. Per your suggestion, I have opened a JIRA ticket (SOLR-2216) for this problem. 
I am somewhat of a novice at Java, and I have not yet had the pleasure of getting the SOLR sources into my working environment, but I would be more than eager to assist in finding a solution - with maybe some mentoring from a more experienced developer. Anyway, thank you again; I am very excited to have a suitable workaround for the time being. - Ken Stanley
IndexableBinaryStringTools (was FieldCache)
Hi, [...] I tried to use IndexableBinaryStringTools to re-encode my 11-byte array. The size increased to 7 characters (= 14 bytes), which is still a gain of more than 50 percent compared to the UTF-8 encoding. BTW: I found no sample of how to use the IndexableBinaryStringTools class except in the unit tests. IndexableBinaryStringTools will eventually be deprecated and then dropped, in favor of native indexable/searchable binary terms. More work is required before these are possible, though. Well-maintained unit tests are not a bad way to describe functionality... Sure, but there is no unit test for Solr. I assume that the char[] returned from IndexableBinaryStringTools.encode is encoded in UTF-8 again and then stored. At some point the information is lost and cannot be recovered. Can you give an example? This should not happen. It's hard to give example output, because the binary string representation contains unprintable characters. I'll try to explain what I'm doing. My character array returned by IndexableBinaryStringTools.encode looks like the following: char[] encoded = new char[] {0, 8508, 3392, 64, 0, 8, 0, 0}; Then I add it to a SolrInputDocument: SolrInputDocument doc = new SolrInputDocument(); doc.addField("id", new String(encoded)); If I now print the SolrInputDocument using System.out.println(doc), the String representation of the character array is correct. Then I add it to a RAMDirectory: ArrayList<SolrInputDocument> docs = new ArrayList<SolrInputDocument>(); docs.add(doc); solrServer.add(docs); solrServer.commit(); ... and immediately retrieve it as follows: SolrQuery query = new SolrQuery(); query.setQuery("*:*"); QueryResponse rsp = solrServer.query(query); SolrDocumentList docList = rsp.getResults(); System.out.println(docList); Now the string representation of the SolrDocument's ID looks different from that of the SolrInputDocument.
If I do not create a new String in doc.addField, just the string representation of the array address will be added to the SolrInputDocument. BTW: I've tested it with EmbeddedSolrServer and Solr/Lucene trunk. Why has the string representation changed? From the changed string I cannot decode the correct ID. -- Kind regards, Mathias
Re: Possible memory leaks with frequent replication
On Mon, Nov 01, 2010 at 05:42:51PM -0700, Lance Norskog said: You should query against the indexer. I'm impressed that you got 5s replication to work reliably. That's our current solution - I was just wondering if there was anything I was missing. Thanks!
Re: Possible memory leaks with frequent replication
On Tue, Nov 2, 2010 at 12:32 PM, Simon Wistow si...@thegestalt.org wrote: On Mon, Nov 01, 2010 at 05:42:51PM -0700, Lance Norskog said: You should query against the indexer. I'm impressed that you got 5s replication to work reliably. That's our current solution - I was just wondering if there was anything I was missing. You could also try dialing down maxWarmingSearchers to 1 - that should prevent multiple searchers warming at the same time and may be the source of you running out of memory. -Yonik http://www.lucidimagination.com
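The setting Yonik mentions lives in the query section of solrconfig.xml; a sketch:

```xml
<query>
  <!-- Refuse to open another searcher while this many are still warming -->
  <maxWarmingSearchers>1</maxWarmingSearchers>
</query>
```

With the value at 1, a commit or replication that arrives before the previous searcher finishes warming fails fast instead of stacking up warming searchers in memory.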
Re: Possible memory leaks with frequent replication
It's definitely a known 'issue' that you can't replicate (or do any other kind of index change, including a commit) at a faster frequency than your warming queries take to complete, or you'll wind up with something like you've seen. It's in some documentation somewhere I saw, for sure. The advice to 'just query against the master' is kind of odd, because, then... why have a slave at all, if you aren't going to query against it? I guess just for backup purposes. But even with just one Solr, or querying the master, if you commit at a rate such that commits come before the warming queries can complete, you're going to have the same issue. The only answer I know of is: don't commit (or replicate) at a faster rate than it takes your warming to complete. You can reduce your warming queries/operations, or reduce your commit/replicate frequency. It would be interesting/useful if Solr noticed this going on, and gave you some kind of error in the log (or even an exception when started with a certain parameter for testing): "Overlapping warming queries, you're committing too fast" or something. Because it's easy to make this happen without realizing it, and then your Solr does what Simon says: runs out of RAM and/or uses a whole lot of CPU and disk I/O. Lance Norskog wrote: You should query against the indexer. I'm impressed that you got 5s replication to work reliably. On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow si...@thegestalt.org wrote: We've been trying to get a setup in which a slave replicates from a master every few seconds (ideally every second but currently we have it set at every 5s). Everything seems to work fine until, periodically, the slave just stops responding from what looks like it running out of memory: org.apache.catalina.core.StandardWrapperValve invoke SEVERE: Servlet.service() for servlet jsp threw exception java.lang.OutOfMemoryError: Java heap space (our monitoring seems to confirm this).
Looking around my suspicion is that it takes new Readers longer to warm than the gap between replication and thus they just build up until all memory is consumed (which, I suppose isn't really memory 'leaking' per se, more just resource consumption) That said, we've tried turning off caching on the slave and that didn't help either so it's possible I'm wrong. Is there anything we can do about this? I'm reluctant to increase the heap space since I suspect that will mean that there's just a longer period between failures. Might Zoie help here? Or should we just query against the Master? Thanks, Simon
Solr like for autocomplete field?
I have a city field. Now when a user starts typing in a city textbox I want to return found matches (like Google). So for example, the user types "new", and I will return "new york", "new hampshire", etc. My schema.xml: <field name="city" type="string" indexed="true" stored="true"/> My current url: http://localhost:8983/solr/db/select/?indent=on&facet=true&q=*:*&start=0&rows=25&fl=id&facet.field=city&fq=city:new Basically 2 questions here: 1. is the url I'm using the best practice when implementing autocomplete? What I wanted to do is use the facets for found matches. 2. How can I match PART of the cityname just like the SQL LIKE command, cityname LIKE '%userinput' Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-like-for-autocomplete-field-tp1829480p1829480.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr like for autocomplete field?
We used the filters talked about at Lucid Imagination for our site, it seems to work pretty well: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ Your mileage might vary, but it's a pretty good place to start. Matt On 11/2/2010 1:56 PM, PeterKerk wrote: I have a city field. Now when a user starts typing in a city textbox I want to return found matches (like Google). So for example, user types new, and I will return new york, new hampshire etc. my schema.xml field name=city type=string indexed=true stored=true/ my current url: http://localhost:8983/solr/db/select/?indent=onfacet=trueq=*:*start=0rows=25fl=idfacet.field=cityfq=city:new Basically 2 questions here: 1. is the url Im using the best practice when implementing autocomplete? What I wanted to do, is use the facets for found matches. 2. How can I match PART of the cityname just like the SQL LIKE command, cityname LIKE '%userinput' Thanks!
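The approach in that blog post boils down to an edge n-gram analysis chain at index time, which is what gives you the LIKE 'prefix%' behavior. A hedged sketch of such a field type for schema.xml (the type name and gram sizes are illustrative, not from the original thread):

```xml
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- indexes "new york" as n, ne, new, "new ", "new y", ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The query-side analyzer deliberately omits the n-gram filter, so the user's prefix matches the grams produced at index time.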
Querying Solr using dismax, requested field not showing up in debug score boosts
I'm storing a set of products in solr as documents. I'm separating out the name, description, keywords, and product category name into separate fields so that I can boost them independently using the dismax handler. All the fields are stored as text in the same way. I'm passing these four fields in the fl param to the dismax handler, and I'm also specifying them with a boost in the qf field. Not every record (document) has a category name associated with it, but the problem I have is that even when the category name comes back in the query results, I do not see the boost I am applying to that field taking effect in the debug output of the solr query. Does anyone have an idea of why this could be? -- View this message in context: http://lucene.472066.n3.nabble.com/Querying-Solr-using-dismax-requested-field-not-showing-up-in-debug-score-boosts-tp1829456p1829456.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query question
I... Need... more... coffee. On Tue, Nov 2, 2010 at 11:31 AM, kenf_nc ken.fos...@realestate.com wrote: Jonathan, Dismax is something I've been meaning to look into, and bq does seem to fit the bill, although I'm worried about this line in the wiki :TODO: That latter part is deprecated behavior but still works. It can be problematic so avoid it. It still seems to be the closest to what I want however so I'll play with it. Erick, that query would return all restaurants in Chicago, whether they matched Romantic View or not. Although the scores should sort relevant results to the top, the results would still contain a lot of things I wasn't interested in. -- View this message in context: http://lucene.472066.n3.nabble.com/Query-question-tp1828367p1828639.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr like for autocomplete field?
Also, you might want to consider TermsComponent, see: http://wiki.apache.org/solr/TermsComponent Also, note that there's an autosuggestcomponent, that's recently been committed. Best Erick On Tue, Nov 2, 2010 at 1:56 PM, PeterKerk vettepa...@hotmail.com wrote: I have a city field. Now when a user starts typing in a city textbox I want to return found matches (like Google). So for example, user types new, and I will return new york, new hampshire etc. my schema.xml field name=city type=string indexed=true stored=true/ my current url: http://localhost:8983/solr/db/select/?indent=onfacet=trueq=*:*start=0rows=25fl=idfacet.field=cityfq=city:new Basically 2 questions here: 1. is the url Im using the best practice when implementing autocomplete? What I wanted to do, is use the facets for found matches. 2. How can I match PART of the cityname just like the SQL LIKE command, cityname LIKE '%userinput' Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-like-for-autocomplete-field-tp1829480p1829480.html Sent from the Solr - User mailing list archive at Nabble.com.
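A hedged example of the TermsComponent request Erick mentions, using the city field from the original post (it assumes a request handler wired to the TermsComponent is registered at /terms; the limit is arbitrary):

```text
http://localhost:8983/solr/terms?terms=true&terms.fl=city&terms.prefix=new&terms.limit=10
```

This returns indexed terms in the city field starting with "new", which is often cheaper than faceting over the whole index for autocomplete.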
Re: Querying Solr using dismax, requested field not showing up in debug score boosts
First, you should show us the query, as well as the debug output, it often helps to have a second set of eyes... Where are you specifying the qf? Under any circumstance it would be helpful to see the definition of the request handler you're using. Because as it stands, the best I can say is that I haven't a clue... Best Erick On Tue, Nov 2, 2010 at 1:51 PM, zakuhn zak.k...@extrabux.com wrote: I'm storing a set of products in solr as ducuments. I'm separating out the name, description, keywords, and product category name into separate fields so that I can boost them independently using the dismax handler. All the fields are stored as text in the same way. I'm passing these four fields in the fl param to the dismax handler, and I'm also specifying them with a boost in the qf field. Not every record (document) has a category name associated with it, but the problem I have is that even when the category name comes back in the query results, I do not see the boost I am applying to that field taking effect in the debug output of the solr query. Does anyone have an idea of why this could be? -- View this message in context: http://lucene.472066.n3.nabble.com/Querying-Solr-using-dismax-requested-field-not-showing-up-in-debug-score-boosts-tp1829456p1829456.html Sent from the Solr - User mailing list archive at Nabble.com.
Updating last_modified field when using DIH
Hello everyone! I would like to ask you a question about DIH and delta-import. I am trying to sync Solr with a PostgreSQL database, and I have a field ent_lastModified of type timestamp without time zone. Here is my xml file:

<dataConfig>
  <dataSource name="jdbc" driver="org.postgresql.Driver" url="jdbc:postgresql://host"
              user="XXX" password="XXX" readOnly="true" autoCommit="false"
              transactionIsolation="TRANSACTION_READ_COMMITTED" holdability="CLOSE_CURSORS_AT_COMMIT"/>
  <document>
    <entity name="myEntity" dataSource="jdbc" pk="id"
            query="SELECT * FROM Entities"
            deltaImportQuery="SELECT ent_id AS id FROM Entities WHERE ent_id=${dataimporter.delta.id}"
            deltaQuery="SELECT ent_id AS id FROM Entities WHERE ent_lastModified &gt; '${dataimporter.last_index_time}'"/>
  </document>
</dataConfig>

Full-import works fine, but when I run a delta-import on the ent_lastModified field, I get the corresponding records, but ent_lastModified stays the same, so if I make another delta-import, the same records are retrieved. I have read all the documentation at http://wiki.apache.org/solr/DataImportHandler but I could not find an update query for the last_modified field, and Solr does not seem to do this automatically. I have also tried to name the field last_modified as in the example, but its value remains unchanged after a delta-import. Can anyone point me in the right direction? Thanks in advance! Juan M.
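As an editorial note on the question above: ${dataimporter.last_index_time} is not read from, or written to, any database column. DIH records the start time of each import itself in conf/dataimport.properties and substitutes that value into the next deltaQuery, so no update query for the database field is needed. A sketch of what that file looks like after an import (the timestamp is illustrative):

```text
#Tue Nov 02 21:05:32 UTC 2010
last_index_time=2010-11-02 21\:05\:32
```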
RE: Stored or indexed?
Thanks for the great info! I appreciate everybody's help in getting started with Solr, hopefully I'll be able to get my stuff working and move on to more difficult questions. :) -Original Message- From: Elizabeth L. Murnane [mailto:emurn...@architexa.com] Sent: Friday, October 29, 2010 12:42 PM To: solr-user@lucene.apache.org Subject: Re: Stored or indexed? Hi Ron, In a nutshell - an indexed field is searchable, and a stored field has its content stored in the index so it is retrievable. Here are some examples that will hopefully give you a feel for how to set the indexed and stored options: indexed="true" stored="true" Use this for information you want to search on and also display in search results - for example, book title or author. indexed="false" stored="true" Use this for fields that you want displayed with search results but that don't need to be searchable - for example, destination URL, file system path, time stamp, or icon image. indexed="true" stored="false" Use this for fields you want to search on but don't need to get their values in search results. Here are some of the common reasons you would want this: Large fields and a database: Storing a field makes your index larger, so set stored to false when possible, especially for big fields. For this case a database is often used, as the previous responder said. Use a separate identifier field to get the field's content from the database. Ordering results: Say you define <field name="bookName" type="text" indexed="true" stored="true"/> that is tokenized and used for searching. If you want to sort results based on book name, you could copy the field into a separate nonretrievable, nontokenized field that can be used just for sorting - <field name="bookSort" type="string" indexed="true" stored="false"/> <copyField source="bookName" dest="bookSort"/> Easier searching: If you define the field <field name="text" type="text" indexed="true" stored="false" multiValued="true"/> you can use it as a catch-all field that contains all of the other text fields.
Since solr looks in a default field when given a text query without field names, you can support this type of general phrase query by making the catch-all the default field. indexed="false" stored="false" Use this when you want to ignore fields. For example, the following will ignore unknown fields that don't match a defined field rather than throwing an error by default. <fieldtype name="ignored" stored="false" indexed="false"/> <dynamicField name="*" type="ignored"/> Elizabeth Murnane emurn...@architexa.com Architexa Lead Developer - www.architexa.com Understand Document Code In Seconds --- On Thu, 10/28/10, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: From: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Subject: Re: Stored or indexed? To: solr-user@lucene.apache.org Date: Thursday, October 28, 2010, 4:25 AM In our case, we just store a database id and do a secondary db query when displaying the results. This is handy and leads to a more centralised architecture when you need to display properties of a domain object which you don't index/search. On 28 October 2010 05:02, kenf_nc ken.fos...@realestate.com wrote: Interesting wiki link, I hadn't seen that table before. And to answer your specific question about indexed=true, stored=false, this is most often done when you are using analyzers/tokenizers on your field. This field is for search only, you would never retrieve its contents for display. It may in fact be an amalgam of several fields into one 'content' field. You have your display copy stored in another field marked indexed=false, stored=true and optionally compressed. I also have simple string fields set to lowercase so searching is case-insensitive, and have a duplicate field where the string is normal case. The first one is indexed/not stored, the second is stored/not indexed.
-- View this message in context: http://lucene.472066.n3.nabble.com/Stored-or-indexed-tp1782805p1784315.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query question
Erick, that query would return all restaurants in Chicago, whether they matched Romantic View or not. Although the scores should sort relevant results to the top, the results would still contain a lot of things I wasn't interested in. How about this one? +(city:Chicago^1000 OR (*:* -city:Chicago)) +Romantic +View
Re: Querying Solr using dismax, requested field not showing up in debug score boosts
Ok, here is the query cleaned up a bit, one parameter per line:

q=mattress
q.op=AND
qt=dismax
fl=name,description,group_id,lowest_price,num_child_products,raw_category_string,category_id,parent_category_id,str_brand,grandparent_category_id,grandparent_category_name,parent_category_name,category_name
start=0
rows=25
indent=on
wt=php
version=2.2
mm=35%
ps=0
qs=0
sort=score desc
fq=-parent_id:[* TO *]
fq=-num_child_products:[* TO 1]
fq=-parent_group_id:[* TO *]
qf=keywords^.5 description^1.5 brand^0.7 manufacturer_model^4 name^5 upc^1 isbn^1 raw_category_string^.8 category_id^1 str_brand^1 grandparent_category_name^1 parent_category_name^2 category_name^3
facet=true
facet.limit=150
facet.mincount=1
facet.offset=0
facet.field=str_brand
facet.field=grandparent_category_id

-- View this message in context: http://lucene.472066.n3.nabble.com/Querying-Solr-using-dismax-requested-field-not-showing-up-in-debug-score-boosts-tp1829456p1831414.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Stored or indexed?
IMO, the very, very best way to increase your grasp of all things Solr is to try to answer questions on this list. Folks are pretty gentle about correcting mistaken posts. And I certainly remember any advice I've given that's been corrected <g>. Besides, if you try to answer the things you *do* understand, it leaves more time for the committers to answer *your* questions <g>... Best Erick On Tue, Nov 2, 2010 at 4:39 PM, Olson, Ron rol...@lbpc.com wrote: Thanks for the great info! I appreciate everybody's help in getting started with Solr; hopefully I'll be able to get my stuff working and move on to more difficult questions. :) -----Original Message----- From: Elizabeth L. Murnane [mailto:emurn...@architexa.com] Sent: Friday, October 29, 2010 12:42 PM To: solr-user@lucene.apache.org Subject: Re: Stored or indexed? Hi Ron, In a nutshell - an indexed field is searchable, and a stored field has its content stored in the index so it is retrievable. Here are some examples that will hopefully give you a feel for how to set the indexed and stored options:

indexed="true" stored="true" - Use this for information you want to search on and also display in search results - for example, book title or author.

indexed="false" stored="true" - Use this for fields that you want displayed with search results but that don't need to be searchable - for example, destination URL, file system path, time stamp, or icon image.

indexed="true" stored="false" - Use this for fields you want to search on but don't need to get their values in search results. Here are some of the common reasons you would want this: Large fields and a database: Storing a field makes your index larger, so set stored to false when possible, especially for big fields. For this case a database is often used, as the previous responder said. Use a separate identifier field to get the field's content from the database. Ordering results: Say you define a field <field name="bookName" type="text" indexed="true" stored="true"/> that is tokenized and used for searching.
If you want to sort results based on book name, you could copy the field into a separate non-retrievable, non-tokenized field that can be used just for sorting: <field name="bookSort" type="string" indexed="true" stored="false"/> <copyField source="bookName" dest="bookSort"/> Easier searching: If you define the field <field name="text" type="text" indexed="true" stored="false" multiValued="true"/> you can use it as a catch-all field that contains all of the other text fields. Since Solr looks in a default field when given a text query without field names, you can support this type of general phrase query by making the catch-all the default field.

indexed="false" stored="false" - Use this when you want to ignore fields. For example, the following will ignore unknown fields that don't match a defined field rather than throwing an error by default: <fieldtype name="ignored" stored="false" indexed="false"/> <dynamicField name="*" type="ignored"/>

Elizabeth Murnane emurn...@architexa.com Architexa Lead Developer - www.architexa.com --- On Thu, 10/28/10, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: From: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Subject: Re: Stored or indexed? To: solr-user@lucene.apache.org Date: Thursday, October 28, 2010, 4:25 AM In our case, we just store a database id and do a secondary db query when displaying the results. This is handy and leads to a more centralised architecture when you need to display properties of a domain object which you don't index/search. On 28 October 2010 05:02, kenf_nc ken.fos...@realestate.com wrote: Interesting wiki link, I hadn't seen that table before. And to answer your specific question about indexed=true, stored=false: this is most often done when you are using analyzers/tokenizers on your field. This field is for search only; you would never retrieve its contents for display. It may in fact be an amalgam of several fields into one 'content' field.
You have your display copy stored in another field marked indexed="false" stored="true" and optionally compressed. I also have simple string fields set to lowercase so searching is case-insensitive, and a duplicate field where the string is normal case: the first one is indexed/not stored, the second is stored/not indexed. -- View this message in context: http://lucene.472066.n3.nabble.com/Stored-or-indexed-tp1782805p1784315.html Sent from the Solr - User mailing list archive at Nabble.com.
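Pulling Elizabeth's four combinations together, a hedged schema.xml sketch (field names here are illustrative, not taken from anyone's actual schema):

```xml
<!-- search + display: title shown in results and searchable -->
<field name="bookName" type="text"   indexed="true"  stored="true"/>
<!-- display only: never queried, just returned with results -->
<field name="thumbUrl" type="string" indexed="false" stored="true"/>
<!-- search only: e.g. an untokenized copy used just for sorting -->
<field name="bookSort" type="string" indexed="true"  stored="false"/>
<copyField source="bookName" dest="bookSort"/>
<!-- neither: swallow unknown fields instead of raising an error -->
<fieldtype name="ignored" stored="false" indexed="false"/>
<dynamicField name="*" type="ignored"/>
```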
Re: Influencing scores on values in multiValue fields
Thanks Mike for your suggestion. It did take me down the correct route. I basically created another multiValued field of type 'string' and boosted that. To get the partial matches to avoid the length normalisation I set the 'text' type multiValued field to omitNorms. The results look as expected so far with this configuration. Cheers -- Imran On Fri, Oct 29, 2010 at 1:09 PM, Michael Sokolov soko...@ifactory.com wrote: How about creating another field for doing exact matches (a string); searching both and boosting the string match? -Mike -----Original Message----- From: Imran [mailto:imranboho...@gmail.com] Sent: Friday, October 29, 2010 6:25 AM To: solr-user@lucene.apache.org Subject: Influencing scores on values in multiValue fields Hi All We've got an index in which we have a multiValued field per document. Assume the multivalued field values in each document to be: Doc1: bar lifters Doc2: truck tires, back drops, bar lifters Doc3: iron bar lifters Doc4: brass bar lifters, iron bar lifters, tire something, truck something, oil, gas Now when we search for 'bar lifters' the expectation (based on the requirements) is that we get results in the order Doc1, Doc2, Doc4, Doc3. Doc1 - since there's an exact match (and only one) for the search terms. Doc2 - since there's an exact match amongst the values. Doc4 - since there's a partial match on the values and the number of matches is greater than in Doc3. Doc3 - since there's a partial match. However, the results come out as Doc1, Doc3, Doc2, Doc4. Looking at the explanation of the result it appears Doc2 is losing to Doc3, and Doc4 is losing to Doc3, based on length normalisation. We can see the reason for that - the field length in doc2 is greater than in doc3, and doc4's is greater than doc3's. However, is there any mechanism by which we can force doc2 to beat doc3 and doc4 to beat doc3 with this structure? We did look at using omitNorms=true, but that messes up the scores for all docs.
The result comes out as Doc4, Doc1, Doc2, Doc3 (where Doc1, Doc2 and Doc3 get the same score). This is because the fieldNorm is not taken into account anymore (as expected), leaving term frequency as the only contributing factor. So trying to avoid length normalisation through omitNorms is not helping. Is there any way we can make an exact match of a value in a multiValued field add to the overall score whilst keeping the length normalisation? Hope that makes sense. Cheers -- Imran
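A sketch of the schema arrangement Imran describes - an extra string field for exact value matches, with norms dropped on the tokenized field. Field and type names are illustrative assumptions, not taken from his actual schema:

```xml
<!-- tokenized field for partial matches; omitNorms avoids the
     length penalty that was demoting Doc2 and Doc4 -->
<field name="products" type="text" indexed="true" stored="false"
       multiValued="true" omitNorms="true"/>
<!-- untokenized copy; a hit here is an exact value match -->
<field name="products_exact" type="string" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="products" dest="products_exact"/>
```

A query such as products:(bar lifters) OR products_exact:"bar lifters"^10 would then let the exact-value match dominate the ranking while partial matches still score on term frequency.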
Re: Query question
My impression was that city:Chicago^10 +Romantic +View would do what you want (with the standard lucene query parser and default operator OR), and I'm not sure about this, but I have a feeling that the version with Boolean operators AND/OR and parens might actually net out to the same thing, since under the hood all the terms have to be translated into optional, required or forbidden: lucene doesn't actually have true binary boolean operators. At least that was the impression I got after some discussion at a recent conference. I may have misunderstood - if so, could someone who knows set me straight? Thanks -Mike On 11/2/2010 5:08 PM, Ahmet Arslan wrote: Erick, that query would return all restaurants in Chicago, whether they matched Romantic View or not. Although the scores should sort relevant results to the top, the results would still contain a lot of things I wasn't interested in. How about this one? +(city:Chicago^1000 OR (*:* -city:Chicago)) +Romantic +View
Re: xpath processing
<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-0.xsd">
  <mods:titleInfo>
    <mods:title>Any place I hang my hat is home</mods:title>
  </mods:titleInfo>
  <mods:titleInfo type="uniform">
    <mods:title>St. Louis woman</mods:title>
    <mods:partName>Any place I hang my hat is home</mods:partName>
  </mods:titleInfo>
  <mods:titleInfo type="alternative">
    <mods:title>Free an' easy that's my style</mods:title>
  </mods:titleInfo>
  <mods:name type="personal">
    <mods:namePart>Arlen, Harold</mods:namePart>
    <mods:namePart type="date">1905-1986</mods:namePart>
    <mods:role>
      <mods:roleTerm authority="marcrelator" type="text">creator</mods:roleTerm>
    </mods:role>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Mercer, Johnny</mods:namePart>
    <mods:namePart type="date">1909-</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Davison, R.</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Bontemps, Arna Wendell</mods:namePart>
    <mods:namePart type="date">1902-1973</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Cullen, Countee</mods:namePart>
    <mods:namePart type="date">1903-1946</mods:namePart>
  </mods:name>
  <mods:typeOfResource>notated music</mods:typeOfResource>
  <mods:originInfo>
    <mods:place>
      <mods:placeTerm authority="marccountry" type="code">nyu</mods:placeTerm>
    </mods:place>
    <mods:place>
      <mods:placeTerm type="text">New York</mods:placeTerm>
    </mods:place>
    <mods:publisher>De Sylva, Brown &amp; Henderson, Inc.</mods:publisher>
    <mods:dateIssued>c1946</mods:dateIssued>
    <mods:dateIssued encoding="marc">1946</mods:dateIssued>
    <mods:issuance>monographic</mods:issuance>
    <mods:dateOther type="normalized">1946</mods:dateOther>
    <mods:dateOther type="normalized">1946</mods:dateOther>
  </mods:originInfo>
  <mods:language>
    <mods:languageTerm authority="iso639-2b" type="code">eng</mods:languageTerm>
  </mods:language>
  <mods:physicalDescription>
    <mods:form authority="marcform">print</mods:form>
    <mods:extent>1 vocal score (5 p.) : ill. ; 31 cm.</mods:extent>
  </mods:physicalDescription>
  <mods:note type="statement of responsibility">music by Harold Arlen ; lyrics by Johnny Mercer.</mods:note>
  <mods:note>For voice and piano.</mods:note>
  <mods:note>Includes chord symbols.</mods:note>
  <mods:note>Illustration by R. Davison.</mods:note>
  <mods:note>First line: Free an' easy that's my style.</mods:note>
  <mods:note>Edward Gross presents St. Louis Woman ... Book by Arna Bontemps &amp; Countee Cullen -- Cover.</mods:note>
  <mods:note>Publisher's advertising includes musical incipits.</mods:note>
  <mods:subject authority="lcsh">
    <mods:topic>Motion picture music</mods:topic>
    <mods:topic>Excerpts</mods:topic>
    <mods:topic>Vocal scores with piano</mods:topic>
  </mods:subject>
  <mods:classification authority="lcc">M1 .S8</mods:classification>
  <mods:identifier type="music plate">1403-4 De Sylva, Brown Henderson, Inc.</mods:identifier>
  <mods:location>
    <mods:physicalLocation>Lilly Library, Indiana University Bloomington</mods:physicalLocation>
  </mods:location>
  <mods:recordInfo>
    <mods:recordContentSource authority="marcorg">IUL</mods:recordContentSource>
    <mods:recordCreationDate encoding="marc">990316</mods:recordCreationDate>
    <mods:recordIdentifier>LL-SSM-ALC4888</mods:recordIdentifier>
  </mods:recordInfo>
</mods:mods>

Above is my sample xml.

<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="f" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor" fileName=".*xml"
            recursive="true" baseDir="C:\test_xml">
      <entity name="x" dataSource="myfilereader"
              processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"
              stream="false" forEach="/mods"
              transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
        <field column="id" template="${f.file}"/>
        <field column="collectionKey" template="uw"/>
        <field column="collectionName" template="University of Washington Pacific Northwest Sheet Music Collection"/>
        <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
        <field column="fileName" template="${f.file}"/>
        <field column="fileSize" template="${f.fileSize}"/>
        <field column="fileLastModified" template="${f.fileLastModified}"/>
        <field column="nameNamePart_keyword" xpath="/mods/name/namePart[@type != 'date']"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Above is the data config file. The namePart element in the above xml may or may not have a type attribute. How can I get data from the namePart elements which have no type attribute? xpath="/mods/name/namePart[@type != 'date']" is not working. I don't get any errors, but there is no namePart_keyword in the index. Quoting Ken Stanley doh...@gmail.com:
Re: Ensuring stable timestamp ordering
memory's cheap! (I know processing it is not, though) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. ----- Original Message ----- From: Toke Eskildsen t...@statsbiblioteket.dk To: solr-user@lucene.apache.org Sent: Mon, November 1, 2010 11:45:34 PM Subject: RE: Ensuring stable timestamp ordering Dennis Gearon [gear...@sbcglobal.net] wrote: how about a timestamp with either a GUID appended on the end of it? Since long (8 bytes) is the largest atomic type supported by Java, this would have to be represented as a String (or rather BytesRef) and would take up 4 + 32 bytes + 2 * 4 bytes from the internal BytesRef-attributes + some extra overhead. That is quite a large memory penalty to ensure unique timestamps.
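A cheaper alternative than a GUID suffix, sketched here as an assumption rather than anything proposed in the thread: pack the millisecond timestamp and a per-JVM sequence number into a single long, which stays sortable and costs only 8 bytes per value. Class and method names are hypothetical, and the 20-bit counter wraps after about one million ids minted within a single millisecond.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: pack a millisecond timestamp (high 44 bits) and a
// per-JVM sequence number (low 20 bits) into one sortable long.
public final class UniqueTimestamp {
    private static final AtomicLong SEQ = new AtomicLong();

    public static long next(long millis) {
        return (millis << 20) | (SEQ.getAndIncrement() & 0xFFFFFL);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Two ids minted in the same millisecond still sort stably.
        System.out.println(next(now) < next(now)); // prints true
    }
}
```

44 bits of milliseconds covers dates well past the year 2500, so the shift does not overflow for realistic clock values.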
Re: using HebMorph
I don't know the paths in the Solr package for Ubuntu. In the Solr Apache release, you go to the example/ directory. The example/solr directory needs a new lib directory, and you copy the jars there. Then run 'java -jar start.jar', still in the example/ directory. Solr should start. Now you need to study example/solr/conf/schema.xml and look at what Analyzers do. Good luck! On Tue, Nov 2, 2010 at 12:37 AM, mark peleus mark.pel...@gmail.com wrote: Hi I'm trying to use HebMorph, a new Hebrew analyzer. http://github.com/itaifrenkel/HebMorph/tree/master/java/ The instructions say: 1. Download the code from here: http://github.com/synhershko/HebMorph/tree/master/java/ 2. Use the hebmorph ant script (http://github.com/synhershko/HebMorph/blob/master/java/hebmorph/build.xml) to build the hebmorph project. 3. Use the lucene.hebrew ant script (http://github.com/synhershko/HebMorph/blob/master/java/lucene.hebrew/build.xml) to build the lucene.hebrew project. 4. Copy both jar files to the solr/lib folder. 5. Edit your solr/conf/schema.xml file to use the analyzer you choose to use. I've installed the Solr package under Ubuntu Lucid. I've completed steps 1-3. Where do I put the jar files? How do I make Solr use the analyzer? Thanks -- Lance Norskog goks...@gmail.com
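For step 5, the schema.xml wiring would look roughly like the sketch below. The analyzer class name is a placeholder, not taken from the thread - check the actual class shipped in the lucene.hebrew jar you built:

```xml
<!-- hypothetical field type backed by the HebMorph analyzer;
     replace com.example.HebrewAnalyzer with the real class name -->
<fieldType name="text_he" class="solr.TextField">
  <analyzer class="com.example.HebrewAnalyzer"/>
</fieldType>
<field name="title_he" type="text_he" indexed="true" stored="true"/>
```

The single-class `<analyzer class="..."/>` form is standard Solr schema syntax for plugging in a whole Lucene Analyzer rather than a tokenizer/filter chain.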
Re: how to get TermVectorComponent using xml , vs. SOLR-949
TVC is in Solr 1.4 onwards. It is configured in example/solr/conf/solrconfig.xml as 'tvrh'. It is not registered under its own URL path, so you have to say solr/select?q=word&qt=tvrh and look at the bottom of the xml. On Tue, Nov 2, 2010 at 5:34 AM, Will Milspec will.mils...@gmail.com wrote: Hi all, This seems a basic question: what's the best way to get TermVectorComponents from the Solr XML response? SolrJ does not include TermVectorComponents in its api; the SOLR-949 patch adds this ability, but after 2 years it's still not in the mainline (and doesn't patch cleanly to the current head 1.4). I'm new to Solr and familiar with SolrJ, but not with the best means for getting/parsing the raw xml. (Typically I find the dtd and write code to parse the dom using the dtd. In this case I've seen a few examples, but nothing definitive.) Our team would rather use the out-of-the-box Solr rather than manually apply patches and worry about consistency during upgrades... Thanks in advance, will -- Lance Norskog goks...@gmail.com
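For reference, the solrconfig.xml wiring Lance mentions looks approximately like this in the Solr 1.4 example config (verify the names against your own config file):

```xml
<searchComponent name="tvComponent"
                 class="org.apache.solr.handler.component.TermVectorComponent"/>
<requestHandler name="tvrh"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```

Because the handler name has no leading slash, it is selected with the qt parameter (qt=tvrh) rather than by URL path, and the term vectors appear as an extra section at the bottom of the response.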
Re: Disk usage per-field
The Lucene CheckIndex program opens an index and reads many types of data from it. It's easy to start with it and change it to count up the space used by terms and stored data for field X. On Tue, Nov 2, 2010 at 5:51 AM, Muneeb Ali muneeba...@hotmail.com wrote: Hi, I am currently benchmarking a solr index with different fields to see the impact on its size/search speed etc. A feature to find the disk usage per field of the index would be really handy and save me a lot of time. Do we have any updates on this? Has anyone tried writing custom code for it? - Muneeb -- View this message in context: http://lucene.472066.n3.nabble.com/Disk-usage-per-field-tp934765p1827739.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: How to use polish stemmer - Stempel - in schema.xml?
Here's the problem: Solr is a little dumb about these Filter classes, and so you have to make a Factory object for the Stempel Filter. There are a lot of other FilterFactory classes. You would have to just copy one and change the names to Stempel and it might actually work. This will take some Solr programming - perhaps the author can help you? On Tue, Nov 2, 2010 at 7:08 AM, Jakub Godawa jakub.god...@gmail.com wrote: Sorry, I am not a Java programmer at all. I would appreciate more verbose (or step by step) help. 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de: So you call org.getopt.solr.analysis.StempelTokenFilterFactory. In this case I would assume a file StempelTokenFilterFactory.class in your directory org/getopt/solr/analysis/. And a class which extends the BaseTokenFilterFactory, right? ... public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware { ... Am 02.11.2010 14:20, schrieb Jakub Godawa: This is what stempel-1.0.jar consists of after jar -xf: jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/ org/: egothor getopt org/egothor: stemmer org/egothor/stemmer: Cell.class Diff.class Gener.class MultiTrie2.class Optimizer2.class Reduce.class Row.class TestAll.class TestLoad.class Trie$StrEnum.class Compile.class DiffIt.class Lift.class MultiTrie.class Optimizer.class Reduce$Remap.class Stock.class Test.class Trie.class org/getopt: stempel org/getopt/stempel: Benchmark.class lucene Stemmer.class org/getopt/stempel/lucene: StempelAnalyzer.class StempelFilter.class jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/ META-INF/: MANIFEST.MF jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res res: tables res/tables: readme.txt stemmer_1000.out stemmer_100.out stemmer_2000.out stemmer_200.out stemmer_500.out stemmer_700.out 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de: Hi Jakub, if you unzip your stempel-1.0.jar do you have the required directory structure and file in there?
org/getopt/stempel/lucene/StempelFilter.class Regards, Bernd Am 02.11.2010 13:54, schrieb Jakub Godawa: Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib. What is still a problem is that even though the files are loaded: 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader I am not able to use the FilterFactory... maybe I am attempting it in a wrong way? Cheers, Jakub Godawa. 2010/11/2 Erick Erickson erickerick...@gmail.com: The polish stemmer jar file needs to be findable by Solr; if you copy it to solr_home/lib and restart solr you should be set. Alternatively, you can add another lib directive to the solrconfig.xml file (there are several examples in that file already). I'm a little confused about not being able to find TokenFilter, is that still a problem? HTH Erick On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa jakub.god...@gmail.com wrote: Thank you Bernd! I couldn't make it run though. Here is my problem: 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a directive: <lib path="../lib/stempel-1.0.jar"/> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is a fieldType: (...) <!-- Polish --> <fieldType name="text_pl" class="solr.TextField"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="org.getopt.stempel.lucene.StempelFilter"/> <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" protected="protwords.txt"/> --> </analyzer> </fieldType> (...) 4. The jar file is loaded but I got an error: SEVERE: Could not start SOLR.
Check solr/home property java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:634) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) (...) 5. A different class gave me this one: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.getopt.solr.analysis.StempelTokenFilterFactory' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390) (...) Question is: How to make <fieldType/> and <filter/> work with that Stempel? :) Cheers, Jakub Godawa. 2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de: Hi Jakub, I have ported the KStemmer for use in the most recent Solr trunk version. My stemmer is located in the lib directory of Solr (solr/lib/KStemmer-2.00.jar) because it belongs to Solr. Write it as a FilterFactory and use it as
Re: Possible memory leaks with frequent replication
Isn't that what this code does?

  onDeckSearchers++;
  if (onDeckSearchers < 1) {
    // should never happen... just a sanity check
    log.error(logid + "ERROR!!! onDeckSearchers is " + onDeckSearchers);
    onDeckSearchers = 1;  // reset
  } else if (onDeckSearchers > maxWarmingSearchers) {
    onDeckSearchers--;
    String msg = "Error opening new searcher. exceeded limit of maxWarmingSearchers="
               + maxWarmingSearchers + ", try again later.";
    log.warn(logid + "" + msg);
    // HTTP 503==service unavailable, or 409==Conflict
    throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE, msg, true);
  } else if (onDeckSearchers > 1) {
    log.info(logid + "PERFORMANCE WARNING: Overlapping onDeckSearchers=" + onDeckSearchers);
  }

On Tue, Nov 2, 2010 at 10:02 AM, Jonathan Rochkind rochk...@jhu.edu wrote: It's definitely a known 'issue' that you can't replicate (or do any other kind of index change, including a commit) at a faster frequency than your warming queries take to complete, or you'll wind up with something like you've seen. It's in some documentation somewhere I saw, for sure. The advice to 'just query against the master' is kind of odd, because then... why have a slave at all, if you aren't going to query against it? I guess just for backup purposes. But even with just one Solr, or querying the master, if you commit at a rate such that commits come before the warming queries can complete, you're going to have the same issue. The only answer I know of is: don't commit (or replicate) at a faster rate than it takes your warming to complete. You can reduce your warming queries/operations, or reduce your commit/replicate frequency. It would be interesting/useful if Solr noticed this going on and gave you some kind of error in the log (or even an exception when started with a certain parameter for testing): Overlapping warming queries, you're committing too fast or something. Because it's easy to make this happen without realizing it, and then your Solr does what Simon says: runs out of RAM and/or uses a whole lot of CPU and disk io.
Lance Norskog wrote: You should query against the indexer. I'm impressed that you got 5s replication to work reliably. On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow si...@thegestalt.org wrote: We've been trying to get a setup in which a slave replicates from a master every few seconds (ideally every second but currently we have it set at every 5s). Everything seems to work fine until, periodically, the slave just stops responding from what looks like it running out of memory: org.apache.catalina.core.StandardWrapperValve invoke SEVERE: Servlet.service() for servlet jsp threw exception java.lang.OutOfMemoryError: Java heap space (our monitoring seems to confirm this). Looking around my suspicion is that it takes new Readers longer to warm than the gap between replication and thus they just build up until all memory is consumed (which, I suppose isn't really memory 'leaking' per se, more just resource consumption) That said, we've tried turning off caching on the slave and that didn't help either so it's possible I'm wrong. Is there anything we can do about this? I'm reluctant to increase the heap space since I suspect that will mean that there's just a longer period between failures. Might Zoie help here? Or should we just query against the Master? Thanks, Simon -- Lance Norskog goks...@gmail.com
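The safety valve discussed above is configurable in solrconfig.xml; a sketch with illustrative values (not taken from the posters' configs):

```xml
<!-- cap concurrent warming searchers; commits/replications that would
     exceed the cap fail fast instead of piling readers onto the heap -->
<maxWarmingSearchers>2</maxWarmingSearchers>
<!-- smaller autowarmCount shortens warming so it can finish
     within the replication interval -->
<filterCache class="solr.FastLRUCache" size="512"
             initialSize="512" autowarmCount="64"/>
```

With a 5-second replication poll, the warming work per new searcher has to complete in well under 5 seconds, or the cap will be hit no matter how high it is set.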
Re: Solr like for autocomplete field?
And the SpellingComponent. There's nothing to help you with phrases. On Tue, Nov 2, 2010 at 11:21 AM, Erick Erickson erickerick...@gmail.com wrote: Also, you might want to consider TermsComponent, see: http://wiki.apache.org/solr/TermsComponent Also, note that there's an autosuggest component that's recently been committed. Best Erick On Tue, Nov 2, 2010 at 1:56 PM, PeterKerk vettepa...@hotmail.com wrote: I have a city field. Now when a user starts typing in a city textbox I want to return found matches (like Google). So for example, the user types new, and I will return new york, new hampshire etc. my schema.xml: <field name="city" type="string" indexed="true" stored="true"/> my current url: http://localhost:8983/solr/db/select/?indent=on&facet=true&q=*:*&start=0&rows=25&fl=id&facet.field=city&fq=city:new Basically 2 questions here: 1. is the url I'm using the best practice when implementing autocomplete? What I wanted to do is use the facets for found matches. 2. How can I match PART of the cityname, just like the SQL LIKE command: cityname LIKE '%userinput' Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-like-for-autocomplete-field-tp1829480p1829480.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
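One concrete way to get prefix-style LIKE behaviour with the facet approach in the question is facet.prefix, which limits facet terms to those starting with the user's input. A sketch (host, core and field names taken from the question; facet.prefix itself is a standard Solr faceting parameter):

```
http://localhost:8983/solr/db/select/?q=*:*&rows=0
    &facet=true&facet.field=city
    &facet.prefix=new&facet.mincount=1&facet.limit=10
```

Note that facet.prefix matches raw indexed terms, so on a string field it behaves like LIKE 'new%' (prefix only); for true infix matching ('%userinput%') you would need an ngram-analyzed copy of the field instead.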
Re: xpath processing
The XPathEP has the option to run a real XSL script at some point in its processing chain. I guess you could make an XSL that pulls your fields out into a simpler XML in the /a/b/c format that the XPath parser supports. On Tue, Nov 2, 2010 at 5:37 PM, pghorp...@ucla.edu wrote: [sample MODS xml and data-config quoted from the original message snipped]