timezone DIH and dataimport.properties
Hello. How can I set the timezone of Java in my Java properties? My problem is that dataimport.properties contains the wrong timezone, and I don't know how to set the correct one. Thanks.

-----
System: One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents, other Cores 100,000
- Solr1 for Search-Requests - commit every Minute - 5GB Xmx
- Solr2 for Update-Request - delta every Minute - 4GB Xmx
--
View this message in context: http://lucene.472066.n3.nabble.com/timezone-DIH-and-dataimport-properties-tp2864928p2864928.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to concatenate two nodes of xml with xpathentityprocessor
Vishal, I don't really understand what you're trying to achieve. Indexing what (complete sample documents, valid if possible)? And getting what exactly as a result?

Regards,
Stefan

On Mon, Apr 25, 2011 at 5:01 PM, vrpar...@gmail.com vrpar...@gmail.com wrote:
> hello, i am using XPathEntityProcessor to index xml files. below is my xml file:
>
> <Full>
>   <Customer name="a" id="1" .. other attributes>CustomerA</Customer>
>   <Customer name="b" id="2" .. other attributes>ThisB</Customer>
>   <Customer name="c" id="3" .. other attributes>AnyC</Customer>
> </Full>
>
> now i want to concatenate in the index so that when i search it gives the below result (CData with id attribute), like:
>
> <str id="1">CustomerA</str><str id="2">ThisB</str>
>
> or something like that. is it possible with RegexTransformer or TemplateTransformer? i googled a little for both but could not get an exact/useful solution.
>
> Thanks
> Vishal Parekh
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/how-to-concatenate-two-nodes-of-xml-with-xpathentityprocessor-tp2861260p2861260.html
Re: timezone DIH and dataimport.properties
java -Duser.timezone=UTC -jar start.jar ?

On Tue, Apr 26, 2011 at 9:54 AM, stockii stock.jo...@googlemail.com wrote:
> Hello. How can i set the timezone oft java in my java properties? my problem is, that in the dataimport-properties is a wrong timezone and i dont know how to set the correct timezone [...]
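For anyone reading this in the archive: DIH writes last_index_time to dataimport.properties using the JVM's default timezone, which is why -Duser.timezone controls what ends up in the file. A minimal sketch of that effect (the class and method names here are mine, not DIH's):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Sketch: the same instant formats to different wall-clock times depending
// on the timezone the JVM uses, which is what -Duser.timezone changes.
// Illustrative only; DIH's own property-writing code is not shown here.
public class LastIndexTime {
    // Format a timestamp the way DIH does (yyyy-MM-dd HH:mm:ss),
    // in an explicit timezone so the effect of user.timezone is visible.
    public static String format(Date d, TimeZone tz) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(tz);
        return fmt.format(d);
    }

    public static void main(String[] args) {
        Date epochPlusHour = new Date(3600000L); // 1970-01-01 01:00:00 UTC
        System.out.println(format(epochPlusHour, TimeZone.getTimeZone("UTC")));
        System.out.println(format(epochPlusHour, TimeZone.getTimeZone("Europe/Berlin")));
    }
}
```

Starting Solr with -Duser.timezone=UTC makes the default timezone UTC process-wide, so the stamp written to dataimport.properties follows.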
Problem with autogeneratePhraseQueries
Hi, I'm new to Solr. My Solr instance version is:

Solr Specification Version: 3.1.0
Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 18:00:07
Lucene Specification Version: 3.1.0
Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
Current Time: Tue Apr 26 08:01:09 CEST 2011
Server Start Time: Tue Apr 26 07:59:05 CEST 2011

I have the following definition for the textgen type:

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front" preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I'm using this type for the name field in my index. As you can see I'm using autoGeneratePhraseQueries="false", but for the query sony vaio 4gb I'm getting the following query in debug:

<lst name="debug">
  <str name="rawquerystring">sony vaio 4gb</str>
  <str name="querystring">sony vaio 4gb</str>
  <str name="parsedquery">+name:sony +name:vaio +MultiPhraseQuery(name:"(4gb 4) gb")</str>
  <str name="parsedquery_toString">+name:sony +name:vaio +name:"(4gb 4) gb"</str>
</lst>

Do you have any idea how I can avoid this MultiPhraseQuery?

Best Regards,
solr_beginner
Re: Problem with autogeneratePhraseQueries
What do you have in solrconfig.xml for luceneMatchVersion? If you don't set this, then it's going to default to Lucene 2.9 emulation so that old Solr 1.4 configs work the same way. I tried your example and it worked fine here, and I'm guessing this is probably what's happening. The default in the example/solrconfig.xml looks like this:

<!-- Controls what version of Lucene various components of Solr adhere to. Generally, you want to use the latest version to get all bug fixes and improvements. It is highly recommended that you fully re-index after changing this setting as it can affect both how text is indexed and queried. -->
<luceneMatchVersion>LUCENE_31</luceneMatchVersion>

On Tue, Apr 26, 2011 at 6:51 AM, Solr Beginner solr_begin...@onet.pl wrote:
> Hi, I'm new to solr. [...] As you can see I'm using autoGeneratePhraseQueries=false but for query sony vaio 4gb I'm getting following query in debug [...] Do you have any idea how can I avoid this MultiPhraseQuery?
>
> Best Regards,
> solr_beginner
Re: Query regarding solr plugin.
Sorry, but there's too much here to debug remotely. I strongly advise you to back way up. Undo (but save) all your changes. Start by doing the simplest thing you can: just get a dummy class in place and get it called. Perhaps create a really dumb logger method that opens a text file, writes a message, and closes the file. Inefficient, I know, but this is just to find the problem. Debugging by println is an ancient technique... Once you're certain the dummy class is called, gradually build it up to the complex class you eventually want.

One problem here is that you've changed a bunch of moving parts and copied jars around (it's unclear whether you have two copies of solr-core in your classpath, for instance). So knowing exactly which one of those is the issue is very difficult, especially since you may have forgotten one of the things you did. I know when I've been trying to do something for days, lots of details get lost.

Try to avoid changing the underlying Solr code; can you do what you want by subclassing instead and calling your new class? That would avoid a bunch of problems. If you can't subclass, copy the whole thing, rename it to something new, and call *that* rather than re-using the SynonymFilterFactory. The only jar you should copy to the lib directory is the one containing your new class.

I can't emphasize strongly enough that you'll save yourself lots of grief if you start with a fresh install and build up gradually rather than try to unravel the current code. It feels wasteful, but winds up being faster in my experience...

Good Luck!
Erick

On Tue, Apr 26, 2011 at 12:41 AM, rajini maski rajinima...@gmail.com wrote:
> Thanks Erick. I have added my replies to the points you did mention. I am going wrong somewhere. Do I need to combine both the jars or something? If yes, how do I do that? I have not much idea about Java and jar files. Please guide me here.
>
> A couple of things to try.
>> 1> When you do a 'jar -tfv yourjar', you should see output like:
>> 1183 Sun Jun 06 01:31:14 EDT 2010 org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class
>> and your filter statement may need the whole path, in this example:
>> <filter class="org.apache.lucene.analysis.sinks.TokenTypeSink"/>
>> (note, this is just an example of the pathing; this class has nothing to do with your filter)...
>
> I could see this output.
>
>> 2> But I'm guessing your path is actually OK, because I'd expect to be seeing a class-not-found error. So my guess is that your class depends on other jars that aren't packaged up in your jar, and if you find which ones they are and copy them to your lib directory, you'll be OK. Or your code is throwing an error on load. Or something like that...
>
> There is a jar - apache-solr-core-1.4.1.jar - which has the BaseTokenFilterFactory class and the SynonymFilterFactory class. I made the changes in the second class file and created it as a new one. Then I created a jar of that java file and placed it in solr home/lib, and also placed the apache-solr-core-1.4.1.jar file in the lib folder of solr home. [solr home - c:\orch\search\solr, lib path - c:\orch\search\solr\lib]
>
>> 3> To try to understand what's up, I'd back up a step. Make a really stupid class that doesn't do anything except derive from BaseTokenFilterFactory and see if you can load that. If you can, then your process is OK and you need to find out what classes your new filter depends on. If you still can't, then we can see what else we can come up with...
>
> I am perhaps doing the same. In the SynonymFilterFactory class, there is a function parseRules which takes delimiters as one of the input parameters. Here I changed the comma ',' to the '~' tilde symbol, and that's it.
>
> Regards,
> Rajani
>
> On Mon, Apr 25, 2011 at 6:26 PM, Erick Erickson erickerick...@gmail.com wrote:
>> Looking at things more carefully, it may be one of your dependent classes that's not being found. A couple of things to try.
>> [...]
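Erick's point 3> - verify that a class can actually load before wiring it into schema.xml - can be checked with a few lines of plain Java. A minimal sketch (the class names passed in are placeholders, not real plugins):

```java
// Sketch: verify that a plugin class can be loaded by name before wiring it
// into schema.xml. A NoClassDefFoundError here usually means a *dependency*
// of the class is missing from the lib directory, which matches the symptom
// Erick describes; ClassNotFoundException means the class itself isn't there.
public class ClassLoadCheck {
    public static String check(String className) {
        try {
            Class.forName(className);
            return "OK";
        } catch (ClassNotFoundException e) {
            return "not found: " + className;
        } catch (NoClassDefFoundError e) {
            return "dependency missing: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("java.lang.String"));
        System.out.println(check("org.example.NoSuchFilterFactory")); // placeholder name
    }
}
```

Run this with the same classpath Solr uses (solr home/lib on it) and pass your factory's fully qualified name; the three outcomes distinguish a pathing problem from a missing-dependency problem.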
Re: Problem with autogeneratePhraseQueries
Thank you very much for the answer. You were right: there was no luceneMatchVersion in the solrconfig.xml of our dev core. We thought that values not present in the core configuration were copied from the main solrconfig.xml. I will investigate whether our administrators did something wrong during the upgrade to 3.1.

On Tue, Apr 26, 2011 at 1:35 PM, Robert Muir rcm...@gmail.com wrote:
> What do you have in solrconfig.xml for luceneMatchVersion? If you don't set this, then its going to default to Lucene 2.9 emulation so that old solr 1.4 configs work the same way. I tried your example and it worked fine here, and I'm guessing this is probably whats happening. [...]
Re: how to concatenate two nodes of xml with xpathentityprocessor
Thanks Stefan. Currently the XPathEntityProcessor part of my data-config file is:

<entity name="x" processor="XPathEntityProcessor" forEach="/FULL" url="D:\Files\${_FileName}" dataSource="FD">
  <field column="id" xpath="/FULL/Customer/@id"/>
  <field column="Customer" xpath="/Full/Customer"/>
</entity>

and when I search I get the following result:

<result name="response" numFound="1" start="0">
  <doc>
    <arr name="Customer">
      <str>CustomerA</str>
      <str>AnyC</str>
    </arr>
    <arr name="id">
      <str>1</str>
      <str>3</str>
    </arr>
  </doc>
</result>

but I want the following result:

<result name="response" numFound="1" start="0">
  <doc>
    <arr name="Combine">
      <str>1,CustomerA</str>
      <str>3,AnyC</str>
    </arr>
  </doc>
</result>

OR

<result name="response" numFound="1" start="0">
  <doc>
    <arr name="Combine">
      <str id="1">CustomerA</str>
      <str id="3">AnyC</str>
    </arr>
  </doc>
</result>

or any other format, but I want both combined.

Thanks
Vishal Parekh

--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-concatenate-two-nodes-of-xml-with-xpathentityprocessor-tp2861260p2865508.html
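One way to get the "Combine" field Vishal describes is to pair the two columns at index time with a transformer. Below is a sketch of just the combining logic, outside DIH (all names are illustrative); inside DIH itself, a TemplateTransformer along the lines of template="${x.id},${x.Customer}" would be the analogous approach.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the per-row combining logic a custom DIH transformer would apply:
// pair each id with its Customer text into one multiValued "Combine" value.
// This is plain Java for illustration, not DIH transformer API code.
public class CombineFields {
    public static List<String> combine(List<String> ids, List<String> customers) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < ids.size() && i < customers.size(); i++) {
            out.add(ids.get(i) + "," + customers.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Mirrors the example above: ids 1 and 3 with their Customer values.
        System.out.println(combine(List.of("1", "3"), List.of("CustomerA", "AnyC")));
    }
}
```

This produces the "1,CustomerA" / "3,AnyC" shape from the first desired result; the attribute-carrying <str id="1"> form is not something a stored field can express.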
What initializes a new searcher?
Hi, I'm reading the Solr cache documentation - http://wiki.apache.org/solr/SolrCaching - and found there: "The current Index Searcher serves requests and when a new searcher is opened..." Could you explain when a new searcher is opened? Does it have something to do with index commits?

Best Regards,
Solr Beginner
TermsComponent + Dist. Search + Large Index + HEAP SPACE
Hi! We've got one index split into 4 shards of about 70,000 records each, holding large full-text data from (very dirty) OCR, so we have a lot of unique terms. Now we are trying to obtain the 400 most common words for the CommonGramsFilter via TermsComponent, but the request always runs out of memory. The VM is equipped with 32 GB of RAM, with 16-26 GB allocated to the Java VM. Any ideas how to get the most common terms without increasing the VM's memory?

Thanks, best regards,
Sebastian

--
View this message in context: http://lucene.472066.n3.nabble.com/TermsCompoment-Dist-Search-Large-Index-HEAP-SPACE-tp2865609p2865609.html
org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
Hello, I get the following error:

org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423)
    at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:459)
    ...

This error only occurs in Solr 3.1; in Solr 1.4.1 it works fine. How do I solve this problem?

Thanks
Vishal Parekh

--
View this message in context: http://lucene.472066.n3.nabble.com/org-apache-solr-common-SolrException-Error-loading-class-org-apache-solr-handler-dataimport-DataImpo-tp2865625p2865625.html
Re: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
http://www.lucidimagination.com/blog/2011/04/01/solr-powered-isfdb-part-8/

On Tue, Apr 26, 2011 at 3:34 PM, vrpar...@gmail.com vrpar...@gmail.com wrote:
> Hello, i got following error: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' [...] actually this error comes in solr 3.1 only; in solr 1.4.1 it works fine. how to solve this problem?
Re: Automatic synonyms for multiple variations of a word
On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> But somehow this feels bad (well, so does sticking word variations in what's supposed to be a synonyms file), partly because it means that the person adding new synonyms would need to know what they stem to (or always check it against Solr before editing the file).

When creating the synonym map from your input file, the factory currently uses your Tokenizer only to pre-process the synonyms file. One idea would be to use the token stream up to the SynonymFilter itself (including filters). This way, if you put a stemmer before the SynonymFilter, it would stem your synonyms file, too. I haven't totally thought the whole thing through to see if there's a big reason why this wouldn't work (the SynonymFilter is complicated, sorry). But it does seem like it would produce more consistent results... and perhaps the inconsistency isn't so obvious since, in the default configuration, the SynonymFilter comes directly after the tokenizer.
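The idea above - normalizing the synonym file with the same analysis that query tokens get - can be illustrated without Lucene at all. In this sketch a deliberately naive "strip trailing s" stemmer stands in for a real stemming filter; everything here is illustrative, not SynonymFilterFactory code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: if both the synonym-file entries and the query tokens pass through
// the same (toy) stemmer, lookups stay consistent no matter which surface
// form appears in the file or in the query. Replace stem() with a real
// stemmer to get the behavior Robert describes.
public class StemmedSynonyms {
    static String stem(String w) {
        return w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
    }

    private final Map<String, String> map = new HashMap<>();

    // Store both sides of each synonym rule in stemmed form.
    public void addRule(String from, String to) {
        map.put(stem(from), stem(to));
    }

    // Look up a query token after applying the same stemmer.
    public String lookup(String token) {
        return map.getOrDefault(stem(token), stem(token));
    }

    public static void main(String[] args) {
        StemmedSynonyms syn = new StemmedSynonyms();
        syn.addRule("automobiles", "car");          // plural form in the file...
        System.out.println(syn.lookup("automobile")); // ...still maps the singular
    }
}
```

This is exactly why the person editing the file would no longer need to know what words stem to: the rule and the query are normalized by the same chain.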
WhitespaceTokenizer and scoring(field length)
Hello, I have a problem with the WhitespaceTokenizer and scoring. An example:

id  Title
1   Manchester united
2   Manchester

With the WhitespaceTokenizer, "Manchester united" is split into "Manchester" and "united". When I search for "manchester" I get ids 1 and 2 in my results. What I want is for id 2 to score higher (field length). How can I fix this?

--
View this message in context: http://lucene.472066.n3.nabble.com/WhitespaceTokenizer-and-scoring-field-length-tp2865784p2865784.html
RE: TermsComponent + Dist. Search + Large Index + HEAP SPACE
Don't know your use case, but if you just want a list of the 400 most common words, you can use the Lucene contrib class HighFreqTerms.java with the -t flag. You have to point it at your Lucene index. You also probably don't want Solr to be running, and you'll want to give the JVM running HighFreqTerms a lot of memory.

http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/misc/src/java/org/apache/lucene/misc/HighFreqTerms.java?view=log

Tom
http://www.hathitrust.org/blogs/large-scale-search

-----Original Message-----
From: mdz-munich [mailto:sebastian.lu...@bsb-muenchen.de]
Sent: Tuesday, April 26, 2011 9:29 AM
To: solr-user@lucene.apache.org
Subject: TermsCompoment + Dist. Search + Large Index + HEAP SPACE

> Hi! We've got one index splitted into 4 shards á 70.000 records of large full-text data from (very dirty) OCR. Thus we got a lot of unique terms. [...] Any Ideas how to get the most common terms without increasing VMs Memory? Thanks, best regards, Sebastian
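If the term counts are gathered per shard anyway, the merge-and-rank step is simple enough to do outside Solr, which sidesteps the distributed TermsComponent request that blew the heap. A sketch, with term-to-docFreq maps standing in for the per-shard TermsComponent responses (this is illustrative glue code, not HighFreqTerms itself):

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the client-side merge: sum each term's document frequency
// across shards, then keep the N most common terms. The per-shard maps
// here stand in for individual (non-distributed) TermsComponent responses.
public class TopTerms {
    public static List<String> topN(List<Map<String, Integer>> shards, int n) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shards) {
            shard.forEach((term, df) -> merged.merge(term, df, Integer::sum));
        }
        return merged.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> shard1 = Map.of("the", 100, "ocr", 3, "and", 80);
        Map<String, Integer> shard2 = Map.of("the", 90, "noise", 5, "and", 70);
        System.out.println(topN(List.of(shard1, shard2), 2)); // most common first
    }
}
```

Requesting each shard separately keeps the memory cost per request bounded by one shard's term dictionary rather than all four at once.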
Apache Solr 3.1.0
I'm trying to tokenize email and IP addresses using StandardTokenizerFactory. It correctly tokenizes IP addresses, but it divides an email address into two tokens: one with the value before the '@' and the other with the value after it. This works correctly under Solr 1.4.1. Has anybody else tried a similar thing on Solr 3.1.0 successfully, or is it a potential bug?

Thanks,
Wlodek S.

--
View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-3-1-0-tp2866007p2866007.html
RE: Apache Solr 3.1.0
Hi Wodek,

UAX29URLEmailTokenizer includes all of StandardTokenizer's rules and adds rules to tokenize URLs and emails:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.UAX29URLEmailTokenizerFactory

Steve

-----Original Message-----
From: Wodek Siebor [mailto:siebor_wlo...@bah.com]
Sent: Tuesday, April 26, 2011 11:29 AM
To: solr-user@lucene.apache.org
Subject: Apache Solr 3.1.0

> I'm trying to tokenize email and IP addresses using StandardTokenizerFactory. It does correctly tokenize IP address but it divides email address into two tokens [...]
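The behavioral difference Wodek is seeing can be mimicked with two toy tokenizers. These are regex stand-ins for illustration only, not the real Lucene classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy stand-ins for the two behaviors (NOT the real Lucene tokenizers):
// splitting at '@' the way the 3.1 StandardTokenizer behaves in the report
// above, versus recognizing a whole email address the way an email-aware
// tokenizer such as UAX29URLEmailTokenizer does.
public class EmailTokenDemo {
    // Splits on whitespace and '@', so "me@example.com" becomes two tokens.
    public static List<String> splitAtPunct(String text) {
        return Arrays.asList(text.split("[@\\s]+"));
    }

    // Matches an email-like run as one token, everything else as \S+ runs.
    public static List<String> keepEmails(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[\\w.+-]+@[\\w.-]+|\\S+").matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitAtPunct("mail me@example.com"));
        System.out.println(keepEmails("mail me@example.com"));
    }
}
```

Swapping the tokenizer factory in the field type, as Steve suggests, is the real fix; this just shows what changes in the token stream.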
Problems with Spellchecker in 3.1
Hi, all. Sorry for any duplication - seems like what I sent yesterday never made it through... We're having some trouble with the Solr spellcheck response. We're running version 3.1.

Overview: If we search for something really ugly like "kljhklsdjahfkljsdhf book rck", then when we get back the response, there's a suggestions list for 'rck' but no suggestions list for the other two words. For 'book', that's fine, because it is 'spelled correctly' (i.e. we got hits on the word) and there shouldn't be any suggestions. For the ugly thing, though, there aren't any hits. The problem is that when we're handling the result, we can't tell the difference between no suggestions for a 'correctly spelled' term and no suggestions for something that's odd like this. (Now - this is happening with searches that aren't so obviously garbage - i.e. words that are real words that just don't show up in the index and have no suggestions - this was just to illustrate the point.)

Our setup: We're running multiple shards, which may be part of the issue. For example, 'book' might be found in one of the shards but not another. I don't *think* this has anything to do with our schema, since it's really about how the search suggestions are being returned to us. But, here are some bits and pieces:

From schema.xml:

<!-- Text field for spell checking -->
<field name="textSpell" type="text" indexed="true" stored="false" multiValued="true" omitNorms="true"/>

From solrconfig.xml:

<!-- The spell check component can return a list of alternative spelling suggestions. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>

What we'd really like to see is the response coming back with an indication that a word wasn't found / had no suggestions.
We've hacked around in the code a little bit to do this, but we were wondering if anyone has come across this, and what approaches you've taken. We created new classes which extend IndexBasedSpellChecker and SpellCheckComponent, as follows (package and imports excluded for (sort of) brevity). The methods are as taken from the overridden classes, with changes noted by "SD"-type comments...

/**
 * This has a slight modification of Solr's AbstractLuceneSpellChecker.getSuggestions(SpellingOptions).
 * The modification allows correctly spelled words to be returned in the suggestion. This modification, working in tandem
 * with the SirsiDynixSpellCheckComponent, allows words with no suggestions to be returned from the spell check component
 * even in a sharded search.
 * Changes are marked with SD in the comments.
 */
public class SirsiDynixIndexBasedSpellChecker extends IndexBasedSpellChecker {
    @Override
    public SpellingResult getSuggestions(SpellingOptions options) throws IOException {
        boolean shardRequest = false;
        SolrParams params = options.customParams;
        if (params != null) {
            shardRequest = "true".equals(params.get(ShardParams.IS_SHARD));
        }
        SpellingResult result = new SpellingResult(options.tokens);
        IndexReader reader = determineReader(options.reader);
        Term term = field != null ? new Term(field, "") : null;
        float theAccuracy = (options.accuracy == Float.MIN_VALUE) ? spellChecker.getAccuracy() : options.accuracy;
        int count = Math.max(options.count, AbstractLuceneSpellChecker.DEFAULT_SUGGESTION_COUNT);

        for (Token token : options.tokens) {
            String tokenText = new String(token.buffer(), 0, token.length());
            String[] suggestions = spellChecker.suggestSimilar(tokenText, count,
                    field != null ? reader : null, // workaround LUCENE-1295
                    field, options.onlyMorePopular, theAccuracy);
            if (suggestions.length == 1 && suggestions[0].equals(tokenText)) {
                // These are spelled the same, continue on
                List<String> suggList = Arrays.asList(suggestions); // SD added
                result.add(token, suggList); // SD added
                continue;
            }
            if (options.extendedResults == true && reader != null && field != null) {
                term = term.createTerm(tokenText);
                result.add(token, reader.docFreq(term));
                int countLimit = Math.min(options.count, suggestions.length);
                if (countLimit > 0) {
                    for (int i = 0; i < countLimit; i++) {
                        term = term.createTerm(suggestions[i]);
                        result.add(token, suggestions[i], reader.docFreq(term));
                    }
                } else if (shardRequest) {
                    List<String> suggList = Collections.emptyList();
                    result.add(token, suggList);
                }
            } else {
                if
Ebay Kleinanzeigen and Auto Suggest
Hi,

Someone told me that eBay is using Solr. I was looking at their auto-suggest implementation, and I guess they are using shingles and the TermsComponent. I managed to get a satisfactory implementation, but I have a problem with category-specific filtering. eBay suggestions are sensitive to categories like Cars and Pets. As far as I understand, it is not possible to use filters with a terms query. Unless one uses multiple fields or special prefixes for the indexed words, I cannot think how to implement this. Is there perhaps a workaround for this limitation?

Best Regards
EricZ

---

I have a shingle type like:

<fieldType name="shingle_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and a query like:

http://localhost:8983/solr/terms?q=*%3A*&terms.fl=suggest_text&terms.sort=count&terms.prefix=audi
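The "special prefixes" workaround EricZ mentions can be sketched without Solr: store each suggestion term as category|shingle, and a category-scoped lookup becomes an ordinary prefix match (terms.prefix=cars|audi in TermsComponent terms). Here a sorted set stands in for the terms index; everything is illustrative.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Sketch of the prefix workaround: every suggestion term is indexed as
// "<category>|<shingle>", so filtering by category reduces to a prefix
// scan over sorted terms - exactly what terms.prefix gives you for free.
public class CategorySuggest {
    private final TreeSet<String> terms = new TreeSet<>();

    public void add(String category, String shingle) {
        terms.add(category + "|" + shingle);
    }

    public List<String> suggest(String category, String prefix) {
        String p = category + "|" + prefix;
        return terms.tailSet(p).stream()
                .takeWhile(t -> t.startsWith(p))
                .map(t -> t.substring(category.length() + 1)) // strip "<category>|"
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CategorySuggest s = new CategorySuggest();
        s.add("cars", "audi a4");
        s.add("cars", "audi tt");
        s.add("pets", "australian shepherd");
        System.out.println(s.suggest("cars", "audi"));
    }
}
```

The cost is that the shingle field has to be built with the prefix baked in (e.g. via a transformer or a copy of the field per category), but no filter support is needed at query time.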
Solr Newbie: Starting embedded server with multicore
I'm just starting with Solr. I'm using Solr 3.1.0, and I want to use EmbeddedSolrServer with a multicore setup, even though I currently have only one core (various documents I read suggest starting that way even if you have one core, to get the better administrative tools supported by multicore). I have two questions:

1. Does the first code sample below start the server with multicore or not?
2. Why does the first sample work while the second does not?

My solr.xml looks like this:

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="mycore" sharedLib="lib">
    <core name="mycore" instanceDir="mycore"/>
  </cores>
</solr>

It's in a directory called solrhome in war/WEB-INF. I can get the server to come up cleanly if I follow an example in the Packt Solr book (p. 231), but I'm not sure whether this enables multicore or not:

File solrXML = new File("war/WEB-INF/solrhome/solr.xml");
String solrHome = solrXML.getParentFile().getAbsolutePath();
String dataDir = solrHome + "/data";
coreContainer = new CoreContainer(solrHome);
SolrConfig solrConfig = new SolrConfig(solrHome, "solrconfig.xml", null);
CoreDescriptor coreDescriptor = new CoreDescriptor(coreContainer, "mycore", solrHome);
SolrCore solrCore = new SolrCore("mycore", dataDir + "/mycore", solrConfig, null, coreDescriptor);
coreContainer.register(solrCore, false);
embeddedSolr = new EmbeddedSolrServer(coreContainer, "mycore");

The documentation on the Solr wiki says I should configure the EmbeddedSolrServer for multicore like this:

File home = new File("/path/to/solr/home");
File f = new File(home, "solr.xml");
CoreContainer container = new CoreContainer();
container.load("/path/to/solr/home", f);
EmbeddedSolrServer server = new EmbeddedSolrServer(container, "core name as defined in solr.xml");

When I try to do this, I get an error saying that it cannot find solrconfig.xml:

File solrXML = new File("war/WEB-INF/solrhome/solr.xml");
String solrHome = solrXML.getParentFile().getAbsolutePath();
coreContainer = new CoreContainer();
coreContainer.load(solrHome, solrXML);
embeddedSolr = new EmbeddedSolrServer(coreContainer, "mycore");

The message says it is looking in an odd place (I removed my user name from this). Why is it looking in solrhome/mycore/conf for solrconfig.xml? Both that and my schema.xml are in solrhome/conf. How can I point it at the right place? I tried adding REMOVED\workspace-Solr\institution-webapp\war\WEB-INF\solrhome\conf to the classpath, but got the same result:

SEVERE: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'REMOVED\workspace-Solr\institution-webapp\war\WEB-INF\solrhome\mycore\conf/', cwd=REMOVED\workspace-Solr\institution-webapp
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:268)
    at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:234)
    at org.apache.solr.core.Config.<init>(Config.java:141)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:132)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:430)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
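For the archive, the path in the error message is the clue: when CoreContainer.load reads solr.xml, each core's solrconfig.xml is resolved under that core's instanceDir, not under solr home itself. The first sample worked because it built SolrConfig directly against solrHome; the second fails because instanceDir="mycore" points the loader at solrhome/mycore/conf. A dependency-free sketch of that resolution (the method is illustrative, not Solr's actual code):

```java
import java.io.File;

// Sketch (no Solr dependencies) of the lookup behind the error above:
// the per-core config lives under <solrHome>/<instanceDir>/conf/, which is
// exactly the solrhome/mycore/conf path in the stack trace. Moving conf/
// there (or adjusting instanceDir in solr.xml) is the usual fix.
public class CoreConfigPath {
    public static String configPath(String solrHome, String instanceDir) {
        return new File(new File(solrHome, instanceDir), "conf/solrconfig.xml").getPath();
    }

    public static void main(String[] args) {
        System.out.println(configPath("solrhome", "mycore"));
    }
}
```

So the answer to the question is not the classpath: the loader is behaving as designed, and conf/ simply needs to live inside the core's instance directory.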
RE: TermsCompoment + Dist. Search + Large Index + HEAP SPACE
Thanks for your suggestion. The problem seems to be the combination of shards and the TermsComponent. Now we simply request each shard individually, without the shards and shards.qt params, and merge the results via XSLT. Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/TermsCompoment-Dist-Search-Large-Index-HEAP-SPACE-tp2865609p2866499.html Sent from the Solr - User mailing list archive at Nabble.com.
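The merge step Sebastian describes can be sketched in a few lines (a hypothetical post-processing script, not part of Solr; each shard's /terms response is modeled here as a plain term-to-count dict):

```python
from collections import Counter

def merge_shard_terms(per_shard_results):
    """Sum the term counts returned by each shard's /terms handler
    and re-sort by descending count, mimicking a distributed merge."""
    merged = Counter()
    for shard_terms in per_shard_results:
        merged.update(shard_terms)
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))

shard_a = {"solr": 120, "lucene": 80}
shard_b = {"solr": 30, "nutch": 10}
print(merge_shard_terms([shard_a, shard_b]))
# [('solr', 150), ('lucene', 80), ('nutch', 10)]
```

The same summation could of course be expressed in XSLT over the XML responses; the Counter just shows the shape of the merge.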
Re: What initialize new searcher?
You're on the right track. In a system where the indexing process and search process are on the same machine, commits by the index process cause a new searcher to be opened. In a master/slave situation (assuming you are indexing on the master and searching on the slave), the searchers are reopened on the slaves after a replication. Replications happen after (1) a commit happens on the master and (2) the slave polls the master and pulls down the new commits. Hope that helps Erick On Tue, Apr 26, 2011 at 8:50 AM, Solr Beginner solr_begin...@onet.pl wrote: Hi, I'm reading the Solr cache documentation - http://wiki.apache.org/solr/SolrCaching - and found there: "The current Index Searcher serves requests and when a new searcher is opened..." Could you explain when a new searcher is opened? Does it have something to do with index commit? Best Regards, Solr Beginner
Re: WhitespaceTokenizer and scoring(field length)
First, you can give us some more data to work with <G>... In particular, attach debugQuery=on to your HTTP request and post the results. That will show how the documents got their score. Also, show us the fieldType definition and field definition for the field in question. Best Erick On Tue, Apr 26, 2011 at 10:27 AM, roySolr royrutten1...@gmail.com wrote: Hello, I have a problem with the WhitespaceTokenizer and scoring. An example:

id  Title
1   Manchester united
2   Manchester

With the WhitespaceTokenizer, "Manchester united" will be split into "Manchester" and "united". When I search for "manchester" I get id 1 and 2 in my results. What I want is that id 2 scores higher (field length). How can I fix this? -- View this message in context: http://lucene.472066.n3.nabble.com/WhitespaceTokenizer-and-scoring-field-length-tp2865784p2865784.html Sent from the Solr - User mailing list archive at Nabble.com.
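For background, Lucene's DefaultSimilarity already folds field length into the score via lengthNorm, which is roughly 1/sqrt(number of terms), so the one-term title should normally score higher unless norms are omitted on the field. A toy illustration of the effect:

```python
import math

def length_norm(num_terms):
    # DefaultSimilarity's lengthNorm is roughly 1/sqrt(number of terms),
    # so shorter fields get a larger multiplier
    return 1.0 / math.sqrt(num_terms)

# "Manchester united" has 2 terms, "Manchester" has 1,
# so the one-term title gets the bigger norm
assert length_norm(1) > length_norm(2)
print(length_norm(1), length_norm(2))
```

Note that in the index the norm is squeezed into a single byte, so small length differences can collapse to the same encoded value; that, or omitNorms="true" on the field, would explain id 2 not winning.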
Question on Batch process
I am sure that this question has been asked a few times, but I can't seem to find the sweet spot for indexing. I have about 100,000 files, each containing 1,000 xml documents, ready to be posted to Solr. My desire is to have it index as quickly as possible, and then once completed the daily stream of ADDs will be small in comparison. The individual documents are small. Essentially web postings from the net: Title, postPostContent, date. What would be the ideal configuration for ramBufferSizeMB, mergeFactor, maxBufferedDocs, etc.? My machine is a quad core hyper-threaded, so it shows up as 8 cpus in top. I have 16GB of available ram. Thanks in advance. Charlie
Re: Automatic synonyms for multiple variations of a word
Suppose your analysis stack includes lower-casing, but your synonyms are only supposed to apply to upper-case tokens. For example, PET might be a synonym of positron emission tomography, but pet wouldn't be. -Mike On 04/26/2011 09:51 AM, Robert Muir wrote: On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: But somehow this feels bad (well, so does sticking word variations in what's supposed to be a synonyms file), partly because it means that the person adding new synonyms would need to know what they stem to (or always check it against Solr before editing the file). when creating the synonym map from your input file, currently the factory actually uses your Tokenizer only to pre-process the synonyms file. One idea would be to use the tokenstream up to the synonymfilter itself (including filters). This way if you put a stemmer before the synonymfilter, it would stem your synonyms file, too. I haven't totally thought the whole thing through to see if theres a big reason why this wouldn't work (the synonymsfilter is complicated, sorry). But it does seem like it would produce more consistent results... and perhaps the inconsistency isnt so obvious since in the default configuration the synonymfilter is directly after the tokenizer.
Re: Automatic synonyms for multiple variations of a word
Mike, thanks a lot for your example: the idea here would be that you put the LowerCaseFilter after the SynonymFilter, and then you get this exact flexibility? e.g.

WhitespaceTokenizer
SynonymFilter   <-- no lowercasing of tokens is done, as it analyzes your synonyms with just the tokenizer
LowerCaseFilter

but

WhitespaceTokenizer
LowerCaseFilter
SynonymFilter   <-- the synonyms are lowercased, as it analyzes synonyms with the tokenizer+filter

It's already inconsistent today, because if you do:

LowerCaseTokenizer
SynonymFilter

then your synonyms are in fact all being lowercased... it's just arbitrary that they are only being analyzed with the tokenizer. On Tue, Apr 26, 2011 at 4:13 PM, Mike Sokolov soko...@ifactory.com wrote: Suppose your analysis stack includes lower-casing, but your synonyms are only supposed to apply to upper-case tokens. For example, PET might be a synonym of positron emission tomography, but pet wouldn't be. -Mike On 04/26/2011 09:51 AM, Robert Muir wrote: On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: But somehow this feels bad (well, so does sticking word variations in what's supposed to be a synonyms file), partly because it means that the person adding new synonyms would need to know what they stem to (or always check it against Solr before editing the file). when creating the synonym map from your input file, currently the factory actually uses your Tokenizer only to pre-process the synonyms file. One idea would be to use the tokenstream up to the synonymfilter itself (including filters). This way if you put a stemmer before the synonymfilter, it would stem your synonyms file, too. I haven't totally thought the whole thing through to see if there's a big reason why this wouldn't work (the synonymsfilter is complicated, sorry). But it does seem like it would produce more consistent results... and perhaps the inconsistency isn't so obvious, since in the default configuration the synonymfilter is directly after the tokenizer.
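The ordering effect discussed above can be mimicked with a toy pipeline (plain Python string transforms, not the real Lucene TokenStream API; the synonym map here does a case-sensitive exact-match lookup, as in Mike's PET example):

```python
def lowercase(tokens):
    return [t.lower() for t in tokens]

def synonyms(tokens, mapping):
    # exact-match, case-sensitive lookup: an unmatched token passes through
    return [mapping.get(t, t) for t in tokens]

MAP = {"PET": "positron_emission_tomography"}
tokens = ["PET", "pet"]

# synonym filter before lowercasing: only the upper-case token matches
before = lowercase(synonyms(tokens, MAP))
# synonym filter after lowercasing: the upper-case entry never matches
after = synonyms(lowercase(tokens), MAP)

print(before)  # ['positron_emission_tomography', 'pet']
print(after)   # ['pet', 'pet']
```

Same input, same two filters, different results purely from ordering, which is the inconsistency being debated.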
Re: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
I experienced the same issue. With Solr 1.x, I was copying out the 'example' directory to make my Solr installation. However, for the Solr 3.x distributions, the DataImportHandler jar exists in a directory that is at the same level as example: dist, not a directory within it. You'll either want to take the entire Apache Solr 3.1 directory, or modify solrconfig.xml to point to the new place you've copied it to:

<lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />

On Tue, Apr 26, 2011 at 6:38 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: http://www.lucidimagination.com/blog/2011/04/01/solr-powered-isfdb-part-8/ On Tue, Apr 26, 2011 at 3:34 PM, vrpar...@gmail.com vrpar...@gmail.com wrote: Hello, I got the following exception: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423) at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:459) . Actually this error comes only in Solr 3.1; in Solr 1.4.1 it works fine. How do I solve this problem? Thanks Vishal Parekh -- View this message in context: http://lucene.472066.n3.nabble.com/org-apache-solr-common-SolrException-Error-loading-class-org-apache-solr-handler-dataimport-DataImpo-tp2865625p2865625.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: term position question from analyzer stack for WordDelimiterFilterFactory
OK, this is even more weird... everything is working much better except for one thing: I was testing use cases with our top query terms to make sure the query settings below wouldn't break any existing behavior, and got this most unusual result. The analyzer stack completely eliminated the word McAfee from the query terms! I'm like huh? Here is the analyzer page output for that search term:

Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  position 1, text McAfee, type word, start/end 0,6
org.apache.solr.analysis.SynonymFilterFactory {synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
  position 1, text McAfee, type word, start/end 0,6
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
  position 1, text McAfee, type word, start/end 0,6
org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0}
  (no tokens)
org.apache.solr.analysis.LowerCaseFilterFactory {}
  (no tokens)
com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt}
  (no tokens)
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
  (no tokens)

-Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: Monday, April 25, 2011 11:27 AM To: solr-user@lucene.apache.org; yo...@lucidimagination.com Subject: RE: term position question from analyzer stack for WordDelimiterFilterFactory Aha! I knew something must be awry, but when I looked at the analysis page output, well it sure looked like it should match. :) OK here is the query side WDF that finally works, I just turned everything off. (yay) First I tried just completely removing WDF from the query side analyzer stack but that didn't work. So anyway I suppose I should turn off the catenate all plus the preserve original settings, reindex, and see if I still get a match, huh? (PS thank you very much for the help!!!)

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="0" />

-Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, April 25, 2011 9:24 AM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen rober...@buy.com wrote: The search and index analyzer stack are the same. Ahhh, they should not be! Using both generate and catenate in WDF at query time is a no-no. Same reason you can't have multi-word synonyms at query time: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory I'd recommend going back to the WDF settings in the solr example server as a starting point. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: What initialize new searcher?
Hi, Yes, typically after your index has been replicated from master to a slave a commit will be issued and the new searcher will be opened. Before being exposed to regular clients it's a good practice to warm things up. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Solr Beginner solr_begin...@onet.pl To: solr-user@lucene.apache.org Sent: Tue, April 26, 2011 8:50:21 AM Subject: What initialize new searcher? Hi, I'm reading solr cache documentation - http://wiki.apache.org/solr/SolrCaching I found there The current Index Searcher serves requests and when a new searcher is opened Could you explain when new searcher is opened? Does it have something to do with index commit? Best Regards, Solr Beginner
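The warm-then-swap pattern Otis mentions can be sketched like so (a toy model; the Searcher class and its cache are stand-ins for illustration, not Solr internals):

```python
class Searcher:
    """Stand-in for an index searcher with a per-searcher cache."""
    def __init__(self, index_version):
        self.index_version = index_version
        self.cache = {}

    def search(self, q):
        if q not in self.cache:  # cold path: pretend this is expensive
            self.cache[q] = "results for %s @ v%d" % (q, self.index_version)
        return self.cache[q]

def open_new_searcher(new_version, warm_queries):
    new = Searcher(new_version)
    for q in warm_queries:  # warm the caches BEFORE exposing to clients
        new.search(q)
    return new              # only now does it replace the live searcher

live = Searcher(1)
live = open_new_searcher(2, ["top sellers", "new releases"])
assert "top sellers" in live.cache  # first real client query hits a warm cache
```

This is what Solr's newSearcher warm-up accomplishes: regular clients never see a searcher whose caches are stone cold.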
Re: Automatic synonyms for multiple variations of a word
Yes, I see. Makes sense. It is a bit hard to see a bad case for your proposal in that light. Here is one other example; I'm not sure whether it presents difficulties or not, and may be a bit contrived, but hey, food for thought at least: Say you have set up synonyms between names and commonly-used pseudonyms or alternate names that should not be stemmed:

Malcolm X = Malcolm Little
Prince = Rogers Nelson Prince
Little Kim = Kimberly Denise Jones
Biggy Smalls
etc.

You don't want Malcolm Littler or Littlest Kim or Big Small to match anything. And Princely shouldn't bring up the artist. But you also have regular linguistic synonyms (not names) that *should* be stemmed (as in the original example). So little = small should imply littler = smaller and so on via stemming. Ideally you could put one SynonymFilter before the stemming and the other one after. In that case do the SynonymFilters get composed? I can't think of a believable example where that would cause a problem, but maybe you can? -Mike On 04/26/2011 04:25 PM, Robert Muir wrote: Mike, thanks a lot for your example: the idea here would be you would put the lowercasefilter after the synonymfilter, and then you get this exact flexibility? e.g. WhitespaceTokenizer SynonymFilter - no lowercasing of tokens are done as it analyzes your synonyms with just the tokenizer LowerCaseFilter but WhitespaceTokenizer LowerCaseFilter SynonymFilter - the synonyms are lowercased, as it analyzes synonyms with the tokenizer+filter its already inconsistent today, because if you do: LowerCaseTokenizer SynonymFilter then your synonyms are in fact all being lowercased... its just arbitrary that they are only being analyzed with the tokenizer. On Tue, Apr 26, 2011 at 4:13 PM, Mike Sokolov soko...@ifactory.com wrote: Suppose your analysis stack includes lower-casing, but your synonyms are only supposed to apply to upper-case tokens. For example, PET might be a synonym of positron emission tomography, but pet wouldn't be.
-Mike On 04/26/2011 09:51 AM, Robert Muir wrote: On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic otis_gospodne...@yahoo.comwrote: But somehow this feels bad (well, so does sticking word variations in what's supposed to be a synonyms file), partly because it means that the person adding new synonyms would need to know what they stem to (or always check it against Solr before editing the file). when creating the synonym map from your input file, currently the factory actually uses your Tokenizer only to pre-process the synonyms file. One idea would be to use the tokenstream up to the synonymfilter itself (including filters). This way if you put a stemmer before the synonymfilter, it would stem your synonyms file, too. I haven't totally thought the whole thing through to see if theres a big reason why this wouldn't work (the synonymsfilter is complicated, sorry). But it does seem like it would produce more consistent results... and perhaps the inconsistency isnt so obvious since in the default configuration the synonymfilter is directly after the tokenizer.
Re: Ebay Kleinanzeigen and Auto Suggest
Hi Eric, Before using the terms component, allow me to point out:
* http://sematext.com/products/autocomplete/index.html (used on http://search-lucene.com/ for example)
* http://wiki.apache.org/solr/Suggester

Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Eric Grobler impalah...@googlemail.com To: solr-user@lucene.apache.org Sent: Tue, April 26, 2011 1:11:11 PM Subject: Ebay Kleinanzeigen and Auto Suggest Hi Someone told me that ebay is using Solr. I was looking at their auto-suggest implementation, and I guess they are using shingles and the TermsComponent. I managed to get a satisfactory implementation, but I have a problem with category-specific filtering. Ebay suggestions are sensitive to categories like Cars and Pets. As far as I understand, it is not possible to use filters with a terms query. Unless one uses multiple fields or special prefixes for the words to index, I cannot think how to implement this. Is there perhaps a workaround for this limitation? Best Regards EricZ --- I have a shingle type like:

<fieldType name="shingle_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

and a query like:

http://localhost:8983/solr/terms?q=*%3A*&terms.fl=suggest_text&terms.sort=count&terms.prefix=audi
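Eric's "special prefixes for the words to index" workaround can be sketched as a toy model (the category|shingle separator and both helper names are made up for illustration — in Solr the prefixed terms would live in the suggest field and the category would go into terms.prefix):

```python
def index_terms(docs):
    """docs: (category, title) pairs -> set of category-prefixed shingles."""
    terms = set()
    for category, title in docs:
        words = title.lower().split()
        for size in (1, 2):  # shingles of size 1 and 2
            for i in range(len(words) - size + 1):
                terms.add("%s|%s" % (category, " ".join(words[i:i + size])))
    return terms

def suggest(terms, category, prefix):
    """Emulate terms.prefix=<category>|<user input>."""
    key = "%s|%s" % (category, prefix.lower())
    return sorted(t.split("|", 1)[1] for t in terms if t.startswith(key))

terms = index_terms([("cars", "Audi A4 Avant"), ("pets", "Audacious Parrot")])
print(suggest(terms, "cars", "au"))  # ['audi', 'audi a4']
```

The same prefix trick works with the real TermsComponent because terms.prefix is a plain string match against the indexed terms, so "cars|au" can never collide with suggestions from another category.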
SynonymFilterFactory case changes
So if there is a hit in the synonym filter factory, do I need to put in the various case changes for a term so that the following WordDelimiterFilter analyzer can do its 'split on case changes' work? Here we see SynonymFilterFactory makes all terms lowercase, because this is what is in my synonyms.txt file and I have ignoreCase=true: macafee, mcafee

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  position 1, text McAfee, type word, start/end 0,6
org.apache.solr.analysis.SynonymFilterFactory {synonyms=index_synonyms.txt, expand=true, ignoreCase=true}
  position 1, texts macafee / mcafee, types word / word, start/end 0,6 / 0,6
Re: Question on Batch process
Charlie, How's this:
* -Xmx2g
* ramBufferSizeMB 512
* mergeFactor 10 (default, but you could up it to 20, 30, if ulimit -n allows)
* ignore/delete maxBufferedDocs - not used if you set ramBufferSizeMB
* use StreamingUpdateSolrServer (with params matching your number of CPU cores) or send batches of say 1000 docs with the other SolrServer impl using N threads (N = # of your CPU cores)

Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Charles Wardell charles.ward...@bcsolution.com To: solr-user@lucene.apache.org Sent: Tue, April 26, 2011 2:32:29 PM Subject: Question on Batch process I am sure that this question has been asked a few times, but I can't seem to find the sweet spot for indexing. I have about 100,000 files, each containing 1,000 xml documents, ready to be posted to Solr. My desire is to have it index as quickly as possible, and then once completed the daily stream of ADDs will be small in comparison. The individual documents are small. Essentially web postings from the net: Title, postPostContent, date. What would be the ideal configuration for ramBufferSizeMB, mergeFactor, maxBufferedDocs, etc.? My machine is a quad core hyper-threaded, so it shows up as 8 cpus in top. I have 16GB of available ram. Thanks in advance. Charlie
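The "batches of say 1000 docs with N threads" part might be sketched like this (post_batch is a placeholder for illustration; a real client would POST each batch to /update or go through SolrJ):

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(docs, size=1000):
    """Yield successive batches of at most `size` docs."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # trailing partial batch

def post_batch(batch):
    # placeholder for an HTTP POST of one <add> batch to /update
    return len(batch)

docs = range(2500)
with ThreadPoolExecutor(max_workers=8) as pool:  # N = number of CPU cores
    sent = sum(pool.map(post_batch, chunks(docs, 1000)))
print(sent)  # 2500
```

Batching amortizes per-request overhead, and matching thread count to CPU cores keeps the indexing side of Solr busy without oversubscribing it.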
Re: term position question from analyzer stack for WordDelimiterFilterFactory
Hi Robert, I'm no WDFF expert, but all these zeros look suspicious: org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0} A quick visit to http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory makes me think you want: splitOnCaseChange=1 (if you want Mc Afee for some reason?) generateWordParts=1 (if you want Mc Afee for some reason?) preserveOriginal=1 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Robert Petersen rober...@buy.com To: solr-user@lucene.apache.org; yo...@lucidimagination.com Sent: Tue, April 26, 2011 4:39:49 PM Subject: RE: term position question from analyzer stack for WordDelimiterFilterFactory OK this is even more weird... everything is working much better except for one thing: I was testing use cases with our top query terms to make sure the below query settings wouldn't break any existing behavior, and got this most unusual result. The analyzer stack completely eliminated the word McAfee from the query terms! I'm like huh?
Here is the analyzer page output for that search term: Query Analyzer org.apache.solr.analysis.WhitespaceTokenizerFactory {} term position 1 term text McAfee term type word source start,end 0,6 payload org.apache.solr.analysis.SynonymFilterFactory {synonyms=query_synonyms.txt, expand=true, ignoreCase=true} term position 1 term text McAfee term type word source start,end 0,6 payload org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true} term position 1 term text McAfee term type word source start,end 0,6 payload org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0} term position term text term type source start,end payload org.apache.solr.analysis.LowerCaseFilterFactory {} term position term text term type source start,end payload com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt} term position term text term type source start,end payload org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {} term position term text term type source start,end payload -Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: Monday, April 25, 2011 11:27 AM To: solr-user@lucene.apache.org; yo...@lucidimagination.com Subject: RE: term position question from analyzer stack for WordDelimiterFilterFactory Aha! I knew something must be awry, but when I looked at the analysis page output, well it sure looked like it should match. :) OK here is the query side WDF that finally works, I just turned everything off. (yay) First I tried just completely removeing WDF from the query side analyzer stack but that didn't work. So anyway I suppose I should turn off the catenate all plus the preserve original settings, reindex, and see if I still get a match huh? (PS thank you very much for the help!!!) 
filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=0 catenateNumbers=0 catenateAll=0 preserveOriginal=0 / -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, April 25, 2011 9:24 AM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen rober...@buy.com wrote: The search and index analyzer stack are the same. Ahhh, they should not be! Using both generate and catenate in WDF at query time is a no-no. Same reason you can't have multi-word synonyms at query time: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.Synonym FilterFactory I'd recommend going back to the WDF settings in the solr example server as a starting point. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
RE: term position question from analyzer stack for WordDelimiterFilterFactory
Yeah I am about to try turning one on at a time and see what happens. I had a meeting so couldn't do it yet... (darn those meetings) (lol) -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, April 26, 2011 2:37 PM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory Hi Robert, I'm no WDFF expert, but all these zero look suspicious: org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0} A quick visit to http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDel imiterFilterFactory makes me think you want: splitOnCaseChange=1 (if you want Mc Afee for some reason?) generateWordParts=1 (if you want Mc Afee for some reason?) preserveOriginal=1 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Robert Petersen rober...@buy.com To: solr-user@lucene.apache.org; yo...@lucidimagination.com Sent: Tue, April 26, 2011 4:39:49 PM Subject: RE: term position question from analyzer stack for WordDelimiterFilterFactory OK this is even more weird... everything is working much better except for one thing: I was testing use cases with our top query terms to make sure the below query settings wouldn't break any existing behavior, and got this most unusual result. The analyzer stack completely eliminated the word McAfee from the query terms! I'm like huh? 
Here is the analyzer page output for that search term: Query Analyzer org.apache.solr.analysis.WhitespaceTokenizerFactory {} term position 1 term text McAfee term type word source start,end 0,6 payload org.apache.solr.analysis.SynonymFilterFactory {synonyms=query_synonyms.txt, expand=true, ignoreCase=true} term position 1 term text McAfee term type word source start,end 0,6 payload org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true} term position 1 term text McAfee term type word source start,end 0,6 payload org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0} term position term text term type source start,end payload org.apache.solr.analysis.LowerCaseFilterFactory {} term position term text term type source start,end payload com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt} term position term text term type source start,end payload org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {} term position term text term type source start,end payload -Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: Monday, April 25, 2011 11:27 AM To: solr-user@lucene.apache.org; yo...@lucidimagination.com Subject: RE: term position question from analyzer stack for WordDelimiterFilterFactory Aha! I knew something must be awry, but when I looked at the analysis page output, well it sure looked like it should match. :) OK here is the query side WDF that finally works, I just turned everything off. (yay) First I tried just completely removeing WDF from the query side analyzer stack but that didn't work. So anyway I suppose I should turn off the catenate all plus the preserve original settings, reindex, and see if I still get a match huh? (PS thank you very much for the help!!!) 
filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=0 catenateNumbers=0 catenateAll=0 preserveOriginal=0 / -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, April 25, 2011 9:24 AM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen rober...@buy.com wrote: The search and index analyzer stack are the same. Ahhh, they should not be! Using both generate and catenate in WDF at query time is a no-no. Same reason you can't have multi-word synonyms at query time: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.Synonym FilterFactory I'd recommend going back to the WDF settings in the solr example server as a starting point. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Reader per query request
Hi, I was wondering: does Solr open a new Lucene IndexReader for every query request? From a performance point of view, is there any problem with opening a lot of IndexReaders concurrently, or should the application have some logic to reuse the same IndexReader? Thanks, cy -- View this message in context: http://lucene.472066.n3.nabble.com/Reader-per-query-request-tp2867778p2867778.html Sent from the Solr - User mailing list archive at Nabble.com.
Field Length and Highlight
Hi, I've been using Solr with ColdFusion 9. I've made a couple of adjustments to it in order to fulfill the needs of my client. I'm using Solr as a document search engine for an online library which has documents larger than 20MB, and some of them have more than 20 pages. The thing is that... at first Solr didn't index all the text; I already fixed that by raising maxFieldLength in the collections. Now, when I search for a word at the end of a document that has like 150 pages, it shows me the document but won't highlight the words that are almost at the end. Any ideas?
Re: SynonymFilterFactory case changes
Yes, order does matter. You're right, putting, say, lowercase in front of WordDelimiter... will mess up the operations of WDFF. The admin/analysis page is *extremely* useful for understanding what happens in the analysis of input. Make sure to check the verbose checkbox. Best Erick On Tue, Apr 26, 2011 at 5:10 PM, Robert Petersen rober...@buy.com wrote: So if there is a hit in the synonym filter factory, do I need to put the various case changes for a term so that the following WordDelimiterFilter analyzer can do its 'split on case changes' work? Here we see SynonymFilterFactory makes all terms lowercase because this is what is in my synonmyms.txt file and I have ignoreCase=true: macafee, mcafee Index Analyzer org.apache.solr.analysis.WhitespaceTokenizerFactory {} term position 1 term text McAfee term type word source start,end 0,6 payload org.apache.solr.analysis.SynonymFilterFactory {synonyms=index_synonyms.txt, expand=true, ignoreCase=true} term position 1 term text macafee mcafee term type word word source start,end 0,6 0,6 payload
Re: term position question from analyzer stack for WordDelimiterFilterFactory
I second Otis' comments. Is it possible that you've gotten twisted around by trying to modify these settings and would be better off going back to the WDF settings in the example schema? I've sometimes found that to be very useful. Also (although I don't think it applies in this case) be aware that the analysis page may introduce its own errors, so when you see something really wonky, try a query with debugQuery=on and see if the parsed query squares with the results on the analysis page... Best Erick On Tue, Apr 26, 2011 at 5:44 PM, Robert Petersen rober...@buy.com wrote: Yeah I am about to try turning one on at a time and see what happens. I had a meeting so couldn't do it yet... (darn those meetings) (lol) -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, April 26, 2011 2:37 PM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory Hi Robert, I'm no WDFF expert, but all these zeros look suspicious: org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0} A quick visit to http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory makes me think you want: splitOnCaseChange=1 (if you want Mc Afee for some reason?) generateWordParts=1 (if you want Mc Afee for some reason?) preserveOriginal=1 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Robert Petersen rober...@buy.com To: solr-user@lucene.apache.org; yo...@lucidimagination.com Sent: Tue, April 26, 2011 4:39:49 PM Subject: RE: term position question from analyzer stack for WordDelimiterFilterFactory OK this is even more weird...
everything is working much better except for one thing: I was testing use cases with our top query terms to make sure the below query settings wouldn't break any existing behavior, and got this most unusual result. The analyzer stack completely eliminated the word McAfee from the query terms! I'm like huh? Here is the analyzer page output for that search term: Query Analyzer org.apache.solr.analysis.WhitespaceTokenizerFactory {} term position 1 term text McAfee term type word source start,end 0,6 payload org.apache.solr.analysis.SynonymFilterFactory {synonyms=query_synonyms.txt, expand=true, ignoreCase=true} term position 1 term text McAfee term type word source start,end 0,6 payload org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true} term position 1 term text McAfee term type word source start,end 0,6 payload org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0} term position term text term type source start,end payload org.apache.solr.analysis.LowerCaseFilterFactory {} term position term text term type source start,end payload com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt} term position term text term type source start,end payload org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {} term position term text term type source start,end payload -Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: Monday, April 25, 2011 11:27 AM To: solr-user@lucene.apache.org; yo...@lucidimagination.com Subject: RE: term position question from analyzer stack for WordDelimiterFilterFactory Aha! I knew something must be awry, but when I looked at the analysis page output, well it sure looked like it should match. :) OK here is the query side WDF that finally works, I just turned everything off. 
(yay) First I tried just completely removing WDF from the query side analyzer stack but that didn't work. So anyway I suppose I should turn off the catenateAll plus the preserveOriginal settings, reindex, and see if I still get a match, huh? (PS thank you very much for the help!!!)

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="0"/>

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Monday, April 25, 2011 9:24 AM
To: solr-user@lucene.apache.org
Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory

On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen rober...@buy.com wrote:
The search and index analyzer stack
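For reference, a query-side analyzer along the lines Otis suggested (preserveOriginal, generateWordParts, and splitOnCaseChange turned on) might look like the sketch below. The field type name is illustrative, and whether these exact flags are right depends on how the index side was analyzed:

```xml
<!-- Sketch only: a query-side WDF that keeps the original token,
     so a query for "McAfee" can still produce a matching term.
     Field type name is made up. -->
<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            preserveOriginal="1"
            generateWordParts="1"
            splitOnCaseChange="1"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```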
Re: Reader per query request
See below.

On Tue, Apr 26, 2011 at 6:15 PM, cyang2010 ysxsu...@hotmail.com wrote:

Hi, I was wondering if Solr opens a new Lucene IndexReader for every query request?

No, absolutely not. Solr only opens a reader when the underlying index has changed, say when a commit or a replication happens.

From a performance point of view, is there any problem with opening a lot of IndexReaders concurrently, or shall the application have some logic to reuse the same IndexReader?

Every time you open a reader, a whole new set of caches is initiated. I have a hard time imagining a situation in which opening a new searcher for each request would be a good idea. Opening a new reader, especially for a large index, is a very expensive operation and should be done as rarely as possible. But Solr will do this automatically for you; by and large you don't have to think about it.

Best
Erick

Thanks, cy

-- View this message in context: http://lucene.472066.n3.nabble.com/Reader-per-query-request-tp2867778p2867778.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too many open files exception related to solrj getServer too often?
Just pushing this topic up and looking for answers. -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-open-files-exception-related-to-solrj-getServer-too-often-tp2808718p2867976.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Reader per query request
Thanks a lot. That makes sense. -- CY -- View this message in context: http://lucene.472066.n3.nabble.com/Reader-per-query-request-tp2867778p2867995.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SynonymFilterFactory case changes
But in this case lowercase is after WDF. The question is: when you get a hit in the SynonymFilter, and the entries in the synonyms.txt file are all in lower case, do I need to add the case-changing versions to make WDF work on case changes? It appears the synonym text is replaced verbatim by what is in the txt file, and that defeats the WDF filter. In fact, adding the case-changing versions of this term to the synonyms.txt file makes this use case work. (yay)

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, April 26, 2011 3:39 PM
To: solr-user@lucene.apache.org
Subject: Re: SynonymFilterFactory case changes

Yes, order does matter. You're right, putting, say, lowercase in front of WordDelimiter... will mess up the operations of WDFF. The admin/analysis page is *extremely* useful for understanding what happens in the analysis of input. Make sure to check the verbose checkbox.

Best
Erick

On Tue, Apr 26, 2011 at 5:10 PM, Robert Petersen rober...@buy.com wrote:
So if there is a hit in the synonym filter factory, do I need to put in the various case changes for a term so that the following WordDelimiterFilter analyzer can do its 'split on case changes' work? Here we see SynonymFilterFactory makes all terms lowercase because this is what is in my synonyms.txt file and I have ignoreCase=true:

macafee, mcafee

Index Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  position: 1 | text: McAfee | type: word | start,end: 0,6

org.apache.solr.analysis.SynonymFilterFactory {synonyms=index_synonyms.txt, expand=true, ignoreCase=true}
  position: 1 | text: macafee mcafee | type: word word | start,end: 0,6 0,6
Re: Field Length and Highlight
(11/04/27 7:35), Alejandro Delgadillo wrote:

Hi, I've been using Solr with ColdFusion 9. I've made a couple of adjustments to it in order to fulfill the needs of my client. I'm using Solr as a document search engine for an online library which has documents larger than 20MB, and some of them have more than 20 pages. The thing is that at first Solr didn't index all the text; I already fixed that by changing the maxFieldLength number in the collections. Now when I search for some word at the end of a document that has like 150 pages, it shows me the document but won't highlight the words that are almost at the end. Any ideas?

So your maxAnalyzedChars is too small?

http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars

Koji
--
http://www.rondhuit.com/en/
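Assuming Koji's diagnosis is right, the fix would be to raise hl.maxAnalyzedChars, either per request or in the search handler's defaults. A sketch (the handler name and the value are illustrative; the parameter itself is documented on the wiki page he links):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <!-- analyze far more of the stored text when highlighting,
         so terms near the end of a 150-page document are found -->
    <str name="hl.maxAnalyzedChars">2000000</str>
  </lst>
</requestHandler>
```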
Re: Question on Batch process
Thank you Otis. Without trying to appear too stupid: when you refer to having the params match my # of CPU cores, you are talking about the # of threads I can spawn with the StreamingUpdateSolrServer object? Up until now, I have been just utilizing post.sh or post.jar. Are these capable of that, or do I need to write some code to collect a bunch of files into the buffer and send it off? Also, do you have a sense for how long it should take to index 100,000 files, or in my case 100,000,000 documents?

StreamingUpdateSolrServer

public StreamingUpdateSolrServer(String solrServerUrl, int queueSize, int threadCount) throws MalformedURLException

Thanks again,
Charlie

--
Best Regards,
Charles Wardell
Blue Chips Technology, Inc.
www.bcsolution.com

On Tuesday, April 26, 2011 at 5:12 PM, Otis Gospodnetic wrote:

Charlie,

How's this:
* -Xmx2g
* ramBufferSizeMB 512
* mergeFactor 10 (default, but you could up it to 20 or 30 if ulimit -n allows)
* ignore/delete maxBufferedDocs - not used if you set ramBufferSizeMB
* use StreamingUpdateSolrServer (with params matching your number of CPU cores), or send batches of say 1000 docs with the other SolrServer impl using N threads (N = # of your CPU cores)

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Charles Wardell charles.ward...@bcsolution.com
To: solr-user@lucene.apache.org
Sent: Tue, April 26, 2011 2:32:29 PM
Subject: Question on Batch process

I am sure that this question has been asked a few times, but I can't seem to find the sweet spot for indexing. I have about 100,000 files, each containing 1,000 xml documents, ready to be posted to Solr. My desire is to have it index as quickly as possible, and then once completed the daily stream of ADDs will be small in comparison. The individual documents are small, essentially web postings from the net: Title, postPostContent, date. What would be the ideal configuration?
For ramBufferSizeMB, mergeFactor, maxBufferedDocs, etc. My machine is a quad core, hyper-threaded, so it shows up as 8 CPUs in top. I have 16GB of available RAM.

Thanks in advance.
Charlie
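Otis's suggested index-time settings would land in solrconfig.xml roughly as follows (a sketch using the values from his mail, not numbers tuned for this particular machine):

```xml
<indexDefaults>
  <!-- flush segments by RAM usage rather than by document count -->
  <ramBufferSizeMB>512</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
  <!-- leave maxBufferedDocs unset so ramBufferSizeMB governs flushing -->
</indexDefaults>
```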
Re: SynonymFilterFactory case changes
Ahhh, I mis-read your post. First, it's not the SynonymFilterFactory that's lowercasing anything. The ignoreCase=true affects the matching, not the output. The output is probably lowercased because you have it that way in the synonyms.txt file. At least that's what I just saw using the analysis page from the Solr admin page. So yes, if you want the WDF to do anything on tokens put into the input stream by SynonymFilterFactory, you need to make the replacement be the accurate case. But I think you already figured all that out.

Best
Erick

On Tue, Apr 26, 2011 at 7:19 PM, Robert Petersen rober...@buy.com wrote:
But in this case lowercase is after WDF. The question is: when you get a hit in the SynonymFilter, and the entries in the synonyms.txt file are all in lower case, do I need to add the case-changing versions to make WDF work on case changes? It appears the synonym text is replaced verbatim by what is in the txt file, and that defeats the WDF filter. In fact, adding the case-changing versions of this term to the synonyms.txt file makes this use case work. (yay)

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, April 26, 2011 3:39 PM
To: solr-user@lucene.apache.org
Subject: Re: SynonymFilterFactory case changes

Yes, order does matter. You're right, putting, say, lowercase in front of WordDelimiter... will mess up the operations of WDFF. The admin/analysis page is *extremely* useful for understanding what happens in the analysis of input. Make sure to check the verbose checkbox.

Best
Erick

On Tue, Apr 26, 2011 at 5:10 PM, Robert Petersen rober...@buy.com wrote:
So if there is a hit in the synonym filter factory, do I need to put in the various case changes for a term so that the following WordDelimiterFilter analyzer can do its 'split on case changes' work?
Here we see SynonymFilterFactory makes all terms lowercase because this is what is in my synonyms.txt file and I have ignoreCase=true:

macafee, mcafee

Index Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  position: 1 | text: McAfee | type: word | start,end: 0,6

org.apache.solr.analysis.SynonymFilterFactory {synonyms=index_synonyms.txt, expand=true, ignoreCase=true}
  position: 1 | text: macafee mcafee | type: word word | start,end: 0,6 0,6
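So the workaround Robert found amounts to listing the cased form explicitly in synonyms.txt, e.g. (a sketch):

```
# include the original casing so a later WordDelimiterFilter
# can still split on case changes in the substituted synonym
macafee, mcafee, McAfee
```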
Suggester or spellcheck return stored fields
Hello all, I am trying to build an autocomplete solution for a website that I run. The current implementation is going to be used to pick who you want to send PMs to. I have it basically working up to this point: the UI is done and the suggester is working, returning possible suggestions without any major problems. The problem I am currently running into is that the suggestions it returns are not necessarily unique. To solve this, I would like to return the user ID (a stored field) along with the suggestion. This would help in other areas, but would also ensure things are unique. Is it possible to make the suggester return these other fields, or is it strictly returning text, as I assume is the case? I know I am likely stretching what the suggester is supposed to do, so I am OK rolling back to a different plan using normal queries. But I would prefer to be able to use the suggester if possible.

Thanks for the help,
Cameron
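For context, the Suggester in this Solr version is wired up through the spellcheck component and draws its suggestions from the terms of a single field, which is why there is no obvious way for it to carry a stored field like a user ID along with each suggestion. A sketched configuration (component name, lookup implementation, and field name are illustrative) shows that the only per-suggestion payload is the term itself:

```xml
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- suggestions are drawn from this one field's terms only -->
    <str name="field">username</str>
  </lst>
</searchComponent>
```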
Re: How to Update Value of One Field of a Document in Index?
My schema: id, name, checksum, body, notes, date

I'd like for a user to be able to add notes to the notes field, and not have to re-index the document (since the body field may contain 100MB of text). Some ideas:

1) How about creating another core which only contains id, checksum, and notes? Then updating (delete followed by add) wouldn't be that painful?

2) What about using a multiValued field? Could you just keep adding values as the user enters more notes?

Pete

On Sep 9, 2010, at 11:06 PM, Liam O'Boyle wrote:

Hi Savannah,

You can only reindex the entire document; if you only have the ID, then do a search to retrieve the rest of the data, then reindex. This assumes that all of the fields you need to index are stored (so that you can retrieve them) and not just indexed.

Liam

On Fri, Sep 10, 2010 at 3:29 PM, Savannah Beckett savannah_becket...@yahoo.com wrote:
I use Nutch to crawl and index to Solr. My code is working. Now, I want to update the value of one of the fields of a document in the Solr index after the document was already indexed, and I have only the document id. How do I do that? Thanks.
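To make Liam's point concrete: re-indexing means re-sending the whole document with the same uniqueKey, which replaces the old copy. A sketch of the update message (all field values are made up; the repeated notes field corresponds to idea 2 and would require the field to be multiValued in the schema):

```xml
<add>
  <doc>
    <field name="id">doc-42</field>
    <field name="name">example.pdf</field>
    <field name="checksum">abc123</field>
    <!-- the full stored body must be fetched and re-sent -->
    <field name="body">...entire original body text...</field>
    <!-- existing note plus the newly added one -->
    <field name="notes">original note</field>
    <field name="notes">new note from the user</field>
    <field name="date">2010-09-10T00:00:00Z</field>
  </doc>
</add>
```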
Re: What initialize new searcher?
Thank you for the answers. I'm moving forward and have a few more questions, but for separate threads.

On Tue, Apr 26, 2011 at 10:47 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Hi,

Yes, typically after your index has been replicated from master to a slave, a commit will be issued and a new searcher will be opened. Before it is exposed to regular clients, it's good practice to warm things up.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Solr Beginner solr_begin...@onet.pl
To: solr-user@lucene.apache.org
Sent: Tue, April 26, 2011 8:50:21 AM
Subject: What initialize new searcher?

Hi,

I'm reading the Solr cache documentation - http://wiki.apache.org/solr/SolrCaching - and found there: "The current Index Searcher serves requests and when a new searcher is opened". Could you explain when a new searcher is opened? Does it have something to do with index commit?

Best Regards,
Solr Beginner
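The warming Otis mentions is typically done with a newSearcher listener in solrconfig.xml; a sketch (the query is a placeholder — use your own common queries):

```xml
<!-- fired against the new searcher before it starts serving clients -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some common query</str><str name="rows">10</str></lst>
  </arr>
</listener>
```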
fieldCache only on stats page
Hi, I can see only fieldCache (nothing about the filter, query, or document caches) on the stats page. What am I doing wrong? We have two servers with replication. There are two cores (prod, dev) on each server. Maybe I have to add something to the solrconfig.xml of the cores?

Best Regards,
Solr Beginner
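Assuming the caches were simply never declared, adding them to the &lt;query&gt; section of each core's solrconfig.xml should make them show up on the stats page. A sketch (sizes are illustrative):

```xml
<query>
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512"/>
</query>
```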
DataImportHandler in Solr 3.1.0: not updating dataimport.properties last_index_time on delta-import?
Title pretty much says it all; I've configured the DIH in 3.1.0, and it works great, except the delta-imports are always from the last time a full-import happened, not a delta-import. After a delta-import, dataimport.properties is completely untouched. The documentation implies that the delta-import should update the last_index_time: The DataImportHandler exposes a variable called last_index_time which is a timestamp value denoting the last time full-import 'or' delta-import was run - http://wiki.apache.org/solr/DataImportHandler#Delta-Import_Example Is there a configuration preventing delta-import from updating dataimport.properties? It updates properly on each full-import.
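For reference, a delta setup per the wiki looks roughly like the sketch below (table and column names are illustrative). The behavior reported above means ${dataimporter.last_index_time} keeps its full-import value because dataimport.properties is not rewritten after a delta-import:

```xml
<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item
                          WHERE id = '${dataimporter.delta.id}'"/>
```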