Re: Sub entities
Brian, except for the SQL syntax error in your specie_relations query, "SELECT specie_id FROMspecie_relations ..." (missing whitespace after FROM), your config looks okay. A few questions:
* Is there a field named "specie" in your schema? (Otherwise DIH will silently ignore it.)
* Did you check your MySQL query log, to see which queries were executed and what their results were?
And, just as a quick note: there is no need to write <field column="foo" name="foo" /> when both attributes have the same value. Regards, Stefan

On Mon, Feb 28, 2011 at 9:52 PM, Brian Lamb brian.l...@journalexperts.com wrote:

Hi all, I was able to get my dataimport to work correctly but I'm a little unclear as to how an entity within an entity works in regards to search results. When I do a search for all results, it seems only the outermost responses are returned. For example, I have the following in my db config file:

<dataConfig>
  <dataSource type="JdbcDataSource" name="mystuff" batchSize="-1"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost/db?characterEncoding=UTF8&amp;zeroDateTimeBehavior=convertToNull"
      user="user" password="password"/>
  <document>
    <entity name="animal" dataSource="mystuff" query="SELECT * FROM animals">
      <field column="id" name="id" />
      <field column="type" name="type" />
      <field column="genus" name="genus" />
      <!-- Add in the species -->
      <entity name="specie_relations" dataSource="mystuff"
          query="SELECT specie_id FROMspecie_relations WHERE animal_id=${animal.id}">
        <entity name="species" dataSource="mystuff"
            query="SELECT specie FROM species WHERE id=${specie_relations.specie_id}">
          <field column="specie" name="specie" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

However, specie never shows up in my search results:

<doc>
  <str name="type">Mammal</str>
  <str name="id">1</str>
  <str name="genus">Canis</str>
</doc>

I had hoped the results would include the species. Can it? If so, what is my malfunction?
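Stefan's first point — that DIH silently drops columns with no matching schema field — can be checked against schema.xml. A minimal sketch of the declaration that would be needed; the type name and flags here are assumptions, not taken from the thread:

```xml
<!-- sketch: schema.xml entry for the "specie" column imported by DIH;
     type and attribute values are assumptions -->
<field name="specie" type="string" indexed="true" stored="true" multiValued="true" />
```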
Re: Disabling caching for fq param?
If the filterCache hit ratio is low then just disable it in solrconfig.xml by deleting the section or setting its values to 0.

Based on what I've read here and what I could find on the web, it seems that each fq clause essentially gets its own results cache. Is that correct? We have a corporate policy of passing the user's Oracle OLS labels into the index in order to be matched against the labels field. I currently separate this from the user's query text by sticking it into an fq param...

?q=<user-entered expression>&fq=labels:<the label values expression>&qf=<song metadata copy field> <song lyrics field>&tie=0.1&defType=dismax

...but since its value (a collection of hundreds of label values) applies only to that user, the accompanying result set won't be reusable by other users. My understanding is that this query will result in two result sets (q and fq) being cached separately, with the union of the two sets being returned to the user. (Is that correct?) There are thousands of users, each with a unique combination of labels, so there seems to be little value in caching the result set created from the fq labels param. It would be beneficial if there were some kind of fq parameter override to indicate to Solr not to cache the results. Thanks!
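For reference, the filterCache the first reply mentions lives in solrconfig.xml; a sketch of "setting its values to 0" might look like the following (the class and exact attribute values are the stock example defaults, assumed here):

```xml
<!-- solrconfig.xml: effectively disabling the filterCache by zeroing its sizes -->
<filterCache class="solr.FastLRUCache"
             size="0"
             initialSize="0"
             autowarmCount="0"/>
```

Removing the element entirely, as the reply also suggests, has the same practical effect.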
Re: Problem with sorting using functions.
Also, if you're on 3.1, the function needs to be written without spaces, since sort will split on space to find the sort order. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 28. feb. 2011, at 22.34, John Sherwood wrote:
Fair call. Thanks.

On Tue, Mar 1, 2011 at 8:21 AM, Geert-Jan Brits gbr...@gmail.com wrote:
Sort by function query is only available from Solr 3.1 (from: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function)

2011/2/28 John Sherwood j...@storecrowd.com:
This works: /select/?q=*:*&sort=price desc
This throws a 400 error, "Missing sort order": /select/?q=*:*&sort=sum(1, 1) desc
I'm using 1.4.2. I've tried all sorts of different numbers, functions, and fields but nothing seems to change that error. Any ideas?
Re: multi-core solr, specifying the data directory
Have you tried removing the <dataDir> tag from solrconfig.xml? Then it should fall back to the default ./data relative to the core's instanceDir. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 1. mars 2011, at 00.00, Jonathan Rochkind wrote:

Unless I'm doing something wrong, in my experience with multi-core Solr in 1.4.1, you NEED to explicitly provide an absolute path to the 'data' dir. I set up multi-core like this:

<cores adminPath="/admin/cores">
  <core name="some_core" instanceDir="some_core">
  </core>
</cores>

Now, setting instanceDir like that works for Solr to look for the 'conf' directory in the default location you'd expect, ./some_core/conf. You'd expect it to look for the 'data' dir for an index in ./some_core/data too, by default. But it does not seem to. It's still looking for the 'data' directory in the _main_ solr.home/data, not under the relevant core directory. The only way I can manage to get it to look for the /data directory where I expect is to spell it out with a full absolute path:

<core name="some_core" instanceDir="some_core">
  <property name="dataDir" value="/path/to/main/solr/some_core/data" />
</core>

And then in solrconfig.xml do a <dataDir>${dataDir}</dataDir>. Is this what everyone else does too? Or am I missing a better way of doing this? I would have thought it would just work, with Solr by default looking for a ./data subdir of the specified instanceDir. But it definitely doesn't seem to do that. Should it? Anyone know if Solr in trunk past 1.4.1 has been changed to do what I expect? Or am I wrong to expect it? Or does everyone else do multi-core in some different way than me where this doesn't come up? Jonathan
Re: Problem with Solr and Nutch integration
Hi Anurag,

The request handler has been added to the solrconfig file. I'll try your attached request handler and see if that helps. Interestingly enough, the whole setup worked when I was using nutch 1.2 / solr 1.4.1. It is only since moving to nutch trunk / solr branch_3x that the problem has occurred. I assume that something has changed in between and the tutorial's request handler is incorrect for the later solr version. Which versions of solr/nutch are you using?

Assuming the catalina.out file is the correct log file, the output I get is shown below. This output occurs on restarting the solr example after adding the new request handler. When I access the solr admin page no additional logging occurs. Can anyone see the problem?

Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader locateSolrHome INFO: Using JNDI solr.home: /opt/solr/example/solr
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to '/opt/solr/example/solr/'
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader addToClassLoader SEVERE: Can't find (or read) file to add to classloader: /opt/solr/example/solr/./lib
Feb 28, 2011 6:28:59 PM org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init()
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader locateSolrHome INFO: Using JNDI solr.home: /opt/solr/example/solr
Feb 28, 2011 6:28:59 PM org.apache.solr.core.CoreContainer$Initializer initialize INFO: looking for solr.xml: /opt/solr/example/solr/solr.xml
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader locateSolrHome INFO: Using JNDI solr.home: /opt/solr/example/solr
Feb 28, 2011 6:28:59 PM org.apache.solr.core.CoreContainer init INFO: New CoreContainer: solrHome=/opt/solr/example/solr/ instance=6794958
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to '/opt/solr/example/solr/'
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader addToClassLoader SEVERE: Can't find (or read) file to add to classloader: /opt/solr/example/solr/./lib
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to '/opt/solr/example/solr/./'
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader addToClassLoader SEVERE: Can't find (or read) file to add to classloader: /opt/solr/example/solr/././lib
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrConfig initLibs INFO: Adding specified lib dirs to ClassLoader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/commons-compress-1.1.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/log4j-1.2.14.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/commons-logging-1.1.1.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/tika-parsers-0.8.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/asm-3.1.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/icu4j-4_6.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/xercesImpl-2.8.1.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/fontbox-1.3.1.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/poi-3.7.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/dom4j-1.6.1.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/geronimo-stax-api_1.0_spec-1.0.1.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/poi-ooxml-3.7.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/opt/solr/contrib/extraction/lib/xml-apis-1.0.b2.jar' to classloader
Feb 28, 2011 6:28:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
Re: Problem with Solr and Nutch integration
i have nutch-1.0 and Apache-solr-1.3.0 (integrated these two).

On 3/1/11, Paul Rogers [via Lucene] ml-node+2601915-1461428819-146...@n3.nabble.com wrote:
Hi Anurag The request handler has been added the solrconfig file. I'll try your attached requesthandler and see if that helps. [...]
Error during auto-warming of key
Hi,

Yesterday's error log contains something peculiar:

ERROR [solr.search.SolrCache] - [pool-29-thread-1] - : Error during auto-warming of key: +*:* (1.0/(7.71E-8*float(ms(const(1298682616680),date(sort_date)))+1.0))^20.0
java.lang.NullPointerException
        at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36)
        at org.apache.lucene.search.FieldCacheImpl$Entry.init(FieldCacheImpl.java:275)
        at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:525)
        at org.apache.solr.search.function.LongFieldSource.getValues(LongFieldSource.java:57)
        at org.apache.solr.search.function.DualFloatFunction.getValues(DualFloatFunction.java:48)
        at org.apache.solr.search.function.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:61)
        at org.apache.solr.search.function.FunctionQuery$AllScorer.init(FunctionQuery.java:123)
        at org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:246)
        at org.apache.lucene.search.Searcher.search(Searcher.java:171)
        at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:651)
        at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:545)
        at org.apache.solr.search.SolrIndexSearcher.cacheDocSet(SolrIndexSearcher.java:520)
        at org.apache.solr.search.SolrIndexSearcher$2.regenerateItem(SolrIndexSearcher.java:296)
        at org.apache.solr.search.FastLRUCache.warm(FastLRUCache.java:168)
        at org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1481)
        at org.apache.solr.core.SolrCore$2.call(SolrCore.java:1131)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

Well, I use dismax's bf parameter to boost very recent documents. I'm not using the queryResultCache or documentCache, only the filterCache and the Lucene fieldCache. I've checked LUCENE-1890 but am unsure if that's the issue. Any thoughts on this one?

https://issues.apache.org/jira/browse/LUCENE-1890

Cheers,
-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Problem with sorting using functions
This works:

/select/?q=*:*&sort=price desc

This throws a 400 error, "Missing sort order":

/select/?q=*:*&sort=sum(1, 1) desc

I'm using 1.4.2. I've tried all sorts of different numbers, functions, and fields but nothing seems to change that error. Any ideas?
Retrieving payload from each highlighted term
How can I get the payload from each highlighted term?
RE: Query on multivalue field
Hi Scott, Querying against a multi-valued field just works - no special incantation required. Steve -Original Message- From: Scott Yeadon [mailto:scott.yea...@anu.edu.au] Sent: Monday, February 28, 2011 11:50 PM To: solr-user@lucene.apache.org Subject: Query on multivalue field Hi, I have a variable number of text-based fields associated with each primary record which I wanted to apply a search across. I wanted to avoid the use of dynamic fields if possible or having to create a different document type in the index (as the app is based around the primary record and different views mean a lot of work to revamp pagination etc). So, is there a way to apply a query to each value of a multivalued field or is it always treated as a single field from a query perspective? Thanks. Scott.
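As a sketch of what "just works" means here: the only requirement is declaring the field multi-valued in schema.xml, after which a query on the field matches a document if any one of its stored values matches. The field and type names below are hypothetical, not taken from Scott's schema:

```xml
<!-- schema.xml: a hypothetical multi-valued text field;
     q=note:foo matches a document if ANY of its "note" values contains foo -->
<field name="note" type="text" indexed="true" stored="true" multiValued="true" />
```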
Help with explain query syntax
Hello, I can't understand why this query is not matching anything. Could someone help me please?

*Query*
http://localhost:8894/solr/select?q=linguajob.pl&qf=company_name&wt=xml&qt=dismax&debugQuery=on&explainOther=id%3A1

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">12</int>
    <lst name="params">
      <str name="explainOther">id:1</str>
      <str name="debugQuery">on</str>
      <str name="q">linguajob.pl</str>
      <str name="qf">company_name</str>
      <str name="wt">xml</str>
      <str name="qt">dismax</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">linguajob.pl</str>
    <str name="querystring">linguajob.pl</str>
    <str name="parsedquery">+DisjunctionMaxQuery((company_name:"(linguajob.pl linguajob) pl")~0.01) ()</str>
    <str name="parsedquery_toString">+(company_name:"(linguajob.pl linguajob) pl")~0.01 ()</str>
    <lst name="explain"/>
    <str name="otherQuery">id:1</str>
    <lst name="explainOther">
      <str name="1">
0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
  0.0 = no match on required clause (company_name:"(linguajob.pl linguajob) pl")
    0.0 = (NON-MATCH) fieldWeight(company_name:"(linguajob.pl linguajob) pl" in 0), product of:
      0.0 = tf(phraseFreq=0.0)
      1.6137056 = idf(company_name:"(linguajob.pl linguajob) pl")
      0.4375 = fieldNorm(field=company_name, doc=0)
      </str>
    </lst>
    <str name="QParser">DisMaxQParser</str>
    <null name="altquerystring"/>
    <null name="boostfuncs"/>
    <lst name="timing"> ... </lst>
  </lst>
</response>

*What does this syntax, field:(token1 token2) token3, mean?*

There's only one document indexed:

*Document*
http://localhost:8894/solr/select?q=1&qf=id&wt=xml&qt=dismax

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="qf">id</str>
      <str name="wt">xml</str>
      <str name="qt">dismax</str>
      <str name="q">1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="company_name">LinguaJob.pl</str>
      <str name="id">1</str>
      <int name="status">6</int>
      <date name="timestamp">2011-03-01T11:14:24.553Z</date>
    </doc>
  </result>
</response>

*Solr Admin Schema*
Field: company_name
Field Type: text
Properties (Schema and Index): Indexed, Tokenized, Stored
Position Increment Gap: 100

Index Analyzer: org.apache.solr.analysis.TokenizerChain
  Tokenizer: org.apache.solr.analysis.WhitespaceTokenizerFactory
  Filters:
    schema.UnicodeNormalizationFilterFactory {composed: false, remove_modifiers: true, fold: true, version: java6, remove_diacritics: true}
    org.apache.solr.analysis.StopFilterFactory {words: stopwords.txt, ignoreCase: true, enablePositionIncrements: true}
    org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal: 1, splitOnCaseChange: 1, generateNumberParts: 1, catenateWords: 1, generateWordParts: 1, catenateAll: 0, catenateNumbers: 1}
    org.apache.solr.analysis.LowerCaseFilterFactory {}
    org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}

Query Analyzer: org.apache.solr.analysis.TokenizerChain
  Tokenizer: org.apache.solr.analysis.WhitespaceTokenizerFactory
  Filters:
    schema.UnicodeNormalizationFilterFactory {composed: false, remove_modifiers: true, fold: true, version: java6, remove_diacritics: true}
    org.apache.solr.analysis.SynonymFilterFactory {synonyms: synonyms.txt, expand: true, ignoreCase: true}
    org.apache.solr.analysis.StopFilterFactory {words: stopwords.txt, ignoreCase: true}
    org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal: 1, splitOnCaseChange: 1, generateNumberParts: 1, catenateWords: 0, generateWordParts: 1, catenateAll: 0, catenateNumbers: 0}
    org.apache.solr.analysis.LowerCaseFilterFactory {}
    org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}

Docs: 1, Distinct: 5
Top 5 terms (term / frequency):
  lingua 1
  linguajob.pl 1
  linguajobpl 1
  pl 1
  job 1

*Solr Analysis*
Field name: company_name
Field value (Index): LinguaJob.pl
Field value (Query): linguajob.pl

*Index Analyzer*
org.apache.solr.analysis.WhitespaceTokenizerFactory {}:
  position 1, term LinguaJob.pl, type word, start,end 0,12
schema.UnicodeNormalizationFilterFactory (args as above):
  position 1, term LinguaJob.pl, type word, start,end 0,12
org.apache.solr.analysis.StopFilterFactory (args as above):
  position 1, term LinguaJob.pl, type word, start,end 0,12
org.apache.solr.analysis.WordDelimiterFilterFactory (args as above):
  position 1, term LinguaJob.pl, type word, start,end 0,12
  position 1, term Lingua, type word, start,end 0,6
  position 2, term Job, type word, start,end 6,9
  position 2, term LinguaJobpl, type word, start,end 0,12
  position 3, term pl, type word, start,end 10,12
org.apache.solr.analysis.LowerCaseFilterFactory {}:
  position 1, term linguajob.pl, type word, start,end 0,12
  position 1, term lingua, type word, start,end 0,6
  position 2, term job, type word, start,end 6,9
  position 2, term linguajobpl, type word, start,end 0,12
  position 3, term pl, type word, start,end 10,12
Re: multi-core solr, specifying the data directory
I did try that, yes. I tried that first, in fact! It seems to fall back to a ./data directory relative to the _main_ solr directory (the one above all the cores), not the core instanceDir. Which is not what I expected either. I wonder if this should be considered a bug? I wonder if anyone has considered this and thought of changing/fixing it?

On 3/1/2011 4:23 AM, Jan Høydahl wrote:
Have you tried removing the dataDir tag from solrconfig.xml? Then it should fall back to default ./data relative to core instancedir. [...]
MLT with boost
Is it possible to add function queries/boosts to the results that are returned by MLT? If not out of the box, how would one go about achieving this functionality? Thanks
Re: please make JSONWriter public
You may have noticed the ResponseWriter code is pretty hairy! Things are package protected so that the API can change between minor releases without concern for back compatibility. In 4.0 (/trunk) I hope to rework the whole ResponseWriter framework so that it is cleaner and hopefully stable enough that making parts public is helpful. For now, you can:
- copy the code
- put your class in the same package name
- make it public in your own distribution
ryan

On Mon, Feb 28, 2011 at 2:56 PM, Paul Libbrecht p...@hoplahup.net wrote:
Hello fellow SOLR experts, may I ask to make top-level and public the class org.apache.solr.request.JSONWriter inside org.apache.solr.request.JSONResponseWriter? I am re-using it to output JSON search results to code that I wish not to change on the client, but the current visibility settings (JSONWriter is package protected) make it impossible for me without actually copying the code (which is possible thanks to the good open-source nature). thanks in advance paul
Re: Sub entities
Yes, it looks like I had left off the field (misspelled it, actually). I reran the full import and the fields did properly show up. However, it is still not working as expected. Using the example below, a returned result only lists one specie instead of a list of species. I have the following in my schema.xml file:

<field column="specie" multiValued="true" name="specie" type="string" indexed="true" stored="true" required="false" />

I reran the full import but it is still only listing one specie instead of multiple. Is my above declaration incorrect?

On Tue, Mar 1, 2011 at 3:41 AM, Stefan Matheis matheis.ste...@googlemail.com wrote:
Brian, except for your sql-syntax error in the specie_relations-query SELECT specie_id FROMspecie_relations .. (missing whitespace after FROM) your config looks okay. [...]
Re: Indexed, but cannot search
Thank you for your reply, but the searching is still not working out. For example, when I go to:

http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

I get the following as a response:

<result name="response" numFound="249943" start="0">
  <doc>
    <str name="type">Mammal</str>
    <str name="id">1</str>
    <str name="genus">Canis</str>
  </doc>
</result>

(plus some other docs, but one is enough for this example). But if I go to:

http://localhost:8983/solr/select/?q=type%3AMammal&version=2.2&start=0&rows=10&indent=on

I only get:

<result name="response" numFound="0" start="0"/>

But it seems that should return at least the result I have listed above. What am I doing incorrectly?

On Mon, Feb 28, 2011 at 6:57 PM, Upayavira u...@odoko.co.uk wrote:
q=dog is equivalent to q=text:dog (where the default search field is defined as text at the bottom of schema.xml). If you want to specify a different field, well, you need to tell it :-) Is that it? Upayavira

On Mon, 28 Feb 2011 15:38 -0500, Brian Lamb brian.l...@journalexperts.com wrote:
Hi all, I was able to get my installation of Solr indexed using dataimport. However, I cannot seem to get search working. I can verify that the data is there by going to:

http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

This gives me the response:

<result name="response" numFound="234961" start="0">

But when I go to:

http://localhost:8983/solr/select/?q=dog&version=2.2&start=0&rows=10&indent=on

I get the response:

<result name="response" numFound="0" start="0"/>

I know that "dog" should return some results because it is the first result when I select all the records. So what am I doing incorrectly that would prevent me from seeing results?

---
Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
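The quoted reply's point about unqualified terms being searched against the default field corresponds to a declaration near the bottom of schema.xml; a sketch, assuming the stock example field name "text":

```xml
<!-- schema.xml: unqualified query terms like q=dog are searched against this field -->
<defaultSearchField>text</defaultSearchField>
```

If "type" is not copied into that default field, q=dog (or q=Mammal) will find nothing even though q=type:Mammal can.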
Re: Sub entities
Brian,

On Tue, Mar 1, 2011 at 4:52 PM, Brian Lamb brian.l...@journalexperts.com wrote:
<field column="specie" multiValued="true" name="specie" type="string" indexed="true" stored="true" required="false" />

Not sure, but IIRC <field> in this context (schema.xml) has no column attribute; that should normally not break your Solr configuration, though. Are you sure that your animal has multiple species assigned? Have you checked the query in the MySQL query log and verified that it returns more than one record? Otherwise you could enable http://wiki.apache.org/solr/DataImportHandler#LogTransformer for your dataimport, which outputs a log row for every record, just to ensure that your query results are imported correctly.

HTH, Regards Stefan
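A sketch of the LogTransformer Stefan links to, applied to the species entity from earlier in the thread; the log template text and log level are assumptions, not prescribed by the wiki page:

```xml
<!-- DIH config sketch: log one row per imported species record -->
<entity name="species" dataSource="mystuff" transformer="LogTransformer"
        logTemplate="imported specie: ${species.specie}" logLevel="info"
        query="SELECT specie FROM species WHERE id=${specie_relations.specie_id}">
  <field column="specie" name="specie" />
</entity>
```

If only one log row appears per animal, the inner query (or the specie_relations query feeding it) is returning a single record, which would explain the single specie in the results.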
Re: Indexed, but cannot search
Hi, i'm not sure if it is a typo, anyway the second query you mentioned should be: http://localhost:8983/solr/select/?q=type:* HTH, Edo On Tue, Mar 1, 2011 at 4:06 PM, Brian Lamb brian.l...@journalexperts.comwrote: Thank you for your reply but the searching is still not working out. For example, when I go to: http://localhost:8983/solr/select/?q=*%3A* http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10indent=on I get the following as a response: result name=response numFound=249943 start=0 doc str name=typeMammal/str str name=id1/str str name=genusCanis/str /doc /response (plus some other docs but one is enough for this example) But if I go to http://localhost:8983/solr/select/?q=type%3A http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10indent=on Mammal I only get: result name=response numFound=0 start=0 But it seems that should return at least the result I have listed above. What am I doing incorrectly? On Mon, Feb 28, 2011 at 6:57 PM, Upayavira u...@odoko.co.uk wrote: q=dog is equivalent to q=text:dog (where the default search field is defined as text at the bottom of schema.xml). If you want to specify a different field, well, you need to tell it :-) Is that it? Upayavira On Mon, 28 Feb 2011 15:38 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I was able to get my installation of Solr indexed using dataimport. However, I cannot seem to get search working. I can verify that the data is there by going to: http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10indent=on This gives me the response: result name=response numFound=234961 start=0 But when I go to http://localhost:8983/solr/select/?q=dogversion=2.2start=0rows=10indent=on I get the response: result name=response numFound=0 start=0 I know that dog should return some results because it is the first result when I select all the records. So what am I doing incorrectly that would prevent me from seeing results? 
--- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source -- Edoardo Tosca Sourcesense - making sense of Open Source: http://www.sourcesense.com
Re: Indexed, but cannot search
Next question, do you have your type field set to index=true in your schema? Upayavira On Tue, 01 Mar 2011 11:06 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Thank you for your reply but the searching is still not working out. For example, when I go to: http://localhost:8983/solr/select/?q=*%3A*http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10indent=on I get the following as a response: result name=response numFound=249943 start=0 doc str name=typeMammal/str str name=id1/str str name=genusCanis/str /doc /response (plus some other docs but one is enough for this example) But if I go to http://localhost:8983/solr/select/?q=type%3Ahttp://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10indent=on Mammal I only get: result name=response numFound=0 start=0 But it seems that should return at least the result I have listed above. What am I doing incorrectly? On Mon, Feb 28, 2011 at 6:57 PM, Upayavira u...@odoko.co.uk wrote: q=dog is equivalent to q=text:dog (where the default search field is defined as text at the bottom of schema.xml). If you want to specify a different field, well, you need to tell it :-) Is that it? Upayavira On Mon, 28 Feb 2011 15:38 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I was able to get my installation of Solr indexed using dataimport. However, I cannot seem to get search working. I can verify that the data is there by going to: http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10indent=on This gives me the response: result name=response numFound=234961 start=0 But when I go to http://localhost:8983/solr/select/?q=dogversion=2.2start=0rows=10indent=on I get the response: result name=response numFound=0 start=0 I know that dog should return some results because it is the first result when I select all the records. So what am I doing incorrectly that would prevent me from seeing results? 
--- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: Question on writing custom UpdateHandler
In your first attempt, the crux of your problem was probably that you were never closing the searcher/reader. : Or how can I perform a query on the current state of the index from within an : UpdateProcessor? If you implement UpdateRequestProcessorFactory, the getInstance method is given the SolrQueryRequest, which you can use to access the current SolrIndexSearcher. This will only show you the state of the index as of the last commit, so it won't be real time as you are streaming new documents, but it will give you the same results as a search query happening concurrently with your update. -Hoss
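A minimal sketch of what Hoss describes (pseudocode-flavoured Java against the Solr 1.4-era API; only getInstance and its arguments come from the thread, the surrounding class and processor names are illustrative):

```java
// Sketch only: assumes the Solr 1.4-era UpdateRequestProcessor API.
public class QueryingUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    // The request hands you the current searcher; it reflects the last
    // commit only, and is managed by the request, so do not close it here.
    SolrIndexSearcher searcher = req.getSearcher();
    return new QueryingUpdateProcessor(searcher, next); // hypothetical processor
  }
}
```

The key difference from the first attempt is that the searcher's lifecycle is owned by the request, so there is nothing for your code to open or close.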
Re: multi-core solr, specifying the data directory
: Unless I'm doing something wrong, in my experience in multi-core Solr in : 1.4.1, you NEED to explicitly provide an absolute path to the 'data' dir. have you looked at the example/multicore directory that was included in the 1.4.1 release? it has a solr.xml that loads two cores w/o specifying a data dir in the solr.xml (or the solrconfig.xml) and it uses the data dir inside the specified instanceDir. If that example works for you, but your own configs do not, then we'll need more details about your own configs -- how are you running solr, what does the solrconfig.xml of the core look like, etc... -Hoss
Re: please make JSONWriter public
Ryan, honestly, the hairiness was rather mild. I found it fairly readable. paul On 1 March 2011 at 16:46, Ryan McKinley wrote: You may have noticed the ResponseWriter code is pretty hairy! Things are package protected so that the API can change between minor releases without concern for back compatibility. In 4.0 (/trunk) I hope to rework the whole ResponseWriter framework so that it is more clean and hopefully stable enough that making parts public is helpful. For now, you can: - copy the code - put your class in the same package name - make it public in your own distribution ryan On Mon, Feb 28, 2011 at 2:56 PM, Paul Libbrecht p...@hoplahup.net wrote: Hello fellow SOLR experts, may I ask to make top-level and public the class org.apache.solr.request.JSONWriter inside org.apache.solr.request.JSONResponseWriter I am re-using it to output JSON search results to code that I wish not to change on the client but the current visibility settings (JSONWriter is package protected) make it impossible for me without actually copying the code (which is possible thanks to the good open-source nature). thanks in advance paul
solr different sizes on master and slave
I was curious why the size would be dramatically different even though the index versions are the same. On the master it is 1.2 GB, and on the slave it is 512 MB. I would think they should both be the same size, no? Thanks
Re: Sub entities
Thanks for the help Stefan. It seems removing column=specie fixed it. On Tue, Mar 1, 2011 at 11:18 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: Brian, On Tue, Mar 1, 2011 at 4:52 PM, Brian Lamb brian.l...@journalexperts.com wrote: field column=specie multiValued=true name=specie type=string indexed=true stored=true required=false / Not sure, but iirc field in this context has no column-Attribute .. that should normally not break your solr-configuration. Are you sure, that your animal has multiple species assigned? Checked the Query from the MySQL-Query-Log and verified that it returns more than one record? Otherwise you could enable http://wiki.apache.org/solr/DataImportHandler#LogTransformer for your dataimport, which outputs a log-row for every record .. just to ensure, that your Query-Results is correctly imported HTH, Regards Stefan
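Pulling the thread's fixes together, the working data-config.xml would look roughly like this (a sketch reconstructed from the snippets quoted above, not a verified config: the redundant field column=foo name=foo mappings and the extra column attribute are dropped, and the missing space after FROM is restored):

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" name="mystuff" batchSize="-1"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db" user="user" password="password"/>
  <document>
    <entity name="animal" dataSource="mystuff" query="SELECT * FROM animals">
      <!-- columns whose names match schema fields need no explicit mapping -->
      <entity name="specie_relations" dataSource="mystuff"
              query="SELECT specie_id FROM specie_relations WHERE animal_id=${animal.id}">
        <entity name="species" dataSource="mystuff"
                query="SELECT specie FROM species WHERE id=${specie_relations.specie_id}"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Note that dataSource and document are siblings under dataConfig; the closing tags in the original message (/document /dataSource) suggest the dataSource element had been wrapped around the document, which is not how DIH expects it.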
Re: Indexed, but cannot search
Hi all, The problem was that my fields were defined as type=string instead of type=text. Once I corrected that, it seems to be fixed. The only part that still is not working though is the search across all fields. For example: http://localhost:8983/solr/select/?q=type%3AMammal Now correctly returns the records matching mammal. But if I try to do a global search across all fields: http://localhost:8983/solr/select/?q=Mammal http://localhost:8983/solr/select/?q=text%3AMammal I get no results returned. Here is how the schema is set up: field name=text type=text indexed=true stored=false multiValued=true/ defaultSearchFieldtext/defaultSearchField copyField source=* dest=text / Thanks to everyone for your help so far. I think this is the last hurdle I have to jump over. On Tue, Mar 1, 2011 at 12:34 PM, Upayavira u...@odoko.co.uk wrote: Next question, do you have your type field set to index=true in your schema? Upayavira On Tue, 01 Mar 2011 11:06 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Thank you for your reply but the searching is still not working out. For example, when I go to: http://localhost:8983/solr/select/?q=*%3A* http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10indent=on I get the following as a response: result name=response numFound=249943 start=0 doc str name=typeMammal/str str name=id1/str str name=genusCanis/str /doc /response (plus some other docs but one is enough for this example) But if I go to http://localhost:8983/solr/select/?q=type%3A http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10indent=on Mammal I only get: result name=response numFound=0 start=0 But it seems that should return at least the result I have listed above. What am I doing incorrectly? On Mon, Feb 28, 2011 at 6:57 PM, Upayavira u...@odoko.co.uk wrote: q=dog is equivalent to q=text:dog (where the default search field is defined as text at the bottom of schema.xml). 
If you want to specify a different field, well, you need to tell it :-) Is that it? Upayavira On Mon, 28 Feb 2011 15:38 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I was able to get my installation of Solr indexed using dataimport. However, I cannot seem to get search working. I can verify that the data is there by going to: http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10indent=on This gives me the response: result name=response numFound=234961 start=0 But when I go to http://localhost:8983/solr/select/?q=dogversion=2.2start=0rows=10indent=on I get the response: result name=response numFound=0 start=0 I know that dog should return some results because it is the first result when I select all the records. So what am I doing incorrectly that would prevent me from seeing results? --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: Indexed, but cannot search
Traditionally, people forget to reindex ;) Hi all, The problem was that my fields were defined as type=string instead of type=text. Once I corrected that, it seems to be fixed. The only part that still is not working though is the search across all fields. For example: http://localhost:8983/solr/select/?q=type%3AMammal Now correctly returns the records matching mammal. But if I try to do a global search across all fields: http://localhost:8983/solr/select/?q=Mammal http://localhost:8983/solr/select/?q=text%3AMammal I get no results returned. Here is how the schema is set up: field name=text type=text indexed=true stored=false multiValued=true/ defaultSearchFieldtext/defaultSearchField copyField source=* dest=text / Thanks to everyone for your help so far. I think this is the last hurdle I have to jump over. On Tue, Mar 1, 2011 at 12:34 PM, Upayavira u...@odoko.co.uk wrote: Next question, do you have your type field set to index=true in your schema? Upayavira On Tue, 01 Mar 2011 11:06 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Thank you for your reply but the searching is still not working out. For example, when I go to: http://localhost:8983/solr/select/?q=*%3A* http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10in dent=on I get the following as a response: result name=response numFound=249943 start=0 doc str name=typeMammal/str str name=id1/str str name=genusCanis/str /doc /response (plus some other docs but one is enough for this example) But if I go to http://localhost:8983/solr/select/?q=type%3A http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10in dent=on Mammal I only get: result name=response numFound=0 start=0 But it seems that should return at least the result I have listed above. What am I doing incorrectly? On Mon, Feb 28, 2011 at 6:57 PM, Upayavira u...@odoko.co.uk wrote: q=dog is equivalent to q=text:dog (where the default search field is defined as text at the bottom of schema.xml). 
If you want to specify a different field, well, you need to tell it :-) Is that it? Upayavira On Mon, 28 Feb 2011 15:38 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I was able to get my installation of Solr indexed using dataimport. However, I cannot seem to get search working. I can verify that the data is there by going to: http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10in dent=on This gives me the response: result name=response numFound=234961 start=0 But when I go to http://localhost:8983/solr/select/?q=dogversion=2.2start=0rows=10inde nt=on I get the response: result name=response numFound=0 start=0 I know that dog should return some results because it is the first result when I select all the records. So what am I doing incorrectly that would prevent me from seeing results? --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: solr different sizes on master and slave
Are there pending commits on the master? I was curious why would the size be dramatically different even though the index versions are the same? One is 1.2 Gb, and on the slave it is 512 MB I would think they should both be the same size no? Thanks
Re: Indexed, but cannot search
Oh if only it were that easy :-). I have reindexed since making that change which is how I was able to get the regular search working. I have not however been able to get the search across all fields to work. On Tue, Mar 1, 2011 at 3:01 PM, Markus Jelsma markus.jel...@openindex.iowrote: Traditionally, people forget to reindex ;) Hi all, The problem was that my fields were defined as type=string instead of type=text. Once I corrected that, it seems to be fixed. The only part that still is not working though is the search across all fields. For example: http://localhost:8983/solr/select/?q=type%3AMammal Now correctly returns the records matching mammal. But if I try to do a global search across all fields: http://localhost:8983/solr/select/?q=Mammal http://localhost:8983/solr/select/?q=text%3AMammal I get no results returned. Here is how the schema is set up: field name=text type=text indexed=true stored=false multiValued=true/ defaultSearchFieldtext/defaultSearchField copyField source=* dest=text / Thanks to everyone for your help so far. I think this is the last hurdle I have to jump over. On Tue, Mar 1, 2011 at 12:34 PM, Upayavira u...@odoko.co.uk wrote: Next question, do you have your type field set to index=true in your schema? Upayavira On Tue, 01 Mar 2011 11:06 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Thank you for your reply but the searching is still not working out. 
For example, when I go to: http://localhost:8983/solr/select/?q=*%3A* http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10in dent=on I get the following as a response: result name=response numFound=249943 start=0 doc str name=typeMammal/str str name=id1/str str name=genusCanis/str /doc /response (plus some other docs but one is enough for this example) But if I go to http://localhost:8983/solr/select/?q=type%3A http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10in dent=on Mammal I only get: result name=response numFound=0 start=0 But it seems that should return at least the result I have listed above. What am I doing incorrectly? On Mon, Feb 28, 2011 at 6:57 PM, Upayavira u...@odoko.co.uk wrote: q=dog is equivalent to q=text:dog (where the default search field is defined as text at the bottom of schema.xml). If you want to specify a different field, well, you need to tell it :-) Is that it? Upayavira On Mon, 28 Feb 2011 15:38 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I was able to get my installation of Solr indexed using dataimport. However, I cannot seem to get search working. I can verify that the data is there by going to: http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10in dent=on This gives me the response: result name=response numFound=234961 start=0 But when I go to http://localhost:8983/solr/select/?q=dogversion=2.2start=0rows=10inde nt=on I get the response: result name=response numFound=0 start=0 I know that dog should return some results because it is the first result when I select all the records. So what am I doing incorrectly that would prevent me from seeing results? --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: solr different sizes on master and slave
No pending commits, what it looks like is there are almost two copies of the index on the master, not sure how that happened. On Tue, Mar 1, 2011 at 3:08 PM, Markus Jelsma markus.jel...@openindex.io wrote: Are there pending commits on the master? I was curious why would the size be dramatically different even though the index versions are the same? One is 1.2 Gb, and on the slave it is 512 MB I would think they should both be the same size no? Thanks
numeric or string type for non-sortable field?
I wonder if I should use Solr int or string for a field with the following requirements: multi-value facet needed, sort not needed. The field value is an id, so I can store it as either a numeric field or just a string. Shall I choose string for efficiency? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/numberic-or-string-type-for-non-sortable-field-tp2606353p2606353.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: numeric or string type for non-sortable field?
I wonder if I should use Solr int or string for a field with the following requirements: multi-value facet needed, sort not needed. The field value is an id, so I can store it as either a numeric field or just a string. Shall I choose string for efficiency? Trie-based integer (tint) is preferred for faster faceting.
Re: Query on multivalue field
Thanks, but just to confirm the way multiValued fields work: In a multiValued field, call it field1, if I have two values indexed to this field, say value 1 = some text...termA...more text and value 2 = some text...termB...more text and do a search such as field1:(termA termB) (where solrQueryParser defaultOperator=AND/) I'm getting a hit returned even though both terms don't occur within a single value in the multiValued field. What I'm wondering is if there is a way of applying the query against each value of the field rather than against the field in its entirety. The reason being is the number of values I want to store is variable and I'd like to avoid the use of dynamic fields or restructuring the index if possible. Scott. On 2/03/11 12:35 AM, Steven A Rowe wrote: Hi Scott, Querying against a multi-valued field just works - no special incantation required. Steve -Original Message- From: Scott Yeadon [mailto:scott.yea...@anu.edu.au] Sent: Monday, February 28, 2011 11:50 PM To:solr-user@lucene.apache.org Subject: Query on multivalue field Hi, I have a variable number of text-based fields associated with each primary record which I wanted to apply a search across. I wanted to avoid the use of dynamic fields if possible or having to create a different document type in the index (as the app is based around the primary record and different views mean a lot of work to revamp pagination etc). So, is there a way to apply a query to each value of a multivalued field or is it always treated as a single field from a query perspective? Thanks. Scott.
Re: solr different sizes on master and slave
ok doing some more research I noticed, on the slave it has multiple folders where it keeps them for example index index.20110204010900 index.20110204013355 index.20110218125400 and then there is an index.properties that shows which index it is using. I am just curious why does it keep multiple copies? Is there a setting somewhere I can change to only keep one copy so not to lose space? Thanks On Tue, Mar 1, 2011 at 3:26 PM, Mike Franon kongfra...@gmail.com wrote: No pending commits, what it looks like is there are almost two copies of the index on the master, not sure how that happened. On Tue, Mar 1, 2011 at 3:08 PM, Markus Jelsma markus.jel...@openindex.io wrote: Are there pending commits on the master? I was curious why would the size be dramatically different even though the index versions are the same? One is 1.2 Gb, and on the slave it is 512 MB I would think they should both be the same size no? Thanks
Re: numeric or string type for non-sortable field?
: The field value is an id. Therefore, i can store as : either numeric field : or just a string. Shall i choose string : for efficiency? : : Trie based integer (tint) is preferred for faster faceting. range faceting/filtering yes -- not for field faceting which is what i think he's asking about. in that case int would still probably be more efficient, but you don't want precision steps (that will introduce added terms) -Hoss
Re: multi-core solr, specifying the data directory
Hmm, okay, have to try to find time to install the example/multicore and see. It's definitely never worked for me, weird. Thanks. On 3/1/2011 2:38 PM, Chris Hostetter wrote: : Unless I'm doing something wrong, in my experience in multi-core Solr in : 1.4.1, you NEED to explicitly provide an absolute path to the 'data' dir. have you looked at the example/multicore directory that was included in the 1.4.1 release? it has a solr.xml that loads two cores w/o specifying a data dir in the solr.xml (or hte solrconfig.xml) and it uses the data dir inside the specified instanceDir. If that example works for you, but your own configs do not, then we'll need more details about your own configs -- how are you running solr, what does the solrconfig.xml of the core look like, etc... -Hoss
Re: solr different sizes on master and slave
The slave should not keep multiple copies _permanently_, but might temporarily after it's fetched the new files from master, but before it's committed them and fully warmed the new index searchers on the slave. Could that be what's going on? Is your slave just still working on committing and warming the new version(s) of the index? [If you do 'commit' to the slave (and a replication pull counts as a 'commit') so quickly that you get overlapping commits before the slave was able to warm a new index... it's going to be trouble all around.] On 3/1/2011 4:27 PM, Mike Franon wrote: ok doing some more research I noticed, on the slave it has multiple folders where it keeps them for example index index.20110204010900 index.20110204013355 index.20110218125400 and then there is an index.properties that shows which index it is using. I am just curious why does it keep multiple copies? Is there a setting somewhere I can change to only keep one copy so not to lose space? Thanks On Tue, Mar 1, 2011 at 3:26 PM, Mike Franon kongfra...@gmail.com wrote: No pending commits, what it looks like is there are almost two copies of the index on the master, not sure how that happened. On Tue, Mar 1, 2011 at 3:08 PM, Markus Jelsma markus.jel...@openindex.io wrote: Are there pending commits on the master? I was curious why would the size be dramatically different even though the index versions are the same? One is 1.2 Gb, and on the slave it is 512 MB I would think they should both be the same size no? Thanks
Re: numeric or string type for non-sortable field?
Sorry, I didn't make my question clear. I will only facet on the field value, not run range queries (it is just some ids in a multi-value field). And I won't sort on the field either. In that case, is string more efficient for the requirement? -- View this message in context: http://lucene.472066.n3.nabble.com/numberic-or-string-type-for-non-sortable-field-tp2606353p2606762.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query on multivalue field
In a multiValued field, call it field1, if I have two values indexed to this field, say value 1 = some text...termA...more text and value 2 = some text...termB...more text and do a search such as field1:(termA termB) (where solrQueryParser defaultOperator=AND/) I'm getting a hit returned even though both terms don't occur within a single value in the multiValued field. What I'm wondering is if there is a way of applying the query against each value of the field rather than against the field in its entirety. The reason being is the number of values I want to store is variable and I'd like to avoid the use of dynamic fields or restructuring the index if possible. Your best bet is to use positionIncrementGap and issue a phrase query (implicit AND) with the appropriate slop value. If you have positionIncrementGap=100, you can simulate this using q=field1:"termA termB"~100 http://search-lucene.com/m/Hbdvz1og7D71/
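To make the suggestion concrete, the gap is declared on the field type in schema.xml; a sketch (field type name and analyzer are illustrative, the gap value is the one from Ahmet's example):

```xml
<!-- sketch: positionIncrementGap pads positions between values of a
     multiValued field so phrase queries cannot span two values -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

A phrase query whose slop stays below the gap, e.g. q=field1:"termA termB"~99, then can only match terms that occur within a single value, since tokens from different values sit at least 100 positions apart.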
Re: numberic or string type for non-sortable field?
I will only facet based on field value, not ranged query (it is just some ids for a multi-value field). And i won't do sort on the field either. In that case, is string more efficient for the requirement? Hoss was saying to use, fieldType name=int class=solr.TrieIntField precisionStep=0 omitNorms=true positionIncrementGap=0/
Searching all terms - SolrJ
Dear all, First I am sorry if this question has already been asked (I am sure it was...) but I can't find the right option with solrj. I want to query only documents that contain ALL query terms. Let me take an example. I have 4 documents that are simple sequences (they have only one field: text): 1 : The cat is on the roof 2 : The dog is on the roof 3 : The cat is black 4 : the cat is black and on the roof If I search cat roof I will have doc 1,2,3,4. In my case I would like to have only doc 1 and doc 4 (cat and roof do not both appear in docs 2 and 3). Is there a simple way to do that automatically with SolrJ, or should I use something like: text:cat AND text:roof ? Thank you very much for your help! Best regards, Victor
Re: Searching all terms - SolrJ
--- On Wed, 3/2/11, openvictor Open openvic...@gmail.com wrote: From: openvictor Open openvic...@gmail.com Subject: Searching all terms - SolrJ To: solr-user@lucene.apache.org Date: Wednesday, March 2, 2011, 12:20 AM Dear all, First I am sorry if this question has already been asked ( I am sure it was...) but I can't find the right option with solrj. I want to query only documents that contains ALL query terms. Let me take an example, I have 4 documents that are simple sequences ( they have only one field : text ): 1 : The cat is on the roof 2 : The dog is on the roof 3 : The cat is black 4 : the cat is black and on the roof if I search cat roof I will have doc 1,2,3,4 In my case I would like to have only : doc 1 and doc 4 (either cat or roof don't appear in doc 2 and 3). Is there a simple way to do that automatically with SolrJ or should I should something like : text:cat AND text:roof ? Thank you very much for your help ! You can use solrQueryParser defaultOperator=AND/ in your schema.xml
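If changing the schema default is not an option, the AND form from the question can also be built on the client side before handing the string to SolrJ or to the HTTP API. A small sketch (plain string and URL handling only; nothing here is SolrJ API, and the field name is the one from the example):

```python
from urllib.parse import urlencode

def all_terms_query(field, terms):
    """Join terms so that matching documents must contain every term."""
    return " AND ".join(f"{field}:{term}" for term in terms)

q = all_terms_query("text", ["cat", "roof"])
print(q)                    # text:cat AND text:roof
print(urlencode({"q": q}))  # q=text%3Acat+AND+text%3Aroof
```

The resulting parameter string can be appended to the usual /solr/select URL; with the AND operator in place, only docs 1 and 4 from the example would match.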
Re: Query on multivalue field
The only trick with this is ensuring the searches return the right results and don't go across value boundaries. If I set the gap to the largest text size we expect (approx 5000 chars), what impact does such a large value have (i.e. does Solr physically separate these fragments in the index, or just apply the figure as part of any query)? Scott. On 2/03/11 9:01 AM, Ahmet Arslan wrote: In a multiValued field, call it field1, if I have two values indexed to this field, say value 1 = some text...termA...more text and value 2 = some text...termB...more text and do a search such as field1:(termA termB) (where solrQueryParser defaultOperator=AND/) I'm getting a hit returned even though both terms don't occur within a single value in the multiValued field. What I'm wondering is if there is a way of applying the query against each value of the field rather than against the field in its entirety. The reason being is the number of values I want to store is variable and I'd like to avoid the use of dynamic fields or restructuring the index if possible. Your best bet can be using positionIncrementGap and to issue a phrase query (implicit AND) with the appropriate slop value. If you have positionIncrementGap=100, you can simulate this using q=field1:"termA termB"~100 http://search-lucene.com/m/Hbdvz1og7D71/
Re: Distances in spatial search (Solr 4.0)
Hi Bill, I was using a different approach to sort by the distance with the dist() function, since geodist() is not documented on the wiki (http://wiki.apache.org/solr/FunctionQuery). Tried something like: sort=dist(2, 45.15,-93.85, lat, lng) asc I made some tests with the geodist() function as you pointed out and got different results. Is it safe to assume that geodist() is the correct way of doing it? Also, can you clear up how I can see the distance using the _Val_ as you mentioned? Thanks! Alexandre On Tue, Mar 1, 2011 at 12:03 AM, Bill Bell billnb...@gmail.com wrote: Use sort with geodist() to sort by distance. Getting the distance returned is documented on the wiki if you are not using score. see reference to _Val_ Bill Bell Sent from mobile On Feb 28, 2011, at 7:54 AM, Alexandre Rocco alel...@gmail.com wrote: Hi guys, We are implementing a separate index on our website, that will be dedicated to spatial search. I've downloaded a build of Solr 4.0 to try the spatial features and got the geodist working really fast. We now have 2 other features that will be needed on this project: 1. Returning the distance from the reference point to the search hit (in kilometers) 2. Sorting by the distance. On item 2, the wiki doc points out that a distance function can be used but I was not able to find good info on how to accomplish it. Also, returning the distance (item 1) is noted as currently being in development and there is some workaround to get it. Anyone had experience with the spatial feature and could help with some pointers on how to achieve it? Thanks, Alexandre
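For readers hitting this thread later, the geodist-based request on Solr 4 trunk builds of this era looked roughly like the following (parameter names as on the SpatialSearch wiki page of the time; the location field name store is an assumption, so verify against your own schema and build):

```
# sort by distance from a point (sfield names the stored location field)
?q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc

# workaround to get the distance back: make geodist() the relevancy score
?q={!func}geodist()&sfield=store&pt=45.15,-93.85&fl=*,score
```

In the second form the score column of each hit is the distance itself, which covers item 1 from the original question until distances can be returned as proper fields.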
Re: multi-core solr, specifying the data directory
I tried this in my 1.4.0 installation (commenting out what had been working, hoping the default would be as you said works in the example): solr persistent=true sharedLib=lib cores adminPath=/admin/cores core name=bpro instanceDir=bpro !-- property name=solr.data.dir value=solr/bpro/data/ -- /core core name=pfapp instanceDir=pfapp property name=solr.data.dir value=solr/pfapp/data/ /core /cores /solr In the log after starting up, I get these messages (among many others): ... Mar 1, 2011 7:51:23 PM org.apache.solr.core.CoreContainer$Initializer initialize INFO: looking for solr.xml: /usr/local/tomcat/solr/solr.xml Mar 1, 2011 7:51:23 PM org.apache.solr.core.SolrResourceLoader locateSolrHome INFO: No /solr/home in JNDI Mar 1, 2011 7:51:23 PM org.apache.solr.core.SolrResourceLoader locateSolrHome INFO: solr home defaulted to 'solr/' (could not find system property or JNDI) Mar 1, 2011 7:51:23 PM org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to 'solr/' Mar 1, 2011 7:51:23 PM org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to 'solr/bpro/' ... Mar 1, 2011 7:51:24 PM org.apache.solr.core.SolrCore init INFO: [bpro] Opening new SolrCore at solr/bpro/, dataDir=./solr/data/ ... Mar 1, 2011 7:51:25 PM org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to 'solr/pfapp/' ... Mar 1, 2011 7:51:26 PM org.apache.solr.core.SolrCore init INFO: [pfapp] Opening new SolrCore at solr/pfapp/, dataDir=solr/pfapp/data/ and it's pretty clearly using the wrong directory at that point. Some more details: /usr/local/tomcat has the usual tomcat distribution (this is 6.0.29) conf/server.xml has: Host name=localhost appBase=webapps unpackWARs=true autoDeploy=true xmlValidation=false xmlNamespaceAware=false Aliasrosen/Alias Aliasrosen.ifactory.com/Alias Context path= docBase=/usr/local/tomcat/webapps/solr / /Host There is a solrconfig.xml in each of the core directories (should there only be one of these?). 
I believe these are pretty generic (and they are identical); the one in the bpro folder has:

<!-- Used to specify an alternate directory to hold all index data
     other than the default ./data under the Solr home. If replication
     is in use, this should match the replication configuration. -->
<dataDir>${solr.data.dir:./solr/data}</dataDir>

-Mike

On 3/1/2011 4:38 PM, Jonathan Rochkind wrote: Hmm, okay, have to try to find time to install the example/multicore and see. It's definitely never worked for me, weird. Thanks. On 3/1/2011 2:38 PM, Chris Hostetter wrote: : Unless I'm doing something wrong, in my experience in multi-core Solr in : 1.4.1, you NEED to explicitly provide an absolute path to the 'data' dir. Have you looked at the example/multicore directory that was included in the 1.4.1 release? It has a solr.xml that loads two cores w/o specifying a data dir in the solr.xml (or the solrconfig.xml), and it uses the data dir inside the specified instanceDir. If that example works for you, but your own configs do not, then we'll need more details about your own configs -- how are you running solr, what does the solrconfig.xml of the core look like, etc... -Hoss
Re: Query on multivalue field
Each token has a position set on it. So if you index the value "alpha beta gamma", it winds up stored in Solr as (sort of, for the way we want to look at it) document1: alpha: position 1, beta: position 2, gamma: position 3. If you set the position increment gap large, then after one value in a multi-valued field ends, the position increment gap will be added to the positions for the next value. Solr doesn't actually internally have much of any idea of a multi-valued field; ALL a multi-valued indexed field is, is a position increment gap separating tokens from different 'values'. So if you index in a multi-valued field, with a position increment gap of 10000, the values ["alpha beta gamma", "aleph bet"], you get kind of like: document1: alpha: 1, beta: 2, gamma: 3, aleph: 10004, bet: 10005. A large position increment gap, as far as I know and can tell (please someone correct me if I'm wrong, I am not a Solr developer), has no effect on the size or efficiency of your index on disk. I am not sure why positionIncrementGap doesn't just default to a very large number, to provide behavior that more closely matches what people expect from the idea of a multi-valued field. So maybe there is some flaw in my understanding that justifies some reason for it not to be this way? But I set my positionIncrementGap very large, and haven't seen any issues. On 3/1/2011 5:46 PM, Scott Yeadon wrote: The only trick with this is ensuring the searches return the right results and don't go across value boundaries. If I set the gap to the largest text size we expect (approx 5000 chars), what impact does such a large value have (i.e. does Solr physically separate these fragments in the index, or just apply the figure as part of any query)? Scott.
On 2/03/11 9:01 AM, Ahmet Arslan wrote: In a multiValued field, call it field1, if I have two values indexed to this field, say value 1 = "some text...termA...more text" and value 2 = "some text...termB...more text", and do a search such as field1:(termA termB) (where <solrQueryParser defaultOperator="AND"/>), I'm getting a hit returned even though both terms don't occur within a single value in the multiValued field. What I'm wondering is if there is a way of applying the query against each value of the field rather than against the field in its entirety. The reason being is the number of values I want to store is variable and I'd like to avoid the use of dynamic fields or restructuring the index if possible. Your best bet can be using positionIncrementGap and issuing a phrase query (implicit AND) with the appropriate slop value. If you have positionIncrementGap=100, you can simulate this with q=field1:"termA termB"~100 http://search-lucene.com/m/Hbdvz1og7D71/
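Jonathan's position arithmetic above can be sketched in a few lines of plain Python (an illustration only, not Solr internals; the gap value and token positions mirror his alpha/aleph example):

```python
# Rough illustration of how a positionIncrementGap keeps sloppy
# phrase matches from crossing value boundaries in a multi-valued field.

def assign_positions(values, gap):
    """Assign a position to each token; add `gap` between values."""
    positions = {}
    pos = 0
    for value in values:
        for token in value.split():
            pos += 1
            positions.setdefault(token, []).append(pos)
        pos += gap  # jump before the next value starts
    return positions

def phrase_within_slop(positions, a, b, slop):
    """True if some occurrence of a and b lie within `slop` positions."""
    return any(abs(pa - pb) <= slop
               for pa in positions.get(a, [])
               for pb in positions.get(b, []))

field = ["alpha beta gamma", "aleph bet"]
tight = assign_positions(field, gap=0)       # values run together
gapped = assign_positions(field, gap=10000)  # large gap between values

# With no gap, "gamma aleph"~100 would falsely match across values;
# with a 10000 gap it cannot.
print(phrase_within_slop(tight, "gamma", "aleph", 100))   # True
print(phrase_within_slop(gapped, "gamma", "aleph", 100))  # False
```

With the 10000 gap, aleph lands at position 10004, exactly as in the example above, so a ~100 slop can never bridge two values.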
Re: multi-core solr, specifying the data directory
This definitely matches my own experience, and I've heard it from others; I haven't heard of anyone who HAS gotten it to work like that. But apparently there's a multi-core example distributed with Solr which claims to work the way it doesn't for us. One of us has to try the Solr distro multi-core example, as Hoss suggested/asked, to see if the problem exhibits even there, and if not, figure out what the difference is. Sorry, haven't found time to figure out how to install and start up the demo. I am running in Tomcat; I wonder if the container could matter, and maybe it somehow works in Jetty or something? Jonathan On 3/1/2011 7:05 PM, Michael Sokolov wrote: I tried this in my 1.4.0 installation (commenting out what had been working, hoping the default would be as you said works in the example): ...
[ANNOUNCE] Web Crawler
Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java web crawler. It includes: * a crawler * a document processing pipeline * a Solr indexer The crawler has a web administration UI in order to manage the web sites to be crawled. Each web site crawl is configured with a lot of possible parameters (not all mandatory): * number of simultaneous items crawled per site * recrawl period rules based on item type (html, PDF, ...) * item type inclusion / exclusion rules * item path inclusion / exclusion / strategy rules * max depth * web site authentication * language * country * tags * collections * ... The pipeline includes various ready-to-use stages (text extraction, language detection, a Solr ready-to-index XML writer, ...). All is very configurable and extensible, either by scripting or Java coding. With scripting technology, you can help the crawler handle javascript links, or help the pipeline extract relevant titles and clean up the html pages (remove menus, headers, footers, ...). With Java coding, you can develop your own pipeline stage. The Crawl Anywhere web site provides good explanations and screenshots. All is documented in a wiki. The current version is 1.1.4. You can download and try it out from here: www.crawl-anywhere.com Regards Dominique
Re: Searching all terms - SolrJ
Yes, but I want to leave the choice to the user. He can either search all the terms or just some. Is there any more flexible solution? Even if I have to code it by hand? 2011/3/1 Ahmet Arslan iori...@yahoo.com --- On Wed, 3/2/11, openvictor Open openvic...@gmail.com wrote: From: openvictor Open openvic...@gmail.com Subject: Searching all terms - SolrJ To: solr-user@lucene.apache.org Date: Wednesday, March 2, 2011, 12:20 AM Dear all, first I am sorry if this question has already been asked (I am sure it was...) but I can't find the right option with SolrJ. I want to query only documents that contain ALL query terms. Let me take an example. I have 4 documents that are simple sequences (they have only one field: text): 1: The cat is on the roof 2: The dog is on the roof 3: The cat is black 4: The cat is black and on the roof. If I search "cat roof" I will get docs 1, 2, 3, 4. In my case I would like to have only doc 1 and doc 4 (either cat or roof doesn't appear in docs 2 and 3). Is there a simple way to do that automatically with SolrJ, or should I use something like text:cat AND text:roof? Thank you very much for your help! You can use <solrQueryParser defaultOperator="AND"/> in your schema.xml
Re: Query on multivalue field
Tested it out and it seems to work well; as long as I set the gap to a value much longer than the text, it appears to work fine for our current data. Thanks heaps for all the help guys! Scott. On 2/03/11 11:13 AM, Jonathan Rochkind wrote: Each token has a position set on it. So if you index the value alpha beta gamma, it winds up stored in Solr as (sort of, for the way we want to look at it) ... On 2/03/11 9:01 AM, Ahmet Arslan wrote: ...
Re: numberic or string type for non-sortable field?
Can I know why? I thought Solr is tuned for strings if no sorting or faceting by range query is needed. -- View this message in context: http://lucene.472066.n3.nabble.com/numberic-or-string-type-for-non-sortable-field-tp2606353p2607932.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr different sizes on master and slave
Indeed, the slave should not have useless copies, but it does, at least in 1.4.0; I haven't seen it in 3.x, but that was just a small test that did not exactly match my other production installs. In 1.4.0 Solr does not remove old copies at startup, and it does not cleanly abort running replications at shutdown. Between shutdown and startup there might be a higher index version; it will then proceed as expected, download the new version and continue, and old copies will appear. There is an earlier thread I started, but without a patch. You can, however, work around the problem by letting Solr delete a running replication: 1) disable polling, and then 2) abort the replication. You can also write a script that compares the current and available replication directories before startup and acts accordingly. The slave should not keep multiple copies _permanently_, but it might temporarily after it's fetched the new files from the master, but before it's committed them and fully warmed the new index searchers on the slave. Could that be what's going on -- is your slave just still working on committing and warming the new version(s) of the index? [If you do a 'commit' to the slave (and a replication pull counts as a 'commit') so quickly that you get overlapping commits before the slave was able to warm a new index... it's going to be trouble all around.] On 3/1/2011 4:27 PM, Mike Franon wrote: ok doing some more research I noticed, on the slave it has multiple folders where it keeps them, for example index index.20110204010900 index.20110204013355 index.20110218125400 and then there is an index.properties that shows which index it is using. I am just curious, why does it keep multiple copies? Is there a setting somewhere I can change to only keep one copy so as not to lose space? Thanks On Tue, Mar 1, 2011 at 3:26 PM, Mike Franon kongfra...@gmail.com wrote: No pending commits; what it looks like is there are almost two copies of the index on the master, not sure how that happened.
On Tue, Mar 1, 2011 at 3:08 PM, Markus Jelsma markus.jel...@openindex.io wrote: Are there pending commits on the master? I was curious why would the size be dramatically different even though the index versions are the same? One is 1.2 Gb, and on the slave it is 512 MB I would think they should both be the same size no? Thanks
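The "compare directories before startup" idea mentioned above could look something like this sketch. It assumes the layout described in this thread (an index.properties file whose index= line names the live directory, alongside index.* copies), and should only be run while Solr is stopped; the data path is a placeholder:

```python
import os
import shutil

def stale_index_dirs(data_dir):
    """Return index.* directories not referenced by index.properties."""
    live = "index"  # default when no index.properties exists
    props = os.path.join(data_dir, "index.properties")
    if os.path.exists(props):
        with open(props) as f:
            for line in f:
                if line.startswith("index="):
                    live = line.split("=", 1)[1].strip()
    return [d for d in os.listdir(data_dir)
            if d.startswith("index") and d != live
            and os.path.isdir(os.path.join(data_dir, d))]

if __name__ == "__main__":
    data = "/path/to/solr/data"  # adjust for your install
    for d in stale_index_dirs(data):
        shutil.rmtree(os.path.join(data, d))
```

Keeping the removal step separate from the listing step makes it easy to dry-run first (print the list instead of deleting).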
Re: Indexed, but cannot search
Hmm, please provide the analyzer of "text" and the output of debugQuery=true. Anyway, if the field type is the "text" fieldType, and the catchall field "text" is the "text" fieldType as well, and you reindexed, it should work as expected. Oh if only it were that easy :-). I have reindexed since making that change, which is how I was able to get the regular search working. I have not, however, been able to get the search across all fields to work. On Tue, Mar 1, 2011 at 3:01 PM, Markus Jelsma markus.jel...@openindex.io wrote: Traditionally, people forget to reindex ;) Hi all, The problem was that my fields were defined as type="string" instead of type="text". Once I corrected that, it seems to be fixed. The only part that still is not working, though, is the search across all fields. For example: http://localhost:8983/solr/select/?q=type%3AMammal now correctly returns the records matching mammal. But if I try to do a global search across all fields: http://localhost:8983/solr/select/?q=Mammal or http://localhost:8983/solr/select/?q=text%3AMammal I get no results returned. Here is how the schema is set up:

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<defaultSearchField>text</defaultSearchField>
<copyField source="*" dest="text"/>

Thanks to everyone for your help so far. I think this is the last hurdle I have to jump over. On Tue, Mar 1, 2011 at 12:34 PM, Upayavira u...@odoko.co.uk wrote: Next question, do you have your type field set to indexed="true" in your schema? Upayavira On Tue, 01 Mar 2011 11:06 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Thank you for your reply but the searching is still not working out.
For example, when I go to: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on I get the following as a response:

<result name="response" numFound="249943" start="0">
  <doc>
    <str name="type">Mammal</str>
    <str name="id">1</str>
    <str name="genus">Canis</str>
  </doc>
</result>

(plus some other docs, but one is enough for this example) But if I go to http://localhost:8983/solr/select/?q=type%3AMammal&version=2.2&start=0&rows=10&indent=on I only get:

<result name="response" numFound="0" start="0"/>

But it seems that should return at least the result I have listed above. What am I doing incorrectly? On Mon, Feb 28, 2011 at 6:57 PM, Upayavira u...@odoko.co.uk wrote: q=dog is equivalent to q=text:dog (where the default search field is defined as "text" at the bottom of schema.xml). If you want to specify a different field, well, you need to tell it :-) Is that it? Upayavira On Mon, 28 Feb 2011 15:38 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I was able to get my installation of Solr indexed using dataimport. However, I cannot seem to get search working. I can verify that the data is there by going to: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on This gives me the response: <result name="response" numFound="234961" start="0"> But when I go to http://localhost:8983/solr/select/?q=dog&version=2.2&start=0&rows=10&indent=on I get the response: <result name="response" numFound="0" start="0"/> I know that "dog" should return some results because it is the first result when I select all the records. So what am I doing incorrectly that would prevent me from seeing results? --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: multi-core solr, specifying the data directory
: <!-- Used to specify an alternate directory to hold all index data
:      other than the default ./data under the Solr home. If replication
:      is in use, this should match the replication configuration. -->
: <dataDir>${solr.data.dir:./solr/data}</dataDir>

That directive says: use the solr.data.dir system property to pick a path; if it is not set, use ./solr/data (relative to the CWD). If you want it to use the default, then you need to eliminate it completely, or you need to change it to the empty string...

<dataDir>${solr.data.dir:}</dataDir>

or...

<dataDir></dataDir>

-Hoss
Re: numberic or string type for non-sortable field?
: Can I know why? I thought solr is tuned for string if no sorting of facet by : range query is needed. "Tuned for string" doesn't really mean anything to me; I'm not sure what that's in reference to. Nothing that I know of is particularly optimized for strings. Almost anything can be indexed/stored/represented as a string (in some form or another), and that tends to work fine in Solr, but some things are optimized for other, more specialized datatypes. The reason I suggested that using ints might (marginally) be better is because of the FieldCache and the fieldValueCache -- the int representation uses less memory than if it was holding strings representing the same ints. Worrying about that is really a premature optimization though -- model your data in the way that makes the most sense -- if your ids are inherently ints, model them as ints until you come up with a reason to model them otherwise, and move on to the next problem. -Hoss
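The memory point above can be loosely illustrated from Python. These are CPython object sizes, not Lucene's FieldCache entries, so only the direction of the comparison carries over:

```python
import sys

# A numeric id held as an int vs. the string spelling the same id.
# The exact byte counts are CPython-specific, but the int form is the
# smaller one here just as the int FieldCache entry is in Lucene.
as_int = 214748364
as_str = "214748364"
print(sys.getsizeof(as_int), sys.getsizeof(as_str))
```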
Re: Distances in spatial search (Solr 4.0)
See http://wiki.apache.org/solr/SpatialSearch and yest use sort=geodist()+asc This Wiki page has everything you should need\. On Tue, Mar 1, 2011 at 3:49 PM, Alexandre Rocco alel...@gmail.com wrote: Hi Bill, I was using a different approach to sort by the distance with the dist() function, since geodist() is not documented on the wiki ( http://wiki.apache.org/solr/FunctionQuery) Tried something like: sort=dist(2, 45.15,-93.85, lat, lng) asc I made some tests with geodist() function as you pointed and got different results. Is it safe to assume that geodist() is the correct way of doing it? Also, can you clear up how can I see the distance using the _Val_ as you told? Thanks! Alexandre On Tue, Mar 1, 2011 at 12:03 AM, Bill Bell billnb...@gmail.com wrote: Use sort with geodist() to sort by distance. Getting the distance returned us documented on the wiki if you are not using score. see reference to _Val_ Bill Bell Sent from mobile On Feb 28, 2011, at 7:54 AM, Alexandre Rocco alel...@gmail.com wrote: Hi guys, We are implementing a separate index on our website, that will be dedicated to spatial search. I've downloaded a build of Solr 4.0 to try the spatial features and got the geodist working really fast. We now have 2 other features that will be needed on this project: 1. Returning the distance from the reference point to the search hit (in kilometers) 2. Sorting by the distance. On item 2, the wiki doc points that a distance function can be used but I was not able to find good info on how to accomplish it. Also, returning the distance (item 1) is noted as currently being in development and there is some workaround to get it. Anyone had experience with the spatial feature and could help with some pointers on how to achieve it? Thanks, Alexandre
Re: Question about Nested Span Near Query
I am not 100% sure, but why did you not use the standard config for text?

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. Add enablePositionIncrements=true
         in both the index and query analyzers to leave a 'gap' for more
         accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

You are using:

<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" luceneMatchVersion="LUCENE_29"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--
    <filter class="solr.StopFilterFactory" luceneMatchVersion="LUCENE_29"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
    -->
  </analyzer>
</fieldtype>

Can you try a more standard approach
? solr.WhitespaceTokenizerFactory + solr.LowerCaseFilterFactory? Thanks. On Mon, Feb 28, 2011 at 2:38 AM, Ahsan |qbal ahsan.iqbal...@gmail.com wrote: Hi Bill, Any update? On Thu, Feb 24, 2011 at 8:58 PM, Ahsan |qbal ahsan.iqbal...@gmail.com wrote: Hi, schema and document are attached. On Thu, Feb 24, 2011 at 8:24 PM, Bill Bell billnb...@gmail.com wrote: Send the schema and document in XML format and I'll look at it. Bill Bell Sent from mobile On Feb 24, 2011, at 7:26 AM, Ahsan |qbal ahsan.iqbal...@gmail.com wrote: Hi, To narrow down the issue, I indexed a single document with one of the sample queries (given below) which was giving the issue: *evaluation of loan and lease portfolios for purposes of assessing the adequacy of* Now when I perform a search query (*TextContents:"evaluation of loan and lease portfolios for purposes of assessing the adequacy of"*) the parsed query is *spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([Contents:evaluation, Contents:of], 0, true), Contents:loan], 0, true), Contents:and], 0, true), Contents:lease], 0, true), Contents:portfolios], 0, true), Contents:for], 0, true), Contents:purposes], 0, true), Contents:of], 0, true), Contents:assessing], 0, true), Contents:the], 0, true), Contents:adequacy], 0, true), Contents:of], 0, true)* and the search is not successful. If I remove '*evaluation*' from the start OR '*assessing the adequacy of*' from the end, it works fine. The issue seems to come up on relatively long phrases, but I have not been able to find a pattern, and it's really mind-boggling because I thought this issue might be due to a large position list, but this is a single document with one phrase. So it's definitely not related to the size of the index. Any idea what's going on? On Thu, Feb 24, 2011 at 10:25 AM, Ahsan |qbal ahsan.iqbal...@gmail.com wrote: Hi, It didn't search (meaning no results are found even when results exist). One observation is that it works well even on long phrases, but when a long phrase contains stop words and the same stop word exists two or more times in the phrase, then Solr can't search with the query parsed in this way. On Wed, Feb 23, 2011 at 11:49 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, What do you mean by "this doesn't work fine"? Does it not work correctly, or is it slow, or...? I was going to suggest you look at the Surround QP, but it looks like you already did that. Wouldn't it be better to get the Surround QP to work? Otis Sematext ::
indexing mysql dateTime/timestamp into solr date field
Hi, I can't seem to be able to index into a Solr date field from a query result using DataImportHandler. Anyone else know how to resolve the problem?

<entity name="title" query="select ID, title_full as TITLE_NAME, YEAR, COUNTRY_OF_ORIGIN, modified as RELEASE_DATE from title limit 10">
  <field column="ID" name="id"/>
  <field column="TITLE_NAME" name="title_name"/>
  <field column="YEAR" name="year"/>
  <field column="COUNTRY_OF_ORIGIN" name="country"/>
  <field column="RELEASE_DATE" name="release_date"/>
</entity>

When I check the Solr document, there is no term populated for the release_date field. All other fields are populated with terms. The field release_date is a Solr date type field. Appreciate your help. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-mysql-dateTime-timestamp-into-solr-date-field-tp2608327p2608327.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: More Date Math: NOW/WEEK
: Digging into the source code of DateMathParser.java, I found the following comment:
:   // NOTE: consciously choosing not to support WEEK at this time,
:   // because of complexity in rounding down to the nearest week
:   // around a month/year boundary.
:   // (Not to mention: it's not clear what people would *expect*)
: I was able to implement a work-around in my ruby client using the following pseudo code:
:   wd=NOW.wday; NOW-#{wd}DAY/DAY

The main issue that comment in DateMathParser.java is referring to is the ambiguity of what should happen when you try to do something like 2009-01-02T00:00:00Z/WEEK. WEEK would be the only unit where rounding changed a unit *larger* than the one you rounded on -- i.e.: rounding on day only affects hours, minutes, seconds, millis; rounding on month only affects days, hours, minutes, seconds, millis; but in an example like the one above, where Jan 2 2009 was a Friday, rounding down a week (using logic similar to what you have) would result in 2008-12-28T00:00:00Z -- changing the month and year. It's not really clear that that is what people would expect -- I'm guessing at least a few people would expect it to stop at the 1st of the month. The ambiguity of what behavior makes the most sense is why I never got around to implementing it -- it's certainly possible, but the various options seemed too confusing to really be very generally useful and easy to understand. As you point out, people who really want special logic like this (and know how they want it to behave) have an easy workaround by evaluating NOW in the client, since every week has exactly seven days. -Hoss
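The client-side workaround above can be checked with a small Python version of it (a sketch of the NOW-{wd}DAY/DAY logic, not of DateMathParser; Ruby's wday counts Sunday as 0, so Python's Monday-based weekday is shifted accordingly):

```python
from datetime import datetime, timedelta

# Round a timestamp down to the start of its week (Sunday == day 0,
# matching Ruby's wday), then truncate to midnight: NOW-{wd}DAY/DAY.
def round_down_to_week(now):
    wday = (now.weekday() + 1) % 7  # Python: Monday == 0; shift so Sunday == 0
    start = now - timedelta(days=wday)
    return start.replace(hour=0, minute=0, second=0, microsecond=0)

# Jan 2 2009 was a Friday; rounding crosses the month/year boundary,
# which is exactly the ambiguity the comment describes.
print(round_down_to_week(datetime(2009, 1, 2)))  # 2008-12-28 00:00:00
```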
Re: indexing mysql dateTime/timestamp into solr date field
: query=select ID, title_full as TITLE_NAME, YEAR,
:       COUNTRY_OF_ORIGIN, modified as RELEASE_DATE from title limit 10

Are you certain that the first 10 results returned (you have "limit 10") all have a value in the modified field? If modified is nullable, you could very easily just happen to be getting 10 docs that don't have values in that field. -Hoss
Re: Searching all terms - SolrJ
: Yes but I want to leave the choice to the user. : : He can either search all the terms or just some. : : Is there any more flexible solution ? Even if I have to code it by hand ? The declaration in the schema dictates the default. You can override the default at query time using the q.op param (ie: q.op=AND, q.op=OR) in the request. In SolrJ you would just call solrQuery.set("q.op", "OR") on your SolrQuery object. -Hoss
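On the wire, letting the user choose boils down to flipping that one parameter. A small sketch (plain Python with a placeholder endpoint path, just to show the request SolrJ's solrQuery.set("q.op", ...) produces):

```python
from urllib.parse import urlencode

# Build a select request where the user chooses between matching
# all terms (q.op=AND) or any term (q.op=OR).
def solr_query(user_text, match_all):
    params = {"q": user_text, "q.op": "AND" if match_all else "OR"}
    return "/solr/select?" + urlencode(params)

print(solr_query("cat roof", match_all=True))   # ...q.op=AND
print(solr_query("cat roof", match_all=False))  # ...q.op=OR
```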
Re: indexing mysql dateTime/timestamp into solr date field
Yes, I am pretty sure every row has a modified field. I did my testing before posting the question. I tried adding DateFormatTransformer; it still doesn't help.

<entity name="title" query="select ID, title_full as TITLE_NAME, YEAR, COUNTRY_OF_ORIGIN, modified as RELEASE_DATE from title limit 10"
        transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
  <field column="ID" name="id"/>
  <field column="TITLE_NAME" name="title_name"/>
  <field column="YEAR" name="year"/>
  <field column="COUNTRY_OF_ORIGIN" name="country"/>
  <field column="RELEASE_DATE" name="release_date" dateTimeFormat="yyyy-MM-dd"/>
</entity>

I assume it is OK to just get the date part of the information out of a datetime field? Any thought on this? -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-mysql-dateTime-timestamp-into-solr-date-field-tp2608327p2608452.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: indexing mysql dateTime/timestamp into solr date field
<field column="date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss"/>

Did you convert the date to the standard GMT format as above in DIH? Also add transformer="DateFormatTransformer,..." http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html On Tue, Mar 1, 2011 at 7:54 PM, cyang2010 ysxsu...@hotmail.com wrote: Yes, I am pretty sure every row has a modified field. I did my testing before posting the question. I tried adding DateFormatTransformer; it still doesn't help. ...
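For what it's worth, the conversion a dateTimeFormat pattern like the one above is meant to perform can be sanity-checked outside Solr. The sample value is made up; a MySQL DATETIME comes out in the first form, and Solr's date field expects the ISO-8601/UTC form (zone handling is assumed to already be GMT):

```python
from datetime import datetime

# Parse a MySQL DATETIME string and render it in the ISO-8601 form
# that Solr's date field expects.
raw = "2011-03-01 19:54:07"  # as it comes out of MySQL
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
solr_date = parsed.strftime("%Y-%m-%dT%H:%M:%SZ")
print(solr_date)  # 2011-03-01T19:54:07Z
```

If the pattern DIH is given doesn't match the raw column value, the field silently comes out empty, which matches the symptom described in this thread.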
Re: [ANNOUNCE] Web Crawler
Dominique,

The obvious number one question is of course why you re-invented this wheel when there are several existing crawlers to choose from. Your website says the reason is that the UIs on existing crawlers (e.g. Nutch, Heritrix, ...) weren't sufficiently user-friendly or lacked the site-specific configuration you wanted. Well, if that is the case, why didn't you add/enhance such capabilities in an existing crawler?

~ David Smiley
Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book

--
View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p2608956.html
Re: indexing mysql dateTime/timestamp into solr date field
Bill,

I did try it the way you suggested above. Unfortunately it does not work either. It is pretty much the same as my last reply, except for dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss".

Thanks,
cyang

--
View this message in context: http://lucene.472066.n3.nabble.com/indexing-mysql-dateTime-timestamp-into-solr-date-field-tp2608327p2609053.html
Re: Searching all terms - SolrJ
Great! Thank you very much Chris, it will come in handy!

Best regards,
Victor

2011/3/1 Chris Hostetter <hossman_luc...@fucit.org>:
: Yes but I want to leave the choice to the user.
: He can either search all the terms or just some.
: Is there any more flexible solution? Even if I have to code it by hand?

The declaration in the schema dictates the default. You can override the default at query time using the q.op param (ie: q.op=AND, q.op=OR) in the request. In SolrJ you would just call solrQuery.set("q.op", "OR") on your SolrQuery object.

-Hoss
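[Editorial note: solrQuery.set("q.op", "...") just adds one more parameter to the request, so the same override works from any HTTP client. A minimal sketch of the raw URL equivalent; the host, port, core path and query text here are made-up placeholders:]

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QopParamSketch {
    public static void main(String[] args) {
        // The raw-HTTP equivalent of solrQuery.set("q.op", "AND"):
        // q.op rides along as an ordinary request parameter.
        String q = URLEncoder.encode("jazz piano trio", StandardCharsets.UTF_8);
        String url = "http://localhost:8983/solr/select?q=" + q + "&q.op=AND";
        System.out.println(url);
        // -> http://localhost:8983/solr/select?q=jazz+piano+trio&q.op=AND
    }
}
```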
how to debug dataimporthandler
I wonder how to run DataImportHandler in debug mode. Currently I can't get data correctly into the index through DataImportHandler, especially a timestamp column into a Solr date field, so I want to debug the process. According to the wiki page:

Commands

The handler exposes all its API as HTTP requests. The following are the possible operations:

* full-import: a full-import operation can be started by hitting the URL
  http://<host>:<port>/solr/dataimport?command=full-import
  ...
  - clean: (default 'true'). Tells whether to clean up the index before the indexing is started.
  - commit: (default 'true'). Tells whether to commit after the operation.
  - optimize: (default 'true'). Tells whether to optimize after the operation.
  - debug: (default 'false'). Runs in debug mode. It is used by the interactive development mode (see here).
  - Please note that in debug mode, documents are never committed automatically. If you want to run debug mode and commit the results too, add 'commit=true' as a request parameter.

Therefore, I ran http://<host>:<port>/solr/dataimport?command=full-import&debug=true

Not only did I not see any log output at DEBUG level, but it also crashed my machine a few times. I was surprised it could even do that... Did someone ever try to debug the process before? What is your experience with it?

--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-debug-dataimporthandler-tp2611506p2611506.html
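[Editorial note: per the wiki text quoted above, debug runs never commit automatically, so commit=true must be added explicitly when the results should actually reach the index. A sketch of the full request URL; localhost:8983 is the stock example host/port and may differ in your installation:]

```java
public class DihDebugRequest {
    public static void main(String[] args) {
        // debug=true alone only previews the import; commit=true is
        // required on top of it to write the documents to the index.
        String base = "http://localhost:8983/solr/dataimport";
        String url = base + "?command=full-import&debug=true&commit=true";
        System.out.println(url);
    }
}
```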
MoreLikeThis not working with Shards (distributed search)
Hi,

I am experimenting with *MoreLikeThis* to see if it also works with *distributed* search, but I have not found a solution yet. Can anyone help me with this? Please provide a detailed description, as I did not get it working by updating MoreLikeThisComponent.java, MoreLikeThisHandler.java and ShardRequest.java as specified in AlternateDistributedMLT.patch.

Thanks in advance,
Isha Garg