Re: Fuzzy Query Param
If this is an edit-distance implementation, what does the result look like for a CJK query? For example, 您好~3. Floyd

2011/6/30 entdeveloper cameron.develo...@gmail.com: I'm using Solr trunk. If it's Levenshtein/edit distance, that's great, that's what I want. It just didn't seem to be officially documented anywhere, so I wanted to find out for sure. Thanks for confirming. -- View this message in context: http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3122418.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: conditionally update document on unique id
On Thu, Jun 30, 2011 at 2:06 AM, Yonik Seeley yo...@lucidimagination.comwrote: On Wed, Jun 29, 2011 at 4:32 PM, eks dev eks...@googlemail.com wrote: req.getSearcher().getFirstMatch(t) != -1; Yep, this is currently the fastest option we have. Just for my understanding, this method won't use any caches but still may be faster across repeated runs for the same token? I'm asking because Eks said that they have 50%-55% duplicate documents. -- Regards, Shalin Shekhar Mangar.
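With 50%-55% duplicates, one client-side way to exploit repeated runs for the same token is to memoize the result of the uniqueness check, so only the first lookup per token hits the index. A stand-alone sketch of that idea (the index lookup is faked with a predicate; this is not Solr's API, and `getFirstMatch` itself is not involved):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

public class DedupCheck {
    private final Map<String, Boolean> seen = new HashMap<>();
    // Stands in for the real check, e.g. searcher.getFirstMatch(t) != -1
    private final Predicate<String> indexLookup;

    DedupCheck(Predicate<String> indexLookup) {
        this.indexLookup = indexLookup;
    }

    // Returns true if the token already exists; repeated calls for the
    // same token hit the local map instead of the index.
    boolean exists(String token) {
        return seen.computeIfAbsent(token, indexLookup::test);
    }
}
```

The map itself acts as the "cache" the poster asks about; whether this helps depends on how many distinct tokens a batch contains.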
Taxonomy faceting
I have a hierarchical taxonomy of documents that I would like users to be able to explore either through search or drill-down faceting. The documents may appear at multiple points in the hierarchy. I've got a solution working as follows: a multivalued field labelled category which for each document defines where in the tree it should appear. For example: doc1 has the category field set to 0/topics, 1/topics/computing, 2/topics/computing/systems. I then facet on the 'category' field, filter the results with fq={!raw f=category}1/topics/computing to get everything below that point in the tree, and use f.category.facet.prefix to restrict the facet values to the current level. The full query is something like:

http://localhost:8080/solr/select/?q=something&facet=true&facet.field=category&fq={!raw f=category}1/topics/computing&f.category.facet.prefix=2/topics/computing

Playing around with the results, it seems to work OK, but despite reading lots about faceting I can't help feeling there might be a better solution. Are there better ways to achieve this? Any comments/suggestions are welcome. (Any suggestions as to what interface I can put on top of this are also gratefully received!) Thanks, Russell
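The depth-prefixed paths described above can be generated mechanically at index time. A stand-alone sketch of that scheme (class and method names are mine, not from the thread):

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryPaths {
    // Builds depth-prefixed ancestor paths for a taxonomy position,
    // e.g. "topics/computing/systems" ->
    // ["0/topics", "1/topics/computing", "2/topics/computing/systems"],
    // which is the multivalued 'category' field layout from the thread.
    static List<String> pathsFor(String categoryPath) {
        List<String> out = new ArrayList<>();
        String[] parts = categoryPath.split("/");
        StringBuilder prefix = new StringBuilder();
        for (int depth = 0; depth < parts.length; depth++) {
            if (depth > 0) prefix.append('/');
            prefix.append(parts[depth]);
            out.add(depth + "/" + prefix);
        }
        return out;
    }
}
```

Each document then gets one such list per position it occupies in the tree; the numeric prefix is what makes facet.prefix select exactly one level.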
MergerFacor effect on indexes
My solrconfig.xml configuration is:

<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>5</mergeFactor>
  <maxMergeDocs>10</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>

and the index size is 12 MB. But when I change my mergeFactor I don't see any effect on the index: the number of segments stays exactly the same. I don't understand which setting controls the number of segments; I assumed it was mergeFactor. My next question is: which setting defines the number of docs per segment, and how large a segment grows before the next segment is created? Please clarify these points. - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/MergerFacor-effect-on-indexes-tp3125146p3125146.html Sent from the Solr - User mailing list archive at Nabble.com.
How to use solr clustering to show in search results
I wanted to use clustering in my search results. I configured Solr for clustering and got the following JSON for the clusters, but I am not sure how to use it when showing results: for each doc I have a number of fields, and up till now I show name, description and id, but in the clusters I only have labels and doc ids. How do I use my docs in the clusters? I am really confused what to do. Please reply.

clusters: [
  { labels: [ "Complement any Business Casual or Semi-formal Attire" ], docs: [ 7799, 7801 ] },
  { labels: [ "Design" ], docs: [ 8252, 7885 ] },
  { labels: [ "Elegant Ring has an Akoya Cultured Pearl" ], docs: [ 8142, 8139 ] },
  { labels: [ "Feel Amazing in these Scintillating Earrings Perfect" ], docs: [ 12250, 12254 ] },
  { labels: [ "Formal Evening Attire" ], docs: [ 8151, 8004 ] },
  { labels: [ "Pave Set" ], docs: [ 7788, 8169 ] },
  { labels: [ "Subtle Look or Layer it or Attach" ], docs: [ 8014, 8012 ] },
  { labels: [ "Three-stone Setting is Elegant and Fun" ], docs: [ 8335, 8337 ] },
  { labels: [ "Other Topics" ], docs: [ 8038, 7850, 7795, 7989, 7797 ] }
]

- Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-use-solr-clustering-to-show-in-search-results-tp3125149p3125149.html Sent from the Solr - User mailing list archive at Nabble.com.
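One straightforward way to render this is to join each cluster's doc ids back to the field values already present in the main search response. A minimal stand-alone sketch (the record types here are illustrative stand-ins, not SolrJ classes; field names mirror the thread):

```java
import java.util.List;
import java.util.Map;

public class ClusterJoin {
    record Doc(String id, String name, String description) {}
    record Cluster(List<String> labels, List<String> docIds) {}

    // For each cluster, emit its label followed by the name of each member
    // doc, looked up by id in the main search results.
    static String render(List<Cluster> clusters, Map<String, Doc> docsById) {
        StringBuilder sb = new StringBuilder();
        for (Cluster c : clusters) {
            sb.append(String.join(", ", c.labels())).append('\n');
            for (String id : c.docIds()) {
                Doc d = docsById.get(id);
                if (d != null) sb.append("  ").append(d.name()).append('\n');
            }
        }
        return sb.toString();
    }
}
```

In a real app, docsById would be built from the docs in the Solr response (keyed by the uniqueKey field), so each cluster entry can show the same name/description/id you already display.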
Adding german phonetic to solr
Hi all, does Solr support German phonetic matching? Searching Google for how to add German phonetics to Solr does not deliver good results, just lots of JIRA stuff. I searched for Cologne phonetic too. The wikis http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28phonetic%29#solr.PhoneticFilterFactory and http://wiki.apache.org/solr/LanguageAnalysis#German haven't answered my question either. Please, can someone tell me how to do it, or where to look for the appropriate information? Nice regards, Jürgen
Re: Adding german phonetic to solr
Jürgen, I haven't had time to deploy it, but I heard about Kölner Phonetik, which was to be contributed as part of apache-commons-codec. It is probably still just a patch in a JIRA issue: https://issues.apache.org/jira/browse/CODEC-106. The contribution was posted to commons-dev on September 15th, 2010. Making this reachable from Solr would be interesting, but it's a bit of work. We have used the Double-Metaphone indexer with Lucene with reasonable success in ActiveMath, but it was not as fine-grained as the Kölner analyzer, and fine-grainedness is really a desirable feature of a phonetic environment. You might also want to take care of all the proper nouns, for which traditional phonetics is doomed to fail, at least if your texts contain some international names! paul

On 30 June 2011 at 11:58, Jürgen Tiedemann wrote: Hi all, does Solr support German phonetic matching? Searching Google for how to add German phonetics to Solr does not deliver good results, just lots of JIRA stuff. [...]
How to optimize solr indexes
When I open the solr/admin page I get the information below. It shows optimized: true, but I have not set optimize=true in any configuration file, so how are the indexes being optimized? And how can I set it to false?

Schema Information
Unique Key: UID_PK
Default Search Field: text
numDocs: 2881
maxDoc: 2881
numTerms: 41960
version: 1309429290159
optimized: true
current: true
hasDeletions: false
directory: org.apache.lucene.store.SimpleFSDirectory:org.apache.lucene.store.SimpleFSDirectory@C:\apache-solr-1.4.0\example\example-DIH\solr\db\data\index
lastModified: 2011-06-30T10:25:04.89Z

- Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-optimize-solr-indexes-tp3125293p3125293.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Taxonomy faceting
That's a good way. How does it perform? Another way would be to store the parent topics in a field; whenever a parent node is drilled into, simply search for all documents with that parent. Perhaps not as elegant as your approach, though. I'd be interested in a performance comparison between the two approaches.

I have a hierarchical taxonomy of documents that I would like users to be able to explore either through search or drill-down faceting. The documents may appear at multiple points in the hierarchy. [...]
Re: How to optimize solr indexes
when i run solr/admin page i got this information, it shows optimize=true, but i have not set optimize=true in configuration file than how it is optimizing the indexes. and how can i set it to false then. [...]

It seems that you are using DIH. By default, both delta-import and full-import issue an optimize at the end.
Re: Fuzzy Query Param
Good question... I think in Lucene 4.0 the edit distance is (will be) in Unicode code points, but in past releases it's UTF-16 code units. Mike McCandless http://blog.mikemccandless.com

2011/6/30 Floyd Wu floyd...@gmail.com: If this is an edit-distance implementation, what does the result look like for a CJK query? For example, 您好~3 [...]
Re: How to optimize solr indexes
And if I want to set optimize=false, what do I need to do? - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-optimize-solr-indexes-tp3125293p3125474.html Sent from the Solr - User mailing list archive at Nabble.com.
AW: Adding german phonetic to solr
Hi Paul, thanks for the quick reply. I replaced commons-codec-1.4.jar with commons-codec-1.5.jar to get ColognePhonetic. In schema.xml I added

<filter class="solr.PhoneticFilterFactory" encoder="ColognePhonetic" inject="true"/>

but then I get: org.apache.solr.common.SolrException: Unknown encoder: ColognePhonetic [[CAVERPHONE, SOUNDEX, METAPHONE, DOUBLEMETAPHONE, REFINEDSOUNDEX]]. How do I make PhoneticFilterFactory aware of ColognePhonetic? Or is my approach completely wrong? Jürgen

From: Paul Libbrecht p...@hoplahup.net
To: solr-user@lucene.apache.org
Sent: Thursday, 30 June 2011, 12:09:18
Subject: Re: Adding german phonetic to solr

Jürgen, I haven't had time to deploy it, but I heard about Kölner Phonetik, which was to be contributed as part of apache-commons-codec. It is probably still just a patch in a JIRA issue: https://issues.apache.org/jira/browse/CODEC-106 [...]
Re: Looking for Custom Highlighting guidance
Thanks for the suggestion, Mike, I will give that a shot. Having no familiarity with FastVectorHighlighter, is there somewhere specific I should be looking? On Wed, Jun 29, 2011 at 3:20 PM, Mike Sokolov soko...@ifactory.com wrote: Does the phonetic analysis preserve the offsets of the original text field? If so, you should probably be able to hack up FastVectorHighlighter to do what you want. -Mike On 06/29/2011 02:22 PM, Jamie Johnson wrote: I have a schema with a text field and a text_phonetic field and would like to perform highlighting on them in such a way that the tokens that match are combined. What would be a reasonable way to accomplish this?
Re: Looking for Custom Highlighting guidance
It's going to be a bit complicated, but I would start by looking at providing a facility for merging an array of FieldTermStacks. The FieldTermStack() constructor takes a fieldName and builds up a list of TermInfos (terms with positions and offsets). I *think* that if you make two of these, merge them, and hand the result to the FieldPhraseList constructor (this is done in the main FVH class), you should get what you want. This is a bit speculative; I haven't tried it. -Mike

On 06/30/2011 08:26 AM, Jamie Johnson wrote: Thanks for the suggestion, Mike, I will give that a shot. Having no familiarity with FastVectorHighlighter, is there somewhere specific I should be looking? [...]
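The merge step Mike sketches could look roughly like the following stand-alone code. TermInfo here is a simplified stand-in for the FVH's real TermInfo (which also carries offsets); the point is just interleaving two position-sorted stacks, e.g. one from a text field and one from a text_phonetic field, into the single position-sorted list that FieldPhraseList expects. All names besides the general idea are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class TermStackMerge {
    // Simplified stand-in for FastVectorHighlighter's TermInfo.
    record TermInfo(String term, int position) {}

    // Merge two lists already sorted by position into one sorted list.
    static List<TermInfo> merge(List<TermInfo> a, List<TermInfo> b) {
        List<TermInfo> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i).position() <= b.get(j).position()) out.add(a.get(i++));
            else out.add(b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }
}
```

As the thread says, this is speculative: whether FieldPhraseList copes well with two co-located tokens at the same position (the original term and its phonetic twin) would need testing.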
Re: AW: Adding german phonetic to solr
Jürgen, clearly ColognePhonetic is not yet supported; please read: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/analysis/PhoneticFilterFactory.java. One would need to add the line registering Cologne phonetic and recompile. It'd make sense to open a JIRA issue for this. paul

On 30 June 2011 at 14:24, Jürgen Tiedemann wrote: Hi Paul, thanks for the quick reply. I replaced commons-codec-1.4.jar with commons-codec-1.5.jar to get ColognePhonetic. In schema.xml I added filter class=solr.PhoneticFilterFactory encoder=ColognePhonetic inject=true/ but then I get org.apache.solr.common.SolrException: Unknown encoder: ColognePhonetic [...]
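For context on why the "Unknown encoder" error occurs: the filter factory resolves encoder names through a hard-coded name-to-class registry, so any encoder not listed there fails even if its jar is on the classpath. The following is a stand-alone mimic of that pattern, not Solr's actual code (class and method names are mine); the commented line illustrates the kind of one-line registration the message above refers to:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class EncoderRegistry {
    // Mimics a hard-coded name -> encoder-class registry; unknown names
    // are rejected, just like Solr's "Unknown encoder" SolrException.
    private static final Map<String, String> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("SOUNDEX", "org.apache.commons.codec.language.Soundex");
        REGISTRY.put("METAPHONE", "org.apache.commons.codec.language.Metaphone");
        // The missing registration the thread talks about; adding a line
        // like this (in the real factory) and recompiling would make
        // the encoder resolvable:
        REGISTRY.put("COLOGNEPHONETIC", "org.apache.commons.codec.language.ColognePhonetic");
    }

    static String resolve(String name) {
        String cls = REGISTRY.get(name.toUpperCase(Locale.ROOT));
        if (cls == null) {
            throw new IllegalArgumentException("Unknown encoder: " + name + " " + REGISTRY.keySet());
        }
        return cls;
    }
}
```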
Re: How to optimize solr indexes
--- On Thu, 6/30/11, Romi romijain3...@gmail.com wrote:

From: Romi romijain3...@gmail.com
Subject: Re: How to optimize solr indexes
To: solr-user@lucene.apache.org
Date: Thursday, June 30, 2011, 3:01 PM

and if i want to set it as optimize=false then what i need to do ??

When calling import, use dataimport?command=delta-import&optimize=false. See the other commands available, like clean, commit, entity, etc.: http://wiki.apache.org/solr/DataImportHandler#Commands
Re: Multicore clustering setup problem
Sure, thanks for having a look! By the way, if I attempt to hit a Solr URL, I get this error, followed by the stacktrace. If I set abortOnConfigurationError to false (I've found you must put the setting in both solr.xml and solrconfig.xml for both cores, otherwise you keep getting the error), then the main URL to Solr (http://localhost/solr) lists just the first core.

HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml

org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent'
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
  at ...

Tomcat log:

INFO: [core1] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[{q=solr rocks,start=0,rows=10}, {q=static firstSearcher warming query from solrconfig.xml}]}
Jun 30, 2011 8:51:23 AM org.apache.solr.request.XSLTResponseWriter init
INFO: xsltCacheLifetimeSeconds=5
Jun 30, 2011 8:51:23 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent'
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
  at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
  at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
  at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
  at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
  at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
  at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833)
  at org.apache.solr.core.SolrCore.init(SolrCore.java:551)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
  at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
  at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
  at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
  at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
  at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
  at org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
  at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
  at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
  at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
  at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:630)
  at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:556)
  at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:491)
  at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
  at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
  at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
  at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
  at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
  at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
  at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
  at org.apache.catalina.core.StandardService.start(StandardService.java:516)
  at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
  at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
  at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.clustering.ClusteringComponent
  at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
  at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:592)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
  at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:247)
  at
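The ClassNotFoundException for ClusteringComponent usually means the clustering contrib jars are not on Solr's classpath for that core. One common fix is adding <lib> directives to each core's solrconfig.xml; the paths below are illustrative (they match the stock example layout) and must be adjusted to wherever the jars actually live in your deployment:

```xml
<!-- Paths are relative to the core's instanceDir; point them at the
     clustering contrib, its dependencies, and the clustering jar. -->
<lib dir="../../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../../dist/" regex="apache-solr-clustering-.*\.jar" />
```

With the example configs, the clustering component is also gated behind a system property, so Solr may additionally need to be started with -Dsolr.clustering.enabled=true.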
Re: Taxonomy faceting
On Thu, 2011-06-30 at 11:38 +0200, Russell B wrote: a multivalued field labelled category which for each document defines where in the tree it should appear. [...] I then facet on the 'category' field, filter the results with fq={!raw f=category}1/topics/computing to get everything below that point in the tree, and use f.category.facet.prefix to restrict the facet fields to the current level.

Lucid Imagination did a webcast on this, as far as I remember?

Playing around with the results, it seems to work ok but despite reading lots about faceting I can't help feel there might be a better solution.

The '1/topics/computing' solution works at a single level, so if you are interested in a multi-level result like

- topic
  - computing
    - hardware
    - software
  - biology
    - plants
    - animals

you have to do more requests.

Are there better ways to achieve this?

Taxonomy faceting is a bit of a mess right now, but it is also an area where a lot is happening. For Solr, there is:
https://issues.apache.org/jira/browse/SOLR-64 (single path/document hierarchical faceting)
https://issues.apache.org/jira/browse/SOLR-792 (pivot faceting, now part of trunk AFAIR)
https://issues.apache.org/jira/browse/SOLR-2412 (multi path/document hierarchical faceting, very experimental)

Just yesterday, another multi path/document hierarchical faceting solution was added to the Lucene 3.x branch and Lucene trunk. It has been used by IBM for some time and appears to be mature and stable: https://issues.apache.org/jira/browse/LUCENE-3079. However, this solution requires a sidecar index for the taxonomy, and I am a bit worried about how it fits into the Solr indexing workflow.
Re: Text field case sensitivity problem
I'm not familiar with the CharFilters; I'll look into those now. Is solr.LowerCaseFilterFactory not handling wildcards the expected result, or is this a bug?

On Wed, Jun 15, 2011 at 4:34 PM, Mike Sokolov soko...@ifactory.com wrote: I wonder whether CharFilters are applied to wildcard terms? I suspect they might be. If that's the case, you could use the MappingCharFilter to perform lowercasing (and strip diacritics too, if you want that). -Mike

On 06/15/2011 10:12 AM, Jamie Johnson wrote: So simply lowercasing works but can get complex. The query that I'm executing may have things like ranges, which require some words to be upper case (i.e. TO). I think this would be much better solved on Solr's end; is there a JIRA about this?

On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov soko...@ifactory.com wrote: oops, please s/Highlight/Wildcard/

On 06/14/2011 05:31 PM, Mike Sokolov wrote: Wildcard queries aren't analyzed, I think? I'm not completely sure what the best workaround is here: perhaps simply lowercasing the query terms yourself in the application. Also, I hope someone more knowledgeable will say that the new HighlightQuery in trunk doesn't have this restriction, but I'm not sure about that. -Mike

On 06/14/2011 05:13 PM, Jamie Johnson wrote: Also of interest to me is that this returns results: http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine

On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnson jej2...@gmail.com wrote: I am using the following for my text field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. Add enablePositionIncrements=true in both the index and query analyzers to leave a 'gap' for more accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I have a field defined as <field name="Person_Name" type="text" stored="true" indexed="true"/>. When I go to the following URL I get results: http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris* but if I do http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris* I get nothing. I thought LowerCaseFilterFactory would have handled lowercasing both the query and what is being indexed; am I missing something?
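The workaround suggested above (lowercase the query terms yourself in the application) runs into the complication also noted above: query-parser keywords such as TO in range queries must stay uppercase. A naive whitespace-token sketch of that workaround (it does not handle quoted phrases or escaped spaces; the class name is mine):

```java
import java.util.Locale;
import java.util.Set;

public class WildcardLowercase {
    // Query-parser keywords that must keep their case; everything else is
    // lowercased so wildcard terms match the lowercased index terms.
    private static final Set<String> RESERVED = Set.of("AND", "OR", "NOT", "TO");

    static String lowercaseTerms(String query) {
        StringBuilder sb = new StringBuilder();
        for (String tok : query.split(" ")) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(RESERVED.contains(tok) ? tok : tok.toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }
}
```

For example, "Person_Name:Kris* AND date:[A TO B]" becomes "person_name:kris* AND date:[a TO b]" while the operators survive.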
Re: Text field case sensitivity problem
I think my answer is here... "On wildcard and fuzzy searches, no text analysis is performed on the search word." Taken from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers

On Thu, Jun 30, 2011 at 10:23 AM, Jamie Johnson jej2...@gmail.com wrote: I'm not familiar with the CharFilters, I'll look into those now. Is the solr.LowerCaseFilterFactory not handling wildcards the expected result or is this a bug? [...]
Returning total matched document count with SolrJ
Hi, I am using Solr 3.1 with the SolrJ client. Does anyone know how I can get the *TOTAL* number of matched documents from the QueryResponse? I am interested in the total number of documents matched, not just the rows returned with the limit applied. Any help will be appreciated. Thanks.
RE: Returning total matched document count with SolrJ
SolrDocumentList docs = queryResponse.getResults(); long totalMatches = docs.getNumFound(); -Michael
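To make Michael's point concrete, here is a sketch (not from the thread) of the distinction between the total hit count and the returned page. The JSON below is a hand-made stand-in for Solr's standard JSON response shape; all values are hypothetical.

```python
# Sketch: the total hit count ("numFound") is independent of the page size
# ("rows"). The sample below is a hand-made stand-in for Solr's JSON
# response writer output; values are hypothetical.
import json

sample = json.loads("""
{
  "response": {
    "numFound": 4123,
    "start": 0,
    "docs": [ {"id": "1"}, {"id": "2"}, {"id": "3"} ]
  }
}
""")

total_matches = sample["response"]["numFound"]   # all matches in the index
returned = len(sample["response"]["docs"])       # only the current page (rows)

print(total_matches, returned)  # -> 4123 3
```

In SolrJ, `getNumFound()` on the `SolrDocumentList` reads the same `numFound` value, which is why it reports the total rather than the page size.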
Problems with SolrCloud
Dear ladies and gentlemen, can I ask you to help me with SolrCloud? 1) I try to set up a SolrCloud on 2 computers with 3 ZooKeepers, but it fails :( I need to set the ZooKeeper port to 8001, so I change clientPort=8001 in solr/zoo.cfg. When I try the command from example C to run shard1, it works: java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar But if I change the ports and try to run shard1: java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -DzkHost=localhost:8001,localhost:8004 -jar start.jar it fails with the following message: SEVERE: java.lang.IllegalArgumentException: solr/zoo_data/myid file is missing 2) To solve it I tried to set -Dsolr.solr.home=/data/a.sapegin/SolrCloud/shard1 (without any slashes at the end). But then I receive another exception: Caused by: org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing /data/a.sapegin/SolrCloud/shard1//zoo.cfg I think this // is a bug. Could you please help? Thank you in advance. Kind regards, -- Andrey Sapegin, Software Developer, Unister GmbH Dittrichring 18-20 | 04109 Leipzig +49 (0)341 492885069, +4915778339304, andrey.sape...@unister-gmbh.de www.unister.de
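For what it's worth, the "myid file is missing" error usually means exactly what it says: every member of a ZooKeeper ensemble needs a `myid` file in its data directory whose content is that server's numeric id (matching the `server.N` entries in zoo.cfg). A minimal sketch of creating them, with purely hypothetical paths rather than the poster's real layout:

```python
# Each ZooKeeper ensemble member needs a "myid" file in its dataDir
# containing that server's numeric id (matching server.N in zoo.cfg).
# The paths below are hypothetical examples, not the poster's real layout.
import os

data_dirs = {  # server id -> dataDir
    1: "/tmp/zk-demo/server1/data",
    2: "/tmp/zk-demo/server2/data",
    3: "/tmp/zk-demo/server3/data",
}

for server_id, data_dir in data_dirs.items():
    os.makedirs(data_dir, exist_ok=True)
    with open(os.path.join(data_dir, "myid"), "w") as f:
        f.write(str(server_id))

print(open("/tmp/zk-demo/server2/data/myid").read())  # -> 2
```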
Re: Text field case sensitivity problem
Yes, after posting that response, I read some more and came to the same conclusion... there seems to be some interest on the dev list in building a capability to specify an analysis chain for use with wildcard and related queries, but it doesn't exist now. -Mike On 06/30/2011 10:34 AM, Jamie Johnson wrote: I think my answer is here... On wildcard and fuzzy searches, no text analysis is performed on the search word. taken from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers On Thu, Jun 30, 2011 at 10:23 AM, Jamie Johnson jej2...@gmail.com wrote: I'm not familiar with the CharFilters, I'll look into those now. Is the solr.LowerCaseFilterFactory not handling wildcards the expected result or is this a bug? On Wed, Jun 15, 2011 at 4:34 PM, Mike Sokolov soko...@ifactory.com wrote: I wonder whether CharFilters are applied to wildcard terms? I suspect they might be. If that's the case, you could use the MappingCharFilter to perform lowercasing (and strip diacritics too if you want that) -Mike On 06/15/2011 10:12 AM, Jamie Johnson wrote: So simply lower casing the terms works but can get complex. The query that I'm executing may have things like ranges which require some words to be upper case (i.e. TO). I think this would be much better solved on Solr's end, is there a JIRA about this? On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov soko...@ifactory.com wrote: oops, please s/Highlight/Wildcard/ On 06/14/2011 05:31 PM, Mike Sokolov wrote: Wildcard queries aren't analyzed, I think? I'm not completely sure what the best workaround is here: perhaps simply lowercasing the query terms yourself in the application. Also - I hope someone more knowledgeable will say that the new WildcardQuery in trunk doesn't have this restriction, but I'm not sure about that.
-Mike
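The application-side workaround Mike suggests (lowercase the query terms yourself) has the wrinkle Jamie raises: operators like TO in range queries must stay uppercase. A naive sketch of that idea, purely illustrative and not Solr code, ignoring quoted phrases and lowercasing only the term part so field names are preserved:

```python
# Client-side workaround sketch: lowercase user query terms before sending,
# but leave Lucene query operators (AND, OR, NOT, TO) untouched so range
# queries still parse, and keep field names as-is. This is a simplification
# that ignores quoted phrases and escaping.
OPERATORS = {"AND", "OR", "NOT", "TO"}

def lowercase_terms(query):
    out = []
    for tok in query.split():
        if tok in OPERATORS:
            out.append(tok)                       # operator: keep case
        elif ":" in tok:
            field, _, term = tok.partition(":")
            out.append(field + ":" + term.lower())  # lowercase term only
        else:
            out.append(tok.lower())
    return " ".join(out)

print(lowercase_terms("Person_Name:Kris* AND date:[NOW TO *]"))
# -> Person_Name:kris* AND date:[now TO *]
```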
Re: MergerFacor effect on indexes
Hi Romi, after doing the changes, to see the impact you'll have to index some documents; Solr won't change your index unless you add more documents and commit them. It looks like your maxMergeDocs parameter is too small, I would use a greater value here. You can see a good explanation of how the merge policy works in Solr here: http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/ The default merge policy has changed in 3_x and trunk, you can probably also take a look at http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Regards, Tomás On Thu, Jun 30, 2011 at 6:47 AM, Romi romijain3...@gmail.com wrote: my solrconfig.xml configuration is as:

<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>5</mergeFactor>
  <maxMergeDocs>10</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>

and the index size is 12mb. But when I change my mergeFactor I am not finding any effect on my indexes, i.e. the number of segments is exactly the same. I am not getting which configuration will affect the number of segments, as I suppose it is mergeFactor. And my next problem is: which configuration defines the number of docs per segment, and what will the size of this segment be so that the next segments will be created? Please make me clear about these points. - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/MergerFacor-effect-on-indexes-tp3125146p3125146.html Sent from the Solr - User mailing list archive at Nabble.com.
token exceeding provided text size error since Solr 3.2
A bug was introduced between Solr 3.1 and 3.2. With Solr 3.2 we are now getting the following error when querying several pdf and word documents:

SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token 17 exceeds length of provided text sized 168
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:474)
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:378)
  at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token 17 exceeds length of provided text sized 168
  at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:467)
  ... 24 more
Re: Text field case sensitivity problem
Jamie - there is a JIRA about this, at least one: https://issues.apache.org/jira/browse/SOLR-218 Erik On Jun 15, 2011, at 10:12, Jamie Johnson wrote: So simply lower casing the terms works but can get complex. The query that I'm executing may have things like ranges which require some words to be upper case (i.e. TO). I think this would be much better solved on Solr's end, is there a JIRA about this? On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov soko...@ifactory.com wrote: oops, please s/Highlight/Wildcard/ On 06/14/2011 05:31 PM, Mike Sokolov wrote: Wildcard queries aren't analyzed, I think? I'm not completely sure what the best workaround is here: perhaps simply lowercasing the query terms yourself in the application. Also - I hope someone more knowledgeable will say that the new WildcardQuery in trunk doesn't have this restriction, but I'm not sure about that. -Mike
Re: Returning total matched document count with SolrJ
Thanks Michael. Quite helpful. On Thu, Jun 30, 2011 at 4:06 PM, Michael Ryan mr...@moreover.com wrote: SolrDocumentList docs = queryResponse.getResults(); long totalMatches = docs.getNumFound(); -Michael
Re: Strip Punctuation From Field
Not that I'm aware of. This is probably something you want to do at the application layer. If you want to do it in Solr, a good place would be an UpdateRequestProcessor, but I guess you'll have to implement your own. On Wed, Jun 29, 2011 at 4:12 PM, Curtis Wilde galv...@gmail.com wrote: From all I've read, using something like PatternReplaceFilterFactory allows you to replace / remove text in an index, but is there anything similar that allows manipulation of the text in the associated field? For example, if I pulled a status from Twitter like, Hi, this is a #hashtag. I would like to remove the # from that string and use it for both the index, and also the field value that is returned from a query, i.e., Hi, this is a hashtag.
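Since the reply recommends handling this at the application layer, here is a minimal sketch (not from the thread) of stripping punctuation from a field value before sending the document to Solr, so both the indexed text and the stored/returned value are clean:

```python
# Application-layer sketch: strip punctuation from the value before
# indexing, so the stored field returned from a query is also clean.
import re

def strip_punctuation(text):
    # Remove characters like '#', ',' and '.' but keep letters, digits,
    # underscores, and whitespace.
    return re.sub(r"[^\w\s]", "", text)

print(strip_punctuation("Hi, this is a #hashtag."))
# -> Hi this is a hashtag
```

The same transform could live in a custom UpdateRequestProcessor, as the reply suggests, but running it in the client keeps the Solr setup untouched.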
Re: Text field case sensitivity problem
Yes, and this too: https://issues.apache.org/jira/browse/SOLR-219 On 06/30/2011 12:46 PM, Erik Hatcher wrote: Jamie - there is a JIRA about this, at least one: https://issues.apache.org/jira/browse/SOLR-218 Erik On Jun 15, 2011, at 10:12, Jamie Johnson wrote: So simply lower casing the terms works but can get complex. The query that I'm executing may have things like ranges which require some words to be upper case (i.e. TO). I think this would be much better solved on Solr's end, is there a JIRA about this? On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov soko...@ifactory.com wrote: oops, please s/Highlight/Wildcard/ On 06/14/2011 05:31 PM, Mike Sokolov wrote: Wildcard queries aren't analyzed, I think? I'm not completely sure what the best workaround is here: perhaps simply lowercasing the query terms yourself in the application. Also - I hope someone more knowledgeable will say that the new WildcardQuery in trunk doesn't have this restriction, but I'm not sure about that. -Mike
Wildcard search not working if full word is queried
Hi everyone, I'm having some trouble figuring out why a query with an exact word followed by the * wildcard, e.g. teste*, returns no results while a query for test* returns results that have the word teste in them. I've created a couple of pastes: Exact word with wildcard: http://pastebin.com/n9SMNsH0 Similar word: http://pastebin.com/jQ56Ww6b Parameters other than title, description and content have no effect other than filtering out unwanted results. In two of the four results, the title has the complete word teste. On the other two, the word appears in the other fields. Does anyone have any insights about what I'm doing wrong? Thanks in advance. Regards, Celso
Re: Multicore clustering setup problem
It looks like the whole clustering component JAR is not in the classpath. I remember that I once dealt with a similar issue in Solr 1.4 and the cause was the relative path of the lib tag being resolved against the core's instanceDir, which made the path incorrect when directly copying and pasting from the single core configuration. Try correcting the relative lib paths or replacing them with absolute ones, it should solve the problem. Cheers, Staszek
Re: Wildcard search not working if full word is queried
I would run that word through the analyzer; I suspect that the word 'teste' is being stemmed to 'test' in the index, at least that is the first place I would check. François On Jun 30, 2011, at 2:21 PM, Celso Pinto wrote: Hi everyone, I'm having some trouble figuring out why a query with an exact word followed by the * wildcard, e.g. teste*, returns no results while a query for test* returns results that have the word teste in them. I've created a couple of pastes: Exact word with wildcard: http://pastebin.com/n9SMNsH0 Similar word: http://pastebin.com/jQ56Ww6b Parameters other than title, description and content have no effect other than filtering out unwanted results. In two of the four results, the title has the complete word teste. On the other two, the word appears in the other fields. Does anyone have any insights about what I'm doing wrong? Thanks in advance. Regards, Celso
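A toy illustration (not Solr code) of the mismatch François describes: wildcard/prefix queries match raw indexed terms without analysis, so if "teste" was stemmed to "test" at index time, the prefix "teste" matches nothing. The one-rule "stemmer" here is a stand-in for the Porter stemmer:

```python
# Toy illustration (not Solr code): wildcard queries match raw indexed
# terms, so a term stemmed away at index time is unmatchable by its own
# prefix. fake_stem is a stand-in for the real Porter stemmer.
def fake_stem(term):
    return term[:-1] if term.endswith("e") else term

# Index-time analysis stems "teste" -> "test".
indexed_terms = {fake_stem(t) for t in ["teste", "other", "words"]}

def wildcard_hits(prefix):
    # Wildcard/prefix queries are NOT analyzed, so no stemming here.
    return any(t.startswith(prefix) for t in indexed_terms)

print(wildcard_hits("test"))   # -> True  ("test" is in the index)
print(wildcard_hits("teste"))  # -> False ("teste" was stemmed away)
```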
Core Administration
Hi, I am researching core administration using Solr. My requirement is to be able to provision/create/delete indexes dynamically. I have tried it and it works. Apparently the core admin handler will create a new core by specifying the instance directory (required), along with the data directory, and so on. The issue I'm having is that a separate app that lives on a different machine needs to create these new cores on demand, along with creating new schema.xml files and data directories. The required instance directory, data directory and others need to be separate for each core. My first approach is to write a tool that would take additional params and code-gen the schema config files and so on, based on different types of documents, i.e.: Homes, People, etc. But I need to know if Solr already handles that case. I wouldn't want to have to write the tool if Solr already supports creating cores with new configs on the fly. Thanks, Z
Solr importing database field issues: how do I use a postgres pgpool connection?
I am using a postgres database and pgpool. The postgres database port 5432 is working fine, but the pgpool port is not working. My import xml file (myproduct.xml):

Working:
<dataSource name="jdbc" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/x" user="x" password="x" readOnly="true" autoCommit="false" />

Not working:
<dataSource name="jdbc" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:/x" user="x" password="x" readOnly="true" autoCommit="false" />

Is it a pgpool problem or a Solr problem? Please, can anyone let me know the issue and how I can solve this pgpool problem? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Importing-database-field-issues-how-to-I-use-postgres-pgpool-connection-tp3126212p3126212.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 6/29/2011 10:16 PM, Shawn Heisey wrote: I was thinking perhaps I might actually decrease the termIndexInterval value below the default of 128. I know from reading the Hathi Trust blog that memory usage for the tii file is much more than the size of the file would indicate, but if I increase it from 13MB to 26MB, it probably would still be OK. Decreasing the termIndexInterval to 64 almost doubled the tii file size, as expected. It made the filterCache warming much faster, but made the queryResultCache warming very very slow. Regular queries also seem like they're slower. I am trying again with 256. I may go back to the default before I'm done. I'm guessing that a lot of trial and error was put into choosing the default value. It's been fun having a newer index available on my backup servers. I've been able to do a lot of trials, learned a lot of things that don't work and a few that do. I might do some experiments with trunk once I've moved off 1.4.1. Thanks, Shawn
Re: Core Administration
I have an idea. I believe I can discover the Properties of an object (C# reflection) and then code gen schema.xml file based on the field type and other meta data of that type (possibly from database). After that, I should be able to ftp the files over to the solr machine. Then I can invoke core admin to create the new index on the fly. My original question would be, is there a tool that already does what I'm describing? Z On Thu, Jun 30, 2011 at 2:32 PM, zarni aung zau...@gmail.com wrote: Hi, I am researching about core administration using Solr. My requirement is to be able to provision/create/delete indexes dynamically. I have tried it and it works. Apparently core admin handler will create a new core by specifying the instance Directory (required), along with data directory, and so on. The issue I'm having is that a separate app that lives on a different machine need to create these new cores on demand along with creating new schema.xml and data directories. The required instance directory, data directory and others need to be separate from each core. My first approach is to write a tool that would take additional params that can code gen the schema config files and so on based on different type of documents. ie: Homes, People, etc... But I need to know if Solr already handles that case. I wouldn't want to have to write the tool if Solr already supports creating cores with new configs on the fly. Thanks, Z
Re: Multicore clustering setup problem
Staszek, That makes sense, but this has always been a multi-core setup, so the paths have not changed, and the clustering component worked fine for core0. The only thing new is I have fine-tuned core1 (to begin implementing it). Previously the solrconfig.xml file was very basic. I replaced it with core0's solrconfig.xml and made very minor changes to it (unrelated to clustering) - it's a nearly identical solrconfig.xml file so I'm surprised it doesn't work for core1. In other words, the paths here are the same for core0 and core1:

<lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="../../dist/" regex="apache-solr-clustering-\d.*\.jar" />
<lib dir="../../contrib/clustering/lib/downloads/" />
<lib dir="../../contrib/clustering/lib/" />

Again, I'm wondering if perhaps, since both cores have the clustering component, it should have a shared configuration in a different file used by both cores(?). Perhaps the duplicate clusteringComponent configuration for both cores is the problem? Thanks for looking at this! On Thu, Jun 30, 2011 at 1:29 PM, Stanislaw Osinski stanislaw.osin...@carrotsearch.com wrote: It looks like the whole clustering component JAR is not in the classpath. I remember that I once dealt with a similar issue in Solr 1.4 and the cause was the relative path of the lib tag being resolved against the core's instanceDir, which made the path incorrect when directly copying and pasting from the single core configuration. Try correcting the relative lib paths or replacing them with absolute ones; it should solve the problem. Cheers, Staszek
Re: Core Administration
Zarni, On 30.06.2011 20:32, zarni aung wrote: But I need to know if Solr already handles that case. I wouldn't want to have to write the tool if Solr already supports creating cores with new configs on the fly. There isn't. You have to create the directory structure and the related files yourself; Solr (the CoreAdminHandler) only activates the core for usage. A few weeks ago, there was a question about modifying configuration files from the browser: http://search.lucidimagination.com/search/document/ec79172e7613d1a/modifying_configuration_from_a_browser Regards Stefan
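The flow Stefan describes, i.e. lay out the core's directories and config files yourself and then activate the core via the CoreAdmin handler's CREATE action, can be sketched as below. The core name and paths are hypothetical, and the sketch only builds the admin URL rather than calling a live server:

```python
# Sketch of the flow described above: create the core's directory
# structure and config files yourself, then activate it via the CoreAdmin
# CREATE action. Core name and paths are hypothetical; no server is called.
import os
from urllib.parse import urlencode

core_name = "homes"
instance_dir = "/tmp/solr-demo/homes"

# 1) Create the directory structure and (placeholder) config files.
os.makedirs(os.path.join(instance_dir, "conf"), exist_ok=True)
for fname in ("schema.xml", "solrconfig.xml"):
    open(os.path.join(instance_dir, "conf", fname), "a").close()

# 2) Build the CoreAdmin request that would activate the new core.
params = urlencode({"action": "CREATE", "name": core_name,
                    "instanceDir": instance_dir})
url = "http://localhost:8983/solr/admin/cores?" + params
print(url)
```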
Re: Core Administration
Thank you very much Stefan. This helps. Zarni On Thu, Jun 30, 2011 at 4:10 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Zarni, On 30.06.2011 20:32, zarni aung wrote: But I need to know if Solr already handles that case. I wouldn't want to have to write the tool if Solr already supports creating cores with new configs on the fly. There isn't. You have to create the directory structure and the related files yourself; Solr (the CoreAdminHandler) only activates the core for usage. A few weeks ago, there was a question about modifying configuration files from the browser: http://search.lucidimagination.com/search/document/ec79172e7613d1a/modifying_configuration_from_a_browser Regards Stefan
Re: TermVectors and custom queries
Perhaps a better question, is this possible? On Mon, Jun 27, 2011 at 5:15 PM, Jamie Johnson jej2...@gmail.com wrote: I have a field named content with the following definition: <field name="content" type="text" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/> I'm now trying to execute a query against content and get back the term vectors for the pieces that matched my query, but I must be messing something up. My query is as follows: http://localhost:8983/solr/select/?qt=tvrh&q=content:test&fl=content&tv.all=true where the word test is in my content field. When I get information back though I am getting the term vectors for all of the tokens in that field. How do I get back just the ones that match my search?
Re: After the query component has the results, can I do more filtering on them?
Unfortunately the userIdsToScore updates very often; I'd get more Ids almost every single query (hence why I made the new component). But I see the problem of not being able to score the whole resultSet. I'd actually need to do this, now that I think about it. I want to get a whole whack of users (let's say 10,000), score them using my system, and then 'remember' the top 3500 of these users in the result cache or something. How would I go about operating on the whole resultSet rather than just the 'rows' I set? I wonder if I can set rows to be really large, score them in the component, then remember all of these results in the result cache and then dynamically change rows in my component so not all 3500 (or whatever number I choose) are returned. -- View this message in context: http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3127560.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: After the query component has the results, can I do more filtering on them?
Sorry for the double post, but in this case, is it possible for me to access the queryResultCache in my component and play with it? Ideally what I want is this: 1) I have 1 (just a random large number) total results. 2) In my component I access all of these results, score them, and take the top 3500 (a random smaller number) and drop the rest. 3) The 3500 I have now should end up going into the queryResultCache, essentially replacing the other ones. 4) The number returned to the user should then be rows, and subsequent queries which are the same just get them from my new result cache. I'm pretty noob about all of this so I'm hoping someone can help. -- View this message in context: http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3127581.html Sent from the Solr - User mailing list archive at Nabble.com.
JOIN, query on the parent?
Hello - I'm looking for a way to find all the links from a set of results. Consider:

<doc> id:1 type:X link:a link:b </doc>
<doc> id:2 type:X link:a link:c </doc>
<doc> id:3 type:Y link:a </doc>

Is there a way to search for all the links from stuff of type X -- in this case (a,b,c)? If I'm understanding the {!join} stuff, it lets you search on the children, but I don't really see how to limit the parent values. Am I missing something, or is this a further extension to the JoinQParser? thanks ryan
Re: Taxonomy faceting
: Lucid Imagination did a webcast on this, as far as I remember? That was me ... the webcast was a pre-run of my ApacheCon talk... http://www.lucidimagination.com/why-lucid/webinars/mastering-power-faceted-search http://people.apache.org/~hossman/apachecon2010/facets/ ...taxonomy stuff comes up ~slide 30 : The '1/topics/computing'-solution works at a single level, so if you are : interested in a multi-level result like if you want to show the whole tree When faceting you can just leave the depth number prefix out of the terms, that should work fine (but I haven't thought about it hard). : Are there better ways to achieve this? : : Taxonomy faceting is a bit of a mess right now, but it is also an area : where a lot is happening. For SOLR there is a lot going on right now, some of which I haven't been able to keep up on and can't comment on -- but in my experience, if you are serious about organizing your data in a taxonomy then you probably already have some data structure in your application layer that models the whole thing in memory and maps nodeIds to nodeLabels and what not. What usually works fine is to just index the nodeIds for the entire ancestry of the category each document is in for the filtering (i.e. fq=cat:1234), and to generate the facet presentation you do a simple facet.field=ancestorCategories&facet.limit=-1 to get all the counts in a big hashmap and then use that to annotate your own category tree data structure that you use to generate the presentation. -Hoss
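The depth-prefixed category tokens discussed in this thread (e.g. '1/topics/computing', which lets facet.prefix select a single tree level) can be generated mechanically from a document's category path. A small sketch of that encoding, not from the thread itself:

```python
# Sketch: generate depth-prefixed ancestor tokens for a category path,
# the encoding discussed in this thread. Indexing all of these in a
# multivalued field lets facet.prefix pick out one level of the tree.
def category_tokens(path):
    parts = path.split("/")
    return ["%d/%s" % (depth, "/".join(parts[: depth + 1]))
            for depth in range(len(parts))]

print(category_tokens("topics/computing/systems"))
# -> ['0/topics', '1/topics/computing', '2/topics/computing/systems']
```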
Uninstall Solr
Hi All, How do I *uninstall* Solr completely? Any help will be appreciated. Regards, Gaurav
Re: Uninstall Solr
How'd you install it? Generally you just delete the directory where you installed it. But you might be deploying solr.war in a container somewhere besides Solr's example Jetty setup, in which case you need to undeploy it from those other containers and remove the remnants. Curious though... why uninstall it? Solr makes a mighty fine hammer to have around :) Erik On Jun 30, 2011, at 19:49 , GAURAV PAREEK wrote: Hi All, How to *uninstall* Solr completely ? Any help will be appreciated. Regards, Gaurav