Re: Query regarding Solr-2242 patch for getting distinct facet counts.
I am pretty sure it does not yet support distributed shards. But the patch was written for 4.0, so there might be issues with running it on 1.4.1.

On 5/26/11 11:08 PM, rajini maski rajinima...@gmail.com wrote:

The SOLR-2242 patch for getting the count of distinct facet terms doesn't work for distributedProcess (https://issues.apache.org/jira/browse/SOLR-2242). The error log says:

HTTP ERROR 500
Problem accessing /solr/select. Reason: For input string: numFacetTerms
java.lang.NumberFormatException: For input string: numFacetTerms
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:403)
    at java.lang.Long.parseLong(Long.java:461)
    at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:331)
    at org.apache.solr.schema.TrieField.toInternal(TrieField.java:344)
    at org.apache.solr.handler.component.FacetComponent$DistribFieldFacet.add(FacetComponent.java:619)
    at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:265)
    at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:235)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

The query I passed:

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=2&facet.field=648&facet.mincount=1&facet.limit=-1&f.2.facet.numFacetTerms=1&rows=0&shards=localhost:8983/solr,localhost:8985/solrtwo

Can anyone suggest the changes I need to make to enable the same functionality for shards? When I do it across a single core, I get the correct results. I have applied the SOLR-2242 patch to Solr 1.4.1. Awaiting a reply.

Regards, Rajani
HTMLStripTransformer will remove the content in XML??
I have an XML string like this:

<?xml version="1.0" encoding="UTF-8"?><language><intl><![CDATA[hello]]></intl><loc><![CDATA[solr]]></loc></language>

Using HTMLStripTransformer, I expect to get 'hello, solr'. But actually this transformer removes ALL THE TEXT INSIDE! Did I do something silly, or is it a bug? Thank you
Re: Query regarding Solr-2242 patch for getting distinct facet counts.
No such issues. It integrated successfully with 1.4.1 and works across a single index. With the f.2.facet.numFacetTerms=1 parameter it gives the distinct count result; with f.2.facet.numFacetTerms=2 it gives the counts as well as the facet results. But this works only across a single index, not in distributed process. The conditions you added in SimpleFacets.java (the numFacetTerms == 0, 1, and 2 cases): should they also be added to the distributedProcess function to make this work across shards?

Rajani

On Fri, May 27, 2011 at 12:33 PM, Bill Bell billnb...@gmail.com wrote: I am pretty sure it does not yet support distributed shards. But the patch was written for 4.0, so there might be issues with running it on 1.4.1. On 5/26/11 11:08 PM, rajini maski rajinima...@gmail.com wrote: The SOLR-2242 patch for getting the count of distinct facet terms doesn't work for distributedProcess (https://issues.apache.org/jira/browse/SOLR-2242). The error log says HTTP ERROR 500, Problem accessing /solr/select.
Facet Query
Hi, when I do a facet query on my data, it shows me a list of all the words present in my database with their counts. Is it possible to not get results for common words like a, an, the, http and so on, but only get the counts of the terms we need, like microsoft, ipad, solr, etc.? -- Thanks, Regards, Jasneet Sabharwal
Re: Facet Query
Which analyzer do you use for indexing? You could exclude those stop words during indexing: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

On Fri, May 27, 2011 at 1:36 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: Hi, when I do a facet query on my data, it shows me a list of all the words present in my database with their counts. Is it possible to not get results for common words like a, an, the, http and so on, but only get the counts of the terms we need, like microsoft, ipad, solr, etc.? -- Thanks, Regards, Jasneet Sabharwal

-- Chandan Tamrakar
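For reference, a stop-filtered analyzer chain in schema.xml looks something like this (a sketch; the type name is illustrative, see the wiki page above for the full catalogue of filters):

```xml
<fieldType name="text_nostop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- terms listed in stopwords.txt never reach the index -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

Because stopped terms are never indexed, they cannot appear as facet values on that field.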
Re: Problem with spellchecking, dont want multiple request to SOLR
Mm, OK. I configured 2 spellcheckers:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">spell_what</str>
    <str name="field">spell_what</str>
    <str name="buildOnOptimize">true</str>
    <str name="spellcheckIndexDir">spellchecker_what</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">spell_where</str>
    <str name="field">spell_where</str>
    <str name="buildOnOptimize">true</str>
    <str name="spellcheckIndexDir">spellchecker_where</str>
  </lst>
</searchComponent>

How can I enable it in my search request handler and search both in one request?

-- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-spellchecking-dont-want-multiple-request-to-SOLR-tp2988167p2992076.html Sent from the Solr - User mailing list archive at Nabble.com.
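One way to wire these dictionaries into a handler (a sketch; the handler name is illustrative). Note that in Solr 1.4/3.x the spellcheck component consults a single dictionary per request, selected with the spellcheck.dictionary parameter, so "both in one request" is not directly supported:

```xml
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- default dictionary; can be overridden per request -->
    <str name="spellcheck.dictionary">spell_what</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

A "where" request would then pass &spellcheck.dictionary=spell_where instead of issuing a second configured handler.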
Re: Facet Query
Are you talking about a facet query or a facet field? If it's a facet query, I don't get what's going on. If it's a facet field... well, if it's a fixed set of words you're interested in, filter the query to only those words and you'll get counts only for them. If you just need to filter out common words, I don't remember exactly how it works, but when you declare the text field (or its type) you can specify a processor that does exactly that: it removes common words from the indexed field and, hence, you shouldn't get counts for them, because they just aren't there. Sorry if my information is inexact; I haven't had to deal with this feature yet.

On 27/05/2011, at 09:51, Jasneet Sabharwal wrote: Hi, when I do a facet query on my data, it shows me a list of all the words present in my database with their counts. Is it possible to not get results for common words like a, an, the, http and so on, but only get the counts of the terms we need, like microsoft, ipad, solr, etc.? -- Thanks, Regards, Jasneet Sabharwal
Re: copyField of dates unworking?
<copyfield source="date" dest="text"/> The letter "f" should be capitalized: copyfield => copyField, i.e. <copyField source="date" dest="text"/>
RE: Spellcheck: Two dictionaries
That uber dictionary is not what I want: I also get suggestions from the "where" in the "what". An example:

what               where
chelsea            London
Soccerclub Bondon  London

When I type "soccerclub london" I want the suggestion from the "what" dictionary: Did you mean "Soccerclub Bondon"? With the uber dictionary I don't get this suggestion, because it is spelled correctly (based on the "where").

-- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-Two-dictionaries-tp2931458p2992093.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: HTMLStripTransformer will remove the content in XML??
I would expect that it doesn't understand CDATA and thinks of everything between '<' and '>' as a 'tag'. Best Regards, Bryan Rasmussen

On Fri, May 27, 2011 at 9:41 AM, Ellery Leung elleryle...@be-o.com wrote: I have an XML string like this: <?xml version="1.0" encoding="UTF-8"?><language><intl><![CDATA[hello]]></intl><loc><![CDATA[solr]]></loc></language> Using HTMLStripTransformer, I expect to get 'hello, solr'. But actually this transformer removes ALL THE TEXT INSIDE! Did I do something silly, or is it a bug? Thank you
frange vs TrieRange
Hello, I have to perform range queries against a date field. It is a TrieDateField, and I'm already using it for sorting; hence, there will already be an entry in the FieldCache for it. According to http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/, frange queries are typically faster than normal range queries when there are many terms between the endpoints (though they can be slower if fewer than about 5% of the terms fall between the endpoints). The cost of this speedup is the memory associated with a FieldCache entry for the field. In my case there's no additional memory overhead, as there's already such an entry. The article also states that trie range queries have the best space/speed tradeoff. Now my doubt is: since I have no memory overhead, I only care about the relative speed of frange versus trie. The good speed/space tradeoff of trie is not the measure I need in this case, just a comparison at the pure speed level. Does anybody know if there's data about this? Any clue on whether to choose frange or trie in this case? Thanks, Juan
TermFreqVector Problem
Hi all, here is what I have been trying, and the problem. I am trying to see how many times a single word appears in a field. I have a field called universal, and let's say its value is:

car house road age sex school education education tree garden

I am searching using the word education, so I am expecting 2 as my result. I did the configuration on http://wiki.apache.org/solr/TermVectorComponent and my piece of code is this:

TermFreqVector vector = this.reader.getTermFreqVector(this.docId, "universal");
int index = vector.indexOf("education");
int freq = vector.getTermFrequencies()[index];

But here vector.indexOf("education") returns -1, so I get an error. In addition, I have tried this too:

TermFreqVector vector = reader.getTermFreqVector(this.docId, "universal");
String universalTerms[] = vector.getTerms();

to see the length of the universalTerms array: it is 1, and the only value that array stores is the whole field value:

universalTerms[0] = "car house road age sex school education education tree garden"

Can anyone help me with this?

- Zeki ama calismiyor... Calissa yapar...

-- View this message in context: http://lucene.472066.n3.nabble.com/TermFreqVector-Problem-tp2992163p2992163.html Sent from the Solr - User mailing list archive at Nabble.com.
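A likely cause, hedged since the schema isn't shown: getTerms() returning the entire field value as a single term usually means the field is not tokenized (e.g. declared with a string type). A sketch of a tokenized field with term vectors enabled (type name illustrative):

```xml
<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <!-- split the value into individual word terms -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="universal" type="text_ws" indexed="true" stored="true" termVectors="true"/>
```

With whitespace tokenization, "education" becomes its own term, so indexOf("education") can find it and getTermFrequencies() can report its count of 2.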
RE: HTMLStripTransformer will remove the content in XML??
Got it. Actually I use solr.MappingCharFilterFactory to replace the <![CDATA[ and ]]> with empty strings first, and then use HTMLStripCharFilterFactory to get hello and solr. For future reference, here is part of schema.xml:

<fieldType name="textMaxWord" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    ...

In mappings.txt (2 lines):

"<![CDATA[" => ""
"]]>" => ""

Restart Solr and it works. Thank you.

-----Original Message-----
From: bryan rasmussen [mailto:rasmussen.br...@gmail.com]
Sent: May 27, 2011, 4:20 PM
To: solr-user@lucene.apache.org; elleryle...@be-o.com
Subject: Re: HTMLStripTransformer will remove the content in XML??

I would expect that it doesn't understand CDATA and thinks of everything between '<' and '>' as a 'tag'. Best Regards, Bryan Rasmussen

On Fri, May 27, 2011 at 9:41 AM, Ellery Leung elleryle...@be-o.com wrote: I have an XML string like this: <?xml version="1.0" encoding="UTF-8"?><language><intl><![CDATA[hello]]></intl><loc><![CDATA[solr]]></loc></language> Using HTMLStripTransformer, I expect to get 'hello, solr'. But actually this transformer removes ALL THE TEXT INSIDE! Did I do something silly, or is it a bug? Thank you
Re: Returning documents using multi-valued field
Thanks for your answer, James :) For anyone who runs into this problem, http://markmail.org/thread/xce4qyzs5367yplo also discusses it and reaches James's conclusion too.

On Thu, May 26, 2011 at 10:19 PM, Dyer, James james.d...@ingrambook.com wrote:

This is a limitation of Lucene/Solr in that there is no way to tell it not to match across multi-valued field occurrences. A workaround is to convert your query to a phrase and add a slop factor less than your positionIncrementGap, e.g. q="alice trudy"~99. This example assumes that your positionIncrementGap is set to 100 (the default, I think) or greater. It tells Solr that rather than searching for a strict phrase, the words in the phrase can be up to 99 positions apart. Because multi-valued fields are implemented under the covers by simply increasing the position of the next occurrence by the positionIncrementGap value, this effectively prevents Lucene/Solr from matching across occurrences. The downside to this workaround is that wildcards are not permitted in phrase searches, so if you also need wildcard support, you're out of luck.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Kurt Sultana [mailto:kurtanat...@gmail.com]
Sent: Thursday, May 26, 2011 3:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Returning documents using multi-valued field

Hi, maybe I wasn't so clear in my previous post. Here's another go (I'd like a reply :) ): Currently I'm issuing this query on Solr:

http://localhost:9001/solrfacetsearch/master_Shop/select/?q=%28keyword_text_mv%3A%28alice+AND+trudy%29%29+AND+%28catalogId%3A%22Default%22%29+AND+%28catalogVersion%3AOnline%29&start=0&rows=2147483647&facet=true&facet.field=category_string_mv&sort=preferred_boolean+desc%2Cgeo_distance+asc&facet.mincount=1&facet.limit=50&facet.sort=index&radius=111.84681460272012&long=5.2864094&qt=geo&lat=52.2119418&debugQuery=on

where, as you can see, I'm searching for keywords Alice AND Trudy. This query returns a document which contains:

<arr name="keyword_text_mv">
  <str>alice jill</str>
  <str>trudy alex</str>
</arr>

The problem is I'd like the document to be returned only if it contains the string "alice trudy" in one of its values, in other words, only if it contains:

<arr name="keyword_text_mv">
  <str>alice trudy</str>
  <str>jill alex</str>
</arr>

How could I achieve this? I'm supporting code written by someone else and I'm quite new to Solr. Thanks in advance :) Kurt

On Wed, May 25, 2011 at 11:44 AM, Kurt Sultana kurtanat...@gmail.com wrote: Hi all, I'm quite new to Solr and I'm supporting an existing Solr search engine which was written by someone else. I've been reading up on Solr for the last couple of weeks, so I'd consider myself beyond the basics. A particular field, let's say name, is multi-valued. For example, a document has a field name with values Alice, Trudy. We want the document to be returned when Alice or Trudy is input, and not when Alice Trudy is entered. Currently the document is returned even with Alice Trudy. How could this be done?
Thanks a lot! Kurt
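James's slop workaround applied to Kurt's field would look like this (assuming a positionIncrementGap of 100 on keyword_text_mv):

```
q=keyword_text_mv:"alice trudy"~99
```

A document whose values are ["alice jill", "trudy alex"] no longer matches, because alice and trudy sit at least 100 positions apart across the value boundary, while a document containing the single value "alice trudy" still does.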
How to disable QueryElevationComponent
Hi, in my indexed documents I do not want a uniqueKey field, but when I do not give any uniqueKey in schema.xml it throws an exception: org.apache.solr.common.SolrException: QueryElevationComponent requires the schema to have a uniqueKeyField. This means QueryElevationComponent requires a uniqueKey field, so how can I disable this QueryElevationComponent? Please reply. - Thanks, Regards, Romi -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-disable-QueryElevationComponent-tp2992195p2992195.html Sent from the Solr - User mailing list archive at Nabble.com.
test
test - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/test-tp2992199p2992199.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to disable QueryElevationComponent
Remove the component configuration from your solrconfig.

Hi, in my indexed documents I do not want a uniqueKey field, but when I do not give any uniqueKey in schema.xml it throws an exception: org.apache.solr.common.SolrException: QueryElevationComponent requires the schema to have a uniqueKeyField. This means QueryElevationComponent requires a uniqueKey field, so how can I disable this QueryElevationComponent? Please reply. - Thanks, Regards, Romi -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-disable-QueryElevationComponent-tp2992195p2992195.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to disable QueryElevationComponent
I removed

<searchComponent name="elevator" class="org.apache.solr.handler.component.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

from solrconfig.xml, but now it shows the following exception:

java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DataImporter.identifyPk(DataImporter.java:152)
    at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:111)
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:113)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:486)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:588)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)

- Thanks, Regards, Romi -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-disable-QueryElevationComponent-tp2992195p2992320.html Sent from the Solr - User mailing list archive at Nabble.com.
what is the need of setting writeLockTimeout and commitLockTimeout in solrconfig.xml
I wanted to get a basic idea of these parameters in solrconfig.xml:

<writeLockTimeout></writeLockTimeout>
<commitLockTimeout></commitLockTimeout>

What do writeLockTimeout and commitLockTimeout actually indicate here? - Thanks, Regards, Romi -- View this message in context: http://lucene.472066.n3.nabble.com/what-is-the-need-of-setting-writeLockTimeout-and-commitLockTimeout-in-solrconfig-xml-tp2992358p2992358.html Sent from the Solr - User mailing list archive at Nabble.com.
DIH render html entities
Is there any way to render html entities in DIH for a specific field? Thanks -- Anass
Re: Documents update
2011/5/27 Denis Kuzmenok forward...@ukr.net: Hi. I have an indexed database which is indexed a few times a day and contains tinyint flags (like is_enabled, is_active, etc.); the content isn't changed too often, but the flags are. So if I index only the flags via post.jar, the entire document is deleted and there's only the unique key and the flags. Is there any way to index certain columns without changing the whole document? [...] Not with 1.4, but apparently there is a patch for trunk; not sure if it is in 3.1. If you are on 1.4, you could first query Solr to get the data for the document to be changed, change the modified values, and make a complete XML, including all fields, for post.jar. Regards, Gora
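A minimal client-side sketch of Gora's suggestion (the helper and field names are illustrative, not a real Solr API): take the stored document's fields, overwrite the changed flags, and rebuild the full <add><doc> payload that post.jar expects.

```python
import xml.etree.ElementTree as ET

def rebuild_doc(stored_fields, changed_flags):
    """Merge updated flag values into the stored fields and build the
    complete <add><doc> XML, since partial updates are not supported."""
    merged = dict(stored_fields)
    merged.update(changed_flags)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in sorted(merged.items()):
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Example: only is_active changed; every other field is re-sent unchanged.
stored = {"id": "42", "title": "example", "is_active": "0"}
payload = rebuild_doc(stored, {"is_active": "1"})
```

The document fetch itself would be an ordinary query with fl=* on the unique key; the point of the sketch is only that every stored field must be re-sent.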
Re: Documents update
I'm using 3.1 now. Indexing lasts for a few hours, and the plain size is big; getting all documents would be rather slow :( Not with 1.4, but apparently there is a patch for trunk; not sure if it is in 3.1. If you are on 1.4, you could first query Solr to get the data for the document to be changed, change the modified values, and make a complete XML, including all fields, for post.jar. Regards, Gora
Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor
On Thu, May 26, 2011 at 6:52 PM, Rahul Warawdekar rahul.warawde...@gmail.com wrote: Hi All, I am using Solr 3.1 for one of our search-based applications. We are using DIH to index our data and TikaEntityProcessor to index attachments. Currently we are running into an issue while extracting content from one of our MS Excel 2007 files using TikaEntityProcessor. [...] We have not done this with Tika, but we have run into similar issues while trying to convert Microsoft Word documents externally, before indexing into Solr. It turned out in our case that these documents were referring to external URLs which were not always accessible to our converter sitting behind a firewall. Also, does someone know of a way to just skip this type of behaviour for that file and move on to the next document to be indexed? [...] This is probably not of much help to you, but what we ended up doing was killing any conversion process that took longer than a maximum time. Regards, Gora
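Gora's kill-after-a-maximum-time approach can be sketched like this in Python (the converter command is hypothetical; the point is the timeout-and-skip pattern):

```python
import subprocess
import sys

def convert_with_timeout(cmd, timeout_s):
    """Run an external converter, killing it if it exceeds timeout_s seconds.

    Returns the converter's stdout, or None if it had to be killed so the
    caller can skip that document and continue with the next one."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None  # skip this document; move on to the next
    return result.stdout

# A fast command finishes within the limit and its output is returned.
out = convert_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=30)
```

subprocess.run kills the child process itself when the timeout expires, so no stray converter keeps running.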
Re: DIH render html entities
Sorry, my question was not clear. When I get data from the database, some fields contain HTML special characters, and what I want is just to convert them automatically.

On Fri, May 27, 2011 at 1:00 PM, Gora Mohanty g...@mimirtech.com wrote: On Fri, May 27, 2011 at 3:50 PM, anass talby anass.ta...@gmail.com wrote: Is there any way to render html entities in DIH for a specific field? [...] This does not make too much sense: what do you mean by rendering HTML entities? DIH just indexes, so where would it render HTML to, even if it could? Please take a look at http://wiki.apache.org/solr/UsingMailingLists Regards, Gora -- Anass
Re: Nested grouping/field collapsing
Hi, I was wondering if this issue had already been raised. We currently have a use case where nested field collapsing would be really helpful, i.e. collapse on field X, then collapse on field Y within the groups returned by field X. The current behavior when specifying multiple fields seems to be to return multiple result sets. Has this already been requested as a feature? Does anybody know of a workaround? Many thanks, Martijn
Re: Nested grouping/field collapsing
I've found the same issue. As far as I know, the only solution is to create a copy field which combines both fields' values and facet on this field. If one of the fields has a set of distinct values known in advance and its cardinality c is not too big, it isn't a great problem: you can do with c queries.

On 27/05/2011, at 15:03, Martijn Laarman wrote: Hi, I was wondering if this issue had already been raised. We currently have a use case where nested field collapsing would be really helpful, i.e. collapse on field X, then collapse on field Y within the groups returned by field X. The current behavior when specifying multiple fields seems to be to return multiple result sets. Has this already been requested as a feature? Does anybody know of a workaround? Many thanks, Martijn
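A sketch of the combined-field workaround in schema.xml (field names are illustrative; note the concatenated value has to be produced at index time by the client, since copyField copies each source value separately rather than concatenating a pair):

```xml
<field name="fieldX" type="string" indexed="true" stored="true"/>
<field name="fieldY" type="string" indexed="true" stored="true"/>
<!-- populated by the indexing client with e.g. "valueX|valueY" -->
<field name="fieldX_fieldY" type="string" indexed="true" stored="true"/>
```

Collapsing on fieldX_fieldY then yields one group per (X, Y) pair, which approximates the nested collapse within each X group.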
Re: Comma delemitered words shawn in terms like one word.
Thanks, I was looking for exactly this. I needed to split tokens based on commas.

On Fri, Jun 18, 2010 at 10:12 PM, Joe Calderon calderon@gmail.com wrote: Set generateWordParts=1 on the WordDelimiterFilterFactory, or use PatternTokenizerFactory to split on commas: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternTokenizerFactory You can use the analysis page (/admin/analysis.jsp) to see what your filter chains are going to do before you index.

On Fri, Jun 18, 2010 at 6:41 AM, Vitaliy Avdeev vavd...@sistyma.net wrote: Hello. In the indexed text I have a string like John,Mark,Sam. When I look at it in TermVectorComponent it looks like johnmarksam. I am using this type for storing the data:

<fieldType name="textTight2" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Which filter do I need to use to get John, Mark, Sam as different words? -- Thanks and Regards Abhay Kumar Singh
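The PatternTokenizerFactory alternative Joe mentions would look something like this in schema.xml (type name illustrative):

```xml
<fieldType name="text_commas" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split the raw value on commas instead of whitespace -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this type, John,Mark,Sam is indexed as the three terms john, mark, and sam instead of being catenated into one token.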
Splitting fields
Hello, I am in an odd position. The application server I use has built-in integration with Solr. Unfortunately, its native capabilities are fairly limited; specifically, it only supports a standard, pre-defined set of fields which can be indexed. As a result, I have been kludging how I work with Solr, doing things like putting what I'd like to be multiple, separate fields into a single Solr field. As an example, I may put a customer id and name into a single field called 'custom1'. Ideally, I'd like this information to be returned in separate fields; even better would be for them to be indexed as separate fields, but I can live without the latter.

Currently I'm building out a JSON representation of this information, which makes it easy for me to deal with when I extract the results, but it all feels wrong. I do have complete control over the actual Solr installation (just not the indexing call to Solr), so I was hoping there may be a way to configure Solr to take my single field and split it up into a different field for each key in my JSON representation. I don't see anything native to Solr that would do this for me, but there are a few features that sounded similar, and I was hoping to get some opinions on how I may be able to move forward.

Poly fields, such as the spatial location type, might help? Can I build my own poly field that would split up the main field into subfields? Do poly fields let me return the subfields? I don't quite have my head around poly fields yet.

Another option, although I suspect this won't be considered a good approach: what about extending the copyField functionality of schema.xml to support my needs? It would seem not entirely unreasonable for copyField to provide a means to extract only a portion of the contents of the source field to place in the destination field, no? I'm sure people more familiar with Solr's architecture could explain why this isn't really an appropriate thing for Solr to handle (just because it could doesn't mean it should).

The other, and probably best, option would be to leverage Solr directly, bypassing the native integration of my application server, which we've already done for most cases. I'd love to go this route, but I'm having a hard time figuring out how to easily accomplish the same functionality provided by my app server integration; perhaps someone on the list could help me with this path forward. Here is what I'm trying to accomplish: I'm indexing documents (text, pdf, html...) but I need to include fields in my search results which are only available from a db query. I know how to have Solr index results from a db query, but I'm having trouble getting it to index the documents that are associated with each record of that query (the full path/filename is one of the fields of that query).

I started to try to use the DataImportHandler for this, by setting up a FileDataSource in addition to my JDBC data source. I tried to leverage the FileDataSource to populate a sub-entity based on the db field that contains the full path/filename, but I wasn't sure how to specify the db field from the root query/entity. Before I spent too much time, I also realized I wasn't sure how to get Solr to deal with binary file types this way either, which upon further reading seemed like I would need to leverage Tika. Can that be done within the confines of DataImportHandler?

Any advice is greatly appreciated. Thanks in advance, Joe
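Until a server-side answer turns up, the JSON-packed field Joe describes can at least be split in client code before (or after) talking to Solr. A minimal sketch; the custom1 name is from his example, and the prefix scheme for the generated field names is made up:

```python
import json

def split_custom_field(doc):
    """Expand the JSON packed into 'custom1' into separate top-level fields,
    e.g. custom1_customer_id and custom1_name."""
    packed = doc.pop("custom1", None)
    if packed:
        for key, value in json.loads(packed).items():
            doc["custom1_" + key] = value
    return doc

doc = split_custom_field(
    {"id": "1", "custom1": json.dumps({"customer_id": "42", "name": "Acme"})}
)
```

Paired with a dynamicField rule like custom1_* in schema.xml, the split values could then be indexed and returned as real separate fields.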
Re: solr Invalid Date in Date Math String/Invalid Date String
The * endpoint for range terms wasn't implemented yet in 1.4.1. As a workaround, we use very large and very small values. -Mike

On 05/27/2011 12:55 AM, alucard001 wrote: Hi all, I am using Solr 1.4.1 (according to solr info), but no matter which date field I use (date or tdate, as defined in the default schema.xml), I cannot do a search in the solr-admin analysis.jsp:

fieldtype: date (or tdate)
fieldvalue (index): 2006-12-22T13:52:13Z (typed in manually, no trailing space)
fieldvalue (query): the only case that succeeds is 2006-12-22T13:52:13Z

All of the searches below fail, each tried both bare and wrapped in brackets, and also with backslash-escaped dashes and colons (e.g. 2006\-12\-22T00\:00\:00Z TO 2006\-12\-22T23\:59\:59Z):

* TO NOW
2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z
2006-12-22T00:00:00.000Z TO 2006-12-22T23:59:59.999Z
2006-12-22T00:00:00Z TO *
2006-12-22T00:00:00.000Z TO *
(and vice versa)

I get either an "Invalid Date in Date Math String" or an "Invalid Date String" error. What's wrong with it? Can anyone please help me with that? Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-Invalid-Date-in-Date-Math-String-Invalid-Date-String-tp2991763p2991763.html Sent from the Solr - User mailing list archive at Nabble.com.
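To make Mike's workaround concrete, here is a small client-side helper (a sketch: the sentinel dates and the field name in main are arbitrary choices, not anything Solr mandates) that builds a 1.4.1-safe range clause by substituting extreme dates for the unsupported * endpoint:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateRangeUtil {
    // Sentinel endpoints standing in for '*', which Solr 1.4.1 rejects in
    // date range queries. The exact values are arbitrary, just "far enough out".
    static final String MIN = "1900-01-01T00:00:00Z";
    static final String MAX = "2100-01-01T00:00:00Z";

    // Formats a date the way Solr's date fields expect: ISO-8601 in UTC.
    static String format(Date d) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(d);
    }

    // Builds a range clause; a null endpoint becomes a sentinel instead of '*'.
    static String range(String field, Date from, Date to) {
        String lo = (from == null) ? MIN : format(from);
        String hi = (to == null) ? MAX : format(to);
        return field + ":[" + lo + " TO " + hi + "]";
    }

    public static void main(String[] args) {
        // "everything up to the epoch" without using the unsupported '*'
        System.out.println(range("timestamp", null, new Date(0L)));
    }
}
```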
Re: highlighting in multiValued field
Hi Bob, Hmm... I don't think this approach will scale with bigger and more documents :( Thanks for your help though; I think I should take a look at customizing the highlight component to achieve this... Thanks, Jeff

On May 27, 2011, at 12:24 PM, Bob Sandiford bob.sandif...@sirsidynix.com wrote: The only thing I can think of is to post-process your snippets, i.e. pull the highlighting tags out of the strings, look for a match in your result's description field, and if you find one, replace that description with the original highlight text (i.e. with the highlight tags still in place). Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com

-----Original Message----- From: Jeffrey Chang [mailto:jclal...@gmail.com] Sent: Friday, May 27, 2011 12:16 AM To: solr-user@lucene.apache.org Subject: Re: highlighting in multiValued field

Hi Bob, I have no idea how I missed that! Thanks for pointing me to hl.snippets; that did the magic! Please allow me to squeeze in one more question along the same line. Now that I'm able to display multiple snippets, what I'm trying to achieve is to determine which highlighted snippet maps back to which position in the original document. E.g., if I search for "Tel" with highlighting and hl.snippets=2, it returns:

<doc>
  ...
  <arr name="descID">
    <str>1</str>
    <str>2</str>
    <str>3</str>
  </arr>
  <arr name="description">
    <str>Tel to talent 1</str>
    <str>Tel to talent 2</str>
    <str>Tel to talent 3</str>
  </arr>
  ...
</doc>
<lst name="highlighting">
  <lst name="1">
    <arr name="description">
      <str><em>Tel</em> to talent 1</str>
      <str><em>Tel</em> to talent 2</str>
    </arr>
  </lst>
  ...

Is there a way for me to figure out which highlighted snippet belongs to which descID, so I can also display the non-highlighted rows for my search results? Or is this not how highlighting is designed to be used? Thanks so much, Jeff [snip]
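Bob's post-processing idea can be sketched in a few lines of client-side Java. This assumes the default <em> highlight markers and that each snippet is a substring of exactly one stored value; all the names here are illustrative:

```java
import java.util.Arrays;
import java.util.List;

public class SnippetMapper {
    // Removes the highlighter's default <em> markers so a snippet can be
    // compared against the stored field values.
    static String stripTags(String snippet) {
        return snippet.replace("<em>", "").replace("</em>", "");
    }

    // Returns the position of the stored value the snippet came from, or -1.
    // Uses contains() because a snippet may be a fragment of a longer value.
    static int matchSnippet(String snippet, List<String> storedValues) {
        String plain = stripTags(snippet);
        for (int i = 0; i < storedValues.size(); i++) {
            if (storedValues.get(i).contains(plain)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> descriptions = Arrays.asList(
                "Tel to talent 1", "Tel to talent 2", "Tel to talent 3");
        int pos = matchSnippet("<em>Tel</em> to talent 2", descriptions);
        System.out.println("snippet maps to descID at position " + pos);
    }
}
```

The returned position can then be used to look up the corresponding entry in the parallel descID array.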
RE: Spellcheck: Two dictionaries
You're up against a couple of real limitations of Solr's spell checking. The first limitation is that you can only use one dictionary per query. The second limitation is that if a word is in the dictionary, Solr never tries to correct it. This will happen even if you *don't* combine your two dictionaries (though it will happen less, because the dictionary you use will be smaller). The best workaround for this second limitation is to use spellcheck.onlyMorePopular=true. This is a pretty bad solution, though, because onlyMorePopular then makes the spellchecker assume *all* of the words in the query need to be re-spelled. The Solr spellchecker really needs a hybrid option that would both correct the obviously misspelled words and also try some more popular alternates. It could then try different combinations, creating collation queries and testing them against the index prior to returning them. SOLR-2010 (included in 3.1) got us part of the way there, but there is still more work to do. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-----Original Message----- From: roySolr [mailto:royrutten1...@gmail.com] Sent: Friday, May 27, 2011 3:09 AM To: solr-user@lucene.apache.org Subject: RE: Spellcheck: Two dictionaries

That uber dictionary is not what I want: I also get suggestions from the "where" terms when checking the "what". An example:

what                 where
Chelsea              London
Soccerclub Bondon    London

When I type "soccerclub london" I want the suggestion from the "what" dictionary: Did you mean "Soccerclub Bondon"? With the uber dictionary I don't get this suggestion, because "london" is spelled correctly (based on the "where"). -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-Two-dictionaries-tp2931458p2992093.html Sent from the Solr - User mailing list archive at Nabble.com.
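For reference, the onlyMorePopular workaround described above can be set per request or baked into a handler's defaults. A minimal sketch, assuming a searchComponent named "spellcheck" is already defined in solrconfig.xml (the handler name is arbitrary):

```xml
<!-- solrconfig.xml: handler defaults enabling the onlyMorePopular workaround -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```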
Nested grouping/field collapsing
Hi, I was wondering if this issue has already been raised. We currently have a use case where nested field collapsing would be really helpful, i.e. collapse on field X, then collapse on field Y within the groups returned by field X. The current behavior when specifying multiple fields seems to be to return multiple result sets. Has this already been requested as a feature? Does anybody know of a workaround? Many thanks, Martijn
Re: Nested grouping/field collapsing
Did you try pivot? Bill Bell Sent from mobile On May 27, 2011, at 4:13 AM, Martijn Laarman mpdre...@gmail.com wrote: Hi, I was wondering if this issue had already been raised. We currently have a use case where nested field collapsing would be really helpful [snip]
Re: what is the need of setting autocommit in solrconfig.xml
On 5/27/2011 6:48 AM, Romi wrote: What is the benefit of setting autocommit in solrconfig.xml? I read somewhere that these settings control how often pending updates will be automatically pushed to the index. Does it mean that if the Solr server is running, it automatically starts the indexing process whenever it finds updates in the database? No, it means Solr automatically commits recently added documents to the index so that they become searchable.
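Concretely, autocommit lives inside the <updateHandler> section of solrconfig.xml; the threshold values below are only illustrative:

```xml
<!-- solrconfig.xml, inside <updateHandler>: commit pending docs automatically -->
<autoCommit>
  <maxDocs>10000</maxDocs>  <!-- commit once this many docs are pending -->
  <maxTime>60000</maxTime>  <!-- ...or after this many milliseconds -->
</autoCommit>
```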
Re: Edgengram
For this, I ended up just changing it to a string field and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb

On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front"/>
  </analyzer>
</fieldType>

I've also set up my own similarity class that returns 1 as the idf score. What I've found is that if I match a string abcdefg against a field containing abcdefghijklmnop, the idf will score that as a 7:

7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2)

I get why that's happening, but is there a way to avoid it? Do I need a new field type to achieve the desired effect? Thanks, Brian Lamb
Re: Similarity per field
I'm still not having any luck with this. Has anyone actually gotten this to work so far? I feel like I've followed the directions to the letter, but it just doesn't work. Thanks, Brian Lamb

On Wed, May 25, 2011 at 2:48 PM, Brian Lamb brian.l...@journalexperts.com wrote: I looked at the patch page and saw the files that were changed. I went into my install, looked at those same files, and found that they had indeed been changed. So it looks like I have the correct version of Solr.

On Wed, May 25, 2011 at 1:01 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I sent a mail about this topic a week ago, but now that I have more information about what I am doing, as well as a better understanding of how the similarity class works, I wanted to start a new thread with a bit more detail about what I'm doing, what I want to do, and how I can make it work correctly. I have written a similarity class that I would like applied to a specific field. This is how I am defining the fieldType:

<fieldType name="edgengram_cust" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="1" side="front"/>
  </analyzer>
  <similarity class="my.package.similarity.MySimilarity"/>
</fieldType>

And then I assign a specific field to that fieldType:

<field name="myfield" multiValued="true" type="edgengram_cust" indexed="true" stored="true" required="false" omitNorms="true"/>

Then I restarted Solr and did a full-import. However, the changes I have made do not appear to be taking hold. For simplicity, right now I just have the idf function returning 1. When I do a search with debugQuery=on, the idf behaves as it normally does. However, when I search on this field, the idf should be 1, and that is not the case.
To try to nail down where the problem occurs, I commented out the similarity class definition in the fieldType and added it globally to the schema file:

<similarity class="my.package.similarity.MySimilarity"/>

Then I restarted Solr and did a full-import. This time, the idf scores were all 1. So it seems the problem is not with my similarity class, but with trying to apply it to a specific fieldType. According to https://issues.apache.org/jira/browse/SOLR-2338, this should be in trunk now, yes? I have run svn up on both my Lucene and Solr checkouts, and it still is not recognized on a per-field basis. Is the tag different inside a fieldType? Did I not update Solr correctly? Where is my mistake? Thanks, Brian Lamb
Re: Nested grouping/field collapsing
Can you open a Lucene issue (against the new grouping module) for this? I think this is a compelling use case that we should try to support. In theory, with the general two-pass grouping collector, this should be possible, but it will require three passes, and we also must generalize the 2nd-pass collector to accept arbitrary collectors for each group (today it's hardwired to sort-by-SortField collectors). I suspect coupling the single-pass grouping collector (currently still a patch on LUCENE-3129) with the two-pass collector could also work. Also, can you describe the two fields you want to group/collapse by in more detail? Mike http://blog.mikemccandless.com

On Fri, May 27, 2011 at 6:13 AM, Martijn Laarman mpdre...@gmail.com wrote: Hi, I was wondering if this issue had already been raised. We currently have a use case where nested field collapsing would be really helpful [snip]
Result Grouping always returns grouped output
Hello, I am using the latest nightly build of Solr 4.0, and I would like to use grouping/field collapsing while maintaining compatibility with my current parser. I am using the regular web interface to test it, with the same commands as in the wiki, just with the field names matching my dataset. Grouping itself works: group=true and group.field return the expected results, but neither group.main=true nor group.format=simple seems to change anything. Do I have to include something special in solrconfig.xml or schema.xml to make the simple output work? Thanks for any hints, K
Re: Pivot with Stats (or Stats with Pivot)
Nobody? Please, help. edua...@calandra.com.br 17/05/2011 16:13 Please respond to solr-user@lucene.apache.org To solr-user@lucene.apache.org cc Subject Pivot with Stats (or Stats with Pivot)

Hi All, Is it possible to get stats (like the Stats Component's min, max, sum, count, missing, sumOfSquares, mean, and stddev) for numeric fields inside hierarchical facets (with more than one level, like pivot)? I would like to query:

...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z

and get min, max, sum, count, etc. for numeric_field1 and numeric_field2 for all combinations of field_x, field_y, and field_z (hierarchical values). Using stats.facet I get just one field at one level, and using facet.pivot I get just counts but no stats. Looping in the client application over all combinations of facet values would be too slow, because there are a lot of combinations. Thanks a lot!
Re: copyField of dates unworking?
On May 27, 2011, at 1:04 AM, Ahmet Arslan wrote: The letter f should be capital. Hah! Well spotted! Thanks. -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
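For readers skimming the archive: the fix in question is the element's casing in schema.xml; it must be copyField with a capital F. The field names below are just examples:

```xml
<!-- schema.xml: note the capital F; a lowercase <copyfield> goes unrecognized -->
<copyField source="created" dest="created_dt"/>
```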
problem getting Solr to commit
We verified with the Fiddler proxy server that when we use the Java CommonsHttpSolrServer to communicate with our Solr server, we are not able to get the client to post a <commit/> message back to Solr. The result is that we can't force the tail end of a batch job to commit after it has run, and we can't do integration testing that needs data to have been committed to Solr. We have tried all variations of the .commit method and the workaround posted at http://www.mail-archive.com/solr-dev@lucene.apache.org/msg12289.html Our solution was to hack the source code of Solr's SimplePostTool to create a utility that posts a <commit/> tag to Solr. http://stackoverflow.com/questions/6141417/solr-lucene-server-does-not-post-commit-message
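If anyone else hits this, the workaround doesn't strictly need the SimplePostTool source; a plain HTTP POST of the commit message is enough. A minimal stdlib-only sketch (the example URL in the comment is the stock example-install endpoint; adjust host and core for your installation):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CommitPoster {
    static final String COMMIT_BODY = "<commit/>";

    // Posts a raw <commit/> message to a Solr update handler, mirroring what
    // SimplePostTool does over plain HTTP. Returns the HTTP status code.
    static int postCommit(String updateUrl) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(updateUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(COMMIT_BODY.getBytes("UTF-8"));
        } finally {
            out.close();
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        // Pass your update URL to actually send the commit, e.g.:
        //   java CommitPoster http://localhost:8983/solr/update
        if (args.length > 0) {
            System.out.println("HTTP " + postCommit(args[0]));
        }
    }
}
```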
Re: very slow commits and overlapping commits
I managed to get a thread dump during a slow commit:

resin-tcp-connection-*:5062-129 Id=12721 in RUNNABLE total cpu time=391530.ms user time=390620.ms
    at java.lang.String.intern(Native Method)
    at org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:74)
    at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36)
    at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:356)
    at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
    at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:691)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:667)
    at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:956)
    at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:5207)
    at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:4370)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4209)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4200)
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:2195)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2158)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2122)
    at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:230)
    at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:181)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:48)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:70)
    at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:173)
    at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:229)
    at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:274)
    at com.caucho.server.port.TcpConnection.run(TcpConnection.java:511)
    at com.caucho.util.ThreadPool.runTasks(ThreadPool.java:520)
    at com.caucho.util.ThreadPool.run(ThreadPool.java:442)
    at java.lang.Thread.run(Thread.java:619)

It looks like Lucene's StringHelper hardcodes the max size of SimpleStringInterner's hash table to 1024, and I might be hitting that limit, causing actual calls to java.lang.String.intern(). I think I need to reduce the number of fields in my index. Are there any other things I can do to help in this case? Bill

On Wed, May 25, 2011 at 11:28 AM, Bill Au bill.w...@gmail.com wrote: I am taking a snapshot after every commit. From looking at the snapshots, it does not look like the delay is caused by segment merging, because I am not seeing any large new segments after a commit. I still can't figure out why there is a 2-minute gap between "start commit" and "SolrDeletionPolicy.onCommit". Will changing the deletion policy make any difference? I am using the default deletion policy now. Bill

2011/5/21 Erick Erickson erickerick...@gmail.com: Well, committing less often is a possibility <g>. Here's what's probably happening: when you pass certain thresholds, segments are merged, which can take quite some time. How are you triggering commits? If they're external, think about using autocommit instead.
Best, Erick On May 20, 2011 6:04 PM, Bill Au bill.w...@gmail.com wrote: On my Solr 1.4.1 master I am doing commits regularly at a fixed interval. I noticed that from time to time a commit will take longer than the commit interval, causing commits to overlap. Then things get worse, as commits take longer and longer. Here is the log for a long commit:

[2011-05-18 23:47:30.071] start commit(optimize=false,waitFlush=false,waitSearcher=false,expungeDeletes=false)
[2011-05-18 23:49:48.119] SolrDeletionPolicy.onCommit: commits:num=2
[2011-05-18 23:49:48.119] commit{dir=/var/opt/resin3/5062/solr/data/index,segFN=segments_5cpa,version=1247782702272,generation=249742,filenames=[_4dqu_2g.del, _4e66.tis, _4e3r.tis, _4e59.nrm, _4e68_1.del, _4e4n.prx, _4e4n.fnm, _4e67.fnm, _4e3r.frq, _4e3r.tii, _4e6d.fnm, _4e6c.prx, _4e68.fdx, _4e68.nrm, _4e6a.frq, _4e68.fdt, _4dqu.fnm, _4e4n.tii, _4e69.fdx, _4e69.fdt, _4e0e.nrm, _4e4n.tis, _4e6e.fnm, _4e3r.prx, _4e66.fnm, _4e3r.nrm, _4e0e.prx,
Re: Nested grouping/field collapsing
Thanks Mike, I've opened https://issues.apache.org/jira/browse/SOLR-2553 for this. It's exciting to hear a workable implementation might be possible! On Fri, May 27, 2011 at 6:23 PM, Michael McCandless luc...@mikemccandless.com wrote: Can you open a Lucene issue (against the new grouping module) for this? I think this is a compelling use case that we should try to support. [snip]
Custom Scoring relying on another server.
I know this question has been asked before, but I think my situation is a little different. Basically, I need to compute custom scores that the traditional function queries simply won't allow me to. I actually need to hit another server from Java, passing in a bunch of parameters that determine how each result is scored. So I want to extend the current scorer and add in what I need for the scoring (make a trip to the scoring server with a bunch of parameters and come back with the scores). Can someone point me in the right direction for doing this? Exactly where does document scoring happen in Solr? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Scoring-relying-on-another-server-tp2994546p2994546.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Spellcheck Phrases
Are there any updates on this? Any third-party apps that can make this work as expected?

On Wed, Feb 23, 2011 at 12:38 PM, Dyer, James james.d...@ingrambook.com wrote: Tanner, Currently Solr will only make suggestions for words that are not in the dictionary, unless you specify spellcheck.onlyMorePopular=true. However, if you do that, it will try to improve every word in your query, even the ones that are spelled correctly (so while it might change "brake" to "break", it might also change "leg" to "log"). You might be able to alleviate some of the pain by setting thresholdTokenFrequency so as to remove misspelled and rarely-used words from your dictionary, although I personally haven't been able to get this parameter to work. It also doesn't seem to be documented on the wiki, but it is in the 1.4.1 source code, in the class IndexBasedSpellChecker. It's also mentioned in Smiley & Pugh's book. I tried setting it like this, but got a ClassCastException on the float value:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spelling</str>
  <lst name="spellchecker">
    <str name="name">spellchecker</str>
    <str name="field">Spelling_Dictionary</str>
    <str name="fieldType">text_spelling</str>
    <str name="buildOnOptimize">true</str>
    <str name="thresholdTokenFrequency">.001</str>
  </lst>
</searchComponent>

I have it on my to-do list to look into this further but haven't yet. If you decide to try it and can get it to work, please let me know how you did it. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-----Original Message----- From: Tanner Postert [mailto:tanner.post...@gmail.com] Sent: Wednesday, February 23, 2011 12:53 PM To: solr-user@lucene.apache.org Subject: Spellcheck Phrases

Right now when I search for 'brake a leg', Solr returns valid results with no indication of misspelling, which is understandable since all of those terms are valid words and are probably found in a few pieces of our content.
My question is: is there any way for it to recognize that the phrase should be "break a leg" and not "brake a leg" and suggest the proper phrase?
Re: K-Stemmer for Solr 3.1
Where can one find the KStemmer source for 4.0? On 5/12/11 11:28 PM, Bernd Fehling wrote: I backported a Lucid KStemmer version from Solr 4.0 which I found somewhere. I just changed

import org.apache.lucene.analysis.util.CharArraySet;  // solr 4.0

to

import org.apache.lucene.analysis.CharArraySet;  // solr 3.1

Bernd

On 12.05.2011 16:32, Mark wrote: java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z Would you mind explaining your modifications? Thanks On 5/11/11 11:14 PM, Bernd Fehling wrote: On 12.05.2011 02:05, Mark wrote: It appears that the older version of the LucidWorks KStemmer is incompatible with Solr 3.1. Has anyone been able to get this to work? If not, what are you using as an alternative? Thanks

Lucid KStemmer works nicely with Solr 3.1 after some minor mods to KStemFilter.java and KStemFilterFactory.java. What problems do you have? Bernd
LucidWorks source
Is the LucidWorks source no longer available? In earlier versions their source code was available, but after the latest install I cannot seem to find it.
RE: solr Invalid Date in Date Math String/Invalid Date String
Thank you Mike. So I understand that now. But what about the other items that have values on both sides? They don't work at all.

-----Original Message----- From: Mike Sokolov [mailto:soko...@ifactory.com] Sent: May 27, 2011, 10:23 PM To: solr-user@lucene.apache.org Cc: alucard001 Subject: Re: solr Invalid Date in Date Math String/Invalid Date String

The * endpoint for range terms wasn't implemented yet in 1.4.1. As a workaround, we use very large and very small values. -Mike

On 05/27/2011 12:55 AM, alucard001 wrote: Hi all, I am using Solr 1.4.1 (according to solr info), but no matter which date field I use (date or tdate) defined in the default schema.xml, I cannot do a search in solr-admin analysis.jsp [snip] -- View this message in context: http://lucene.472066.n3.nabble.com/solr-Invalid-Date-in-Date-Math-String-Invalid-Date-String-tp2991763p2991763.html Sent from the Solr - User mailing list archive at Nabble.com.