FW: What is the format of data contained in a Named List?
Hi,

Thanks for your reply, but I need one clarification. When you say it will
contain the data you requested, do you mean the data as requested in the fl
parameter of the query?

Thanks,
Aman

-----Original Message-----
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul
Sent: Friday, July 17, 2009 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: What is the format of data contained in a Named List?

The contents of a SolrDocument are fixed: it will only contain the data
that you requested.

On Fri, Jul 17, 2009 at 10:36 AM, Kartik1 wrote:
>
> A named list contains key-value pairs. At the very basic level, if we want
> to access the data that is contained in a named list:
>
> NamedList<Object> foo = thisIsSolrQueryResponseObject.getValues();
> Entry<String, Object> bar = null;
> // Create an iterator to step through the response
> Iterator<Entry<String, Object>> it = foo.iterator();
> while (it.hasNext()) {
>   bar = it.next();
>   SolrDocumentList solDocLst = (SolrDocumentList) bar.getValue();
>   for (int k = 0; k < solDocLst.size(); k++) {
>     SolrDocument doc = solDocLst.get(k);
>     ...
>
> Now what will this SolrDocument contain? Will it contain all the values
> that match that particular record, or only some values? Is this the
> correct way to iterate through the response? I don't know Lucene and only
> a little bit of Solr.
>
> --
> View this message in context:
> http://www.nabble.com/What-is-the-format-of-data-contained-in-a-Named-List--tp24528649p24528649.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: What is the format of data contained in a Named List?
The contents of a SolrDocument are fixed: it will only contain the data
that you requested.

On Fri, Jul 17, 2009 at 10:36 AM, Kartik1 wrote:
>
> A named list contains key-value pairs. At the very basic level, if we want
> to access the data that is contained in a named list:
>
> NamedList<Object> foo = thisIsSolrQueryResponseObject.getValues();
> Entry<String, Object> bar = null;
> // Create an iterator to step through the response
> Iterator<Entry<String, Object>> it = foo.iterator();
> while (it.hasNext()) {
>   bar = it.next();
>   SolrDocumentList solDocLst = (SolrDocumentList) bar.getValue();
>   for (int k = 0; k < solDocLst.size(); k++) {
>     SolrDocument doc = solDocLst.get(k);
>     ...
>
> Now what will this SolrDocument contain? Will it contain all the values
> that match that particular record, or only some values? Is this the
> correct way to iterate through the response? I don't know Lucene and only
> a little bit of Solr.
>
> --
> View this message in context:
> http://www.nabble.com/What-is-the-format-of-data-contained-in-a-Named-List--tp24528649p24528649.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
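To tie this back to the fl question above: each SolrDocument holds exactly the fields named in fl (plus score, if requested), nothing else that may be stored for the document. A sketch, with field names and values borrowed from the stock Solr example data (illustrative only):

```text
# Ask for just two stored fields plus the score:
http://localhost:8983/solr/select?q=ipod&fl=id,name,score

# Each SolrDocument in the returned SolrDocumentList then contains
# only these three entries:
id    -> "F8V7067-APL-KIT"
name  -> "Belkin Mobile Power Cord for iPod"
score -> 1.234
```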
What is the format of data contained in a Named List?
A named list contains key-value pairs. At the very basic level, if we want
to access the data that is contained in a named list:

NamedList<Object> foo = thisIsSolrQueryResponseObject.getValues();
Entry<String, Object> bar = null;
// Create an iterator to step through the response
Iterator<Entry<String, Object>> it = foo.iterator();
while (it.hasNext()) {
  bar = it.next();
  SolrDocumentList solDocLst = (SolrDocumentList) bar.getValue();
  for (int k = 0; k < solDocLst.size(); k++) {
    SolrDocument doc = solDocLst.get(k);
    ...

Now what will this SolrDocument contain? Will it contain all the values
that match that particular record, or only some values? Is this the
correct way to iterate through the response? I don't know Lucene and only
a little bit of Solr.

--
View this message in context:
http://www.nabble.com/What-is-the-format-of-data-contained-in-a-Named-List--tp24528649p24528649.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: spellcheck with misspelled words in index
I think you can just tell the spellchecker to only supply "more popular"
suggestions (onlyMorePopular=true), which would naturally omit these rare
misspellings.

-Peter

On Wed, Jul 15, 2009 at 7:30 PM, Jay Hill wrote:
> We had the same thing to deal with recently, and a great solution was
> posted to the list. Create a stopwords filter on the field you're using
> for your spell checking, and then populate a custom stopwords file with
> known misspelled words: a field type with positionIncrementGap="100"
> whose analyzer includes a stop filter with words="misspelled_words.txt".
>
> Your spell field would then be declared on that type, with
> multiValued="true".
>
> Then add words like "cusine" to misspelled_words.txt
>
> -Jay
>
> On Tue, Jul 14, 2009 at 11:40 PM, Chris Williams wrote:
>
>> Hi,
>> I'm having some trouble getting the correct results from the
>> spellcheck component. I'd like to use it to suggest correct product
>> titles on our site; however, some of our products have misspellings in
>> them outside of our control. For example, there are 2 products with the
>> misspelled word "cusine" (and 25k with the correct spelling
>> "cuisine"). So if someone searches for the word "cusine" on our site,
>> I would like to show the 2 misspelled products, and a suggestion with
>> "Did you mean cuisine?".
>>
>> However, I can't seem to ever get any spelling suggestions when I
>> search for the word "cusine", and correctlySpelled is always true.
>> Misspelled words that don't appear in the index work fine.
>>
>> I noticed that setting onlyMorePopular to true will return suggestions
>> for the misspelled word, but I've found that it doesn't work great for
>> other words and produces suggestions too often for correctly spelled
>> words.
>>
>> I had incorrectly thought that by setting thresholdTokenFrequency
>> higher on my spelling dictionary these misspellings would not appear
>> in my spelling index, and thus I would get suggestions for them, but
>> as I see now, the spellcheck doesn't quite work like that.
>>
>> Is there any way to somehow get spelling suggestions to work for these
>> misspellings in my index if they have a low frequency?
>>
>> Thanks in advance,
>> Chris
>>

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
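The schema XML in Jay's message was stripped by the list archiver; a sketch of the kind of declaration he describes (the type name, field name, and tokenizer choice here are assumptions, not his exact config) might look like:

```xml
<!-- A spellcheck field type whose analyzer drops known misspellings -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- misspelled_words.txt holds words like "cusine" to exclude
         from the spelling dictionary -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="misspelled_words.txt"/>
  </analyzer>
</fieldType>

<field name="spell" type="textSpell" indexed="true" stored="true"
       multiValued="true"/>
```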
Re: Wikipedia or reuters like index for testing facets?
AWS provides some standard data sets, including an extract of all
wikipedia content:
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2345&categoryID=249

It looks like it's not being updated often, so this or another AWS data
set could be a consistent basis for benchmarking?

-Peter

On Wed, Jul 15, 2009 at 2:21 PM, Jason Rutherglen wrote:
> Yeah, that's what I was thinking of as an alternative: use enwiki and
> randomly generate facet data along with it. However, for consistent
> benchmarking the random data would need to stay the same, so that people
> could execute the same benchmark consistently in their own environment.
>
> On Tue, Jul 14, 2009 at 6:28 PM, Mark Miller wrote:
>> Why don't you just randomly generate the facet data? That's probably
>> the best way, right? You can control the uniques and ranges.
>>
>> On Wed, Jul 15, 2009 at 1:21 AM, Grant Ingersoll wrote:
>>
>>> Probably not as generated by the EnwikiDocMaker, but the
>>> WikipediaTokenizer in Lucene can pull out richer syntax which could
>>> then be Teed/Sinked to other fields: things like categories, related
>>> links, etc. Mostly, though, I was just commenting on the fact that it
>>> isn't hard to at least use it for getting docs into Solr.
>>>
>>> -Grant
>>>
>>> On Jul 14, 2009, at 7:38 PM, Jason Rutherglen wrote:
>>>
>>>> You think enwiki has enough data for faceting?
>>>>
>>>> On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersoll wrote:
>>>>> At a min, it is trivial to use the EnWikiDocMaker and then send the
>>>>> doc over SolrJ...
>>>>>
>>>>> On Jul 14, 2009, at 4:07 PM, Mark Miller wrote:
>>>>>
>>>>>> On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen <
>>>>>> jason.rutherg...@gmail.com> wrote:
>>>>>>
>>>>>>> Is there a standard index like what Lucene uses for
>>>>>>> contrib/benchmark for executing faceted queries over? Or maybe we
>>>>>>> can randomly generate one that works in conjunction with
>>>>>>> wikipedia? That way we can execute real-world queries against
>>>>>>> faceted data. Or we could use the Lucene/Solr mailing lists and
>>>>>>> other data (ala Lucid's faceted site) as a standard index?
>>>>>>
>>>>>> I don't think there is any standard set of docs for solr testing -
>>>>>> there is not a real benchmark contrib - though I know more than a
>>>>>> few of us have hacked up pieces of Lucene benchmark to work with
>>>>>> Solr - I think I've done it twice now ;)
>>>>>>
>>>>>> Would be nice to get things going. I was thinking the other day: I
>>>>>> wonder how hard it would be to make Lucene Benchmark generic
>>>>>> enough to accept Solr impls and Solr algs?
>>>>>>
>>>>>> It does a lot that would suck to duplicate.
>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>>
>>>>>> http://www.lucidimagination.com
>>>>>
>>>>> --
>>>>> Grant Ingersoll
>>>>> http://www.lucidimagination.com/
>>>>>
>>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>>>> using Solr/Lucene:
>>>>> http://www.lucidimagination.com/search
>>>
>>> --
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>> using Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
Re: Distributed search has problem for facet
Thanks for the bug report... this looks like an escaping bug. But it also
looks like it stems from a really weird field name:
  facet.field=authorname:
Shouldn't that be facet.field=authorname ?

-Yonik
http://www.lucidimagination.com

On Thu, Jul 16, 2009 at 8:14 PM, zehua wrote:
>
> I use two shards on two different machines. Here is my URL:
> http://machine1:8900/solr/select/?shards=machine1:8900/solr,machine2:8900/solr&q=body\::dell&start=0&rows=10&facet=true&facet.field=authorname
>
> This works great in 1.3.0. I just downloaded Solr 1.4 from trunk and it
> breaks. If I run the query against an individual node, the facet works
> great. If I run a distributed search across the two machines without the
> facet, it also works. The following is the error from the two machines.
>
> Is there some setting that differs between 1.3 and 1.4?
>
> Error on the first machine:
> INFO: [] webapp=/solr path=/select
> params={facet=true&fl=contentkey,score&start=0&q=body\::dell&f.authorname:.facet.limit=160&facet.field=authorname:&isShard=true&wt=javabin&fsv=true&rows=11&version=1}
> hits=69011 status=0 QTime=5715
> Jul 16, 2009 5:11:41 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
>     at org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:331)
>     at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:232)
>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>     at org.mortbay.jetty.Server.handle(Server.java:285)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>     at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>
> Jul 16, 2009 5:11:41 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select/
> params={facet=true&shards=dttest09:8900/solr,dttest10:8900/solr&start=0&q=body\::dell&facet.field=authorname:&rows=11}
> status=500 QTime=6082
> Jul 16, 2009 5:11:41 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
>     at org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:331)
>     at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:232)
>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>     at org.mortbay.jetty.handler.ContextHandlerCol
can i use solr to do this
Hi,

Every Solr document I have has a creation date, which is the default
timestamp "NOW". What I would like to know is how I can have facets like
the following:

Past 24 Hours (3)
Past 7 days (23)
Past 15 days (33)
Past 30 days (59)

Is this possible? I.e., a range query as a facet?

Regards
Anton
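This is possible with one facet.query per window, using Solr date math; the field name "timestamp" and the exact windows below are assumptions matching the question:

```text
q=*:*&facet=true
&facet.query=timestamp:[NOW-1DAY TO NOW]
&facet.query=timestamp:[NOW-7DAYS TO NOW]
&facet.query=timestamp:[NOW-15DAYS TO NOW]
&facet.query=timestamp:[NOW-30DAYS TO NOW]
```

Each facet.query comes back under facet_counts/facet_queries with its own count, which maps directly onto the "Past N days (count)" display above.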
Distributed search has problem for facet
I use two shards on two different machines. Here is my URL:
http://machine1:8900/solr/select/?shards=machine1:8900/solr,machine2:8900/solr&q=body\::dell&start=0&rows=10&facet=true&facet.field=authorname

This works great in 1.3.0. I just downloaded Solr 1.4 from trunk and it
breaks. If I run the query against an individual node, the facet works
great. If I run a distributed search across the two machines without the
facet, it also works. The following is the error from the two machines.

Is there some setting that differs between 1.3 and 1.4?

Error on the first machine:

INFO: [] webapp=/solr path=/select
params={facet=true&fl=contentkey,score&start=0&q=body\::dell&f.authorname:.facet.limit=160&facet.field=authorname:&isShard=true&wt=javabin&fsv=true&rows=11&version=1}
hits=69011 status=0 QTime=5715
Jul 16, 2009 5:11:41 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:331)
    at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:232)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Jul 16, 2009 5:11:41 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select/
params={facet=true&shards=dttest09:8900/solr,dttest10:8900/solr&start=0&q=body\::dell&facet.field=authorname:&rows=11}
status=500 QTime=6082
Jul 16, 2009 5:11:41 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:331)
    at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:232)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnect
Re: Lock timed out 2 worker running
This is really odd. Just to clarify...

1) You are running a normal Solr installation (in a servlet container) and
   using SolrJ to send updates to Solr from another application, correct?
2) Do you have any special custom plugins running?
3) Do you have any other apps that might be attempting to access the index
   directly?
4) What OS are you using? ... What type of filesystem? (Local disk or some
   shared network drive?)
5) Are these errors appearing after Solr crashes and you restart it?
6) What version of Solr are you using?

No matter how many worker threads you have, there should only be one
IndexWriter using the index/lockfile from Solr ... so this error should
really never happen in normal usage.

: Jul 10, 2009 4:01:55 AM org.apache.solr.common.SolrException log
: SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
: SimpleFSLock@/projects/msim/indexdata/data/index/lucene-0614ba206dd0e0871ca4eecf8f2e853a-write.lock
:     at org.apache.lucene.store.Lock.obtain(Lock.java:85)
:     at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1140)
:     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
:     at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
:     at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
:     at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:167)
:     at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:221)
:     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
:     at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
:     at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
:     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
:     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
:     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
:     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
:     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
:     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
:     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
:     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
:     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
:     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
:     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
:     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:542)
:     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
:     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
:     at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
:     at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
:     at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
:     at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
:     at java.lang.Thread.run(Thread.java:619)

-Hoss
Re: Change in DocListAndSetNC not messing everything
Hey Hoss, thanks for answering this concrete question. I actually realized
that maybe I was not clear enough in my explanation, so I just posted it
again in another thread, trying to give more detail and including the code
of my hack in getDocListAndSetNC:

http://www.nabble.com/Custom-funcionality-in-SolrIndexSearcher-td24475706.html

Any comment or advice is more than welcome.

Thanks in advance

: For testing, what I have done is make some hacks to SolrIndexSearcher's
: getDocListAndSetNC function. I fill the ids array in my own order, or I
: just don't add some doc ids (and so change the ids array size). I have
: been testing it, and the performance is dramatically better than using
: the patch.
: Can anyone tell me which is the best way to hack getDocListAndSetNC? I
: mean, I know this change can make me go mad in the future, when I decide
: to update to the trunk version or to new releases.
: My hack is probably too specific to my use case, but I could upload the
: source in case someone can advise me what to do.
: Thanks in advance,

If you can post the changes you found useful for your specific case, it
might help people spot possible ways to refactor the SolrIndexSearcher
code so that similar types of extension would be simpler (without custom
patching).

-Hoss

--
View this message in context:
http://www.nabble.com/Change-in-DocListAndSetNC-not-messing-everything-tp24387830p24525598.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Create incremental snapshot
: Thanks for the reply Asif. We have already tried removing the
: optimization step. Unfortunately, the commit command alone is also
: causing identical behaviour. Is there anything else that we are missing?

The hardlinking behavior of snapshots is based on the files in the index
directory, and the files in the index directory are based on the current
segments of your index -- so if you make enough changes to your index to
cause all of the segments to change, every snapshot will be different.

Optimizing guarantees that every segment will be different (because all
the old segments are gone, and a new segment is created), but if your
merge settings are really aggressive, then it's equally possible that some
number of delete/add calls will also cause every segment to be replaced.

Without your configs, and directory listings of subsequent snapshots, it's
hard to guess what the problem might be (if you already stopped optimizing
on every batch). But I think we have an XY problem here...

: >> This process continues for around 160,000 documents i.e. 800 times
: >> and by the end of it we have 800 snapshots.

Why do you keep 800 snapshots? You really only need to keep a snapshot
around long enough to ensure that no slave is snappulling it in the middle
of deleting it ... unless you have some really funky use case where you
want some of your query boxes to deliberately fetch old versions of the
index, you don't really need more than a couple of snapshots at one time.

It can be prudent to keep more snapshots than you "need" around in case of
logical index corruption (ie: someone foolishly deletes a bunch of docs
they shouldn't have), because snapshots are *usually* more disk-space
efficient than full backup copies -- but if you are finding that that's
not the case, why bother keeping them?

-Hoss
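A small self-contained sketch of the hardlink mechanics Hoss describes (directory and file names here are made up; Solr's snapshooter script does the equivalent with cp -lr over the real index directory):

```shell
# Hardlinked snapshots are cheap because linked files share storage: a new
# snapshot only consumes extra disk for segment files that actually changed.
tmp=$(mktemp -d)
mkdir "$tmp/index" "$tmp/snapshot.20090717"
echo "segment data" > "$tmp/index/_0.fdt"
# A hard link, not a copy of the bytes:
ln "$tmp/index/_0.fdt" "$tmp/snapshot.20090717/_0.fdt"
# Both directory entries now point at the same inode; link count is 2:
stat -c %h "$tmp/index/_0.fdt"
```

If every segment is rewritten between snapshots (as after an optimize), no files can be shared this way, and each snapshot costs a full copy of the index.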
Re: boosting MoreLikeThis results
: I have a need to boost MoreLikeThis results. So far the only boost that
: I have come across is boosting a query field by using mlt.qf. But what I
: really need is to use a boost query and boost function like those in the
: DisMaxRequestHandler. Is that possible at all, either out-of-the-box or
: by writing a custom handler, or are we limited by the MoreLikeThis of
: Lucene?

Lucene's MLT just builds a query; augmenting that query with additional
boost/function queries should be straightforward.

As I mentioned in another thread: we should really refactor
MoreLikeThisHandler into a MoreLikeThisQParserPlugin ... that way you
could mix and match it with other boosts using the "query" function type.

-Hoss
Re: Boosting for most recent documents
: Does anyone know if Solr supports sorting by internal document ids,
: i.e., like Sort.INDEXORDER in Lucene? If so, how?

It does not. In Solr, the decision to make "score desc" the default sort
meant there is no way to request simple docId ordering.

: Also, if anyone has any insight on whether function queries load up
: unique terms (like field sorts) in memory or not.

They use the exact same FieldCache as sorting.

-Hoss
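While index-order sorting isn't available, recency itself is easy to get at: either sort on a date field outright, or fold recency into the relevance score with a function query. The field name "timestamp" below is an assumption; the recip() form is the one suggested in the Solr relevancy FAQ:

```text
# Hard sort, newest first:
q=ipod&sort=timestamp desc

# Or blend recency into the score via the boost query parser;
# 3.16e-11 is roughly 1/(milliseconds in a year), so a year-old doc
# scores about half of a brand-new one:
q={!boost b=recip(ms(NOW,timestamp),3.16e-11,1,1)}ipod
```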
Re: Change in DocListAndSetNC not messing everything
: For testing, what I have done is make some hacks to SolrIndexSearcher's
: getDocListAndSetNC function. I fill the ids array in my own order, or I
: just don't add some doc ids (and so change the ids array size). I have
: been testing it, and the performance is dramatically better than using
: the patch.
: Can anyone tell me which is the best way to hack getDocListAndSetNC? I
: mean, I know this change can make me go mad in the future, when I decide
: to update to the trunk version or to new releases.
: My hack is probably too specific to my use case, but I could upload the
: source in case someone can advise me what to do.
: Thanks in advance,

If you can post the changes you found useful for your specific case, it
might help people spot possible ways to refactor the SolrIndexSearcher
code so that similar types of extension would be simpler (without custom
patching).

-Hoss
Re: Getting Facet Count of combination of fields
An interesting analogy for this feature is that you're doing a count(*)
on a GROUP BY in SQL. While it's true that you can pre-compute these if
you have a small set of combinations you know you want to show a priori,
if you want to present a more dynamic customer experience, you need to be
able to run these on arbitrary combinations of fields at query time.

I would like to see support for aggregating functions beyond just a
count. Here's a sample list of what's available in MySQL:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html

Looking at the patch in SOLR-792, it doesn't look terribly difficult to
add function-level abstraction and to go beyond 2 levels. I have some
code that does this, but it's implemented as an ant task and wired to a
CSV parser. I'll see what it would take to refactor it...

What would you imagine the query looking like if functions were added?

http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=cat,inStock{count},price{min,average,max}&wt=json&indent=on

It also gets very interesting when you want to start filtering, sorting,
and managing response sizes on the facet values, such as this quickly
contrived example:

facet.tree=cat,inStock{return the first 10 counts greater than 5 descending sorted}

Brian

From: ashokcz
To: solr-user@lucene.apache.org
Sent: Thursday, July 16, 2009 3:22:48 AM
Subject: Re: Getting Facet Count of combination of fields

hmmm, but in my case it will be dynamic. They may choose different fields
at run time, and accordingly I need to populate the values ...

Avlesh Singh wrote:
>
> If you create a field called "brand_year_of_manufacturing" and populate
> it with the "brandName - YOM" data while indexing, you can achieve the
> desired result with a simple facet on this field.
>
> Cheers
> Avlesh
>
> On Thu, Jul 16, 2009 at 1:19 PM, ashokcz wrote:
>
>> Hi all,
>> I have a scenario where I need to get facet counts for a combination
>> of fields.
>> Say I have two fields, Manufacturer and Year of manufacture.
>> I search for something and it gives me 15 results, with facet counts
>> like this:
>> Manufacturer: Nokia (5); Motorola (7); iphone (3)
>> Year of manufacture: 2007 (4); 2008 (4); 2009 (7)
>> But what I need is the count for each combination, say:
>> Nokia - 2007 - 1
>> Nokia - 2008 - 1
>> Nokia - 2009 - 2
>>
>> Something like this.
>>
>> Is there any way we can get this kind of facet count from a single
>> solr search?
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Getting-Facet-Count-of-combination-of-fields-tp24511923p24511923.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context:
http://www.nabble.com/Getting-Facet-Count-of-combination-of-fields-tp24511923p24513837.html
Sent from the Solr - User mailing list archive at Nabble.com.
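Until something like the SOLR-792 tree faceting lands, arbitrary combinations can already be counted today with one facet.query per pair (the field names below are illustrative, and in a real URL the spaces and colons would need to be URL-encoded):

```text
q=*:*&rows=0&facet=true
&facet.query=manufacturer:Nokia AND year_of_manufacture:2007
&facet.query=manufacturer:Nokia AND year_of_manufacture:2008
&facet.query=manufacturer:Nokia AND year_of_manufacture:2009
```

Each facet.query is returned with its own count under facet_counts/facet_queries in the same response as the main search, so one request covers all the combinations you ask for -- though the client has to enumerate them up front, which is exactly the limitation tree faceting would remove.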
Re: posting binary file and metadata in two separate documents
: Subject: posting binary file and metadata in two separate documents

There was some discussion a while back about the fact that you can push
multiple "ContentStreams" to Solr in a single request, and while the
existing handlers all just iterate over and process them separately, it
would be *possible* for a variant of the ExtractingRequestHandler to use
the first stream to get document metadata, and have that metadata
reference the other streams in some way (for large chunks of text). But
no one has attempted to implement that as far as I know.

: "http://localhost:8983/solr/update/extract?ext.literal.id=2&ext.literal.some_code1=code1&ext.literal.some_code2=code2&ext.idx.attr=true\&ext.def.fl=text";
: -F "myfi...@myfile.pdf"
:
: Where I have large numbers of ext.literal params this becomes a bit of a
: chore.. and it would be the same case in an html form with many params...
: can I pass both files to '/update/extract' as documents (files) linked
: together? Or are there any other options like this? Perhaps something I
: can do with Solrj.

There's no reason those params have to be in the URL. You can do a
multipart POST with application/x-www-form-urlencoded params in one part
and your pdf file in another part (just like doing a POST from a massive
HTML form with an '<input type="file">' option).

-Hoss
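Concretely, Hoss's suggestion of moving the literals out of the URL and into the multipart body could look like this with curl (the id/field values are taken from the earlier example; whether the handler reads every param from form parts may depend on the Solr version, so treat this as a sketch):

```text
curl "http://localhost:8983/solr/update/extract" \
     -F "ext.literal.id=2" \
     -F "ext.literal.some_code1=code1" \
     -F "ext.literal.some_code2=code2" \
     -F "ext.idx.attr=true" \
     -F "ext.def.fl=text" \
     -F "myfile=@myfile.pdf"
```

Each -F part is sent as a form field in the POST body, so nothing needs URL escaping and any number of ext.literal params can be added without the URL becoming unmanageable.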
Re: Filtering MoreLikeThis results
: At least in trunk, if you request: : http://localhost:8084/solr/core_A/mlt?q=id:7468365&fq=price:[100 TO 200] : It will filter the MoreLikeThis results I think a big part of the confusion people have about this is the distinction between the MLT RequestHandler and the MLT SearchComponent. In the MLT Handler, the "primary" result set is the list of similar documents, so params like fq influence that list of related documents. In the MLT Component, QueryComponent is generating the "primary" result (and using the fq), and MLT is just finding similar documents (perhaps for some secondary navigation in the UI). Anyone interested in taking a stab at updating the docs to try and explain this?... http://wiki.apache.org/solr/MoreLikeThis http://wiki.apache.org/solr/MoreLikeThisHandler (I think in an ideal world, the MLT Handler would be refactored and replaced with a MLTQParserPlugin, because ultimately that's the only thing special about it: it's "parsing" a bunch of input options to produce a Query) -Hoss
Re: Problems Issuing Parallel Queries with SolrJ
Actually, it's obvious that the second case wouldn't work after looking at SimpleHttpConnectionManager. So my question boils down to being able to use a single CommonsHttpSolrServer in a multithreaded fashion. danben wrote: > > I have a running Solr (1.3) server that I want to query with SolrJ, and > I'm running a benchmark that uses a pool of 10 threads to issue 1000 > random queries to the server. Each query executes 7 searches in parallel. > > My first attempt was to use a single instance of CommonsHttpSolrServer, > using the default MultiThreadedHttpConnectionManager, but (as mentioned in > SOLR-861), I quickly ran out of memory as every created thread blocked > indefinitely on MultiThreadedHttpConnectionManager. > > Then I tried creating a pool of CommonsHttpSolrServer in which each > SolrServer receives a newly-instantiated SimpleHttpConnectionManager, but > execution of my test resulted in the following: > > Caused by: java.lang.IllegalStateException: Unexpected release of an > unknown connection. > at > org.apache.commons.httpclient.SimpleHttpConnectionManager.releaseConnection(SimpleHttpConnectionManager.java:225) > at > org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179) > at > org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430) > at > org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1186) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:394) > > Looking into the httpclient code, I can see that this exception is only > thrown when the connection manager attempts to release an HttpConnection > that it is not currently referencing, but since I instantiate connection > managers on a per-thread basis I'm not sure what would cause that. > > I assume that SolrJ must be used by someone to execute parallel queries; > is there something obvious (or not) that I'm missing? 
> -- View this message in context: http://www.nabble.com/Problems-Issuing-Parallel-Queries-with-SolrJ-tp24522927p24522973.html Sent from the Solr - User mailing list archive at Nabble.com.
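Until the SimpleHttpConnectionManager behavior is sorted out, the usual pattern is the first one tried above: share a single thread-safe server instance across threads (and cap the connection manager's max connections so callers queue rather than exhaust memory). A runnable sketch of the sharing pattern follows; the `StubSolrClient` is a hypothetical stand-in for CommonsHttpSolrServer, since running the real one needs a live Solr instance:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedClientDemo {
    // Hypothetical stand-in for a thread-safe CommonsHttpSolrServer.
    static class StubSolrClient {
        String query(String q) { return "results-for:" + q; }
    }

    // One shared client instance, a bounded thread pool: threads never
    // create their own connection managers, so nothing is over-allocated.
    public static List<String> runQueries(List<String> queries, int threads) throws Exception {
        final StubSolrClient client = new StubSolrClient();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> futures = new ArrayList<Future<String>>();
        for (final String q : queries) {
            futures.add(pool.submit(new Callable<String>() {
                public String call() { return client.query(q); }
            }));
        }
        List<String> out = new ArrayList<String>();
        for (Future<String> f : futures) out.add(f.get());
        pool.shutdown();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runQueries(java.util.Arrays.asList("a", "b"), 2));
    }
}
```

With the real client, the equivalent knobs would be MultiThreadedHttpConnectionManager's max-connections settings; treat the exact parameter names as something to verify against the commons-httpclient docs.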
Problems Issuing Parallel Queries with SolrJ
I have a running Solr (1.3) server that I want to query with SolrJ, and I'm running a benchmark that uses a pool of 10 threads to issue 1000 random queries to the server. Each query executes 7 searches in parallel. My first attempt was to use a single instance of CommonsHttpSolrServer, using the default MultiThreadedHttpConnectionManager, but (as mentioned in SOLR-861), I quickly ran out of memory as every created thread blocked indefinitely on MultiThreadedHttpConnectionManager. Then I tried creating a pool of CommonsHttpSolrServer in which each SolrServer receives a newly-instantiated SimpleHttpConnectionManager, but execution of my test resulted in the following: Caused by: java.lang.IllegalStateException: Unexpected release of an unknown connection. at org.apache.commons.httpclient.SimpleHttpConnectionManager.releaseConnection(SimpleHttpConnectionManager.java:225) at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179) at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430) at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1186) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:394) Looking into the httpclient code, I can see that this exception is only thrown when the connection manager attempts to release an HttpConnection that it is not currently referencing, but since I instantiate connection managers on a per-thread basis I'm not sure what would cause that. I assume that SolrJ must be used by someone to execute parallel queries; is there something obvious (or not) that I'm missing? -- View this message in context: http://www.nabble.com/Problems-Issuing-Parallel-Queries-with-SolrJ-tp24522927p24522927.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlight arbitrary text
Interesting. Many sites don't store text in Lucene/Solr and so need a way to highlight text stored in a database (or some equivalent). They have two options: re-analyze the doc for the term positions, or access the term vectors from Solr and hand them to the client, which then performs the highlighting. Maybe a new highlighter server component that can live on dedicated document servers is a good idea? Distributed search could include the highlighting server (including accessing the term vectors) in its multi-server search request. This could reduce the sometimes heavy load highlighting causes on slave servers. On Thu, Jul 16, 2009 at 7:56 AM, Erik Hatcher wrote: > > On Jul 16, 2009, at 7:41 AM, Shalin Shekhar Mangar wrote: > >> On Thu, Jul 16, 2009 at 4:52 PM, Anders Melchiorsen >> wrote: >> >>> >>> What we want to do is to have an extra text highlighted by the Solr >>> highlighter. That text should never be stored in the Solr index, but >>> rather >>> be provided in an HTTP request along with the search query. >>> >>> Is this possible? >>> >> >> I don't think it is possible currently but I see how it can be useful. Can >> you please open a jira issue so we don't forget about it? > > One trick worth noting is the FieldAnalysisRequestHandler can provide > offsets from external text, which could be used for client-side highlighting > (see the showmatch parameter too). > > Erik > >
Re: Any benefit from compressed object pointers? (java6u14)
I am going to do some (large scale) indexing tests using Lucene & will post to both this and the Lucene list. More info on compressed pointers: http://wikis.sun.com/display/HotSpotInternals/CompressedOops -Glen Newton http://zzzoot.blogspot.com/search?q=lucene 2009/7/16 Kevin Peterson : > I noticed that Ubuntu pushed java 6 update 14 as an update to 9.04 today. > This update includes compressed object pointers which are designed to reduce > memory requirements with 64bit JVMs. > > Has anyone experimented with this to see if it provides any benefit to Solr? > > If not, can anyone comment on whether they would expect it provide a > significant benefit? I don't know enough about how the internal caches are > structured to say anything. > > Our servers are running on 8.04 LTS, so upgrading would be a bit of a chore. > I'm hoping that someone can say "it's not worth it" before we try it > ourselves. > -- -
Any benefit from compressed object pointers? (java6u14)
I noticed that Ubuntu pushed java 6 update 14 as an update to 9.04 today. This update includes compressed object pointers which are designed to reduce memory requirements with 64bit JVMs. Has anyone experimented with this to see if it provides any benefit to Solr? If not, can anyone comment on whether they would expect it provide a significant benefit? I don't know enough about how the internal caches are structured to say anything. Our servers are running on 8.04 LTS, so upgrading would be a bit of a chore. I'm hoping that someone can say "it's not worth it" before we try it ourselves.
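For anyone wanting to try it: compressed pointers are not on by default in 6u14 and have to be enabled explicitly (flag name per the HotSpot docs; the heap size below is illustrative, and the feature only applies to 64-bit heaps small enough to benefit):

```shell
# Enable compressed ordinary object pointers on a 64-bit JVM (6u14+);
# start.jar is Solr's bundled Jetty launcher.
java -XX:+UseCompressedOops -Xmx4g -jar start.jar
```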
Re: Solr 1.4 Release Date
Agreed! We are pushing towards it - one of the hold-ups is that Lucene 2.9 is about to release, so we are waiting for that. We really need to prune down the JIRA list though. A few have been tackling it, but many of the issues are still up in the air. I think once Lucene 2.9 releases though, Solr 1.4 will shortly follow one way or another. Lucene 2.9 is right on the verge - only a handful of pretty much finished issues to resolve. On Thu, Jul 16, 2009 at 4:05 PM, Daniel Alheiros wrote: > Come on it's time to cut this release, folks! I'm just waiting for that > since it was forecasted for early summer. :) > > Cheers > > -Original Message- > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] > Sent: 15 July 2009 02:18 > To: solr-user@lucene.apache.org > Subject: Re: Solr 1.4 Release Date > > > I just looked at SOLR JIRA today and saw some 40 open issues marked for > 1.4, so > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message > > From: pof > > To: solr-user@lucene.apache.org > > Sent: Tuesday, July 14, 2009 12:37:33 AM > > Subject: Re: Solr 1.4 Release Date > > > > > > Any updates on this? > > > > Cheers. > > > > Gurjot Singh wrote: > > > > > > Hi, I am curious to know when is the scheduled/tentative release > > > date of Solr 1.4. > > > > > > Thanks, > > > Gurjot > > > > > > > > > > -- > > View this message in context: > > http://www.nabble.com/Solr-1.4-Release-Date-tp23260381p24473570.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > http://www.bbc.co.uk/ > This e-mail (and any attachments) is confidential and may contain personal > views which are not the views of the BBC unless specifically stated. > If you have received it in error, please delete it from your system. > Do not use, copy or disclose the information in any way nor act in reliance > on it and notify the sender immediately. > Please note that the BBC monitors e-mails sent or received.
> Further communication will signify your consent to this. > > -- -- - Mark http://www.lucidimagination.com
Re: Multivalued fields and scoring/sorting
Assuming that you know the unique ID when constructing the query (which it sounds like you do) why not try a boost query with a high boost for 2 and a lower boost for 1 - then the default sort by score should match your desired ordering, and this order can be further tweaked with other bf or bq arguments. -Peter On Thu, Jul 16, 2009 at 9:15 AM, Matt Schraeder wrote: > The first number is a unique ID that points to a particular customer, > the second is a value. It basically tells us whether or not a customer > already has that product or not. The main use of it is to be able to > search our product listing for products the customer does not already > have. > > The alternative would be to put that in a second index, but that would > mean that I would be doing two searches for every single search I want > to complete, which I am not sure would be a very good option. > avl...@gmail.com 7/16/2009 12:04:53 AM >>> > > The harsh reality of life is that you cannot sort on multivalued > fields. > If you can explain your domain problem (the significance of numbers > "818", > "2" etc), maybe people can come up with an alternate index design which > fits > into your use cases. > > Cheers > Avlesh > > On Thu, Jul 16, 2009 at 1:18 AM, Matt Schraeder > wrote: > >> I am trying to come up with a way to sort (or score, and sort based > on >> the score) of a multivalued field. I was looking at FunctionQueries > and >> saw fieldvalue, but as that only works on single valued fields that >> doesn't help me. >> >> The field is as follows: >> >> > sortMissingLast="true" omitNorms="true"> >> >> >> >> >> >> >> >> >> >> >> > multiValued="true" /> >> >> The actual data that gets put in this field is a string consisting of > a >> number, a space, and a 1 or a 2. 
For example: >> >> "818 2" >> "818 1" >> "950 1" >> "1022 2" >> >> I want to be able to give my search results given a boost if a >> particular document contains "818 2" and a smaller boost if the > document >> contains "818 1" but not "818 2". >> >> The end result would be documents sorted as follows: >> >> 1) Documents with "818 2" >> 2) Documents with "818 1" but not "818 2" >> 3) Documents that contain neither "818 2" nor "818 1" >> >> Is this possible with solr? How would I go about doing this? >> > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
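As a concrete sketch of the boost-query suggestion, dismax request parameters along these lines would do it (the field name `customer_products` is a placeholder for whatever the multivalued field is actually called in the schema, and the boost values are arbitrary):

```
defType=dismax
q=<user query>
bq=customer_products:"818 2"^10 customer_products:"818 1"^3
```

Documents matching "818 2" get the biggest bump, "818 1" a smaller one, and everything else sorts below them by plain relevance score.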
Re: Word frequency count in the index
I haven't researched old versions of Lucene, but I think it has always been a vector space, tf.idf engine. I don't see any hint of probabilistic scoring. A bit of background about stop words and idf. They are two versions of the same thing. Stop words are a manual, on/off decision about what words are important. That decision is high risk and easy to get wrong. We have a movie titled "To be and to have". Oops. Inverse document frequency (idf) replaces that on/off control with a proportional weight calculated from the index. For Netflix, that means that "weeds: season 2" has a high weight for "weeds" and lower weights for "season" and "2". In my control theory course, my professor told me to only use proportional control when on/off didn't work. Well, stop words don't work and idf does. For a longer list of movie titles entirely made of stop words, go here: http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html wunder On 7/16/09 8:50 AM, "Daniel Alheiros" wrote: > Hi Walter, > > Has it always been there? Which version of Lucene are we talking about? > > Regards, > Daniel > > -Original Message- > From: Walter Underwood [mailto:wunderw...@netflix.com] > Sent: 16 July 2009 15:04 > To: solr-user@lucene.apache.org > Subject: Re: Word frequency count in the index > > Lucene uses a tf.idf relevance formula, so it automatically finds common > words (stop words) in your documents and gives them lower weight. I > recommend not removing stop words at all and letting Lucene handle the > weighting. > > wunder > > On 7/16/09 3:29 AM, "Pooja Verlani" wrote: > >> Hi, >> >> Is there any way in SOLR to know the count of each word indexed in the > >> solr ? >> I want to find out the different word frequencies to figure out ' >> application specific stop words'. >> >> Please let me know if its possible. 
>> >> Thank you, >> Regards, >> Pooja > > > http://www.bbc.co.uk/ > This e-mail (and any attachments) is confidential and may contain personal > views which are not the views of the BBC unless specifically stated. > If you have received it in error, please delete it from your system. > Do not use, copy or disclose the information in any way nor act in reliance on > it and notify the sender immediately. > Please note that the BBC monitors e-mails sent or received. > Further communication will signify your consent to this. >
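To make the proportional-control point concrete, this is roughly the idf formula Lucene's DefaultSimilarity used at the time (a sketch; check the Similarity javadoc for the exact formula in your version). A term that appears in nearly every document gets a weight near 1, while a rare term gets a much larger one:

```java
public class IdfSketch {
    // Classic Lucene DefaultSimilarity idf: log(numDocs / (docFreq + 1)) + 1
    public static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int n = 1000000;
        // a stop-word-like term in 90% of the docs is barely weighted...
        System.out.println(idf(900000, n));
        // ...while a rare term gets roughly ten times that weight
        System.out.println(idf(50, n));
    }
}
```

This is exactly the "proportional control" replacing the on/off stop-word decision: no term is thrown away, common terms just contribute very little.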
Re: DefaultSearchField ? "important"
On Thu, Jul 16, 2009 at 12:33 AM, Erik Hatcher wrote: > > On Jul 15, 2009, at 2:59 PM, Mani Kumar wrote: > >> @mark, @otis: >> > > Can I answer too? :) > you're welcome :) ... thanks > > > yeah copying all the fields to one text field will work but what if i want >> to assign specific weightage to specific fields? >> >> e.g. i have three fields >> >> 1) title >> 2) tags >> 3) description >> >> i copied all of them to a new field called "all_text". >> >> now i want to search in all the fields with weightage assigned to title^4, >> tags^2, description^1 >> >> how will it work then? >> > > What you want, then, is the dismax query parser. > &defType=dismax&qf=title^5.0 tags^2.0 description^1.0 kinda thing. It > spreads query terms across multiple fields with field weights individually > controllable. - yes i am aware of the dismax query parser ... but was just wondering if it can be done using a single text field... but my other question is still unanswered: how is the weight applied? is it a multiplier on the scores of terms in those fields ... e.g. terms in title * 4, terms in tags * 2 let's say i wanted to give terms in title twice the importance of tags. so then shall i use title^2 tags^1?? or something else? > >Erik > > thanks! mani
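On the "how is the weight applied" question: the field boost multiplies that field's term score, and dismax then combines the per-field scores with a DisjunctionMaxQuery — roughly the best field's score plus a tie-breaker times the rest. A sketch of just that combination step (not Lucene's full scoring, which also includes tf, idf, and norms):

```java
public class DismaxScoreSketch {
    // DisjunctionMaxQuery-style combination: the best field score wins,
    // plus `tie` times the scores of the other matching fields.
    // Each per-field score is assumed already multiplied by its boost.
    public static double combine(double[] boostedFieldScores, double tie) {
        double max = 0.0, sum = 0.0;
        for (double s : boostedFieldScores) {
            if (s > max) max = s;
            sum += s;
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // raw per-field score 0.5 in both title and tags; boosts 4 and 2
        double[] scores = { 0.5 * 4, 0.5 * 2 };
        // with tie=0 only the best (boosted) field counts
        System.out.println(combine(scores, 0.0));
    }
}
```

So title^2 tags^1 does give title matches twice the weight of tag matches, all else being equal — "all else" being the tf/idf/norm factors the sketch leaves out.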
RE: Solr 1.4 Release Date
Come on it's time to cut this release, folks! I'm just waiting for that since it was forecasted for early summer. :) Cheers -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: 15 July 2009 02:18 To: solr-user@lucene.apache.org Subject: Re: Solr 1.4 Release Date I just looked at SOLR JIRA today and saw some 40 open issues marked for 1.4, so Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: pof > To: solr-user@lucene.apache.org > Sent: Tuesday, July 14, 2009 12:37:33 AM > Subject: Re: Solr 1.4 Release Date > > > Any updates on this? > > Cheers. > > Gurjot Singh wrote: > > > > Hi, I am curious to know when is the scheduled/tentative release > > date of Solr 1.4. > > > > Thanks, > > Gurjot > > > > > > -- > View this message in context: > http://www.nabble.com/Solr-1.4-Release-Date-tp23260381p24473570.html > Sent from the Solr - User mailing list archive at Nabble.com. http://www.bbc.co.uk/ This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
RE: How to filter old revisions
Hi Are you ever going to search for earlier revisions or only the latest? If in your use cases you need the latest, just replace earlier revisions with the latest in your index. Regards, Daniel -Original Message- From: Reza Safari [mailto:r.saf...@lukkien.com] Sent: 15 July 2009 12:22 To: solr-user@lucene.apache.org Subject: Re: How to filter old revisions Revision is a field. Sorting is not an option because then I sort all documents! I want to filter a subset of documents with the same root version number (field) and get only one document of the subset with the highest revision number. In other words the root revision number of all documents retrieved by search must be unique. E.g. if these are all my documents: (id=10, revision=1, root_revision_id=10), (id=11, revision=2, root_revision_id=10), (id=12, revision=1, root_revision_id=12). Search must return only documents with id=11 and id=12 (the document with id=10 must be ignored because 11 has the same root_revision_id as the document with id=10 and the revision number of the document with id=11 is > the revision number of the document with id=10) Gr, Reza On Jul 15, 2009, at 11:52 AM, Shalin Shekhar Mangar wrote: > On Wed, Jul 15, 2009 at 3:19 PM, Reza Safari > wrote: > >> Hi, >> >> How is it possible to search for max values e.g. >> >> doc1 has revision number 1 >> doc2 has revision number 2 >> doc3 has revision number 3 >> >> doc1, doc2 and doc3 have all same root revision id e.g. 1 >> >> I want search result with doc's with only highest revision number? >> > > What is "revision"? Is it a field in your Solr document? > > There's no way to find max but you can always sort descending by the > revision field and take the first. > > -- > Regards, > Shalin Shekhar Mangar. -- Reza Safari LUKKIEN Copernicuslaan 15 6716 BM Ede The Netherlands - http://www.lukkien.com t: +31 (0) 318 698000 This message is for the designated recipient only and may contain privileged, proprietary, or otherwise private information. If you have received it in error, please notify the sender immediately and delete the original.
Any other use of the email by you is prohibited. http://www.bbc.co.uk/ This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
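Daniel's "only keep the latest" advice works at index time; if reindexing is not an option, the same collapse can be done client-side after the search. A runnable sketch of that post-filter (field names follow Reza's example; in practice the values would come out of the SolrDocumentList):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestRevisionFilter {
    static class Doc {
        final int id, revision, rootRevisionId;
        Doc(int id, int revision, int rootRevisionId) {
            this.id = id; this.revision = revision; this.rootRevisionId = rootRevisionId;
        }
    }

    // Keep only the highest-revision document per root_revision_id,
    // preserving the order in which each root was first seen.
    public static List<Doc> latestPerRoot(List<Doc> docs) {
        Map<Integer, Doc> best = new LinkedHashMap<Integer, Doc>();
        for (Doc d : docs) {
            Doc cur = best.get(d.rootRevisionId);
            if (cur == null || d.revision > cur.revision) best.put(d.rootRevisionId, d);
        }
        return new ArrayList<Doc>(best.values());
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<Doc>();
        docs.add(new Doc(10, 1, 10));
        docs.add(new Doc(11, 2, 10));
        docs.add(new Doc(12, 1, 12));
        // prints 11 then 12, matching Reza's expected result
        for (Doc d : latestPerRoot(docs)) System.out.println(d.id);
    }
}
```

The obvious caveat: this collapses only within the page of results fetched, so facet and hit counts from Solr still include the superseded revisions.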
Re: Dedicated Slave Master
Hey Grant, It's a middleman, not a backup. We don't have any issues in the current setup, just trying to make sure we have a solution in case this becomes an issue. I'm concerned about a situation with dozens of searchers. The i/o and network load on the indexer might become significant at that point. Do you know of any deployments with a large number of searchers? Do you think my concerns are valid, even if we had dozens of searchers? Thanks, Wojtek Grant Ingersoll-6 wrote: > > Hi Wojtek, > > Is this a backup or is it a middleman? I can't say that I have seen > the middleman approach before, but that doesn't mean it won't work. > Are you actually having an issue with the current setup or just trying > to make sure you don't in the future? > > -Grant > -- View this message in context: http://www.nabble.com/Dedicated-Slave-Master-tp24502657p24519372.html Sent from the Solr - User mailing list archive at Nabble.com.
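For what it's worth, the Java-based replication that is landing in 1.4 has a name for this middleman: a "repeater", a node configured as both slave (of the indexer) and master (for the searchers). A configuration sketch along the lines of the SolrReplication wiki page — the URL and the 15-minute poll interval here are placeholders to match the setup described above:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://indexer:8983/solr/replication</str>
    <str name="pollInterval">00:15:00</str>
  </lst>
</requestHandler>
```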
RE: Word frequency count in the index
Hi Walter, Has it always been there? Which version of Lucene are we talking about? Regards, Daniel -Original Message- From: Walter Underwood [mailto:wunderw...@netflix.com] Sent: 16 July 2009 15:04 To: solr-user@lucene.apache.org Subject: Re: Word frequency count in the index Lucene uses a tf.idf relevance formula, so it automatically finds common words (stop words) in your documents and gives them lower weight. I recommend not removing stop words at all and letting Lucene handle the weighting. wunder On 7/16/09 3:29 AM, "Pooja Verlani" wrote: > Hi, > > Is there any way in SOLR to know the count of each word indexed in the > solr ? > I want to find out the different word frequencies to figure out ' > application specific stop words'. > > Please let me know if its possible. > > Thank you, > Regards, > Pooja http://www.bbc.co.uk/ This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
Re: Word frequency count in the index
Plus there is a single class that you can run from the command line in Lucene's contrib. I think it's called HighFreqTerms or something close to that. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Grant Ingersoll > To: solr-user@lucene.apache.org > Sent: Thursday, July 16, 2009 6:35:28 AM > Subject: Re: Word frequency count in the index > > In the trunk version, the TermsComponent should give you this: > http://wiki.apache.org/solr/TermsComponent. Also, you can use the > LukeRequestHandler to get the top words in each field. > > Alternatively, you may just want to point Luke at your index. > > On Jul 16, 2009, at 6:29 AM, Pooja Verlani wrote: > > > Hi, > > > > Is there any way in SOLR to know the count of each word indexed in the solr > > ? > > I want to find out the different word frequencies to figure out ' > > application specific stop words'. > > > > Please let me know if its possible. > > > > Thank you, > > Regards, > > Pooja > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using > Solr/Lucene: > http://www.lucidimagination.com/search
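Outside of the Lucene contrib tool, the underlying idea is simple enough to sketch in a few lines: count document frequency per term and flag anything that appears in nearly every document as a stop-word candidate. This is a toy illustration of that procedure, not the Lucene HighFreqTerms class itself:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StopwordCandidates {
    // Terms occurring in at least `minRatio` of the documents, sorted.
    public static List<String> candidates(List<String> docs, double minRatio) {
        Map<String, Integer> df = new HashMap<String, Integer>();
        for (String doc : docs) {
            Set<String> seen = new HashSet<String>();  // count each term once per doc
            for (String t : doc.toLowerCase().split("\\W+")) {
                if (t.length() > 0 && seen.add(t)) {
                    Integer c = df.get(t);
                    df.put(t, c == null ? 1 : c + 1);
                }
            }
        }
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : df.entrySet())
            if (e.getValue() >= minRatio * docs.size()) out.add(e.getKey());
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = java.util.Arrays.asList(
            "the cat sat", "the dog ran", "a cat and the dog");
        // "the" is the only term present in (at least) 90% of the docs
        System.out.println(candidates(docs, 0.9));
    }
}
```

Against a real index you would read document frequencies from the term dictionary (via Luke or the TermsComponent) instead of re-tokenizing the corpus.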
Re: Highlight arbitrary text
On Jul 16, 2009, at 7:41 AM, Shalin Shekhar Mangar wrote: On Thu, Jul 16, 2009 at 4:52 PM, Anders Melchiorsen wrote: What we want to do is to have an extra text highlighted by the Solr highlighter. That text should never be stored in the Solr index, but rather be provided in an HTTP request along with the search query. Is this possible? I don't think it is possible currently but I see how it can be useful. Can you please open a jira issue so we don't forget about it? One trick worth noting is the FieldAnalysisRequestHandler can provide offsets from external text, which could be used for client-side highlighting (see the showmatch parameter too). Erik
Re: wildcards and German umlauts
Hi, I've got the same problem: searching using wildcards and an umlaut -> no results. Just as you described it: "if i type complete word (such as "übersicht"). But there are no hits, if i use wildcards (such as "über*") Searching with wildcards and without umlauts works as well." Anyone found the solution to this problem or have any new ideas? -- View this message in context: http://www.nabble.com/wildcards-and-German-umlauts-tp14836043p24517583.html Sent from the Solr - User mailing list archive at Nabble.com.
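A likely cause, worth verifying against your analyzer chain: wildcard terms are not passed through the analyzer at query time, so if your index-time chain lowercases or folds "ü" (e.g. with an accent/umlaut filter), the literal prefix "über*" no longer matches any indexed term. A toy illustration of the mismatch — the `fold` method is a hypothetical stand-in for whatever your analyzer does, not a real Solr filter:

```java
import java.util.Arrays;
import java.util.List;

public class UmlautWildcardSketch {
    // Hypothetical index-time normalization: lowercase and expand umlauts.
    static String fold(String s) {
        return s.toLowerCase().replace("ü", "ue").replace("ä", "ae").replace("ö", "oe");
    }

    // Prefix (wildcard) match against indexed terms; optionally normalize
    // the prefix the same way the indexed terms were normalized.
    static boolean prefixMatches(List<String> indexedTerms, String rawPrefix,
                                 boolean analyzePrefix) {
        String prefix = analyzePrefix ? fold(rawPrefix) : rawPrefix;
        for (String t : indexedTerms) if (t.startsWith(prefix)) return true;
        return false;
    }

    public static void main(String[] args) {
        // terms as stored in the index, i.e. after folding
        List<String> indexed = Arrays.asList(fold("Übersicht"), fold("suche"));
        System.out.println(prefixMatches(indexed, "über", false)); // raw prefix: no hit
        System.out.println(prefixMatches(indexed, "über", true));  // folded prefix: hit
    }
}
```

The practical fix is usually to apply the same normalization to the wildcard prefix on the client side before sending the query, or to drop the folding filter from the index-time chain.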
Re: Word frequency count in the index
Lucene uses a tf.idf relevance formula, so it automatically finds common words (stop words) in your documents and gives them lower weight. I recommend not removing stop words at all and letting Lucene handle the weighting. wunder On 7/16/09 3:29 AM, "Pooja Verlani" wrote: > Hi, > > Is there any way in SOLR to know the count of each word indexed in the solr > ? > I want to find out the different word frequencies to figure out ' > application specific stop words'. > > Please let me know if its possible. > > Thank you, > Regards, > Pooja
Re: Multivalued fields and scoring/sorting
The first number is a unique ID that points to a particular customer, the second is a value. It basically tells us whether or not a customer already has that product or not. The main use of it is to be able to search our product listing for products the customer does not already have. The alternative would be to put that in a second index, but that would mean that I would be doing two searches for every single search I want to complete, which I am not sure would be a very good option. >>> avl...@gmail.com 7/16/2009 12:04:53 AM >>> The harsh reality of life is that you cannot sort on multivalued fields. If you can explain your domain problem (the significance of numbers "818", "2" etc), maybe people can come up with an alternate index design which fits into your use cases. Cheers Avlesh On Thu, Jul 16, 2009 at 1:18 AM, Matt Schraeder wrote: > I am trying to come up with a way to sort (or score, and sort based on > the score) of a multivalued field. I was looking at FunctionQueries and > saw fieldvalue, but as that only works on single valued fields that > doesn't help me. > > The field is as follows: > > sortMissingLast="true" omitNorms="true"> > > > > > > > > > > > multiValued="true" /> > > The actual data that gets put in this field is a string consisting of a > number, a space, and a 1 or a 2. For example: > > "818 2" > "818 1" > "950 1" > "1022 2" > > I want to be able to give my search results given a boost if a > particular document contains "818 2" and a smaller boost if the document > contains "818 1" but not "818 2". > > The end result would be documents sorted as follows: > > 1) Documents with "818 2" > 2) Documents with "818 1" but not "818 2" > 3) Documents that contain neither "818 2" nor "818 1" > > Is this possible with solr? How would I go about doing this? >
Re: Getting Facet Count of combination of fields
On Jul 16, 2009, at 4:35 AM, Koji Sekiguchi wrote: ashokcz wrote: Hi all, i have a scenario where i need to get facet count for combination of fields. Say i have two fields Manufacturer and Year of manufacture. I search for something and it gives me 15 results and my facet count as like this : Manufacturer : Nokia(5);Motorola(7);iphone(3) Year of manufacture : 2007 (4) ; 2008 (4) 2009 (7). But what i need is combination of count . Say Nokia - 2007 - 1 Nokia - 2008 - 1 Nokia - 2009 - 2 Somethig like this is there any way we can get this kind of facet counts from single solr search hits ??? Are you looking at this? https://issues.apache.org/jira/browse/SOLR-792 A note on SOLR-792 - it's something that can be accomplished from the client by making multiple requests to Solr. First facet on Manufacturer, iterate through each and request facets on Year with an fq (filter query) set for each Manufacturer. Erik
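Both the combined-field trick and the result a client assembles from Erik's multi-request approach boil down to counting composite keys. A toy sketch of that counting with the Nokia example (the "Manufacturer - Year" key format mirrors the suggested brand_year_of_manufacturing field):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ComboFacetSketch {
    // Count "Manufacturer - Year" pairs, the way a single combined
    // field would be faceted by Solr.
    public static Map<String, Integer> comboCounts(List<String[]> docs) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String[] d : docs) {
            String key = d[0] + " - " + d[1];
            Integer c = counts.get(key);
            counts.put(key, c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> docs = java.util.Arrays.asList(
            new String[]{"Nokia", "2007"},
            new String[]{"Nokia", "2008"},
            new String[]{"Nokia", "2009"},
            new String[]{"Nokia", "2009"});
        // Nokia - 2007=1, Nokia - 2008=1, Nokia - 2009=2
        System.out.println(comboCounts(docs));
    }
}
```

The combined-field version pays the cost at index time (and needs the field pairs chosen up front), while the multi-request version works for any pair of fields chosen at query time at the cost of extra requests.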
Re: Highlight arbitrary text
On Thu, Jul 16, 2009 at 4:52 PM, Anders Melchiorsen wrote: > > What we want to do is to have an extra text highlighted by the Solr > highlighter. That text should never be stored in the Solr index, but rather > be provided in an HTTP request along with the search query. > > Is this possible? > I don't think it is possible currently but I see how it can be useful. Can you please open a jira issue so we don't forget about it? -- Regards, Shalin Shekhar Mangar.
Re: Highlight arbitrary text
On Wed, 15 Jul 2009 11:54:22 +0200, Anders Melchiorsen wrote: > Is it possible to have Solr highlight an arbitrary text that is posted at > request time? Hi again. I wonder whether my question was too terse to be well understood. What we want to do is to have an extra text highlighted by the Solr highlighter. That text should never be stored in the Solr index, but rather be provided in an HTTP request along with the search query. Is this possible? Cheers, Anders.
Re: Dedicated Slave Master
Hi Wojtek, Is this a backup or is it a middleman? I can't say that I have seen the middleman approach before, but that doesn't mean it won't work. Are you actually having an issue with the current setup or just trying to make sure you don't in the future? -Grant On Jul 15, 2009, at 1:39 PM, wojtekpia wrote: I'm building a high load system that will require several search slaves (at least 2, but this may grow to 5-10+ in the near future). I plan to have a single indexer that replicates to the search slaves. I want indexing to be as fast as possible, so I've considered adding another machine between my indexer and the search slaves (slave master) to reduce the amount of work the indexer has to do (file IO, network traffic to/from slaves). I will have frequent updates to my index (every 15 minutes), but they will be small, and the index will be optimized nightly. The size of the optimized index is about 5GB. Has anyone considered or implemented a solution like this? Did you see significant performance improvement on the indexer by doing it? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Dedicated-Slave-Master-tp24502657p24502657.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Word frequency count in the index
In the trunk version, the TermsComponent should give you this: http://wiki.apache.org/solr/TermsComponent . Also, you can use the LukeRequestHandler to get the top words in each field. Alternatively, you may just want to point Luke at your index. On Jul 16, 2009, at 6:29 AM, Pooja Verlani wrote: Hi, Is there any way in SOLR to know the count of each word indexed in the solr ? I want to find out the different word frequencies to figure out ' application specific stop words'. Please let me know if its possible. Thank you, Regards, Pooja -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Word frequency count in the index
Hi, Is there any way in SOLR to know the count of each word indexed in the solr ? I want to find out the different word frequencies to figure out ' application specific stop words'. Please let me know if its possible. Thank you, Regards, Pooja
Re: Getting Facet Count of combination of fields
Hmmm, but in my case it will be dynamic. Users may choose different fields at run time, and accordingly I need to populate the values ... Avlesh Singh wrote: > > If you create a field called "brand_year_of_manufacturing" and populate it > with the "brandName - YOM" data while indexing, you can achieve the > desired > with a simple facet on this field. > > Cheers > Avlesh > > On Thu, Jul 16, 2009 at 1:19 PM, ashokcz > wrote: > >> >> Hi all, >> i have a scenario where i need to get facet count for combination of >> fields. >> Say i have two fields Manufacturer and Year of manufacture. >> I search for something and it gives me 15 results and my facet count as >> like >> this : >> Manufacturer : Nokia(5);Motorola(7);iphone(3) >> Year of manufacture : 2007 (4) ; 2008 (4) 2009 (7). >> But what i need is combination of count . >> Say >> Nokia - 2007 - 1 >> Nokia - 2008 - 1 >> Nokia - 2009 - 2 >> >> Something like this >> >> >> is there any way we can get this kind of facet counts from single solr >> search hits ??? >> >> >> >> >> -- >> View this message in context: >> http://www.nabble.com/Getting-Facet-Count-of-combination-of-fields-tp24511923p24511923.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Getting-Facet-Count-of-combination-of-fields-tp24511923p24513837.html Sent from the Solr - User mailing list archive at Nabble.com.
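When the field pair is only known at query time, one workaround that already works on Solr 1.2 is a faceted drill-down: facet on whichever fields the user picked, then issue a follow-up request per selected value with an fq filter; the facet counts of the second request are the combination counts. Field names below are assumptions, and the cost is one extra request per value rather than a single hit:

```
# 1st request: facet on the user-chosen fields
q=*:*&facet=true&facet.field=manufacturer&facet.field=year_of_manufacture

# Follow-up per manufacturer value: the year counts returned here are the
# "Nokia - 2007 (1), Nokia - 2008 (1), Nokia - 2009 (2)" combination counts
q=*:*&fq=manufacturer:Nokia&facet=true&facet.field=year_of_manufacture
```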
Re: Getting Facet Count of combination of fields
Hmmm, thanks Koji :handshake: will try my hand at the 1.4 version and see my luck =^D Koji Sekiguchi-2 wrote: > > ashokcz wrote: >> Hi, thanks "Koji Sekiguchi-2" for your reply. >> Ya, I was looking for something like that. >> So when doing the Solr request it should have an extra config parameter, >> facet.tree, and I should give the fields as CSV to specify the hierarchy. >> Will >> try and see if it gives me the desired results. >> But just one doubt: I am using the Solr 1.2.0 version. Is this feature >> present >> in Solr 1.2.0? >> Thanks. >> >> > Unfortunately, no. I think it will be 1.4 or later. > > Koji > > > > -- View this message in context: http://www.nabble.com/Getting-Facet-Count-of-combination-of-fields-tp24511923p24513763.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Getting Facet Count of combination of fields
If you create a field called "brand_year_of_manufacturing" and populate it with the "brandName - YOM" data while indexing, you can achieve the desired result with a simple facet on this field. Cheers Avlesh On Thu, Jul 16, 2009 at 1:19 PM, ashokcz wrote: > > Hi all, > i have a scenario where i need to get facet count for combination of > fields. > Say i have two fields Manufacturer and Year of manufacture. > I search for something and it gives me 15 results and my facet count as > like > this : > Manufacturer : Nokia(5);Motorola(7);iphone(3) > Year of manufacture : 2007 (4) ; 2008 (4) 2009 (7). > But what i need is combination of count . > Say > Nokia - 2007 - 1 > Nokia - 2008 - 1 > Nokia - 2009 - 2 > > Something like this > > > is there any way we can get this kind of facet counts from single solr > search hits ??? > > > > > -- > View this message in context: > http://www.nabble.com/Getting-Facet-Count-of-combination-of-fields-tp24511923p24511923.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
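Concretely, Avlesh's suggestion amounts to one extra schema field plus a facet on it. The field name and type below are one plausible way to set it up (a string field, so the combined value is kept as a single untokenized facet term):

```xml
<!-- schema.xml: field populated at index time with "brandName - YOM",
     e.g. "Nokia - 2007" (name and type are illustrative) -->
<field name="brand_year_of_manufacturing" type="string" indexed="true" stored="false"/>
```

A query with `facet=true&facet.field=brand_year_of_manufacturing` then returns the combination counts directly, e.g. `Nokia - 2007 (1)`, `Nokia - 2008 (1)`, `Nokia - 2009 (2)`.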
Re: Getting Facet Count of combination of fields
ashokcz wrote: Hi, thanks "Koji Sekiguchi-2" for your reply. Ya, I was looking for something like that. So when doing the Solr request it should have an extra config parameter, facet.tree, and I should give the fields as CSV to specify the hierarchy. Will try and see if it gives me the desired results. But just one doubt: I am using the Solr 1.2.0 version. Is this feature present in Solr 1.2.0? Thanks. Unfortunately, no. I think it will be 1.4 or later. Koji
Re: Getting Facet Count of combination of fields
Hi, thanks "Koji Sekiguchi-2" for your reply. Ya, I was looking for something like that. So when doing the Solr request it should have an extra config parameter, facet.tree, and I should give the fields as CSV to specify the hierarchy. Will try and see if it gives me the desired results. But just one doubt: I am using the Solr 1.2.0 version. Is this feature present in Solr 1.2.0? Thanks. Koji Sekiguchi-2 wrote: > > ashokcz wrote: >> Hi all, >> i have a scenario where i need to get facet count for combination of >> fields. >> Say i have two fields Manufacturer and Year of manufacture. >> I search for something and it gives me 15 results and my facet count as >> like >> this : >> Manufacturer : Nokia(5);Motorola(7);iphone(3) >> Year of manufacture : 2007 (4) ; 2008 (4) 2009 (7). >> But what i need is combination of count . >> Say >> Nokia - 2007 - 1 >> Nokia - 2008 - 1 >> Nokia - 2009 - 2 >> >> Something like this >> >> >> is there any way we can get this kind of facet counts from single solr >> search hits ??? >> >> > > Are you looking at this? > https://issues.apache.org/jira/browse/SOLR-792 > > Koji > > > -- View this message in context: http://www.nabble.com/Getting-Facet-Count-of-combination-of-fields-tp24511923p24512961.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Getting Facet Count of combination of fields
ashokcz wrote: Hi all, I have a scenario where I need to get a facet count for a combination of fields. Say I have two fields, Manufacturer and Year of manufacture. I search for something and it gives me 15 results and facet counts like this: Manufacturer: Nokia (5); Motorola (7); iphone (3). Year of manufacture: 2007 (4); 2008 (4); 2009 (7). But what I need is the combination count, say: Nokia - 2007 - 1 Nokia - 2008 - 1 Nokia - 2009 - 2 Something like this. Is there any way we can get this kind of facet count from a single Solr search's hits? Are you looking at this? https://issues.apache.org/jira/browse/SOLR-792 Koji
Getting Facet Count of combination of fields
Hi all, I have a scenario where I need to get a facet count for a combination of fields. Say I have two fields, Manufacturer and Year of manufacture. I search for something and it gives me 15 results and facet counts like this: Manufacturer: Nokia (5); Motorola (7); iphone (3). Year of manufacture: 2007 (4); 2008 (4); 2009 (7). But what I need is the combination count, say: Nokia - 2007 - 1 Nokia - 2008 - 1 Nokia - 2009 - 2 Something like this. Is there any way we can get this kind of facet count from a single Solr search's hits? -- View this message in context: http://www.nabble.com/Getting-Facet-Count-of-combination-of-fields-tp24511923p24511923.html Sent from the Solr - User mailing list archive at Nabble.com.