Re: Field missing when using distributed search + dismax
Hi Lance. Thanks for replying. Yes, I specifically checked the schema.xml and did another simple test. The broker is running on localhost:7499/solr. A Solr instance is running on localhost:7498/solr. For this test I only use these 2 instances. 7499's index is empty; 7498 has 12 documents in its index. I copied the schema.xml from 7498 to 7499 before the test.

1. http://localhost:7498/solr/select - I get numFound="12", and each document carries both fields, e.g. gppost_6179 with type gppost.
2. http://localhost:7499/solr/select - I get no documents (the index is empty).
3. http://localhost:7499/solr/select?shards=localhost:7498/solr - I get the IDs (gppost_6179, gppost_6282, ...) but no "type" field. So strange!

I then checked with the standard search handler:

1. http://localhost:7499/solr/select?shards=localhost:7498/solr&q=marship - I get member_marship11 with type member and date 2010-01-21T00:00:00Z.
2. http://localhost:7499/solr/select?shards=localhost:7498/solr&q=marship&qt=dismax - numFound="1", but the document only has the ID member_marship11, no type. So strange!

On Wed, Jun 23, 2010 at 11:12 AM, Lance Norskog wrote:
> Do all of the Solr instances, including the broker, use the same
> schema.xml?
>
> On 6/22/10, Scott Zhang wrote:
> > Hi. All.
> > I was using distributed search over 30 Solr instances; previously I was
> > using the standard query handler, and the results were returned correctly.
> > Each result has 2 fields: "ID" and "type".
> > Today I wanted to search with dismax, so I tried searching each
> > instance with dismax. It works correctly, returning "ID" and "type" for each
> > result. The strange thing is that when I
> > use distributed search, the results only have "ID". The field "type"
> > disappeared. I need that "type" to know what the "ID" refers to. Why does Solr
> > "eat" my "type"?
> >
> > Thanks.
> > Regards.
> > Scott
> >
>
> --
> Lance Norskog
> goks...@gmail.com
>
Re: Nested table support ability
Hi Otis, Thanks for the update. My parametric search has to span a customer table and 30 child tables. We have close to 1 million customers. Do you think Lucene/Solr is the right solution for such requirements, or would a database search be more optimal? Regards, Amit -- View this message in context: http://lucene.472066.n3.nabble.com/Nested-table-support-ability-tp905253p916087.html Sent from the Solr - User mailing list archive at Nabble.com.
about function query
I want to integrate a document's timestamp into the scoring of a search, and I found an example in the book "Solr 1.4 Enterprise Search Server" about function queries. I want to boost newer documents, so it could be a function such as 1/(timestamp+1). But the function query's value is added to the final score, not multiplied into it, so I can't tune the parameter well. E.g., for search term term1 the top docs are doc1 with score 2.0 and doc2 with score 1.5; for search term term2 the top docs are doc1 with score 20 and doc2 with score 15. It is hard to adjust the relative score of these 2 docs by adding a value; if it were multiplied, it would be easy. If doc1 is very old we assign it a boost of 1, and doc2 is new so we assign it a boost of 2; the total scores are then 2.0*1 = 2.0 and 1.5*2 = 3.0, so doc2 ranks higher than doc1. But with addition, the scores are 2.0 + weight*1 and 1.5 + weight*2, and it's hard to get a proper weight. If we let weight be 1, it works well for term1, but for term2 it is 20 + 1*1 = 21 versus 15 + 1*2 = 17, so time has little influence on the final result.
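From the wiki it looks like the boost query parser can do the multiplication I am after. Something like this, if I understand it right (untested; "timestamp" here is my date field, which would need to be a trie-based date for ms() to work, and the recip constants are just the wiki's example values):

  http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,timestamp),3.16e-11,1,1)}term1

This multiplies term1's relevancy score by a value that decays with document age, instead of adding to it, which is exactly the behavior I describe above.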
Re: Field missing when using distributed search + dismax
Do all of the Solr instances, including the broker, use the same schema.xml? On 6/22/10, Scott Zhang wrote: > Hi. All. >I was using distributed search over 30 solr instance, the previous one > was using the standard query handler. And the result was returned correctly. > each result has 2 fields. "ID" and "type". >Today I want to use search withk dismax, I tried search with each > instance with dismax. It works correctly, return "ID" and "type" for each > result. The strange thing is when I > use distributed search, the result only have "ID". The field "type" > disappeared. I need that "type" to know what the "ID" refer to. Why solr > "eat" my "type"? > > > Thanks. > Regards. > Scott > -- Lance Norskog goks...@gmail.com
Re: Help with highlighting
You need to share with us the Solr request you made, and any custom request handler settings it might map to. Chances are you just need to twiddle with the highlighter parameters (see the wiki for docs) to get it to do what you want. Erik

On Jun 22, 2010, at 4:42 PM, n...@frameweld.com wrote:

Hi, I need help with highlighting fields that would match a query. So far, my results only highlight if the field is from all_text, and I would like it to use other fields. It simply isn't the case if I just turn highlighting on. Any ideas why it only applies to all_text?

Here is my schema: [the schema.xml paste did not survive posting; only attribute fragments remain. The field types included the usual analysis chains (WordDelimiterFilter, synonyms, stopwords, protwords, shingles), and the schema ends by naming unique_key as the uniqueKey and all_text as the defaultSearchField.]
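P.S. For example, hl.fl is the parameter that controls which fields get highlighted (the field names below are placeholders, and the fields must be stored):

  http://localhost:8983/solr/select?q=foo&hl=true&hl.fl=title,description,all_text

If hl.fl isn't set, highlighting falls back to the default search field, which would explain why you only ever see all_text highlighted.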
Re: collapse exception
Martijn - Maybe the patches to SolrIndexSearcher could be extracted into a new issue so that we can put in the infrastructure at least. That way this could truly be a drop-in plugin without it actually being in core. I haven't looked at the specifics, but I imagine we could get the core stuff adjusted to suit this plugin.

Erik

On Jun 22, 2010, at 5:24 PM, Martijn v Groningen wrote:

I checked your stacktrace and I can't remember putting SolrIndexSearcher.getDocListAndSet(...) in the doQuery(...) method. I guess the patch was modified before it was applied. I think the error occurs when you do a field collapse search with a fq parameter. That is the only reason I can think of why this exception is thrown.

> When this component become a contrib? Using patch is so annoying
Patching is a bit of a hassle. This patch has some changes in the SolrIndexSearcher which makes it difficult to make it a contrib or an extension.

On 22 June 2010 04:52, Li Li wrote:
I don't know because it's patched by someone else but I can't get his help. When this component become a contrib? Using patch is so annoying

2010/6/22 Martijn v Groningen:
What version of Solr and which patch are you using?

On 21 June 2010 11:46, Li Li wrote:
it says "Either filter or filterList may be set in the QueryCommand, but not both." I am newbie of solr and have no idea of the exception. What's wrong with it? thank you.

java.lang.IllegalArgumentException: Either filter or filterList may be set in the QueryCommand, but not both.
        at org.apache.solr.search.SolrIndexSearcher$QueryCommand.setFilter(SolrIndexSearcher.java:1711)
        at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1286)
        at org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:205)
        at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:246)
        at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:173)
        at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:174)
        at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
        at java.lang.Thread.run(Thread.java:619)

--
Met vriendelijke groet,

Martijn van Groningen

--
Met vriendelijke groet,

Martijn van Groningen
Re: SOLR partial string matching question
You want a combination of WhitespaceTokenizer and EdgeNGramFilter:

http://lucene.apache.org/solr/api/org/apache/solr/analysis/WhitespaceTokenizerFactory.html
http://lucene.apache.org/solr/api/org/apache/solr/analysis/EdgeNGramFilterFactory.html

The first will create tokens for each word; the second will create multiple tokens from each word prefix.

Use the analysis link from the admin page to test your filter chain and make sure it's doing what you want.

On Tue, Jun 22, 2010 at 4:06 PM, Vladimir Sutskever wrote:
> Hi,
>
> Can you guys make a recommendation for which types/filters to use to accomplish
> the following partial keyword match:
>
> A. Actual Indexed Term: "bank of america"
>
> B. User Enters Search Term: "of ameri"
>
> I would like SOLR to match the document "bank of america" with the partial string
> "of ameri"
>
> Any suggestions?
>
> Kind regards,
>
> Vladimir Sutskever
> Investment Bank - Technology
> JPMorgan Chase, Inc.
>
> This email is confidential and subject to important disclaimers and
> conditions including on offers for the purchase or sale of
> securities, accuracy and completeness of information, viruses,
> confidentiality, legal privilege, and legal entity disclaimers,
> available at http://www.jpmorgan.com/pages/disclosures/email.
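P.S. Something along these lines in schema.xml (a sketch; the type name, field wiring and gram sizes are yours to choose):

  <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Indexing "bank of america" then produces ba, ban, bank, of, am, ame, amer, ameri, ... per word, so the query tokens "of" and "ameri" both match.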
SOLR partial string matching question
Hi, Can you guys make a recommendation for which types/filters to use to accomplish the following partial keyword match:

A. Actual Indexed Term: "bank of america"

B. User Enters Search Term: "of ameri"

I would like SOLR to match the document "bank of america" with the partial string "of ameri"

Any suggestions?

Kind regards,

Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.

This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email.
Re: collapse exception
I checked your stacktrace and I can't remember putting SolrIndexSearcher.getDocListAndSet(...) in the doQuery(...) method. I guess the patch was modified before it was applied. I think the error occurs when you do a field collapse search with a fq parameter. That is the only reason I can think of why this exception is thrown.

> When this component become a contrib? Using patch is so annoying
Patching is a bit of a hassle. This patch has some changes in the SolrIndexSearcher which makes it difficult to make it a contrib or an extension.

On 22 June 2010 04:52, Li Li wrote:
> I don't know because it's patched by someone else but I can't get his
> help. When this component become a contrib? Using patch is so annoying
>
> 2010/6/22 Martijn v Groningen :
>> What version of Solr and which patch are you using?
>>
>> On 21 June 2010 11:46, Li Li wrote:
>>> it says "Either filter or filterList may be set in the QueryCommand,
>>> but not both." I am newbie of solr and have no idea of the exception.
>>> What's wrong with it? thank you.
>>>
>>> java.lang.IllegalArgumentException: Either filter or filterList may be
>>> set in the QueryCommand, but not both.
>>> at org.apache.solr.search.SolrIndexSearcher$QueryCommand.setFilter(SolrIndexSearcher.java:1711)
>>> at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1286)
>>> at org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:205)
>>> at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:246)
>>> at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:173)
>>> at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:174)
>>> at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
>>> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
>>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>>> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>>> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>>> at java.lang.Thread.run(Thread.java:619)
>>>
>>
>> --
>> Met vriendelijke groet,
>>
>> Martijn van Groningen
>>

--
Met vriendelijke groet,

Martijn van Groningen
Re: Field Collapsing SOLR-236
What exactly did not work? Patching, compiling or running it?

On 22 June 2010 16:06, Rakhi Khatwani wrote:
> Hi,
> I tried checking out the latest code (rev 956715), but the patch did not
> work on it. In fact I even tried hunting for the revision mentioned earlier in this
> thread (i.e. rev 955615) but cannot find it in the repository. (It has
> revision 955569 followed by revision 955785.)
>
> Any pointers??
> Regards
> Raakhi
>
> On Tue, Jun 22, 2010 at 2:03 AM, Martijn v Groningen <
> martijn.is.h...@gmail.com> wrote:
>
>> Oh in that case is the code stable enough to use it for production?
>> - Well, this feature is a patch and I think that says it all.
>> Although bugs are fixed, it is definitely an experimental feature
>> and people should keep that in mind when using one of the patches.
>> Does it support features which solr 1.4 normally supports?
>> - As far as I know, yes.
>>
>> I am using facets as a workaround but then I am not able to sort on any
>> other field. Is there any workaround to support this feature??
>> - Maybe http://wiki.apache.org/solr/Deduplication prevents adding
>> duplicates to your index, but then you miss the collapse counts
>> and other computed values.
>>
>> On 21 June 2010 09:04, Rakhi Khatwani wrote:
>> > Hi,
>> > Oh in that case is the code stable enough to use it for production?
>> > Does it support features which solr 1.4 normally supports?
>> >
>> > I am using facets as a workaround but then I am not able to sort on any
>> > other field. Is there any workaround to support this feature??
>> >
>> > Regards,
>> > Raakhi
>> >
>> > On Fri, Jun 18, 2010 at 6:14 PM, Martijn v Groningen <
>> > martijn.is.h...@gmail.com> wrote:
>> >
>> >> Hi Rakhi,
>> >>
>> >> The patch is not compatible with 1.4. If you want to work with the
>> >> trunk, you'll need to get the src from
>> >> https://svn.apache.org/repos/asf/lucene/dev/trunk/
>> >>
>> >> Martijn
>> >>
>> >> On 18 June 2010 13:46, Rakhi Khatwani wrote:
>> >> > Hi Moazzam,
>> >> >
>> >> > Where did u get the src code from??
>> >> >
>> >> > I am downloading it from
>> >> > https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4
>> >> >
>> >> > and the latest revision in this location is 955469.
>> >> >
>> >> > so applying the latest patch (dated 17th June 2010) on it still generates
>> >> > errors.
>> >> >
>> >> > Any Pointers?
>> >> >
>> >> > Regards,
>> >> > Raakhi
>> >> >
>> >> > On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan wrote:
>> >> >
>> >> >> I knew it wasn't me! :)
>> >> >>
>> >> >> I found the patch just before I read this and applied it to the trunk
>> >> >> and it works!
>> >> >>
>> >> >> Thanks Mark and Martijn for all your help!
>> >> >>
>> >> >> - Moazzam
>> >> >>
>> >> >> On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen wrote:
>> >> >> > I've added a new patch to the issue, so building the trunk (rev
>> >> >> > 955615) with the latest patch should not be a problem. Due to recent
>> >> >> > changes in the Lucene trunk the patch was not compatible.
>> >> >> >
>> >> >> > On 17 June 2010 20:20, Erik Hatcher wrote:
>> >> >> >>
>> >> >> >> On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:
>> >> >> >>>
>> >> >> >>> p.s. I'd be glad to contribute our Maven build re-organization back to the
>> >> >> >>> community to get Solr properly Mavenized so that it can be distributed and
>> >> >> >>> released more often. For us the benefit of this structure is that we will
>> >> >> >>> be able to overlay addons such as RequestHandlers and other third party
>> >> >> >>> support without having to rebuild Solr from scratch.
>> >> >> >>
>> >> >> >> But you don't have to rebuild Solr from scratch to add a new request handler
>> >> >> >> or other plugins - simply compile your custom stuff into a JAR and put it in
>> >> >> >> /lib (or point to it with <lib> in solrconfig.xml).
>> >> >> >>
>> >> >> >>> Ideally, a Maven Archetype could be created that would allow one to rapidly
>> >> >> >>> produce a Solr webapp and fire it up in Jetty in mere seconds.
>> >> >> >>
>> >> >> >> How's that any different than cd example; java -jar start.jar? Or do you
>> >> >> >> mean a Solr client webapp?
>> >> >> >>
>> >> >> >>> Finally, with projects such as Bobo, integration with Spring would make
>> >> >> >>> configuration more consistent and require significantly less Java coding
>> >> >> >>> just to add new capabilities every time someone authors a new RequestHandler.
>> >> >> >>
>> >> >> >> It's one line of config to add a new request handler. How many ridiculously
>> >> >> >> ugly confusing lines of Spring XML would it take?
>> >> >> >>
>> >> >> >>> The biggest thing I learned about Solr in my work thusfar is that patches
>> >> >> >>> like these could be standalone modules in separate projects if it weren't
>> >>
Help with highlighting
Hi, I need help with highlighting fields that would match a query. So far, my results only highlight if the field is from all_text, and I would like it to use other fields. It simply isn't the case if I just turn highlighting on. Any ideas why it only applies to all_text? Here is my schema: [the schema.xml paste did not survive posting; only the uniqueKey (unique_key) and defaultSearchField (all_text) values remain]
Re: OOM on sorting on dynamic fields
The fields I'm sorting on are dynamic, so one query sorts on erick_time_1 and erick_timeA_1, another sorts on erick_time_2, and so on. What we see in the heap are a lot of arrays, most of them filled with 0s, maybe due to the fact that these timestamp fields are not present in all the documents. By the way, I have a script that generates the OOM in 10 minutes on our Solr instance, and with the temporary patch it ran without any problems. The side effect is that when the cache is purged, the next query that regenerates the cache is a little bit slower. I'm aware that the solution is inelegant, and we are investigating how to solve the problem in another way. Regards, Matteo

On 22 June 2010 19:25, Erick Erickson wrote:
> Hmmm, I'm missing something here then. Sorting over 15 fields of type long
> shouldn't use much memory, even if all the values are unique. When you say
> "12-15 dynamic fields", are you talking about 12-15 fields per query out of
> XXX total fields? And is XXX large? At a guess, how many different fields do
> you think you're sorting over cumulative by the time you get your OOM?
> Note if you sort over the field "erick_time" in 10 different queries, I'm
> only counting that as 1 field. I guess another way of asking this is
> "how many dynamic fields are there total?".
>
> If this is really a sorting issue, you should be able to force this to happen
> almost immediately by firing off enough sort queries at the server. It'll
> tell you a lot if you can't make this happen, even on a relatively small
> test machine.
>
> Best
> Erick
>
> On Tue, Jun 22, 2010 at 12:59 PM, Matteo Fiandesio <
> matteo.fiande...@gmail.com> wrote:
>
>> Hi Erick,
>> the index is quite small (1691145 docs) but sorting is massive and
>> often on unique timestamp fields.
>>
>> OOM occur after a range of time between three and four hours.
>> Depending as well if users browse a part of the application.
>>
>> We use solrj to make the queries so we did not use Readers objects
>> directly.
>>
>> Without sorting we don't see the problem
>> Regards,
>> Matteo
>>
>> On 22 June 2010 17:01, Erick Erickson wrote:
>> > H.. A couple of details I'm wondering about. How many
>> > documents are we talking about in your index? Do you get
>> > OOMs when you start fresh or does it take a while?
>> >
>> > You've done some good investigations, so it seems like there
>> > could well be something else going on here than just "the usual
>> > suspects" of sorting
>> >
>> > I'm wondering if you aren't really closing readers somehow.
>> > Are you updating your index frequently and re-opening readers often?
>> > If so, how?
>> >
>> > I'm assuming that if you do NOT sort on all these fields, you don't have
>> > the problem, is that true?
>> >
>> > Best
>> > Erick
>> >
>> > On Fri, Jun 18, 2010 at 10:52 AM, Matteo Fiandesio <
>> > matteo.fiande...@gmail.com> wrote:
>> >
>> >> Hello,
>> >> we are experiencing OOM exceptions in our single core solr instance
>> >> (on a (huge) amazon EC2 machine).
>> >> We investigated a lot in the mailing list and through jmap/jhat dump
>> >> analyzing and the problem resides in the lucene FieldCache that fills
>> >> the heap and blows up the server.
>> >>
>> >> Our index is quite small but we have a lot of sort queries on fields
>> >> that are dynamic,of type long representing timestamps and are not
>> >> present in all the documents.
>> >> Those queries apply sorting on 12-15 of those fields.
>> >> >> >> We are using solr 1.4 in production and the dump shows a lot of >> >> Integer/Character and Byte Array filled up with 0s. >> >> With solr's trunk code things does not change. >> >> >> >> In the mailing list we saw a lot of messages related to this issues: >> >> we tried truncating the dates to day precision,using missingSortLast = >> >> true,changing the field type from slong to long,setting autowarming to >> >> different values,disabling and enabling caches with different values >> >> but we did not manage to solve the problem. >> >> >> >> We were thinking to implement an LRUFieldCache field type to manage >> >> the FieldCache as an LRU and preventing but, before starting a new >> >> development, we want to be sure that we are not doing anything wrong >> >> in the solr configuration or in the index generation. >> >> >> >> Any help would be appreciated. >> >> Regards, >> >> Matteo >> >> >> > >> >
Re: anyone use hadoop+solr?
Muneeb Ali wrote: > > Hi Blargy, > > Nice to hear that I am not alone ;) > > Well we have been using Hadoop for other data-intensive services, those > that can be done in parallel. We have multiple nodes, which are used by > Hadoop for all our MapReduce jobs. I personally don't have much experience > with its use and hence wouldn't be able to help you much with that. > > Our indexing takes 6+ hours to index 15 million documents (using > solrj.streamUpdateSolrServer). I wanted to explore hadoop for this task, > as it can be done in parallel. > > I have just started investigating into this, will keep this post updated > if found anything helpful. > > -Neeb >

Would you mind explaining how your full indexing strategy is implemented using the StreamingUpdateSolrServer? I am currently only familiar with using the DataImportHandler. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p915227.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: example for searching hibernate entities
As always: it depends. Take a look into Hibernate Search also, which is Lucene powered. Peter.

> I have a complex data model with bi-directional relations, and I use Hibernate
> as the ORM provider, so I have several model objects representing the data model. All
> together my model objects are 75 to 100, and each table in my database has
> around 20,000 records.
> Please suggest: in my case, will text search help me?
> Are there any examples of searching on Hibernate entities?
Re: anyone use hadoop+solr?
We (Attensity Group) have been using SOLR-1301 for 6+ months now because we have a ready Hadoop cluster and need to be able to re/index up to 3 billion docs. I read the various emails and wasn't sure what you're asking. Cheers... On Tue, Jun 22, 2010 at 8:27 AM, Neeb wrote: > > Hey James, > > Just wondering if you ever had a chance to try out hadoop with solr? Would > appreciate any information/directions you could give. > > I am particularly interested in indexing using a mapreduce job. > > Cheers, > -Ali > -- > View this message in context: > http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914450.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Performance related question on DISMAX handler..
Hi, I just want to know if there will be any overhead / performance degradation if I use the dismax search handler instead of the standard search handler. We are planning to index millions of documents and are not sure if using dismax will slow down search performance. Would be great if someone could share their thoughts. Thanks, BB -- View this message in context: http://lucene.472066.n3.nabble.com/Performance-related-question-on-DISMAX-handler-tp914892p914892.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Change the Solr searcher
Sounds like what you want is to override Solr's "query" component. Have a look at the built-in one and go from there. Erik On Jun 22, 2010, at 1:38 PM, sarfaraz masood wrote: I am a novice in solr / lucene. but i have gone thru the documentations of both.I have even implemented programs in lucene for searching etc. My problem is to apply a new search technique other than the one used by solr. Now as i know that lucene has its own searcher which is used by solr as well. *Ques.. Cant i replace this searcher part in SOLR by a java program that returns documents as per my algorithm ? i.e I only want to change the searcher part of solr. I have studied abt customizing the scoring which is absolutely not my aim.My aim is replace the searcher. Plz help me in this regards. I will be highly gratefull to you for your assistance in this work of mine. If any part of this mail was not clear to you then plz lemme know, i will expain that you. Regards -sarfaraz
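P.S. Registering a component under the name "query" replaces the built-in one; in solrconfig.xml that's roughly (the class is yours to write, extending QueryComponent or SearchComponent):

  <searchComponent name="query" class="com.example.MyQueryComponent"/>

Any search handler that uses the default component stack then picks it up automatically.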
Change the Solr searcher
I am a novice in Solr / Lucene, but I have gone through the documentation of both. I have even implemented programs in Lucene for searching etc. My problem is to apply a new search technique other than the one used by Solr. Now, as I know, Lucene has its own searcher, which is used by Solr as well. Question: Can't I replace this searcher part in Solr with a Java program that returns documents as per my algorithm? I.e., I only want to change the searcher part of Solr. I have studied customizing the scoring, which is absolutely not my aim. My aim is to replace the searcher. Please help me in this regard. I will be highly grateful to you for your assistance in this work of mine. If any part of this mail was not clear to you, please let me know and I will explain it. Regards -sarfaraz
Re: OOM on sorting on dynamic fields
Hmmm, I'm missing something here then. Sorting over 15 fields of type long shouldn't use much memory, even if all the values are unique. When you say "12-15 dynamic fields", are you talking about 12-15 fields per query out of XXX total fields? And is XXX large? At a guess, how many different fields do you think you're sorting over cumulative by the time you get your OOM? Note if you sort over the field "erick_time" in 10 different queries, I'm only counting that as 1 field. I guess another way of asking this is "how many dynamic fields are there total?". If this is really a sorting issue, you should be able to force this to happen almost immediately by firing off enough sort queries at the server. It'll tell you a lot if you can't make this happen, even on a relatively small test machine. Best Erick On Tue, Jun 22, 2010 at 12:59 PM, Matteo Fiandesio < matteo.fiande...@gmail.com> wrote: > Hi Erick, > the index is quite small (1691145 docs) but sorting is massive and > often on unique timestamp fields. > > OOM occur after a range of time between three and four hours. > Depending as well if users browse a part of the application. > > We use solrj to make the queries so we did not use Readers objects > directly. > > Without sorting we don't see the problem > Regards, > Matteo > > On 22 June 2010 17:01, Erick Erickson wrote: > > H.. A couple of details I'm wondering about. How many > > documents are we talking about in your index? Do you get > > OOMs when you start fresh or does it take a while? > > > > You've done some good investigations, so it seems like there > > could well be something else going on here than just "the usual > > suspects" of sorting > > > > I'm wondering if you aren't really closing readers somehow. > > Are you updating your index frequently and re-opening readers often? > > If so, how? > > > > I'm assuming that if you do NOT sort on all these fields, you don't have > > the problem, is that true? > > > > Best > > Erick > > > > On Fri, Jun 18, 2010 at 10:52 AM, Matteo Fiandesio < > > matteo.fiande...@gmail.com> wrote: > > > >> Hello, > >> we are experiencing OOM exceptions in our single core solr instance > >> (on a (huge) amazon EC2 machine). > >> We investigated a lot in the mailing list and through jmap/jhat dump > >> analyzing and the problem resides in the lucene FieldCache that fills > >> the heap and blows up the server. > >> > >> Our index is quite small but we have a lot of sort queries on fields > >> that are dynamic,of type long representing timestamps and are not > >> present in all the documents. > >> Those queries apply sorting on 12-15 of those fields. > >> > >> We are using solr 1.4 in production and the dump shows a lot of > >> Integer/Character and Byte Array filled up with 0s. > >> With solr's trunk code things does not change. > >> > >> In the mailing list we saw a lot of messages related to this issues: > >> we tried truncating the dates to day precision,using missingSortLast = > >> true,changing the field type from slong to long,setting autowarming to > >> different values,disabling and enabling caches with different values > >> but we did not manage to solve the problem. > >> > >> We were thinking to implement an LRUFieldCache field type to manage > >> the FieldCache as an LRU and preventing but, before starting a new > >> development, we want to be sure that we are not doing anything wrong > >> in the solr configuration or in the index generation. > >> > >> Any help would be appreciated. > >> Regards, > >> Matteo > >> > > >
Re: solr with hadoop
I was playing around w/ Sqoop the other day; it's a simple Cloudera tool for imports (mysql -> hdfs) @ http://www.cloudera.com/developers/downloads/sqoop/

It seems to me (it would be pretty efficient) to dump to HDFS and have something like the Data Import Handler be able to read from hdfs:// directly ... Has this route been discussed / developed before (i.e. DIH w/ an hdfs:// handler)?

- Jon

On Jun 22, 2010, at 12:29 PM, MitchK wrote:
>
> I wanted to add a JIRA issue about exactly what Otis is asking here.
> Unfortunately, I haven't had time for it because of my exams.
>
> However, I'd like to add a question to Otis' ones:
> If you distribute the indexing process this way, are you able to replicate
> the different documents correctly?
>
> Thank you.
> - Mitch
>
> Otis Gospodnetic-2 wrote:
>>
>> Stu,
>>
>> Interesting! Can you provide more details about your setup? By "load
>> balance the indexing stage" you mean "distribute the indexing process",
>> right? Do you simply take your content to be indexed, split it into N
>> chunks where N matches the number of TaskNodes in your Hadoop cluster and
>> provide a map function that does the indexing? What does the reduce
>> function do? Does that call IndexWriter.addAllIndexes or do you do that
>> outside Hadoop?
>>
>> Thanks,
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message -----
>> From: Stu Hood
>> To: solr-user@lucene.apache.org
>> Sent: Monday, January 7, 2008 7:14:20 PM
>> Subject: Re: solr with hadoop
>>
>> As Mike suggested, we use Hadoop to organize our data en route to Solr.
>> Hadoop allows us to load balance the indexing stage, and then we use
>> the raw Lucene IndexWriter.addAllIndexes method to merge the data to be
>> hosted on Solr instances.
>>
>> Thanks,
>> Stu
>>
>> ----- Original Message -----
>> From: Mike Klaas
>> Sent: Friday, January 4, 2008 3:04pm
>> To: solr-user@lucene.apache.org
>> Subject: Re: solr with hadoop
>>
>> On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote:
>>
>>> I have a huge index base (about 110 million documents, 100 fields
>>> each). But the size of the index base is reasonable, it's about 70 Gb.
>>> All I need is to increase performance, since some queries, which match
>>> a big number of documents, are running slow.
>>> So I was thinking: are there any benefits to using Hadoop for this? And if
>>> so, what direction should I go? Has anybody done something to
>>> integrate Solr with Hadoop? Does it give any performance boost?
>>>
>> Hadoop might be useful for organizing your data enroute to Solr, but
>> I don't see how it could be used to boost performance over a huge
>> Solr index. To accomplish that, you need to split it up over two
>> machines (for which you might find hadoop useful).
>>
>> -Mike
>>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914589.html
> Sent from the Solr - User mailing list archive at Nabble.com.
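P.S. Roughly what I'm picturing, just a sketch of the idea, untested, with made-up class and config names (DataSource here is DIH's org.apache.solr.handler.dataimport.DataSource):

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataSource;

public class HdfsDataSource extends DataSource<InputStream> {
    private FileSystem fs;

    @Override
    public void init(Context context, Properties initProps) {
        try {
            // e.g. uri="hdfs://namenode:9000" supplied on the dataSource element
            String uri = initProps.getProperty("uri", "hdfs://localhost:9000");
            fs = FileSystem.get(URI.create(uri), new Configuration());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public InputStream getData(String query) {
        try {
            // query is the hdfs path of the file to ingest
            return fs.open(new Path(query));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void close() {
        try {
            fs.close();
        } catch (IOException e) {
            // ignore on shutdown
        }
    }
}

data-config.xml would then point at it with something like <dataSource type="com.example.HdfsDataSource" uri="hdfs://namenode:9000"/>.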
Re: OOM on sorting on dynamic fields
Hi Erick,
the index is quite small (1691145 docs) but sorting is massive and often on unique timestamp fields.

OOMs occur after somewhere between three and four hours, depending as well on whether users browse a particular part of the application.

We use SolrJ to make the queries, so we did not use Reader objects directly.

Without sorting we don't see the problem.
Regards,
Matteo

On 22 June 2010 17:01, Erick Erickson wrote:
> H.. A couple of details I'm wondering about. How many
> documents are we talking about in your index? Do you get
> OOMs when you start fresh or does it take a while?
>
> You've done some good investigations, so it seems like there
> could well be something else going on here than just "the usual
> suspects" of sorting
>
> I'm wondering if you aren't really closing readers somehow.
> Are you updating your index frequently and re-opening readers often?
> If so, how?
>
> I'm assuming that if you do NOT sort on all these fields, you don't have
> the problem, is that true?
>
> Best
> Erick
>
> On Fri, Jun 18, 2010 at 10:52 AM, Matteo Fiandesio <
> matteo.fiande...@gmail.com> wrote:
>
>> Hello,
>> we are experiencing OOM exceptions in our single core solr instance
>> (on a (huge) amazon EC2 machine).
>> We investigated a lot in the mailing list and through jmap/jhat dump
>> analyzing and the problem resides in the lucene FieldCache that fills
>> the heap and blows up the server.
>>
>> Our index is quite small but we have a lot of sort queries on fields
>> that are dynamic,of type long representing timestamps and are not
>> present in all the documents.
>> Those queries apply sorting on 12-15 of those fields.
>>
>> We are using solr 1.4 in production and the dump shows a lot of
>> Integer/Character and Byte Array filled up with 0s.
>> With solr's trunk code things does not change.
>>
>> In the mailing list we saw a lot of messages related to this issues:
>> we tried truncating the dates to day precision,using missingSortLast =
>> true,changing the field type from slong to long,setting autowarming to
>> different values,disabling and enabling caches with different values
>> but we did not manage to solve the problem.
>>
>> We were thinking to implement an LRUFieldCache field type to manage
>> the FieldCache as an LRU and preventing but, before starting a new
>> development, we want to be sure that we are not doing anything wrong
>> in the solr configuration or in the index generation.
>>
>> Any help would be appreciated.
>> Regards,
>> Matteo
>>
Re: anyone use hadoop+solr?
Hi Blargy,

Nice to hear that I am not alone ;)

Well, we have been using Hadoop for other data-intensive services, those that can be done in parallel. We have multiple nodes, which are used by Hadoop for all our MapReduce jobs. I personally don't have much experience with its use and hence wouldn't be able to help you much with that.

Our indexing takes 6+ hours to index 15 million documents (using SolrJ's StreamingUpdateSolrServer). I wanted to explore Hadoop for this task, as it can be done in parallel.

I have just started investigating this, and will keep this post updated if I find anything helpful.

-Neeb
-- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914659.html Sent from the Solr - User mailing list archive at Nabble.com.
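For reference, the client setup itself is tiny; roughly this (the URL, queue size and thread count are just example values):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class Indexer {
    public static void main(String[] args) throws Exception {
        // queue up to 1000 docs and drain them with 4 background threads
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 1000, 4);
        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            server.add(doc);
        }
        server.commit();
    }
}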
Re: anyone use hadoop+solr?
Well, the patch consumes the data from a CSV. You have to modify the input to use TableInputFormat (I don't remember if it's called exactly like that) and it will work. Once you've done that, you have to specify as many reducers as the shards you want.

I know 2 ways to index using Hadoop:

method 1 (SOLR-1301 & Nutch):
- Map: just get data from the source and create key-value pairs
- Reduce: does the analysis and indexes the data
So, the index is built on the reducer side.

method 2 (Hadoop Lucene index contrib):
- Map: does analysis and opens an IndexWriter to add docs
- Reduce: merges the small indexes built in the map
So, the indexes are built on the map side.

method 2 has no good integration with Solr at the moment. In the JIRA issue (SOLR-1301) there's a good explanation of the advantages and disadvantages of indexing on the map or reduce side. I recommend reading all the comments on the issue in detail to know exactly how it works. -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914625.html Sent from the Solr - User mailing list archive at Nabble.com.
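To make method 1 concrete, the reduce side looks schematically like this. This is just a sketch, not the actual SOLR-1301 code (as I understand it, the patch writes an embedded local shard per reducer rather than posting over HTTP):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexReducer extends Reducer<Text, Text, Text, Text> {
    private SolrServer solr;

    @Override
    protected void setup(Context ctx) throws IOException {
        // one target shard per reducer; URL is a placeholder
        solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx) throws IOException {
        try {
            for (Text value : values) {
                // analysis happens on the Solr side in this variant
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", key.toString());
                doc.addField("body", value.toString());
                solr.add(doc);
            }
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void cleanup(Context ctx) throws IOException {
        try {
            solr.commit();
        } catch (Exception e) {
            throw new IOException(e);
        }
    }
}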
Re: Data Import Handler Rich Format Documents
On 6/18/2010 2:42 PM, Chris Hostetter wrote:
: > I don't think DIH can do that, but who knows, let's see what others say.
: Looks like the ExtractingRequestHandler uses Tika as well. I might just use
: this, but I'm wondering if there will be a large performance difference between
: using it to batch content in over rolling my own Transformer?

I'm confused ... You're using DIH, and some of your fields are URLs to documents that you want to parse with Tika? Why would you need a custom Transformer?

http://wiki.apache.org/solr/DataImportHandler#Tika_Integration
http://wiki.apache.org/solr/TikaEntityProcessor

-Hoss

OK, I'm trying to integrate the TikaEntityProcessor as suggested. I'm using Solr version 1.4.0 and getting the following error:

java.lang.ClassNotFoundException: Unable to load BinURLDataSource or org.apache.solr.handler.dataimport.BinURLDataSource

curl -s http://test.html | curl http://localhost:9080/solr/update/extract?extractOnly=true --data-binary @- -H 'Content-type:text/html'

... works fine, so presumably my Tika processor is working. My data-config.xml looks like this [markup partially lost in posting; the relevant fragment is]:

<entity name="my_database_url"
        query="select CONTENT_URL from my_database where content_id='${my_database.CONTENT_ID}'">
  ... url="http://www.mysite.com/${my_database.content_url}" ...

I added the entity name="my_database_url" section to an existing (working) database entity to be able to have Tika index the content pointed to by the content_url. Is there anything obviously wrong with what I've tried so far?

Thanks - Tod
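For comparison, the wiki's Tika example is shaped roughly like this (untested here, and the names come from the wiki rather than my setup):

<dataConfig>
  <dataSource type="BinURLDataSource" name="bin"/>
  <document>
    <entity name="tika-test" processor="TikaEntityProcessor"
            url="http://www.mysite.com/some.doc" dataSource="bin" format="text">
      <field column="text" name="content"/>
    </entity>
  </document>
</dataConfig>

As far as I can tell, TikaEntityProcessor and BinURLDataSource are not part of the 1.4.0 release jars (they came along later with the DataImportHandler "extras" work), so unless a newer DIH jar is on the classpath, a ClassNotFoundException like the one above is exactly what would happen.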
Re: anyone use hadoop+solr?
Neeb, Seems like we are in the same boat. Our index consists of 5M records, which roughly equals around 30 gigs. All in all that's not too bad; however, our indexing process (we use DIH but I'm now revisiting that idea) takes a whopping 30+ hours!!! I just bought the Hadoop in Action early edition but haven't had time to read it yet. I was wondering what resources you are using to learn Hadoop and, more importantly, its applications to Solr. Would you mind explaining your thought process on how you will be using Hadoop in more detail? -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914606.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr with hadoop
I wanted to add a JIRA issue about exactly what Otis is asking here. Unfortunately, I haven't had time for it because of my exams.

However, I'd like to add a question to Otis' ones: if you distribute the indexing process this way, are you able to replicate the different documents correctly?

Thank you.
- Mitch

Otis Gospodnetic-2 wrote:
>
> Stu,
>
> Interesting! Can you provide more details about your setup? By "load
> balance the indexing stage" you mean "distribute the indexing process",
> right? Do you simply take your content to be indexed, split it into N
> chunks where N matches the number of TaskNodes in your Hadoop cluster and
> provide a map function that does the indexing? What does the reduce
> function do? Does that call IndexWriter.addAllIndexes or do you do that
> outside Hadoop?
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message -----
> From: Stu Hood
> To: solr-user@lucene.apache.org
> Sent: Monday, January 7, 2008 7:14:20 PM
> Subject: Re: solr with hadoop
>
> As Mike suggested, we use Hadoop to organize our data en route to Solr.
> Hadoop allows us to load balance the indexing stage, and then we use
> the raw Lucene IndexWriter.addAllIndexes method to merge the data to be
> hosted on Solr instances.
>
> Thanks,
> Stu
>
> ----- Original Message -----
> From: Mike Klaas
> Sent: Friday, January 4, 2008 3:04pm
> To: solr-user@lucene.apache.org
> Subject: Re: solr with hadoop
>
> On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote:
>
>> I have a huge index base (about 110 million documents, 100 fields
>> each). But the size of the index base is reasonable, it's about 70 Gb.
>> All I need is to increase performance, since some queries, which match
>> a big number of documents, are running slow.
>> So I was thinking: are there any benefits to using Hadoop for this? And if
>> so, what direction should I go? Has anybody done something to
>> integrate Solr with Hadoop? Does it give any performance boost?
>>
> Hadoop might be useful for organizing your data enroute to Solr, but
> I don't see how it could be used to boost performance over a huge
> Solr index. To accomplish that, you need to split it up over two
> machines (for which you might find hadoop useful).
>
> -Mike
>
-- View this message in context: http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914589.html Sent from the Solr - User mailing list archive at Nabble.com.
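For anyone trying to picture the merge step Stu describes (the Lucene method is actually addIndexes / addIndexesNoOptimize; there is no addAllIndexes), it is roughly this sketch against the Lucene 2.9-era API, with made-up paths:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MergeShards {
    public static void main(String[] args) throws Exception {
        // destination index that a Solr instance will serve
        Directory merged = FSDirectory.open(new File("/indexes/merged"));
        IndexWriter writer = new IndexWriter(merged,
            new StandardAnalyzer(Version.LUCENE_29), true, IndexWriter.MaxFieldLength.UNLIMITED);
        // per-task indexes produced by the Hadoop jobs
        writer.addIndexesNoOptimize(new Directory[] {
            FSDirectory.open(new File("/indexes/part-00000")),
            FSDirectory.open(new File("/indexes/part-00001"))
        });
        writer.optimize();
        writer.close();
    }
}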
Re: anyone use hadoop+solr?
Thanks Marc, Well, I have an HBase storage architecture and a Solr master-slave setup with two slave servers. Would this patch work with my setup? Do I need sharding in place? And what tasks would be run at the map and reduce phases? I was thinking something like: At map: read documents as key/value, convert them to SolrInputDocuments and add them to the server. At reduce: merge indexes? And commit/optimize? Also, are there any quick guidelines on how to get started with this setup? As I am new to Hadoop as well as fairly new to Solr. Appreciate your help, -A -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914587.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr with hadoop
I think a good solution could be to use Hadoop with SOLR-1301 to build Solr shards and then use Solr distributed search against these shards (you will have to copy them from HDFS to local disk to search against them) -- View this message in context: http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914576.html Sent from the Solr - User mailing list archive at Nabble.com.
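The copy step is just the standard HDFS tooling, e.g. (paths are made up):

  hadoop fs -copyToLocal /user/hadoop/job-output/shard0 /var/solr/data/index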
Re: anyone use hadoop+solr?
I think there are people using this patch in production: https://issues.apache.org/jira/browse/SOLR-1301 I have tested it myself indexing data from CSV and from HBase and it works properly -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914553.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ
Hi, there are reasons for both options. Usually it is a good idea to put the default configuration into the solrconfig.xml (and even fix some of the configuration) in order to have simple client-side code. But sometimes it is necessary to have some flexibility for the actual query. In this situation one would use the client-side approach. If done right, this does not mean putting the parameters in the servlet code. Cheers, Sven

--On Tuesday, 22 June 2010 17:52 +0200 "Jan Høydahl / Cominvent" wrote:

Hi, Sometimes I do both. I put the defaults in solrconfig.xml and thus have one place to define all kinds of low-level default settings. But then I provide the possibility in the application space to add/override any parameters as well. This gives you great flexibility: it lets server administrators (with access to solrconfig.xml) tune low-level stuff, but also gives programmers a middle layer to put domain-space config in, instead of locking it down on the search node or up in the web interfaces. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com

On 21 June 2010, at 22.29, Saïd Radhouani wrote: I completely agree. Thanks a lot! -S

On Jun 21, 2010, at 9:08 PM, Abdelhamid ABID wrote: Why would someone port the Solr config into servlet code? IMO the first option would be the best choice; one obvious reason is that when you alter the Solr config you only need to restart the server, whereas changing the source forces you to redeploy your app and restart the server.

On 6/21/10, Saïd Radhouani wrote: Hello, I'm developing a Web application that communicates with Solr using SolrJ. I have three search interfaces, and I'm facing two options: 1- Configuring one SearchHandler per search interface in solrconfig.xml Or 2- Writing the configuration in the Java servlet code that is using SolrJ Is there any significant difference between these two options? If yes, what's the best choice? Thanks, -Saïd -- Abdelhamid ABID Software Engineer- J2EE / WEB
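To make the client-side option concrete, a SolrJ sketch (the handler name, URL and parameter values are examples):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SearchClient {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("ipod");
        query.setQueryType("interfaceA"); // pick the handler configured in solrconfig.xml
        query.set("rows", 20);            // override that handler's default
        QueryResponse response = server.query(query);
        System.out.println(response.getResults().getNumFound());
    }
}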
Re: Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ
Hi, Sometimes I do both. I put the defaults in solrconfig.xml and thus have one place to define all kinds of low-level default settings. But then I provide the possibility in the application space to add/override any parameters as well. This gives you great flexibility: it lets server administrators (with access to solrconfig.xml) tune low-level stuff, but also gives programmers a middle layer to put domain-space config in, instead of locking it down on the search node or up in the web interfaces.
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 21 June 2010, at 22.29, Saïd Radhouani wrote:
> I completely agree. Thanks a lot!
>
> -S
>
> On Jun 21, 2010, at 9:08 PM, Abdelhamid ABID wrote:
>
>> Why would someone port the Solr config into servlet code?
>> IMO the first option would be the best choice; one obvious reason is that
>> when you alter the Solr config you only need to restart the server, whereas
>> changing the source forces you to redeploy your app and restart the
>> server.
>>
>> On 6/21/10, Saïd Radhouani wrote:
>>>
>>> Hello,
>>>
>>> I'm developing a Web application that communicates with Solr using SolrJ. I
>>> have three search interfaces, and I'm facing two options:
>>>
>>> 1- Configuring one SearchHandler per search interface in solrconfig.xml
>>>
>>> Or
>>>
>>> 2- Writing the configuration in the Java servlet code that is using SolrJ
>>>
>>> Is there any significant difference between these two options? If yes,
>>> what's the best choice?
>>>
>>> Thanks,
>>>
>>> -Saïd
>>
>> --
>> Abdelhamid ABID
>> Software Engineer- J2EE / WEB
>
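Concretely, the solrconfig.xml side of that layering looks something like this (names and values are examples; "defaults" can be overridden per request, "invariants" cannot):

<requestHandler name="interfaceA" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2 body</str>
    <int name="rows">10</int>
  </lst>
  <lst name="invariants">
    <str name="fq">visibility:public</str>
  </lst>
</requestHandler>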
Re: solr with hadoop
Hi, We currently have a master-slave setup for Solr with two slave servers. We are using SolrJ (StreamingUpdateSolrServer) to index to the master, which takes 6 hours to index around 15 million documents. I would like to explore Hadoop, in particular for the indexing job using a MapReduce approach. - I have read some comments on the JIRA tickets, but it still seems unclear how this setup will work. - I am not sure what tasks will be done at the map phase and what at the reduce phase. - And would it merge the multiple indices together into one during the reduce phase, or is this a separate task outside of MapReduce? Any directions and guidance on this setup would be highly appreciated. Thanks in advance, -Ali -- View this message in context: http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914483.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: anyone use hadoop+solr?
Hey James, Just wondering if you ever had a chance to try out hadoop with solr? Would appreciate any information/directions you could give. I am particularly interested in indexing using a mapreduce job. Cheers, -Ali -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914450.html Sent from the Solr - User mailing list archive at Nabble.com.
Field missing when using distributed search + dismax
Hi. All. I was using distributed search over 30 Solr instances; previously I was using the standard query handler, and the results were returned correctly. Each result has 2 fields: "ID" and "type". Today I wanted to search with dismax, so I tried searching each instance with dismax. It works correctly, returning "ID" and "type" for each result. The strange thing is that when I use distributed search, the results only have "ID". The field "type" disappeared. I need that "type" to know what the "ID" refers to. Why does Solr "eat" my "type"? Thanks. Regards. Scott
Re: OOM on sorting on dynamic fields
H.. A couple of details I'm wondering about. How many documents are we talking about in your index? Do you get OOMs when you start fresh or does it take a while? You've done some good investigations, so it seems like there could well be something else going on here than just "the usual suspects" of sorting I'm wondering if you aren't really closing readers somehow. Are you updating your index frequently and re-opening readers often? If so, how? I'm assuming that if you do NOT sort on all these fields, you don't have the problem, is that true? Best Erick On Fri, Jun 18, 2010 at 10:52 AM, Matteo Fiandesio < matteo.fiande...@gmail.com> wrote: > Hello, > we are experiencing OOM exceptions in our single core solr instance > (on a (huge) amazon EC2 machine). > We investigated a lot in the mailing list and through jmap/jhat dump > analyzing and the problem resides in the lucene FieldCache that fills > the heap and blows up the server. > > Our index is quite small but we have a lot of sort queries on fields > that are dynamic,of type long representing timestamps and are not > present in all the documents. > Those queries apply sorting on 12-15 of those fields. > > We are using solr 1.4 in production and the dump shows a lot of > Integer/Character and Byte Array filled up with 0s. > With solr's trunk code things does not change. > > In the mailing list we saw a lot of messages related to this issues: > we tried truncating the dates to day precision,using missingSortLast = > true,changing the field type from slong to long,setting autowarming to > different values,disabling and enabling caches with different values > but we did not manage to solve the problem. > > We were thinking to implement an LRUFieldCache field type to manage > the FieldCache as an LRU and preventing but, before starting a new > development, we want to be sure that we are not doing anything wrong > in the solr configuration or in the index generation. > > Any help would be appreciated. > Regards, > Matteo >
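P.S. For anyone else following along, the heap-dump analysis Matteo mentions is just the stock JDK tooling, e.g.:

  jmap -dump:live,format=b,file=solr-heap.hprof <solr-pid>
  jhat solr-heap.hprof

(jhat then serves the object histogram on http://localhost:7000/).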
RE: example for searching hibernate entities
Have you already looked at Hibernate Search? It combines Hibernate ORM with the indexing/searching functionality of Lucene. The latest version even comes with the Solr analyzers. http://www.hibernate.org/subprojects/search.html Regards, Tom

-----Original Message-----
From: fachhoch [mailto:fachh...@gmail.com]
Sent: Tuesday, 22 June 2010 16:23
To: solr-user@lucene.apache.org
Subject: example for searching hibernate entities

I have a complex data model with bi-directional relations, and I use Hibernate as the ORM provider, so I have several model objects representing the data model. All together my model objects are 75 to 100, and each table in my database has around 20,000 records. Please suggest: in my case, will text search help me? Are there any examples of searching on Hibernate entities? --
View this message in context:
http://lucene.472066.n3.nabble.com/example-for-searching-hibernate-entities-tp914279p914279.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching across multiple repeating fields
Perhaps my answer is useless, bc I don't have an answer to your direct question, but: You *might* want to consider if your concept of a solr-document is on the correct granular level, i.e: your problem posted could be tackled (afaik) by defining a document being a 'sub-event' with only 1 daterange. So for each event-doc you have now, this is replaced by several sub-event docs in this proposed situation. Additionally each sub-event doc gets an additional field 'parent-eventid' which maps to something like an event-id (which you're probably using) . So several sub-event docs can point to the same event-id. Lastly, all sub-event docs belonging to a particular event implement all the other fields that you may have stored in that particular event-doc. Now you can query for events based on data-rages like you envisioned, but instead of returning events you return sub-event-docs. However since all data of the original event (except the multiple dateranges) is available in the subevent-doc this shouldn't really bother the client. If you need to display all dates of an event (the only info missing from the returned solr-doc) you could easily store it in a RDB and fetch it using the defined parent-eventid. The only caveat I see, is that possibly multiple sub-events with the same 'parent-eventid' might get returned for a particular query. This however depends on the type of queries you envision. i.e: 1) If you always issue queries with date-filters, and *assuming* that sub-events of a particular event don't temporally overlap, you will never get multiple sub-events returned. 2) if 1) doesn't hold and assuming you *do* mind multiple sub-events of the same actual event, you could try to use Field Collapsing on 'parent-eventid' to only return the first sub-event per parent-eventid that matches the rest of your query. (Note however, that Field Collapsing is a patch at the moment. http://wiki.apache.org/solr/FieldCollapsing) Not sure if this helped you at all, but at the very least it was a nice conceptual exercise ;-) Cheers, Geert-Jan 2010/6/22 Mark Allan > Hi all, > > Firstly, I apologise for the length of this email but I need to describe > properly what I'm doing before I get to the problem! > > I'm working on a project just now which requires the ability to store and > search on temporal coverage data - ie. a field which specifies a date range > during which a certain event took place. > > I hunted around for a few days and couldn't find anything which seemed to > fit, so I had a go at writing my own field type based on solr.PointType. > It's used as follows: > schema.xml > dimension="2" subFieldSuffix="_i"/> > multiValued="true"/> > data.xml > > >... >1940,1945 > > > > Internally, this gets stored as: >1940,1945 >1940 >1945 > > In due course, I'll declare the subfields as a proper date type, but in the > meantime, this works absolutely fine. I can search for an individual date > and Solr will check (queryDate > daterange_0 AND queryDate < daterange_1 ) > and the correct documents are returned. My code also allows the user to > input a date range in the query but I won't complicate matters with that > just now! > > The problem arises when a document has more than one "daterange" field > (imagine a news broadcast which covers a variety of topics and hence time > periods). > > A document with two daterange fields > >... 
>     <field name="daterange">19820402,19820614</field>
>     <field name="daterange">1990,2000</field>
>   </doc>
>
> gets stored internally as
>   <arr name="daterange"><str>19820402,19820614</str><str>1990,2000</str></arr>
>   <arr name="daterange_0"><int>19820402</int><int>1990</int></arr>
>   <arr name="daterange_1"><int>19820614</int><int>2000</int></arr>
>
> In this situation, searching for 1985 should yield zero results as it is
> contained within neither daterange; however, the above document is returned
> in the result set. What Solr is doing is checking that the queryDate (1985)
> is greater than *any* of the values in daterange_0 AND queryDate is less
> than *any* of the values in daterange_1.
>
> How can I get Solr to respect the positions of each item in the daterange_0
> and _1 arrays? Ideally I'd like the search to use the following logic, thus
> preventing the above document from being returned in a search for 1985:
>   (queryDate > daterange_0[0] AND queryDate < daterange_1[0]) OR
>   (queryDate > daterange_0[1] AND queryDate < daterange_1[1])
>
> Someone else had a very similar problem recently on the mailing list with a
> multiValued PointType field but the thread went cold without a final
> solution.
>
> While I could filter the results when they get back to my application
> layer, it seems like it's not really the right place to do it.
>
> Any help getting Solr to respect the positions of items in arrays would be
> very gratefully received.
>
> Many thanks,
> Mark
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
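To make the sub-event suggestion above concrete, the denormalized documents could look roughly like this (a sketch; the id values and the 'parent-eventid' field name are illustrative, and the dateranges are taken from the example above):

    <add>
      <doc>
        <field name="id">event_42_sub1</field>
        <field name="parent-eventid">event_42</field>
        <field name="daterange">19820402,19820614</field>
        <!-- ... all other fields copied from the original event doc ... -->
      </doc>
      <doc>
        <field name="id">event_42_sub2</field>
        <field name="parent-eventid">event_42</field>
        <field name="daterange">1990,2000</field>
      </doc>
    </add>

A date filter for 1985 would then match neither sub-event document, which is exactly the behaviour the original poster wants.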
Re: performance sorting multivalued field
Curiosity is good. Do be aware, though, that the behavior is not guaranteed; it's just "how things happen to work" and may change without warning.

Erick

On Tue, Jun 22, 2010 at 4:01 AM, Marc Sturlese wrote:
>
> >> Well, sorting requires that all the unique values in the target field
> >> get loaded into memory
> That's what I thought, thanks.
>
> >> But a larger question is whether what you're doing is worthwhile
> >> even as just a measurement. You say
> >> "This is good for me, I don't care for my tests". I claim that
> >> you do care
> I just like to play with things. First I checked the behavior of sorting on a
> multiValued field, and what I noticed was: let's say you have docs with a field
> called 'num':
> doc1->num:2; doc2->num:1,num:4; doc3->num:5
> Sorting asc by the field num, what I get is: doc2,doc1,doc3.
> The behavior seems to be always the same (I am not saying it works like that,
> but it's what I've seen in my examples).
> After seeing that I just decided to check the performance. The point is
> simply curiosity.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p913626.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
example for searching hibernate entities
I have a complex data model with bi-directional relations. I use Hibernate as the ORM provider, so I have several model objects representing the data model. Altogether my model objects number 75 to 100, and in my database each table has many records, around 20,000. Please suggest: in my case, will text search help me? Are there any examples of searching on Hibernate entities?
--
View this message in context: http://lucene.472066.n3.nabble.com/example-for-searching-hibernate-entities-tp914279p914279.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field Collapsing SOLR-236
Hi, I tried checking out the latest code (rev 956715); the patch did not work on it. In fact I even tried hunting for the revision mentioned earlier in this thread (i.e. rev 955615) but cannot find it in the repository (it has revision 955569 followed by revision 955785). Any pointers?

Regards
Raakhi

On Tue, Jun 22, 2010 at 2:03 AM, Martijn v Groningen <martijn.is.h...@gmail.com> wrote:
> Oh in that case is the code stable enough to use it for production?
> - Well, this feature is a patch and I think that says it all.
> Although bugs are fixed, it is definitely an experimental feature
> and people should keep that in mind when using one of the patches.
>
> Does it support features which solr 1.4 normally supports?
> - As far as I know, yes.
>
> I am using facets as a workaround but then I am not able to sort on any
> other field. Is there any workaround to support this feature?
> - Maybe http://wiki.apache.org/solr/Deduplication prevents
> adding duplicates to your index, but then you miss the collapse counts
> and other computed values.
>
> On 21 June 2010 09:04, Rakhi Khatwani wrote:
> > Hi,
> > Oh in that case is the code stable enough to use it for production?
> > Does it support features which solr 1.4 normally supports?
> >
> > I am using facets as a workaround but then i am not able to sort on any
> > other field. is there any workaround to support this feature??
> >
> > Regards,
> > Raakhi
> >
> > On Fri, Jun 18, 2010 at 6:14 PM, Martijn v Groningen <
> > martijn.is.h...@gmail.com> wrote:
> >
> >> Hi Rakhi,
> >>
> >> The patch is not compatible with 1.4. If you want to work with the
> >> trunk, you'll need to get the src from
> >> https://svn.apache.org/repos/asf/lucene/dev/trunk/
> >>
> >> Martijn
> >>
> >> On 18 June 2010 13:46, Rakhi Khatwani wrote:
> >> > Hi Moazzam,
> >> >
> >> > Where did you get the src code from?
> >> >
> >> > I am downloading it from
> >> > https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4
> >> >
> >> > and the latest revision in this location is 955469,
> >> > so applying the latest patch (dated 17th June 2010) on it still generates
> >> > errors.
> >> >
> >> > Any Pointers?
> >> >
> >> > Regards,
> >> > Raakhi
> >> >
> >> > On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan wrote:
> >> >
> >> >> I knew it wasn't me! :)
> >> >>
> >> >> I found the patch just before I read this and applied it to the trunk
> >> >> and it works!
> >> >>
> >> >> Thanks Mark and Martijn for all your help!
> >> >>
> >> >> - Moazzam
> >> >>
> >> >> On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen wrote:
> >> >> > I've added a new patch to the issue, so building the trunk (rev
> >> >> > 955615) with the latest patch should not be a problem. Due to recent
> >> >> > changes in the Lucene trunk the patch was not compatible.
> >> >> >
> >> >> > On 17 June 2010 20:20, Erik Hatcher wrote:
> >> >> >>
> >> >> >> On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:
> >> >> >>>
> >> >> >>> p.s. I'd be glad to contribute our Maven build re-organization back to the
> >> >> >>> community to get Solr properly Mavenized so that it can be distributed and
> >> >> >>> released more often. For us the benefit of this structure is that we will
> >> >> >>> be able to overlay addons such as RequestHandlers and other third party
> >> >> >>> support without having to rebuild Solr from scratch.
> >> >> >>
> >> >> >> But you don't have to rebuild Solr from scratch to add a new request handler
> >> >> >> or other plugins - simply compile your custom stuff into a JAR and put it in
> >> >> >> <solr-home>/lib (or point to it with <lib> in solrconfig.xml).
> >> >> >>
> >> >> >>> Ideally, a Maven Archetype could be created that would allow one to rapidly
> >> >> >>> produce a Solr webapp and fire it up in Jetty in mere seconds.
> >> >> >>
> >> >> >> How's that any different than cd example; java -jar start.jar? Or do you
> >> >> >> mean a Solr client webapp?
> >> >> >>
> >> >> >>> Finally, with projects such as Bobo, integration with Spring would make
> >> >> >>> configuration more consistent and require significantly less java coding
> >> >> >>> just to add new capabilities every time someone authors a new RequestHandler.
> >> >> >>
> >> >> >> It's one line of config to add a new request handler. How many ridiculously
> >> >> >> ugly confusing lines of Spring XML would it take?
> >> >> >>
> >> >> >>> The biggest thing I learned about Solr in my work thus far is that patches
> >> >> >>> like these could be standalone modules in separate projects if it weren't
> >> >> >>> for having to hack the configuration and solrj methods up to adopt them.
> >> >> >>> Which brings me to SolrJ, great API if it would stay generic and have less
> >> >> >>> concern for adding method each time some custom collections an
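For anyone stuck at the same step, the checkout-and-patch sequence being discussed is roughly this (a sketch; the exact patch file name depends on what you download from the SOLR-236 JIRA issue):

    svn checkout -r 955615 https://svn.apache.org/repos/asf/lucene/dev/trunk/ lucene-trunk
    cd lucene-trunk
    patch -p0 -i SOLR-236.patch

Note that svn log on a path only lists revisions that actually touched that path, so a global revision number such as 955615 may be absent from the log of a branch even though checking out the trunk URL at that revision works.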
Searching across multiple repeating fields
Hi all,

Firstly, I apologise for the length of this email but I need to describe properly what I'm doing before I get to the problem!

I'm working on a project just now which requires the ability to store and search on temporal coverage data - ie. a field which specifies a date range during which a certain event took place.

I hunted around for a few days and couldn't find anything which seemed to fit, so I had a go at writing my own field type based on solr.PointType. It's used as follows:

schema.xml
  <fieldType name="daterange" class="..." dimension="2" subFieldSuffix="_i"/>
  <field name="daterange" type="daterange" indexed="true" stored="true" multiValued="true"/>

data.xml
  <add>
    <doc>
      ...
      <field name="daterange">1940,1945</field>
    </doc>
  </add>

Internally, this gets stored as:
  <arr name="daterange"><str>1940,1945</str></arr>
  <arr name="daterange_0"><int>1940</int></arr>
  <arr name="daterange_1"><int>1945</int></arr>

In due course, I'll declare the subfields as a proper date type, but in the meantime, this works absolutely fine. I can search for an individual date and Solr will check (queryDate > daterange_0 AND queryDate < daterange_1) and the correct documents are returned. My code also allows the user to input a date range in the query but I won't complicate matters with that just now!

The problem arises when a document has more than one "daterange" field (imagine a news broadcast which covers a variety of topics and hence time periods).

A document with two daterange fields
  <doc>
    ...
    <field name="daterange">19820402,19820614</field>
    <field name="daterange">1990,2000</field>
  </doc>

gets stored internally as
  <arr name="daterange"><str>19820402,19820614</str><str>1990,2000</str></arr>
  <arr name="daterange_0"><int>19820402</int><int>1990</int></arr>
  <arr name="daterange_1"><int>19820614</int><int>2000</int></arr>

In this situation, searching for 1985 should yield zero results as it is contained within neither daterange, however, the above document is returned in the result set. What Solr is doing is checking that the queryDate (1985) is greater than *any* of the values in daterange_0 AND queryDate is less than *any* of the values in daterange_1.

How can I get Solr to respect the positions of each item in the daterange_0 and _1 arrays? Ideally I'd like the search to use the following logic, thus preventing the above document from being returned in a search for 1985:
  (queryDate > daterange_0[0] AND queryDate < daterange_1[0]) OR
  (queryDate > daterange_0[1] AND queryDate < daterange_1[1])

Someone else had a very similar problem recently on the mailing list with a multiValued PointType field but the thread went cold without a final solution.

While I could filter the results when they get back to my application layer, it seems like it's not really the right place to do it.

Any help getting Solr to respect the positions of items in arrays would be very gratefully received.

Many thanks,
Mark

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
How to wait for StreamingUpdateSolrServer to finish?
I'm prototyping using StreamingUpdateSolrServer. I want to send a commit (or optimize) after I'm done adding all of my docs, rather than wait for the autoCommit to kick in. However, since StreamingUpdateSolrServer is multi-threaded, I can't simply call commit when I'm done, because that can happen before the StreamingUpdateSolrServer actually sends all the docs. I would think that calling the method blockUntilFinished() before issuing the commit would do the trick, but I still get my commit sent before the last document is sent. I've tried this with both Solr 1.4.0 and the latest release candidate for Solr 1.4.1. Has anybody else had this experience? Should I file a bug on blockUntilFinished()? -- Stephen Duncan Jr www.stephenduncanjr.com
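For reference, the pattern being described looks roughly like this (a sketch against the Solr 1.4 SolrJ API; the URL, queue size, and thread count are arbitrary):

    // Multi-threaded client: add() only enqueues documents for background threads.
    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 4);
    for (SolrInputDocument doc : docs) {
        server.add(doc);
    }
    server.blockUntilFinished(); // intended to drain the internal queue
    server.commit();             // yet, as reported above, the commit can still overtake the last adds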
[NEWS] New Response Writer for Native PHP Solr Client
Hi Solr users,

If you are using Apache Solr via PHP, I have some good news for you. There is a new response writer for the PHP native extension, currently available as a plugin. This new feature adds a new response writer class to the org.apache.solr.request package.

This class is used by the PHP Native Solr Client driver to prepare the query response from Solr. This response writer allows you to configure the way the data is serialized for the PHP client. You can use your own class name and you can also control how the properties are serialized as well. The formatting of the response data is very similar to the way it is currently done by the PECL extension on the client side. The only difference now is that this serialization is happening on the server side instead.

You will find this new response writer particularly useful when dealing with responses for
- highlighting
- admin threads responses
- more like this responses
to mention just a few.

You can pass the "objectClassName" request parameter to specify the class name to be used for serializing objects. Please note that the class must be available on the client side to avoid a PHP_Incomplete_Object error during the unserialization process. You can also pass in the "objectPropertiesStorageMode" request parameter with either a 0 (independent properties) or a 1 (combined properties). These parameters can also be passed as a named list when loading the response writer in the solrconfig.xml file.

Having this control allows you to create custom objects, which gives the flexibility of implementing custom __get methods, ArrayAccess, Traversable and Iterator interfaces on the PHP client side.

Until this class is incorporated into Solr, you simply have to copy the jar file containing this plugin into your lib directory under $SOLR_HOME. The jar file is available here https://issues.apache.org/jira/browse/SOLR-1967

Then set up the configuration as shown below and restart your servlet container. Below is an example configuration in solrconfig.xml:

  <queryResponseWriter name="phpnative" class="org.apache.solr.request.PHPNativeResponseWriter">
    <str name="objectClassName">SolrObject</str>
    <int name="objectPropertiesStorageMode">0</int>
  </queryResponseWriter>

Below is an example implementation on the PHP client side. Support for specifying custom response writers will be available starting from the 0.9.11 version (released today) of the PECL extension for Solr, currently available here http://pecl.php.net/package/solr

Here is an example of how to use the new response writer with the PHP client:

  <?php
  class SolrClass
  {
      public $_properties = array();

      public function __get($property_name)
      {
          if (property_exists($this, $property_name)) {
              return $this->$property_name;
          } else if (isset($this->_properties[$property_name])) {
              return $this->_properties[$property_name];
          }
          return null;
      }
  }

  $options = array(
      'hostname' => 'localhost',
      'port' => 8983,
      'path' => '/solr/'
  );

  $client = new SolrClient($options);
  $client->setResponseWriter("phpnative");
  $response = $client->ping();

  $query = new SolrQuery();
  $query->setQuery("*:*");
  $query->set("objectClassName", "SolrClass");
  $query->set("objectPropertiesStorageMode", 1);

  $response = $client->query($query);
  $resp = $response->getResponse();
  ?>

Documentation of the changes to the PECL extension is available here
http://docs.php.net/manual/en/solrclient.construct.php
http://docs.php.net/manual/en/solrclient.setresponsewriter.php

Please contact me at ie...@php.net, if you have any questions or comments.

--
"Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Alternative for field collapsing
Thanks Peter :) On Tue, Jun 22, 2010 at 3:08 PM, Peter Karich wrote: > ups, sorry. I meant Martijn! Not the germanized Martin :-/ > > Peter. > > > Hi, > > I wanted to apply field collapsing on the title(type string). but > > want to show only one document (and the count of such documents) per > title > > rather than show all the documents. > > > > Regards > > Raakhi > > > > > > On Tue, Jun 22, 2010 at 12:59 AM, Peter Karich wrote: > > > > > >> Hi Raakhi, > >> > >> First, field collapsing works pretty well in our system. And, as Martin > >> has said on 17.06.2010 in the other thread "Field Collapsing SOLR-236": > >> > >> I've added a new patch to the issue, so building the trunk (rev > >> 955615) with the latest patch should not be a problem. Due to recent > >> changes in the Lucene trunk the patch was not compatible. > >> > >> Second, if the id is unique applying field collapse make no sense. So I > >> suppose you will apply field collapsing to the title, right? > >> But in this case, why doesn't a simple query ala q=title:'my > >> title'&sort=price asc work for you? Or what do you want to achieve? > >> (The title should be of type string, I think) > >> > >> Regards, > >> Peter. > >> > >> > >>> Hi, > >>> I have an index with the following fields: > >>> id (unique) > >>> title > >>> description > >>> price. > >>> > >>> Suppose i want to find unique documents and count of all documents with > >>> > >> the > >> > >>> same title, sorted on price. > >>> How do i go about it. > >>> Knowing that field collapsing is not stable with 1.4. > >>> if i go about using facet's on id, it sorts either on id or on the > count, > >>> but not on the price, > >>> > >>> Any Suggestions?? > >>> Regards, > >>> Raakhi > >>> > >>> > >
Re: Alternative for field collapsing
ups, sorry. I meant Martijn! Not the germanized Martin :-/ Peter. > Hi, > I wanted to apply field collapsing on the title(type string). but > want to show only one document (and the count of such documents) per title > rather than show all the documents. > > Regards > Raakhi > > > On Tue, Jun 22, 2010 at 12:59 AM, Peter Karich wrote: > > >> Hi Raakhi, >> >> First, field collapsing works pretty well in our system. And, as Martin >> has said on 17.06.2010 in the other thread "Field Collapsing SOLR-236": >> >> I've added a new patch to the issue, so building the trunk (rev >> 955615) with the latest patch should not be a problem. Due to recent >> changes in the Lucene trunk the patch was not compatible. >> >> Second, if the id is unique applying field collapse make no sense. So I >> suppose you will apply field collapsing to the title, right? >> But in this case, why doesn't a simple query ala q=title:'my >> title'&sort=price asc work for you? Or what do you want to achieve? >> (The title should be of type string, I think) >> >> Regards, >> Peter. >> >> >>> Hi, >>> I have an index with the following fields: >>> id (unique) >>> title >>> description >>> price. >>> >>> Suppose i want to find unique documents and count of all documents with >>> >> the >> >>> same title, sorted on price. >>> How do i go about it. >>> Knowing that field collapsing is not stable with 1.4. >>> if i go about using facet's on id, it sorts either on id or on the count, >>> but not on the price, >>> >>> Any Suggestions?? >>> Regards, >>> Raakhi >>> >>>
Re: Alternative for field collapsing
Hi Raakhi,

yes, then the collapse patch works perfectly in our case. If you don't get the patch applied correctly, try asking directly here: https://issues.apache.org/jira/browse/SOLR-236 I did the same and got an immediate response from Martijn & Co. Or try the latest patch: 2010-06-17 03:08 PM Martijn van Groningen

Querying is simple: q=peter&collapse.field=title and you will get back only one document for the same title containing 'peter', and additionally the 'similar'/collapse count for every document: title 4512 4010 ...

Regards,
Peter.

> Hi,
> I wanted to apply field collapsing on the title (type string), but
> want to show only one document (and the count of such documents) per title
> rather than show all the documents.
>
> Regards
> Raakhi
>
> On Tue, Jun 22, 2010 at 12:59 AM, Peter Karich wrote:
>
>> Hi Raakhi,
>>
>> First, field collapsing works pretty well in our system. And, as Martin
>> has said on 17.06.2010 in the other thread "Field Collapsing SOLR-236":
>>
>> I've added a new patch to the issue, so building the trunk (rev
>> 955615) with the latest patch should not be a problem. Due to recent
>> changes in the Lucene trunk the patch was not compatible.
>>
>> Second, if the id is unique, applying field collapse makes no sense. So I
>> suppose you will apply field collapsing to the title, right?
>> But in this case, why doesn't a simple query ala q=title:'my
>> title'&sort=price asc work for you? Or what do you want to achieve?
>> (The title should be of type string, I think)
>>
>> Regards,
>> Peter.
>>
>>> Hi,
>>> I have an index with the following fields:
>>> id (unique)
>>> title
>>> description
>>> price.
>>>
>>> Suppose i want to find unique documents and count of all documents with the
>>> same title, sorted on price.
>>> How do i go about it.
>>> Knowing that field collapsing is not stable with 1.4.
>>> if i go about using facets on id, it sorts either on id or on the count,
>>> but not on the price,
>>>
>>> Any Suggestions??
>>> Regards,
>>> Raakhi
>>>
>>
>>
>

--
http://karussell.wordpress.com/
Re: OOM on sorting on dynamic fields
First of all, thanks for your answers. Those OOMEs are pretty nasty for our production environment. I didn't try the solution of ordering by function as it was a solr 1.5 feature and we prefer to use the stable version 1.4. I made a temporary patch that looks like it is working fine. I patched the lucene-core-2.9.1 source code, adding the lines marked with + to the abstract static class Cache's get method:

  public Object get(IndexReader reader, Entry key) throws IOException {
    Map innerCache;
    Object value;
  + final Object readerKey = reader.getFieldCacheKey();
  + CacheEntry[] cacheEntries = wrapper.getCacheEntries();
  + if (cacheEntries.length > A_TUNED_INT_VALUE) {
  +   readerCache.clear();
  + }
    ...

I didn't notice any delay or concurrency problem.

On 22 June 2010 07:27, Lance Norskog wrote:
> No, this is basic to how Lucene works. You will need larger EC2 instances.
>
> On Mon, Jun 21, 2010 at 2:08 AM, Matteo Fiandesio wrote:
>> Compiling solr with lucene 2.9.3 instead of 2.9.1 will solve this issue?
>> Regards,
>> Matteo
>>
>> On 19 June 2010 02:28, Lance Norskog wrote:
>>> The Lucene implementation of sorting creates an array of four-byte
>>> ints for every document in the index, and another array of the unique
>>> values in the field.
>>> If the timestamps are 'date' or 'tdate' in the schema, they do not
>>> need the second array.
>>>
>>> You can also sort by a field with a function query. This does not
>>> build the arrays, but might be a little slower.
>>> Yes, the sort arrays (and also facet values for a field) should be
>>> controlled by a fixed-size cache, but they are not.
>>>
>>> On Fri, Jun 18, 2010 at 7:52 AM, Matteo Fiandesio wrote:
>>>> Hello,
>>>> we are experiencing OOM exceptions in our single core solr instance
>>>> (on a (huge) amazon EC2 machine).
>>>> We investigated a lot in the mailing list and through jmap/jhat dump
>>>> analyzing, and the problem resides in the lucene FieldCache that fills
>>>> the heap and blows up the server.
>>>>
>>>> Our index is quite small but we have a lot of sort queries on fields
>>>> that are dynamic, of type long representing timestamps, and are not
>>>> present in all the documents.
>>>> Those queries apply sorting on 12-15 of those fields.
>>>>
>>>> We are using solr 1.4 in production and the dump shows a lot of
>>>> Integer/Character and Byte arrays filled up with 0s.
>>>> With solr's trunk code things do not change.
>>>>
>>>> In the mailing list we saw a lot of messages related to this issue:
>>>> we tried truncating the dates to day precision, using missingSortLast =
>>>> true, changing the field type from slong to long, setting autowarming to
>>>> different values, disabling and enabling caches with different values,
>>>> but we did not manage to solve the problem.
>>>>
>>>> We were thinking to implement an LRUFieldCache field type to manage
>>>> the FieldCache as an LRU and prevent this but, before starting a new
>>>> development, we want to be sure that we are not doing anything wrong
>>>> in the solr configuration or in the index generation.
>>>>
>>>> Any help would be appreciated.
>>>> Regards,
>>>> Matteo
>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>
>
> --
> Lance Norskog
> goks...@gmail.com
>
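To illustrate the 'date'/'tdate' suggestion from the quoted thread, the timestamp fields could be declared with a trie date type (a sketch of a Solr 1.4 schema.xml fragment; the type name and dynamic-field pattern are illustrative):

    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" omitNorms="true"/>
    <dynamicField name="*_tdt" type="tdate" indexed="true" stored="false"/>

Declared this way, sorting on such a field avoids the second array of unique string values that Lance describes, which is where much of the FieldCache memory goes.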
Re: LocalParams?
E.g. take a look at: http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html Peter. > Huh? Read through the wiki: See http://wiki.apache.org/solr/LocalParams but I > still don't understand its utility? > > Can someone explain to me why this would even be used? Any examples to help > clarify? Thanks! >
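A couple of concrete uses may make the utility clearer (a sketch; field names are hypothetical and the parameters are shown un-encoded). LocalParams let you switch the query parser for a single parameter, e.g.

    q={!dismax qf='title description'}ipod

and they let you tag a filter so that a facet can exclude it, which is the basis of multi-select faceting:

    fq={!tag=pr}price:[0 TO 100]&facet=true&facet.field={!ex=pr}price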
Re: performance sorting multivalued field
>> Well, sorting requires that all the unique values in the target field
>> get loaded into memory
That's what I thought, thanks.

>> But a larger question is whether what you're doing is worthwhile
>> even as just a measurement. You say
>> "This is good for me, I don't care for my tests". I claim that
>> you do care
I just like to play with things. First I checked the behavior of sorting on a multiValued field, and what I noticed was: let's say you have docs with a field called 'num':
doc1->num:2; doc2->num:1,num:4; doc3->num:5
Sorting asc by the field num, what I get is: doc2,doc1,doc3.
The behavior seems to be always the same (I am not saying it works like that, but it's what I've seen in my examples).
After seeing that I just decided to check the performance. The point is simply curiosity.

--
View this message in context: http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p913626.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: solr string field
It's OK. It was a problem with my schema. Thanks anyway.

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Monday, June 21, 2010 5:09 PM
To: solr-user@lucene.apache.org
Subject: Re: solr string field

Or even better for an exact string query:

  q={!raw f=field_name}sony vaio

(that's NOT URL encoded, but needs to be when sending the request over HTTP)

Erik

On Jun 21, 2010, at 9:43 AM, Jan Høydahl / Cominvent wrote:

> Hi,
>
> You either need to quote your string: http://localhost:8983/solr/select?q="sony+vaio"
> or to escape the space: http://localhost:8983/solr/select?q=sony\+vaio
>
> If you do not do one of these, your query will be parsed as
> text:sony OR text:vaio, which will not match your string field.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
>
> On 21. juni 2010, at 14.42, ZAROGKIKAS,GIORGOS wrote:
>
>> Hi
>> I use a string field in my solr schema,
>> but when I query a value with a space it doesn't give me results.
>>
>> e.g. I have a value "sony vaio"; when I query with "sony vaio" I
>> get 0 results,
>> but when I query "sony*" I get my results.
>>
>> How can I query a string field with a space between the values,
>> or how can I have exact search in a string?
>>
>>
>> Thanks in advance
>>
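For completeness, the URL-encoded form of Erik's raw-query example would look roughly like this (a sketch, assuming the default select handler):

    http://localhost:8983/solr/select?q=%7B%21raw%20f%3Dfield_name%7Dsony%20vaio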
Re: Alternative for field collapsing
Hi,
I wanted to apply field collapsing on the title (type string), but want to show only one document (and the count of such documents) per title rather than show all the documents.

Regards
Raakhi

On Tue, Jun 22, 2010 at 12:59 AM, Peter Karich wrote:

> Hi Raakhi,
>
> First, field collapsing works pretty well in our system. And, as Martin
> has said on 17.06.2010 in the other thread "Field Collapsing SOLR-236":
>
> I've added a new patch to the issue, so building the trunk (rev
> 955615) with the latest patch should not be a problem. Due to recent
> changes in the Lucene trunk the patch was not compatible.
>
> Second, if the id is unique, applying field collapse makes no sense. So I
> suppose you will apply field collapsing to the title, right?
> But in this case, why doesn't a simple query ala q=title:'my
> title'&sort=price asc work for you? Or what do you want to achieve?
> (The title should be of type string, I think)
>
> Regards,
> Peter.
>
>> Hi,
>> I have an index with the following fields:
>> id (unique)
>> title
>> description
>> price.
>>
>> Suppose i want to find unique documents and count of all documents with the
>> same title, sorted on price.
>> How do i go about it.
>> Knowing that field collapsing is not stable with 1.4.
>> if i go about using facets on id, it sorts either on id or on the count,
>> but not on the price,
>>
>> Any Suggestions??
>> Regards,
>> Raakhi
>
Re: OOM on sorting on dynamic fields
No, this is basic to how Lucene works. You will need larger EC2 instances.

On Mon, Jun 21, 2010 at 2:08 AM, Matteo Fiandesio wrote:
> Compiling solr with lucene 2.9.3 instead of 2.9.1 will solve this issue?
> Regards,
> Matteo
>
> On 19 June 2010 02:28, Lance Norskog wrote:
>> The Lucene implementation of sorting creates an array of four-byte
>> ints for every document in the index, and another array of the unique
>> values in the field.
>> If the timestamps are 'date' or 'tdate' in the schema, they do not
>> need the second array.
>>
>> You can also sort by a field with a function query. This does not
>> build the arrays, but might be a little slower.
>> Yes, the sort arrays (and also facet values for a field) should be
>> controlled by a fixed-size cache, but they are not.
>>
>> On Fri, Jun 18, 2010 at 7:52 AM, Matteo Fiandesio wrote:
>>> Hello,
>>> we are experiencing OOM exceptions in our single core solr instance
>>> (on a (huge) amazon EC2 machine).
>>> We investigated a lot in the mailing list and through jmap/jhat dump
>>> analyzing, and the problem resides in the lucene FieldCache that fills
>>> the heap and blows up the server.
>>>
>>> Our index is quite small but we have a lot of sort queries on fields
>>> that are dynamic, of type long representing timestamps, and are not
>>> present in all the documents.
>>> Those queries apply sorting on 12-15 of those fields.
>>>
>>> We are using solr 1.4 in production and the dump shows a lot of
>>> Integer/Character and Byte arrays filled up with 0s.
>>> With solr's trunk code things do not change.
>>>
>>> In the mailing list we saw a lot of messages related to this issue:
>>> we tried truncating the dates to day precision, using missingSortLast =
>>> true, changing the field type from slong to long, setting autowarming to
>>> different values, disabling and enabling caches with different values,
>>> but we did not manage to solve the problem.
>>>
>>> We were thinking to implement an LRUFieldCache field type to manage
>>> the FieldCache as an LRU and prevent this but, before starting a new
>>> development, we want to be sure that we are not doing anything wrong
>>> in the solr configuration or in the index generation.
>>>
>>> Any help would be appreciated.
>>> Regards,
>>> Matteo
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>

--
Lance Norskog
goks...@gmail.com
Re: Mr Lance : customize the search algorithm of solr
Solr depends on Lucene's implementation of queries and how it returns document hits. I can't help you architect these changes.

On Mon, Jun 21, 2010 at 7:47 AM, sarfaraz masood wrote:
> Mr Lance,
>
> Thanks a lot for your reply. I am a novice at solr/lucene, but I have gone
> through the documentation of both. I have even implemented programs in
> lucene for searching etc.
>
> My problem is to apply a new search technique other than the one used by solr.
>
> Step 1: My algorithm finds the tf-idf values of all the terms in each url
> and makes a chart like this:
>
>           term1   term2   term3   ...
>   url1    0.7     0.6     0.7
>   url2    0.0     0.5     0.4
>   url3    0.7     0.8     0.6
>   ...
>
> (urls with 0 tf-idf mean the word doesn't exist there.)
>
> This way I first construct a complete chart of term tf-idf values for the urls.
>
> Step 2 (Searcher): Then, depending on the words in the query, I select the
> correct urls by applying mathematical formulae. This result should be shown
> to the user in descending order.
>
> Now, as I know that lucene has its own searcher, which is used by solr as
> well, can't I replace this searcher part in SOLR by a java program that
> returns urls by my algorithm? Everything else should remain solr's.
>
> Only change the searcher part. I have studied customizing the scoring,
> which is absolutely not my aim. My aim seems to be replacing the searcher.
> It is work similar to the BM25 work which you had mentioned in your reply,
> viz. providing an alternative to lucene search.
>
> Please help me in this regard. I will be highly grateful to you for your
> assistance in this work of mine.
>
> If any part of this mail was not clear to you, then please let me know and
> I will explain it.
>
> Regards
>
> -sarfaraz

--
Lance Norskog
goks...@gmail.com
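As a standalone illustration of the two steps described in the quoted mail, the query-time scoring over a precomputed tf-idf chart could look like this (a minimal sketch in plain Java, entirely outside Lucene/Solr; the matrix is assumed to be built in Step 1, and the additive formula is a placeholder for whatever formula is actually intended):

    import java.util.*;

    public class TfIdfSearcher {
        // Step 1 output: url -> (term -> tf-idf weight); a missing term means 0.0
        private final Map<String, Map<String, Double>> tfidf;

        public TfIdfSearcher(Map<String, Map<String, Double>> tfidf) {
            this.tfidf = tfidf;
        }

        // Step 2: score each url from the query terms, return urls by descending score
        public List<String> search(Set<String> queryTerms) {
            final Map<String, Double> scores = new HashMap<String, Double>();
            for (Map.Entry<String, Map<String, Double>> row : tfidf.entrySet()) {
                double score = 0.0;
                for (String term : queryTerms) {
                    Double w = row.getValue().get(term);
                    if (w != null) score += w;   // placeholder combination formula
                }
                if (score > 0.0) scores.put(row.getKey(), score);
            }
            List<String> urls = new ArrayList<String>(scores.keySet());
            Collections.sort(urls, new Comparator<String>() {
                public int compare(String a, String b) {
                    return Double.compare(scores.get(b), scores.get(a));
                }
            });
            return urls;
        }
    }

Plugging such logic into Solr itself would, as Lance notes, mean changing how Lucene collects and scores hits, which is a much deeper change than writing a standalone class like this.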