Re: Tokenizer or Filter?
Actually, you may be able to get by using PatternReplaceCharFilterFactory - copy the source value to two fields: one treats <d2>.*</d2> as the delimiter pattern to delete and the other uses <d1>.*</d1> as the delimiter pattern to delete, so the first field has only the d1 content and the second has only the d2 content. You can use a second pattern char filter to remove the <[/]d[12]> markers as well, probably changing them to a space in both cases.

See: http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceCharFilterFactory.html

-- Jack Krupansky

On Tue, Jan 13, 2015 at 11:40 AM, Jack Krupansky jack.krupan...@gmail.com wrote:

Would it be sufficient for your use case to simply extract all the d1 into one field and all the d2 into another field? If so, the update processor script would be very simple: match all <d1>.*</d1> and copy them to a separate field value, and the same for d2. If you want examples of script update processors, see my Solr e-book: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

On Tue, Jan 13, 2015 at 9:21 AM, tomas.kalas kala...@email.cz wrote:

Thanks Jack for your advice. Can you please explain a little more how it works? From the Apache wiki it's not too clear to me. Can I write some JavaScript code when I want to filter some data? In this case I have <d1>bla bla bla</d1> <d2>bla bla bla</d2> <d1>bla bla bla</d1> and I want to filter out <d2>bla bla bla</d2>, but in another case I want to filter out all <d1>...</d1>. Then I suppose I use it at index time and filter from the indexed data? Thanks
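For illustration, a minimal schema.xml sketch of the two-field approach Jack describes, assuming the markers are literal <d1>...</d1> and <d2>...</d2> tags in the source text (field and type names here are invented; note that angle brackets must be XML-escaped inside the pattern attribute):

<!-- Type that keeps only d1 content: delete the d2 regions, then strip the d1 markers -->
<fieldType name="text_d1_only" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="&lt;d2&gt;.*?&lt;/d2&gt;" replacement=" "/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="&lt;/?d1&gt;" replacement=" "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
<!-- A mirror-image text_d2_only type swaps the two patterns -->

<field name="body" type="text_general" indexed="false" stored="true"/>
<field name="body_d1" type="text_d1_only" indexed="true" stored="false"/>
<field name="body_d2" type="text_d2_only" indexed="true" stored="false"/>
<copyField source="body" dest="body_d1"/>
<copyField source="body" dest="body_d2"/>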
Re: Solr large boolean filter
Hello,

We have a similar requirement where a large list of IDs needs to be sent to Solr in a filter query. Could someone please help understand if this feature is now supported in the new versions of Solr?

Thanks
Re: Slow faceting performance on a docValues field
Shawn,

Thanks for the suggestion, but experimentally, in my case the same query with facet.method=enum returns in almost the same amount of time.

Regards
David

On Tuesday, January 13, 2015 12:02 PM, Shawn Heisey apa...@elyograg.org wrote:

<snip>
Slow faceting performance on a docValues field
I have a query against a single 50M doc index (175GB) using Solr 4.10.2, that exhibits the following response times (via the debugQuery option in Solr Admin):

"process": {
  "time": 24709,
  "query": { "time": 54 },
  "facet": { "time": 24574 },

The query time of 54ms is great and exactly as expected -- this example was a single-term search that returned 3 hits.

I am trying to get the facet time (24.5 seconds) to be sub-second, and am having no luck. The facet part of the query is as follows:

"params": {
  "facet.range": "eventDate",
  "f.eventDate.facet.range.end": "2015-05-13T16:37:18.000Z",
  "f.eventDate.facet.range.gap": "+1DAY",
  "start": "0",
  "rows": "10",
  "f.eventDate.facet.range.start": "2005-03-13T16:37:18.000Z",
  "f.eventDate.facet.mincount": "1",
  "facet": "true",
  "debugQuery": "true",
  "_": "1421169383802"
}

And, the relevant schema definition is as follows:

<field name="eventDate" type="tdate" indexed="true" stored="true" multiValued="false" docValues="true"/>

<!-- A Trie based date field for faster date range queries and date faceting. -->
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

During the 25-second query, the Solr JVM pegs one CPU, with little or no I/O activity detected on the drive that holds the 175GB index. I have 48GB of RAM, 1/2 of that dedicated to the OS and the other to the Solr JVM.

I do NOT have any fieldValue caches configured as yet, because my (perhaps too simplistic?) reading of the documentation was that DocValues eliminates the need for a field-level cache on this facet field.

Any suggestions welcome.

Regards,
David
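For reference, the same request expressed as URL parameters might look like the sketch below (host, collection, and query term are placeholders; note the "+" in the gap must be URL-encoded as %2B):

http://localhost:8983/solr/collection1/select?q=yourterm
    &rows=10&start=0&debugQuery=true
    &facet=true&facet.range=eventDate
    &f.eventDate.facet.range.start=2005-03-13T16:37:18.000Z
    &f.eventDate.facet.range.end=2015-05-13T16:37:18.000Z
    &f.eventDate.facet.range.gap=%2B1DAY
    &f.eventDate.facet.mincount=1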
Improved suggester question
The suggester is not working for me with Solr 4.10.2. Can anyone shed light on why I might be getting the exception below when I build the dictionary?

<response>
  <lst name="responseHeader">
    <int name="status">500</int>
    <int name="QTime">26</int>
  </lst>
  <lst name="error">
    <str name="msg">len must be &lt;= 32767; got 35680</str>
    <str name="trace">java.lang.IllegalArgumentException: len must be &lt;= 32767; got 35680
      at org.apache.lucene.util.OfflineSorter$ByteSequencesWriter.write(OfflineSorter.java:479)
      at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build(AnalyzingSuggester.java:493)
      at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:190)
      at org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:160)
      at org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:165)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
      <snip - servlet container frames>
    </str>
    <int name="code">500</int>
  </lst>
</response>

Thank you.

I've configured my suggester as follows:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">text</str>
    <str name="weightField">medsite_id</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnCommit">true</str>
    <str name="threshold">0.1</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">on</str>
    <str name="suggest.dictionary">mySuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
Re: Logging in Solr's DataImportHandler
Mikhail,

Thanks - it works now. The script transformer was really not needed, a template transformer is clearer, and the log transformer is now working.

On Mon, Dec 8, 2014 at 1:56 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Hello Dan,

Usually it works well. Can you describe how you run it particularly, e.g. what you download exactly and what's the command line?

On Fri, Dec 5, 2014 at 11:37 PM, Dan Davis dansm...@gmail.com wrote:

I have a script transformer and a log transformer, and I'm not seeing the log messages, at least not where I expect. Is there any way I can simply log a custom message from within my script? Can the script easily interact with its container's logger?

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
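For anyone following along, a minimal DIH sketch of the combination Dan describes - a TemplateTransformer plus a LogTransformer on the same entity (the entity name, query, and columns are invented for illustration; logTemplate and logLevel are the documented LogTransformer attributes):

<entity name="item" query="select id, name from item"
        transformer="TemplateTransformer,LogTransformer"
        logTemplate="Imported item ${item.id}" logLevel="info">
  <!-- TemplateTransformer fills a constant value into the 'source' column -->
  <field column="source" template="db-import"/>
</entity>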
Re: Highlighting whole phrase
Hi,

hl.usePhraseHighlighter is valid for the standard highlighter. Maybe you are using one of the other highlighters? Maybe you have omitTermFreqAndPositions=true in the definition of the text_general field type?

Ahmet

On Tuesday, January 13, 2015 5:52 PM, meena.sri...@mathworks.com meena.sri...@mathworks.com wrote:

Highlighting does not highlight the whole phrase; instead each word gets highlighted. I tried all the suggestions that were given, with no luck. These are the special settings I tried for phrase highlighting:

hl.usePhraseHighlighter=true
hl.q=query

http://localhost.mathworks.com:8983/solr/db/select?q=syndrome%3A%22Override+ignored+for+property%22&rows=1&fl=syndrome_id&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E&hl.usePhraseHighlighter=true&hl.q=%22Override+ignored+for+property%22&hl.fragsize=1000

This is from my schema.xml:

<field name="syndrome" type="text_general" indexed="true" stored="true"/>

Should I add parameters in the indexing stage itself to make this work? Thanks for your time.

Meena
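Decoded, the parameters in that request are (shown here only for readability; values unchanged):

q=syndrome:"Override ignored for property"
rows=1
fl=syndrome_id
wt=json
indent=true
hl=true
hl.simple.pre=<em>
hl.simple.post=</em>
hl.usePhraseHighlighter=true
hl.q="Override ignored for property"
hl.fragsize=1000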
Re: How to configure Solr PostingsFormat block size
Thanks Michael and Hoss. Assuming I've written the subclass of the postings format, I need to tell Solr to use it. Do I just do something like:

<fieldType name="ocr" class="solr.TextField" postingsFormat="MySubclass"/>

Is there a way to set this for all fieldTypes, or would that require writing a custom CodecFactory?

Tom

On Mon, Jan 12, 2015 at 4:46 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: It looks like this is a good starting point:
:
: http://wiki.apache.org/solr/SolrConfigXml#codecFactory

The default SchemaCodecFactory already supports defining a different postings format per fieldType - but there isn't much in Solr to let you tweak individual options on specific postings formats via configuration. So what you'd need to do is write a small subclass of Lucene41PostingsFormat that calls super(yourMin, yourMax) in its constructor.
Suggester questions
I am having some trouble getting the suggester to work. The spell requestHandler is working, but I didn't like the results I was getting from the word-breaking dictionary and turned them off. So some basic questions:

- How can I check on the status of a dictionary?
- How can I see what is in that dictionary?
- How do I actually manually rebuild the dictionary? All attempts to set spellcheck.build=on or suggest.build=on have led to nearly instant results (0 suggestions for the latter), indicating something is wrong.

Thanks,

Daniel Davis
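For the rebuild question, a manual build request against a SuggestComponent handler would look something like this sketch (host, core, and dictionary name are placeholders; suggest.build=true is the form the reference guide uses):

http://localhost:8983/solr/mycore/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.build=true&suggest.q=test

For a spellcheck-based handler the equivalent parameter is spellcheck.build=true on the spell handler.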
Re: Slow faceting performance on a docValues field
On 1/13/2015 10:35 AM, David Smith wrote:

<snip>

24GB of RAM to cache 175GB is probably not enough in the general case, but if you're seeing very little disk I/O activity for this query, then we'll leave that alone and you can worry about it later.

What I would try immediately is setting the facet.method parameter to enum and seeing what that does to the facet time. I've had good luck generally with that, even in situations where the docs indicated that the default (fc) was supposed to work better. I have never explored the relationship between facet.method and docValues, though.

I'm out of ideas after this. I don't have enough experience with faceting to help much.

Thanks,
Shawn
Re: Slow faceting performance on a docValues field
Range faceting won't use the DocValues even if they are set; it translates each gap to a filter. This means that it will end up using the filterCache, which should cause faster followup queries if you repeat the same gaps (and don't commit).

You may also want to try interval faceting; it will use DocValues instead of filters. The API is different, you'll have to provide the intervals yourself.

Tomás

On Tue, Jan 13, 2015 at 10:01 AM, Shawn Heisey apa...@elyograg.org wrote:

<snip>
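To make the suggestion concrete, an interval-faceting request for the same field might look like this sketch (only two intervals shown; [ and ( mark inclusive and exclusive ends, and the values would need URL-encoding in practice):

q=*:*&facet=true&facet.interval=eventDate
    &f.eventDate.facet.interval.set=[2005-03-13T00:00:00Z,2005-03-14T00:00:00Z)
    &f.eventDate.facet.interval.set=[2005-03-14T00:00:00Z,2005-03-15T00:00:00Z)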
Re: Best way to implement Spotlight of certain results
Maybe I can use grouping, but my understanding of the feature is not up to figuring that out :) I tried something like:

http://localhost:8983/solr/collection/select?q=childhood+cancer&group=on&group.query=childhood+cancer

Because group.limit=1, I get a single result, and no other results. If I add group.field=title, then I get each result, in a group of 1 member...

Erick's re-ranking I do understand - I can re-rank the top N to make sure the spotlighted result is always first, avoiding the potential problem of having to overweight the title field. In practice, I may not ever need to use the re-ranking, but it's there if I need it. This is enough, because it gives me talking points.

On Fri, Jan 9, 2015 at 3:05 PM, Michał B. . m.bienkow...@gmail.com wrote:

Maybe I understand you badly, but I think that you could use grouping to achieve such an effect. If you could prepare two group queries, one with an exact match and the other, let's say, default, then you will be able to extract matches from the grouping results, i.e. (using the default Solr example collection):

http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.query=manu%3A%22Ap+Computer+Inc.%22&group.query=name:Apple%2060%20GB%20iPod%20with%20Video%20Playback%20Black&group.limit=10

This query will return two groups, one with the exact match, the second with the rest of the standard results.

Regards,
Michal

2015-01-09 20:44 GMT+01:00 Erick Erickson erickerick...@gmail.com:

Hmm, I wonder if the RerankingQueryParser might help here? See: https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking

Best,
Erick

On Fri, Jan 9, 2015 at 10:35 AM, Dan Davis dansm...@gmail.com wrote:

I have a requirement to spotlight certain results if the query text exactly matches the title or a see reference (indexed by me as alttitle_t). What that means is that these matching results are shown above the top-10/20 list with different CSS and fields. It's like feeling lucky on Google :)

I have considered three ways of implementing this:

1. Assume that edismax qf/pf will boost these results to be first when there is an exact match on these important fields. The downside then is that my relevancy is constrained and I must maintain my configuration with title and alttitle_t as top search fields (see XML snippet below). I may have to overweight them to achieve the always-first criteria. Another less major downside is that I must always return the spotlight summary field (for display) and the image to display on each search. These could be got from a database by the id; however, it is convenient to get them from Solr.

2. Issue two searches for every user search, and use a second set of parameters (change the search type and fields to search only by exact-matching a specific string field spottitle_s). The search for the spotlight can then have its own configuration. The downside here is that I am using Django and pysolr for the front-end, and pysolr is both synchronous and tied, by convention, to the requestHandler named select. Of course, running in parallel is not a fix-all - running a search takes some time, even if run in parallel.

3. Automate the population of elevate.xml so that all these 959 queries are there. This is probably best, but forces me to restart/reload when there are changes to this component. The elevation can be done through a query.

What I'd love to do is to configure the select requestHandler to run both searches and return me both sets of results. Is there any way to do that - apply the same q= parameter to two configured ways to run a search? Something like sub-queries?
I suspect that approach 1 will get me through my demo and a brief evaluation period, but that either approach 2 or 3 will be the winner. Here's a snippet from my current qf/pf configuration:

<str name="qf">
  title^100 alttitle_t^100 ... text
</str>
<str name="pf">
  title^1000 alttitle_t^1000 ... text^10
</str>

Thanks,
Dan Davis

-- Michał Bieńkowski
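For the re-ranking option Erick and Dan mention, a sketch of ReRank query parser usage (field names and weights are borrowed from Dan's snippet and are illustrative only):

q={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}childhood cancer
rqq=title:"childhood cancer"^100 OR alttitle_t:"childhood cancer"^100

This re-scores the top 1000 documents of the main query, pushing exact title matches to the top without having to distort the qf/pf weights for every other query.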
Re: Occasionally getting error in solr suggester component.
I think you are probably getting bitten by one of the issues addressed in LUCENE-5889.

I would recommend against using buildOnCommit=true - with a large index this can be a performance-killer. Instead, build the index yourself using the Solr spellchecker support (spellcheck.build=true).

-Mike

On 01/13/2015 10:41 AM, Dhanesh Radhakrishnan wrote:

Hi all,

I am experiencing a problem in the Solr SuggestComponent. Occasionally the Solr suggester component throws an error like:

Solr failed: {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggester was not built","trace":"java.lang.IllegalStateException: suggester was not built
  at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:368)
  at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:342)
  at org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240)
  at org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:199)
  at org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:234)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  <snip - servlet container frames>
","code":500}}

This is not happening frequently, but when indexing and the suggester component work together this error occurs.
In the Solr config:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">haSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str> <!-- org.apache.solr.spelling.suggest.fst -->
    <str name="suggestAnalyzerFieldType">textSpell</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str> <!-- org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory -->
    <str name="field">name</str>
    <str name="weightField">packageWeight</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Can anyone suggest where to look to figure out this error and why these errors are occurring?

Thanks,
dhanesh s.r
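Following Mike's advice above would mean changing one line of this config and triggering builds explicitly - a sketch, reusing the dictionary name from the config (host and core are placeholders):

<str name="buildOnCommit">false</str>

and then, after (re)indexing:

http://localhost:8983/solr/yourcore/suggest?suggest=true&suggest.dictionary=haSuggester&suggest.build=true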
Re: Slow faceting performance on a docValues field
Just a side question. In your first example you have dates set with time, but in the second (where you set intervals) time is not set. Is this something that can be resolved by having a field that only sets the date (without time), and then using regular field faceting and facet.sort=index? If that's possible in your use case, that may be faster.

Tomás

On Tue, Jan 13, 2015 at 11:12 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote:

<snip>
Re: Unexplained leader initiated recovery after updates - SolrCmdDistributor no longer retries on RemoteSolrException
We are experiencing unexpected recovery events when a leader is sending updates to a replica. A "java.net.SocketException: Connection reset" is encountered when updating the replica, which triggers the recovery.

In our previous Solr 4.6.1 installation, update errors triggered retry logic in the SolrCmdDistributor and the updates continued without triggering a leader-initiated recovery. In our current 4.10.2 installation, this retry logic no longer occurs.

It looks like the fix for https://issues.apache.org/jira/browse/SOLR-5509 removed this retry logic. See https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java?r1=1546672&r2=1546164&pathrev=1546672. This change was introduced with Solr 4.7. The retry logic appears to have been removed while investigating an unstable test. I am wondering if the retry logic should be restored for production use.

Should I open a ticket to restore the retry logic?

Thanks,

Lindsay

On 2015-01-12, 5:36 PM, Lindsay Martin lmar...@abebooks.com wrote:

I have uncovered some additional details in the shard leader log:

2015-01-11 09:38:00.693 [qtp268575911-3617101] INFO org.apache.solr.update.processor.LogUpdateProcessor [listings] webapp=/solr path=/update params={distrib.from=http://solr05.search.abebooks.com:8983/solr/listings/&update.distrib=TOLEADER&wt=javabin&version=2} {add=[14065572860 (1490024273004199936)]} 0 707

2015-01-11 09:38:00.913 [updateExecutor-1-thread-35734] ERROR org.apache.solr.update.StreamingSolrServers error
java.net.SocketException: Connection reset
<snip>
Re: Slow faceting performance on a docValues field
Tomás,

Thanks for the response -- the performance of my query makes perfect sense in light of your information.

I looked at interval faceting. My required interval is 1 day. I cannot change that requirement. Unless I am mis-reading the doc, that means to facet a 10-year range, the query needs to specify over 3,600 intervals??

f.eventDate.facet.interval.set=[2005-01-01T00:00:00.000Z,2005-01-01T23:59:59.999Z]&f.eventDate.facet.interval.set=[2005-01-02T00:00:00.000Z,2005-01-02T23:59:59.999Z]
etc., etc.

Each query would be 185MB in size if I structure it this way. I assume I must be mis-understanding how to use interval faceting with dates. Are there any concrete examples you know of? A Google search did not come up with much.

Kind regards,

Dave

On Tuesday, January 13, 2015 12:16 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote:

<snip>
Re: Slow faceting performance on a docValues field
No, you are not misreading, right now there is no automatic way of generating the intervals on the server side similar to range faceting... I guess it won't work in your case. Maybe you should create a Jira to add this feature to interval faceting.

Tomás

On Tue, Jan 13, 2015 at 10:44 AM, David Smith dsmiths...@yahoo.com.invalid wrote:

<snip>
Re: How to configure Solr PostingsFormat block size
: assuming I've written the subclass of the postings format, I need to tell
: Solr to use it.
:
: Do I just do something like:
:
: <fieldType name="ocr" class="solr.TextField" postingsFormat="MySubclass"/>

The postingsFormat XML attribute in schema.xml just refers to the name of the postings format in SPI -- which is discussed in the PostingsFormat javadocs...

https://lucene.apache.org/core/4_10_0/core/org/apache/lucene/codecs/PostingsFormat.html

...the nuts & bolts of it is that the PostingsFormat base class should take care of all the SPI name registration that you need, based on what you pass to the super() constructor ... although now that I think about it, I'm not sure how you'd go about specifying your own name for the PostingsFormat when also doing something like subclassing Lucene41PostingsFormat ... there's no Lucene41PostingsFormat constructor you can call from your subclass to override the name. Not sure what the expectation is there in the Java API.

: Is there a way to set this for all fieldtypes or would that require writing
: a custom CodecFactory?

SchemaCodecFactory uses the Lucene default for any fieldType that doesn't define its own postingsFormat -- so if you wanted to change the postings format for *every* fieldType, then yes: you'd need to override the CodecFactory itself.

-Hoss
http://www.lucidworks.com/
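Since there is no Lucene41PostingsFormat constructor that takes a name, one possible workaround (a minimal, hypothetical sketch, not from this thread) is to wrap it by delegation rather than subclassing, so the name passed to the PostingsFormat constructor becomes the SPI name that the schema's postingsFormat attribute refers to. Class name and block sizes here are invented:

import java.io.IOException;
import org.apache.lucene.codecs.FieldsConsumer;
import org.apache.lucene.codecs.FieldsProducer;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;

public class MySubclass extends PostingsFormat {
  // Lucene41 with custom min/max term-dictionary block sizes (values illustrative)
  private final PostingsFormat delegate = new Lucene41PostingsFormat(50, 100);

  public MySubclass() {
    super("MySubclass"); // SPI name matched by postingsFormat="MySubclass" in schema.xml
  }

  @Override
  public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
    return delegate.fieldsConsumer(state);
  }

  @Override
  public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException {
    return delegate.fieldsProducer(state);
  }
}

The class also has to be listed in META-INF/services/org.apache.lucene.codecs.PostingsFormat on the classpath so SPI can find it.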
Re: Slow faceting performance on a docValues field
What is stumping me is that the search result has 3 hits, yet faceting those 3 hits takes 24 seconds. The documentation for facet.method=fc is quite explicit about how Solr does faceting:

fc (stands for Field Cache) The facet counts are calculated by iterating over documents that match the query and summing the terms that appear in each document. This was the default method for single valued fields prior to Solr 1.4.

If a search yielded millions of hits, I could understand 24 seconds to calculate the facets. But not for a search with only 3 hits. What am I missing?

Regards,

David

On Tuesday, January 13, 2015 1:12 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote:

<snip>
Re: Slow faceting performance on a docValues field
fc, fcs and enum only apply for field faceting, not range faceting.

Tomás

On Tue, Jan 13, 2015 at 11:24 AM, David Smith dsmiths...@yahoo.com.invalid wrote:

<snip>
Re: Solr grouping problem - need help
bq: My question is for indexed=false, stored=true field..what is optimized way to get unique values in such field.

There isn't any. To do this you'll have to read the doc from disk; it'll be decompressed along the way and then the field is read. Note that this happens automatically when you call doc.getFieldValue or similar. At the stored=true level, you're always talking about complete documents. indexed=true is about putting the field data into efficient-access structures. They're completely different beasts.

Your original question was: "Please guide me how i can tell solr not to tokenize stored field to decide unique groups.."

Simply declare the field type you care about as a string type in schema.xml. Then use a copyField directive to copy the data to the new field, and group on the new field. There are examples in the schema.xml of string types and copyFields that should help (see the sketch after this message).

Best,
Erick

On Tue, Jan 13, 2015 at 9:00 AM, Naresh Yadav nyadav@gmail.com wrote:

Erick, my schema is the same, no change in that.

*Schema:*
<field name="tenant_pool" type="text" stored="true"/>

My guess is I had not mentioned indexed true or false... maybe the default indexed is true. My question is, for an indexed=false, stored=true field, what is the optimized way to get unique values in such a field?

On Tue, Jan 13, 2015 at 10:07 PM, Erick Erickson erickerick...@gmail.com wrote:

Something is very wrong here. Have you perhaps been changing your schema without re-indexing? And I recommend you completely remove your data directory (the one with index and tlog subdirectories) after you change your schema.xml file. Because you're trying to group on a field that is _not_ indexed, you should be getting an error returned, something like:

can not use FieldCache on a field which is neither indexed nor has doc values:

As far as the tokenization comment, just start by making the field you want to group on be stored=false indexed=true type=string.

Best,
Erick

On Tue, Jan 13, 2015 at 5:09 AM, Naresh Yadav nyadav@gmail.com wrote:

Hi Jack,

Thanks for replying. I am new to Solr, please guide me on this. I have many such columns in my schema, so copyField will create a lot of duplicate fields; besides, I do not need any search on the original field. My use case is: I do not want any search on the tenant_pool field, that's why I declared it as a stored field, not indexed. I just need to get unique values in this field. Please show some direction.

On Tue, Jan 13, 2015 at 6:16 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

That's your job. The easiest way is to do a copyField to a string field.

-- Jack Krupansky

On Tue, Jan 13, 2015 at 7:33 AM, Naresh Yadav nyadav@gmail.com wrote:

*Schema:*
<field name="tenant_pool" type="text" stored="true"/>

*Code:*
SolrQuery q = new SolrQuery().setQuery("*:*");
q.set(GroupParams.GROUP, true);
q.set(GroupParams.GROUP_FIELD, "tenant_pool");

*Data:*
tenant_pool : Baroda Farms
tenant_pool : Ketty Farms

*Output coming:*
groupValue=Farms, docs=2

*Expected output:*
groupValue=Baroda Farms, docs=1
groupValue=Ketty Farms, docs=1

Please guide me how I can tell Solr not to tokenize the stored field when deciding unique groups. I want the unique groups to be the exact value of the field, not the tokens which Solr is producing currently.

Thanks
Naresh

-- Cheers, Naresh Yadav +919960523401 http://nareshyadav.blogspot.com/ SSE, MetrixLine Inc.
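A minimal sketch of the copyField approach Erick describes, using Naresh's field (the *_str field name is invented):

<field name="tenant_pool" type="text" stored="true"/>
<field name="tenant_pool_str" type="string" indexed="true" stored="false"/>
<copyField source="tenant_pool" dest="tenant_pool_str"/>

and then group on the untokenized copy:

q.set(GroupParams.GROUP_FIELD, "tenant_pool_str");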
RE: Distributed unit tests and SSL doesn't have a valid keystore
Thanks, we will suppress it for now!

M.

-----Original message-----
From: Mark Miller markrmil...@gmail.com
Sent: Monday 12th January 2015 19:25
To: solr-user@lucene.apache.org
Subject: Re: Distributed unit tests and SSL doesn't have a valid keystore

I'd have to do some digging. Hossman might know offhand. You might just want to use @SuppressSSL on the tests :)

- Mark

On Mon Jan 12 2015 at 8:45:11 AM Markus Jelsma markus.jel...@openindex.io wrote:

Hi - in a small Maven project depending on Solr 4.10.3, running unit tests that extend BaseDistributedSearchTestCase randomly fail with "SSL doesn't have a valid keystore", and a lot of zombie threads. We have a solrtest.keystore file laying around, but where to put it?

Thanks,
Markus
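For reference, suppressing SSL on such a test is just an annotation on the class - a sketch (the test class name is invented):

import org.apache.solr.BaseDistributedSearchTestCase;
import org.apache.solr.SolrTestCaseJ4.SuppressSSL;

// Prevent the test framework from randomly enabling SSL for this test
@SuppressSSL
public class MyDistributedTest extends BaseDistributedSearchTestCase {
  // test methods as before
}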
Re: Slow faceting performance on a docValues field
Could probably write a custom SearchComponent to prepend and expand the query for the required use case. Though if something then has to parse that query back, it would still be an issue. Regards, Alex Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 13 January 2015 at 14:12, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: No, you are not misreading, right now there is no automatic way of generating the intervals on the server side similar to range faceting... I guess it won't work in your case. Maybe you should create a Jira to add this feature to interval faceting. Tomás On Tue, Jan 13, 2015 at 10:44 AM, David Smith dsmiths...@yahoo.com.invalid wrote: Tomás, Thanks for the response -- the performance of my query makes perfect sense in light of your information. I looked at Interval faceting. My required interval is 1 day. I cannot change that requirement. Unless I am mis-reading the doc, that means to facet a 10 year range, the query needs to specify over 3,600 intervals ?? f.eventDate.facet.interval.set=[2005-01-01T00:00:00.000Z,2005-01-01T23:59:59.999Z]f.eventDate.facet.interval.set=[2005-01-02T00:00:00.000Z,2005-01-02T23:59:59.999Z]etc,etc Each query would be 185MB in size if I structure it this way. I assume I must be mis-understanding how to use Interval faceting with dates. Are there any concrete examples you know of? A google search did not come up with much. Kind regards, Dave On Tuesday, January 13, 2015 12:16 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: Range Faceting won't use the DocValues even if they are there set, it translates each gap to a filter. This means that it will end up using the FilterCache, which should cause faster followup queries if you repeat the same gaps (and don't commit). You may also want to try interval faceting, it will use DocValues instead of filters. The API is different, you'll have to provide the intervals yourself. Tomás On Tue, Jan 13, 2015 at 10:01 AM, Shawn Heisey apa...@elyograg.org wrote: On 1/13/2015 10:35 AM, David Smith wrote: I have a query against a single 50M doc index (175GB) using Solr 4.10.2, that exhibits the following response times (via the debugQuery option in Solr Admin): process: { time: 24709, query: { time: 54 }, facet: { time: 24574 }, The query time of 54ms is great and exactly as expected -- this example was a single-term search that returned 3 hits. I am trying to get the facet time (24.5 seconds) to be sub-second, and am having no luck. The facet part of the query is as follows: params: { facet.range: eventDate, f.eventDate.facet.range.end: 2015-05-13T16:37:18.000Z, f.eventDate.facet.range.gap: +1DAY, start: 0, rows: 10, f.eventDate.facet.range.start: 2005-03-13T16:37:18.000Z, f.eventDate.facet.mincount: 1, facet: true, debugQuery: true, _: 1421169383802 } And, the relevant schema definition is as follows: field name=eventDate type=tdate indexed=true stored=true multiValued=false docValues=true/ !-- A Trie based date field for faster date range queries and date faceting. -- fieldType name=tdate class=solr.TrieDateField precisionStep=6 positionIncrementGap=0/ During the 25-second query, the Solr JVM pegs one CPU, with little or no I/O activity detected on the drive that holds the 175GB index. I have 48GB of RAM, 1/2 of that dedicated to the OS and the other to the Solr JVM. I do NOT have any fieldValue caches configured as yet, because my (perhaps too simplistic?) reading of the documentation was that DocValues eliminates the need for a field-level cache on this facet field. 
24GB of RAM to cache 175GB is probably not enough in the general case, but if you're seeing very little disk I/O activity for this query, then we'll leave that alone and you can worry about it later.

What I would try immediately is setting the facet.method parameter to enum and seeing what that does to the facet time. I've had good luck generally with that, even in situations where the docs indicated that the default (fc) was supposed to work better. I have never explored the relationship between facet.method and docValues, though.

I'm out of ideas after this. I don't have enough experience with faceting to help much. Thanks, Shawn
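Since Tomás notes that the interval API makes you enumerate the intervals yourself, here is a rough sketch of generating the ~3,650 one-day interval parameters on the client with SolrJ rather than writing them by hand. The field name comes from the thread; the query term and interval count are assumptions, and a request this size should be sent as a POST:

import org.apache.solr.client.solrj.SolrQuery;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;

public class IntervalParams {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("yourSearchTerm"); // placeholder query
        q.setFacet(true);
        q.add("facet.interval", "eventDate");
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'00:00:00.000'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        Calendar day = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        day.clear();
        day.set(2005, Calendar.JANUARY, 1);
        // one [start,end) interval per day for roughly ten years
        for (int i = 0; i < 3653; i++) {
            String start = fmt.format(day.getTime());
            day.add(Calendar.DAY_OF_MONTH, 1);
            String end = fmt.format(day.getTime());
            q.add("f.eventDate.facet.interval.set", "[" + start + "," + end + ")");
        }
        System.out.println("generated "
            + q.getParams("f.eventDate.facet.interval.set").length + " intervals");
    }
}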
Re: Occasionally getting error in solr suggester component.
Related question - I see mention of needing to rebuild the spellcheck/suggest dictionary after a Solr core reload. I see spellcheckIndexDir in both the old wiki entry and the Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr/Spell+Checking). If this parameter is provided, it sounds like the index is stored on the filesystem and need not be rebuilt each time the core is reloaded. Is this a correct understanding?

On Tue, Jan 13, 2015 at 2:17 PM, Michael Sokolov msoko...@safaribooksonline.com wrote:

I think you are probably getting bitten by one of the issues addressed in LUCENE-5889. I would recommend against using buildOnCommit=true - with a large index this can be a performance killer. Instead, build the index yourself using the Solr spellchecker support (spellcheck.build=true). -Mike

On 01/13/2015 10:41 AM, Dhanesh Radhakrishnan wrote:

Hi all, I am experiencing a problem in the Solr SuggestComponent. Occasionally the suggester component throws an error like:

Solr failed: {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggester was not built","trace":"java.lang.IllegalStateException: suggester was not built at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:368) ...","code":500}}

This is not happening frequently, but when indexing and the suggester component work together this error will occur.

In solrconfig:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">haSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str> <!-- org.apache.solr.spelling.suggest.fst -->
    <str name="suggestAnalyzerFieldType">textSpell</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str> <!-- org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory -->
    <str name="field">name</str>
    <str name="weightField">packageWeight</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Can anyone suggest where to look to figure out this error and why these errors are occurring? Thanks, dhanesh s.r
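Michael's "build it yourself" suggestion, applied to the SuggestComponent config above, amounts to issuing an explicit build request instead of relying on buildOnCommit - something like this (host, port and core name are assumed):

http://localhost:8983/solr/yourcore/suggest?suggest=true&suggest.dictionary=haSuggester&suggest.build=true

With buildOnCommit removed (or set to false), that request can be run once after bulk indexing finishes, or from a scheduled job, so the dictionary is not rebuilt on every commit.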
Re: Frequent deletions
On 1/13/2015 12:10 AM, ig01 wrote:

Unfortunately this is the case, we do have hundreds of millions of documents on one Solr instance/server. All our configs and schema are with default configurations. Our index size is 180G, does that mean that we need at least 180G heap size?

If you have hundreds of millions of documents and the index is only 180GB, they must be REALLY tiny documents. The number of documents has a lot more impact on the heap requirements than the index size on disk. As described in my previous email, I have about 130GB of total index on my dev Solr server, and the heap is only 7GB. Everything I ask that machine to do, which includes optimizing shards that are up to 20GB each, works flawlessly.

When a Solr index has 500 million documents, the amount of memory required to construct a single entry in the filterCache is over 60MB. The size of the filterCache in the default example config is 512 ... which means that if that cache ends up fully utilized, that's in the neighborhood of 30GB of RAM required for just one Solr cache. The amount of memory required for the Lucene FieldCache could be insane with 500 million documents, depending on the exact nature of the queries that you are doing.

The index size on disk has a different tie to memory -- the RAM that is not allocated to programs is automatically used by the operating system for caching data on the disk. If you have plenty of RAM so the OS disk cache can effectively keep relevant parts of the index in memory, performance will not suffer. Anytime Solr must actually ask the disk for index data, it will be slow. With 120GB out of the 140GB total allocated to Solr, that leaves 20GB to cache 180GB of index data. That's almost certainly not enough.

Although the OS disk cache requirements have no direct correlation with OOME exceptions, slow performance due to insufficient caching might lead *indirectly* to OOME, because the slow performance means that it's more likely you'll have many queries happening at the same time, which will lead to larger heap requirements. Thanks, Shawn
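For reference, the filterCache figures Shawn quotes follow from simple arithmetic (a sketch, assuming one bit per document per cache entry):

  500,000,000 docs / 8 bits per byte = 62,500,000 bytes, i.e. over 60 MB per filterCache entry
  512 entries x ~60 MB = roughly 30 GB for a fully populated cache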
Solr fails to start with log file not found error
I get this error when starting Solr using the script in bin/solr:

tail: cannot open `[path]/logs/solr.log' for reading: No such file or directory

It does not happen every time, but it does happen a lot. It sometimes clears up after a while. I have tried creating an empty file, but Solr then just says:

Backing up [path]/logs/solr.log

and repeats the same error. I am guessing the problem is that it cannot get the error from the log file because the log file has not been created yet, but then how do I debug this? Running Solr 4.10.2 on Debian 7 using Jetty, with the default IcedTea 2.5.3, java version 1.7.0_65. Thanks for any help or pointers.
Re: leader split-brain at least once a day - need help
Hi Mark, we're currently at 4.10.2; the update to 4.10.3 is scheduled for tomorrow. T

On 12.01.15 at 17:30, Mark Miller wrote:

bq. "ClusterState says we are the leader, but locally we don't think so"

Generally this is due to some bug. One bug that can lead to it was recently fixed in 4.10.3, I think. What version are you on? - Mark

On Mon Jan 12 2015 at 7:35:47 AM Thomas Lamy t.l...@cytainment.de wrote:

Hi, I found no big/unusual GC pauses in the log (at least manually; I found no free solution to analyze them that worked out of the box on a headless Debian wheezy box). Eventually I tried with -Xmx8G (was 64G before) on one of the nodes, after checking that allocation after 1 hour of run time was at about 2-3GB. That didn't move the time frame where a restart was needed, so I don't think Solr's JVM GC is the problem.

We're trying to get all of our nodes' logs (ZooKeeper and Solr) into Splunk now, just to get a better sorted view of what's going on in the cloud once a problem occurs. We're also enabling GC logging for ZooKeeper; maybe we were missing problems there while focusing on the Solr logs. Thomas

On 08.01.15 at 16:33, Yonik Seeley wrote:

It's worth noting that those messages alone don't necessarily signify a problem with the system (and it wouldn't be called split-brain). The async nature of updates (and thread scheduling), along with stop-the-world GC pauses that can change leadership, cause these little windows of inconsistency that we detect and log. -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data

On Wed, Jan 7, 2015 at 5:01 AM, Thomas Lamy t.l...@cytainment.de wrote:

Hi there, we are running a 3-server cloud serving a dozen single-shard/replicate-everywhere collections. The 2 biggest collections are ~15M docs, and about 13GiB / 2.5GiB in size. Solr is 4.10.2, ZK 3.4.5, Tomcat 7.0.56, Oracle Java 1.7.0_72-b14.

10 of the 12 collections (the small ones) get filled by DIH full-import once a day starting at 1am. The second biggest collection is updated using DIH delta-import every 10 minutes; the biggest one gets bulk JSON updates with commits once in 5 minutes.

On a regular basis, we have a leader information mismatch:

org.apache.solr.update.processor.DistributedUpdateProcessor; Request says it is coming from leader, but we are the leader

or the opposite:

org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState says we are the leader, but locally we don't think so

One of these pops up once a day at around 8am, sending either some cores into "recovery failed" state, or all cores of at least one cloud node into state "gone". This started out of the blue about 2 weeks ago, without changes to software, data, or client behaviour.

Most of the time, we get things going again by restarting Solr on the current leader node, forcing a new election - can this be triggered while keeping Solr (and the caches) up? But sometimes this doesn't help; we had an incident last weekend where our admins didn't restart in time, creating millions of entries in /solr/overseer/queue, making ZK close the connection, and leader re-election fails. I had to flush ZK and re-upload the collection config to get Solr up again (just like in https://gist.github.com/isoboroff/424fcdf63fa760c1d1a7).

We have a much bigger cloud (7 servers, ~50GiB data in 8 collections, 1500 requests/s) up and running, which does not have these problems since upgrading to 4.10.2. Any hints on where to look for a solution?

Kind regards, Thomas

-- Thomas Lamy, Cytainment AG & Co KG, Nordkanalstrasse 52, 20097 Hamburg. Tel.: +49 (40) 23 706-747, Fax: +49 (40) 23 706-139. Sitz und Registergericht Hamburg, HRA 98121 HRB 86068, Ust-ID: DE213009476
Re: Extending solr analysis in index time
Dear Markus, Unfortunately I cannot use payloads, since I want to return this score to each user as a simple field alongside other fields, and payloads do not provide that. Also, I don't want to change the default similarity method of Lucene; I just want to have this field to do the sorting in some cases. Best regards.

On Mon, Jan 12, 2015 at 10:26 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi - You mention having a list of important terms; then using payloads would be the most straightforward, I suppose. You still need a custom similarity and a custom query parser. Payloads work very well for us. M

-----Original message-----
From: Ahmet Arslan iori...@yahoo.com.INVALID
Sent: Monday 12th January 2015 19:50
To: solr-user@lucene.apache.org
Subject: Re: Extending solr analysis in index time

Hi Ali, Reading your example, if you could somehow replace the idf component with your importance weight, I think your use case looks like TFIDFSimilarity. The tf component remains the same. https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html I also suggest you ask this on the Lucene mailing list. Someone familiar with the similarity package can give insight on this. Ahmet

On Monday, January 12, 2015 6:54 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

Could you clarify what you mean by "Lucene reverse index"? That's not a term I am familiar with. -- Jack Krupansky

On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian alinazem...@gmail.com wrote:

Dear Jack, Thank you very much. Yeah, I was thinking of a function query for sorting, but I have two problems in this case: 1) a function query does the processing at query time, which I don't want; 2) I also want to have the score field for retrieving and showing to users.

Dear Alexandre, Here is some more explanation about the business behind the question: I am going to provide a field for each document, let's refer to it as document_score. I am going to fill this field based on information that can be extracted from the Lucene reverse index. Assume I have a list of terms, called "important terms", and I am going to extract the term frequency for each of the terms inside this list per document. To be honest, I want to use the term frequency for calculating document_score. document_score should be storable, since I am going to retrieve this field for each document. I also want to do sorting on document_score in case it is preferred by the user. I hope I did convey my point. Best regards.

On Mon, Jan 12, 2015 at 12:53 AM, Jack Krupansky jack.krupan...@gmail.com wrote:

Won't function queries do the job at query time? You can add or multiply the tf*idf score by a function of the term frequency of arbitrary terms, using the tf, mul, and add functions. See: https://cwiki.apache.org/confluence/display/solr/Function+Queries -- Jack Krupansky

On Sun, Jan 11, 2015 at 10:55 AM, Ali Nazemian alinazem...@gmail.com wrote:

Dear Jack, Hi, I think you misunderstood my need. I don't want to change the default scoring behavior of Lucene (tf-idf); I just want to have another field to do sorting for some specific queries (not all the search business). However, I am aware of Lucene payloads. Thank you very much.

On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

You would do that with a custom similarity (scoring) class. That's an expert feature. In fact a SUPER-expert feature. Start by completely familiarizing yourself with how TF*IDF similarity already works: http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html And to use your custom similarity class in Solr: https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity -- Jack Krupansky

On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian alinazem...@gmail.com wrote:

Hi everybody, I am going to add some analysis to Solr at index time. Here is what I am considering: Suppose I have two different fields in my Solr schema, field a and field b. I am going to use the created reverse index in a way that some terms are considered important ones, and tell Lucene to calculate a value based on these terms' frequency per document. For example, let the word "hello" be considered an important word with a weight of 2.0. Suppose the term frequency for this word in field a is 3 and in field b is 6 for document 1. Therefore the score value would be 2*3+(2*6)^2. I want to
Re: Tokenizer or Filter ?
Thanks Jack for your advice. Can you please explain a little more how it works? From the Apache wiki it's not too clear to me. Can I write some JavaScript code when I want to filter some data? In this case I have <d1>bla bla bla</d1> <d2>bla bla bla</d2> <d1>bla bla bla</d1> and I want to filter out <d2>bla bla bla</d2>. But in another case I want to filter all <d1>...</d1>; then I suppose I'd use it on the indexed data and filter from them? Thanks

-- View this message in context: http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4179173.html Sent from the Solr - User mailing list archive at Nabble.com.
Getting error while indexing XML files on Hadoop
Hi to all from Istanbul, Turkey. I can say that I'm a newbie in Solr and Hadoop. I'm trying to index XML files (ipod_other.xml from Lucidworks' example files, converted into sequence file format) using the SolrXMLIngestMapper jars. I've modified the schema.xml file by making the necessary additions of the fields stated in the ipod_other.xml file.

Here's my command:

hadoop jar jobjar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.SolrXMLIngestMapper -c hdp1 -i /user/hadoop/output/1420812982906sfu/part-r-0 -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -s http://dc2vmhadappt01:8983/solr

In the end I constantly get a "Didn't ingest any documents, failing" error. Anybody out there to help me out with this problem? Any help is appreciated. Thanks

Here are the additions to the schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="name" type="text_en" indexed="true" stored="true" multiValued="true"/>
<field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
<field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="weight" type="float" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<field name="popularity" type="int" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="store" type="location" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
<field name="data_source" type="text_en" indexed="true" stored="false"/>

And here is the ipod_other.xml file:

<add>
  <doc>
    <field name="id">F8V7067-APL-KIT</field>
    <field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field>
    <field name="manu">Belkin</field>
    <field name="cat">electronics</field>
    <field name="cat">connector</field>
    <field name="features">car power adapter, white</field>
    <field name="weight">4</field>
    <field name="price">19.95</field>
    <field name="popularity">1</field>
    <field name="inStock">false</field>
    <field name="store">45.17614,-93.87341</field>
    <field name="manufacturedate_dt">2005-08-01T16:30:25Z</field>
  </doc>
  <doc>
    <field name="id">IW-02</field>
    <field name="name">iPod &amp; iPod Mini USB 2.0 Cable</field>
    <field name="manu">Belkin</field>
    <field name="cat">electronics</field>
    <field name="cat">connector</field>
    <field name="features">car power adapter for iPod, white</field>
    <field name="weight">2</field>
    <field name="price">11.50</field>
    <field name="popularity">1</field>
    <field name="inStock">false</field>
    <field name="store">37.7752,-122.4232</field>
    <field name="manufacturedate_dt">2006-02-14T23:55:59Z</field>
  </doc>
</add>

-- View this message in context: http://lucene.472066.n3.nabble.com/Getting-error-while-indexing-XML-files-on-Hadoop-tp4179168.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers
Is it important where your leader is? If you just want to minimize leadership changes during a rolling restart, then you could restart in the opposite order (S3, S2, S1). That would give only 1 transition, but the end result would be a leader on S2 instead of S1 (not sure if that is important to you or not). I know it's not a fix, but it might be a workaround until the whole leadership-moving work is done?

On 12 January 2015 at 18:17, Erick Erickson erickerick...@gmail.com wrote:

Just skimming, but the problem here that I ran into was with the listeners. Each _Solr_ instance out there is listening to one of the ephemeral nodes (the one in front). So deleting a node does _not_ change which ephemeral node the associated Solr instance is listening to. So, for instance, when you delete S2...n-01 and re-add it, S2 is still looking at S1...n-00 and will continue looking at S1...n-00 until S1...n-00 is deleted. Deleting S2...n-01 will wake up S3, though, which should now be looking at S1...n-00. Now you have two Solr listeners looking at the same ephemeral node. The key is that deleting S2...n-01 does _not_ wake up S2, just any Solr instance that has a watch on the associated ephemeral node.

The code you want is in LeaderElector.checkIfIamLeader to understand how it all works. Be aware that the sortSeqs call sorts the nodes by 1) sequence number, 2) string comparison. This has the unfortunate characteristic of a secondary sort by session ID, so two nodes with the same sequence number can sort before or after each other depending on which one gets a session ID higher or lower than the other.

This is quite tricky to get right. I once created a patch for 4.10.3 by applying things in this order (some minor tweaks required): SOLR-6115, SOLR-6512, SOLR-6577, SOLR-6513, SOLR-6517, SOLR-6670, SOLR-6691. Good luck! Erick

On Mon, Jan 12, 2015 at 8:54 AM, Zisis Tachtsidis zist...@runbox.com wrote:

SolrCloud uses ZooKeeper sequence flags to keep track of the order in which nodes register themselves as leader candidates. The node with the lowest sequence number wins as leader of the shard. What I'm trying to do is to keep the leader re-assignments to a minimum during a rolling restart. In this direction I change the zk sequence numbers on the SolrCloud nodes when all nodes of the cluster are up and active. I'm using Solr 4.10.0 and I'm aware of SOLR-6491, which has a similar purpose, but I'm trying to do it from outside, using the existing APIs, without editing the Solr source code.

== TYPICAL SCENARIO ==
Suppose we have 3 Solr instances S1, S2, S3. They are started in the same order and the zk sequences assigned are as follows:
S1: -n_00 (LEADER)
S2: -n_01
S3: -n_02
In a rolling restart we'll get S2 as leader (after S1 shutdown), then S3 (after S2 shutdown) and finally S1 (after S3 shutdown) - 3 changes in total.

== MY ATTEMPT ==
By using SolrZkClient and the ZooKeeper multi API I found a way to get rid of the old zknodes that participate in a shard's leader election and write new ones where we can assign the sequence number of our liking:
S1: -n_00 (no code running here)
S2: -n_04 (code deleting zknode -n_01 and creating -n_04)
S3: -n_03 (code deleting zknode -n_02 and creating -n_03)
In a rolling restart I'd expect to have S3 as leader (after S1 shutdown), no change (after S2 shutdown) and finally S1 (after S3 shutdown), that is 2 changes. This would be constant no matter how many servers are added to SolrCloud, while in the first scenario the number of re-assignments equals the number of Solr servers.

The problem occurs when S1 (LEADER) is shut down. The elections that take place still set S2 as leader. It's like the new sequence numbers are ignored. When I go to /solr/#/~cloud?view=tree the new sequence numbers are listed under /collections, based on which S3 should have become the leader. Do you have any idea why the new state is not acknowledged during the elections? Is something cached? Or, to put it bluntly, do I have any chance down this path? If not, what are my options? Is it possible to apply all patches under SOLR-6491 in isolation and continue from there? Thank you.

Extra info which might help follows.
1. Some logging related to leader elections after S1 has been shut down:
S2 - org.apache.solr.cloud.SyncStrategy: Leader's attempt to sync with shard failed, moving to the next candidate
S2 - org.apache.solr.cloud.ShardLeaderElectionContext: We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway
S3 - org.apache.solr.cloud.LeaderElector: Our node is no longer in line to be leader
2. And some sample code on how I perform the ZK re-sequencing:
// Read current zk nodes for a specific collection
Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers
Daniel Collins wrote: Is it important where your leader is? If you just want to minimize leadership changes during rolling re-start, then you could restart in the opposite order (S3, S2, S1). That would give only 1 transition, but the end result would be a leader on S2 instead of S1 (not sure if that is important to you or not). I know it's not a fix, but it might be a workaround until the whole leadership moving is done?

I think that rolling restarting the machines in the opposite order (S3, S2, S1) will result in S3 being the leader. It's a valid approach, but wouldn't I have to revert to the original order (S1, S2, S3) to achieve the same result in the following rolling restart? This involves operational costs and complexity that I want to avoid.

Erick Erickson wrote: Just skimming, but the problem here that I ran into was with the listeners. Each _Solr_ instance out there is listening to one of the ephemeral nodes (the one in front). So deleting a node does _not_ change which ephemeral node the associated Solr instance is listening to. So, for instance, when you delete S2...n-01 and re-add it, S2 is still looking at S1...n-00 and will continue looking at S1...n-00 until S1...n-00 is deleted. Deleting S2...n-01 will wake up S3, though, which should now be looking at S1...n-00. Now you have two Solr listeners looking at the same ephemeral node. The key is that deleting S2...n-01 does _not_ wake up S2, just any Solr instance that has a watch on the associated ephemeral node.

Thanks for the info Erick. I wasn't aware of this linked-list listener structure between the zk nodes. Based on what you've said, though, I've changed my implementation a bit and it seems to be working at first glance. Of course it's not reliable yet, but it looks promising. My original attempt
S1: -n_00 (no code running here)
S2: -n_04 (code deleting zknode -n_01 and creating -n_04)
S3: -n_03 (code deleting zknode -n_02 and creating -n_03)
has been changed to
S1: -n_00 (no code running here)
S2: -n_03 (code deleting zknode -n_01 and creating -n_03 using EPHEMERAL_SEQUENTIAL)
S3: -n_02 (no code running here)
Once S1 is shut down, S3 becomes leader, since it listens to S1 now according to what you've said.

The original reason I pursued this "minimize leadership changes" quest was that it _could_ lead to data loss in some scenarios. I'm not entirely sure, though, and you can correct me on this, but I'll explain myself: if you have incoming indexing requests during a rolling restart, could there be a case during the current leader's shutdown where the leader-to-be node does not have the time to sync with the node that is shutting down, in which case everyone will now sync to the new leader, thus missing some updates? I've seen an installation with different index sizes in each replica that deteriorated over time.

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-shard-leader-elections-Altering-zookeeper-sequence-numbers-tp4178973p4179147.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr grouping problem - need help
*Schema:*
<field name="tenant_pool" type="text" stored="true"/>

*Code:*
SolrQuery q = new SolrQuery().setQuery("*:*");
q.set(GroupParams.GROUP, true);
q.set(GroupParams.GROUP_FIELD, "tenant_pool");

*Data:*
tenant_pool: Baroda Farms
tenant_pool: Ketty Farms

*Output coming:*
groupValue=Farms, docs=2

*Expected output:*
groupValue=Baroda Farms, docs=1
groupValue=Ketty Farms, docs=1

Please guide me on how I can tell Solr not to tokenize the stored field when deciding unique groups. I want the unique groups to be the exact value of the field, not the tokens which Solr is producing currently. Thanks, Naresh
Re: Solr startup script in version 4.10.3
Thank you for your responses. However, according to my tests, Solr 4.10.3 doesn't use "server" by default anymore, due to the removal of these lines in the bin/solr script:

# TODO: see SOLR-3619, need to support "server" or "example"
# depending on the version of Solr
if [ -e "$SOLR_TIP/server/start.jar" ]; then
  DEFAULT_SERVER_DIR="$SOLR_TIP/server"
else
  DEFAULT_SERVER_DIR="$SOLR_TIP/example"
fi

Solr 5.0.0 does, in both standalone and SolrCloud modes! This is great! Dominique http://www.eolya.fr/

On Thursday, January 8, 2015, Anshum Gupta ans...@anshumgupta.net wrote:

Things have changed reasonably for the 5.0 release. In case of standalone mode, it still defaults to the server directory, so you'd find your logs in server/logs. In case of SolrCloud mode, e.g. if you ran bin/solr -e cloud -noprompt, this would default to stuff being copied into the example directory (leaving the server directory untouched) and everything would run from there. You will also have the option of just creating a new Solr home and using that instead. See the following: https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud The link above is for the upcoming Solr 5.0 and is still work in progress, but it should give you more information. Hope that helps.

On Tue, Jan 6, 2015 at 1:29 AM, Dominique Bejean dominique.bej...@eolya.fr wrote:

Hi, In release 4.10.3, the following lines were removed from the Solr starting script (bin/solr):

# TODO: see SOLR-3619, need to support "server" or "example"
# depending on the version of Solr
if [ -e "$SOLR_TIP/server/start.jar" ]; then
  DEFAULT_SERVER_DIR="$SOLR_TIP/server"
else
  DEFAULT_SERVER_DIR="$SOLR_TIP/example"
fi

However, the usage message still says:

-d <dir>  Specify the Solr server directory; defaults to server

Either the usage has to be fixed or the removed lines put back into the script. Personally, I like defaulting to the server directory. My installation process, in order to have a clean empty Solr instance, is to copy example into server and remove directories like example-DIH, example-schemaless, multicore and solr/collection1. A Solr server (or node) can then be started without the -d parameter. If this makes sense, a Jira issue could be opened. Dominique http://www.eolya.fr/

-- Anshum Gupta http://about.me/anshumgupta
Re: Solr grouping problem - need help
That's your job. The easiest way is to do a copyField to a "string" field. -- Jack Krupansky

On Tue, Jan 13, 2015 at 7:33 AM, Naresh Yadav nyadav@gmail.com wrote:

*Schema:*
<field name="tenant_pool" type="text" stored="true"/>

*Code:*
SolrQuery q = new SolrQuery().setQuery("*:*");
q.set(GroupParams.GROUP, true);
q.set(GroupParams.GROUP_FIELD, "tenant_pool");

*Data:*
tenant_pool: Baroda Farms
tenant_pool: Ketty Farms

*Output coming:*
groupValue=Farms, docs=2

*Expected output:*
groupValue=Baroda Farms, docs=1
groupValue=Ketty Farms, docs=1

Please guide me on how I can tell Solr not to tokenize the stored field when deciding unique groups. I want the unique groups to be the exact value of the field, not the tokens which Solr is producing currently. Thanks, Naresh
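A minimal sketch of Jack's copyField suggestion (the extra field name here is made up):

<field name="tenant_pool_str" type="string" indexed="true" stored="false"/>
<copyField source="tenant_pool" dest="tenant_pool_str"/>

and then group on the untokenized copy instead:

q.set(GroupParams.GROUP_FIELD, "tenant_pool_str");

Since copyField works on the incoming value at index time, the documents need to be reindexed after the schema change.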
Re: Extending solr analysis in index time
A function query or an update processor to create a separate field are still your best options. -- Jack Krupansky

On Tue, Jan 13, 2015 at 4:18 AM, Ali Nazemian alinazem...@gmail.com wrote:

Dear Markus, Unfortunately I cannot use payloads, since I want to return this score to each user as a simple field alongside other fields, and payloads do not provide that. Also, I don't want to change the default similarity method of Lucene; I just want to have this field to do the sorting in some cases. Best regards.

On Mon, Jan 12, 2015 at 10:26 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi - You mention having a list of important terms; then using payloads would be the most straightforward, I suppose. You still need a custom similarity and a custom query parser. Payloads work very well for us. M

-----Original message-----
From: Ahmet Arslan iori...@yahoo.com.INVALID
Sent: Monday 12th January 2015 19:50
To: solr-user@lucene.apache.org
Subject: Re: Extending solr analysis in index time

Hi Ali, Reading your example, if you could somehow replace the idf component with your importance weight, I think your use case looks like TFIDFSimilarity. The tf component remains the same. https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html I also suggest you ask this on the Lucene mailing list. Someone familiar with the similarity package can give insight on this. Ahmet

On Monday, January 12, 2015 6:54 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

Could you clarify what you mean by "Lucene reverse index"? That's not a term I am familiar with. -- Jack Krupansky

On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian alinazem...@gmail.com wrote:

Dear Jack, Thank you very much. Yeah, I was thinking of a function query for sorting, but I have two problems in this case: 1) a function query does the processing at query time, which I don't want; 2) I also want to have the score field for retrieving and showing to users.

Dear Alexandre, Here is some more explanation about the business behind the question: I am going to provide a field for each document, let's refer to it as document_score. I am going to fill this field based on information that can be extracted from the Lucene reverse index. Assume I have a list of terms, called "important terms", and I am going to extract the term frequency for each of the terms inside this list per document. To be honest, I want to use the term frequency for calculating document_score. document_score should be storable, since I am going to retrieve this field for each document. I also want to do sorting on document_score in case it is preferred by the user. I hope I did convey my point. Best regards.

On Mon, Jan 12, 2015 at 12:53 AM, Jack Krupansky jack.krupan...@gmail.com wrote:

Won't function queries do the job at query time? You can add or multiply the tf*idf score by a function of the term frequency of arbitrary terms, using the tf, mul, and add functions. See: https://cwiki.apache.org/confluence/display/solr/Function+Queries -- Jack Krupansky

On Sun, Jan 11, 2015 at 10:55 AM, Ali Nazemian alinazem...@gmail.com wrote:

Dear Jack, Hi, I think you misunderstood my need. I don't want to change the default scoring behavior of Lucene (tf-idf); I just want to have another field to do sorting for some specific queries (not all the search business). However, I am aware of Lucene payloads. Thank you very much.

On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

You would do that with a custom similarity (scoring) class. That's an expert feature. In fact a SUPER-expert feature. Start by completely familiarizing yourself with how TF*IDF similarity already works: http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html And to use your custom similarity class in Solr: https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity -- Jack Krupansky

On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian alinazem...@gmail.com wrote:

Hi everybody, I am going to add some analysis to Solr at index time. Here is what I am considering: Suppose I have two different fields in my Solr schema, field a and field b. I am going to use the created reverse index in a way that some terms are considered important ones, and tell Lucene to
Re: Solr grouping problem - need help
Hi Jack, Thanks for replying. I am new to Solr, please guide me on this. I have many such columns in my schema, so copyField will create a lot of duplicate fields; besides, I do not need any search on the original field. My use case is that I do not want any search on the tenant_pool field, which is why I declared it as a stored field, not indexed. I just need to get the unique values in this field. Please show some direction.

On Tue, Jan 13, 2015 at 6:16 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

That's your job. The easiest way is to do a copyField to a "string" field. -- Jack Krupansky

On Tue, Jan 13, 2015 at 7:33 AM, Naresh Yadav nyadav@gmail.com wrote:

*Schema:*
<field name="tenant_pool" type="text" stored="true"/>

*Code:*
SolrQuery q = new SolrQuery().setQuery("*:*");
q.set(GroupParams.GROUP, true);
q.set(GroupParams.GROUP_FIELD, "tenant_pool");

*Data:*
tenant_pool: Baroda Farms
tenant_pool: Ketty Farms

*Output coming:*
groupValue=Farms, docs=2

*Expected output:*
groupValue=Baroda Farms, docs=1
groupValue=Ketty Farms, docs=1

Please guide me on how I can tell Solr not to tokenize the stored field when deciding unique groups. I want the unique groups to be the exact value of the field, not the tokens which Solr is producing currently. Thanks, Naresh

-- Cheers, Naresh Yadav +919960523401 http://nareshyadav.blogspot.com/ SSE, MetrixLine Inc.
Occasionally getting error in solr suggester component.
Hi all, I am experiencing a problem in the Solr SuggestComponent. Occasionally the suggester component throws an error like:

Solr failed: {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggester was not built","trace":"java.lang.IllegalStateException: suggester was not built\n\tat org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:368)\n\tat org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:342)\n\tat org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240)\n\tat org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:199)\n\tat org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:234)\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)\n\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)\n\tat org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)\n\tat org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)\n\tat org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)\n\tat org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)\n\tat org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)\n\tat org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:680)\n\tat org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)\n\tat org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)\n\tat org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1002)\n\tat org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)\n\tat org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat java.lang.Thread.run(Thread.java:745)\n","code":500}}

This is not happening frequently, but when indexing and the suggester component work together this error will occur.
In the solr config:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">haSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str> <!-- org.apache.solr.spelling.suggest.fst -->
    <str name="suggestAnalyzerFieldType">textSpell</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str> <!-- org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory -->
    <str name="field">name</str>
    <str name="weightField">packageWeight</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Can anyone suggest where to look to figure out this error and why these errors are occurring? Thanks, dhanesh s.r
Re: Solr fails to start with log file not found error
By any chance are you trying to start Solr as a different user when this happens? I'm wondering if there's a permissions issue here. Wild guess.

On Tue, Jan 13, 2015 at 12:37 AM, Graeme Pietersz gra...@pietersz.net wrote:

I get this error when starting Solr using the script in bin/solr:

tail: cannot open `[path]/logs/solr.log' for reading: No such file or directory

It does not happen every time, but it does happen a lot. It sometimes clears up after a while. I have tried creating an empty file, but Solr then just says:

Backing up [path]/logs/solr.log

and repeats the same error. I am guessing the problem is that it cannot get the error from the log file because the log file has not been created yet, but then how do I debug this? Running Solr 4.10.2 on Debian 7 using Jetty, with the default IcedTea 2.5.3, java version 1.7.0_65. Thanks for any help or pointers.
Re: Extending solr analysis in index time
I decided to go with a function query, implementing a function query that reads the term frequency for each document from the index. Anyway, I did not find any tutorial that matched my problem well. I would really appreciate it if somebody could provide some useful tutorial or example for this case. Thank you very much.

On Tue, Jan 13, 2015 at 4:21 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

A function query or an update processor to create a separate field are still your best options. -- Jack Krupansky

On Tue, Jan 13, 2015 at 4:18 AM, Ali Nazemian alinazem...@gmail.com wrote:

Dear Markus, Unfortunately I cannot use payloads, since I want to return this score to each user as a simple field alongside other fields, and payloads do not provide that. Also, I don't want to change the default similarity method of Lucene; I just want to have this field to do the sorting in some cases. Best regards.

On Mon, Jan 12, 2015 at 10:26 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi - You mention having a list of important terms; then using payloads would be the most straightforward, I suppose. You still need a custom similarity and a custom query parser. Payloads work very well for us. M

-----Original message-----
From: Ahmet Arslan iori...@yahoo.com.INVALID
Sent: Monday 12th January 2015 19:50
To: solr-user@lucene.apache.org
Subject: Re: Extending solr analysis in index time

Hi Ali, Reading your example, if you could somehow replace the idf component with your importance weight, I think your use case looks like TFIDFSimilarity. The tf component remains the same. https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html I also suggest you ask this on the Lucene mailing list. Someone familiar with the similarity package can give insight on this. Ahmet

On Monday, January 12, 2015 6:54 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

Could you clarify what you mean by "Lucene reverse index"? That's not a term I am familiar with. -- Jack Krupansky

On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian alinazem...@gmail.com wrote:

Dear Jack, Thank you very much. Yeah, I was thinking of a function query for sorting, but I have two problems in this case: 1) a function query does the processing at query time, which I don't want; 2) I also want to have the score field for retrieving and showing to users.

Dear Alexandre, Here is some more explanation about the business behind the question: I am going to provide a field for each document, let's refer to it as document_score. I am going to fill this field based on information that can be extracted from the Lucene reverse index. Assume I have a list of terms, called "important terms", and I am going to extract the term frequency for each of the terms inside this list per document. To be honest, I want to use the term frequency for calculating document_score. document_score should be storable, since I am going to retrieve this field for each document. I also want to do sorting on document_score in case it is preferred by the user. I hope I did convey my point. Best regards.

On Mon, Jan 12, 2015 at 12:53 AM, Jack Krupansky jack.krupan...@gmail.com wrote:

Won't function queries do the job at query time? You can add or multiply the tf*idf score by a function of the term frequency of arbitrary terms, using the tf, mul, and add functions. See: https://cwiki.apache.org/confluence/display/solr/Function+Queries -- Jack Krupansky

On Sun, Jan 11, 2015 at 10:55 AM, Ali Nazemian alinazem...@gmail.com wrote:

Dear Jack, Hi, I think you misunderstood my need. I don't want to change the default scoring behavior of Lucene (tf-idf); I just want to have another field to do sorting for some specific queries (not all the search business). However, I am aware of Lucene payloads. Thank you very much.

On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

You would do that with a custom similarity (scoring) class. That's an expert feature. In fact a SUPER-expert feature. Start by completely familiarizing yourself with how TF*IDF similarity already works: http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html And to use your custom similarity class in Solr:
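Since Ali asks for a concrete example: a sketch of Jack's function-query approach, using the standard tf(), mul(), add() and pow() functions and the numbers from Ali's earlier example (fields a and b, term "hello", weight 2.0; the doc_score alias is made up):

...&fl=*,doc_score:add(mul(2,tf(a,'hello')),pow(mul(2,tf(b,'hello')),2))&sort=add(mul(2,tf(a,'hello')),pow(mul(2,tf(b,'hello')),2)) desc

For a document where tf(a,'hello')=3 and tf(b,'hello')=6, this yields 2*3+(2*6)^2 = 150, returned as the pseudo-field doc_score and used for sorting. It avoids a custom similarity, but it computes the value at query time and stores nothing in the index - which is exactly the query-time cost Ali said he wanted to avoid, so an update processor that writes a real document_score field at index time remains the alternative.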
Highlighting whole phrase
Highlighting does not highlight the whole phrase; instead, each word gets highlighted. I tried all the suggestions that were given, with no luck. These are the special settings I tried for phrase highlighting:

hl.usePhraseHighlighter=true
hl.q=query

http://localhost.mathworks.com:8983/solr/db/select?q=syndrome%3A%22Override+ignored+for+property%22rows=1fl=syndrome_idwt=jsonindent=truehl=truehl.simple.pre=%3Cem%3Ehl.simple.post=%3C%2Fem%3Ehl.usePhraseHighlighter=truehl.q=%22Override+ignored+for+property%22hl.fragsize=1000

This is from my schema.xml:

<field name="syndrome" type="text_general" indexed="true" stored="true"/>

Should I add parameters in the indexing stage itself to make this work? Thanks for your time. Meena

-- View this message in context: http://lucene.472066.n3.nabble.com/Highting-whole-pharse-tp4179219.html Sent from the Solr - User mailing list archive at Nabble.com.
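One avenue worth checking, which is an assumption on my part rather than something tried in this thread: the FastVectorHighlighter can mark up phrase matches as a unit rather than term by term. It needs term vectors on the field and a reindex, roughly:

<field name="syndrome" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>

and then hl.useFastVectorHighlighter=true added to the request alongside the existing hl parameters.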
Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers
SolrCloud is intended to work in the rolling-restart case... Index size, segment counts, and segment names can (and will) be different on different replicas of the same shard without anything being amiss. Commits (hard) happen at different times across the replicas in a shard. Merging logic kicks in and may (will eventually, in all probability) pick different segments to merge, with varying numbers of deleted docs that get purged, etc.

The numFound reported on a q=*:*&distrib=false query, or looking at the core in the admin screen for the replicas in question and noting numDocs, should be identical though, if (1) you've issued a hard commit with openSearcher=true _or_ a soft commit, and (2) you haven't been indexing, or haven't issued a commit as in (1), since you started looking. Best, Erick

On Tue, Jan 13, 2015 at 4:20 AM, Zisis Tachtsidis zist...@runbox.com wrote:

Daniel Collins wrote: Is it important where your leader is? If you just want to minimize leadership changes during rolling re-start, then you could restart in the opposite order (S3, S2, S1). That would give only 1 transition, but the end result would be a leader on S2 instead of S1 (not sure if that is important to you or not). I know it's not a fix, but it might be a workaround until the whole leadership moving is done?

I think that rolling restarting the machines in the opposite order (S3, S2, S1) will result in S3 being the leader. It's a valid approach, but wouldn't I have to revert to the original order (S1, S2, S3) to achieve the same result in the following rolling restart? This involves operational costs and complexity that I want to avoid.

Erick Erickson wrote: Just skimming, but the problem here that I ran into was with the listeners. Each _Solr_ instance out there is listening to one of the ephemeral nodes (the one in front). So deleting a node does _not_ change which ephemeral node the associated Solr instance is listening to. So, for instance, when you delete S2...n-01 and re-add it, S2 is still looking at S1...n-00 and will continue looking at S1...n-00 until S1...n-00 is deleted. Deleting S2...n-01 will wake up S3, though, which should now be looking at S1...n-00. Now you have two Solr listeners looking at the same ephemeral node. The key is that deleting S2...n-01 does _not_ wake up S2, just any Solr instance that has a watch on the associated ephemeral node.

Thanks for the info Erick. I wasn't aware of this linked-list listener structure between the zk nodes. Based on what you've said, though, I've changed my implementation a bit and it seems to be working at first glance. Of course it's not reliable yet, but it looks promising. My original attempt
S1: -n_00 (no code running here)
S2: -n_04 (code deleting zknode -n_01 and creating -n_04)
S3: -n_03 (code deleting zknode -n_02 and creating -n_03)
has been changed to
S1: -n_00 (no code running here)
S2: -n_03 (code deleting zknode -n_01 and creating -n_03 using EPHEMERAL_SEQUENTIAL)
S3: -n_02 (no code running here)
Once S1 is shut down, S3 becomes leader, since it listens to S1 now according to what you've said.

The original reason I pursued this "minimize leadership changes" quest was that it _could_ lead to data loss in some scenarios. I'm not entirely sure, though, and you can correct me on this, but I'll explain myself: if you have incoming indexing requests during a rolling restart, could there be a case during the current leader's shutdown where the leader-to-be node does not have the time to sync with the node that is shutting down, in which case everyone will now sync to the new leader, thus missing some updates? I've seen an installation with different index sizes in each replica that deteriorated over time.

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-shard-leader-elections-Altering-zookeeper-sequence-numbers-tp4178973p4179147.html Sent from the Solr - User mailing list archive at Nabble.com.
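Erick's numDocs check, written out as a concrete request (host, port and core name are assumed):

http://localhost:8983/solr/collection1_shard1_replica1/select?q=*:*&rows=0&distrib=false

Run it against each replica of the shard in turn. After a hard commit with openSearcher=true (or a soft commit), and with indexing paused, numFound should be identical across the replicas even when their on-disk index sizes differ.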
Re: SpellCheck (AutoComplete) Not Working In Distributed Environment
Still not able to get my autocomplete component to work in a distributed environment. It works fine on a non-distributed system. Also, on the distributed system, if I include distrib=false, it works. I have tried the shards.qt and shards parameters, but they make no difference. I should add, I am running SolrCloud and ZooKeeper, if that makes any difference. I have played around with this quite a bit, but nothing seems to work.

When I add shards.qt=/ac (the name of the request handler), I get an error in the Solr logs. It simply states: java.lang.NullPointerException. That's it, nothing more. This is listed as logger SolrCore and SolrDispatchFilter. Any ideas or suggestions on how I can troubleshoot and find the problem? Is there something specific I should look for? Please find attached a text file with the relevant information from schema.xml and solrconfig.xml. Any help greatly appreciated! Thanks, -Charles

----- Original Message -----
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, December 30, 2014 6:07:13 PM
Subject: Re: SpellCheck (AutoComplete) Not Working In Distributed Environment

Did you try the shards parameter? See: https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-DistributedSpellCheck

On Tue, Dec 30, 2014 at 2:20 PM, Charles Sanders csand...@redhat.com wrote:

I'm running Solr 4.8 in a distributed environment (2 shards). I have added the spellcheck component to my request handler. In my test system, which is not distributed, it works. But when I move it to the dev box, which is distributed (2 shards), it is not working. Is there something additional I must do to get this to work in a distributed environment?

<requestHandler default="true" name="standard" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">allText</str>
    <!-- default autocomplete settings for this search request handler -->
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">andreasAutoComplete</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>autoComplete</str>
  </arr>
</requestHandler>

<searchComponent name="autoComplete" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">andreasAutoComplete</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">sugg_allText</str>
    <str name="buildOnCommit">true</str>
    <float name="threshold">.005</float>
    <str name="queryAnalyzerFieldType">text_suggest</str>
  </lst>
</searchComponent>

Any help greatly appreciated! Thanks, -Charles

***** Schema.xml *****

<field name="issue_suggest" type="text_suggest" indexed="true" stored="false"/>
<field name="sugg_allText" type="text_suggest" indexed="true" multiValued="true" stored="false"/>

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

***** Solrconfig.xml *****

<!-- Auto-Complete component -->
<searchComponent name="autoComplete" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">andreasAutoComplete</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">sugg_allText</str>
    <str name="buildOnCommit">true</str>
    <float name="threshold">.005</float>
    <str name="queryAnalyzerFieldType">text_suggest</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">recommendationsAutoComplete</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">issue_suggest</str>
    <str name="buildOnCommit">true</str>
    <float name="threshold">.005</float>
    <str name="queryAnalyzerFieldType">text_suggest</str>
  </lst>
</searchComponent>

<requestHandler name="/ac" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
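For reference, the distributed request described on the Reference Guide page Erick linked would look roughly like this (host and collection name are assumed; /ac is the handler from the config above):

http://host:8983/solr/collection/ac?q=toy&spellcheck=true&shards.qt=/ac

shards.qt must name the request handler that carries the spellcheck component, including the leading slash; in SolrCloud the shards parameter itself can usually be omitted, since the cluster state supplies the shard addresses.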
Re: Solr limiting number of rows to indexed to 21500 every time.
Looks like you have an underlying JDBC problem. The socket representing your database connection seems to be going away. Have you tried running this query outside of Solr and iterating through all the results? How about in a standalone Java program? Do you have a DBA you can consult to see if there are any errors on the Oracle side? Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. "The Science of Influence Marketing" 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: https://plus.google.com/u/0/b/112002776285509593336/posts w: http://www.appinions.com/ On Tue, Jan 13, 2015 at 2:31 AM, Pankaj Sonawane pankaj4sonaw...@gmail.com wrote: Hi, I am using the Solr DataImportHandler to index data from a database table (Oracle). One of the columns contains a String representation of XML (sample below):

<options>
  <option name="A">1</option>
  <option name="B">2</option>
  <option name="C">3</option>
  ...
</options>   (there can be 100-200 option elements)

I want Solr to index each 'name' in an 'option' tag against its value, e.g. the JSON for one row:

docs: [ { COL1: F, COL2: ASDF, COL3: ATCC, COL4: 29039757, A_s: 1, B_s: 2, C_s: 3, ... } ]   (appending '_s' to the 'name' attribute to make dynamic fields)

But while indexing the data, every time only 21500 rows get indexed. After that many records are indexed, I get the following exception:

1320927 [Thread-15] ERROR org.apache.solr.handler.dataimport.EntityProcessorBase - getNext() failed for query 'SELECT col1,col2,col3,col4,XMLSERIALIZE(col5 AS CLOB) AS col5 FROM tableName': org.apache.solr.handler.dataimport.DataImportHandlerException: java.sql.SQLRecoverableException: No more data to read from socket
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:378)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$600(JdbcDataSource.java:258)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:293)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:116)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.sql.SQLRecoverableException: No more data to read from socket
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1200)
at oracle.jdbc.driver.T4CMAREngine.unmarshalCLR(T4CMAREngine.java:1865)
at oracle.jdbc.driver.T4CMAREngine.unmarshalCLR(T4CMAREngine.java:1757)
at oracle.jdbc.driver.T4CMAREngine.unmarshalCLR(T4CMAREngine.java:1750)
at oracle.jdbc.driver.T4CClobAccessor.handlePrefetch(T4CClobAccessor.java:543)
at oracle.jdbc.driver.T4CClobAccessor.unmarshalOneRow(T4CClobAccessor.java:197)
at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:916)
at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:835)
at oracle.jdbc.driver.T4C8Oall.readRXD(T4C8Oall.java:664)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:328)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:521)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:194)
at oracle.jdbc.driver.T4CStatement.fetch(T4CStatement.java:1074)
at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:369)
at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:273)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:370)
... 12 more
1320928 [Thread-15] ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while processing: e1 document : SolrInputDocument(fields:
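Since Michael suggests a standalone test, here is a minimal sketch of one (the JDBC URL and credentials are placeholders, it assumes the Oracle driver jar is on the classpath, and the SELECT is the one from the DIH error above):

import java.sql.*;

public class FetchAllRows {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- substitute your real host/service/credentials.
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL";
        Connection conn = DriverManager.getConnection(url, "user", "password");
        try {
            Statement stmt = conn.createStatement();
            stmt.setFetchSize(500); // try varying this (DIH's default fetch size is 500, if memory serves)
            ResultSet rs = stmt.executeQuery(
                "SELECT col1,col2,col3,col4,XMLSERIALIZE(col5 AS CLOB) AS col5 FROM tableName");
            long count = 0;
            while (rs.next()) {
                rs.getString("col5"); // force the CLOB to actually be read, like DIH does
                count++;
            }
            System.out.println("Fetched " + count + " rows");
        } finally {
            conn.close();
        }
    }
}

If this program also dies at around 21500 rows with "No more data to read from socket", the problem is on the Oracle/JDBC side (for example a resource profile limit, an idle/firewall timeout, or a driver/CLOB prefetch issue) rather than in Solr.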
Re: Solr grouping problem - need help
Something is very wrong here. Have you perhaps been changing your schema without re-indexing? And I recommend you completely remove your data directory (the one with the index and tlog subdirectories) after you change your schema.xml file. Because you're trying to group on a field that is _not_ indexed, you should be getting an error returned, something like: "can not use FieldCache on a field which is neither indexed nor has doc values". As far as the tokenization comment goes, just start by making the field you want to group on stored="false" indexed="true" type="string". Best, Erick On Tue, Jan 13, 2015 at 5:09 AM, Naresh Yadav nyadav@gmail.com wrote: Hi Jack, thanks for replying; I am new to Solr, please guide me on this. I have many such columns in my schema, so a copyField will create a lot of duplicate fields; besides, I do not need any search on the original field. My use case is that I do not want any search on the tenant_pool field; that's why I declared it as a stored field, not indexed. I just need to get the unique values in this field. Please show some direction. On Tue, Jan 13, 2015 at 6:16 PM, Jack Krupansky jack.krupan...@gmail.com wrote: That's your job. The easiest way is to do a copyField to a string field. -- Jack Krupansky On Tue, Jan 13, 2015 at 7:33 AM, Naresh Yadav nyadav@gmail.com wrote:

Schema: <field name="tenant_pool" type="text" stored="true"/>
Code:
SolrQuery q = new SolrQuery().setQuery("*:*");
q.set(GroupParams.GROUP, true);
q.set(GroupParams.GROUP_FIELD, "tenant_pool");
Data:
tenant_pool: Baroda Farms
tenant_pool: Ketty Farms
Output coming:
groupValue=Farms, docs=2
Expected output:
groupValue=Baroda Farms, docs=1
groupValue=Ketty Farms, docs=1

Please guide me: how can I tell Solr not to tokenize the stored field when deciding the unique groups? I want the unique groups to be the exact value of the field, not the tokens Solr is currently producing. Thanks, Naresh -- Cheers, Naresh Yadav +919960523401 http://nareshyadav.blogspot.com/ SSE, MetrixLine Inc.
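Putting Jack's copyField suggestion and Erick's string-type advice together, the schema change might look like this sketch (the tenant_pool_str name is made up, and a full re-index is required afterward):

<field name="tenant_pool" type="text" stored="true" indexed="false"/>
<field name="tenant_pool_str" type="string" stored="false" indexed="true"/>
<copyField source="tenant_pool" dest="tenant_pool_str"/>

Then group (or facet) on tenant_pool_str; because string fields are not tokenized, the groupValue will be the whole original value.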
Re: leader split-brain at least once a day - need help
On 1/12/2015 5:34 AM, Thomas Lamy wrote: I found no big/unusual GC pauses in the log (at least manually; I found no free solution to analyze them that worked out of the box on a headless Debian wheezy box). Eventually I tried with -Xmx8G (was 64G before) on one of the nodes, after checking that allocation after 1 hour of run time was at about 2-3GB. That didn't move the time frame where a restart was needed, so I don't think Solr's JVM GC is the problem. We're trying to get all of our nodes' logs (ZooKeeper and Solr) into Splunk now, just to get a better sorted view of what's going on in the cloud once a problem occurs. We're also enabling GC logging for ZooKeeper; maybe we were missing problems there while focusing on Solr logs. If you make a copy of the GC log, you can put it on another system with a GUI and graph it with this: http://sourceforge.net/projects/gcviewer Just double-click on the jar to run the program. I find it is useful for clarity on the graph to go to the View menu and uncheck everything except the two GC Times options. You can also change the zoom to a lower percentage so you can see more of the graph. That program is how I got the graph you can see on my wiki page about GC tuning: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning Another possible problem is that your install is exhausting the thread pool. Tomcat defaults to a maxThreads value of only 200. There's a good chance that your setup will need more than 200 threads at least occasionally. If you're near the limit, having a thread problem once per day based on index activity seems like a good possibility. Try setting maxThreads to a much higher value, such as 10000, in the Tomcat config. Thanks, Shawn
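For what it's worth, GC logging on a HotSpot JVM of this vintage is just a matter of startup flags (e.g. via CATALINA_OPTS for Tomcat, or the ZooKeeper start script; the log path is a placeholder):

-verbose:gc -Xloggc:/var/log/solr-gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime

and the resulting file is exactly what GCViewer reads. The Tomcat change is one attribute on the HTTP connector in server.xml, along these lines:

<Connector port="8080" protocol="HTTP/1.1" maxThreads="10000" ... />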
Re: Solr grouping problem - need help
Erick, my schema is the same, no change in that.
Schema: <field name="tenant_pool" type="text" stored="true"/>
My guess is that I had not mentioned indexed=true or false... maybe the default for indexed is true. My question is: for an indexed=false, stored=true field, what is the optimal way to get the unique values in such a field? On Tue, Jan 13, 2015 at 10:07 PM, Erick Erickson erickerick...@gmail.com wrote: ...
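Since the field has to be re-indexed anyway to become indexed, a sketch of the retrieval side once that's done (assuming tenant_pool is re-declared as type="string" indexed="true", and using SolrJ's SolrQuery, QueryResponse, and FacetField against an existing SolrServer named server) -- faceting enumerates the distinct values more cheaply than grouping:

SolrQuery q = new SolrQuery("*:*");
q.setRows(0);                    // only the value counts are wanted, no documents
q.setFacet(true);
q.addFacetField("tenant_pool");
q.setFacetMinCount(1);
QueryResponse rsp = server.query(q);
for (FacetField.Count c : rsp.getFacetField("tenant_pool").getValues()) {
    System.out.println(c.getName() + " -> " + c.getCount());  // e.g. Baroda Farms -> 1
}

Each facet bucket is one whole, untokenized field value, which is exactly the unique-values list being asked for.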
Re: Solr large boolean filter
TermsQueryParser I think is somewhat new. Have you tried that one? https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 13 January 2015 at 12:54, rashmy1 rashm...@gmail.com wrote: Hello, We have a similar requirement where a large list of IDs needs to be sent to SOLR in filter query. Could someone please help understand if this feature is now supported in the new versions of SOLR? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-large-boolean-filter-tp4070747p4179276.html Sent from the Solr - User mailing list archive at Nabble.com.
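For a concrete picture of the terms parser, the whole ID list goes into a single filter query as a comma-separated list, e.g. with SolrJ (the field name and values here are made up):

SolrQuery q = new SolrQuery("*:*");
// One fq, no giant OR clause -- so no maxBooleanClauses limit to hit:
q.addFilterQuery("{!terms f=id}101,102,103,204,305");

or, as a raw parameter, fq={!terms f=id}101,102,103,204,305. For very large lists you would still want to send the request as a POST rather than in the URL.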
Re: Engage custom hit collector for special search processing
As insane as it sounds, I need to process all the results. No one document is more or less important than another. Only a few hundred unique docs will be sent to the client at any one time, but the users expect to page through them all. I don't expect sub-second performance for this task. I'm just hoping for something reasonable, and I can't define that either. -- View this message in context: http://lucene.472066.n3.nabble.com/Engage-custom-hit-collector-for-special-search-processing-tp4179348p4179366.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr unit tests intermittently fail with error: java.lang.NoClassDefFoundError: org/eclipse/jetty/util/security/CertificateUtils
On 1/13/2015 2:50 PM, brian4 wrote: The problem is the jetty-util version included in the Solr build is 6.1.26, but this particular package is from version 7+. Looks like it is a bug in the build files for Solr. I fixed it by downloading jetty 7 separately and manually adding jetty-util-7.6.16.v20140903.jar to the end of my classpath. The jetty version included in the Solr build for 4.x is 8.1.10.v20130312. There is a dependency in *Lucene* for 6.1.26, but it's a completely optional Lucene add-on that is not used in *Solr*. Thanks, Shawn
Engage custom hit collector for special search processing
I have a complicated problem to solve, and I don't know enough about Lucene/Solr to phrase the question properly. This is kind of a shot in the dark. My requirement is to return search results always in completely collapsed form, rolling up duplicates with a count. Duplicates are defined by whatever fields are requested. If the search requests fields A, B, C, then all matched documents that have identical values for those 3 fields are dupes. The field list may change with every new search request. What I do know, at index time, is the superset of all fields that may be part of the field list. I know this can't be done with configuration alone. It doesn't seem performant to retrieve all 1M+ docs and post-process in Java. A very smart person told me that a custom hit collector should be able to do the filtering for me. So, maybe I create a custom search handler that somehow exposes this custom hit collector, which can use FieldCache or DocValues to examine all the matches and filter the results in the way I've described above. Assuming this is a viable solution path, can anyone suggest some helpful posts, code fragments, or books for me to review? I admit to being out of my depth, but this requirement isn't going away. I'm grasping at straws right now. Thanks (using Solr 4.9) -- View this message in context: http://lucene.472066.n3.nabble.com/Engage-custom-hit-collector-for-special-search-processing-tp4179348.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Slow faceting performance on a docValues field
Shawn, I've been thinking along the same lines, and continued to run tests through the day. The results surprised me. For my index, Solr range faceting time is most closely related to the total number of documents in the index within the specified range. The number of buckets in the range is a second factor. I found NO correlation whatsoever with the number of hits in the query: whether I have 3 hits or 1,500,000 hits, it's ~24 seconds to facet the results for that same time period. That is what surprised me. For example, if my facet range is a 10-year period for which there exist 47M docs in the index, the facet time is 24 seconds. If I switch my facet range to a different 10-year period with 1.3M docs, the facet time drops to less than 5 seconds. If I go back to my original 10-year period (with 47M docs in the index) but facet by month instead of day, my facet time drops to 2.5 seconds. Now, I can't meet my users' needs this way, but it does show the relationship between the number of buckets and faceting time. Regards, David
Re: How to configure Solr PostingsFormat block size
: ...the nuts & bolts of it is that the PostingsFormat base class should take
: care of all the SPI name registration that you need based on what you
: pass to the super() constructor ... although now that i think about it,
: i'm not sure how you'd go about specifying your own name for the
: PostingsFormat when also doing something like subclassing
: Lucene41PostingsFormat ... there's no Lucene41PostingsFormat constructor
: you can call from your subclass to override the name.
:
: not sure what the expectation is there in the java API.

ok, so i talked this through with mikemccand on IRC... in 4x, the API is actually really dangerous - you can subclass things like Lucene41PostingsFormat w/o overriding the name used in SPI, and might really screw things up as far as what class is used to read back your files later. in the 5.0 APIs, these non-abstract codec related classes are all final to prevent exactly this type of behavior - but you can still use the constructor args to change behavior related to *writing* the index, and the classes are all designed to be smart enough that when they are loaded by SPI at search time, they can make sense of what's on disk (regardless of whether non-default constructor args were used at index time). but the question remains: where does that leave you as a solr user who wants to write a plugin, since Solr only allows you to configure the SPI name (no constructor args) via postingsFormat="foo"? the answer is that instead of writing a subclass, you would have to write a small proxy class, something like...

public final class MyPfWrapper extends PostingsFormat {
  PostingsFormat pf = new Lucene50PostingsFormat(42, 9);
  public MyPfWrapper() {
    super("MyPfWrapper");
  }
  public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
    return pf.fieldsConsumer(state);
  }
  public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException {
    return pf.fieldsProducer(state);
  }
}

..and then refer to it with postingsFormat="MyPfWrapper". at index time, Solr will use SPI to find your MyPfWrapper class, which will delegate to an instance of Lucene50PostingsFormat constructed with the overridden constants, and then at query time the SegmentReader code paths will use SPI to find MyPfWrapper by name as well, and it will again delegate to Lucene50PostingsFormat for reading back the index. or at least: that's how it *should* work :) -Hoss http://www.lucidworks.com/
Re: Slow faceting performance on a docValues field
On 1/13/2015 11:44 AM, David Smith wrote: I looked at Interval faceting. My required interval is 1 day. I cannot change that requirement. Unless I am mis-reading the doc, that means to facet a 10 year range, the query needs to specify over 3,600 intervals ?? I am very ignorant of how the internals work ... but it sounds like the parameters you have chosen are basically making thousands of separate facets, almost all of which will ultimately return zero, and therefore be excluded from the results. If my naive assessment of the situation is even close to accurate, then I think the rest of this paragraph would apply: If we assume that those individual facets are running consecutively, each one would be completing in single-digit-millisecond time to add up to about 25 seconds. If we assume they are running in parallel, that's a LOT of work to handle all at once, and the actual workload might look more like it's consecutive because there aren't enough CPU resources to handle them truly in parallel. I don't know that thousands of facets can be sped up very much. Thanks, Shawn
Re: Engage custom hit collector for special search processing
Do you have a sense of what your typical queries would look like? I mean, maybe you wouldn't actually need to fetch more than a tiny fraction of those million documents. Do you only need to determine the top 10 or 20 or 50 unique field value row sets, or do you need to determine ALL unique row sets? The latter would never be very performant even as a custom handler/collector, since it would have to scan all rows. Try a client-side solution that reads 100 (or 50 or 20 or 200) rows at a time, storing rows by the unique combination of field values, until you hit the threshold needed for the number of unique row sets. -- Jack Krupansky On Tue, Jan 13, 2015 at 4:29 PM, tedsolr tsm...@sciquest.com wrote: ...
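Jack's client-side approach, as a rough SolrJ sketch (the field names A/B/C, the page size, and the query string are illustrative; on 4.7+ the cursorMark feature would be kinder than start/rows for paging deeply):

import java.util.*;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class ClientSideRollup {
    // Counts duplicate (A,B,C) tuples by paging through the full result set.
    static Map<List<Object>, Integer> countDupes(SolrServer server) throws Exception {
        Map<List<Object>, Integer> counts = new LinkedHashMap<List<Object>, Integer>();
        int pageSize = 200;
        int start = 0;
        while (true) {
            SolrQuery q = new SolrQuery("your query here");
            q.setFields("A", "B", "C");   // the per-request field list
            q.setStart(start);
            q.setRows(pageSize);
            QueryResponse rsp = server.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                List<Object> key = Arrays.asList(doc.get("A"), doc.get("B"), doc.get("C"));
                Integer c = counts.get(key);
                counts.put(key, c == null ? 1 : c + 1);
            }
            start += rsp.getResults().size();
            if (rsp.getResults().isEmpty() || start >= rsp.getResults().getNumFound()) break;
        }
        return counts;
    }
}

Stopping early once counts.size() reaches the number of unique row sets actually needed, as Jack describes, is the piece that makes this tolerable.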
Re: Engage custom hit collector for special search processing
Sounds like: https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results http://heliosearch.org/the-collapsingqparserplugin-solrs-new-high-performance-field-collapsing-postfilter/ The main issue is your multi-field criteria, so you may need to extend/override the comparison method. Plus you'd need to keep the counts, which you should know since you are doing the filtering. Is this the right direction for what you need? Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 13 January 2015 at 16:29, tedsolr tsm...@sciquest.com wrote: ...
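To make that pointer concrete: invoking the collapsing post-filter is a single fq. A sketch, assuming a hypothetical dedupeKey string field populated at index time with the concatenated values of the fields of interest (collapse operates on one field, hence the combined key):

SolrQuery q = new SolrQuery("your query here");
q.addFilterQuery("{!collapse field=dedupeKey}");
q.set("expand", "true");  // optional, Solr 4.8+: also return the collapsed duplicates per group

With expand=true, each expanded group's numFound serves as the duplicate count for that key. The catch, as noted above, is that the field list changes per request, so a precomputed key only works if the combinations are known in advance.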
Re: How to configure Solr PostingsFormat block size
Thanks Hoss, This is starting to sound pretty complicated. Are you saying this is not doable with Solr 4.10? "...or at least: that's how it *should* work :)" makes me a bit nervous about trying this on my own. Should I open a JIRA issue, or am I probably the only person with a use case for replacing a TermIndexInterval setting with changing the min and max block size on the 41 postings format? Tom On Tue, Jan 13, 2015 at 3:16 PM, Chris Hostetter hossman_luc...@fucit.org wrote: ...
Re: Engage custom hit collector for special search processing
You may also want to take a look at how AnalyticsQueries can be plugged in. This won't show you how to do the implementation, but it will show you how you can plug in a custom collector. http://heliosearch.org/solrs-new-analyticsquery-api/ http://heliosearch.org/solrs-mergestrategy/ Joel Bernstein Search Engineer at Heliosearch On Tue, Jan 13, 2015 at 4:45 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: ...
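The skeleton from those two posts looks roughly like the sketch below (this follows the AnalyticsQuery API as of Solr 4.7+; the actual duplicate-rollup logic -- e.g. reading the A/B/C values via DocValues inside collect() -- is the hard part and is left out, as are the equals/hashCode overrides you'd want for query caching):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.search.AnalyticsQuery;
import org.apache.solr.search.DelegatingCollector;

public class DedupeAnalyticsQuery extends AnalyticsQuery {
  @Override
  public DelegatingCollector getAnalyticsCollector(ResponseBuilder rb, IndexSearcher searcher) {
    return new DedupeCollector(rb);
  }
}

class DedupeCollector extends DelegatingCollector {
  private final ResponseBuilder rb;
  private int count;

  DedupeCollector(ResponseBuilder rb) { this.rb = rb; }

  public void collect(int doc) throws IOException {
    count++;                // real rollup logic would examine the doc's field values here
    delegate.collect(doc);  // pass the doc down the collector chain
  }

  public void finish() throws IOException {
    NamedList<Object> analytics = new NamedList<Object>();
    analytics.add("count", count);
    rb.rsp.add("analytics", analytics);
    if (delegate instanceof DelegatingCollector) {
      ((DelegatingCollector) delegate).finish();
    }
  }
}

It gets wired in through a small QParserPlugin registered in solrconfig.xml and is then triggered from an fq, as the first link shows.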
Re: How to configure Solr PostingsFormat block size
: This is starting to sound pretty complicated. Are you saying this is not
: doable with Solr 4.10?

it should be doable in 4.10, using a wrapper class like the one i mentioned below (delegating to Lucene41PostingsFormat instead of Lucene50PostingsFormat) ... it's just that the 4.10 APIs are dangerous and let malicious/foolish java devs do scary things they shouldn't do. but what i outlined before (below) is intended to work, and should continue to work in 5.x.

: ...or at least: that's how it *should* work :) makes me a bit nervous
: about trying this on my own.

...worst case scenario, i overlooked something - but all it would take to verify that it's working is to try it at small scale: write the class, configure it, index a handful of docs, shut down & restart solr, and see if your index opens & is correctly searchable -- if it is, then i didn't overlook anything; if it isn't, then there is a bug somewhere, and details of your experiment with your custom postings format (ie: wrapper class) source in JIRA would be helpful.

: Should I open a JIRA issue or am I probably the only person with a use case
: for replacing a TermIndexInterval setting with changing the min and max
: block size on the 41 postings format?

you're the only person i've ever seen ask about it :)

: public final class MyPfWrapper extends PostingsFormat {
:   PostingsFormat pf = new Lucene50PostingsFormat(42, 9);
:   public MyPfWrapper() {
:     super("MyPfWrapper");
:   }
:   public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
:     return pf.fieldsConsumer(state);
:   }
:   public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException {
:     return pf.fieldsProducer(state);
:   }
: }
:
: ..and then refer to it with postingsFormat="MyPfWrapper"

-Hoss http://www.lucidworks.com/
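For anyone picking this up on 4.10, a self-contained version of the wrapper might look like this (a sketch, untested: the block sizes are placeholders, and if I remember the BlockTree check correctly the max must be >= 2*(min-1), so a pair like 42/9 would be rejected):

import java.io.IOException;
import org.apache.lucene.codecs.FieldsConsumer;
import org.apache.lucene.codecs.FieldsProducer;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;

public final class MyPfWrapper extends PostingsFormat {
  // Delegate that writes the index with non-default term block sizes (placeholder values).
  private final PostingsFormat pf = new Lucene41PostingsFormat(42, 92);

  public MyPfWrapper() {
    super("MyPfWrapper");  // the SPI name referenced from schema.xml as postingsFormat="MyPfWrapper"
  }

  @Override
  public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
    return pf.fieldsConsumer(state);
  }

  @Override
  public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException {
    return pf.fieldsProducer(state);
  }
}

The jar also needs a META-INF/services/org.apache.lucene.codecs.PostingsFormat file listing the fully qualified class name, or SPI will never see it.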