Re: Highlighting stopwords
O. Klein wrote Hmm, now the synonyms aren't highlighted anymore. OK, back to basics (I'm using trunk and FVH). What is the way to go if I want to search on a field without stopwords, but still want to highlight the stopwords (and still highlight synonyms and stemmed words)? I made a new field content_hl to prevent problems coming from copyField. When using hl.q=content_hl:(spell Check) I now get highlighting including stopwords, but when using hl.q=content_hl:(SC) where SC is a synonym I get no highlighting. Can you verify whether synonyms work when using hl.q? -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-stopwords-tp3681901p3743317.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Improving performance for SOLR geo queries?
hey thanks all for the suggestions, didn't have time to look into them yet as we're feature-sprinting for MWC, but will report back with some feedback over the next weeks (we will have a few more performance sprints in March) Best, Matthias On Mon, Feb 13, 2012 at 2:32 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Feb 9, 2012 at 1:46 PM, Yonik Seeley yo...@lucidimagination.com wrote: One way to speed up numeric range queries (at the cost of increased index size) is to lower the precisionStep. You could try changing this from 8 to 4 and then re-indexing to see how that affects your query speed. Your issue, and the fact that I had been looking at the post-filtering code again for another client, reminded me that I had been planning on implementing post-filtering for spatial. It's now checked into trunk. If you have the ability to use trunk, you can add a high cost (like cost=200) along with cache=false to trigger it. More details here: http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/ -Yonik lucidimagination.com -- Matthias Käppler Lead Developer API Mobile Qype GmbH Großer Burstah 50-52 20457 Hamburg Telephone: +49 (0)40 - 219 019 2 - 160 Skype: m_kaeppler Email: matth...@qype.com Managing Director: Ian Brotherston Amtsgericht Hamburg HRB 95913 This e-mail and its attachments may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail and its attachments. Any unauthorized copying, disclosure or distribution of this e-mail and its attachments is strictly forbidden. This notice also applies to future messages.
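For reference, the precisionStep Yonik mentions is an attribute of the Trie field types in schema.xml; a sketch of lowering it from 8 to 4 (the field type name here follows the example schema and is an assumption; the index must be rebuilt after the change):

```
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="4" omitNorms="true" positionIncrementGap="0"/>
```

A smaller precisionStep indexes more terms per value, which speeds up numeric range queries at the cost of index size.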
Re: Language specific tokenizer for purpose of multilingual search in single-core solr,
only one field element? There should be two, or? One for each language. paul On 14 Feb 2012 at 07:34, bing wrote: Hi, all, I want to do multilingual search in single-core Solr. That requires defining language-specific tokenizers in schema.xml. Say, for example, I have two tokenizers, one for English (en) and one for simplified Chinese (zh-cn). Can I just put the following definitions together in one schema.xml, and both sets of the files (stopwords, synonyms, and protwords) in one directory?

1. fieldType and field definition for English (en)

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index" language="en">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords_en.txt"/>
  </analyzer>
  .
</fieldType>
<field name="text_en" type="text_en" indexed="true" stored="false" multiValued="true"/>

2. fieldType and field definition for Chinese (zh_cn)

<fieldType name="text_zh_ch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index" language="zh_cn">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_ch.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords_en.txt"/>
  </analyzer>
  .
</fieldType>
<field name="text_zh_cn" type="text_zh_cn" indexed="true" stored="false" multiValued="true"/>

Best Bing -- View this message in context: http://lucene.472066.n3.nabble.com/Language-specific-tokenizer-for-purpose-of-multilingual-search-in-single-core-solr-tp3742873p3742873.html Sent from the Solr - User mailing list archive at Nabble.com.
sort my results alphabetically on facetnames
I want to sort my results on the facet names (not by their number of results). So now I have this (ordered by number of results):

Instelling voor auditief gehandicapten (16)
Audiologisch centrum (13)
Huisartsenpraktijk (13)
Instelling voor lichamelijk gehandicapten (13)
Ambulancezorg (12)
Beroepsorganisatie (12)

What I want is this:

Ambulancezorg (12)
Audiologisch centrum (13)
Beroepsorganisatie (12)
Huisartsenpraktijk (13)
Instelling voor auditief gehandicapten (16)
Instelling voor lichamelijk gehandicapten (13)

How can I change my request URL to sort differently? My current request URL is like so:

http://localhost:8983/solr/zz_healthorg/select/?indent=on&facet=true&q=*:*&fl=id&facet.field=healthorganizationtypes_raw_nl&facet.mincount=1

with the result below:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="fl">id</str>
      <str name="indent">on</str>
      <str name="facet.mincount">1</str>
      <str name="q">*:*</str>
      <str name="facet.field">healthorganizationtypes_raw_nl</str>
    </lst>
  </lst>
  <result name="response" numFound="258" start="0">
    <doc><str name="id">1</str></doc>
    <doc><str name="id">2</str></doc>
    <doc><str name="id">3</str></doc>
    <doc><str name="id">4</str></doc>
    <doc><str name="id">5</str></doc>
    <doc><str name="id">6</str></doc>
    <doc><str name="id">7</str></doc>
    <doc><str name="id">8</str></doc>
    <doc><str name="id">9</str></doc>
    <doc><str name="id">10</str></doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="healthorganizationtypes_raw_nl">
        <int name="Instelling voor auditief gehandicapten">16</int>
        <int name="Audiologisch centrum">13</int>
        <int name="Huisartsenpraktijk">13</int>
        <int name="Instelling voor lichamelijk gehandicapten">13</int>
        <int name="Ambulancezorg">12</int>
        <int name="Beroepsorganisatie">12</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>

-- View this message in context: http://lucene.472066.n3.nabble.com/sort-my-results-alphabetically-on-facetnames-tp3743471p3743471.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re:how to monitor solr in newrelic
Try this when you start SOLR java -javaagent:/NEWRELICPATH/newrelic.jar -jar start.jar Normally you will see your SOLR installation on your newrelic dashboard in 2 minutes. -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-monitor-solr-in-newrelic-tp3739567p3743488.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sort my results alphabetically on facetnames
Hi! On 14.02.2012 13:09, PeterKerk wrote: I want to sort my results on the facetnames (not by their number of results). From the example you gave, I'd assume you don't want to sort by facet names but by facet values. Simply add facet.sort=index to your request; see http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort Or simply sort the facet result on your own. Greetings, Kuli
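If you'd rather sort the facet result on your own, a minimal client-side sketch (the facet values and counts below are the ones from the question, hard-coded for illustration):

```python
# Sort facet (value, count) pairs alphabetically by value instead of
# by count, which is what facet.sort=index does on the Solr side.
facets = [
    ("Instelling voor auditief gehandicapten", 16),
    ("Audiologisch centrum", 13),
    ("Huisartsenpraktijk", 13),
    ("Instelling voor lichamelijk gehandicapten", 13),
    ("Ambulancezorg", 12),
    ("Beroepsorganisatie", 12),
]
facets.sort(key=lambda pair: pair[0].lower())  # case-insensitive sort on the name
for name, count in facets:
    print(f"{name} ({count})")
```

Note that facet.sort=index sorts by the raw indexed term, so if case matters you may still want to normalize on the client as above.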
Re: Highlighting stopwords
O. Klein wrote O. Klein wrote Hmm, now the synonyms aren't highlighted anymore. OK, back to basics (I'm using trunk and FVH). What is the way to go if I want to search on a field without stopwords, but still want to highlight the stopwords (and still highlight synonyms and stemmed words)? I made a new field content_hl to prevent problems coming from copyField. When using hl.q=content_hl:(spell Check) I now get highlighting including stopwords, but when using hl.q=content_hl:(SC) where SC is a synonym I get no highlighting. Can you verify whether synonyms work when using hl.q? OK, I got it working by using hl.q=content_hl:(spell Check) content_text:(spell Check) but it makes no sense to me. The only difference between the 2 fields is the use of stopwords. What's also weird is that a query like hl.q=content_spell:(SC) also highlights synonyms, even though this field has no synonyms. I have not been able to find any logic in the behavior of hl.q and how it analyses the query. Could you explain how it is supposed to work? -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-stopwords-tp3681901p3743616.html Sent from the Solr - User mailing list archive at Nabble.com.
'foruns' doesn't match 'forum' with NGramFilterFactory (or EdgeNGramFilterFactory)
Hello all, I'm experimenting with NGramFilterFactory and EdgeNGramFilterFactory. Both of them show a match in my Solr admin analysis, but when I query 'foruns' it doesn't find any 'forum'. analysis: http://bhakta.casadomato.org:8982/solr/admin/analysis.jsp?nt=type&name=text&verbose=on&highlight=on&val=f%C3%B3runs&qverbose=on&qval=f%C3%B3runs search: http://bhakta.casadomato.org:8982/solr/select/?q=foruns&version=2.2&start=0&rows=10&indent=on Does anybody know what the problem is? bráulio
Stemming and accents (HunspellStemFilterFactory)
Hello all, I'm evaluating the HunspellStemFilterFactory and found that it works with a pt_PT dictionary. For example, if I search for 'fóruns' it stems it to 'fórum' and then finds 'fórum' references. But if I search for 'foruns' (without the accent), then HunspellStemFilterFactory cannot stem the word, as it does not exist in its dictionary. Is there any way to make HunspellStemFilterFactory work regardless of accent differences? best, bráulio
Re: SolrCloud Replication Question
Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS:

./bootstrap.sh
./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc)
./slice1
  - start.sh
  - solr.xml
  - slice1_shard1
    - data
  - slice2_shard2
    - data
./slice2
  - start.sh
  - solr.xml
  - slice2_shard1
    - data
  - slice1_shard2
    - data

If it matters, I'm running everything from localhost, zk and the solr shards. On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have a unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti:
Debugging on 3.5
I did find a solution, but the output is horrible. Why does explain look so bad?

<lst name="explain"><str name="2H7DF">6.351252 = (MATCH) boost(*:*,query(specialties_ids: #1;#0;#0;#0;#0;#0;#0;#0;#0; ,def=0.0)), product of: 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm 6.351252 = query(specialties_ids: #1;#0;#0;#0;#0;#0;#0;#0;#0; ,def=0.0)=6.351252</str>

defType=edismax&boost=query($param)&param=multi_field:87

-- We like the boost parameter in Solr 3.5 with eDismax. The question we have is that we would like to replace bq with boost, but we get the multi-valued field issue when we try to do this. Bill Bell Sent from mobile
Re: Solr binary response for C#?
It's not as compact as binary format, but would just using something like JSON help enough? This is really simple, just specify wt=json (there's a method to set this on the server, at least in Java). Otherwise, you might get a more knowledgeable response on the C# java list, I'm frankly clueless Best Erick On Mon, Feb 13, 2012 at 1:15 PM, naptowndev naptowndev...@gmail.com wrote: Admittedly I'm new to this, but the project we're working on feeds results from Solr to an ASP.net application. Currently we are using XML, but our payloads can be rather large, some up to 17MB. We are looking for a way to minimize that payload and increase performance and I'm curious if there's anything anyone has been working out that creates a binary response that can be read by C# (similar to the javabin response built into Solr). That, or if anyone has experience implementing an external protocol like Thrift with Solr and consuming it with C# - again all in the effort to increase performance across the wire and while being consumed. Any help and direction would be greatly appreciated! Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-binary-response-for-C-tp3741101p3741101.html Sent from the Solr - User mailing list archive at Nabble.com.
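A minimal sketch of the "just specify wt=json" suggestion, building the request URL by hand (the host and handler path are placeholders, not from the thread; any HTTP client in C# would do the same):

```python
from urllib.parse import urlencode

# Ask Solr for a JSON response instead of XML by adding wt=json
# to the select request.
params = {"q": "*:*", "rows": 10, "wt": "json"}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```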
Mmap
Does someone have an example of using unmap in 3.5, and chunksize? I am using Solr 3.5. I noticed in solrconfig.xml:

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

I don't see this parameter taking effect when I set -Dsolr.directoryFactory=solr.MMapDirectoryFactory. How do I see the setting in the log or in stats.jsp? I cannot find a place that indicates whether it is set or not. I would assume StandardDirectoryFactory is being used, but I see the same thing whether I set it or not. Bill Bell Sent from mobile
Re: Improving performance for SOLR geo queries?
Can we get this back ported to 3x? Bill Bell Sent from mobile On Feb 14, 2012, at 3:45 AM, Matthias Käppler matth...@qype.com wrote: hey thanks all for the suggestions, didn't have time to look into them yet as we're feature-sprinting for MWC, but will report back with some feedback over the next weeks (we will have a few more performance sprints in March) Best, Matthias On Mon, Feb 13, 2012 at 2:32 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Feb 9, 2012 at 1:46 PM, Yonik Seeley yo...@lucidimagination.com wrote: One way to speed up numeric range queries (at the cost of increased index size) is to lower the precisionStep. You could try changing this from 8 to 4 and then re-indexing to see how that affects your query speed. Your issue, and the fact that I had been looking at the post-filtering code again for another client, reminded me that I had been planning on implementing post-filtering for spatial. It's now checked into trunk. If you have the ability to use trunk, you can add a high cost (like cost=200) along with cache=false to trigger it. More details here: http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/ -Yonik lucidimagination.com
Re: Highlighting stopwords
(12/02/14 22:25), O. Klein wrote: I have not been able to find any logic in the behavior of hl.q and how it analyses the query. Could you explain how it is supposed to work? Nothing special about hl.q. If you use hl.q, its value will be used for highlighting rather than the value of q. There are no tricks, I think. When using hl.q=content_hl:(spell Check) I now get highlighting including stopwords, but when using hl.q=content_hl:(SC) where SC is a synonym I get no highlighting. Can you verify whether synonyms work when using hl.q? : OK, I got it working by using hl.q=content_hl:(spell Check) content_text:(spell Check) but it makes no sense to me. The only difference between the 2 fields is the use of stopwords. Uh, what you tried was changing the field between q and hl.q; that is not a use case I expected when I proposed hl.q. Do you think that hl.text meets your needs? https://issues.apache.org/jira/browse/SOLR-1926?focusedCommentId=12871234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12871234 koji -- Apache Solr Query Log Visualizer http://soleami.com/
Re: Stemming and accents (HunspellStemFilterFactory)
Hi Bráulio, I don't know about HunspellStemFilterFactory specifically, but concerning accents: there are several accent filters that will remove accents from your tokens. If the Hunspell filter factory requires the accents, then simply add the accent filter after Hunspell in your index and query filter chains. You would then have Hunspell produce the tokens as the result of the stemming, and only afterwards would the accents be removed (your example: 'forum' instead of 'fórum'). Do the same on the query side in case someone inputs accents. Accent filters are: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory (lowercases, as well!) http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory and others on that page. Chantal On Tue, 2012-02-14 at 14:48 +0100, Bráulio Bhavamitra wrote: Hello all, I'm evaluating the HunspellStemFilterFactory and found that it works with a pt_PT dictionary. For example, if I search for 'fóruns' it stems it to 'fórum' and then finds 'fórum' references. But if I search for 'foruns' (without the accent), then HunspellStemFilterFactory cannot stem the word, as it does not exist in its dictionary. Is there any way to make HunspellStemFilterFactory work regardless of accent differences? best, bráulio
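A sketch in schema.xml of the chain Chantal describes (the field type name, tokenizer choice and dictionary file names here are assumptions, not from the thread; the same filter order would go in the query-side analyzer):

```
<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stem first, while the accents the Hunspell dictionary expects are still present -->
    <filter class="solr.HunspellStemFilterFactory" dictionary="pt_PT.dic" affix="pt_PT.aff" ignoreCase="true"/>
    <!-- then fold accents away, so 'fórum' and 'forum' both end up as 'forum' -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

The key point is the order: the folding filter sits after the stemmer, so accented input still stems, while unaccented and accented forms collapse to the same indexed token.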
Re: Highlighting stopwords
Koji Sekiguchi wrote Uh, what you tried was changing the field between q and hl.q; that is not a use case I expected when I proposed hl.q. Do you think that hl.text meets your needs? https://issues.apache.org/jira/browse/SOLR-1926?focusedCommentId=12871234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12871234 koji -- Apache Solr Query Log Visualizer http://soleami.com/ Well, if I understand it correctly, yes, if this means that queries are analyzed like the field they are highlighting. That would give the highlighter a lot more flexibility. -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-stopwords-tp3681901p3744054.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud Replication Question
Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com
Re: SolrJ + SolrCloud
No hard plans around that at the moment, but when I free up some time I plan on looking at the JIRA issue I pointed to. Looks like a lot of the work may already be done. - mark On Feb 12, 2012, at 8:14 AM, Darren Govoni wrote: Thanks Mark. Is there any plan to make all the Solr search handlers work with SolrCloud, like MLT? That missing feature would prohibit us from using SolrCloud at the moment. :( On Sat, 2012-02-11 at 18:24 -0500, Mark Miller wrote: On Feb 11, 2012, at 6:02 PM, Darren Govoni wrote: Hi, Do all the normal facilities of Solr work with SolrCloud from SolrJ? Things like /mlt, /cluster, facets, tvf's, etc. Darren SolrJ works the same in SolrCloud mode as it does in non-SolrCloud mode - it's fully supported. There is even a new SolrJ client called CloudSolrServer that has built-in cluster awareness and load balancing. In terms of what is supported - anything that is supported with distributed search - that is most things, but there is the odd man out - like MLT - looks like an issue is open here: https://issues.apache.org/jira/browse/SOLR-788 but it's not resolved yet. - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com
Re: OR-FilterQuery
On Mon, Feb 13, 2012 at 11:17 PM, spr...@gmx.eu wrote: Hi, how efficient is such a query: q=some text fq=id:(1 OR 2 OR 3...) Should I better use q:some text AND id:(1 OR 2 OR 3...)? 1. These two options have different scoring. 2. If you hit the same fq=id:(1 OR 2 OR 3...) many times, you have a benefit due to reading the docset from the heap instead of searching on disk. Is the Filter Cache used for the OR'ed fq? The filter cache is used for whatever filter. I guess I didn't get you. Can you rephrase your question? Thank you -- Sincerely yours Mikhail Khludnev Lucid Certified Apache Lucene/Solr Developer Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Need help with graphing function (MATH)
Thanks, I'll have a look at this. I should have mentioned that the actual values on the graph aren't important; rather, I was showing an example of how the function should behave. On 2/13/12 6:25 PM, Kent Fitch wrote: Hi, assuming you have x and want to generate y, then maybe - if x < 50, y = 150 - if x > 175, y = 60 - otherwise: either y = (100/(e^((x-50)/75)^2)) + 50 http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175 - or maybe y = sin((x+5)/38)*42+105 http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175 Regards, Kent Fitch On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.com wrote: I need some help with one of my boost functions. I would like the function to look something like the following mockup below. Starts off flat, then there is a gradual decline, steep decline, then gradual decline, and then back to flat. Can some of you math guys please help :) Thanks.
Re: OR-FilterQuery
bq: Is the Filter Cache used for the OR'ed fq? The filter cache is actually pretty simple conceptually. It's just a map where the key is the fq and the value is the set of documents that satisfy that fq (we'll skip the implementation here, just think of it as the list of all the docs that the fq selects). Solr doesn't attempt to do much with the key, just think of it as a single string. Whether or not an fq is reused from the cache depends upon whether the key is in the map. So fq=id:(1 OR 2 OR 3) will just look to see if id:(1 OR 2 OR 3) is a key. If so, it'll just use the document list stored in the cache. It won't match id:(1 OR 2) or id:(2) or id:1 OR id:2 OR id:3 In other words, there's no attempt to decompose the fq clause and store parts of it in the cache, it's exact-match or nothing. Hope that helps Erick On Mon, Feb 13, 2012 at 2:17 PM, spr...@gmx.eu wrote: Hi, how efficent is such an query: q=some text fq=id:(1 OR 2 OR 3...) Should I better use q:some text AND id:(1 OR 2 OR 3...)? Is the Filter Cache used for the OR'ed fq? Thank you
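Erick's exact-match behaviour can be sketched as a toy model (this simulates the keying with plain strings for illustration; the real filterCache keys on the parsed query object, but the no-decomposition point is the same):

```python
# Toy filterCache: one entry per distinct fq, with no attempt to
# decompose or normalize the clauses inside it.
filter_cache = {}

def cached_docset(fq, evaluate):
    if fq not in filter_cache:           # exact match on the whole fq, or nothing
        filter_cache[fq] = evaluate(fq)  # miss: run the filter and store the docset
    return filter_cache[fq]

evaluate = lambda fq: frozenset({1, 2, 3})  # stand-in for actually running the filter

cached_docset("id:(1 OR 2 OR 3)", evaluate)
cached_docset("id:(1 OR 2 OR 3)", evaluate)      # hit: docset reused
cached_docset("id:1 OR id:2 OR id:3", evaluate)  # logically the same filter, but a miss
print(len(filter_cache))
```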
Re: Need help with graphing function (MATH)
On 14 February 2012 23:35, Mark static.void@gmail.com wrote: Thanks, I'll have a look at this. I should have mentioned that the actual values on the graph aren't important; rather, I was showing an example of how the function should behave. [...] either y = (100/(e^((x-50)/75)^2)) + 50 [...] In general, the exponential will be better behaved than the sinusoid. You can change the exact values by tweaking the coefficients in the equation. Regards, Gora
Re: Re: Solr 3.5 not starting on CentOS 6 or RHEL 5
Nope, I don't have a custom /tmp mount in fstab, I just have a basic CentOS 6 install for development and testing... Full everyone read/write permissions are in place on /tmp too. Is /tmp a separate file system? There are problems with people mounting /tmp with 'noexec' as a security precaution, which then causes Solr to fail. Russ Bernhardt Systems Office Dudley Knox Library, Naval Postgraduate School Monterey, CA
Re: OR-FilterQuery
Hi Em, I briefly read the thread. Are you talking about combining cached clauses of a BooleanQuery, instead of evaluating the whole BQ as a filter? I found something like that in the API (but only in the API) http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean) Did I get you right? Why do you need it, btw? If I did... I have an idea how to do it in two mins: q=+f:text +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)... The right leg will be a BooleanQuery with SHOULD clauses backed by cached queries (see below). If you are not scared by the syntax yet, you can implement a trivial fqQParserPlugin, which will be just: // lazily through User/Generic Cache q = new FilteredQuery(new MatchAllDocsQuery(), new CachingWrapperFilter(new QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V))))); return q; It will use a per-segment bitset, in contrast to Solr's fq which caches for the top-level reader. WDYT? On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de wrote: Hi, have a look at: http://search-lucene.com/m/Z8lWGEiKoI I think not much has changed since then. Regards, Em Am 13.02.2012 20:17, schrieb spr...@gmx.eu: Hi, how efficient is such a query: q=some text fq=id:(1 OR 2 OR 3...) Should I better use q:some text AND id:(1 OR 2 OR 3...)? Is the Filter Cache used for the OR'ed fq? Thank you -- Sincerely yours Mikhail Khludnev Lucid Certified Apache Lucene/Solr Developer Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Need help with graphing function (MATH)
In general this kind of function is very easy to construct using sums of basic sigmoidal functions. The logistic and probit functions are commonly used for this. Sent from my iPhone On Feb 14, 2012, at 10:05, Mark static.void@gmail.com wrote: Thanks, I'll have a look at this. I should have mentioned that the actual values on the graph aren't important; rather, I was showing an example of how the function should behave. On 2/13/12 6:25 PM, Kent Fitch wrote: Hi, assuming you have x and want to generate y, then maybe - if x < 50, y = 150 - if x > 175, y = 60 - otherwise: either y = (100/(e^((x-50)/75)^2)) + 50 http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175 - or maybe y = sin((x+5)/38)*42+105 http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175 Regards, Kent Fitch On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.com wrote: I need some help with one of my boost functions. I would like the function to look something like the following mockup below. Starts off flat, then there is a gradual decline, steep decline, then gradual decline, and then back to flat. Can some of you math guys please help :) Thanks.
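A minimal sketch of the logistic-function approach (all constants below, the 150/60 plateaus, the centre and the steepness, are illustrative numbers chosen to match the earlier piecewise description, not values from the thread):

```python
import math

def boost(x, hi=150.0, lo=60.0, centre=112.5, steepness=0.1):
    """Logistic S-curve: flat near `hi` for small x, flat near `lo`
    for large x, with a smooth decline in between."""
    return lo + (hi - lo) / (1.0 + math.exp(steepness * (x - centre)))

# Flat at ~150 before x=50, flat at ~60 after x=175, declining between.
for x in (50, 90, 112.5, 135, 175):
    print(x, round(boost(x), 1))
```

Larger `steepness` sharpens the middle decline; shifting `centre` moves where it happens. The same shape can be expressed as a Solr function query once the constants are settled.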
Re: Need help with graphing function (MATH)
Would you mind throwing out an example of these types of functions? Looking at Wikipedia (http://en.wikipedia.org/wiki/Probit) it seems like the probit function is very similar to what I want. Thanks On 2/14/12 10:56 AM, Ted Dunning wrote: In general this kind of function is very easy to construct using sums of basic sigmoidal functions. The logistic and probit functions are commonly used for this. Sent from my iPhone On Feb 14, 2012, at 10:05, Mark static.void@gmail.com wrote: Thanks, I'll have a look at this. I should have mentioned that the actual values on the graph aren't important; rather, I was showing an example of how the function should behave. On 2/13/12 6:25 PM, Kent Fitch wrote: Hi, assuming you have x and want to generate y, then maybe - if x < 50, y = 150 - if x > 175, y = 60 - otherwise: either y = (100/(e^((x-50)/75)^2)) + 50 http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175 - or maybe y = sin((x+5)/38)*42+105 http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175 Regards, Kent Fitch On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.com wrote: I need some help with one of my boost functions. I would like the function to look something like the following mockup below. Starts off flat, then there is a gradual decline, steep decline, then gradual decline, and then back to flat. Can some of you math guys please help :) Thanks.
Re: Need help with graphing function (MATH)
Or better yet, an example in Solr would be best :) Thanks! On 2/14/12 11:05 AM, Mark wrote: Would you mind throwing out an example of these types of functions? Looking at Wikipedia (http://en.wikipedia.org/wiki/Probit) it seems like the probit function is very similar to what I want. Thanks On 2/14/12 10:56 AM, Ted Dunning wrote: In general this kind of function is very easy to construct using sums of basic sigmoidal functions. The logistic and probit functions are commonly used for this. Sent from my iPhone On Feb 14, 2012, at 10:05, Mark static.void@gmail.com wrote: Thanks, I'll have a look at this. I should have mentioned that the actual values on the graph aren't important; rather, I was showing an example of how the function should behave. On 2/13/12 6:25 PM, Kent Fitch wrote: Hi, assuming you have x and want to generate y, then maybe - if x < 50, y = 150 - if x > 175, y = 60 - otherwise: either y = (100/(e^((x-50)/75)^2)) + 50 http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175 - or maybe y = sin((x+5)/38)*42+105 http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175 Regards, Kent Fitch On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.com wrote: I need some help with one of my boost functions. I would like the function to look something like the following mockup below. Starts off flat, then there is a gradual decline, steep decline, then gradual decline, and then back to flat. Can some of you math guys please help :) Thanks.
Re: OR-FilterQuery
Hi Mikhail, thanks for kicking in some brainstorming-code! The given thread is almost a year old and I was working with Solr in my freetime to see where it fails to behave/perform as I expect/wish. I found out that if you got a lot of different access-patterns for a filter-query, you might end up with either a big cache to make things fast or with lower performance (impact depends on usecase and circumstances). Scenario: You got a permission-field and the client is able to filter by one to three permission-values. That is: fq=foo:user fq=foo:moderator fq=foo:manager If you can not control/guarantee the order of the fq's values, you could end up with a lot of mess which all returns the same. Example: fq=permission:user OR permission:moderator OR permission:manager fq=permission:user OR permission:manager OR permission:moderator fq=permission:moderator OR permission:user OR permission:manager ... They all return the same but where cached seperately which leads to the fact that you are wasting memory a lot. Furthermore, if your access pattern will lead to a lot of different fq's on a small set of distinct values, it may make more sense to cache each filter-query for itself from a memory-consuming point of view (may cost a little bit performance). That beeing said, if you cache a filter for foo:user, foo:moderator and foo:manager you can combine those filters with AND, OR, NOT or whatever without recomputing every filter over and over again which would be the case if your filter-cache is not large enough. However, I never compared the performance differences (in terms of speed) of a cached filter-query like foo:bar OR foo:baz With a combination of two cached filter-queries like foo:bar foo:baz combined by a logical OR. That's how the background looks like. Unfortunately I didn't had the time to implement this in the past. Back to your post: Looks like a cool idea and is almost what I had in mind! 
I would formulate an easier syntax so that one is able to parse each fq-clause on its own and cache its CachingWrapperFilter for reuse. it will use a per-segment bitset in contrast to Solr's fq which caches for the top-level reader. Could you explain why this bitset would be per-segment based, please? I don't see a reason why this *has* to be so. What is the benefit you are seeing? Kind regards, Em Am 14.02.2012 19:33, schrieb Mikhail Khludnev: Hi Em, I briefly read the thread. Are you talking about combining cached clauses of a BooleanQuery, instead of evaluating the whole BQ as a filter? I found something like that in the API (but only in the API) http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean) Am I getting you right? Why do you need it, btw? If I am .. I have an idea how to do it in two mins: q=+f:text +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)... The right leg will be a BooleanQuery with SHOULD clauses backed by cached queries (see below). If you are not scared by the syntax yet, you can implement a trivial fqQParserPlugin, which will be just // lazily through User/Generic Cache q = new FilteredQuery(new MatchAllDocsQuery(), new CachingWrapperFilter(new QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V))))); return q; it will use a per-segment bitset in contrast to Solr's fq, which caches for the top-level reader. WDYT? On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de wrote: Hi, have a look at: http://search-lucene.com/m/Z8lWGEiKoI I think not much has changed since then. Regards, Em Am 13.02.2012 20:17, schrieb spr...@gmx.eu: Hi, how efficient is such a query: q=some text fq=id:(1 OR 2 OR 3...) Should I better use q=some text AND id:(1 OR 2 OR 3...)? Is the Filter Cache used for the OR'ed fq? Thank you
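Mikhail's inline snippet lost some closing parentheses in transit. A more complete sketch of the idea he describes (a trivial QParserPlugin whose parser wraps the sub-query in a CachingWrapperFilter, which caches one bitset per segment) might look roughly like the following. This is written against the Solr/Lucene 3.x-era API of the thread and is untested; the class name and the caching strategy are assumptions:

```java
// Untested sketch, Solr/Lucene 3.x-era API. Parses the local-params value
// as a sub-query and returns it as a filtered MatchAllDocsQuery backed by
// a CachingWrapperFilter (one cached bitset per index segment).
public class FqQParserPlugin extends QParserPlugin {
  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws ParseException {
        Query sub = subQuery(localParams.get(QueryParsing.V), null).getQuery();
        // A real plugin should keep the CachingWrapperFilter instances in a
        // cache (e.g. a Solr generic/user cache) keyed by the sub-query, so
        // repeated clauses reuse the same cached bitsets.
        return new FilteredQuery(new MatchAllDocsQuery(),
            new CachingWrapperFilter(new QueryWrapperFilter(sub)));
      }
    };
  }
}
```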
Re: Need help with graphing function (MATH)
Hi Mark, did you already have a look at http://wiki.apache.org/solr/FunctionQuery ? Regards, Em Am 14.02.2012 20:09, schrieb Mark: Or better yet an example in solr would be best :) Thanks! On 2/14/12 11:05 AM, Mark wrote: Would you mind throwing out an example of these types of functions. Looking at Wikipedia (http://en.wikipedia.org/wiki/Probit) it seems like the Probit function is very similar to what I want. Thanks On 2/14/12 10:56 AM, Ted Dunning wrote: In general this kind of function is very easy to construct using sums of basic sigmoidal functions. The logistic and probit functions are commonly used for this. Sent from my iPhone On Feb 14, 2012, at 10:05, Mark static.void@gmail.com wrote: Thanks I'll have a look at this. I should have mentioned that the actual values on the graph aren't important rather I was showing an example of how the function should behave. On 2/13/12 6:25 PM, Kent Fitch wrote: Hi, assuming you have x and want to generate y, then maybe - if x < 50, y = 150 - if x > 175, y = 60 - otherwise : either y = (100/(e^((x-50)/75)^2)) + 50 http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175 - or maybe y = sin((x+5)/38)*42+105 http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175 Regards, Kent Fitch On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.com wrote: I need some help with one of my boost functions. I would like the function to look something like the following mockup below. Starts off flat then there is a gradual decline, steep decline then gradual decline and then back to flat. Can some of you math guys please help :) Thanks.
Re: Need help with graphing function (MATH)
In practice, I expect a linear piecewise function (with sharp corners) would be indistinguishable from the smoothed function. It is also much easier to read, test, and debug. It might even be faster. Try the sharp corners one first. wunder On Feb 14, 2012, at 10:56 AM, Ted Dunning wrote: In general this kind of function is very easy to construct using sums of basic sigmoidal functions. The logistic and probit functions are commonly used for this. Sent from my iPhone On Feb 14, 2012, at 10:05, Mark static.void@gmail.com wrote: Thanks I'll have a look at this. I should have mentioned that the actual values on the graph aren't important rather I was showing an example of how the function should behave. On 2/13/12 6:25 PM, Kent Fitch wrote: Hi, assuming you have x and want to generate y, then maybe - if x < 50, y = 150 - if x > 175, y = 60 - otherwise : either y = (100/(e^((x-50)/75)^2)) + 50 http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175 - or maybe y = sin((x+5)/38)*42+105 http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175 Regards, Kent Fitch On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.com wrote: I need some help with one of my boost functions. I would like the function to look something like the following mockup below. Starts off flat then there is a gradual decline, steep decline then gradual decline and then back to flat. Can some of you math guys please help :) Thanks.
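For reference, the piecewise-linear version is only a few lines of code. The breakpoints below are guesses read off the thread's description (flat around 150 until x=50, declining to 60 by x=175, flat afterwards); the interior points are invented and would be tuned to taste:

```java
public class PiecewiseBoost {
    // Breakpoints are illustrative: flat at 150 below x=50, then a gradual,
    // steep, gradual decline down to 60 at x=175, flat afterwards.
    static final double[] XS = {50, 90, 135, 175};
    static final double[] YS = {150, 130, 75, 60};

    static double boost(double x) {
        if (x <= XS[0]) return YS[0];
        if (x >= XS[XS.length - 1]) return YS[YS.length - 1];
        int i = 1;
        while (x > XS[i]) i++;
        // Linear interpolation between the two surrounding breakpoints.
        double t = (x - XS[i - 1]) / (XS[i] - XS[i - 1]);
        return YS[i - 1] + t * (YS[i] - YS[i - 1]);
    }

    public static void main(String[] args) {
        for (double x : new double[] {0, 50, 90, 135, 175, 200}) {
            System.out.println(x + " -> " + boost(x));
        }
    }
}
```

Inside Solr itself, the same shape could likely be approximated by combining map() function queries rather than custom code, at some cost in readability.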
Re: Solr 3.5 not starting on CentOS 6 or RHEL 5
Perhaps this is some kind of vufind specific issue? The server (/example) bundled with solr unpacks the war in /example/work and not /tmp -Yonik lucidimagination.com On Mon, Feb 13, 2012 at 7:06 PM, Bernhardt, Russell (CIV) rgber...@nps.edu wrote: A software package we use recently upgraded to Solr 3.5 (from 1.4.1) and now we're having problems getting the Solr server to start up under RHEL 5 or CentOS 6. I upgraded our local install of Java to the latest from Oracle and it didn't help, even removed the local OpenJDK just to be sure. When starting jetty manually (with java -jar start.jar) I get the following messages: 2012-02-13 07:52:55.954::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2012-02-13 07:52:56.120::INFO: jetty-6.1.11 2012-02-13 07:52:56.184::INFO: Extract jar:file:/opt/vufind/solr/jetty/webapps/solr.war!/ to /tmp/Jetty_0_0_0_0_8080_solr.war__solr__7k9npr/webapp 2012-02-13 07:52:56.702::WARN: Failed startup of context org.mortbay.jetty.webapp.WebAppContext@15aaf0b3{/solr,jar:file:/opt/vufind/solr/jetty/webapps/solr.war!/} java.util.zip.ZipException: error in opening zip file at java.util.zip.ZipFile.open(Native Method) at java.util.zip.ZipFile.<init>(Unknown Source) at java.util.jar.JarFile.<init>(Unknown Source) at java.util.jar.JarFile.<init>(Unknown Source) at org.mortbay.jetty.webapp.TagLibConfiguration.configureWebApp(TagLibConfiguration.java:168) at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1217) at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:513) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) at org.mortbay.jetty.Server.doStart(Server.java:222) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:977) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.mortbay.start.Main.invokeMain(Main.java:194) at org.mortbay.start.Main.start(Main.java:512) at org.mortbay.start.Main.main(Main.java:119) 2012-02-13 07:52:56.713::INFO: Opened /opt/vufind/solr/jetty/logs/2012_02_13.request.log 2012-02-13 07:52:56.740::INFO: Started SelectChannelConnector@0.0.0.0:8080 Jetty starts up just fine but shows a 503 error when attempting to access localhost:8080/solr/. The temp directory structure does exist in /tmp/. Any ideas? Thanks, Russ Bernhardt Systems Analyst Library Information Systems Naval Postgraduate School, Monterey CA
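A common cause of this symptom on RHEL/CentOS (an educated guess consistent with the trace above, not a verified diagnosis for this install): tmpwatch periodically deletes old files under /tmp, leaving a half-empty exploded webapp that then fails with a ZipException. Following Yonik's hint, keeping the extraction directory out of /tmp avoids it; the paths below are taken from the stack trace and are illustrative:

```sh
# Jetty 6 prefers a ./work directory next to start.jar over java.io.tmpdir,
# so creating one stops the war from being unpacked under /tmp:
mkdir /opt/vufind/solr/jetty/work

# Alternatively, redirect the JVM temp dir explicitly:
java -Djava.io.tmpdir=/opt/vufind/solr/jetty/tmp -jar start.jar
```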
Re: SolrCloud Replication Question
Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Sounds good, if I pull the latest from trunk and rerun will that be useful or were you able to duplicate my issue now? On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote: Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: OR-FilterQuery
Whoa! fq=id:(1 OR 2) is not the same thing at all as fq=id:1&fq=id:2 Assuming that any document had one and only one ID, the second clause would return exactly 0 documents, each and every time. Multiple fq clauses are essentially set intersections. So the first query is the set of all documents where id is 1 or 2; the second is the intersection of two sets of documents, one set with an id of 1 and one with an id of 2. Not the same thing at all. There's no support for the concept of (fq=id:1 OR fq=id:2) Best Erick On Tue, Feb 14, 2012 at 2:13 PM, Em mailformailingli...@yahoo.de wrote: Hi Mikhail, thanks for kicking in some brainstorming-code! The given thread is almost a year old and I was working with Solr in my freetime to see where it fails to behave/perform as I expect/wish. I found out that if you have a lot of different access-patterns for a filter-query, you might end up with either a big cache to make things fast or with lower performance (the impact depends on use case and circumstances). Scenario: You have a permission-field and the client is able to filter by one to three permission-values. That is: fq=foo:user fq=foo:moderator fq=foo:manager If you cannot control/guarantee the order of the fq's values, you could end up with a lot of variants which all return the same. Example: fq=permission:user OR permission:moderator OR permission:manager fq=permission:user OR permission:manager OR permission:moderator fq=permission:moderator OR permission:user OR permission:manager ... They all return the same but are cached separately, which means you are wasting a lot of memory. Furthermore, if your access pattern will lead to a lot of different fq's on a small set of distinct values, it may make more sense to cache each filter-query for itself from a memory-consumption point of view (it may cost a little bit of performance). 
That beeing said, if you cache a filter for foo:user, foo:moderator and foo:manager you can combine those filters with AND, OR, NOT or whatever without recomputing every filter over and over again which would be the case if your filter-cache is not large enough. However, I never compared the performance differences (in terms of speed) of a cached filter-query like foo:bar OR foo:baz With a combination of two cached filter-queries like foo:bar foo:baz combined by a logical OR. That's how the background looks like. Unfortunately I didn't had the time to implement this in the past. Back to your post: Looks like a cool idea and is almost what I had in mind! I would formulate an easier syntax so that one is able to parse each fq-clause on its own to cache the CachingWrapperFilter to reuse it again. it will use per segment bitset at contrast to Solr's fq which caches for top level reader. Could you explain why this bitset would be per-segment based, please? I don't see a reason why this *have* to be so. What is the benefit you are seeing? Kind regards, Em Am 14.02.2012 19:33, schrieb Mikhail Khludnev: Hi Em, I briefly read the thread. Are you talking about combing of cached clauses of BooleanQuery, instead of evaluating whole BQ as a filter? I found something like that in API (but only in API) http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean) Am I get you right? Why do you need it, btw? If I'm .. I have idea how to do it in two mins: q=+f:text +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)... Right leg will be a BooleanQuery with SHOULD clauses backed on cached queries (see below). 
if you are not scarred by the syntax yet you can implement trivial fqQParserPlugin, which will be just // lazily through User/Generic Cache q = new FilteredQuery (new MatchAllDocsQuery(), new CachingWrapperFilter(new QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V); return q; it will use per segment bitset at contrast to Solr's fq which caches for top level reader. WDYT? On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de wrote: Hi, have a look at: http://search-lucene.com/m/Z8lWGEiKoI I think not much had changed since then. Regards, Em Am 13.02.2012 20:17, schrieb spr...@gmx.eu: Hi, how efficent is such an query: q=some text fq=id:(1 OR 2 OR 3...) Should I better use q:some text AND id:(1 OR 2 OR 3...)? Is the Filter Cache used for the OR'ed fq? Thank you
Re: OR-FilterQuery
BTW, you're not the first person who would like this capability, see: https://issues.apache.org/jira/browse/SOLR-1223 But the fact that this JIRA was originally opened in June of 2009 and hasn't been implemented yet indicates that it's not super-high priority. Best Erick On Tue, Feb 14, 2012 at 4:33 PM, Erick Erickson erickerick...@gmail.com wrote: Whoa! fq=id:(1 OR 2) is not the same thing at all as fq=id:1&fq=id:2 Assuming that any document had one and only one ID, the second clause would return exactly 0 documents, each and every time. Multiple fq clauses are essentially set intersections. So the first query is the set of all documents where id is 1 or 2; the second is the intersection of two sets of documents, one set with an id of 1 and one with an id of 2. Not the same thing at all. There's no support for the concept of (fq=id:1 OR fq=id:2) Best Erick On Tue, Feb 14, 2012 at 2:13 PM, Em mailformailingli...@yahoo.de wrote: Hi Mikhail, thanks for kicking in some brainstorming-code! The given thread is almost a year old and I was working with Solr in my freetime to see where it fails to behave/perform as I expect/wish. I found out that if you have a lot of different access-patterns for a filter-query, you might end up with either a big cache to make things fast or with lower performance (the impact depends on use case and circumstances). Scenario: You have a permission-field and the client is able to filter by one to three permission-values. That is: fq=foo:user fq=foo:moderator fq=foo:manager If you cannot control/guarantee the order of the fq's values, you could end up with a lot of variants which all return the same. Example: fq=permission:user OR permission:moderator OR permission:manager fq=permission:user OR permission:manager OR permission:moderator fq=permission:moderator OR permission:user OR permission:manager ... They all return the same but are cached separately, which means you are wasting a lot of memory. 
Furthermore, if your access pattern will lead to a lot of different fq's on a small set of distinct values, it may make more sense to cache each filter-query for itself from a memory-consuming point of view (may cost a little bit performance). That beeing said, if you cache a filter for foo:user, foo:moderator and foo:manager you can combine those filters with AND, OR, NOT or whatever without recomputing every filter over and over again which would be the case if your filter-cache is not large enough. However, I never compared the performance differences (in terms of speed) of a cached filter-query like foo:bar OR foo:baz With a combination of two cached filter-queries like foo:bar foo:baz combined by a logical OR. That's how the background looks like. Unfortunately I didn't had the time to implement this in the past. Back to your post: Looks like a cool idea and is almost what I had in mind! I would formulate an easier syntax so that one is able to parse each fq-clause on its own to cache the CachingWrapperFilter to reuse it again. it will use per segment bitset at contrast to Solr's fq which caches for top level reader. Could you explain why this bitset would be per-segment based, please? I don't see a reason why this *have* to be so. What is the benefit you are seeing? Kind regards, Em Am 14.02.2012 19:33, schrieb Mikhail Khludnev: Hi Em, I briefly read the thread. Are you talking about combing of cached clauses of BooleanQuery, instead of evaluating whole BQ as a filter? I found something like that in API (but only in API) http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean) Am I get you right? Why do you need it, btw? If I'm .. I have idea how to do it in two mins: q=+f:text +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)... Right leg will be a BooleanQuery with SHOULD clauses backed on cached queries (see below). 
if you are not scarred by the syntax yet you can implement trivial fqQParserPlugin, which will be just // lazily through User/Generic Cache q = new FilteredQuery (new MatchAllDocsQuery(), new CachingWrapperFilter(new QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V); return q; it will use per segment bitset at contrast to Solr's fq which caches for top level reader. WDYT? On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de wrote: Hi, have a look at: http://search-lucene.com/m/Z8lWGEiKoI I think not much had changed since then. Regards, Em Am 13.02.2012 20:17, schrieb spr...@gmx.eu: Hi, how efficent is such an query: q=some text fq=id:(1 OR 2 OR 3...) Should I better use q:some text AND id:(1 OR 2 OR 3...)? Is the Filter Cache used for the OR'ed fq? Thank you
Re: OR-FilterQuery
Hi Erick, Whoa! fq=id:(1 OR 2) is not the same thing at all as fq=id:1&fq=id:2 Ahm, who said they would be the same? :) I mean, you are completely right in what you are saying, but it seems to me that we are talking about two different things. I was talking about caching each filter-criterion instead of the whole filter-query, to recombine the cached filter-criteria based on the boolean operators the client sends. In other words: currently fq=id:1 OR id:2 results in ONE cached filter-entry. fq=id:2 OR id:1 results in ANOTHER cached filter-entry fq=id:2 AND id:1 results in (surprise, surprise) a third filter-entry (although this example does not make sense). My idea was to cache each filter-criterion, that means caching the bitset for id:1 and the bitset for id:2, to recombine both bitsets via AND, OR, NOT etc. whenever this is necessary. This way one could save memory (and maybe computing time as well), which definitely makes sense when you have a much smaller set of filter-criteria while having a much larger set of possible (and used) combinations of those criteria, with a small number of repetitions per combination (which would otherwise destroy the benefit of caching). Don't you agree? Kind regards, Em Am 14.02.2012 22:33, schrieb Erick Erickson: Whoa! fq=id:(1 OR 2) is not the same thing at all as fq=id:1&fq=id:2 Assuming that any document had one and only one ID, the second clause would return exactly 0 documents, each and every time. Multiple fq clauses are essentially set intersections. So the first query is the set of all documents where id is 1 or 2; the second is the intersection of two sets of documents, one set with an id of 1 and one with an id of 2. Not the same thing at all. There's no support for the concept of (fq=id:1 OR fq=id:2) Best Erick On Tue, Feb 14, 2012 at 2:13 PM, Em mailformailingli...@yahoo.de wrote: Hi Mikhail, thanks for kicking in some brainstorming-code! 
The given thread is almost a year old and I was working with Solr in my freetime to see where it fails to behave/perform as I expect/wish. I found out that if you got a lot of different access-patterns for a filter-query, you might end up with either a big cache to make things fast or with lower performance (impact depends on usecase and circumstances). Scenario: You got a permission-field and the client is able to filter by one to three permission-values. That is: fq=foo:user fq=foo:moderator fq=foo:manager If you can not control/guarantee the order of the fq's values, you could end up with a lot of mess which all returns the same. Example: fq=permission:user OR permission:moderator OR permission:manager fq=permission:user OR permission:manager OR permission:moderator fq=permission:moderator OR permission:user OR permission:manager ... They all return the same but where cached seperately which leads to the fact that you are wasting memory a lot. Furthermore, if your access pattern will lead to a lot of different fq's on a small set of distinct values, it may make more sense to cache each filter-query for itself from a memory-consuming point of view (may cost a little bit performance). That beeing said, if you cache a filter for foo:user, foo:moderator and foo:manager you can combine those filters with AND, OR, NOT or whatever without recomputing every filter over and over again which would be the case if your filter-cache is not large enough. However, I never compared the performance differences (in terms of speed) of a cached filter-query like foo:bar OR foo:baz With a combination of two cached filter-queries like foo:bar foo:baz combined by a logical OR. That's how the background looks like. Unfortunately I didn't had the time to implement this in the past. Back to your post: Looks like a cool idea and is almost what I had in mind! 
I would formulate an easier syntax so that one is able to parse each fq-clause on its own to cache the CachingWrapperFilter to reuse it again. it will use per segment bitset at contrast to Solr's fq which caches for top level reader. Could you explain why this bitset would be per-segment based, please? I don't see a reason why this *have* to be so. What is the benefit you are seeing? Kind regards, Em Am 14.02.2012 19:33, schrieb Mikhail Khludnev: Hi Em, I briefly read the thread. Are you talking about combing of cached clauses of BooleanQuery, instead of evaluating whole BQ as a filter? I found something like that in API (but only in API) http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean) Am I get you right? Why do you need it, btw? If I'm .. I have idea how to do it in two mins: q=+f:text +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)... Right leg will be a BooleanQuery with SHOULD clauses backed on cached queries (see below). if you are not scarred by the syntax yet you can implement trivial
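Independent of per-clause caching inside Solr, one client-side mitigation for the permutation problem Em describes (the same OR'ed fq cached once per clause ordering) is to canonicalize the clause order before sending the request, so every permutation maps to one cache entry. A minimal sketch; the field and values are just the thread's example, and the real fix belongs wherever the client assembles its Solr request:

```java
import java.util.Arrays;

public class FqNormalizer {
    // Build an order-independent fq string so that, e.g.,
    // permission:user OR permission:moderator and
    // permission:moderator OR permission:user hit the same
    // filter-cache entry on the Solr side.
    static String normalizedFq(String field, String... values) {
        String[] sorted = values.clone();
        Arrays.sort(sorted);
        StringBuilder sb = new StringBuilder();
        for (String v : sorted) {
            if (sb.length() > 0) sb.append(" OR ");
            sb.append(field).append(':').append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizedFq("permission", "user", "moderator", "manager"));
        System.out.println(normalizedFq("permission", "manager", "user", "moderator"));
    }
}
```

This does not reduce the number of distinct combinations, only the duplicate permutations of each combination, so it complements rather than replaces the per-criterion caching idea.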
Re: SolrCloud Replication Question
Doh - looks like I was just seeing a test issue. Do you mind updating and trying the latest rev? At the least there should be some better logging around the recovery. I'll keep working on tests in the meantime. - Mark On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote: Sounds good, if I pull the latest from trunk and rerun will that be useful or were you able to duplicate my issue now? On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote: Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... 
On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: Need help with graphing function (MATH)
agreeing with wunder - I don't know the application, but I think almost always, a set of linear approximations over a few ranges would be ok (and you could increase the number of ranges until it was), and will be faster. And if you need just one equation, a sigmoid function will do the trick, such as 110 - 50((x-100)/20)/(sqrt(1+((x-100)/20)^2)) http://www.wolframalpha.com/input/?i=plot+110+-+50%28%28x-100%29%2F20%29%2F%28sqrt%281%2B%28%28x-100%29%2F20%29 ^2%29%29%2C+x%3D0..200 Regards Kent Fitch On Wed, Feb 15, 2012 at 6:17 AM, Walter Underwood wun...@wunderwood.orgwrote: In practice, I expect a linear piecewise function (with sharp corners) would be indistinguishable from the smoothed function. It is also much easier to read, test, and debug. It might even be faster. Try the sharp corners one first. wunder On Feb 14, 2012, at 10:56 AM, Ted Dunning wrote: In general this kind of function is very easy to construct using sums of basic sigmoidal functions. The logistic and probit functions are commonly used for this. Sent from my iPhone On Feb 14, 2012, at 10:05, Mark static.void@gmail.com wrote: Thanks I'll have a look at this. I should have mentioned that the actual values on the graph aren't important rather I was showing an example of how the function should behave. On 2/13/12 6:25 PM, Kent Fitch wrote: Hi, assuming you have x and want to generate y, then maybe - if x 50, y = 150 - if x 175, y = 60 - otherwise : either y = (100/(e^((x -50)/75)^2)) + 50 http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e ^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175 - or maybe y =sin((x+5)/38)*42+105 http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175 Regards, Kent Fitch On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.commailto: static.void@gmail.com wrote: I need some help with one of my boost functions. I would like the function to look something like the following mockup below. 
Starts off flat then there is a gradual decline, steep decline then gradual decline and then back to flat. Can some of you math guys please help :) Thanks.
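Kent's single-equation curve is trivial to port to code: an algebraic sigmoid centred at x=100 with width 20, sliding smoothly from roughly 160 down to roughly 60 (constants taken from his formula above):

```java
public class SigmoidBoost {
    // y = 110 - 50 * u / sqrt(1 + u^2), with u = (x - 100) / 20.
    // Steepest descent is around x = 100; the tails flatten out on
    // their own, which is the advantage over a piecewise definition.
    static double boost(double x) {
        double u = (x - 100.0) / 20.0;
        return 110.0 - 50.0 * u / Math.sqrt(1.0 + u * u);
    }

    public static void main(String[] args) {
        for (double x : new double[] {0, 50, 100, 150, 200}) {
            System.out.println(x + " -> " + boost(x));
        }
    }
}
```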
Re: Semantic autocomplete with Solr
facetting? paul Le 14 févr. 2012 à 23:10, Octavian Covalschi a écrit : Hey guys, Has anyone done any kind of smart autocomplete? Let's say we have a web store, and we'd like to autocomplete user's searches. So if I'll type in jacket next word that will be suggested should be something related to jacket (color, fabric) etc... It seems to me I have to structure this data in a particular way, but that way I can do without solr, so I was wondering if Solr could help us. Thank you in advance.
Re: Can I rebuild an index and remove some fields?
I was thinking that if I make a wrapper class that aggregates another IndexReader and filters out the terms I don't want anymore, it might work. And then pass that wrapper into SegmentMerger. I think if I filter out terms on getFieldNames(...) and terms(...) it might work. Something like: HashSet<String> ignoredTerms=...; FilteringIndexReader wrapper=new FilteringIndexReader(reader); SegmentMerger merger=new SegmentMerger(writer); merger.add(wrapper); merger.merge(); On Feb 14, 2012, at 1:49 AM, Li Li wrote: for method 2, delete is wrong. we can't delete terms. you would also have to hack the tii and tis files. On Tue, Feb 14, 2012 at 2:46 PM, Li Li fancye...@gmail.com wrote: method 1, dumping data: for stored fields, you can traverse the whole index and save it somewhere else. for indexed but not stored fields, it may be more difficult. if the indexed but not stored field is not analyzed (fields such as id), it's easy to get from FieldCache.StringIndex. But for analyzed fields, though theoretically they can be restored from term vectors and term positions, it's hard to recover from the index. method 2, hack with the metadata 1. indexed fields: delete by query, e.g. field:* 2. stored fields: because all fields are stored sequentially, it's not easy to delete some fields. this will not affect search speed. but if you want to get stored fields, and the useless fields are very long, then it will slow down. also it's possible to hack with it, but it needs more effort to understand the index file format and traverse the fdt/fdx files. http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html this will give you some insight. On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart bstewart...@gmail.com wrote: Let's say I have a large index (100M docs, 1TB, split up between 10 indexes). And a bunch of the stored and indexed fields are not used in search at all. 
In order to save memory and disk, I'd like to rebuild that index *without* those fields, but I don't have the original documents to rebuild the entire index with (don't have the full-text anymore, etc.). Is there some way to rebuild or optimize an existing index with only a sub-set of the existing indexed fields? Or alternatively, is there a way to avoid loading some indexed fields at all (to avoid loading term infos and the terms index)? Thanks Bob
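The wrapper idea discussed above can be illustrated with a plain-Java decorator, independent of any Lucene version (all names here are hypothetical illustrations, not Lucene API): a "reader" exposes its field names, and a filtering wrapper hides the fields being dropped while delegating everything else to the wrapped instance. This is the same pattern a FilterIndexReader subclass would apply over its field-listing and terms methods.

```java
import java.util.*;

// Hypothetical minimal "reader" interface: just enough to show the decorator idea.
interface FieldSource {
    List<String> fieldNames();
}

// Decorator that hides a set of fields while delegating to the wrapped source.
class FieldFilteringSource implements FieldSource {
    private final FieldSource delegate;
    private final Set<String> ignored;

    FieldFilteringSource(FieldSource delegate, Set<String> ignored) {
        this.delegate = delegate;
        this.ignored = ignored;
    }

    @Override
    public List<String> fieldNames() {
        List<String> kept = new ArrayList<>();
        for (String f : delegate.fieldNames()) {
            if (!ignored.contains(f)) kept.add(f);  // drop the unwanted fields
        }
        return kept;
    }
}

public class FilterDemo {
    public static void main(String[] args) {
        FieldSource raw = () -> Arrays.asList("id", "title", "body", "unused_blob");
        FieldSource filtered =
            new FieldFilteringSource(raw, new HashSet<>(Collections.singleton("unused_blob")));
        System.out.println(filtered.fieldNames()); // [id, title, body]
    }
}
```

A merge run against such a wrapper would then simply never see the hidden fields, which is why the approach in the mail above seems plausible for dropping fields during a rewrite.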
Re: Semantic autocomplete with Solr
Hm... I used it for some basic group-by feature, but haven't thought of it for autocomplete. I'll give it a shot. Thanks! On Tue, Feb 14, 2012 at 4:19 PM, Paul Libbrecht p...@hoplahup.net wrote: facetting? paul On Feb 14, 2012, at 11:10 PM, Octavian Covalschi wrote: Hey guys, Has anyone done any kind of smart autocomplete? Let's say we have a web store, and we'd like to autocomplete users' searches. So if I type in jacket, the next word that is suggested should be something related to jacket (color, fabric), etc... It seems to me I have to structure this data in a particular way, but if I do that I can do without Solr, so I was wondering if Solr could help us. Thank you in advance.
Re: Semantic autocomplete with Solr
We've done something along these lines: https://svnweb.cern.ch/trac/rcarepo/wiki/InspireAutoSuggest#Autosuggestautocompletefunctionality but you would need MontySolr for that - https://github.com/romanchyla/montysolr roman On Tue, Feb 14, 2012 at 11:10 PM, Octavian Covalschi octavian.covals...@gmail.com wrote: Hey guys, Has anyone done any kind of smart autocomplete? Let's say we have a web store, and we'd like to autocomplete users' searches. So if I type in jacket, the next word that is suggested should be something related to jacket (color, fabric), etc... It seems to me I have to structure this data in a particular way, but if I do that I can do without Solr, so I was wondering if Solr could help us. Thank you in advance.
payload and exact match
Is there any way to perform an 'exact search' on a payload field? I have to index text with auxiliary info for each word. In particular, each word is associated with the bounding box containing it in the original PDF page (it is used for highlighting the search terms in the PDF). I used the payload to store that information. In schema.xml, the fieldType definition is: --- <fieldtype name="wppayloads" stored="false" indexed="true" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="identity"/>
  </analyzer>
</fieldtype> --- while the field definition is: --- <field name="words" type="wppayloads" indexed="true" stored="true" required="true" multiValued="true"/> --- When indexing, the field 'words' contains a list of word|box entries, as in the following example: --- doc_id=example words={Fonte:|307.62,948.16,324.62,954.25 Comune|326.29,948.16,349.07,954.25 di|350.74,948.16,355.62,954.25 Bologna|358.95,948.16,381.28,954.25} --- This solution works well except in the case of an exact search. For example, assuming the only indexed doc is the 'example' doc (shown above), the query words:Comune di Bologna returns no results. Does anyone know whether it is possible to perform an 'exact search' on a payload field? Thanks in advance, Leonardo -- View this message in context: http://lucene.472066.n3.nabble.com/payload-and-exact-match-tp3745369p3745369.html Sent from the Solr - User mailing list archive at Nabble.com.
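One thing worth checking about the failing query (an assumption based on the query syntax shown, not a confirmed diagnosis): without quotes, only the first term of words:Comune di Bologna is bound to the words field; di and Bologna are searched against the default field. A quoted phrase query keeps all terms on the field:

```
words:"Comune di Bologna"
```

If the quoted phrase still returns nothing, the analysis page in the Solr admin UI can show how the word|box tokens actually end up in the index after the payload filter strips the delimiter and box coordinates.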
Solr soft commit feature
Hi All, Is there a way to soft commit in the current released version of solr 3.5? Regards, Dipti Srivastava This message is private and confidential. If you have received it in error, please notify the sender and remove it from your system.
Re: OR-FilterQuery
Ah, OK, I misread your post apparently. And yes, what you suggest would result in some efficiencies, but at present I don't think there's any syntax that allows one to combine filter queries as you suggest. There was some discussion about it in the JIRA I referenced, but no action that I could see. That is, efficiencies in some circumstances, though I think it would be hard to predict. For instance, imagine a set of 100 entries in an FQ. And no, I'm not making things up, I've seen applications where this makes sense. Splitting that out into 100 separate entries in the filterCache would use up a lot of space. Likewise, I suspect that the actual process of creating the heuristics that were able to analyze an incoming filter query and do the right thing in terms of splitting it up and recombining it would be pretty hairy. Local parameters for instance, and let's throw in dereferencing too G... So I suspect that this is one of those features that is quite easy to see the benefits of in the simple case, but pretty quickly becomes a nightmare to actually implement correctly, but that's mostly a guess. And before putting the work into it, I think modeling the actual benefits would be wise, as well as convincing myself that there are enough cases where this *would* be beneficial. I mean Solr does a pretty reasonable job of caching these anyway, and with the non-cached filters it's not clear to me that the benefits are sufficient... Good luck, though, if you want to tackle it! Erick On Tue, Feb 14, 2012 at 4:54 PM, Em mailformailingli...@yahoo.de wrote: Hi Erick, Whoa! fq=id:(1 OR 2) is not the same thing at all as fq=id:1&fq=id:2 Ahm, who said they would be the same? :) I mean, you are completely right in what you are saying but it seems to me that we are talking about two different things. I was talking about caching each filter-criteria instead of the whole filter-query to recombine the cached filter-criteria based on the boolean-operators the client sends. 
In other words: currently fq=id:1 OR id:2 results in ONE cached filter-entry. fq=id:2 OR id:1 results in ANOTHER cached filter-entry. fq=id:2 AND id:1 results in (surprise, surprise) a third filter-entry (although this example does not make sense). My idea was to cache each filter-criterion, that means caching the bitset for id:1 and the bitset for id:2, to recombine both bitsets via AND, OR, NOT etc. whenever this is necessary. This way one could save memory (and maybe computing-time as well), which definitely makes sense when you got a way smaller set of filter-criteria while having a much larger set of possible (and used) combinations of each filter-criterion with a small number of repetitions per combination (which would destroy the benefit of caching). Don't you agree? Kind regards, Em On 14.02.2012 22:33, Erick Erickson wrote: Whoa! fq=id:(1 OR 2) is not the same thing at all as fq=id:1&fq=id:2 Assuming that any document had one and only one ID, the second clause would return exactly 0 documents, each and every time. Multiple fq clauses are essentially set intersections. So the first query is the set of all documents where id is 1 or 2; the second is the intersection of two sets of documents, one set with an id of 1 and one with an id of 2. Not the same thing at all. There's no support for the concept of (fq=id:1 OR fq=id:2) Best Erick On Tue, Feb 14, 2012 at 2:13 PM, Em mailformailingli...@yahoo.de wrote: Hi Mikhail, thanks for kicking in some brainstorming-code! The given thread is almost a year old and I was working with Solr in my freetime to see where it fails to behave/perform as I expect/wish. I found out that if you got a lot of different access-patterns for a filter-query, you might end up with either a big cache to make things fast or with lower performance (impact depends on usecase and circumstances). Scenario: You got a permission-field and the client is able to filter by one to three permission-values. 
That is: fq=foo:user fq=foo:moderator fq=foo:manager If you can not control/guarantee the order of the fq's values, you could end up with a lot of mess which all returns the same. Example: fq=permission:user OR permission:moderator OR permission:manager fq=permission:user OR permission:manager OR permission:moderator fq=permission:moderator OR permission:user OR permission:manager ... They all return the same but were cached separately, which leads to the fact that you are wasting a lot of memory. Furthermore, if your access pattern will lead to a lot of different fq's on a small set of distinct values, it may make more sense to cache each filter-criterion for itself from a memory-consumption point of view (may cost a little bit of performance). That being said, if you cache a filter for foo:user, foo:moderator and foo:manager, you can combine those filters with AND, OR, NOT or whatever without recomputing every filter over and over again.
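Em's idea above, caching one bitset per filter criterion and recombining them per request, can be sketched with plain java.util.BitSet (this illustrates only the recombination, not Solr's actual filterCache code; the cache keys are made up for the example):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class CriterionCacheDemo {
    public static void main(String[] args) {
        // One cached bitset per filter criterion (doc ids that match it).
        Map<String, BitSet> cache = new HashMap<>();
        BitSet user = new BitSet();
        user.set(0); user.set(2); user.set(5);
        BitSet moderator = new BitSet();
        moderator.set(2); moderator.set(3);
        cache.put("permission:user", user);
        cache.put("permission:moderator", moderator);

        // "permission:user OR permission:moderator" -> union of the cached bitsets;
        // clause order no longer matters, so no duplicate cache entries.
        BitSet or = (BitSet) cache.get("permission:user").clone();
        or.or(cache.get("permission:moderator"));
        System.out.println(or); // {0, 2, 3, 5}

        // "permission:user AND permission:moderator" -> intersection.
        BitSet and = (BitSet) cache.get("permission:user").clone();
        and.and(cache.get("permission:moderator"));
        System.out.println(and); // {2}
    }
}
```

With this scheme the three differently ordered OR-filters from the example above would all resolve to the same two cached bitsets plus a cheap union, instead of three separate filterCache entries.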
Re: Solr soft commit feature
This has not been ported back to the 3.X line yet - mostly because it involved some rather large and invasive changes that I wanted to bake on trunk for some time first. Even still, the back port is not trivial, so I don't know that it's something I'd personally be able to get to in the short term. If I had any free time, I'd probably prefer pushing towards a 4 release with NRT. Some of the changes also broke back compat behavior in ways that are more acceptable over a major release. Someone else might jump in and do the work of course. On Feb 14, 2012, at 7:41 PM, Dipti Srivastava wrote: Hi All, Is there a way to soft commit in the current released version of solr 3.5? Regards, Dipti Srivastava - Mark Miller lucidimagination.com
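For reference, on trunk/4.x (where the feature lives, per Mark's answer), a soft commit can be requested per update via the softCommit=true request parameter, or configured to happen automatically. A sketch of the solrconfig.xml form (element names as on trunk at the time; worth double-checking against the example config shipped with your build):

```xml
<!-- solrconfig.xml (trunk/4.x): open a new near-real-time searcher at most
     every second, without flushing the index to stable storage each time -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

A regular hard autoCommit (with openSearcher=false) is still needed periodically to make the updates durable.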
Re: Can I rebuild an index and remove some fields?
I have roughly read the code of 4.0 trunk. maybe it's feasible. SegmentMerger.add(IndexReader) will add the to-be-merged readers. merge() will call mergeTerms(segmentWriteState); mergePerDoc(segmentWriteState); mergeTerms() will construct fields from the IndexReaders:
for(int readerIndex=0; readerIndex < mergeState.readers.size(); readerIndex++) {
  final MergeState.IndexReaderAndLiveDocs r = mergeState.readers.get(readerIndex);
  final Fields f = r.reader.fields();
  final int maxDoc = r.reader.maxDoc();
  if (f != null) {
    slices.add(new ReaderUtil.Slice(docBase, maxDoc, readerIndex));
    fields.add(f);
  }
  docBase += maxDoc;
}
So if you wrap your IndexReader and override its fields() method, maybe it will work for merging terms. for DocValues, it can also override AtomicReader.docValues(). just return null for fields you want to remove. maybe it should traverse CompositeReader's getSequentialSubReaders() and wrap each AtomicReader. other things like term vectors and norms are similar. On Wed, Feb 15, 2012 at 6:30 AM, Robert Stewart bstewart...@gmail.com wrote: I was thinking that if I make a wrapper class that aggregates another IndexReader and filters out terms I don't want anymore it might work. And then pass that wrapper into SegmentMerger. I think if I filter out terms on GetFieldNames(...) and Terms(...) it might work. Something like: HashSet<string> ignoredTerms=...; FilterIndexReader wrapper=new FilterIndexReader(reader); SegmentMerger merger=new SegmentMerger(writer); merger.add(wrapper); merger.Merge(); On Feb 14, 2012, at 1:49 AM, Li Li wrote: for method 2, delete is wrong. we can't delete terms. you also should hack with the tii and tis file. On Tue, Feb 14, 2012 at 2:46 PM, Li Li fancye...@gmail.com wrote: method 1, dumping data: for stored fields, you can traverse the whole index and save it to somewhere else. for indexed but not stored fields, it may be more difficult. if the indexed and not stored field is not analyzed (fields such as id), it's easy to get from FieldCache.StringIndex. 
But for analyzed fields, though theoretically it can be restored from term vector and term position, it's hard to recover from the index. method 2, hack with metadata: 1. indexed fields: delete by query, e.g. field:* 2. stored fields: because all fields are stored sequentially, it's not easy to delete some fields. this will not affect search speed. but if you want to get stored fields, and the useless fields are very long, then it will slow down. also it's possible to hack with it, but it needs more effort to understand the index file format and traverse the fdt/fdx file. http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html this will give you some insight. On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart bstewart...@gmail.com wrote: Let's say I have a large index (100M docs, 1TB, split up between 10 indexes). And a bunch of the stored and indexed fields are not used in search at all. In order to save memory and disk, I'd like to rebuild that index *without* those fields, but I don't have the original documents to rebuild the entire index with (don't have the full-text anymore, etc.). Is there some way to rebuild or optimize an existing index with only a sub-set of the existing indexed fields? Or alternatively, is there a way to avoid loading some indexed fields at all (to avoid loading term infos and the terms index)? Thanks Bob
Re: SolrCloud Replication Question
Doing so now, will let you know if I continue to see the same issues On Tue, Feb 14, 2012 at 4:59 PM, Mark Miller markrmil...@gmail.com wrote: Doh - looks like I was just seeing a test issue. Do you mind updating and trying the latest rev? At the least there should be some better logging around the recovery. I'll keep working on tests in the meantime. - Mark On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote: Sounds good, if I pull the latest from trunk and rerun will that be useful or were you able to duplicate my issue now? On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote: Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... 
On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS:
./bootstrap.sh
./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc)
./slice1
  - start.sh
  - solr.xml
  - slice1_shard1
    - data
  - slice2_shard2
    - data
./slice2
  - start.sh
  - solr.xml
  - slice2_shard1
    - data
  - slice1_shard2
    - data
if it matters, I'm running everything from localhost, zk and the solr shards. On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have a unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com wrote: - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
All of the nodes now show as being Active. When starting the replicas I did receive the following message though. Not sure if this is expected or not.
INFO: Attempting to replicate from http://JamiesMac.local:8501/solr/slice2_shard2/
Feb 14, 2012 10:53:34 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: null java.lang.NullPointerException
  at org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646)
  at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
null java.lang.NullPointerException
  at org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646)
  at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
request: http://JamiesMac.local:8501/solr/admin/cores?action=PREPRECOVERY&core=slice2_shard2&nodeName=JamiesMac.local:8502_solr&coreNodeName=JamiesMac.local:8502_solr_slice2_shard1&wt=javabin&version=2
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
  at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:208)
Feb 14, 2012 10:53:34 PM org.apache.solr.update.UpdateLog dropBufferedUpdates
feeding mahout cluster output back to solr
hi, at present we use carrot2 for clustering and doing analysis on customer feedback data. Since it's in-memory and done at search time, we are having issues with performance and cluster size. I was reading about generating clusters using mahout from solr index data. But can we feed the segmentation generated by mahout back into solr to use as facets? I am not even sure what the output from mahout looks like, so I wanted to know. -- View this message in context: http://lucene.472066.n3.nabble.com/feeding-mahout-cluster-output-back-to-solr-tp3745883p3745883.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: OR-FilterQuery
On Tue, Feb 14, 2012 at 11:13 PM, Em mailformailingli...@yahoo.de wrote: Hi Mikhail, it will use a per-segment bitset, in contrast to Solr's fq which caches for the top-level reader. Could you explain why this bitset would be per-segment based, please? I don't see a reason why this *has* to be so. it's just how org.apache.lucene.search.CachingWrapperFilter works. The first out-of-the-box stuff which I've found. as a top-level alternative we need org.apache.solr.search.SolrIndexSearcher.getDocSet(Query). btw, one more top-level snippet:
class FQParser extends QParser {
  Query parse(...) {
    return new SolrConstantScoreQuery(
      solrIndexSearcher.getDocSet(subQuery(localParam.get(V))).getTopFilter());
  }
}
What is the benefit you are seeing? It seems like two different POVs: Lucene prefers per-segment caching to have fast incremental updates, but maybe 'because it's good but not in the worst case' (I guess I've heard it there http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/many-facets-apache-solr) while Solr prefers top-reader caches. Kind regards, Em On 14.02.2012 19:33, Mikhail Khludnev wrote: Hi Em, I briefly read the thread. Are you talking about combining cached clauses of a BooleanQuery, instead of evaluating the whole BQ as a filter? I found something like that in the API (but only in the API) http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean) Did I get you right? Why do you need it, btw? If so, I have an idea of how to do it in two mins: q=+f:text +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)... The right leg will be a BooleanQuery with SHOULD clauses backed on cached queries (see below). 
if you are not scared by the syntax yet, you can implement a trivial fqQParserPlugin, which will be just:
// lazily through User/Generic Cache
q = new FilteredQuery(new MatchAllDocsQuery(),
    new CachingWrapperFilter(new QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
return q;
it will use a per-segment bitset, in contrast to Solr's fq which caches for the top-level reader. WDYT? On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de wrote: Hi, have a look at: http://search-lucene.com/m/Z8lWGEiKoI I think not much has changed since then. Regards, Em On 13.02.2012 20:17, spr...@gmx.eu wrote: Hi, how efficient is such a query: q=some text fq=id:(1 OR 2 OR 3...) Should I better use q=some text AND id:(1 OR 2 OR 3...)? Is the Filter Cache used for the OR'ed fq? Thank you -- Sincerely yours Mikhail Khludnev Lucid Certified Apache Lucene/Solr Developer Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: OR-FilterQuery
Hi Mikhail, it's just how org.apache.lucene.search.CachingWrapperFilter works. The first out-of-the-box stuff which I've found. Thanks for your explanation and snippets - I thought this was configurable. Regards, Em On 15.02.2012 06:16, Mikhail Khludnev wrote: On Tue, Feb 14, 2012 at 11:13 PM, Em mailformailingli...@yahoo.de wrote: Hi Mikhail, it will use a per-segment bitset, in contrast to Solr's fq which caches for the top-level reader. Could you explain why this bitset would be per-segment based, please? I don't see a reason why this *has* to be so. it's just how org.apache.lucene.search.CachingWrapperFilter works. The first out-of-the-box stuff which I've found. as a top-level alternative we need org.apache.solr.search.SolrIndexSearcher.getDocSet(Query). btw, one more top-level snippet:
class FQParser extends QParser {
  Query parse(...) {
    return new SolrConstantScoreQuery(
      solrIndexSearcher.getDocSet(subQuery(localParam.get(V))).getTopFilter());
  }
}
What is the benefit you are seeing? It seems like two different POVs: Lucene prefers per-segment caching to have fast incremental updates, but maybe 'because it's good but not in the worst case' (I guess I've heard it there http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/many-facets-apache-solr) while Solr prefers top-reader caches. Kind regards, Em On 14.02.2012 19:33, Mikhail Khludnev wrote: Hi Em, I briefly read the thread. Are you talking about combining cached clauses of a BooleanQuery, instead of evaluating the whole BQ as a filter? I found something like that in the API (but only in the API) http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean) Did I get you right? Why do you need it, btw? If so, I have an idea of how to do it in two mins: q=+f:text +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)... The right leg will be a BooleanQuery with SHOULD clauses backed on cached queries (see below). 
if you are not scared by the syntax yet, you can implement a trivial fqQParserPlugin, which will be just:
// lazily through User/Generic Cache
q = new FilteredQuery(new MatchAllDocsQuery(),
    new CachingWrapperFilter(new QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
return q;
it will use a per-segment bitset, in contrast to Solr's fq which caches for the top-level reader. WDYT? On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de wrote: Hi, have a look at: http://search-lucene.com/m/Z8lWGEiKoI I think not much has changed since then. Regards, Em On 13.02.2012 20:17, spr...@gmx.eu wrote: Hi, how efficient is such a query: q=some text fq=id:(1 OR 2 OR 3...) Should I better use q=some text AND id:(1 OR 2 OR 3...)? Is the Filter Cache used for the OR'ed fq? Thank you