Re: SnowballPorterFilterFactory stemming word question
Thanks Hoss.

Could you please provide an example? Does Solr provide any implementation of a dictionary-based stemmer? Please let me know.

Thanks
Rashid

hossman wrote:

: If I give "machine", why does it stem to "machin"? Where does
: this word come from?
: If I give "revolutionary" it stems to "revolutionari"; I thought it should
: stem to "revolution".
:
: How does stemming work?

the Porter stemmer (and all of the stemmers provided with Solr) are programmatic stemmers ... they don't actually know the root of any words. they use an approximate algorithm to compute a *token* from a word based on a set of rules ... these tokens aren't necessarily real words (and most of the time they aren't words) but the same token tends to be produced from words with similar roots.

if you want to see the actual root word, you'll have to use a dictionary-based stemmer.

-Hoss

--
View this message in context: http://www.nabble.com/SnowballPorterFilterFactory-stemming-word-question-tp25180310p25325738.html
Sent from the Solr - User mailing list archive at Nabble.com.
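
To make the point about rule-based stemming concrete, here is a minimal sketch of a suffix stripper. It is not the Porter algorithm: the rule table and the `toy_stem` function are made up for illustration, and were chosen only so that the two words from the question come out the way the thread reports.

```python
# Toy rule-based stemmer. NOT the Porter algorithm; the rules below are
# hypothetical and exist only to show how blind suffix rules yield tokens
# that are not dictionary words.
RULES = [
    ("ies", "i"),  # "ponies"        -> "poni"
    ("es", ""),    # "machines"      -> "machin"
    ("s", ""),     # "cats"          -> "cat"
    ("y", "i"),    # "revolutionary" -> "revolutionari"
    ("e", ""),     # "machine"       -> "machin"
]


def toy_stem(word):
    """Apply the first matching suffix rule; the stemmer knows no real roots."""
    word = word.lower()
    for suffix, replacement in RULES:
        # Keep at least three characters of the original word as the stem.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ("machine", "machines", "revolutionary"):
    print(w, "->", toy_stem(w))
```

The useful property is that "machine" and "machines" map to the same token, so a query for one finds the other, even though "machin" is not a word.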
Re: Field Collapsing (was Re: Schema for group/child entity setup)
> Great. Nice site and very similar to my requirements.

Thanks.

> So, right now, you get all field values by default?

Right now, no field values are returned for the collapsed documents. The patch which will be committed soon will add this functionality.

R. Tan wrote:

Great. Nice site and very similar to my requirements.

> There's work on the patch that is being done now which will enable you to ask for specific field values of the collapsed documents using a dedicated request parameter.

So, right now, you get all field values by default?

On Sun, Sep 6, 2009 at 3:58 AM, Uri Boness ubon...@gmail.com wrote:

You can check out http://www.ilocal.nl. If you search for a bank in Amsterdam then you'll see that a lot of the results are collapsed. For this we used an older version of this patch (which works on 1.3), but a lot has changed since then. We're currently using this patch on another project, but it's not live yet.

Uri

R. Tan wrote:

Thanks Uri. Your personal suggestion is appreciated and I think I'll follow your advice. We're still early in development and 1.4 would be a good choice. I hope I can get field collapsing to work with my requirements. Do you know any live site using field collapsing already?

On Sat, Sep 5, 2009 at 5:57 PM, Uri Boness ubon...@gmail.com wrote:

There's work on the patch that is being done now which will enable you to ask for specific field values of the collapsed documents using a dedicated request parameter. This work is not yet committed to the latest patch, but will be very soon. There is of course a drawback to that as well: the set of collapsed documents can be very large (depending on your data, of course), in which case the returned result, which includes the field values, can be rather large, and that will impact performance. This is why this feature will be enabled only if you specify this extra parameter. By default no field values will be returned.

AFAIK, the latest patch should work fine with the latest build. Martijn (who is the main maintainer of this patch) tries to keep it up to date with the latest builds. But I guess the safest way is to work with the nightly build of the same date as the latest patch (though I would give it a try first with the latest build).

BTW, it's not an official suggestion from the Solr development team, but if you ask me, if you have to choose now whether to use 1.3 or 1.4-dev, I would go for the latter. 1.4 is supposed to be released in the upcoming week or two and it brings loads of bug fixes, enhancements and extra functionality. But again, this is my personal suggestion.

cheers,
Uri

R. Tan wrote:

Okay. Thanks for giving an insight on how it works in general. Without trying it myself, are the field values for the collapsed ones also part of the results data? What is the latest build that is safe to use in a production environment? I'd probably go for that and use field collapsing. Thank you very much.

On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness ubon...@gmail.com wrote:

The collapsed documents are represented by one master document which can be part of the normal search result (the doc list), so pagination just works as expected, meaning taking only the returned documents into account (ignoring the collapsed ones). As for the scoring, the master document is actually the document with the highest score in the collapsed group.

As for Solr 1.3 compatibility... well... it's very hard to tell. All the latest patches are certainly *not* 1.3 compatible (I think they also depend on some changes in Lucene which are not available for Solr 1.3). I guess you'll have to try some of the old patches, but I'm not sure about their stability.

cheers,
Uri

R. Tan wrote:

Thanks Uri. How do paging and scoring work when using field collapsing? What patch works with 1.3? Is it production ready?

R

On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote:

The development on this patch is quite active. It works well for a single Solr instance, but distributed search (i.e. shards) is not yet supported. Using this patch you can group search results based on a specific field. There are two flavors of field collapsing, adjacent and non-adjacent. The former collapses only documents which happen to be located next to each other in the otherwise-non-collapsed result set. The latter (the non-adjacent one) collapses all documents with the same field value (regardless of their position in the otherwise-non-collapsed result set). Note that non-adjacent performs better than the adjacent one. There's currently discussion about extending this support so that, in addition to collapsing the documents, extra information will be returned for the collapsed documents (see the discussion on the issue page).

Uri

R. Tan wrote:

I think this is what I'm looking for. What is the status of this patch?

On Thu, Sep 3, 2009 at 12:00 PM, R. Tan
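
The difference between the two flavors described in this thread can be sketched in a few lines. This is only an illustration of the semantics as Uri describes them (one master document per group, namely the highest-scoring one); it is not the patch's code, and the function names, the `bank` field and the sample documents are all hypothetical.

```python
from itertools import groupby


def collapse_adjacent(docs, field):
    """Collapse only runs of neighbouring docs that share the field value."""
    result = []
    for _, run in groupby(docs, key=lambda d: d[field]):
        run = list(run)
        master = max(run, key=lambda d: d["score"])  # highest score leads the group
        result.append({"master": master["id"], "collapsed": len(run) - 1})
    return result


def collapse_non_adjacent(docs, field):
    """Collapse all docs with the same field value, wherever they appear."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[field], []).append(doc)
    result = []
    for group in groups.values():  # dicts keep insertion order in Python 3.7+
        master = max(group, key=lambda d: d["score"])
        result.append({"master": master["id"], "collapsed": len(group) - 1})
    return result


# Hypothetical result list, already sorted by score.
docs = [
    {"id": 1, "bank": "ABN", "score": 0.9},
    {"id": 2, "bank": "ABN", "score": 0.8},
    {"id": 3, "bank": "ING", "score": 0.7},
    {"id": 4, "bank": "ABN", "score": 0.6},
]

print(collapse_adjacent(docs, "bank"))      # doc 4 stays a separate entry
print(collapse_non_adjacent(docs, "bank"))  # doc 4 is folded into doc 1's group
```

Paging then operates on the list of master documents only, which matches the remark above that pagination ignores the collapsed ones.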
Faceting optimization
Hi,

I'm currently trying to optimize the response time of my Solr server. I found one aberration and hope you may be able to help me solve it:

If, considering the whole document index, there are a lot of possible values for a field, asking for a facet on that field dramatically increases response time, even if the search returns only one document with only one facet value for that field. This is shown by the three requests at the bottom of this mail.

It seems to me that Solr looks at all the possible values in the whole index for the faceted field, whereas it should look at the possible values only for the documents in the results, which would be a lot faster. Is there a way to ask it to do so?

---

Let's look at these three requests:

1- This request returns only one document and takes 3ms.

http://localhost:8983/solr/select/?rows=10&q=(available_owner_display_name_s_facet:%22mag%22)+AND+type_s:[T0+TO+T9]

2- This request returns one document, and its facets for one field. It takes about 1000ms. The facet on a_10_alpha_sort returns only one value: air du temps. But across the whole index, there are a lot of values (10 000) for a_10_alpha_sort.

http://localhost:8983/solr/select/?facet=true&rows=10&q=(available_owner_display_name_s_facet:%22mag%22)+AND+type_s:[T0+TO+T9]&facet.field=a_10_alpha_sort&f.a_10_alpha_sort.facet.mincount=1&f.a_10_alpha_sort.facet.sort=true&f.a_10_alpha_sort.facet.limit=8

3- This request includes the value air du temps in the search string. It takes 3ms.

http://localhost:8983/solr/select/?rows=10&q=(available_owner_display_name_s_facet:%22mag%22+AND+a_10_alpha_sort:air+du+temps)+AND+type_s:[T0+TO+T9]

Here is the description of the faceted field in my schema: this is a single-valued field, with no tokens.

  <dynamicField name="*_alpha_sort" type="alphaOnlySort" indexed="true" stored="false" multivalued="false"/>

  <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <!-- KeywordTokenizer does no actual tokenizing, so the entire
           input string is preserved as a single token -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.TrimFilterFactory" />
    </analyzer>
  </fieldType>
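
The suspicion raised in this message can be pictured as the difference between two counting strategies. The sketch below only illustrates that hypothesis; it makes no claim about which strategy Solr actually uses, and the function names and sample values ("angel", "chance") are hypothetical.

```python
from collections import Counter


def facet_over_all_terms(term_to_docs, result_ids):
    """Visit every distinct value in the index and intersect its doc set
    with the result set. Work grows with the number of distinct values."""
    counts = {}
    for term, doc_ids in term_to_docs.items():
        hits = len(doc_ids & result_ids)
        if hits:
            counts[term] = hits
    return counts


def facet_over_result_docs(doc_to_term, result_ids):
    """Visit only the matching docs and tally their (single) value.
    Work grows with the number of results."""
    return dict(Counter(doc_to_term[doc_id] for doc_id in result_ids))


# Hypothetical single-valued field, shown in both orientations.
term_to_docs = {"air du temps": {1}, "angel": {2, 3}, "chance": {4}}
doc_to_term = {1: "air du temps", 2: "angel", 3: "angel", 4: "chance"}

print(facet_over_all_terms(term_to_docs, {1}))
print(facet_over_result_docs(doc_to_term, {1}))
```

Both functions return the same counts. With 10 000 distinct values and a single hit, the first loops 10 000 times and the second once, which is the gap the poster is asking about.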
Re: Exact Word Search
Hi Shalin,

My search is based on the following fields in schema.xml:

  <field name="url" type="string" indexed="true" stored="true"/>
  <field name="content" type="text" indexed="true" stored="true"/>
  <field name="description" type="string" indexed="true" stored="true"/>

Let me know if you need anything else.

Regards
Bhaskar

--- On Fri, 9/4/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

From: Shalin Shekhar Mangar shalinman...@gmail.com
Subject: Re: Exact Word Search
To: solr-user@lucene.apache.org
Date: Friday, September 4, 2009, 5:51 AM

On Fri, Sep 4, 2009 at 6:06 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote:

> Hi,
>
> I have integrated Solr with the Carrot2 Cluster Engine (v 3.1.0). Carrot2 is used as a presentation layer. Carrot2 sends the requested query to an external source (Solr) and gets results from Solr. Carrot2 may not be responsible for forming the query. It would have been handled at the Solr end.

Can you post the exact query that your application or Carrot2 is sending to Solr? Can you also list the Solr field and type defined in schema.xml which is being searched?

> Please help me with the below scenarios.
>
> Scenario: (please do NOT consider case sensitivity)
>
> Assuming I give "bhaskar" as the input string, it should give me search results pertaining to the word 'bhaskar' only. I am expecting output like the database query below:
>
>   Select * from MASTER where name = 'bhaskar';
>
> The above query is supposed to return matched records for 'bhaskar'.

Use a solr.TextField with KeywordTokenizer and LowerCaseFilter and search with q=field-name:field-value

--
Regards,
Shalin Shekhar Mangar.
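
Why the suggested analysis chain behaves like the SQL equality test can be sketched as follows. These functions only mimic the idea of a keyword tokenizer plus a lowercase filter; they are not Solr or Lucene code, and all names are hypothetical.

```python
def analyze_exact(value):
    """Mimic a keyword tokenizer followed by a lowercase filter:
    the whole input becomes one lowercased token."""
    return [value.lower()]


def analyze_words(value):
    """For contrast: split on whitespace, then lowercase each token."""
    return value.lower().split()


def matches(analyze, indexed_value, query_value):
    """A document matches if any query token equals an indexed token."""
    indexed_tokens = analyze(indexed_value)
    return any(token in indexed_tokens for token in analyze(query_value))


print(matches(analyze_exact, "Bhaskar", "bhaskar"))               # whole value, case ignored
print(matches(analyze_exact, "Bhaskar Chandrasekar", "bhaskar"))  # only part of the value
print(matches(analyze_words, "Bhaskar Chandrasekar", "bhaskar"))  # word-level match
```

With the keyword-style analysis a query matches only when it equals the entire field value ignoring case, which is the `name = 'bhaskar'` behaviour asked for. A word-tokenized field would also match longer values that merely contain the word.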
Re: Netbeans and Solr : Whac-A-Mole
This testcase is quite independent of anything in Solr. It is a standalone utility and the only dependency is stax.

Disclaimer: I run these testcases from IntelliJ and the command line.

BTW, are you using XPathRecordReader outside of DIH?

On Mon, Sep 7, 2009 at 3:26 PM, Fergus McMenemie fer...@twig.me.uk wrote:

Hello all,

I would appreciate help from somebody who has set up Solr within NetBeans. I want to do more work with DIH, particularly its XPathEntityProcessor stuff. I wish to perform the following from within the IDE:

  ant -Dtestcase=TestXPathRecordReader.java test

I have spent a few hours playing Whac-A-Mole with classpath and source settings. In the end I got it down to zero flags, but I then added some test cases and the scanner thing went off and flagged dozens of files with undefined classes. I removed my change, but the rescan did not remove the dozens of flagged files.

PS: I am a total NetBeans newbie.

--
===
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021
Unix/Mac/Intranets             Analyst Programmer
===

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Netbeans and Solr : Whac-A-Mole
We use command-line for most stuff except editing/debugging!

2009/9/7 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com

This testcase is quite independent of anything in Solr. It is a standalone utility and the only dependency is stax.

Disclaimer: I run these testcases from IntelliJ and the command line.

BTW, are you using XPathRecordReader outside of DIH?

On Mon, Sep 7, 2009 at 3:26 PM, Fergus McMenemie fer...@twig.me.uk wrote:

Hello all,

I would appreciate help from somebody who has set up Solr within NetBeans. I want to do more work with DIH, particularly its XPathEntityProcessor stuff. I wish to perform the following from within the IDE:

  ant -Dtestcase=TestXPathRecordReader.java test

I have spent a few hours playing Whac-A-Mole with classpath and source settings. In the end I got it down to zero flags, but I then added some test cases and the scanner thing went off and flagged dozens of files with undefined classes. I removed my change, but the rescan did not remove the dozens of flagged files.

PS: I am a total NetBeans newbie.

--
===
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021
Unix/Mac/Intranets             Analyst Programmer
===

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Netbeans and Solr : Whac-A-Mole
> This testcase is quite independent of anything in Solr. It is a standalone utility and the only dependency is stax.
>
> Disclaimer: I run these testcases from IntelliJ and the command line.
>
> BTW, are you using XPathRecordReader outside of DIH?

Noble,

Is there a better way to test and play with XPathRecordReader.java other than

  ant -Dtestcase=TestXPathRecordReader test

which takes 8 secs to run here?

I am not using XPathRecordReader outside of DIH, but am looking to see how I would add support for xpaths such as //a.

Fergus.

On Mon, Sep 7, 2009 at 3:26 PM, Fergus McMenemie fer...@twig.me.uk wrote:

Hello all,

I would appreciate help from somebody who has set up Solr within NetBeans. I want to do more work with DIH, particularly its XPathEntityProcessor stuff. I wish to perform the following from within the IDE:

  ant -Dtestcase=TestXPathRecordReader.java test

I have spent a few hours playing Whac-A-Mole with classpath and source settings. In the end I got it down to zero flags, but I then added some test cases and the scanner thing went off and flagged dozens of files with undefined classes. I removed my change, but the rescan did not remove the dozens of flagged files.

PS: I am a total NetBeans newbie.

--
===
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021
Unix/Mac/Intranets             Analyst Programmer
===

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com

--
===
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021
Unix/Mac/Intranets             Analyst Programmer
===
Re: Netbeans and Solr : Whac-A-Mole
On Mon, Sep 7, 2009 at 5:58 PM, Fergus McMenemie fer...@twig.me.uk wrote:

>> This testcase is quite independent of anything in Solr. It is a standalone utility and the only dependency is stax.
>>
>> Disclaimer: I run these testcases from IntelliJ and the command line.
>>
>> BTW, are you using XPathRecordReader outside of DIH?
>
> Noble,
>
> Is there a better way to test and play with XPathRecordReader.java other than
>
>   ant -Dtestcase=TestXPathRecordReader test
>
> which takes 8 secs to run here?
>
> I am not using XPathRecordReader outside of DIH, but am looking to see how I would add support for xpaths such as //a.

The target takes a lot of time because it has to go through all the test cases in core and contribs trying to match the value given in -Dtestcase. You could also do

  ant -Dtestcase=TestXPathRecordReader test-contrib

which should be a little faster. I run individual test cases directly through IDEA, which avoids these extra steps.

--
Regards,
Shalin Shekhar Mangar.
Re: Netbeans and Solr : Whac-A-Mole
> On Mon, Sep 7, 2009 at 5:58 PM, Fergus McMenemie fer...@twig.me.uk wrote:
>
>>> This testcase is quite independent of anything in Solr. It is a standalone utility and the only dependency is stax.
>>>
>>> Disclaimer: I run these testcases from IntelliJ and the command line.
>>>
>>> BTW, are you using XPathRecordReader outside of DIH?
>>
>> Noble,
>>
>> Is there a better way to test and play with XPathRecordReader.java other than
>>
>>   ant -Dtestcase=TestXPathRecordReader test
>>
>> which takes 8 secs to run here?
>>
>> I am not using XPathRecordReader outside of DIH, but am looking to see how I would add support for xpaths such as //a.
>
> The target takes a lot of time because it has to go through all the test cases in core and contribs trying to match the value given in -Dtestcase. You could also do
>
>   ant -Dtestcase=TestXPathRecordReader test-contrib
>
> which should be a little faster. I run individual test cases directly through IDEA, which avoids these extra steps.

Shalin,

Hmm, 6 seconds. I looked up IDEA and I guess I should be able to use it for free while working on Solr. Is it easier to set up, and is the learning curve gentler?

Regards
Fergus.

--
===
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021
Unix/Mac/Intranets             Analyst Programmer
===
Re: capturing field length into a stored document field
Here's a hybrid solution. Add a filter to the field in question that counts all the tokens and at the end outputs a token of the form __numtokens.numTokens__. This eliminates the need to retokenize the field. Also, bucket the numbers, either by some factor of ten or base 2, so that there aren't so many different token types produced. This has a space advantage over storing the count in a field, especially since the information isn't needed at query time anyway.

mike.schultz wrote:

For various statistics I collect from an index it's important for me to know the length (measured in tokens) of a document field. I can get that information to some degree from the norms for the field, but a) the resolution isn't that great, and b) more importantly, if boosts are used it's almost impossible to get lengths from this.

Here are two ideas I was thinking about that maybe someone can comment on.

1) Use copyField to copy the field in question, fieldA, to an additional field, fieldALength, which has an extra filter that just counts the tokens and only outputs a token representing the length of the field. This has the disadvantage of retokenizing basically the whole document (because the field in question is basically the body). Plus I would think littering the term space with these tokens might be bad for performance, I'm not sure.

2) Add a filter to the field in question which again counts the tokens. This filter allows the regular tokens to be indexed as usual but somehow manages to get the token count into a stored field of the document. This has the advantage of not having to retokenize the field, and instead of littering the token space, the count becomes docdata for each doc. Can this be done? Maybe using threadLocal to temporarily store the count?

Thanks.

--
View this message in context: http://www.nabble.com/capturing-field-length-into-a-stored-document-field-tp25297690p25339584.html
Sent from the Solr - User mailing list archive at Nabble.com.
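
The counting-and-bucketing step of the hybrid idea can be sketched outside of Lucene like this. It is a plain-Python illustration, not a TokenFilter implementation. The `__numtokens.N__` shape follows the message above, while the function names and the choice of power-of-two buckets are assumptions.

```python
def length_token(tokens):
    """Build the marker token for a field, with the count rounded down
    to a power of two so only a few distinct marker terms exist."""
    n = len(tokens)
    bucket = 0 if n == 0 else 1 << (n.bit_length() - 1)
    return f"__numtokens.{bucket}__"


def with_length_token(tokens):
    """Pass the regular tokens through unchanged and append the marker."""
    tokens = list(tokens)
    return tokens + [length_token(tokens)]


print(with_length_token("the quick brown fox jumps".split()))
```

A five-token field lands in the bucket for 4, and every length from 4 to 7 shares that marker. That sharing is what keeps the term space small, at the cost of resolution.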
Re: Can solr return documents which don't match a query?
Or add a new tag, "NO TAG", at index time and search for that. (That is, if you have the possibility to reindex at least that one first time.) It would make things clearer for developers/admins looking at the stuff in some months...

Chantal

Yonik Seeley wrote:

> return all documents which either match query1 or don't match query 2

query1 (*:* -query2)

-Yonik
http://www.lucidimagination.com
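
Read as set algebra, Yonik's query is a union of the query1 matches with "everything minus the query2 matches". A small sketch with made-up document ids (the function name is hypothetical):

```python
def match_q1_or_not_q2(all_docs, q1_hits, q2_hits):
    """query1 (*:* -query2): docs matching query1, plus every doc
    that does not match query2."""
    return q1_hits | (all_docs - q2_hits)


all_docs = {1, 2, 3, 4}
q1_hits = {1, 2}
q2_hits = {2, 3}

print(sorted(match_q1_or_not_q2(all_docs, q1_hits, q2_hits)))
```

Document 3 is the only one left out: it matches query2 and does not match query1. The `*:*` term supplies the "all documents" set that the negative clause subtracts from.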