Re: Implementing a customised tokenizer
Hi Ahmet, I think expungeDeletes is done automatically through SolrJ, so I don't think it was that. The problem apparently resolved itself. I wonder if it has to do with an automatic optimization of the Solr indexes? Otherwise it was something similar to an XY problem :P Thanks for the help! -- View this message in context: http://lucene.472066.n3.nabble.com/Implementing-a-customised-tokenizer-tp4121355p4122864.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Implementing a customised tokenizer
Hi; I suggest you look at the source code. NGramTokenizer.java has some explanatory comments that may help you. Thanks; Furkan KAMACI 2014-03-11 16:06 GMT+02:00 epnRui rui_banda...@hotmail.com:
Re: Implementing a customised tokenizer
Hi, expungeDeletes (default false) is not done automatically through SolrJ. Please see: https://issues.apache.org/jira/browse/SOLR-1487 During a segment merge, deleted terms are purged; that's why the problem resolved itself. Ahmet On Tuesday, March 11, 2014 4:07 PM, epnRui rui_banda...@hotmail.com wrote:
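[Editor's note: a minimal SolrJ sketch of a delete followed by a commit that sets expungeDeletes=true, against the SolrJ 4.x API; the core URL, delete query, and class name are hypothetical, not taken from this thread.]

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class ExpungeDeletesExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical core URL -- adjust to your setup.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Delete the document. A plain commit() does NOT set expungeDeletes,
        // so deleted terms can linger in the index until a segment merge.
        server.deleteByQuery("id:doc-to-remove");

        // Commit with expungeDeletes=true to purge deleted terms eagerly.
        UpdateRequest req = new UpdateRequest();
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // waitFlush, waitSearcher
        req.setParam("expungeDeletes", "true");
        req.process(server);

        server.shutdown();
    }
}
```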
Re: Implementing a customised tokenizer
Hi iorixxx! Thanks for replying. I managed to get by well enough not to need a customized tokenizer implementation. That would be a pain in ... Anyway, now I have another problem, which is the following: I had previously used replace chars and replace patterns (charFilters and filters) at index time to replace "EP" with "European Parliament". At that point, it increased the facet field count for "European Parliament". Now I have a big problem: I have already deleted the document that generated the "European Parliament" value, and still that facet field count will not decrease!! Is there a way to either remove a facet field value or decrease its count manually? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Implementing-a-customised-tokenizer-tp4121355p4121957.html Sent from the Solr - User mailing list archive at Nabble.com.
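[Editor's note: the index-time replacement described above is typically configured with a PatternReplaceCharFilterFactory in schema.xml; a sketch of what such a chain might look like — the field type name and the surrounding analyzer components are assumptions, not taken from this thread.]

```xml
<fieldType name="text_expanded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Expand the abbreviation before tokenization -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\bEP\b" replacement="European Parliament"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```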
Re: Implementing a customised tokenizer
Hi, After you deleted your document, did you commit with expungeDeletes=true? Also please see: https://people.apache.org/~hossman/#xyproblem Ahmet On Friday, March 7, 2014 1:16 PM, epnRui rui_banda...@hotmail.com wrote:
Re: Implementing a customised tokenizer
Hi Rui, I think the ClassicTokenizerImpl.jflex file is a good start for understanding tokenizers: http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.jflex Please see the other *.jflex files in the source tree. But usually you can manipulate tokenizer behaviour with charFilters without creating a new tokenizer. Can you elaborate more? On Wednesday, March 5, 2014 1:00 PM, epnRui rui_banda...@hotmail.com wrote: I have managed to understand how to properly implement and change the words in a CharFilter and a Filter, but I fail to understand how the Tokenizer works... I have also failed to find any tutorials on it. Could you provide an example implementation of incrementToken and how to manipulate the tokens? Is there any documentation on this? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Implementing-a-customised-tokenizer-tp4121355.html Sent from the Solr - User mailing list archive at Nabble.com.
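[Editor's note: the quoted question asks for an example of incrementToken(); below is a rough whitespace-splitting sketch against the Lucene 4.x Tokenizer API. The class name is hypothetical, and reading the whole input up front is a simplification for illustration, not production practice.]

```java
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public final class WhitespaceSketchTokenizer extends Tokenizer {
    // Attributes are registered once and reused for every token.
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private String buffered;  // whole input, read lazily on first call
    private int pos = 0;

    public WhitespaceSketchTokenizer(Reader input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        clearAttributes();  // reset attribute state left over from the previous token
        if (buffered == null) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) sb.append((char) c);
            buffered = sb.toString();
        }
        // Skip leading whitespace, then consume one run of non-whitespace chars.
        while (pos < buffered.length() && Character.isWhitespace(buffered.charAt(pos))) pos++;
        if (pos >= buffered.length()) return false;  // no more tokens: end of stream
        int start = pos;
        while (pos < buffered.length() && !Character.isWhitespace(buffered.charAt(pos))) pos++;
        termAtt.append(buffered, start, pos);  // set the token text
        offsetAtt.setOffset(correctOffset(start), correctOffset(pos));
        return true;  // one token produced per call
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        buffered = null;
        pos = 0;
    }
}
```

The contract is: each call to incrementToken() either populates the attributes with exactly one token and returns true, or returns false at end of stream; clearAttributes() must be called first because attribute instances are reused across calls.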