Re: Implementing a customised tokenizer

2014-03-11 Thread epnRui
Hi Ahmet,

I think the expungeDeletes is done automatically through SolrJ, so I don't
think it was that.
The problem apparently solved itself. I wonder if it has to do with an
automatic optimization of the Solr indexes?
Otherwise it was something similar to an XY problem :P

Thanks for the help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implementing-a-customised-tokenizer-tp4121355p4122864.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Implementing a customised tokenizer

2014-03-11 Thread Furkan KAMACI
Hi;

I suggest you look at the source code. NGramTokenizer.java has some
explanations in its comments, and they may help you.
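For instance, a minimal sketch of driving NGramTokenizer directly and
printing each gram (assuming a Lucene 4.x classpath; the sample input is
just an illustration):

import java.io.StringReader;

import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class NGramDemo {
    public static void main(String[] args) throws Exception {
        // Emit 2- and 3-character grams of the sample input.
        NGramTokenizer tokenizer =
                new NGramTokenizer(Version.LUCENE_47, new StringReader("solr"), 2, 3);
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);

        tokenizer.reset();                   // required before the first incrementToken()
        while (tokenizer.incrementToken()) { // advance to the next gram
            System.out.println(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
    }
}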

Thanks;
Furkan KAMACI





Re: Implementing a customised tokenizer

2014-03-11 Thread Ahmet Arslan
Hi,

expungeDeletes (default false) is not done automatically through SolrJ.
Please see: https://issues.apache.org/jira/browse/SOLR-1487

During a segment merge, deleted terms are purged. That's why the problem
solved itself.
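For reference, a minimal SolrJ sketch of an explicit commit that sets the
flag (assuming the Solr 4.x SolrJ API; the core URL is a placeholder):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class ExpungingCommit {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: point this at your own core.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // A plain server.commit() does not set expungeDeletes; the flag has
        // to be passed on the update request itself.
        UpdateRequest req = new UpdateRequest();
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // waitFlush, waitSearcher
        req.setParam("expungeDeletes", "true");
        req.process(server);

        server.shutdown();
    }
}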
 

Ahmet





Re: Implementing a customised tokenizer

2014-03-07 Thread epnRui
Hi iorixxx!

Thanks for replying. I managed to get around it well enough not to need a
customized tokenizer implementation. That would be a pain in ...

Anyway, now I have another problem, which is related to the following:

- I had previously used replace-chars and replace-patterns (charFilters and
filters) at index time to replace "EP" with "European Parliament". At that
point, it increased the facet field count for "European Parliament".
Now I have a big problem: I have already deleted the document that
generated the "European Parliament" value, and still that facet field count
will not go down!! Is there a way to either remove a facet field value or
to subtract its count manually?
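In case it helps, here is roughly how I read the counts back, as a minimal
SolrJ sketch (the field name "entities" stands in for my real facet field,
and facet.mincount=1 hides values whose live-document count has dropped to
zero):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetCountCheck {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        query.addFacetField("entities");   // stand-in facet field name
        query.setFacetMinCount(1);         // hide values with a live count of 0

        QueryResponse rsp = server.query(query);
        for (FacetField.Count value : rsp.getFacetField("entities").getValues()) {
            System.out.println(value.getName() + " -> " + value.getCount());
        }
        server.shutdown();
    }
}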

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implementing-a-customised-tokenizer-tp4121355p4121957.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Implementing a customised tokenizer

2014-03-07 Thread Ahmet Arslan
Hi,

After you delete your document, did you commit with expungeDeletes=true?

Also please see: https://people.apache.org/~hossman/#xyproblem

Ahmet






Re: Implementing a customised tokenizer

2014-03-05 Thread Ahmet Arslan
Hi Rui,

I think the ClassicTokenizerImpl.jflex file is a good start for
understanding tokenizers.

http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.jflex


Please see the other *.jflex files in the source tree.
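Since you asked for an incrementToken example: here is a minimal sketch of
a TokenFilter that rewrites each token in place (a hypothetical class,
assuming the Lucene 4.x analysis API). A Tokenizer's incrementToken follows
the same attribute-based pattern, it just produces tokens from a Reader
instead of consuming an upstream TokenStream.

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example: upper-cases every token it sees.
public final class UpperCasingFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    public UpperCasingFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;  // upstream has no more tokens
        }
        // Manipulate the current token by editing the term buffer in place.
        char[] buffer = termAtt.buffer();
        for (int i = 0; i < termAtt.length(); i++) {
            buffer[i] = Character.toUpperCase(buffer[i]);
        }
        return true;       // the attributes now hold the next token
    }
}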

But usually you can manipulate tokenizer behaviour with charFilters without
creating a new tokenizer.
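For example, a minimal sketch of the charFilter approach (assuming Lucene
4.x; the mapping is only an illustration):

import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;

public class CharFilterDemo {
    public static void main(String[] args) throws Exception {
        // Illustrative mapping, applied to the text before any tokenizer sees it.
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.add("&", " and ");
        NormalizeCharMap map = builder.build();

        Reader filtered = new MappingCharFilter(map, new StringReader("R&D department"));

        // Whatever tokenizer wraps this reader tokenizes "R and D department".
        int c;
        while ((c = filtered.read()) != -1) {
            System.out.print((char) c);
        }
        filtered.close();
    }
}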

Can you elaborate more?



On Wednesday, March 5, 2014 1:00 PM, epnRui rui_banda...@hotmail.com wrote:
I have managed to understand how to properly implement a CharFilter and a
Filter and change words with them, but I fail to understand how the
Tokenizer works...

I also can't find any tutorials on the subject...
Could you provide an example implementation of incrementToken and show how
to manipulate the tokens? Is there any documentation on this?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implementing-a-customised-tokenizer-tp4121355.html
Sent from the Solr - User mailing list archive at Nabble.com.