Hi guys,

I keep running into a problem that I can't solve. I first thought it was
caused by the HTML anchors containing the hashtag name, and thus repeating
it, but that is not the case.

So the use case is:
1 - I need to consider hashtags as tokens.
2 - The hashtag has to show up in the facets.

Right now if I index this text:
"Action, sanctions or diplomacy: which way forward for the  #EU
<http://twitter.com/search?q=%23EU>   &amp;  #Ukraine
<http://twitter.com/search?q=%23Ukraine>  ? Tell us  @LinkedIn
<http://twitter.com/LinkedIn>   debate  http://t.co/umf9olxH9f
<http://t.co/umf9olxH9f>  "

I get the tokens as follows (see image for more detail):
action  sanction        diplomacy       forward #eu     #ukraine        tell    
linkedin        debate
umf9olxh9f
ace     bate

<http://lucene.472066.n3.nabble.com/file/n4121389/solr.png> 

Then, when I look at the facets after indexing, I find that (for Ukraine)
the facet count is increased for both "Ukraine" and "#Ukraine", instead of
only for "#Ukraine".
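
To make the expected behaviour concrete, here is a rough Python sketch of
what I am after. It is not Solr code and says nothing about how my analyzer
chain actually works: the regexes and the names tokenize and facet_counts are
made up for illustration. It only assumes that URLs are dropped and that a
"#" followed by word characters stays a single token.

```python
import html
import re
from collections import Counter

# Hypothetical patterns, for illustration only.
URL_RE = re.compile(r"<?https?://\S+>?")   # bare or <bracketed> URLs
TOKEN_RE = re.compile(r"#\w+|\w+")         # '#tag' is kept as one token


def tokenize(text):
    """Decode HTML entities, drop URLs, lowercase, keep hashtags whole."""
    text = html.unescape(text)
    text = URL_RE.sub(" ", text)
    return [token.lower() for token in TOKEN_RE.findall(text)]


def facet_counts(docs):
    """Count each distinct token once per document, like a facet count."""
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return counts


doc = ("which way forward for the  #EU <http://twitter.com/search?q=%23EU>"
       "   &amp;  #Ukraine <http://twitter.com/search?q=%23Ukraine>  ?")
counts = facet_counts([doc])

# What I expect: only the hashtag form is counted.
print(counts["#ukraine"], counts["ukraine"])
```

In this sketch the plain term "ukraine" is never produced, so only "#ukraine"
gets a count. That is the result I would like to see in the Solr facets.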

Does anyone have any idea why this is happening?
