Hi guys, I keep running into a problem that I can't solve. I thought it was caused by the HTML anchors containing the name of the hashtag, and thus repeating it, but that's not it.
The use case is:
1 - I need to treat hashtags as tokens.
2 - The hashtag has to show up in the facets.

Right now, if I index this text:

"Action, sanctions or diplomacy: which way forward for the #EU <http://twitter.com/search?q=%23EU> & #Ukraine <http://twitter.com/search?q=%23Ukraine> ? Tell us @LinkedIn <http://twitter.com/LinkedIn> debate http://t.co/umf9olxH9f <http://t.co/umf9olxH9f> "

I get the following tokens (see the image for more detail):

action sanction diplomacy forward #eu #ukraine tell linkedin debate umf9olxh9f ace bate
<http://lucene.472066.n3.nabble.com/file/n4121389/solr.png>

Then, when I look at the facets after indexing, I find that (for Ukraine) the facet count has increased for both "Ukraine" and "#Ukraine", instead of only for "#Ukraine".

Does anyone have any idea why this is happening?
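To make the behaviour I expect concrete, here is a minimal Python sketch. It is not Solr code and not my actual analyzer chain; the names TOKEN_RE, tokenize and facet_counts are made up for illustration, and it skips stop words and stemming. It only shows the outcome I am after: the hashtag stays a single token, and only that token is counted in the facets.

```python
import re
from collections import Counter

# Hypothetical pattern: an optional leading '#' stays attached to the word,
# so "#Ukraine" is one token instead of "#" plus "Ukraine".
TOKEN_RE = re.compile(r"#?\w+")

def tokenize(text):
    # Drop URLs (with or without angle brackets) so that fragments such as
    # "%23Ukraine" inside a link are never turned into tokens.
    text = re.sub(r"<?https?://\S+>?", " ", text)
    return [t.lower() for t in TOKEN_RE.findall(text)]

def facet_counts(docs):
    # Count each term once per document, roughly like a facet on a
    # multi-valued field.
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return counts

docs = [
    "which way forward for the #EU <http://twitter.com/search?q=%23EU> "
    "& #Ukraine <http://twitter.com/search?q=%23Ukraine> ? Tell us @LinkedIn debate"
]
counts = facet_counts(docs)
print(counts["#ukraine"], counts["ukraine"])
```

With this sketch only "#ukraine" gets a count and plain "ukraine" stays at zero, which is what I would like to see in the Solr facets as well.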