Re: Term vs. token

2016-04-20 Thread Jack Krupansky
I gather that "term" is the proper technical term within the Vector Space Model (TF-IDF) and BM25 similarity, so it may simply be a question of where the boundary lies in Lucene between VSM processing and everything else, such as the source for documents and queries. -- Jack Krupansky

Re: Term vs. token

2016-04-20 Thread Ryan Josal
My understanding is that a Term is made up of a "token" and a field. With that definition the documentation makes sense to me: return the count of tokens in a field, for example. But a couple of the references you listed don't match that definition, like the number of tokens in a collection.
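To make the distinction above concrete, here is a minimal sketch in Python. It is not Lucene code: the `Term` tuple, `tokenize`, `field_length`, and `term_frequencies` are hypothetical names invented for illustration, and the whitespace tokenizer is an assumption standing in for a real analyzer. It only shows how "count of tokens in a field" differs from "number of distinct terms", and why the same token text in two fields gives two different terms.

```python
from collections import Counter
from typing import NamedTuple


class Term(NamedTuple):
    """Hypothetical (field, text) pair for illustration; not Lucene's Term class."""
    field: str
    text: str


def tokenize(text: str) -> list[str]:
    # Naive lowercase + whitespace split; a real analyzer does far more.
    return text.lower().split()


def field_length(text: str) -> int:
    # Number of token occurrences in one field of one document.
    return len(tokenize(text))


def term_frequencies(field: str, text: str) -> Counter:
    # Each distinct (field, token text) pair is one term;
    # the counter value is how often that term occurs in the field.
    return Counter(Term(field, tok) for tok in tokenize(text))


doc = {"title": "Term vs token", "body": "a token is not a term"}
body_tf = term_frequencies("body", doc["body"])

print(field_length(doc["body"]))  # token occurrences in the "body" field
print(len(body_tf))               # distinct terms in the "body" field
print(Term("title", "term") == Term("body", "term"))  # same text, different field
```

Under this reading, a length norm based on "number of tokens in a field" counts occurrences, while statistics such as document frequency are kept per term.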

Term vs. token

2016-04-20 Thread Jack Krupansky
Looking at the Lucene Similarity Javadoc, I see some references to tokens, but I am wondering if that is intentional or whether those should really be references to terms. For example: lengthNorm - computed when the document is added to the index in accordance with the number