In TFIDFPartialVectorReducer.java:

If docFreq > maxDocFreq, the vector element at that index is not set (the term 
is ignored).
If docFreq < minDocFreq, the element at that index is still set, but the TF-IDF 
value is calculated using minDocFreq in place of the actual document frequency.
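
To make the asymmetry concrete, here is a minimal sketch of the behaviour as I 
read it. It is not the Mahout code: the class and method names are made up, the 
sparse vector is a plain Map, both thresholds are treated as raw document 
counts, and the weight is a simplified tf * ln(numDocs / df) rather than 
whatever formula Mahout actually applies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Mahout.
public class TfIdfFilterSketch {

    // Simplified weight (an assumption, not Mahout's formula).
    public static double tfIdf(double tf, long df, long numDocs) {
        return tf * Math.log((double) numDocs / df);
    }

    // termFreqs: term index -> term frequency in one document
    // docFreqs:  term index -> number of documents containing the term
    public static Map<Integer, Double> weigh(Map<Integer, Double> termFreqs,
                                             Map<Integer, Long> docFreqs,
                                             long numDocs,
                                             long minDocFreq,
                                             long maxDocFreq) {
        Map<Integer, Double> out = new HashMap<>();
        for (Map.Entry<Integer, Double> e : termFreqs.entrySet()) {
            long df = docFreqs.get(e.getKey());
            if (df > maxDocFreq) {
                continue;            // too common: element is never set
            }
            if (df < minDocFreq) {
                df = minDocFreq;     // too rare: element is kept, df is clamped
            }
            out.put(e.getKey(), tfIdf(e.getValue(), df, numDocs));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Double> tf = new HashMap<>();
        Map<Integer, Long> df = new HashMap<>();
        tf.put(0, 2.0); df.put(0, 1L);   // rarer than minDocFreq
        tf.put(1, 1.0); df.put(1, 90L);  // more common than maxDocFreq
        tf.put(2, 3.0); df.put(2, 10L);  // within both thresholds
        System.out.println(weigh(tf, df, 100L, 5L, 50L));
    }
}
```

With these made-up numbers, term 1 is dropped entirely, while term 0 is kept 
and weighted as if it appeared in 5 documents instead of 1.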

Shouldn't minDocFreq be treated the same way as maxDocFreq, i.e. by not setting 
the vector element at that index?

Also, in both cases the vector length appears to remain the same. Does that 
mean these settings have no effect on pruning the vector length (term 
reduction)?

