purijatin opened a new pull request #29482: URL: https://github.com/apache/spark/pull/29482
### What changes were proposed in this pull request?

This pull request removes the strict requirement that the vocabulary be non-empty.

Link to the discussion: http://apache-spark-user-list.1001560.n3.nabble.com/Ability-to-have-CountVectorizerModel-vocab-as-empty-td38396.html

### Why are the changes needed?

This smooths over corner cases. Without it, the user has to manipulate the data even when an empty vocabulary is a perfectly valid outcome.

Question: should we log a message when an empty vocabulary is found instead?

### Does this PR introduce _any_ user-facing change?

Possibly a slight one. If someone has wrapped the call in a try-catch to detect an empty vocabulary, that code path will no longer be triggered.

### How was this patch tested?

1. Added a test case for `fit` generating an empty vocabulary
2. Added a test case for `transform` with an empty vocabulary

Request to review: @srowen @hhbyyh

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
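
The corner case described in this pull request can be sketched in plain Python. This is only a simplified model of a count vectorizer, not Spark's implementation: the function names `fit_vocabulary` and `transform`, the `min_df` parameter, and the `(size, {index: count})` vector representation are all assumptions made for illustration. It shows how a document-frequency filter can legitimately produce an empty vocabulary, and that transforming against it yields zero-length vectors rather than an error.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical helpers: a simplified model of the idea, not Spark's code.


def fit_vocabulary(docs: List[List[str]], min_df: int = 1) -> List[str]:
    """Return the terms that appear in at least `min_df` documents."""
    # Count each term once per document (document frequency).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    kept = [(term, df) for term, df in doc_freq.items() if df >= min_df]
    # Order by document frequency (descending), then alphabetically,
    # so the result is deterministic. An empty list is a valid result.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in kept]


def transform(doc: List[str], vocabulary: List[str]) -> Tuple[int, Dict[int, int]]:
    """Return a sparse count vector as (size, {index: count})."""
    index = {term: i for i, term in enumerate(vocabulary)}
    counts: Dict[int, int] = {}
    for term in doc:
        i = index.get(term)
        if i is not None:  # terms outside the vocabulary are ignored
            counts[i] = counts.get(i, 0) + 1
    return len(vocabulary), counts


docs = [["a", "b"], ["c"], []]

# No term occurs in two documents, so the fitted vocabulary is empty.
empty_vocab = fit_vocabulary(docs, min_df=2)

# Transforming with an empty vocabulary gives size-0 vectors, not a failure.
vectors = [transform(doc, empty_vocab) for doc in docs]
```

In this model an empty vocabulary needs no special handling downstream: every document simply maps to a vector of size 0 with no entries.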
