GitHub user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/6887#issuecomment-113945291
Two thoughts:
* @mengxr suggested that for a sequence of length < n, we return nothing.
That does not seem ideal since it throws out information. (I would be
surprised if I applied a transformer and got back empty sequences.) Using the
default behavior of Scala's `grouped`, which keeps a shorter final group
instead of dropping it, seems better to me.
* (future) In general, people will want to apply: Tokenizer, NGrams,
HashingTF. Later on, we should provide something which handles this directly,
rather than creating a bunch of intermediate objects.
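
To make the first point concrete, here is a minimal sketch of the two behaviors for a sequence shorter than n, using plain Scala collections rather than the transformer in this PR. The object and method names (`ShortSequenceDemo`, `keepPartial`, `dropPartial`) are hypothetical and only for illustration. It uses `sliding` for overlapping n-grams; like `grouped`, it keeps a partial window by default.

```scala
object ShortSequenceDemo {
  // Default behavior: a sequence shorter than n yields one partial window,
  // so no input tokens are thrown away.
  def keepPartial(tokens: Seq[String], n: Int): List[String] =
    tokens.iterator.sliding(n).map(_.mkString(" ")).toList

  // Alternative: withPartial(false) drops windows shorter than n,
  // so a sequence shorter than n yields nothing.
  def dropPartial(tokens: Seq[String], n: Int): List[String] =
    tokens.iterator.sliding(n).withPartial(false).map(_.mkString(" ")).toList

  def main(args: Array[String]): Unit = {
    val short = Seq("a", "b")            // fewer than n = 3 tokens
    val long = Seq("a", "b", "c", "d")

    println(keepPartial(short, 3))       // List(a b)
    println(dropPartial(short, 3))       // List()
    println(keepPartial(long, 3))        // List(a b c, b c d)
    println(short.grouped(3).toList)     // List(List(a, b))
  }
}
```

For inputs with at least n tokens the two variants agree; they differ only on short inputs, which is the case under discussion.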