Github user kevinpetersavage commented on the pull request:
https://github.com/apache/spark/pull/4957#issuecomment-78524314
I had a bit more of a think about this based on the comments so far. I
think what we really care about is blocks that are too large, because we
might perform super-linear algorithms on them. With this in mind I've
implemented an independent limit on block size called
"spark.streaming.maxBlockSize".
If maxBlockSize is set, no block can be larger than the value you set it
to. It defaults to Long.MaxValue so that the current behaviour is kept if it
isn't set. I've adjusted the tests so that they only check the upper bound on
the message sizes and the lower bound on the number of messages, as these are
now the only guarantees.
This seems like reasonable behaviour for the goals, and the test now
passes every time for me. Do you think this is sufficient?
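For illustration, here is a minimal sketch of how such a cap could be enforced on the receiver side. This is not the PR's actual code: the class name, `addData`, and `emitBlock` are hypothetical; only the "spark.streaming.maxBlockSize" key and its Long.MaxValue default come from the description above.

```scala
import scala.collection.mutable.ArrayBuffer

import org.apache.spark.SparkConf

// Hypothetical buffer that closes the current block before it would exceed
// the configured maximum size (assumption: this mirrors the PR's intent).
class SizeBoundedBlockBuffer(conf: SparkConf) {
  // Defaults to Long.MaxValue so behaviour is unchanged when the limit is unset.
  private val maxBlockSize: Long =
    conf.getLong("spark.streaming.maxBlockSize", Long.MaxValue)

  private val currentBlock = ArrayBuffer.empty[Array[Byte]]
  private var currentBytes = 0L

  /** Buffer one record, emitting the current block first if adding the record
    * would push it past maxBlockSize. Emitting earlier produces more (smaller)
    * blocks, which is why the tests can only guarantee an upper bound on block
    * size and a lower bound on the number of blocks. */
  def addData(record: Array[Byte])(emitBlock: Seq[Array[Byte]] => Unit): Unit = {
    if (currentBlock.nonEmpty && currentBytes + record.length > maxBlockSize) {
      emitBlock(currentBlock.toSeq)
      currentBlock.clear()
      currentBytes = 0L
    }
    currentBlock += record
    currentBytes += record.length
  }
}
```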