Github user thunterdb commented on the issue:
https://github.com/apache/spark/pull/14299
@AnthonyTruchet thank you for the PR. This is definitely worth fixing for
large deployments. Now, as you noticed, this portion of code does not quite
abide by the best engineering practices... Instead of adding an extra layer of
nesting, would you mind make the following changes?
```scala
def fit[S <: Iterable[String]](dataset: RDD[S]): Word2VecModel = {
...
val expTable = sc.broadcast(createExpTable())
val bcVocab = sc.broadcast(vocab)
val bcVocabHash = sc.broadcast(vocabHash)
try { fit0(expTable, ...) } finally {
...
}
}
private final def fit0(...) {
// Put all the content here.
// Note that the inner code also includes some broadcasts, you may
want to fix these as well if you can
}
```
I personally agree about resource management and scala-arm. We try to keep
scala dependencies to a minimum, unfortunately, because they can be very
tedious to move from one scala version to another.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]