[GitHub] spark issue #14299: Ensure broadcasted variables are destroyed even in case ...

thunterdb Fri, 17 Feb 2017 12:14:17 -0800

Github user thunterdb commented on the issue:

    https://github.com/apache/spark/pull/14299
  
    @AnthonyTruchet thank you for the PR. This is definitely worth fixing for
    large deployments. Now, as you noticed, this portion of the code does not quite
    abide by the best engineering practices... Instead of adding an extra layer of
    nesting, would you mind making the following changes?
    
    ```scala
      def fit[S <: Iterable[String]](dataset: RDD[S]): Word2VecModel = {
        ...
        val expTable = sc.broadcast(createExpTable())
        val bcVocab = sc.broadcast(vocab)
        val bcVocabHash = sc.broadcast(vocabHash)
        try { fit0(expTable, ...) } finally {
          ...
        }
      }
      private final def fit0(...): Word2VecModel = {
        // Put all the content here.
        // Note that the inner code also includes some broadcasts; you may
        // want to fix these as well if you can.
      }
    ```
    
    I personally agree about resource management and scala-arm. Unfortunately,
    we try to keep Scala dependencies to a minimum, because they can be very
    tedious to move from one Scala version to another.
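
    For illustration, the try/finally cleanup suggested above can be factored
    into a small dependency-free helper instead of pulling in scala-arm. This is
    only a sketch under stated assumptions: `FakeBroadcast` is a hypothetical
    stand-in for a Spark broadcast handle so that the snippet runs without
    Spark, and `withCleanup` is a made-up name, not an existing Spark API.

    ```scala
    import scala.collection.mutable.ArrayBuffer
    import scala.util.control.NonFatal

    // Hypothetical stand-in for a broadcast handle (NOT Spark's Broadcast class).
    // It records its destruction in `log` so the behaviour can be observed.
    final class FakeBroadcast[T](val value: T, log: ArrayBuffer[String], name: String) {
      def destroy(): Unit = { log += s"destroyed $name"; () }
    }

    object BroadcastCleanup {
      // Runs `body`, then destroys every handle, even if `body` throws.
      def withCleanup[A](handles: Seq[FakeBroadcast[_]])(body: => A): A =
        try body
        finally handles.foreach { h =>
          // Swallow non-fatal cleanup failures so they do not mask an
          // exception thrown by `body`.
          try h.destroy() catch { case NonFatal(_) => () }
        }
    }

    object Demo {
      def run(): (Boolean, Seq[String]) = {
        val log = ArrayBuffer.empty[String]
        val expTable = new FakeBroadcast(Array(1.0f), log, "expTable")
        val vocabHash = new FakeBroadcast(Map("x" -> 1), log, "vocabHash")
        // Simulate a failure inside the training code.
        val result = scala.util.Try {
          BroadcastCleanup.withCleanup(Seq(expTable, vocabHash)) {
            throw new RuntimeException("training failed")
          }
        }
        (result.isFailure, log.toList)
      }
    }
    ```

    With a helper along these lines, `fit` would only need to create the
    broadcasts and pass the training code as the body, so the cleanup logic is
    written once rather than repeated in each `finally` block.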



