I'm a bit confused about the expected behavior of union. I'm running on 8 
cores. I have an RDD that collects cluster associations (cluster id, content 
id, distance) for internal clusters as well as leaf clusters, since I'm doing 
hierarchical k-means and need all distances to sort documents appropriately 
upon examination. 
It appears that union simply adds the items in the argument to the RDD instance 
the method is called on, rather than just returning a new RDD. If I want to use 
union this way, as more of an add/append, should I be capturing the return value 
and releasing it from memory? I need help clarifying the semantics here. 
Also, in another related thread someone mentioned calling coalesce after union. 
Would I need to do the same on the RDD instance I'm calling union on? 
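
To make the question concrete, here is a minimal sketch of how the behavior 
could be checked, assuming the Scala API and a local master. The object name, 
RDD names, sample tuples and partition counts are all made up for illustration. 
RDDs are documented as immutable, so union would be expected to return a new RDD 
and leave the receiver untouched; the counts below should confirm or contradict 
what I think I'm seeing. 

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical check of union semantics; not taken from my actual job.
object UnionSketch {
  def main(args: Array[String]): Unit = {
    // local[8] is an assumption that mirrors the 8 cores mentioned above.
    val conf = new SparkConf().setAppName("union-sketch").setMaster("local[8]")
    val sc = new SparkContext(conf)
    try {
      // (cluster id, content id, distance); the values are invented.
      val internal = sc.parallelize(Seq((1, 10, 0.25), (1, 11, 0.40)), 4)
      val leaves   = sc.parallelize(Seq((2, 10, 0.10), (3, 11, 0.05)), 4)

      // Capture the return value: this is the RDD that holds both inputs.
      val combined = internal.union(leaves)

      // If union does not mutate the receiver, internal still has 2 rows
      // and combined has 4.
      println(s"internal rows: ${internal.count()}")
      println(s"combined rows: ${combined.count()}")

      // Without a shared partitioner, union keeps the partitions of both
      // inputs, so the partition count grows with every union.
      println(s"combined partitions: ${combined.partitions.length}")

      // coalesce is called on the RDD returned by union, and itself
      // returns yet another RDD with at most the requested partitions.
      val compacted = combined.coalesce(4)
      println(s"compacted partitions: ${compacted.partitions.length}")
    } finally {
      sc.stop()
    }
  }
}
```

If that is right, the coalesce question only concerns the RDD returned by union, 
not the instance union was called on. 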
Perhaps a method such as append would be useful and clearer. 
                  
