Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/266#discussion_r11325597
  
    --- Diff: 
core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala ---
    @@ -26,8 +26,7 @@ import scala.reflect.ClassTag
      *
      * Under the hood, it uses our OpenHashSet implementation.
      */
    -private[spark]
    -class OpenHashMap[K >: Null : ClassTag, @specialized(Long, Int, Double) V: 
ClassTag](
    +class OpenHashMap[K : ClassTag, @specialized(Long, Int, Double) V: 
ClassTag](
    --- End diff --
    
    To elaborate a bit, `RDD.countByValue()` uses `OpenHashMap` in this patch, 
and it would only compile if `RDD`'s type parameter were also bounded by `>: 
Null`. That alone would force the same bound onto a great deal of other code.
    
    I am new to Scala, but does dropping the bound actually hurt? It means 
that one could use value types as keys, which seems conceptually fine. The class 
supports mapping `null` to a value with the `nullValue` field, and that is used 
only where a method receives a `null` key. For value types, the key will simply 
never be `null`. Even lines like `null.asInstanceOf[V]` work fine for value 
types; for `Int`, for example, you get 0.
    
    So my read is that the class permits `null` keys (i.e. reference types) but 
does not require them. If that's right, I think the simplest thing is to just 
permit value types as keys.
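    To illustrate the point, here is a hypothetical, much-simplified map (not 
the actual `OpenHashMap`, and `SimpleMap` is an invented name): with no `>: Null` 
bound on the key type, value types like `Int` compile and behave sensibly as 
keys, and `null.asInstanceOf[Int]` yields the type's zero value rather than 
failing.

    ```scala
    // Simplified sketch: a map whose key type K carries no `>: Null` bound,
    // so value types (Int, Long, Double, ...) are legal key types.
    class SimpleMap[K, V](default: V) {
      private val backing = scala.collection.mutable.Map.empty[K, V]
      def update(k: K, v: V): Unit = backing(k) = v
      def apply(k: K): V = backing.getOrElse(k, default)
    }

    object Demo {
      def main(args: Array[String]): Unit = {
        // Value-type keys compile and work once the bound is dropped.
        val m = new SimpleMap[Int, String]("none")
        m(42) = "answer"
        println(m(42))                   // answer
        // Casting null to a value type is well-defined in Scala:
        // it produces the zero value, e.g. 0 for Int.
        println(null.asInstanceOf[Int])  // 0
      }
    }
    ```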

