Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/266#discussion_r11325597
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala ---
@@ -26,8 +26,7 @@ import scala.reflect.ClassTag
*
* Under the hood, it uses our OpenHashSet implementation.
*/
-private[spark]
-class OpenHashMap[K >: Null : ClassTag, @specialized(Long, Int, Double) V: ClassTag](
+class OpenHashMap[K : ClassTag, @specialized(Long, Int, Double) V: ClassTag](
--- End diff ---
To elaborate a bit, `RDD.countByValue()` uses `OpenHashMap` in this patch,
and it would only compile if `RDD`'s type parameter were also bounded by `>:
Null`. That alone makes a whole lot of code require the same bound.
I am new to Scala, but does dropping the bound actually hurt? It means
that one could use value types as keys, which seems conceptually OK. The class
supports mapping `null` to a value with the `nullValue` field, and that is used
only where a method receives a `null` key. For value types, the key will simply
never be `null`. Even lines like `null.asInstanceOf[V]` work fine for value
types; for `Int` you get 0, for example.
So my read is that the class permits `null` keys (i.e. reference types) but
does not require them. If that's right, I think the simplest thing is to just
permit value types as keys.
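As a quick sanity check of the `null.asInstanceOf[V]` point, here is a minimal standalone sketch (plain Scala, no Spark dependency; `defaultOf` is just an illustrative helper, not part of the patch): casting `null` to a value type yields that type's default value rather than throwing.

```scala
object NullCastDemo {
  // Generic "null as V" helper, mirroring what OpenHashMap does internally.
  // Due to erasure this returns a boxed null, which unboxes to the value
  // type's default at the call site.
  def defaultOf[V]: V = null.asInstanceOf[V]

  def main(args: Array[String]): Unit = {
    val i: Int = defaultOf[Int]         // unboxes to 0
    val d: Double = defaultOf[Double]   // unboxes to 0.0
    val b: Boolean = defaultOf[Boolean] // unboxes to false
    println(s"$i $d $b")
  }
}
```

So nothing in the `null`-handling paths would blow up for value-type keys; those paths are simply never reached, and the casts degrade to default values.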