Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5836#issuecomment-99984953
I decided to try to benchmark this using the following harness:
https://gist.github.com/JoshRosen/286b26494ab27e657051
When I ran this benchmark, I immediately ran into a bug:
```scala
[error] Exception in thread "main" java.lang.NullPointerException
[error] at
org.apache.spark.unsafe.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:194)
[error] at
org.apache.spark.unsafe.map.BytesToBytesMap$1.next(BytesToBytesMap.java:195)
[error] at
org.apache.spark.unsafe.map.BytesToBytesMap$1.next(BytesToBytesMap.java:174)
[error] at
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$.runBenchmark(BytesToBytesMapIterationBenchmark.scala:36)
[error] at
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2$$anonfun$apply$mcVI$sp$1$$anonfun$apply$1.apply$mcVI$sp(BytesToBytesMapIterationBenchmark.scala:64)
[error] at
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
[error] at
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2$$anonfun$apply$mcVI$sp$1.apply(BytesToBytesMapIterationBenchmark.scala:63)
[error] at
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2$$anonfun$apply$mcVI$sp$1.apply(BytesToBytesMapIterationBenchmark.scala:58)
[error] at scala.collection.immutable.List.foreach(List.scala:318)
[error] at
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2.apply$mcVI$sp(BytesToBytesMapIterationBenchmark.scala:58)
[error] at
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2.apply(BytesToBytesMapIterationBenchmark.scala:57)
[error] at
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2.apply(BytesToBytesMapIterationBenchmark.scala:57)
[error] at scala.collection.Iterator$class.foreach(Iterator.scala:727)
[error] at
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
[error] at
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
[error] at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
[error] at
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$.main(BytesToBytesMapIterationBenchmark.scala:57)
[error] at
org.apache.spark.sql.BytesToBytesMapIterationBenchmark.main(BytesToBytesMapIterationBenchmark.scala)
```
I noticed many issues in this patch, which implies that we need better unit
tests for this code (the external stress test harnesses manage to test this
pretty well, but we should integrate those into our CI pipeline).
Also, we need to have a better description of the changes; this PR
shouldn't have an empty description. The code also needs comments to make it
clear that we first iterate through data pages, then through records in a page,
rolling over to the next page or stopping when we encounter the special
negative length. We also need comments on some of the various `+8` calls to
clarify that we're reserving space for the last length marker.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]