[GitHub] spark pull request: [SPARK-7251][Spark Core] Perform sequential sc...

JoshRosen Thu, 07 May 2015 12:10:08 -0700

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/5836#issuecomment-99984953
  
    I decided to try to benchmark this using the following harness: 
https://gist.github.com/JoshRosen/286b26494ab27e657051
    
    When I ran this benchmark, I immediately ran into a bug:
    
    ```scala
    [error] Exception in thread "main" java.lang.NullPointerException
    [error]     at 
org.apache.spark.unsafe.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:194)
    [error]     at 
org.apache.spark.unsafe.map.BytesToBytesMap$1.next(BytesToBytesMap.java:195)
    [error]     at 
org.apache.spark.unsafe.map.BytesToBytesMap$1.next(BytesToBytesMap.java:174)
    [error]     at 
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$.runBenchmark(BytesToBytesMapIterationBenchmark.scala:36)
    [error]     at 
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2$$anonfun$apply$mcVI$sp$1$$anonfun$apply$1.apply$mcVI$sp(BytesToBytesMapIterationBenchmark.scala:64)
    [error]     at 
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    [error]     at 
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2$$anonfun$apply$mcVI$sp$1.apply(BytesToBytesMapIterationBenchmark.scala:63)
    [error]     at 
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2$$anonfun$apply$mcVI$sp$1.apply(BytesToBytesMapIterationBenchmark.scala:58)
    [error]     at scala.collection.immutable.List.foreach(List.scala:318)
    [error]     at 
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2.apply$mcVI$sp(BytesToBytesMapIterationBenchmark.scala:58)
    [error]     at 
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2.apply(BytesToBytesMapIterationBenchmark.scala:57)
    [error]     at 
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$$anonfun$main$2.apply(BytesToBytesMapIterationBenchmark.scala:57)
    [error]     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    [error]     at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    [error]     at 
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    [error]     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    [error]     at 
org.apache.spark.sql.BytesToBytesMapIterationBenchmark$.main(BytesToBytesMapIterationBenchmark.scala:57)
    [error]     at 
org.apache.spark.sql.BytesToBytesMapIterationBenchmark.main(BytesToBytesMapIterationBenchmark.scala)
    ```
    
    I noticed many issues in this patch, which implies that we need better unit 
tests for this code (the external stress test harnesses manage to test this 
pretty well, but we should integrate those into our CI pipeline).
    
    Also, we need to have a better description of the changes; this PR 
shouldn't have an empty description.  The code also needs comments to make it 
clear that we first iterate through data pages, then through records in a page, 
rolling over to the next page or stopping when we encounter the special 
negative length.  We also need comments on some of the various `+8` calls to 
clarify that we're reserving space for the last length marker.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request: [SPARK-7251][Spark Core] Perform sequential sc...

Reply via email to