Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2351#discussion_r17993979
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2081,8 +2085,44 @@ def _jrdd(self):
                                                  self.ctx.pythonExec,
                                                  broadcast_vars, self.ctx._javaAccumulator)
             self._jrdd_val = python_rdd.asJavaRDD()
    +
    +        if enable_profile:
    +            self._id = self._jrdd_val.id()
    +            if not self._created_profiles:
    +                dump_path = self.ctx._conf.get("spark.python.profile.dump")
    +                if dump_path:
    +                    atexit.register(PipelinedRDD.dump_profile, dump_path)
    +                else:
    +                    atexit.register(PipelinedRDD.show_profile)
    +            self._created_profiles.append((self._id, profileStats))
    +
             return self._jrdd_val
     
    +    @classmethod
    +    def show_profile(cls):
    +        """ Print the profile stats to stdout """
    +        for id, acc in cls._created_profiles:
    +            stats = acc.value
    +            if stats:
    +                print "=" * 60
    +                print "Profile of RDD<id=%d>" % id
    +                print "=" * 60
    +                stats.sort_stats("tottime", "cumtime").print_stats()
    +        cls._created_profiles = []
    --- End diff --
    
    Should we document that this clears the created profiles?  I guess the 
intended usage here is to run a bunch of code interactively, then print the 
profiling data for everything that has run since the last call to 
`show_profile`.
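
To make the "print, then clear" behaviour concrete, here is a minimal standalone sketch. It is not the PR's code: `ProfileRegistry`, `record`, and the `stream` parameter are hypothetical names, and it profiles with `cProfile` directly rather than through Spark accumulators. It only illustrates that a second call to `show_profile` reports nothing unless new profiles were collected in between.

```python
import cProfile
import pstats
import sys


class ProfileRegistry(object):
    """Hypothetical stand-in for the class-level profile list in the diff."""

    # Shared by all callers, like PipelinedRDD._created_profiles in the diff.
    _created_profiles = []

    @classmethod
    def record(cls, rdd_id, func):
        """Run func() under cProfile and remember the profiler under rdd_id."""
        profiler = cProfile.Profile()
        profiler.runcall(func)
        cls._created_profiles.append((rdd_id, profiler))

    @classmethod
    def show_profile(cls, stream=None):
        """Print all collected profiles, then clear them.

        Returns the number of profiles printed, so callers can tell
        whether anything was collected since the previous call.
        """
        out = stream if stream is not None else sys.stdout
        shown = 0
        for rdd_id, profiler in cls._created_profiles:
            out.write("=" * 60 + "\n")
            out.write("Profile of RDD<id=%d>\n" % rdd_id)
            out.write("=" * 60 + "\n")
            stats = pstats.Stats(profiler, stream=out)
            stats.sort_stats("tottime", "cumtime").print_stats()
            shown += 1
        # The side effect in question: the list is emptied after printing.
        cls._created_profiles = []
        return shown
```

With this sketch, calling `ProfileRegistry.record(1, some_function)` followed by `ProfileRegistry.show_profile()` prints one profile, and an immediate second `show_profile()` prints nothing and returns 0, which is the side effect a docstring should mention.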


