Sorry, I was too quick to pull the trigger on my original email. I should have added that I've tried using persist() and cache(), but no joy.
I'm doing this:

    data = sc.textFile("somedata")
    data.cache
    data.count()

but I still can't see anything in the storage?

On 31 October 2014 10:42, Sameer Farooqui <same...@databricks.com> wrote:

> Hey Stuart,
>
> The RDD won't show up under the Storage tab in the UI until it's been
> cached. Basically Spark doesn't know what the RDD will look like until it's
> cached, b/c up until then the RDD is just on disk (external to Spark). If
> you launch some transformations + an action on an RDD that is purely on
> disk, then Spark will read it from disk, compute against it and then write
> the results back to disk or show you the results at the scala/python
> shells. But when you run Spark workloads against purely on disk files, the
> RDD won't show up in Spark's Storage UI. Hope that makes sense...
>
> - Sameer
>
> On Thu, Oct 30, 2014 at 4:30 PM, Stuart Horsman <stuart.hors...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> When I load an RDD with:
>>
>> data = sc.textFile("somefile")
>>
>> I don't see the resulting RDD in the SparkContext gui on localhost:4040
>> in /storage.
>>
>> Is there something special I need to do to allow me to view this? I
>> tried both scala and python shells but same result.
>>
>> Thanks
>>
>> Stuart
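
One thing that might explain this, if the `data.cache` line in the message above was typed into the Python shell: without parentheses, Python only looks up the method and never calls it, so nothing would get marked for caching. Below is a minimal sketch of that behaviour. Note that `StandInRDD` is a hypothetical stand-in class written just for this illustration so that it runs without Spark; it is not Spark's RDD, and whether this is the actual cause here is only a guess.

```python
# StandInRDD is a hypothetical stand-in for illustration only;
# it is NOT Spark's RDD class and does no real caching.
class StandInRDD:
    def __init__(self):
        self.is_cached = False

    def cache(self):
        # Mark the dataset for caching and return self so calls can be chained.
        self.is_cached = True
        return self


data = StandInRDD()

data.cache                       # only looks up the bound method; nothing is called
flag_without_parens = data.is_cached

data.cache()                     # actually invokes the method
flag_with_parens = data.is_cached

print(flag_without_parens, flag_with_parens)  # prints: False True
```

In the Scala shell, `data.cache` without parentheses does invoke the method, so this would only apply to the Python shell. There, calling `data.cache()` before `data.count()` would be the thing to try.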