Hi Team,

I am new to Spark and have a basic question. After calling persist(), why does the size shown in the Spark UI not match the actual file size?

Actual file size of "/user/rohit_prusty/application2.log": 39 KB

Code snippet:
===========
logData = sc.textFile("/user/rohit_prusty/application2.log")
logData.persist()
logData.count()
errors = logData.filter(lambda line: "ERROR" in line)
errors.persist()
errors.count()

Output in SparkUI
==============
logData RDD takes 2.1 KB
errors RDD takes 1.3 KB

Regards
Rohit Kumar Prusty
+91-9884070075