Hi Team,
I am new to Spark and have a basic question. After calling persist(), why does
the size shown in the Spark UI not match the actual file size?

Actual file size of "/user/rohit_prusty/application2.log": 39 KB

Code snippet:
===========
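# Assumes `sc` is the SparkContext provided by the pyspark shell.
logData = sc.textFile("/user/rohit_prusty/application2.log")
logData.persist()   # mark for caching (default storage level: MEMORY_ONLY)
logData.count()     # action: reads the file and materializes the cache
errors = logData.filter(lambda line: "ERROR" in line)
errors.persist()    # cache only the lines containing "ERROR"
errors.count()      # action: materializes the filtered RDD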

Output in Spark UI
==================
logData RDD takes 2.1 KB
errors RDD takes 1.3 KB
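
As a cross-check, the raw byte counts can be measured outside Spark and set against the sizes the Spark UI reports. The following is only a minimal sketch, assuming a local copy of the log is available. The helper name raw_sizes and the small sample file are hypothetical and not part of the code above.

```python
import os
import tempfile


def raw_sizes(path):
    """Return (file_bytes, error_bytes) for a local text file.

    file_bytes  -- size of the whole file on disk
    error_bytes -- total bytes of the lines that contain "ERROR"
    """
    file_bytes = os.path.getsize(path)
    error_bytes = 0
    with open(path, "rb") as f:
        for line in f:
            if b"ERROR" in line:
                error_bytes += len(line)
    return file_bytes, error_bytes


# Hypothetical three-line sample standing in for application2.log.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"INFO start\nERROR disk full\nINFO done\n")
    sample_path = tmp.name

total, errors_only = raw_sizes(sample_path)
print("whole file:", total, "bytes; ERROR lines:", errors_only, "bytes")
os.remove(sample_path)
```

These numbers describe the uncompressed text only. The cached size in the Spark UI depends on how the RDD is stored, so the two need not agree.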

Regards
Rohit Kumar Prusty
+91-9884070075
