Hi, I am new to HDFS and would like to know whether there is a design pattern for deleting matching records from files stored in HDFS. My use case is as follows:
My files are stored in HDFS and contain JSON records, each with two fields, a and b. I also have a list of (a, b) pairs, <a1,b1>, <a2,b2>, ..., <an,bn>, which is updated from time to time. I need to delete every record whose a or b value matches a value in that list. At the end of each day I would run a job that removes all such matching records from the files.

Regards,
Aditya
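
P.S. To make the matching rule concrete, here is a rough sketch in plain Python of the filtering I have in mind. It is only an illustration and not tied to any HDFS API. I am assuming one JSON record per line, scalar values for a and b, and that a record is dropped if its a equals any listed a value or its b equals any listed b value. The function name filter_records and the sample data are made up. As far as I understand, HDFS files cannot be edited in place, so I assume the real job would write the kept records to a new file and replace the old one.

```python
import json


def filter_records(lines, delete_pairs):
    """Yield the JSON lines that should be kept (hypothetical helper)."""
    # Assumption: a record is dropped if its "a" matches any listed a value
    # OR its "b" matches any listed b value.
    a_values = {a for a, _ in delete_pairs}
    b_values = {b for _, b in delete_pairs}

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("a") in a_values or record.get("b") in b_values:
            continue  # matched the delete list, so do not keep it
        yield line


if __name__ == "__main__":
    # Made-up sample data, one JSON record per line.
    sample_lines = [
        '{"a": 1, "b": "x"}',
        '{"a": 2, "b": "y"}',
        '{"a": 3, "b": "z"}',
    ]
    sample_pairs = [(1, "q"), (9, "z")]
    for kept in filter_records(sample_lines, sample_pairs):
        print(kept)
```

With the sample data above, only the record with a = 2 survives: the first record matches on a and the third matches on b.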
