Re: Deleting Hudi Partitions
Yes, that would work. You would typically add the option below to the dataframe write to use insert overwrite (InsertOverwrite is a new API, and I haven't updated the documentation yet):

- hoodie.datasource.write.operation: insert_overwrite

Let me know if you have any questions.

@Balaji Thanks for creating the follow-up ticket. I agree this can be supported in a much simpler way using the insert_overwrite primitive.

On Wed, Oct 21, 2020 at 6:19 PM Balaji Varadarajan wrote:
> cc Satish, who implemented Insert Overwrite support.
>
> We have recently landed Insert Overwrite support in Hudi. Partition-level
> deletion is a logical extension of this feature but is not available yet.
> I have added a JIRA to track this:
> https://issues.apache.org/jira/browse/HUDI-1350
>
> Meanwhile, using the master branch, you can do this in 2 steps. You can
> generate a record for each partition you want to delete and commit the
> batch. This would essentially truncate the partition to 1 record. You can
> then issue a hard delete on that record. By keeping cleaner retention to
> 1, you can essentially clean up the files in the directory.
>
> Satish - can you chime in and see if this makes sense and whether you are
> seeing any issues with this?
>
> Thanks,
> Balaji.V
>
> On Tuesday, October 20, 2020, 11:31:45 PM PDT, selvaraj periyasamy <
> selvaraj.periyasamy1...@gmail.com> wrote:
>
> > Team,
> >
> > I have a COW table which has sub-partition columns Date/Hour. For some
> > use cases, I need to completely remove a few partitions (removing a few
> > hours alone). Hudi maintains metadata info. Manually removing the
> > folders, as well as the entries in the Hive metastore, may mess up the
> > Hudi metadata. What is the best way to do this?
> >
> > Thanks,
> > Selva
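For readers following the thread, the insert_overwrite option mentioned in the reply above might be wired into a Spark dataframe write roughly as follows. This is only a sketch, not code from the thread: the table name, column names, and helper functions are hypothetical, and the write helper assumes a Spark session with a Hudi bundle that includes Insert Overwrite support, so it is defined but not called here.

```python
def insert_overwrite_options(table_name, record_key, partition_path, precombine):
    """Build Hudi writer options that select the insert_overwrite operation.

    All argument values are supplied by the caller; the names used in the
    example call below are hypothetical.
    """
    return {
        "hoodie.table.name": table_name,
        # The option named in the reply above.
        "hoodie.datasource.write.operation": "insert_overwrite",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_path,
        "hoodie.datasource.write.precombine.field": precombine,
    }


def overwrite_partitions(df, base_path, options):
    """Write a Spark DataFrame so that the partitions present in df are overwritten.

    Assumption: df is a pyspark.sql.DataFrame and the Hudi Spark bundle is on
    the classpath. Not executed in this sketch.
    """
    df.write.format("hudi").options(**options).mode("append").save(base_path)


# Hypothetical table and column names, for illustration only.
opts = insert_overwrite_options("events", "event_id", "partition_path", "ts")
```

Only the partitions that appear in the incoming dataframe are affected by the overwrite, which is why this primitive is a natural building block for partition-level operations.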
Re: Deleting Hudi Partitions
Fixing Satish's incorrect email address.

On Wednesday, October 21, 2020, 06:19:43 PM PDT, Balaji Varadarajan wrote:

cc Satish, who implemented Insert Overwrite support.

We have recently landed Insert Overwrite support in Hudi. Partition-level deletion is a logical extension of this feature but is not available yet. I have added a JIRA to track this: https://issues.apache.org/jira/browse/HUDI-1350

Meanwhile, using the master branch, you can do this in 2 steps. You can generate a record for each partition you want to delete and commit the batch. This would essentially truncate the partition to 1 record. You can then issue a hard delete on that record. By keeping cleaner retention to 1, you can essentially clean up the files in the directory.

Satish - can you chime in and see if this makes sense and whether you are seeing any issues with this?

Thanks,
Balaji.V

On Tuesday, October 20, 2020, 11:31:45 PM PDT, selvaraj periyasamy wrote:

Team,

I have a COW table which has sub-partition columns Date/Hour. For some use cases, I need to completely remove a few partitions (removing a few hours alone). Hudi maintains metadata info. Manually removing the folders, as well as the entries in the Hive metastore, may mess up the Hudi metadata. What is the best way to do this?

Thanks,
Selva
Re: Deleting Hudi Partitions
cc Satish, who implemented Insert Overwrite support.

We have recently landed Insert Overwrite support in Hudi. Partition-level deletion is a logical extension of this feature but is not available yet. I have added a JIRA to track this: https://issues.apache.org/jira/browse/HUDI-1350

Meanwhile, using the master branch, you can do this in 2 steps. You can generate a record for each partition you want to delete and commit the batch. This would essentially truncate the partition to 1 record. You can then issue a hard delete on that record. By keeping cleaner retention to 1, you can essentially clean up the files in the directory.

Satish - can you chime in and see if this makes sense and whether you are seeing any issues with this?

Thanks,
Balaji.V

On Tuesday, October 20, 2020, 11:31:45 PM PDT, selvaraj periyasamy wrote:

Team,

I have a COW table which has sub-partition columns Date/Hour. For some use cases, I need to completely remove a few partitions (removing a few hours alone). Hudi maintains metadata info. Manually removing the folders, as well as the entries in the Hive metastore, may mess up the Hudi metadata. What is the best way to do this?

Thanks,
Selva
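Balaji's two-step workaround above could be sketched along these lines. This is an illustrative reading of the message, not a tested procedure: it assumes the first commit uses insert_overwrite so that each target partition is truncated to a single placeholder record, and the second commit uses the delete operation with cleaner retention set to 1. The column names, the placeholder key, and the partition paths are all hypothetical, and the actual Spark writes are left out.

```python
def placeholder_rows(partitions, key="__placeholder__"):
    """One dummy record per partition to be removed (hypothetical schema)."""
    return [{"record_key": key, "partition_path": p, "ts": 0} for p in partitions]


def two_step_options(base_options):
    """Return (truncate_options, hard_delete_options) for the two commits.

    Assumption: step 1 is an insert_overwrite of the placeholder rows, and
    step 2 is a hard delete of those same rows with cleaner retention of 1.
    """
    truncate = {
        **base_options,
        "hoodie.datasource.write.operation": "insert_overwrite",
    }
    hard_delete = {
        **base_options,
        "hoodie.datasource.write.operation": "delete",
        # Keep only the latest commit so the cleaner can remove older files.
        "hoodie.cleaner.commits.retained": "1",
    }
    return truncate, hard_delete


# Hypothetical table settings and hour-level partition paths.
base = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.recordkey.field": "record_key",
    "hoodie.datasource.write.partitionpath.field": "partition_path",
    "hoodie.datasource.write.precombine.field": "ts",
}
rows = placeholder_rows(["2020/10/20/05", "2020/10/20/06"])
truncate_opts, delete_opts = two_step_options(base)
```

In practice, the placeholder rows would be turned into a dataframe and written once with the first set of options, then written again with the second set to delete them.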
Deleting Hudi Partitions
Team,

I have a COW table which has sub-partition columns Date/Hour. For some use cases, I need to completely remove a few partitions (removing a few hours alone). Hudi maintains metadata info. Manually removing the folders, as well as the entries in the Hive metastore, may mess up the Hudi metadata. What is the best way to do this?

Thanks,
Selva