garyli1019 commented on pull request #1722:
URL: https://github.com/apache/hudi/pull/1722#issuecomment-658842626
@vinothchandar I agree we should use @umehrot2's RDD approach.
>So you can also in parallel just proceed?
Yes, I will change this PR in parallel.
>Do you just want
garyli1019 commented on pull request #1722:
URL: https://github.com/apache/hudi/pull/1722#issuecomment-649235630
@vinothchandar Thanks for reviewing! I created tickets for the follow-up
work. All the file listing and globbing can be improved after @umehrot2's PR
is merged.
garyli1019 commented on pull request #1722:
URL: https://github.com/apache/hudi/pull/1722#issuecomment-643901550
A few major concerns here:
- Listing files is too expensive.
Solution: Switch to bootstrap file listing methods once udit's PR is merged.
Move to RFC-15 once it is ready.
garyli1019 commented on pull request #1722:
URL: https://github.com/apache/hudi/pull/1722#issuecomment-642348677
Successfully got rid of those `RecordReaders`! @vinothchandar Thanks for the
hint!
garyli1019 commented on pull request #1722:
URL: https://github.com/apache/hudi/pull/1722#issuecomment-642182315
@vinothchandar Thanks for the feedback. Your approach makes sense to me. If
we can do it that way then we can reduce some maintenance overhead and be more
flexible for the