[GitHub] [hudi] garyli1019 commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-07-15 Thread GitBox
garyli1019 commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-658842626 @vinothchandar I agree we should use @umehrot2's RDD approach. >So you can also in parallel just proceed? Yes, I will change this PR in parallel. >Do you just want

[GitHub] [hudi] garyli1019 commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-24 Thread GitBox
garyli1019 commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-649235630 @vinothchandar Thanks for reviewing! I created tickets for the follow-up work. All the file listing and globbing can be improved after @umehrot2's PR is merged.

[GitHub] [hudi] garyli1019 commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-14 Thread GitBox
garyli1019 commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-643901550 A few major concerns here: - Listing files is too expensive. Solution: Switch to bootstrap file listing methods once udit's PR is merged. Move to RFC-15 once it is ready.

[GitHub] [hudi] garyli1019 commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-10 Thread GitBox
garyli1019 commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-642348677 Successfully got rid of those `RecordReaders`! @vinothchandar Thanks for the hint!

[GitHub] [hudi] garyli1019 commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-10 Thread GitBox
garyli1019 commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-642182315 @vinothchandar Thanks for the feedback. Your approach makes sense to me. If we can do it that way then we can reduce some maintenance overhead and be more flexible for the