Yanjia Gary Li created HUDI-597:
-----------------------------------

             Summary: Enable incremental pulling from defined partitions
                 Key: HUDI-597
                 URL: https://issues.apache.org/jira/browse/HUDI-597
             Project: Apache Hudi (incubating)
          Issue Type: New Feature
            Reporter: Yanjia Gary Li
            Assignee: Yanjia Gary Li


When I only need the incremental changes from certain partitions, I currently
have to run the incremental pull against the entire dataset first and then
filter the result in Spark.

If the partition folders could be used directly as part of the input path, the
pull could run faster by loading only the relevant parquet files.

Example:

 
{code:java}
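// Proposed usage: pass a partition glob together with the base path so that
// the incremental pull only reads partitions under year=2020.
spark.read.format("org.apache.hudi")
  .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY, DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "000")
  .load(path, "year=2020/*/*/*")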
{code}
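
To illustrate why a partition glob helps, here is a minimal sketch of how a pattern such as year=2020/*/*/* narrows the set of files before anything is read. It is not Hudi's implementation. The class name, the selectFiles helper, and the year/month/day file layout are hypothetical, and it relies on java.nio glob matching, which may differ in detail from the glob handling Hadoop or Spark apply.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionGlobSketch {

    // Hypothetical helper: keep only the relative file paths that match the
    // partition glob. In java.nio globs, '*' does not cross a '/' boundary.
    static List<String> selectFiles(List<String> relativePaths, String partitionGlob) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + partitionGlob);
        List<String> selected = new ArrayList<>();
        for (String p : relativePaths) {
            if (matcher.matches(Paths.get(p))) {
                selected.add(p);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed layout: <year>/<month>/<day>/<file>, relative to the base path.
        List<String> files = Arrays.asList(
            "year=2019/month=12/day=31/a.parquet",
            "year=2020/month=01/day=01/b.parquet",
            "year=2020/month=02/day=15/c.parquet");

        // Only the two files under year=2020 are selected.
        System.out.println(selectFiles(files, "year=2020/*/*/*"));
    }
}
```

With this kind of pruning, files under year=2019 would never be opened, whereas the current workaround reads the incremental changes for every partition and discards most of them afterwards.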
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
