Hi Raymond,

HoodieROTablePathFilter is supposed to return true in two cases: when the path 
is a file belonging to the latest version within a Hudi partition, or when the 
path refers to a non-hoodie partition or dataset.
I looked at the test case you referred to. It only works because the path filter 
wrongly assumes it is a non-hoodie path. You can run this in debug mode to see 
the code path. From the usage perspective, the filter is used only from Spark 
(InMemoryFileIndex), where only files are passed to it. So, I wouldn't classify 
this as a bug. But it makes sense to make it consistent for both cases.
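
A minimal sketch of one way the two spellings could be made to agree, using
plain strings only. The class and the stripTrailingSlashes helper are
hypothetical and not part of Hudi; the sketch only shows that
"base/partition" and "base/partition/" can be reduced to the same key without
any file system call. It does not tell a file from a directory.

```java
public class PartitionKeyDemo {
    // Hypothetical helper, not Hudi code: drop trailing slashes so that
    // "base/partition" and "base/partition/" produce the same string.
    // A lone "/" (the root) is left unchanged.
    static String stripTrailingSlashes(String path) {
        int end = path.length();
        while (end > 1 && path.charAt(end - 1) == '/') {
            end--;
        }
        return path.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingSlashes("base/partition"));
        System.out.println(stripTrailingSlashes("base/partition/"));
    }
}
```

Both calls in main print the same value, so a cache keyed on the result would
see one entry for either form.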

Thanks,
Balaji.V

On Wednesday, September 9, 2020, 07:45:09 AM PDT, Raymond Xu 
<xu.shiyan.raym...@gmail.com> wrote:  
 
 Hi Balaji, not sure if I fully get it.
I'm attempting to refer to this test case
https://github.com/apache/hudi/blob/9bcd3221fd440081dbae70e89d08539c3b484862/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestHoodieROTablePathFilter.java#L63-L65

where a partition path is supposed to be accepted.
If I change L64 to
Path partitionPath = new Path(Paths.get(basePath, "2017/01/01").toUri());

Then it is no longer accepted, because partitionPath ends with `/`
(a directory path). To me, this seems to be a corner case that is not
covered. Could you kindly confirm the expected behavior? Thanks.
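
For reference, a small JDK-only sketch of where the trailing `/` comes from.
It assumes the default file system provider, where java.nio's Path.toUri()
appends a slash when the path is an existing directory. The class and method
names are made up for the demo, and no Hudi or Hadoop classes are involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TrailingSlashDemo {
    // Creates a fresh temporary directory to stand in for a base path.
    static Path newTempDir() {
        try {
            Path dir = Files.createTempDirectory("partition-demo");
            dir.toFile().deleteOnExit();
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // URI of a directory that exists on disk.
    static URI uriOfExistingDirectory() {
        return Paths.get(newTempDir().toString()).toUri();
    }

    // URI of a child path that was never created on disk.
    static URI uriOfMissingPath() {
        return Paths.get(newTempDir().toString(), "2017/01/01").toUri();
    }

    public static void main(String[] args) {
        System.out.println(uriOfExistingDirectory()); // directory exists
        System.out.println(uriOfMissingPath());       // path does not exist
    }
}
```

So the same Paths.get(...).toUri() expression can produce a URI with or
without the trailing slash, depending on whether the directory exists when
the test runs.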

On Tue, Sep 8, 2020 at 8:58 PM Balaji Varadarajan
<v.bal...@ymail.com.invalid> wrote:

> Hi Raymond,
> IIRC, we need to give a glob path for HoodieROTablePathFilter to work
> correctly (e.g. "base/partition/*"). The path cache is at partition level
> and not at table level, so we need to extract the partition path correctly
> to use it as the look-up key. The challenge in extracting the partition
> path is that the "Path" type does not have APIs to quickly figure out
> whether a path is a directory, and we should avoid making RPC calls here.
> Thanks,
> Balaji.V
>
> On Tuesday, September 8, 2020, 09:56:49 AM PDT, Raymond Xu <
> xu.shiyan.raym...@gmail.com> wrote:
>
>
> https://github.com/apache/hudi/blob/9bcd3221fd440081dbae70e89d08539c3b484862/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L120-L121
>
> As shown in the 2 lines linked above, it does not seem to work with a
> directory Path.
> It should work for both `new Path("base/partition")` and `new
> Path("base/partition/")`, but it only works for the former case. In the
> latter case, `folder` will be "base/partition" and `path` will be
> "base/partition/", which will always result in returning false.
> A potential bug?
>
  
