[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
     [ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Wason updated HUDI-1896:
---------------------------------
    Fix Version/s: 0.14.1
                       (was: 0.14.0)
                       (was: 1.1.0)

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: Epic
>          Components: deltastreamer
>            Reporter: Raymond Xu
>            Assignee: Rajesh Mahindra
>            Priority: Critical
>              Labels: hudi-umbrellas, pull-request-available
>             Fix For: 0.14.1
>
>
> As discussed in HUDI-1723, we need a better implementation for cloud object
> storage like AWS S3 or GCS, leveraging change notifications.
> Also consider
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>
> We need to look into the current *DFSSource classes and see if we can add a
> new `DFSPathSelector` implementation that fetches new files on cloud storage
> after a given point in time. The timestamp-based approach used by the existing
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
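For readers following the description above, the timestamp-based selection it mentions can be sketched roughly as follows. This is a minimal Python illustration, not Hudi's `DFSPathSelector` code: `FileStatus`, `select_new_files`, and the file names are hypothetical, and the skipped-file scenario is only one assumed example of a corner case (see HUDI-1723 for the cases actually reported).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for a listing entry on an object store."""
    path: str
    modified_ms: int  # last-modified time, epoch milliseconds


def select_new_files(
    listing: List[FileStatus], checkpoint_ms: int
) -> Tuple[List[FileStatus], int]:
    """Pick files modified strictly after the checkpoint.

    Returns the selected files (oldest first) and the next checkpoint,
    which is the largest modification time seen among the selected files.
    """
    fresh = sorted(
        (f for f in listing if f.modified_ms > checkpoint_ms),
        key=lambda f: (f.modified_ms, f.path),
    )
    next_checkpoint = fresh[-1].modified_ms if fresh else checkpoint_ms
    return fresh, next_checkpoint


# Round 1: two files are visible in the listing.
round1 = [FileStatus("a.json", 1000), FileStatus("b.json", 2000)]
picked1, ckpt1 = select_new_files(round1, checkpoint_ms=0)

# Round 2: "c.json" shows up late but carries the same timestamp as the
# checkpoint, so the strict "newer than" filter never selects it (assumed case).
round2 = round1 + [FileStatus("c.json", 2000), FileStatus("d.json", 3000)]
picked2, ckpt2 = select_new_files(round2, checkpoint_ms=ckpt1)
```

In this sketch the second round selects only `d.json`; `c.json` is never ingested. A source driven by change notifications would receive an event per object instead of relying on timestamp comparison.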
[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
     [ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1896:
---------------------------------
    Fix Version/s: 0.14.0
                   1.1.0
                       (was: 1.0.0)

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: Epic
>          Components: deltastreamer
>            Reporter: Raymond Xu
>            Assignee: Rajesh Mahindra
>            Priority: Critical
>              Labels: hudi-umbrellas, pull-request-available
>             Fix For: 0.14.0, 1.1.0
[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
     [ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bhavani Sudha Saktheeswaran updated HUDI-1896:
----------------------------------------------
    Epic Name: Implement DeltaStreamer Source for cloud object stores

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: Epic
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Assignee: Rajesh Mahindra
>            Priority: Critical
>              Labels: hudi-umbrellas, pull-request-available
>             Fix For: 1.0.0
[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
     [ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1896:
---------------------------------
    Fix Version/s: 1.0.0

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: Epic
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Assignee: Rajesh Mahindra
>            Priority: Critical
>              Labels: hudi-umbrellas, pull-request-available
>             Fix For: 1.0.0
[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
     [ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-1896:
-----------------------------
    Issue Type: Epic  (was: New Feature)

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: Epic
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Assignee: Rajesh Mahindra
>            Priority: Critical
>              Labels: hudi-umbrellas, pull-request-available
[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
     [ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1896:
---------------------------------
    Labels: hudi-umbrellas pull-request-available  (was: pull-request-available)

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Assignee: Sagar Sumit
>            Priority: Critical
>              Labels: hudi-umbrellas, pull-request-available
[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
     [ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-1896:
---------------------------------
    Labels: pull-request-available  (was: )

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Priority: Critical
>              Labels: pull-request-available
[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
     [ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1896:
---------------------------------
    Description: 
As discussed in HUDI-1723, we need a better implementation for cloud object storage like AWS S3 or GCS, leveraging change notifications.

Also consider [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]

We need to look into the current *DFSSource classes and see if we can add a new `DFSPathSelector` implementation that fetches new files on cloud storage after a given point in time. The timestamp-based approach used by the existing path selector largely works, but has corner cases, as mentioned in HUDI-1723.

  was:
As discussed in HUDI-1723, we need a better implementation for cloud object storage like AWS S3 or GCS, leveraging change notifications.

Also consider https://docs.databricks.com/spark/latest/structured-streaming/sqs.html


> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Priority: Critical
[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
     [ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-1896:
-----------------------------
    Summary: [UMBRELLA] Implement DeltaStreamer Source for cloud object stores  (was: Implement DeltaStreamer Source for cloud object stores)

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Priority: Critical
[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
     [ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-1896:
-----------------------------
    Description: 
As discussed in HUDI-1723, we need a better implementation for cloud object storage like AWS S3 or GCS, leveraging change notifications.

Also consider https://docs.databricks.com/spark/latest/structured-streaming/sqs.html

  was:As discussed in HUDI-1723, we need a better implementation for cloud object storage like AWS S3 or GCS, leveraging change notifications.


> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Priority: Critical