[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2023-10-04 Thread Prashant Wason (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason updated HUDI-1896:
-
Fix Version/s: 0.14.1
   (was: 0.14.0)
   (was: 1.1.0)

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: deltastreamer
>Reporter: Raymond Xu
>Assignee: Rajesh Mahindra
>Priority: Critical
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 0.14.1
>
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a 
> new `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.
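
For illustration only, the idea of selecting files that are newer than a
given point in time can be sketched as below. This is a minimal sketch and
not Hudi code: `FileInfo`, `select_new_files`, and the epoch-millisecond
checkpoint are hypothetical names chosen for this example, and using a strict
"greater than" comparison against the checkpoint is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical listing entry: object path plus last-modified time."""
    path: str
    modified_ms: int  # last-modified time in epoch milliseconds


def select_new_files(
    files: List[FileInfo], checkpoint_ms: int
) -> Tuple[List[FileInfo], int]:
    """Return files modified strictly after checkpoint_ms, oldest first,
    together with the checkpoint to use for the next round."""
    new_files = sorted(
        (f for f in files if f.modified_ms > checkpoint_ms),
        key=lambda f: (f.modified_ms, f.path),
    )
    # Advance the checkpoint only when something new was selected.
    next_checkpoint = new_files[-1].modified_ms if new_files else checkpoint_ms
    return new_files, next_checkpoint


if __name__ == "__main__":
    listing = [FileInfo("a", 100), FileInfo("b", 200), FileInfo("c", 300)]
    selected, checkpoint = select_new_files(listing, 100)
    print([f.path for f in selected], checkpoint)
```

A selection like this depends entirely on listing the store and comparing
modification times, which is the part a change-notification based source
would presumably replace.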



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2023-05-17 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1896:
-
Fix Version/s: 0.14.0
   1.1.0
   (was: 1.0.0)

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: deltastreamer
>Reporter: Raymond Xu
>Assignee: Rajesh Mahindra
>Priority: Critical
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 0.14.0, 1.1.0
>
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a 
> new `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.





[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2022-01-06 Thread Bhavani Sudha Saktheeswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha Saktheeswaran updated HUDI-1896:
--
Epic Name: Implement DeltaStreamer Source for cloud object stores

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Assignee: Rajesh Mahindra
>Priority: Critical
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 1.0.0
>
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a 
> new `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.





[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2022-01-06 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1896:
-
Fix Version/s: 1.0.0

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Assignee: Rajesh Mahindra
>Priority: Critical
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 1.0.0
>
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a 
> new `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.





[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2022-01-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1896:
-
Issue Type: Epic  (was: New Feature)

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Assignee: Rajesh Mahindra
>Priority: Critical
>  Labels: hudi-umbrellas, pull-request-available
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a 
> new `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.





[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2021-09-12 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1896:
-
Labels: hudi-umbrellas pull-request-available  (was: pull-request-available)

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Critical
>  Labels: hudi-umbrellas, pull-request-available
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a 
> new `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.





[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2021-07-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1896:
-
Labels: pull-request-available  (was: )

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a 
> new `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.





[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2021-06-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1896:
-
Description: 
As discussed in HUDI-1723, we need a better implementation for cloud object 
storage such as AWS S3 or GCS, leveraging change notifications.

Also consider 
[https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]

 

We need to look into the current *DFSSource classes and see if we can add a new 
`DFSPathSelector` implementation that fetches new files on cloud storage after 
a given point in time. The timestamp-based approach used by the existing path 
selector largely works, but has corner cases, as mentioned in HUDI-1723.

  was:
As discussed in HUDI-1723, we need a better implementation for cloud object 
storage such as AWS S3 or GCS, leveraging change notifications.

Also consider 
https://docs.databricks.com/spark/latest/structured-streaming/sqs.html


> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Priority: Critical
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a 
> new `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.





[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2021-05-12 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1896:
-
Summary: [UMBRELLA] Implement DeltaStreamer Source for cloud object stores  
(was: Implement DeltaStreamer Source for cloud object stores)

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Priority: Critical
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.





[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2021-05-12 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1896:
-
Description: 
As discussed in HUDI-1723, we need a better implementation for cloud object 
storage such as AWS S3 or GCS, leveraging change notifications.

Also consider 
https://docs.databricks.com/spark/latest/structured-streaming/sqs.html

  was: As discussed in HUDI-1723, we need a better implementation for cloud 
object storage such as AWS S3 or GCS, leveraging change notifications.


> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Priority: Critical
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> https://docs.databricks.com/spark/latest/structured-streaming/sqs.html


