[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF by default

2022-06-03 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17547213#comment-17547213
 ] 

Kenneth Knowles commented on BEAM-4735:
---

This issue has been migrated to https://github.com/apache/beam/issues/18981

> Make HBaseIO.read() based on SDF by default
> ---
>
> Key: BEAM-4735
> URL: https://issues.apache.org/jira/browse/BEAM-4735
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hbase
> Reporter: Ismaël Mejía
> Priority: P3
>
> BEAM-4020 introduces HBaseIO reads based on SDF. So far the read() method
> still uses the Source-based API because SDF does not support Dynamic Work
> Rebalancing (DWR) but the Source API of HBase does, so changing it means
> losing some functionality.
> Since DWR is only supported by Dataflow, once Dataflow supports SDF + DWR we
> can move the main read() function to use the SDF API and remove the
> Source-based implementation. The rest of the runners already support bounded
> reads (such as the SDF-based HBase read) via a default translation without DWR.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF by default

2020-02-14 Thread Ismaël Mejía (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037345#comment-17037345
 ] 

Ismaël Mejía commented on BEAM-4735:


Since Dataflow is the only runner that implements Dynamic Work Rebalancing
(DWR), once Dataflow supports DWR for SDF we can switch HBaseIO to the
SDF-based read.

> Make HBaseIO.read() based on SDF by default
> ---
>
> Key: BEAM-4735
> URL: https://issues.apache.org/jira/browse/BEAM-4735
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hbase
> Reporter: Ismaël Mejía
> Priority: Minor
>
> BEAM-4020 introduces HBaseIO reads based on SDF. So far the read() method
> still uses the Source-based API because SDF does not support Dynamic Work
> Rebalancing (DWR) but the Source API of HBase does, so changing it means
> losing some functionality.
> Since DWR is only supported by Dataflow, once Dataflow supports SDF + DWR we
> can move the main read() function to use the SDF API and remove the
> Source-based implementation. The rest of the runners already support bounded
> reads (such as the SDF-based HBase read) via a default translation without DWR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF by default

2020-02-14 Thread Ismaël Mejía (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037344#comment-17037344
 ] 

Ismaël Mejía commented on BEAM-4735:


Thanks, is BEAM-4287 still a pending thing?



[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF by default

2020-02-14 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037335#comment-17037335
 ] 

Luke Cwik commented on BEAM-4735:
-

https://issues.apache.org/jira/browse/BEAM-4737



[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF by default

2020-02-10 Thread Ismaël Mejía (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033429#comment-17033429
 ] 

Ismaël Mejía commented on BEAM-4735:


[~lcwik] is there any Jira tracking Dynamic Work Rebalancing on Google Dataflow 
for SplittableDoFn?



[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF

2020-01-28 Thread Ismaël Mejía (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024990#comment-17024990
 ] 

Ismaël Mejía commented on BEAM-4735:


Oh, interesting finding! Just filed BEAM-9204 to tackle it.

> Make HBaseIO.read() based on SDF
> 
>
> Key: BEAM-4735
> URL: https://issues.apache.org/jira/browse/BEAM-4735
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hbase
> Reporter: Ismaël Mejía
> Priority: Minor
>
> BEAM-4020 introduces HBaseIO reads based on SDF. So far the read() method
> still uses the Source-based API for two reasons:
> 1. Most distributed runners don't support Bounded SDF today.
> 2. SDF does not support Dynamic Work Rebalancing, but the Source API of HBase
> already supports it, so changing it means losing some functionality.
> Once there are improvements in both (1) and (2) we should consider moving the
> main read() function to use the SDF API and remove the Source-based
> implementation.





[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF

2020-01-27 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024812#comment-17024812
 ] 

Luke Cwik commented on BEAM-4735:
-

I noticed a bug in the `@SplitRestriction` method. The range input parameter
is not used to restrict the splitRanges that are returned. If multiple rounds
of splitting happen, `@SplitRestriction` may be invoked multiple times, once
for each split, and each call can return ranges outside its own split, leading
to duplication of work.

https://github.com/apache/beam/blob/0a37f19e274b9d766f9eee2228460226c81b6b7c/sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseReadSplittableDoFn.java#L87
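
A minimal sketch of the kind of fix described above, not the actual
HBaseReadSplittableDoFn code. It assumes the split should be computed by
intersecting each region with the incoming restriction. The names
`SplitSketch`, `Range`, and `splitWithin` are hypothetical, and long keys
stand in for HBase byte[] row keys to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: hypothetical types, not the Beam or HBase API. */
class SplitSketch {

  /** Half-open key range [start, end); long keys stand in for row keys. */
  static final class Range {
    final long start;
    final long end;

    Range(long start, long end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  /**
   * Splits the restriction along region boundaries and returns only the
   * pieces that lie inside the restriction, so repeated splitting never
   * hands out keys that belong to another split.
   */
  static List<Range> splitWithin(Range restriction, List<Range> regions) {
    List<Range> splits = new ArrayList<>();
    for (Range region : regions) {
      long start = Math.max(region.start, restriction.start);
      long end = Math.min(region.end, restriction.end);
      if (start < end) { // skip regions that do not overlap the restriction
        splits.add(new Range(start, end));
      }
    }
    return splits;
  }

  public static void main(String[] args) {
    List<Range> regions =
        Arrays.asList(new Range(0, 10), new Range(10, 20), new Range(20, 30));
    // Only the parts of the regions inside [5, 15) are returned.
    System.out.println(splitWithin(new Range(5, 15), regions));
  }
}
```

With this shape, a second round of splitting on one of the returned pieces,
for example [5, 10), yields only ranges inside that piece, so no key range is
read twice.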



[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF

2019-01-09 Thread Ismaël Mejía (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738000#comment-16738000
 ] 

Ismaël Mejía commented on BEAM-4735:


This will be the default once SDF supports DWR (and we implement this support).

> Make HBaseIO.read() based on SDF
> 
>
> Key: BEAM-4735
> URL: https://issues.apache.org/jira/browse/BEAM-4735
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hbase
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Minor
>
> BEAM-4020 introduces HBaseIO reads based on SDF. So far the read() method
> still uses the Source-based API for two reasons:
> 1. Most distributed runners don't support Bounded SDF today.
> 2. SDF does not support Dynamic Work Rebalancing, but the Source API of HBase
> already supports it, so changing it means losing some functionality.
> Once there are improvements in both (1) and (2) we should consider moving the
> main read() function to use the SDF API and remove the Source-based
> implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)