[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF by default
[ https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17547213#comment-17547213 ]

Kenneth Knowles commented on BEAM-4735:

This issue has been migrated to https://github.com/apache/beam/issues/18981

> Make HBaseIO.read() based on SDF by default
> ---
>
> Key: BEAM-4735
> URL: https://issues.apache.org/jira/browse/BEAM-4735
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hbase
> Reporter: Ismaël Mejía
> Priority: P3
>
> BEAM-4020 introduces HBaseIO reads based on SDF. So far the read() method
> still uses the Source-based API because SDF does not support Dynamic Work
> Rebalancing (DWR) but the Source API of HBase does, so changing it means
> losing some functionality.
> Since DWR is only supported by Dataflow, once Dataflow supports SDF + DWR we
> can move the main read() function to use the SDF API and remove the
> Source-based implementation. The rest of the runners already support bounded
> SDF-based reads (which the SDF-based HBase read relies on) via a default
> translation without DWR.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
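To make the DWR trade-off described in the issue concrete, here is a minimal, hypothetical Java model of a worker reading a half-open offset range while the runner splits off part of the not-yet-claimed remainder mid-flight. It is not Beam's API and uses no Beam classes; the `OffsetTracker` class and its split policy are assumptions for illustration, although the method names loosely mirror `tryClaim` and `trySplit` on Beam's `RestrictionTracker`.

```java
import java.util.Optional;

/** Hypothetical, simplified model of dynamic work rebalancing (not Beam's API). */
public class DynamicSplitSketch {

  /** Tracks a half-open range [start, end) whose end can shrink when work is split off. */
  static final class OffsetTracker {
    private long end;          // exclusive; shrinks after a successful split
    private long lastClaimed;  // last position successfully claimed

    OffsetTracker(long start, long end) {
      this.end = end;
      this.lastClaimed = start - 1;
    }

    /** Claims the next position; returns false once it falls outside the (possibly shrunk) range. */
    boolean tryClaim(long position) {
      if (position <= lastClaimed) {
        throw new IllegalArgumentException("positions must be claimed in increasing order");
      }
      if (position >= end) {
        return false;
      }
      lastClaimed = position;
      return true;
    }

    /**
     * Splits off a fraction of the unclaimed remainder. This worker keeps everything
     * before the split point; the returned residual [splitPoint, oldEnd) can be handed
     * to another worker. Returns empty if there is nothing left to give away.
     */
    Optional<long[]> trySplit(double fractionOfRemainder) {
      long firstUnclaimed = lastClaimed + 1;
      long remaining = end - firstUnclaimed;
      long splitPoint = firstUnclaimed + (long) Math.ceil(remaining * fractionOfRemainder);
      if (splitPoint >= end) {
        return Optional.empty();
      }
      long[] residual = {splitPoint, end};
      end = splitPoint;
      return Optional.of(residual);
    }

    long end() {
      return end;
    }
  }

  public static void main(String[] args) {
    OffsetTracker tracker = new OffsetTracker(0, 100);
    int processed = 0;
    for (long pos = 0; tracker.tryClaim(pos); pos++) {
      processed++;
      if (pos == 19) {
        // The runner notices a straggler and moves half of the unclaimed work elsewhere.
        long[] residual = tracker.trySplit(0.5).orElseThrow();
        System.out.println("residual=[" + residual[0] + ", " + residual[1] + ")");
      }
    }
    System.out.println("processed=" + processed);
  }
}
```

Running `main` prints `residual=[60, 100)` and then `processed=60`: the worker stops at the new end while positions 60 to 99 become a separate unit of work. A runner without DWR can only split up front, before processing starts, which is the functionality the issue says would be lost by switching read() to SDF prematurely.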
[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF by default
[ https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037345#comment-17037345 ]

Ismaël Mejía commented on BEAM-4735:

Since Dataflow is the only runner that implements Dynamic Work Rebalancing (DWR), we can switch HBaseIO to the SDF-based read once Dataflow supports DWR for SDF.
[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF by default
[ https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037344#comment-17037344 ]

Ismaël Mejía commented on BEAM-4735:

Thanks. Is BEAM-4287 still pending?
[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF by default
[ https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037335#comment-17037335 ]

Luke Cwik commented on BEAM-4735:

https://issues.apache.org/jira/browse/BEAM-4737
[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF by default
[ https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033429#comment-17033429 ]

Ismaël Mejía commented on BEAM-4735:

[~lcwik] is there any Jira tracking Dynamic Work Rebalancing on Google Dataflow for SplittableDoFn?
[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF
[ https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024990#comment-17024990 ]

Ismaël Mejía commented on BEAM-4735:

Oh, interesting finding! I just filed BEAM-9204 to tackle it.

> Make HBaseIO.read() based on SDF
> ---
>
> Key: BEAM-4735
> URL: https://issues.apache.org/jira/browse/BEAM-4735
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hbase
> Reporter: Ismaël Mejía
> Priority: Minor
>
> BEAM-4020 introduces HBaseIO reads based on SDF. So far the read() method
> still uses the Source-based API for two reasons:
> 1. Most distributed runners don't support bounded SDF today.
> 2. SDF does not support Dynamic Work Rebalancing, but the Source API of HBase
> already supports it, so changing it means losing some functionality.
> Once there are improvements in both (1) and (2), we should consider moving the
> main read() function to use the SDF API and remove the Source-based
> implementation.
[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF
[ https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024812#comment-17024812 ]

Luke Cwik commented on BEAM-4735:

I noticed a bug in the `@SplitRestriction` method: the range input parameter is not used to restrict the splitRanges that are returned. If multiple rounds of splitting happen, `@SplitRestriction` may be invoked multiple times, once for each split, and each invocation can emit ranges outside its input range, leading to duplicated work.
https://github.com/apache/beam/blob/0a37f19e274b9d766f9eee2228460226c81b6b7c/sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseReadSplittableDoFn.java#L87
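The `@SplitRestriction` bug reported above can be illustrated with a small, hypothetical Java model. It is not the actual `HBaseReadSplittableDoFn` code and uses no Beam classes: keys are modeled as `long` values instead of HBase byte-array row keys, `KeyRange` is a stand-in for a half-open key range, and `regionBoundaries` is an assumed stand-in for the region start/end keys. Splitting twice, once per half of an earlier split, shows how ignoring the input range double-counts keys.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the @SplitRestriction issue (not the real HBaseIO code). */
public class SplitRestrictionSketch {

  /** Half-open key range [start, end). */
  record KeyRange(long start, long end) {
    boolean isEmpty() {
      return start >= end;
    }
  }

  /** Buggy variant: emits one range per region and never looks at {@code input}. */
  static List<KeyRange> splitIgnoringInput(KeyRange input, long[] regionBoundaries) {
    List<KeyRange> out = new ArrayList<>();
    for (int i = 0; i + 1 < regionBoundaries.length; i++) {
      out.add(new KeyRange(regionBoundaries[i], regionBoundaries[i + 1]));
    }
    return out;
  }

  /** Fixed variant: clips every region range to the input restriction and drops empty results. */
  static List<KeyRange> splitWithinInput(KeyRange input, long[] regionBoundaries) {
    List<KeyRange> out = new ArrayList<>();
    for (int i = 0; i + 1 < regionBoundaries.length; i++) {
      KeyRange clipped = new KeyRange(
          Math.max(input.start(), regionBoundaries[i]),
          Math.min(input.end(), regionBoundaries[i + 1]));
      if (!clipped.isEmpty()) {
        out.add(clipped);
      }
    }
    return out;
  }

  /** Total number of keys covered by the given ranges (overlaps are counted twice). */
  static long coveredKeys(List<KeyRange> ranges) {
    long total = 0;
    for (KeyRange r : ranges) {
      total += r.end() - r.start();
    }
    return total;
  }

  public static void main(String[] args) {
    long[] regions = {0, 10, 20, 30};
    // Assume an earlier round already split [0, 30) into [0, 15) and [15, 30).
    KeyRange left = new KeyRange(0, 15);
    KeyRange right = new KeyRange(15, 30);
    long buggy = coveredKeys(splitIgnoringInput(left, regions))
        + coveredKeys(splitIgnoringInput(right, regions));
    long fixed = coveredKeys(splitWithinInput(left, regions))
        + coveredKeys(splitWithinInput(right, regions));
    System.out.println("buggy=" + buggy + " fixed=" + fixed);
  }
}
```

Running `main` prints `buggy=60 fixed=30`: the 30-key table is read twice when each split invocation ignores its input range, and exactly once when the ranges are clipped. Clipping against the restriction passed to the method keeps repeated splitting from ever producing overlapping ranges.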
[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF
[ https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738000#comment-16738000 ]

Ismaël Mejía commented on BEAM-4735:

This will become the default once SDF supports DWR (and we implement this support).

> Make HBaseIO.read() based on SDF
> ---
>
> Key: BEAM-4735
> URL: https://issues.apache.org/jira/browse/BEAM-4735
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hbase
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Minor
>
> BEAM-4020 introduces HBaseIO reads based on SDF. So far the read() method
> still uses the Source-based API for two reasons:
> 1. Most distributed runners don't support bounded SDF today.
> 2. SDF does not support Dynamic Work Rebalancing, but the Source API of HBase
> already supports it, so changing it means losing some functionality.
> Once there are improvements in both (1) and (2), we should consider moving the
> main read() function to use the SDF API and remove the Source-based
> implementation.