[GitHub] [flink] Airblader commented on a change in pull request #17011: [FLINK-23663][table-planner] Reduce state size of ChangelogNormalize

2021-08-31 Thread GitBox


Airblader commented on a change in pull request #17011:
URL: https://github.com/apache/flink/pull/17011#discussion_r698432369



##
File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/physical/stream/PushFilterPastChangelogNormalizeRule.java
##
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.rules.physical.stream;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.table.planner.plan.nodes.physical.stream.StreamPhysicalCalc;
+import org.apache.flink.table.planner.plan.nodes.physical.stream.StreamPhysicalChangelogNormalize;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.plan.RelRule;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexLocalRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexProgram;
+import org.apache.calcite.rex.RexProgramBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.util.Pair;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import static org.apache.flink.table.planner.plan.utils.RexNodeExtractor.extractRefInputFields;
+
+/**
+ * Pushes primary key filters through a {@link StreamPhysicalChangelogNormalize ChangelogNormalize}
+ * operator to reduce its state size.
+ *
+ * This rule looks for Calc → ChangelogNormalize where the {@link StreamPhysicalCalc Calc}
+ * contains a filter condition. The condition is transformed into CNF and then each conjunction is
+ * tested for whether it affects only primary key columns. If such conditions exist, they are moved
+ * into a new, separate Calc and pushed through the ChangelogNormalize operator. ChangelogNormalize
+ * keeps state for every unique key it encounters, thus pushing filters on the primary key in front
+ * of it helps reduce the size of its state.
+ *
+ * Note that pushing primary key filters is safe to do, but pushing any other filters would lead
+ * to incorrect results.

Review comment:
   I'll settle for "… can lead to …" which adds a bit less noise in the sentence (a system that is only broken in some scenarios is broken overall, IMO).
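
   For readers following along, a minimal sketch of the conjunction-splitting step the Javadoc above describes, assuming a primaryKeyIndices set and a RexBuilder are available; the method and variable names here are illustrative and may differ from the PR's actual implementation:

    // Sketch only: normalize the filter to CNF, then split its conjunctions into
    // those that reference primary key columns exclusively (safe to push below
    // ChangelogNormalize) and all others (which must stay above it).
    private static Pair<List<RexNode>, List<RexNode>> splitByPrimaryKey(
            RexBuilder rexBuilder, RexNode condition, Set<Integer> primaryKeyIndices) {
        List<RexNode> pkOnly = new ArrayList<>();
        List<RexNode> others = new ArrayList<>();
        for (RexNode conjunction :
                RelOptUtil.conjunctions(RexUtil.toCnf(rexBuilder, condition))) {
            int[] refs = extractRefInputFields(Collections.singletonList(conjunction));
            boolean touchesOnlyPrimaryKey =
                    Arrays.stream(refs).allMatch(primaryKeyIndices::contains);
            (touchesOnlyPrimaryKey ? pkOnly : others).add(conjunction);
        }
        return Pair.of(pkOnly, others);
    }

   For example, with primary key (id), a predicate like id = 5 AND amount > 10 splits into id = 5, which is pushed below ChangelogNormalize and shrinks its keyed state, and amount > 10, which stays above it.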

[GitHub] [flink] Airblader commented on a change in pull request #17011: [FLINK-23663][table-planner] Reduce state size of ChangelogNormalize

2021-08-30 Thread GitBox


Airblader commented on a change in pull request #17011:
URL: https://github.com/apache/flink/pull/17011#discussion_r697805528



##
File path: flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/factories/TableFactoryHarness.java
##
@@ -0,0 +1,336 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.factories;
+
+import org.apache.flink.configuration.ConfigOption;
+import org.apache.flink.configuration.ConfigOptions;
+import org.apache.flink.configuration.ReadableConfig;
+import org.apache.flink.streaming.api.functions.sink.SinkFunction;
+import org.apache.flink.streaming.api.functions.source.SourceFunction;
+import org.apache.flink.table.api.Schema;
+import org.apache.flink.table.api.TableDescriptor;
+import org.apache.flink.table.api.ValidationException;
+import org.apache.flink.table.catalog.CatalogTable;
+import org.apache.flink.table.connector.ChangelogMode;
+import org.apache.flink.table.connector.sink.DynamicTableSink;
+import org.apache.flink.table.connector.sink.DynamicTableSink.SinkRuntimeProvider;
+import org.apache.flink.table.connector.sink.SinkFunctionProvider;
+import org.apache.flink.table.connector.source.DynamicTableSource;
+import org.apache.flink.table.connector.source.ScanTableSource;
+import org.apache.flink.table.connector.source.ScanTableSource.ScanRuntimeProvider;
+import org.apache.flink.table.connector.source.SourceFunctionProvider;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.factories.DynamicTableFactory;
+import org.apache.flink.table.factories.DynamicTableSinkFactory;
+import org.apache.flink.table.factories.DynamicTableSourceFactory;
+import org.apache.flink.table.factories.FactoryUtil;
+import org.apache.flink.testutils.junit.SharedObjects;
+import org.apache.flink.util.InstantiationUtil;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.Base64;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+
+/**
+ * Provides a flexible testing harness for table factories.
+ *
+ * This testing harness allows writing custom sources and sinks which can be directly
+ * instantiated from the test. This avoids having to implement a factory, and enables using the
+ * {@link SharedObjects} rule to get direct access to the underlying source/sink from the test.
+ *
+ * Note that the underlying source/sink must be {@link Serializable}. It is recommended to extend
+ * from {@link SourceBase} or {@link SinkBase}, which provide default implementations for most
+ * methods as well as some convenience methods.
+ *
+ * The harness provides a {@link Factory}. You can register a source / sink through configuration
+ * by passing a base64-encoded serialization. The harness provides convenience methods to make this
+ * process as simple as possible.
+ *
+ * Example:
+ *
+ * {@code
+ * public class CustomSourceTest {
+ *     {@literal @}Rule public SharedObjects sharedObjects = SharedObjects.create();
+ *
+ *     {@literal @}Test
+ *     public void test() {
+ *         SharedReference<List<Long>> appliedLimits = sharedObjects.add(new ArrayList<>());

Review comment:
   Actually, using SharedObjects here is entirely unnecessary. We instantiate the source specifically for this test, so we can just keep state on the source and assert on that. Will update this, and also the docs for the harness.
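
   A hedged sketch of that simplification (the names are illustrative, not necessarily what the updated PR uses): since the test constructs the source instance itself, the recorded limits can live directly on it.

    // Sketch only: record applied limits on the source instance instead of
    // routing them through the SharedObjects rule.
    private static class CustomSource extends TableFactoryHarness.SourceBase
            implements SupportsLimitPushDown {
        final List<Long> appliedLimits = new ArrayList<>();

        @Override
        public void applyLimit(long limit) {
            appliedLimits.add(limit);
        }
    }

   The test can then assert on source.appliedLimits directly after tEnv.explainSql(...), with no shared-state indirection.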




[GitHub] [flink] Airblader commented on a change in pull request #17011: [FLINK-23663][table-planner] Reduce state size of ChangelogNormalize

2021-08-27 Thread GitBox


Airblader commented on a change in pull request #17011:
URL: https://github.com/apache/flink/pull/17011#discussion_r697805773



##
File path: flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/factories/TableFactoryHarness.java
##
@@ -0,0 +1,336 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.factories;
+
+import org.apache.flink.configuration.ConfigOption;
+import org.apache.flink.configuration.ConfigOptions;
+import org.apache.flink.configuration.ReadableConfig;
+import org.apache.flink.streaming.api.functions.sink.SinkFunction;
+import org.apache.flink.streaming.api.functions.source.SourceFunction;
+import org.apache.flink.table.api.Schema;
+import org.apache.flink.table.api.TableDescriptor;
+import org.apache.flink.table.api.ValidationException;
+import org.apache.flink.table.catalog.CatalogTable;
+import org.apache.flink.table.connector.ChangelogMode;
+import org.apache.flink.table.connector.sink.DynamicTableSink;
+import org.apache.flink.table.connector.sink.DynamicTableSink.SinkRuntimeProvider;
+import org.apache.flink.table.connector.sink.SinkFunctionProvider;
+import org.apache.flink.table.connector.source.DynamicTableSource;
+import org.apache.flink.table.connector.source.ScanTableSource;
+import org.apache.flink.table.connector.source.ScanTableSource.ScanRuntimeProvider;
+import org.apache.flink.table.connector.source.SourceFunctionProvider;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.factories.DynamicTableFactory;
+import org.apache.flink.table.factories.DynamicTableSinkFactory;
+import org.apache.flink.table.factories.DynamicTableSourceFactory;
+import org.apache.flink.table.factories.FactoryUtil;
+import org.apache.flink.testutils.junit.SharedObjects;
+import org.apache.flink.util.InstantiationUtil;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.Base64;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+
+/**
+ * Provides a flexible testing harness for table factories.
+ *
+ * This testing harness allows writing custom sources and sinks which can be directly
+ * instantiated from the test. This avoids having to implement a factory, and enables using the
+ * {@link SharedObjects} rule to get direct access to the underlying source/sink from the test.
+ *
+ * Note that the underlying source/sink must be {@link Serializable}. It is recommended to extend
+ * from {@link SourceBase} or {@link SinkBase}, which provide default implementations for most
+ * methods as well as some convenience methods.
+ *
+ * The harness provides a {@link Factory}. You can register a source / sink through configuration
+ * by passing a base64-encoded serialization. The harness provides convenience methods to make this
+ * process as simple as possible.
+ *
+ * Example:
+ *
+ * {@code
+ * public class CustomSourceTest {
+ *     {@literal @}Rule public SharedObjects sharedObjects = SharedObjects.create();
+ *
+ *     {@literal @}Test
+ *     public void test() {
+ *         SharedReference<List<Long>> appliedLimits = sharedObjects.add(new ArrayList<>());
+ *
+ *         Schema schema = Schema.newBuilder().build();
+ *         TableDescriptor sourceDescriptor = TableFactoryHarness.forSource(schema,
+ *                 new CustomSource(appliedLimits));
+ *
+ *         tEnv.createTable("T", sourceDescriptor);
+ *         tEnv.explainSql("SELECT * FROM T LIMIT 42");
+ *
+ *         assertEquals(1, appliedLimits.get().size());
+ *         assertEquals((Long) 42L, appliedLimits.get().get(0));
+ *     }
+ *
+ *     private static class CustomSource extends SourceBase implements SupportsLimitPushDown {
+ *         private final SharedReference<List<Long>> appliedLimits;
+ *
+ *         CustomSource(SharedReference<List<Long>> appliedLimits) {
+ *             this.appliedLimits = appliedLimits;
+ *         }
+ *
+ *         {@literal @}Override
+ *         public void applyLimit(long limit) {
+ *             appliedLimits.get().add(limit);
+ *         }
+ *     }
+ * }
+ * }
+ */
+public class TableFactoryHarness {
+
+/** Factory identifier for {@link Factory}. */
+public static final String IDENTIFIER 
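
   The Javadoc above mentions registering a source or sink through configuration as a base64-encoded serialization. A rough sketch of that mechanism, under stated assumptions (the "source" option key and the helper's exact shape are hypothetical, not taken from the PR):

    // Sketch only: serialize the source, base64-encode it, and pass it to the
    // harness factory as a plain string option.
    static TableDescriptor forSource(Schema schema, Serializable source) throws IOException {
        String encoded =
                Base64.getEncoder().encodeToString(InstantiationUtil.serializeObject(source));
        return TableDescriptor.forConnector(TableFactoryHarness.IDENTIFIER)
                .schema(schema)
                .option("source", encoded) // hypothetical option key
                .build();
    }

   On the factory side, the matching Factory would decode and deserialize the option value with InstantiationUtil to recover the source instance.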

[GitHub] [flink] Airblader commented on a change in pull request #17011: [FLINK-23663][table-planner] Reduce state size of ChangelogNormalize

2021-08-27 Thread GitBox


Airblader commented on a change in pull request #17011:
URL: https://github.com/apache/flink/pull/17011#discussion_r697805051



##
File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/physical/stream/PushFilterPastChangelogNormalizeRule.java
##
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.rules.physical.stream;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.table.planner.plan.nodes.physical.stream.StreamPhysicalCalc;
+import org.apache.flink.table.planner.plan.nodes.physical.stream.StreamPhysicalChangelogNormalize;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.plan.RelRule;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexLocalRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexProgram;
+import org.apache.calcite.rex.RexProgramBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.util.Pair;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import static org.apache.flink.table.planner.plan.utils.RexNodeExtractor.extractRefInputFields;
+
+/**
+ * Pushes primary key filters through a {@link StreamPhysicalChangelogNormalize ChangelogNormalize}
+ * operator to reduce its state size.
+ *
+ * This rule looks for Calc → ChangelogNormalize where the {@link StreamPhysicalCalc Calc}
+ * contains a filter condition. The condition is transformed into CNF and then each conjunction is
+ * tested for whether it affects only primary key columns. If such conditions exist, they are moved
+ * into a new, separate Calc and pushed through the ChangelogNormalize operator. ChangelogNormalize
+ * keeps state for every unique key it encounters, thus pushing filters on the primary key in front
+ * of it helps reduce the size of its state.
+ *
+ * Note that pushing primary key filters is safe to do, but pushing any other filters would lead

Review comment:
   Wonky sentence here.



