[jira] [Created] (FLINK-35530) protobuf-format support discard unknow field Improve deserialization performance
JingWei Li created FLINK-35530:
--

Summary: protobuf-format support discard unknow field Improve deserialization performance
Key: FLINK-35530
URL: https://issues.apache.org/jira/browse/FLINK-35530
Project: Flink
Issue Type: Improvement
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: JingWei Li

Add a protobuf format option that calls _CodedStreamHelper.discardUnknownFields_ when decoding data, which saves the performance overhead of deserializing unknown fields.

{code:java}
create table source (...) with (
  'format' = 'protobuf',
  'protobuf.discard-unknown-field' = 'true'
){code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
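To illustrate what discarding unknown fields means at the wire level, here is a minimal sketch in plain Java. It is not Flink's or protobuf's code: the class `UnknownFieldSkipSketch` and its methods are hypothetical, and it only relies on the documented protobuf encoding (a tag is `field_number << 3 | wire_type`, with wire types 0, 1, 2 and 5). Fields that the reader does not know are skipped by length instead of being decoded and retained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Flink or protobuf-java.
final class UnknownFieldSkipSketch {
    private final byte[] buf;
    private int pos;

    private UnknownFieldSkipSketch(byte[] buf) {
        this.buf = buf;
    }

    /** Reads a base-128 varint of up to 64 bits. */
    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("malformed varint");
    }

    /** Advances past one field value without materializing it. */
    private void skipField(int wireType) {
        switch (wireType) {
            case 0: readVarint(); break;             // varint
            case 1: pos += 8; break;                 // fixed 64-bit
            case 2: pos += (int) readVarint(); break; // length-delimited
            case 5: pos += 4; break;                 // fixed 32-bit
            default:
                throw new IllegalStateException("unsupported wire type " + wireType);
        }
    }

    /** Decodes the varint fields listed in knownFields; every other field is skipped. */
    static Map<Integer, Long> decodeKnownVarints(byte[] data, Set<Integer> knownFields) {
        UnknownFieldSkipSketch in = new UnknownFieldSkipSketch(data);
        Map<Integer, Long> out = new HashMap<>();
        while (in.pos < data.length) {
            int tag = (int) in.readVarint();
            int fieldNumber = tag >>> 3;
            int wireType = tag & 7;
            if (wireType == 0 && knownFields.contains(fieldNumber)) {
                out.put(fieldNumber, in.readVarint());
            } else {
                in.skipField(wireType); // unknown field: discard, do not retain
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // field 1 = 150 (varint), field 2 = "hi" (length-delimited), field 3 = 7 (varint)
        byte[] data = {0x08, (byte) 0x96, 0x01, 0x12, 0x02, 0x68, 0x69, 0x18, 0x07};
        Set<Integer> known = new HashSet<>(Arrays.asList(1, 3));
        System.out.println(decodeKnownVarints(data, known));
    }
}
```

In this sketch field 2 is never turned into an object; the reader only advances past its bytes.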
[jira] [Updated] (FLINK-35529) protobuf-format compatible protobuf bad indentifier
[ https://issues.apache.org/jira/browse/FLINK-35529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JingWei Li updated FLINK-35529:
---
Description:
The main bug occurs during the decode process. The decode method is a method generated by the codegen of Flink at runtime, and in the process of generating the decode method, some getter and setter methods of the protobuf object need to be used to construct the RowData. Currently, the way to generate the getter and setter is through string concatenation, using the "get" prefix and camelCase variable names. Some special characters may lead to errors in the generated Getter and Setter methods, thus causing bugs.

- Examples:
 - If the protobuf defines a field named "class", the Getter method will be getClass(), which conflicts with the Object.getClass() method, so the real value of the "class" field cannot be accessed. The method generated by protoc is getClass_().
 - If the protobuf defines two fields "a_b_c" and "ab_c", the Getter methods are both getABC(), causing a naming conflict. The methods generated by protoc are getABC+sequence number().
Solution:
{code:java}
// case a
if (name1 + "Count" == name2) {
  *info = "both repeated field \"" + field1->name() + "\" and singular " +
          "field \"" + field2->name() + "\" generate the method \"" +
          "get" + name1 + "Count()\"";
  return true;
}
if (name1 + "List" == name2) {
  *info = "both repeated field \"" + field1->name() + "\" and singular " +
          "field \"" + field2->name() + "\" generate the method \"" +
          "get" + name1 + "List()\"";
  return true;
}

// case b
if (name == other_name) {
  is_conflict[i] = is_conflict[j] = true;
  conflict_reason[i] = conflict_reason[j] =
      "capitalized name of field \"" + field->name() +
      "\" conflicts with field \"" + other->name() + "\"";
} else if (IsConflicting(field, name, other, other_name,
                         &conflict_reason[j])) {
  is_conflict[i] = is_conflict[j] = true;
  conflict_reason[i] = conflict_reason[j];
}

// solver
for (int i = 0; i < fields.size(); ++i) {
  const FieldDescriptor* field = fields[i];
  FieldGeneratorInfo info;
  info.name = CamelCaseFieldName(field);
  info.capitalized_name = UnderscoresToCapitalizedCamelCase(field);
  // For fields conflicting with some other fields, we append the field
  // number to their field names in generated code to avoid conflicts.
  if (is_conflict[i]) {
    info.name += StrCat(field->number());
    info.capitalized_name += StrCat(field->number());
    info.disambiguated_reason = conflict_reason[i];
  }
  field_generator_info_map_[field] = info;
}
{code}

was:
The main bug occurs during the decode process. The decode method is a method generated by the codegen of Flink at runtime, and in the process of generating the decode method, some getter and setter methods of the protobuf object need to be used to construct the RowData. Currently, the way to generate the getter and setter is through string concatenation, using the "get" prefix and camelCase variable names. Some special characters may lead to errors in the generated Getter and Setter methods, thus causing bugs.
- Examples:
 - If the protobuf defines a field named "class", the Getter method will be getClass(), which conflicts with the Object.getClass() method, so the real value of the "class" field cannot be accessed. The method generated by protoc is getClass_().
 - If the protobuf defines two fields "a_b_c" and "ab_c", the Getter methods are both getABC(), causing a naming conflict. The methods generated by protoc are getABC+sequence number().

resolution

> protobuf-format compatible protobuf bad indentifier
> ---
>
> Key: FLINK-35529
> URL: https://issues.apache.org/jira/browse/FLINK-35529
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.17.2
> Reporter: JingWei Li
> Priority: Major
> Fix For: 2.0.0
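The naming rules described in this issue can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not Flink's or protoc's implementation: the class `GetterNameSketch`, its methods, and the field names `foo_bar` / `fooBar` are hypothetical. It covers only two rules from the description: a trailing underscore for a field named "class", and the field number appended when two fields share the same capitalized name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Flink's or protoc's actual code.
final class GetterNameSketch {

    /** Converts e.g. "foo_bar" to "FooBar": drops underscores and capitalizes the next letter. */
    static String capitalizedCamelCase(String fieldName) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = true;
        for (char c : fieldName.toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Maps each field name to a getter name; the input maps field name to field number. */
    static Map<String, String> getterNames(Map<String, Integer> fields) {
        // Count how many fields end up with the same capitalized name.
        Map<String, Integer> counts = new HashMap<>();
        for (String name : fields.keySet()) {
            counts.merge(capitalizedCamelCase(name), 1, Integer::sum);
        }
        Map<String, String> getters = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : fields.entrySet()) {
            String cap = capitalizedCamelCase(e.getKey());
            if (counts.get(cap) > 1) {
                cap += e.getValue(); // conflicting names: append the field number
            } else if (cap.equals("Class")) {
                cap += "_";          // avoid clashing with Object.getClass()
            }
            getters.put(e.getKey(), "get" + cap);
        }
        return getters;
    }

    public static void main(String[] args) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        fields.put("class", 1);
        fields.put("foo_bar", 2);
        fields.put("fooBar", 3);
        fields.put("id", 4);
        System.out.println(getterNames(fields));
    }
}
```

Resolving names this way, instead of concatenating "get" with the camelCase name, keeps the generated decode method calling getters that exist on the protoc-generated class.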
[jira] [Updated] (FLINK-35529) protobuf-format compatible protobuf bad indentifier
[ https://issues.apache.org/jira/browse/FLINK-35529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JingWei Li updated FLINK-35529: --- Description: The main bug occurs during the decode process. The decode method is a method generated by the codegen of Flink at runtime, and in the process of generating the decode method, some getter and setter methods of the protobuf object need to be used to construct the RowData. Currently, the way to generate the getter and setter is through string concatenation, using the "get" prefix and camelCase variable names. Some special characters may lead to errors in the generated Getter and Setter methods, thus causing bugs. - Examples: - If the protobuf defines a field named "class", the Getter method will be getClass(), which conflicts with the Object.getClass() method, so the real value of the "class" field cannot be accessed. The method generated by protoc is getClass_(). - If the protobuf defines two fields "a_b_c" and "ab_c", the Getter methods are both getABC(), causing a naming conflict. The methods generated by protoc are getABC+sequence number(). resolution was: The main bug occurs during the decode process. The decode method is a method generated by the codegen of Flink at runtime, and in the process of generating the decode method, some getter and setter methods of the protobuf object need to be used to construct the RowData. Currently, the way to generate the getter and setter is through string concatenation, using the "get" prefix and camelCase variable names. Some special characters may lead to errors in the generated Getter and Setter methods, thus causing bugs. - Examples: - If the protobuf defines a field named "class", the Getter method will be getClass(), which conflicts with the Object.getClass() method, so the real value of the "class" field cannot be accessed. The method generated by protoc is getClass_(). 
- If the protobuf defines two fields "a_b_c" and "ab_c", the Getter methods are both getABC(), causing a naming conflict. The methods generated by protoc are getABC+sequence number().

> protobuf-format compatible protobuf bad indentifier
> ---
>
> Key: FLINK-35529
> URL: https://issues.apache.org/jira/browse/FLINK-35529
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.17.2
> Reporter: JingWei Li
> Priority: Major
> Fix For: 2.0.0
[jira] [Updated] (FLINK-35529) protobuf-format compatible protobuf bad indentifier
[ https://issues.apache.org/jira/browse/FLINK-35529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JingWei Li updated FLINK-35529: --- Description: The main bug occurs during the decode process. The decode method is a method generated by the codegen of Flink at runtime, and in the process of generating the decode method, some getter and setter methods of the protobuf object need to be used to construct the RowData. Currently, the way to generate the getter and setter is through string concatenation, using the "get" prefix and camelCase variable names. Some special characters may lead to errors in the generated Getter and Setter methods, thus causing bugs. - Examples: - If the protobuf defines a field named "class", the Getter method will be getClass(), which conflicts with the Object.getClass() method, so the real value of the "class" field cannot be accessed. The method generated by protoc is getClass_(). - If the protobuf defines two fields "a_b_c" and "ab_c", the Getter methods are both getABC(), causing a naming conflict. The methods generated by protoc are getABC+sequence number(). was:The main bug occurs during the decode process. The decode method is a method generated by the codegen of Flink at runtime, and in the process of generating the decode method, some getter and setter methods of the protobuf object need to be used to construct the RowData. Currently, the way to generate the getter and setter is through string concatenation, using the "get" prefix and camelCase variable names. Some special characters may lead to errors in the generated Getter and Setter methods, thus causing bugs. 
> protobuf-format compatible protobuf bad indentifier
> ---
>
> Key: FLINK-35529
> URL: https://issues.apache.org/jira/browse/FLINK-35529
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.17.2
> Reporter: JingWei Li
> Priority: Major
> Fix For: 2.0.0
[jira] [Created] (FLINK-35529) protobuf-format compatible protobuf bad indentifier
JingWei Li created FLINK-35529:
--

Summary: protobuf-format compatible protobuf bad indentifier
Key: FLINK-35529
URL: https://issues.apache.org/jira/browse/FLINK-35529
Project: Flink
Issue Type: Improvement
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.17.2
Reporter: JingWei Li
Fix For: 2.0.0

The main bug occurs during the decode process. The decode method is a method generated by the codegen of Flink at runtime, and in the process of generating the decode method, some getter and setter methods of the protobuf object need to be used to construct the RowData. Currently, the way to generate the getter and setter is through string concatenation, using the "get" prefix and camelCase variable names. Some special characters may lead to errors in the generated Getter and Setter methods, thus causing bugs.
[jira] [Created] (FLINK-33791) Fix NPE when array is null in PostgresArrayConverter in flink-connector-jdbc
JingWei Li created FLINK-33791:
--

Summary: Fix NPE when array is null in PostgresArrayConverter in flink-connector-jdbc
Key: FLINK-33791
URL: https://issues.apache.org/jira/browse/FLINK-33791
Project: Flink
Issue Type: Bug
Components: Connectors / JDBC
Reporter: JingWei Li

{code:java}
private JdbcDeserializationConverter createPostgresArrayConverter(ArrayType arrayType) {
    // Since PGJDBC 42.2.15 (https://github.com/pgjdbc/pgjdbc/pull/1194) bytea[] is wrapped in
    // primitive byte arrays
    final Class<?> elementClass =
            LogicalTypeUtils.toInternalConversionClass(arrayType.getElementType());
    final JdbcDeserializationConverter elementConverter =
            createNullableInternalConverter(arrayType.getElementType());
    return val -> {
        @SuppressWarnings("unchecked")
        T pgArray = (T) val;
        Object[] in = (Object[]) pgArray.getArray();
        final Object[] array = (Object[]) Array.newInstance(elementClass, in.length);
        for (int i = 0; i < in.length; i++) {
            array[i] = elementConverter.deserialize(in[i]);
        }
        return new GenericArrayData(array);
    };
}
{code}

When this method is used and the array is null, pgArray.getArray() throws an NPE.
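One possible shape of a null-safe converter is sketched below in plain Java. It is an assumption about how such a guard could look, not the actual patch: `NullSafeArrayConverterSketch` and `SqlArrayLike` are hypothetical stand-ins (the latter for the `getArray()` call of the JDBC array type), and returning null for a null input is an assumed policy. The sketch guards both the value itself and the result of `getArray()`.

```java
import java.lang.reflect.Array;
import java.util.function.Function;

// Hypothetical illustration only; not the flink-connector-jdbc implementation.
final class NullSafeArrayConverterSketch {

    /** Hypothetical stand-in for the getArray() method of a JDBC array. */
    interface SqlArrayLike {
        Object getArray();
    }

    static Function<Object, Object[]> createArrayConverter(
            Class<?> elementClass, Function<Object, Object> elementConverter) {
        return val -> {
            if (val == null) {
                return null; // assumed policy: a null column value maps to null
            }
            Object[] in = (Object[]) ((SqlArrayLike) val).getArray();
            if (in == null) {
                return null; // guard before touching in.length
            }
            Object[] array = (Object[]) Array.newInstance(elementClass, in.length);
            for (int i = 0; i < in.length; i++) {
                array[i] = elementConverter.apply(in[i]);
            }
            return array;
        };
    }

    public static void main(String[] args) {
        Function<Object, Object[]> conv =
                createArrayConverter(String.class, o -> o == null ? null : o.toString());
        SqlArrayLike empty = () -> null;
        System.out.println(conv.apply(null) == null);
        System.out.println(conv.apply(empty) == null);
    }
}
```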
[jira] [Created] (FLINK-33790) Upsert statement filter unique key field colume in mysql dielact
JingWei Li created FLINK-33790:
--

Summary: Upsert statement filter unique key field colume in mysql dielact
Key: FLINK-33790
URL: https://issues.apache.org/jira/browse/FLINK-33790
Project: Flink
Issue Type: Improvement
Components: Connectors / JDBC
Reporter: JingWei Li

Example: `col2` and `col4` are the unique key columns of table `my_table`. The upsert statement lists every column in the update clause:
{code:sql}
INSERT INTO `my_table`(`col1`, `col2`, `col3`, `col4`, `col5`)
VALUES (?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
  `col1`=VALUES(`col1`),
  `col2`=VALUES(`col2`),
  `col3`=VALUES(`col3`),
  `col4`=VALUES(`col4`),
  `col5`=VALUES(`col5`)
{code}

Result after filtering the unique key columns out of the update clause:
{code:sql}
INSERT INTO `my_table`(`col1`, `col2`, `col3`, `col4`, `col5`)
VALUES (?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
  `col1`=VALUES(`col1`),
  `col3`=VALUES(`col3`),
  `col5`=VALUES(`col5`)
{code}
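The filtering in the example above can be sketched in plain Java as follows. This is a minimal illustration, not the MySQL dialect code of flink-connector-jdbc: the class `MySqlUpsertSketch` and its method names are hypothetical. The fallback for a table whose columns are all key columns is an assumption added so that the update clause is never empty.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the flink-connector-jdbc dialect code.
final class MySqlUpsertSketch {

    static String quote(String identifier) {
        return "`" + identifier + "`";
    }

    static String upsertStatement(
            String table, List<String> fields, Collection<String> uniqueKeyFields) {
        String columns = fields.stream()
                .map(MySqlUpsertSketch::quote)
                .collect(Collectors.joining(", "));
        String placeholders = fields.stream()
                .map(f -> "?")
                .collect(Collectors.joining(", "));
        // Only non-key columns need to be updated when the key already exists.
        List<String> updateFields = fields.stream()
                .filter(f -> !uniqueKeyFields.contains(f))
                .collect(Collectors.toList());
        if (updateFields.isEmpty()) {
            // Assumption: fall back to all columns so the clause is never empty.
            updateFields = fields;
        }
        String updates = updateFields.stream()
                .map(f -> quote(f) + "=VALUES(" + quote(f) + ")")
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + quote(table) + "(" + columns + ")"
                + " VALUES (" + placeholders + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }

    public static void main(String[] args) {
        System.out.println(upsertStatement(
                "my_table",
                Arrays.asList("col1", "col2", "col3", "col4", "col5"),
                Arrays.asList("col2", "col4")));
    }
}
```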