Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-18 Thread via GitHub


LadyForest closed pull request #23678: [FLINK-32993][table] Datagen connector 
handles length-constrained fields according to the schema definition by default
URL: https://github.com/apache/flink/pull/23678


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-17 Thread via GitHub


liyubin117 commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1429393701


##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/DataGenTableSourceFactory.java:
##
@@ -141,15 +143,15 @@ private DataGeneratorContainer createContainer(
 }
 
 private void validateFieldOptions(String name, DataType type, 
ReadableConfig options) {
-ConfigOption lenOption =
+ConfigOption varLenOption =
 key(DataGenConnectorOptionsUtil.FIELDS
 + "."
 + name
 + "."
 + DataGenConnectorOptionsUtil.VAR_LEN)
 .booleanType()
 .defaultValue(false);
-options.getOptional(lenOption)
+options.getOptional(varLenOption)

Review Comment:
   sorry to misunderstand. I invoke this to filter the table that contains ` 
'fields.#.var-len' = 'true' `, then we could use `ifPresent` check if it is 
sufficient to use such option.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-17 Thread via GitHub


LadyForest commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1429375718


##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/DataGenTableSourceFactory.java:
##
@@ -141,15 +143,15 @@ private DataGeneratorContainer createContainer(
 }
 
 private void validateFieldOptions(String name, DataType type, 
ReadableConfig options) {
-ConfigOption lenOption =
+ConfigOption varLenOption =
 key(DataGenConnectorOptionsUtil.FIELDS
 + "."
 + name
 + "."
 + DataGenConnectorOptionsUtil.VAR_LEN)
 .booleanType()
 .defaultValue(false);
-options.getOptional(lenOption)
+options.getOptional(varLenOption)

Review Comment:
   > the prior name is not clear enough, so I use a new one instead.
   
   I mean L#155
   `filter(option -> option)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-14 Thread via GitHub


liyubin117 commented on PR #23678:
URL: https://github.com/apache/flink/pull/23678#issuecomment-1855638485

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-11 Thread via GitHub


liyubin117 commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1423393030


##
flink-table/flink-table-api-java-bridge/src/test/java/org/apache/flink/table/factories/DataGenTableSourceFactoryTest.java:
##
@@ -297,10 +295,104 @@ void testVariableLengthDataGeneration() throws Exception 
{
 anyCauseMatches(
 ValidationException.class,
 String.format(
-"Only supports specifying '%s' option 
for variable-length types (varchar, string, varbinary, bytes). The type of 
field %s is not within this range.",
+"Only supports specifying '%s' option 
for variable-length types (VARCHAR/STRING/VARBINARY/BYTES). The type of field 
'%s' is not within this range.",
 
DataGenConnectorOptions.FIELD_VAR_LEN.key(), "f4")));
 }
 
+@Test
+void testVariableLengthDataType() throws Exception {
+DescriptorProperties descriptor = new DescriptorProperties();
+final int rowsNumber = 200;
+descriptor.putString(FactoryUtil.CONNECTOR.key(), "datagen");
+descriptor.putLong(DataGenConnectorOptions.NUMBER_OF_ROWS.key(), 
rowsNumber);
+
+List results = runGenerator(LENGTH_CONSTRAINED_SCHEMA, 
descriptor);
+assertThat(results).hasSize(rowsNumber);
+
+for (RowData row : results) {
+assertThat(row.getString(2).toString()).hasSize(30);
+assertThat(row.getBinary(3)).hasSize(20);
+assertThat(row.getString(4).toString())
+
.hasSize(RandomGeneratorVisitor.RANDOM_STRING_LENGTH_DEFAULT);
+}
+
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS + ".f2." + 
DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS + ".f2." + 
DataGenConnectorOptionsUtil.LENGTH,
+25);
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS + ".f4." + 
DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS + ".f4." + 
DataGenConnectorOptionsUtil.LENGTH,
+);
+
+results = runGenerator(LENGTH_CONSTRAINED_SCHEMA, descriptor);
+
+for (RowData row : results) {
+assertThat(row.getString(2).toString()).hasSize(25);
+assertThat(row.getString(4).toString()).hasSize();
+}
+
+assertThatThrownBy(
+() -> {
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS
++ ".f3."
++ DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS
++ ".f3."
++ 
DataGenConnectorOptionsUtil.LENGTH,
+21);
+
+runGenerator(LENGTH_CONSTRAINED_SCHEMA, 
descriptor);
+})
+.satisfies(
+anyCauseMatches(
+ValidationException.class,
+"Custom length '21' for variable-length type 
(VARCHAR/STRING/VARBINARY/BYTES) field 'f3' should be shorter than '20' defined 
in the schema."));
+}
+
+@Test
+void testFixedLengthDataType() throws Exception {
+DescriptorProperties descriptor = new DescriptorProperties();
+final int rowsNumber = 200;
+descriptor.putString(FactoryUtil.CONNECTOR.key(), "datagen");
+descriptor.putLong(DataGenConnectorOptions.NUMBER_OF_ROWS.key(), 
rowsNumber);
+
+List results = runGenerator(LENGTH_CONSTRAINED_SCHEMA, 
descriptor);
+assertThat(results).hasSize(rowsNumber);
+
+for (RowData row : results) {
+assertThat(row.getString(0).toString()).hasSize(50);
+assertThat(row.getBinary(1)).hasSize(40);
+}
+
+assertThatThrownBy(
+() -> {
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS
++ ".f0."
++ DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS
+ 

Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-11 Thread via GitHub


liyubin117 commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1423393030


##
flink-table/flink-table-api-java-bridge/src/test/java/org/apache/flink/table/factories/DataGenTableSourceFactoryTest.java:
##
@@ -297,10 +295,104 @@ void testVariableLengthDataGeneration() throws Exception 
{
 anyCauseMatches(
 ValidationException.class,
 String.format(
-"Only supports specifying '%s' option 
for variable-length types (varchar, string, varbinary, bytes). The type of 
field %s is not within this range.",
+"Only supports specifying '%s' option 
for variable-length types (VARCHAR/STRING/VARBINARY/BYTES). The type of field 
'%s' is not within this range.",
 
DataGenConnectorOptions.FIELD_VAR_LEN.key(), "f4")));
 }
 
+@Test
+void testVariableLengthDataType() throws Exception {
+DescriptorProperties descriptor = new DescriptorProperties();
+final int rowsNumber = 200;
+descriptor.putString(FactoryUtil.CONNECTOR.key(), "datagen");
+descriptor.putLong(DataGenConnectorOptions.NUMBER_OF_ROWS.key(), 
rowsNumber);
+
+List results = runGenerator(LENGTH_CONSTRAINED_SCHEMA, 
descriptor);
+assertThat(results).hasSize(rowsNumber);
+
+for (RowData row : results) {
+assertThat(row.getString(2).toString()).hasSize(30);
+assertThat(row.getBinary(3)).hasSize(20);
+assertThat(row.getString(4).toString())
+
.hasSize(RandomGeneratorVisitor.RANDOM_STRING_LENGTH_DEFAULT);
+}
+
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS + ".f2." + 
DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS + ".f2." + 
DataGenConnectorOptionsUtil.LENGTH,
+25);
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS + ".f4." + 
DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS + ".f4." + 
DataGenConnectorOptionsUtil.LENGTH,
+);
+
+results = runGenerator(LENGTH_CONSTRAINED_SCHEMA, descriptor);
+
+for (RowData row : results) {
+assertThat(row.getString(2).toString()).hasSize(25);
+assertThat(row.getString(4).toString()).hasSize();
+}
+
+assertThatThrownBy(
+() -> {
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS
++ ".f3."
++ DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS
++ ".f3."
++ 
DataGenConnectorOptionsUtil.LENGTH,
+21);
+
+runGenerator(LENGTH_CONSTRAINED_SCHEMA, 
descriptor);
+})
+.satisfies(
+anyCauseMatches(
+ValidationException.class,
+"Custom length '21' for variable-length type 
(VARCHAR/STRING/VARBINARY/BYTES) field 'f3' should be shorter than '20' defined 
in the schema."));
+}
+
+@Test
+void testFixedLengthDataType() throws Exception {
+DescriptorProperties descriptor = new DescriptorProperties();
+final int rowsNumber = 200;
+descriptor.putString(FactoryUtil.CONNECTOR.key(), "datagen");
+descriptor.putLong(DataGenConnectorOptions.NUMBER_OF_ROWS.key(), 
rowsNumber);
+
+List results = runGenerator(LENGTH_CONSTRAINED_SCHEMA, 
descriptor);
+assertThat(results).hasSize(rowsNumber);
+
+for (RowData row : results) {
+assertThat(row.getString(0).toString()).hasSize(50);
+assertThat(row.getBinary(1)).hasSize(40);
+}
+
+assertThatThrownBy(
+() -> {
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS
++ ".f0."
++ DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS
+ 

Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-11 Thread via GitHub


liyubin117 commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1423385053


##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/DataGenTableSourceFactory.java:
##
@@ -141,15 +143,15 @@ private DataGeneratorContainer createContainer(
 }
 
 private void validateFieldOptions(String name, DataType type, 
ReadableConfig options) {
-ConfigOption lenOption =
+ConfigOption varLenOption =
 key(DataGenConnectorOptionsUtil.FIELDS
 + "."
 + name
 + "."
 + DataGenConnectorOptionsUtil.VAR_LEN)
 .booleanType()
 .defaultValue(false);
-options.getOptional(lenOption)
+options.getOptional(varLenOption)

Review Comment:
   the prior name is not clear enough, so I use a new one instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-11 Thread via GitHub


LadyForest commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1423376727


##
flink-table/flink-table-api-java-bridge/src/test/java/org/apache/flink/table/factories/DataGenTableSourceFactoryTest.java:
##
@@ -297,10 +295,104 @@ void testVariableLengthDataGeneration() throws Exception 
{
 anyCauseMatches(
 ValidationException.class,
 String.format(
-"Only supports specifying '%s' option 
for variable-length types (varchar, string, varbinary, bytes). The type of 
field %s is not within this range.",
+"Only supports specifying '%s' option 
for variable-length types (VARCHAR/STRING/VARBINARY/BYTES). The type of field 
'%s' is not within this range.",
 
DataGenConnectorOptions.FIELD_VAR_LEN.key(), "f4")));
 }
 
+@Test
+void testVariableLengthDataType() throws Exception {
+DescriptorProperties descriptor = new DescriptorProperties();
+final int rowsNumber = 200;
+descriptor.putString(FactoryUtil.CONNECTOR.key(), "datagen");
+descriptor.putLong(DataGenConnectorOptions.NUMBER_OF_ROWS.key(), 
rowsNumber);
+
+List results = runGenerator(LENGTH_CONSTRAINED_SCHEMA, 
descriptor);
+assertThat(results).hasSize(rowsNumber);
+
+for (RowData row : results) {
+assertThat(row.getString(2).toString()).hasSize(30);
+assertThat(row.getBinary(3)).hasSize(20);
+assertThat(row.getString(4).toString())
+
.hasSize(RandomGeneratorVisitor.RANDOM_STRING_LENGTH_DEFAULT);
+}
+
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS + ".f2." + 
DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS + ".f2." + 
DataGenConnectorOptionsUtil.LENGTH,
+25);
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS + ".f4." + 
DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS + ".f4." + 
DataGenConnectorOptionsUtil.LENGTH,
+);
+
+results = runGenerator(LENGTH_CONSTRAINED_SCHEMA, descriptor);
+
+for (RowData row : results) {
+assertThat(row.getString(2).toString()).hasSize(25);
+assertThat(row.getString(4).toString()).hasSize();
+}
+
+assertThatThrownBy(
+() -> {
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS
++ ".f3."
++ DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS
++ ".f3."
++ 
DataGenConnectorOptionsUtil.LENGTH,
+21);
+
+runGenerator(LENGTH_CONSTRAINED_SCHEMA, 
descriptor);
+})
+.satisfies(
+anyCauseMatches(
+ValidationException.class,
+"Custom length '21' for variable-length type 
(VARCHAR/STRING/VARBINARY/BYTES) field 'f3' should be shorter than '20' defined 
in the schema."));
+}
+
+@Test
+void testFixedLengthDataType() throws Exception {
+DescriptorProperties descriptor = new DescriptorProperties();
+final int rowsNumber = 200;
+descriptor.putString(FactoryUtil.CONNECTOR.key(), "datagen");
+descriptor.putLong(DataGenConnectorOptions.NUMBER_OF_ROWS.key(), 
rowsNumber);
+
+List results = runGenerator(LENGTH_CONSTRAINED_SCHEMA, 
descriptor);
+assertThat(results).hasSize(rowsNumber);
+
+for (RowData row : results) {
+assertThat(row.getString(0).toString()).hasSize(50);
+assertThat(row.getBinary(1)).hasSize(40);
+}
+
+assertThatThrownBy(
+() -> {
+descriptor.putString(
+DataGenConnectorOptionsUtil.FIELDS
++ ".f0."
++ DataGenConnectorOptionsUtil.KIND,
+DataGenConnectorOptionsUtil.RANDOM);
+descriptor.putLong(
+DataGenConnectorOptionsUtil.FIELDS
+ 

Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-10 Thread via GitHub


liyubin117 commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1421917082


##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/DataGenConnectorOptions.java:
##
@@ -118,10 +118,10 @@ public class DataGenConnectorOptions {
 .withDescription("The proportion of null values.");
 
 /** Placeholder {@link ConfigOption}. Not used for retrieving values. */
-public static final ConfigOption FIELD_VAR_LEN =
+public static final ConfigOption FIELD_VAR_LEN =
 ConfigOptions.key(String.format("%s.#.%s", FIELDS, VAR_LEN))
-.floatType()
-.defaultValue(0f)
+.booleanType()
+.defaultValue(false)

Review Comment:
   just fix typo, to make commits clear, I have divided the modifition into 
another commit :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-10 Thread via GitHub


liyubin117 commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1421917082


##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/DataGenConnectorOptions.java:
##
@@ -118,10 +118,10 @@ public class DataGenConnectorOptions {
 .withDescription("The proportion of null values.");
 
 /** Placeholder {@link ConfigOption}. Not used for retrieving values. */
-public static final ConfigOption FIELD_VAR_LEN =
+public static final ConfigOption FIELD_VAR_LEN =
 ConfigOptions.key(String.format("%s.#.%s", FIELDS, VAR_LEN))
-.floatType()
-.defaultValue(0f)
+.booleanType()
+.defaultValue(false)

Review Comment:
   just to fix typo, to make commits clear, I have divided the modifition into 
another commit :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-10 Thread via GitHub


liyubin117 commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1421910481


##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/RandomGeneratorVisitor.java:
##
@@ -489,6 +465,16 @@ protected DataGeneratorContainer defaultMethod(LogicalType 
logicalType) {
 throw new ValidationException("Unsupported type: " + logicalType);
 }
 
+private ConfigOption getLengthOption(Supplier 
defaultLengthSupplier) {

Review Comment:
   wise idea !



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-10 Thread via GitHub


LadyForest commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1421891193


##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/DataGenConnectorOptions.java:
##
@@ -118,10 +118,10 @@ public class DataGenConnectorOptions {
 .withDescription("The proportion of null values.");
 
 /** Placeholder {@link ConfigOption}. Not used for retrieving values. */
-public static final ConfigOption FIELD_VAR_LEN =
+public static final ConfigOption FIELD_VAR_LEN =
 ConfigOptions.key(String.format("%s.#.%s", FIELDS, VAR_LEN))
-.floatType()
-.defaultValue(0f)
+.booleanType()
+.defaultValue(false)

Review Comment:
   I'm a little bit confused. Why change this config type?



##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/RandomGeneratorVisitor.java:
##
@@ -489,6 +465,16 @@ protected DataGeneratorContainer defaultMethod(LogicalType 
logicalType) {
 throw new ValidationException("Unsupported type: " + logicalType);
 }
 
+private ConfigOption getLengthOption(Supplier 
defaultLengthSupplier) {

Review Comment:
   `ArrayType`,  `MultisetType,` and `MapType` can also use this method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-04 Thread via GitHub


liyubin117 commented on PR #23678:
URL: https://github.com/apache/flink/pull/23678#issuecomment-1839898221

   @LadyForest I have updated commits and rebased on the latest branch, now CI 
passed, PLAL, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-01 Thread via GitHub


liyubin117 commented on PR #23678:
URL: https://github.com/apache/flink/pull/23678#issuecomment-1835872939

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-01 Thread via GitHub


liyubin117 commented on PR #23678:
URL: https://github.com/apache/flink/pull/23678#issuecomment-1835806974

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-01 Thread via GitHub


liyubin117 commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1411842995


##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/DataGenTableSourceFactory.java:
##
@@ -127,11 +132,53 @@ private DataGeneratorContainer createContainer(
 String name, DataType type, String kind, ReadableConfig options) {
 switch (kind) {
 case DataGenConnectorOptionsUtil.RANDOM:
+validateFieldOptions(name, type, options);
 return type.getLogicalType().accept(new 
RandomGeneratorVisitor(name, options));
 case DataGenConnectorOptionsUtil.SEQUENCE:
 return type.getLogicalType().accept(new 
SequenceGeneratorVisitor(name, options));
 default:
 throw new ValidationException("Unsupported generator kind: " + 
kind);
 }
 }
+
+private void validateFieldOptions(String name, DataType type, 
ReadableConfig options) {
+ConfigOption lenOption =
+key(DataGenConnectorOptionsUtil.FIELDS
++ "."
++ name
++ "."
++ DataGenConnectorOptionsUtil.LENGTH)
+.intType()
+.noDefaultValue();
+options.getOptional(lenOption)
+.ifPresent(
+option -> {
+LogicalType logicalType = type.getLogicalType();
+if (logicalType instanceof CharType
+|| logicalType instanceof BinaryType) {
+throw new ValidationException(
+String.format(
+"User-defined length of the 
fixed-length field %s is not supported.",
+name));
+}
+if (logicalType instanceof VarCharType) {
+int length = ((VarCharType) 
logicalType).getLength();
+if (option > length) {
+throw new ValidationException(
+String.format(
+"User-defined length of 
the VARCHAR field %s should be shorter than the schema definition.",
+name));
+}
+}
+if (logicalType instanceof VarBinaryType) {
+int length = ((VarBinaryType) 
logicalType).getLength();
+if (option > length) {
+throw new ValidationException(
+String.format(
+"User-defined length of 
the VARBINARY field %s should be shorter than the schema definition.",
+name));
+}
+}
+});
+}

Review Comment:
   done



##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/DataGenTableSourceFactory.java:
##
@@ -127,11 +132,53 @@ private DataGeneratorContainer createContainer(
 String name, DataType type, String kind, ReadableConfig options) {
 switch (kind) {
 case DataGenConnectorOptionsUtil.RANDOM:
+validateFieldOptions(name, type, options);
 return type.getLogicalType().accept(new 
RandomGeneratorVisitor(name, options));
 case DataGenConnectorOptionsUtil.SEQUENCE:
 return type.getLogicalType().accept(new 
SequenceGeneratorVisitor(name, options));
 default:
 throw new ValidationException("Unsupported generator kind: " + 
kind);
 }
 }
+
+private void validateFieldOptions(String name, DataType type, 
ReadableConfig options) {
+ConfigOption lenOption =
+key(DataGenConnectorOptionsUtil.FIELDS
++ "."
++ name
++ "."
++ DataGenConnectorOptionsUtil.LENGTH)
+.intType()
+.noDefaultValue();
+options.getOptional(lenOption)
+.ifPresent(
+option -> {
+LogicalType logicalType = type.getLogicalType();
+if (logicalType instanceof CharType
+|| logicalType instanceof BinaryType) {
+throw new 

Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-01 Thread via GitHub


liyubin117 commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1411839904


##
flink-table/flink-table-api-java-bridge/src/test/java/org/apache/flink/table/factories/DataGenTableSourceFactoryTest.java:
##
@@ -215,6 +216,100 @@ void testSource() throws Exception {
 }
 }
 
+@Test
+void testVariableLengthDataType() throws Exception {
+DescriptorProperties descriptor = new DescriptorProperties();

Review Comment:
   Thanks for your reminds, I will create another issue to follow up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-12-01 Thread via GitHub


liyubin117 commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1411815025


##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/RandomGeneratorVisitor.java:
##
@@ -152,25 +142,20 @@ public DataGeneratorContainer visit(VarCharType 
varCharType) {
 + "."
 + DataGenConnectorOptionsUtil.LENGTH)
 .intType()
-.defaultValue(RANDOM_STRING_LENGTH_DEFAULT);
+.defaultValue(varCharType.getLength());

Review Comment:
   good idea!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-11-30 Thread via GitHub


LadyForest commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1410315513


##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/DataGenTableSourceFactory.java:
##
@@ -127,11 +132,53 @@ private DataGeneratorContainer createContainer(
 String name, DataType type, String kind, ReadableConfig options) {
 switch (kind) {
 case DataGenConnectorOptionsUtil.RANDOM:
+validateFieldOptions(name, type, options);
 return type.getLogicalType().accept(new 
RandomGeneratorVisitor(name, options));
 case DataGenConnectorOptionsUtil.SEQUENCE:
 return type.getLogicalType().accept(new 
SequenceGeneratorVisitor(name, options));
 default:
 throw new ValidationException("Unsupported generator kind: " + 
kind);
 }
 }
+
+private void validateFieldOptions(String name, DataType type, 
ReadableConfig options) {
+ConfigOption lenOption =
+key(DataGenConnectorOptionsUtil.FIELDS
++ "."
++ name
++ "."
++ DataGenConnectorOptionsUtil.LENGTH)
+.intType()
+.noDefaultValue();
+options.getOptional(lenOption)
+.ifPresent(
+option -> {
+LogicalType logicalType = type.getLogicalType();
+if (logicalType instanceof CharType
+|| logicalType instanceof BinaryType) {
+throw new ValidationException(
+String.format(
+"User-defined length of the 
fixed-length field %s is not supported.",

Review Comment:
   Nit: Custom length for fixed-length type field '%s' is not supported.



##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/DataGenTableSourceFactory.java:
##
@@ -127,11 +132,53 @@ private DataGeneratorContainer createContainer(
 String name, DataType type, String kind, ReadableConfig options) {
 switch (kind) {
 case DataGenConnectorOptionsUtil.RANDOM:
+validateFieldOptions(name, type, options);
 return type.getLogicalType().accept(new 
RandomGeneratorVisitor(name, options));
 case DataGenConnectorOptionsUtil.SEQUENCE:
 return type.getLogicalType().accept(new 
SequenceGeneratorVisitor(name, options));
 default:
 throw new ValidationException("Unsupported generator kind: " + 
kind);
 }
 }
+
+private void validateFieldOptions(String name, DataType type, 
ReadableConfig options) {
+ConfigOption lenOption =
+key(DataGenConnectorOptionsUtil.FIELDS
++ "."
++ name
++ "."
++ DataGenConnectorOptionsUtil.LENGTH)
+.intType()
+.noDefaultValue();
+options.getOptional(lenOption)
+.ifPresent(
+option -> {
+LogicalType logicalType = type.getLogicalType();
+if (logicalType instanceof CharType
+|| logicalType instanceof BinaryType) {
+throw new ValidationException(
+String.format(
+"User-defined length of the 
fixed-length field %s is not supported.",
+name));
+}
+if (logicalType instanceof VarCharType) {
+int length = ((VarCharType) 
logicalType).getLength();
+if (option > length) {
+throw new ValidationException(
+String.format(
+"User-defined length of 
the VARCHAR field %s should be shorter than the schema definition.",
+name));
+}
+}
+if (logicalType instanceof VarBinaryType) {
+int length = ((VarBinaryType) 
logicalType).getLength();
+if (option > length) {
+throw new ValidationException(
+String.format(
+

Re: [PR] [FLINK-32993][table] Datagen connector handles length-constrained fields according to the schema definition by default [flink]

2023-11-24 Thread via GitHub


liyubin117 commented on code in PR #23678:
URL: https://github.com/apache/flink/pull/23678#discussion_r1404708247


##
flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/connector/datagen/table/RandomGeneratorVisitor.java:
##
@@ -135,7 +135,7 @@ public DataGeneratorContainer visit(CharType charType) {
 + "."
 + DataGenConnectorOptionsUtil.LENGTH)
 .intType()
-.defaultValue(RANDOM_STRING_LENGTH_DEFAULT);
+.defaultValue(charType.getLength());

Review Comment:
   done :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org