LuciferYang commented on code in PR #45056:
URL: https://github.com/apache/spark/pull/45056#discussion_r1482339139
##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala:
##########
@@ -42,7 +42,17 @@ private[connect] object ProtoUtils {
         val size = string.length
         val threshold = thresholds.getOrElse(STRING, MAX_STRING_SIZE)
         if (size > threshold) {
-          builder.setField(field, createString(string.take(threshold), size))
+          builder.setField(field, truncateString(string, threshold))
+        }
+
+      case (field: FieldDescriptor, strings: java.lang.Iterable[_])
+          if field.getJavaType == FieldDescriptor.JavaType.STRING && field.isRepeated
+            && strings != null =>
+        val threshold = thresholds.getOrElse(STRING, MAX_STRING_SIZE)
+        strings.iterator().asScala.zipWithIndex.foreach {
+          case (string: String, i) if string != null && string.length > threshold =>
Review Comment:
In fact, for repeated fields the generated builders already perform null checks,
whether adding a single element or a whole collection of elements. For example:
```java
public Builder addColumnNames(java.lang.String value) {
  if (value == null) { throw new NullPointerException(); }
  ensureColumnNamesIsMutable();
  columnNames_.add(value);
  bitField0_ |= 0x00000004;
  onChanged();
  return this;
}

// com.google.protobuf.AbstractMessageLite.Builder#addAll(java.lang.Iterable<T>, java.util.List<? super T>)
protected static <T> void addAll(final Iterable<T> values, final List<? super T> list) {
  checkNotNull(values);
  if (values instanceof LazyStringList) {
    // For StringOrByteStringLists, check the underlying elements to avoid
    // forcing conversions of ByteStrings to Strings.
    // TODO: Could we just prohibit nulls in all protobuf lists and get rid of this?
    // Is it even possible to hit this condition as all protobuf methods check for
    // null first, right?
    List<?> lazyValues = ((LazyStringList) values).getUnderlyingElements();
    LazyStringList lazyList = (LazyStringList) list;
    ....
```
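The per-element check quoted above can be illustrated with a small, self-contained sketch; `NullCheckSketch` and its `addAllCheckingNulls` are hypothetical stand-ins for the pattern, not protobuf's actual classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not protobuf's real code) of the per-element null
// check that the builder applies before copying elements into a repeated field.
public class NullCheckSketch {
  // Throws a NullPointerException naming the index of the first null element.
  static <T> void addAllCheckingNulls(Iterable<T> values, List<? super T> list) {
    int index = 0;
    for (T value : values) {
      if (value == null) {
        throw new NullPointerException("Element at index " + index + " is null.");
      }
      list.add(value);
      index++;
    }
  }

  public static void main(String[] args) {
    List<String> target = new ArrayList<>();
    addAllCheckingNulls(Arrays.asList("a", "b"), target);
    System.out.println(target); // [a, b]
    try {
      addAllCheckingNulls(Arrays.asList("x", null), target);
    } catch (NullPointerException e) {
      System.out.println(e.getMessage()); // Element at index 1 is null.
    }
  }
}
```

The point is that nulls are rejected at the builder boundary, before any truncation logic would ever see them.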
Also, I have performed the following checks:
- adding a collection of elements with null
```scala
val names = Seq.range(0, 10).map(i => if (i == 3) null else i.toString * 1024)
val drop = proto.Drop.newBuilder().setInput(sql).addAllColumnNames(names.asJava).build()
```
```
[info] - truncate repeated strings with nulls *** FAILED *** (3 milliseconds)
[info] java.lang.NullPointerException: Element at index 3 is null.
[info] at com.google.protobuf.AbstractMessageLite$Builder.addAllCheckingNulls(AbstractMessageLite.java:359)
[info] at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:414)
[info] at org.apache.spark.connect.proto.Drop$Builder.addAllColumnNames(Drop.java:1240)
```
- adding a null element
```scala
val drop = proto.Drop.newBuilder().setInput(sql).addColumnNames(null).build()
```
```
[info] - truncate repeated strings with nulls *** FAILED *** (4 milliseconds)
[info] java.lang.NullPointerException:
[info] at org.apache.spark.connect.proto.Drop$Builder.addColumnNames(Drop.java:1221)
```
As you can see, under normal circumstances it is impossible to add a null element
to a repeated field. So personally I don't think this null check is necessary,
but I don't object to adding it.
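For reference, the truncation under discussion can be sketched as follows; `truncate`, `truncateAll`, and the size-marker format are illustrative assumptions, not Spark's actual `truncateString` implementation:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch (not Spark's actual ProtoUtils.truncateString) of
// cutting long strings down to a threshold while recording the original size.
public class TruncateSketch {
  static String truncate(String s, int threshold) {
    if (s == null || s.length() <= threshold) {
      return s; // short enough (or null): leave untouched
    }
    // Keep the first `threshold` characters and note the original length.
    return s.substring(0, threshold) + "[truncated, size=" + s.length() + "]";
  }

  // Repeated-field case: truncate each element independently.
  static List<String> truncateAll(List<String> strings, int threshold) {
    return strings.stream().map(s -> truncate(s, threshold)).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(truncate("abcdefgh", 4)); // abcd[truncated, size=8]
    System.out.println(truncateAll(List.of("short", "longer-than-four"), 4));
  }
}
```

Given the builder-side null checks shown earlier, the null guard inside `truncate` is purely defensive.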
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]