LuciferYang commented on code in PR #45056:
URL: https://github.com/apache/spark/pull/45056#discussion_r1482339139
##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala:
##########
@@ -42,7 +42,17 @@ private[connect] object ProtoUtils {
         val size = string.length
         val threshold = thresholds.getOrElse(STRING, MAX_STRING_SIZE)
         if (size > threshold) {
-          builder.setField(field, createString(string.take(threshold), size))
+          builder.setField(field, truncateString(string, threshold))
+        }
+
+      case (field: FieldDescriptor, strings: java.lang.Iterable[_])
+          if field.getJavaType == FieldDescriptor.JavaType.STRING && field.isRepeated
+            && strings != null =>
+        val threshold = thresholds.getOrElse(STRING, MAX_STRING_SIZE)
+        strings.iterator().asScala.zipWithIndex.foreach {
+          case (string: String, i) if string != null && string.length > threshold =>
Review Comment:
In fact, for repeated fields the generated builders already perform null checks,
whether adding a single element or a whole collection of elements. For example:
```java
public Builder addColumnNames(java.lang.String value) {
  if (value == null) { throw new NullPointerException(); }
  ensureColumnNamesIsMutable();
  columnNames_.add(value);
  bitField0_ |= 0x00000004;
  onChanged();
  return this;
}

// com.google.protobuf.AbstractMessageLite.Builder#addAll(java.lang.Iterable<T>, java.util.List<? super T>)
protected static <T> void addAll(final Iterable<T> values, final List<? super T> list) {
  checkNotNull(values);
  if (values instanceof LazyStringList) {
    // For StringOrByteStringLists, check the underlying elements to avoid
    // forcing conversions of ByteStrings to Strings.
    // TODO: Could we just prohibit nulls in all protobuf lists and get rid of this?
    // Is it even possible to hit this condition as all protobuf methods check for
    // null first, right?
    List<?> lazyValues = ((LazyStringList) values).getUnderlyingElements();
    LazyStringList lazyList = (LazyStringList) list;
    ....
```
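The per-element check quoted above can be illustrated with a small, self-contained sketch; `NullCheckSketch` and its `addAllCheckingNulls` are hypothetical stand-ins for the pattern, not protobuf's actual classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not protobuf's real code) of the per-element null
// check that the builder applies before copying elements into a repeated field.
public class NullCheckSketch {
  // Throws a NullPointerException naming the index of the first null element.
  static <T> void addAllCheckingNulls(Iterable<T> values, List<? super T> list) {
    int index = 0;
    for (T value : values) {
      if (value == null) {
        throw new NullPointerException("Element at index " + index + " is null.");
      }
      list.add(value);
      index++;
    }
  }

  public static void main(String[] args) {
    List<String> target = new ArrayList<>();
    addAllCheckingNulls(Arrays.asList("a", "b"), target);
    System.out.println(target); // [a, b]
    try {
      addAllCheckingNulls(Arrays.asList("x", null), target);
    } catch (NullPointerException e) {
      System.out.println(e.getMessage()); // Element at index 1 is null.
    }
  }
}
```

The point is that nulls are rejected at the builder boundary, before any truncation logic would ever see them.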
Also, I have performed the following checks:
- adding a collection of elements with null
```scala
val names = Seq.range(0, 10).map(i => if (i == 3) null else i.toString * 1024)
val drop = proto.Drop.newBuilder().setInput(sql).addAllColumnNames(names.asJava).build()
```
```
[info] - truncate repeated strings with nulls *** FAILED *** (3 milliseconds)
[info] java.lang.NullPointerException: Element at index 3 is null.
[info] at com.google.protobuf.AbstractMessageLite$Builder.addAllCheckingNulls(AbstractMessageLite.java:359)
[info] at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:414)
[info] at org.apache.spark.connect.proto.Drop$Builder.addAllColumnNames(Drop.java:1240)
```
- adding a null element
```scala
val drop = proto.Drop.newBuilder().setInput(sql).addColumnNames(null).build()
```
```
[info] - truncate repeated strings with nulls *** FAILED *** (4 milliseconds)
[info] java.lang.NullPointerException:
[info] at org.apache.spark.connect.proto.Drop$Builder.addColumnNames(Drop.java:1221)
```
As you can see, under normal circumstances it is impossible to add a null element
to a repeated field. So personally I don't think this null check is necessary,
but I don't object to adding it.
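For reference, the truncation under discussion can be sketched as follows; `truncate`, `truncateAll`, and the size-marker format are illustrative assumptions, not Spark's actual `truncateString` implementation:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch (not Spark's actual ProtoUtils.truncateString) of
// cutting long strings down to a threshold while recording the original size.
public class TruncateSketch {
  static String truncate(String s, int threshold) {
    if (s == null || s.length() <= threshold) {
      return s; // short enough (or null): leave untouched
    }
    // Keep the first `threshold` characters and note the original length.
    return s.substring(0, threshold) + "[truncated, size=" + s.length() + "]";
  }

  // Repeated-field case: truncate each element independently.
  static List<String> truncateAll(List<String> strings, int threshold) {
    return strings.stream().map(s -> truncate(s, threshold)).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(truncate("abcdefgh", 4)); // abcd[truncated, size=8]
    System.out.println(truncateAll(List.of("short", "longer-than-four"), 4));
  }
}
```

Given the builder-side null checks shown earlier, the null guard inside `truncate` is purely defensive.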
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]