brkyvz commented on code in PR #48944:
URL: https://github.com/apache/spark/pull/48944#discussion_r1889211051


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala:
##########
@@ -51,93 +51,138 @@ sealed trait RocksDBValueStateEncoder {
   def decodeValues(valueBytes: Array[Byte]): Iterator[UnsafeRow]
 }
 
-abstract class RocksDBKeyStateEncoderBase(
-    useColumnFamilies: Boolean,
-    virtualColFamilyId: Option[Short] = None) extends RocksDBKeyStateEncoder {
-  def offsetForColFamilyPrefix: Int =
-    if (useColumnFamilies) VIRTUAL_COL_FAMILY_PREFIX_BYTES else 0
+/**
+ * The DataEncoder can encode UnsafeRows into raw bytes in two ways:
+ *    - Using the direct byte layout of the UnsafeRow
+ *    - Converting the UnsafeRow into an Avro row, and encoding that
+ * In both of these cases, the raw bytes that are written into RocksDB have
+ * headers, footers and other metadata, but they also contain data that is provided
+ * by the callers. The metadata in each row does not need to be written as Avro or UnsafeRow,
+ * but the actual data provided by the caller does.
+ * The classes that use this trait require specialized partial encoding, which makes them much
+ * easier to cache and reuse; this is why each DataEncoder deals with multiple schemas.
+ */
+trait DataEncoder {

Review Comment:
   Awesome scaladoc throughout the class. Thank you!
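   
   To make the "partial encoding" idea from the scaladoc concrete, here is a rough, simplified sketch. It is not the PR's actual API: `DataEncoder[T]`, `writeRow`, and the metadata framing below are illustrative names only, showing how per-row metadata can be written as plain bytes around a payload that goes through the encoder.
   
   ```scala
   // Hypothetical, simplified illustration (not the trait defined in this PR).
   // The encoder owns only the caller-provided data; per-row metadata such as a
   // version byte or column-family prefix is framed around it as plain bytes.
   import java.nio.ByteBuffer
   
   trait DataEncoder[T] {
     def encode(value: T): Array[Byte]   // caller data -> raw bytes (e.g. UnsafeRow layout or Avro)
     def decode(bytes: Array[Byte]): T   // raw bytes -> caller data
   }
   
   // Because the metadata is framed outside the encoder, the same encoder can be
   // cached and reused no matter which prefix a particular state store needs.
   def writeRow[T](metadataPrefix: Array[Byte], value: T, enc: DataEncoder[T]): Array[Byte] = {
     val payload = enc.encode(value)
     ByteBuffer.allocate(metadataPrefix.length + payload.length)
       .put(metadataPrefix)
       .put(payload)
       .array()
   }
   ```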



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

