Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22009#discussion_r208634657
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader.streaming;
+
+import org.apache.spark.sql.sources.v2.reader.ReadSupport;
+
+/**
+ * A base interface for streaming read support. This is package private and is invisible to data
+ * sources. Data sources should implement concrete streaming read support interfaces:
+ * {@link MicroBatchReadSupport} or {@link ContinuousReadSupport}.
+ */
+interface StreamingReadSupport extends ReadSupport {
+
+ /**
+ * Returns the initial offset for a streaming query to start reading from. Note that the
+ * streaming data source should not assume that it will start reading from its
+ * {@link #initialOffset()} value: if Spark is restarting an existing query, it will restart from
+ * the check-pointed offset rather than the initial one.
+ */
+ Offset initialOffset();
+
+ /**
+ * Deserialize a JSON string into an Offset of the implementation-defined offset type.
+ *
+ * @throws IllegalArgumentException if the JSON does not encode a valid offset for this reader
+ */
+ Offset deserializeOffset(String json);
--- End diff --
What I'm trying to say is this: either delegate serialization and
deserialization entirely to the source -- in which case it passes you byte[] --
or standardize both the serialized representation and the class you will pass
to the source.
I don't think it makes any sense for Spark to serialize to JSON and require
that the source can deserialize it. Alternatively, if the source produces and
consumes that JSON, why are we forcing the source to use JSON?
Having a human-readable representation isn't a good justification for this
because we have a ton of other objects that get serialized and aren't required
to have a human-readable serialized form. Why is this different? If we care
about the serialized representation so much, why allow customization of the
Offset classes?
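To illustrate, here is a minimal Java sketch of the byte[]-based alternative described in this comment, where the source owns both serialization and deserialization and Spark only checkpoints opaque bytes. The names (OpaqueOffset, LongOffset) are hypothetical and not part of Spark's actual API.

```java
import java.nio.ByteBuffer;

// Hypothetical interface: Spark checkpoints whatever bytes the source produces,
// without imposing a representation such as JSON.
interface OpaqueOffset {
    byte[] serialize();
}

// Example source-defined offset backed by a single long position.
final class LongOffset implements OpaqueOffset {
    final long position;

    LongOffset(long position) {
        this.position = position;
    }

    @Override
    public byte[] serialize() {
        // The source picks the encoding; here, a big-endian 8-byte long.
        return ByteBuffer.allocate(Long.BYTES).putLong(position).array();
    }

    // Spark hands the checkpointed bytes back; only the source knows how to
    // decode them into its own offset type.
    static LongOffset deserialize(byte[] bytes) {
        return new LongOffset(ByteBuffer.wrap(bytes).getLong());
    }
}
```

Under this scheme Spark never inspects the payload, so a source is free to use JSON internally if it wants a human-readable form, but it is not required to.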
---