WweiL commented on code in PR #42521:
URL: https://github.com/apache/spark/pull/42521#discussion_r1300307559


##########
python/pyspark/sql/tests/connect/streaming/test_parity_listener.py:
##########
@@ -19,38 +19,153 @@
 import time
 
 from pyspark.sql.tests.streaming.test_streaming_listener import StreamingListenerTestsMixin
-from pyspark.sql.streaming.listener import StreamingQueryListener, QueryStartedEvent
-from pyspark.sql.types import StructType, StructField, StringType
+from pyspark.sql.streaming.listener import (
+    StreamingQueryListener,
+    QueryStartedEvent,
+    QueryProgressEvent,
+    QueryIdleEvent,
+    QueryTerminatedEvent,
+)
+from pyspark.sql.types import (
+    ArrayType,
+    StructType,
+    StructField,
+    StringType,
+    IntegerType,
+    FloatType,
+    MapType,
+)
+from pyspark.sql.functions import count, lit
 from pyspark.testing.connectutils import ReusedConnectTestCase
 
 
 def get_start_event_schema():
     return StructType(
         [
-            StructField("id", StringType(), True),
-            StructField("runId", StringType(), True),
+            StructField("id", StringType(), False),
+            StructField("runId", StringType(), False),
             StructField("name", StringType(), True),
-            StructField("timestamp", StringType(), True),
+            StructField("timestamp", StringType(), False),
         ]
     )

Review Comment:
   @HyukjinKwon 
   I'm looking at the [test error here](https://github.com/WweiL/oss-spark/actions/runs/5884139868/job/15959887335), which I couldn't reproduce locally.
   
   But I think the change is orthogonal to the test error. It is an addition to the listener events API. We could instead define the `asDict` and `get_event_schema` methods in the test suite, and the test would still run. For example, in current master, the `onQueryStarted` handling is implemented like this:
   
https://github.com/apache/spark/blob/master/python/pyspark/sql/tests/connect/streaming/test_parity_listener.py#L27-L44
   
   But that would mean users who want to write events to an external table need to add exactly the same redundant code I added in the suite. That is not too painful for the start event, but I think it would be extremely painful for `onQueryProgress`.
   
   Because every user who wants to write events to external tables would very likely have to rewrite the same code each time, I'm thinking of providing the API so they don't need to reinvent the wheel.
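
To illustrate the kind of boilerplate in question, here is a minimal sketch of what a user would write by hand for the start event alone. It is plain Python so it runs without a Spark session. `FakeStartedEvent`, `START_EVENT_FIELDS`, and `start_event_as_dict` are hypothetical names for this example only, not the `asDict` proposed in the PR; the field names and nullability are taken from `get_start_event_schema` in the diff above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class FakeStartedEvent:
    """Hypothetical stand-in with the four fields of the start-event schema."""

    id: Optional[str]
    runId: Optional[str]
    name: Optional[str]
    timestamp: Optional[str]


# (field name, nullable) pairs mirroring get_start_event_schema() in the diff.
START_EVENT_FIELDS: List[Tuple[str, bool]] = [
    ("id", False),
    ("runId", False),
    ("name", True),
    ("timestamp", False),
]


def start_event_as_dict(event: Any) -> Dict[str, Optional[str]]:
    """Hypothetical helper: flatten an event into one string-valued row."""
    row: Dict[str, Optional[str]] = {}
    for field, nullable in START_EVENT_FIELDS:
        value = getattr(event, field)
        # Reject nulls in fields the schema declares non-nullable.
        if value is None and not nullable:
            raise ValueError(f"field {field!r} must not be null")
        row[field] = None if value is None else str(value)
    return row
```

A listener could build a row with this helper inside `onQueryStarted` and write it to a table. A progress event has many more fields, some of them nested, so the equivalent hand-written code would be considerably longer.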



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

