GitHub user stczwd commented on the issue:
https://github.com/apache/spark/pull/22575
@WangTaoTheTonic
Adding the 'stream' keyword serves two purposes:
- **Mark the entire SQL query as a stream query and generate the SQLStreaming plan tree.**
- **Mark the table type as UnResolvedStreamRelation.** The table is then parsed as a StreamingRelation or as another relation type. This matters especially for queries that join a stream with a batch table, such as Kafka join MySQL.
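The two purposes can be illustrated with a toy Python model of the decisions involved. This is not the PR's implementation: `is_stream_query`, `resolve_relation`, the regex-based keyword detection, and the `CATALOG` table-to-connector mapping are all hypothetical. Only the relation names come from the comment. It also skips the intermediate UnResolvedStreamRelation step and resolves tables directly.

```python
import re

# Hypothetical catalog mapping each table to the connector that backs it.
CATALOG = {"kafka_sql_test": "kafka", "mysql_test": "jdbc"}

# Assumption: in this toy model only Kafka-backed tables can be read as a stream.
STREAMABLE = {"kafka"}

# Relation each connector produces when it is read as an ordinary batch table.
BATCH_RELATION = {"kafka": "KafkaRelation", "jdbc": "JDBCRelation"}


def is_stream_query(sql: str) -> bool:
    """Purpose 1: the query is a stream query iff SELECT is directly followed by STREAM."""
    return re.search(r"\bselect\s+stream\b", sql, re.IGNORECASE) is not None


def resolve_relation(table: str, streaming: bool) -> str:
    """Purpose 2: decide how a table is resolved, given the query mode."""
    source = CATALOG[table]
    if streaming and source in STREAMABLE:
        return "StreamingRelation"
    # Non-streamable tables (e.g. MySQL via JDBC) stay batch relations even inside
    # a stream query, which is what allows stream-join-batch queries.
    return BATCH_RELATION[source]


if __name__ == "__main__":
    sql = ("select stream kafka_sql_test.name, count(door) from kafka_sql_test "
           "inner join mysql_test on kafka_sql_test.name == mysql_test.name "
           "group by kafka_sql_test.name")
    streaming = is_stream_query(sql)
    print(streaming, {t: resolve_relation(t, streaming) for t in CATALOG})
```

In this model the keyword is a per-query switch, while the per-table decision also depends on whether the underlying source can be streamed at all.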
**Besides, the 'stream' keyword makes it easier to express Structured Streaming queries in pure SQL.**
A small example showing the importance of 'stream': read a stream from the Kafka stream table, then join it with MySQL to count user messages.
- with 'stream'
  - `select stream kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name`
    - **It is treated as a streaming query using the Console Sink**: kafka_sql_test is parsed as a StreamingRelation, and mysql_test is parsed as a JDBCRelation, not a StreamingRelation.
  - `insert into csv_sql_table select stream kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name`
    - **It is treated as a streaming query using the FileStream Sink**: kafka_sql_test is parsed as a StreamingRelation, and mysql_test is parsed as a JDBCRelation, not a StreamingRelation.
- without 'stream'
  - `select kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name`
    - **It is treated as a batch query**: kafka_sql_test is parsed as a KafkaRelation, and mysql_test is parsed as a JDBCRelation.
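The three examples reduce to one rule for the query mode and the output sink. The following is a toy Python sketch of that rule, not the PR's code. `choose_plan` and its regex-based detection of `select stream` and `insert into` are hypothetical. It assumes the INSERT target is a file-backed (CSV) table. The sink labels simply mirror the comment.

```python
import re
from typing import Optional, Tuple


def choose_plan(sql: str) -> Tuple[str, Optional[str]]:
    """Return (query mode, sink) following the rules shown in the examples."""
    streaming = re.search(r"\bselect\s+stream\b", sql, re.IGNORECASE) is not None
    if not streaming:
        # Without 'stream' the statement runs once as an ordinary batch query.
        return ("batch", None)
    if re.match(r"\s*insert\s+into\s+\w+", sql, re.IGNORECASE):
        # Assumption: the INSERT target is a file-backed (e.g. CSV) table,
        # so the stream is written through the file stream sink.
        return ("streaming", "FileStreamSink")
    # A bare SELECT STREAM has no target table, so results go to the console.
    return ("streaming", "ConsoleSink")


if __name__ == "__main__":
    for q in ("select stream name from kafka_sql_test",
              "insert into csv_sql_table select stream name from kafka_sql_test",
              "select name from kafka_sql_test"):
        print(choose_plan(q))
```

Keeping the mode decision on the keyword alone, and the sink decision on the statement type, matches the comment's point that the same join can become either a stream query or a batch query.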