Re: Debugging SQL queries

Caizhi Weng Mon, 14 Mar 2022 19:50:13 -0700

Hi!

> My understanding is that Flink SQL will generate Java code equivalent and
> submit the compiled jar to the cluster as a job, is that correct?
>
Almost correct. Flink SQL generates Java code and compiles it into Java
classes. These compiled classes are wrapped in transformations (a concept
from the DataStream API), which are organized into job graphs and sent to the
cluster. You can say that the Flink SQL API translates SQL into
transformations and the rest is handled by the DataStream API.
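
If you only want to see what the planner turns a query into, rather than the generated Java itself, an EXPLAIN statement may be enough. A minimal sketch, assuming the table is called tbl_name as in your earlier mail:

```sql
-- tbl_name is a placeholder for the Kafka-backed table being queried.
-- Prints the plans the planner builds for the query instead of running it.
EXPLAIN PLAN FOR
SELECT * FROM tbl_name;
```

The output is the plan, not the Java code, so the CompileUtils logs are still needed for the latter.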


> Can I get the source code to see what kind of low level API the SQL query
> generated?
>
The code generators live in the flink-table-planner module, specifically in
the org.apache.flink.table.planner.codegen package. For example, you can see
how most operators and functions are generated in the ExprCodeGenerator
class.

> How do I set the log level for
> "org.apache.flink.table.runtime.generated.CompileUtils" only?
>
This is really a question about log4j usage. Add these two lines to
conf/log4j.properties in your Flink directory:

logger.codelog.name = org.apache.flink.table.runtime.generated.CompileUtils
logger.codelog.level = DEBUG

You should then see the generated Java code in the logs.

> Is this related to the behavior of checkpointing?
>
I'm not familiar with Zeppelin, so I'm not sure how it handles jobs and
checkpoints. Does the same thing happen in the Flink SQL client? If you set
the startup mode to earliest-offset and start a new job each time (not
recovering from a savepoint), the Kafka source should read from the very
beginning.
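
To check this in the SQL client, something like the following could be used. It is only a sketch: the column list, topic, broker address, group id, format and checkpoint interval are all made-up values, and the quoted SET syntax assumes a recent Flink version.

```sql
-- Assumed checkpoint interval; adjust to whatever suits the job.
SET 'execution.checkpointing.interval' = '10s';

-- Hypothetical Kafka-backed table; replace schema and options with your own.
CREATE TABLE tbl_name (
  id BIGINT,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'example-topic',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'debug-group',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- Each run submits a new job, so it should start from the earliest offset.
SELECT * FROM tbl_name;
```

If this shows all rows in the SQL client but not in Zeppelin, the difference is more likely in how Zeppelin manages the job than in the Kafka source.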


On Mon, Mar 14, 2022 at 14:17, dz902 <dz9...@gmail.com> wrote:

> Thanks! I have a few follow-up questions.
>
> I have searched but was unable to find where to set the log level just for
> "org.apache.flink.table.runtime.generated.CompileUtils", so I set
> "rootLogger.level = DEBUG" in "log4j-cli.properties" and got the logs, but
> was unable to find generated code in
> "flink-root-sql-client.xxx.internal.log". So my questions are:
>
> - My understanding is that Flink SQL will generate Java code equivalent
> and submit the compiled jar to the cluster as a job, is that correct? Can I
> get the source code to see what kind of low level API the SQL query
> generated?
> - How do I set the log level for
> "org.apache.flink.table.runtime.generated.CompileUtils" only?
>
> Also, when I start a local Flink cluster, I am able to query the table and
> immediately see the results. I did the same on a Zeppelin notebook with a
> remote cluster which also worked.
>
> However, after 8-10 hours, I tried the same simple SELECT again in
> Zeppelin and no data was shown. I INSERT'd a new row into the source table
> and the results showed up again. Is this related to the behavior of
> checkpointing? Because I already had "'scan.startup.mode' =
> 'earliest-offset'" so I expected it to work even with no new data coming in
> for a long time.
>
> Maybe a bit much to ask...but thank you again for the help!
>
>
>
> On Mon, Mar 14, 2022 at 11:14 AM Caizhi Weng <tsreape...@gmail.com> wrote:
>
>> Sorry for the misleading reply. I mean that if you enable checkpointing,
>> the selected results only become visible after the checkpoint completes.
>> Without checkpointing the results are visible instantly, just as the
>> documentation describes.
>>
>> On Mon, Mar 14, 2022 at 11:12, Caizhi Weng <tsreape...@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> I see. So you're running a streaming job. "select" in a streaming job
>>> will only produce visible data when you enable checkpointing (this is due
>>> to the exactly-once guarantee of Flink), see [1] for more detail. See [2]
>>> on how to enable checkpointing for Flink SQL.
>>>
>>> The generated code is also in the logs if you set the appropriate logging
>>> level.
>>>
>>> [1]
>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/overview/#execute-a-query
>>> [2] https://stackoverflow.com/a/65681975
>>>
>>>
>>> On Mon, Mar 14, 2022 at 11:08, dz902 <dz9...@gmail.com> wrote:
>>>
>>>> Hi Caizhi,
>>>>
>>>> Thanks for the quick reply. I was just running a simple "SELECT * FROM
>>>> tbl_name" against a table with Kafka connector, but no data showed up and
>>>> no errors.
>>>>
>>>> Where can I find the generated code if I'm using SQL client?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On Mon, Mar 14, 2022 at 10:58 AM Caizhi Weng <tsreape...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> For stages and logs you can refer to the web UI. For generated code,
>>>>> set the logging level of
>>>>> org.apache.flink.table.runtime.generated.CompileUtils to DEBUG.
>>>>>
>>>>> What query are you running? If possible can you share your SQL in the
>>>>> mailing list?
>>>>>
>>>>> On Mon, Mar 14, 2022 at 10:42, dz902 <dz9...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to debug SQL queries, from SQL client or Zeppelin notebook
>>>>>> (job submitted to remote cluster).
>>>>>>
>>>>>> I have a query not getting any data. How do I debug? Can I see the
>>>>>> actual code generated from the SQL query? Or is it possible to show all
>>>>>> the stages, actions or logs generated by the query?
>>>>>>
>>>>>> Thanks,
>>>>>> Dai
>>>>>>
>>>>>
