Hi,

You don't have to run the SQL statement. You can just parse it (ss below is the
SparkSession and query is the SQL text); that gives you the parsed, unresolved
logical plan.

val logicalPlan = ss.sessionState.sqlParser.parsePlan(sqlText = query)
println(logicalPlan.prettyJson)

[ {
  "class" : "org.apache.spark.sql.catalyst.plans.logical.Project",
  "num-children" : 1,
  "projectList" : [ [ {
    "class" : "org.apache.spark.sql.catalyst.analysis.UnresolvedStar",
    "num-children" : 0
  } ] ],
  "child" : 0
}, {
  "class" : "org.apache.spark.sql.catalyst.analysis.UnresolvedRelation",
  "num-children" : 0,
  "tableIdentifier" : {
    "product-class" : "org.apache.spark.sql.catalyst.TableIdentifier",
    "table" : "abc"
  }
} ]
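
If all you need are the input table names, you can collect them straight from
that parsed plan instead of reading the JSON. A minimal sketch against the
Spark 2.x Catalyst API (in 3.x UnresolvedRelation carries a multi-part
identifier rather than a TableIdentifier, so the pattern differs slightly):

import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

// Walk the unresolved plan and pick up every table reference it contains.
val inputTables = logicalPlan.collect {
  case r: UnresolvedRelation => r.tableIdentifier.unquotedString
}.distinct

println(inputTables)   // e.g. List(abc) for "select * from abc"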



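On the SqlBase.g4 case question further down in the thread: the keywords in the
grammar are written in upper case, and Spark upper-cases the character stream
before the generated lexer sees it, so queries can be typed in any case (see
UpperCaseCharStream in Catalyst's ParseDriver.scala). A rough sketch of such a
wrapper, assuming ANTLR 4.7+ and the SqlBaseLexer / SqlBaseParser classes that
ANTLR generates from SqlBase.g4:

import org.antlr.v4.runtime.{CharStream, CharStreams, CommonTokenStream, IntStream}
import org.antlr.v4.runtime.misc.Interval

// Present upper-case characters to the lexer so the grammar's upper-case
// keywords match, while getText() still returns the original text for
// identifiers and literals.
class UpperCaseCharStream(wrapped: CharStream) extends CharStream {
  override def consume(): Unit = wrapped.consume()
  override def getSourceName(): String = wrapped.getSourceName
  override def index(): Int = wrapped.index
  override def mark(): Int = wrapped.mark
  override def release(marker: Int): Unit = wrapped.release(marker)
  override def seek(where: Int): Unit = wrapped.seek(where)
  override def size(): Int = wrapped.size
  override def getText(interval: Interval): String = wrapped.getText(interval)
  override def LA(i: Int): Int = {
    val la = wrapped.LA(i)
    if (la == 0 || la == IntStream.EOF) la else Character.toUpperCase(la)
  }
}

val lexer = new SqlBaseLexer(new UpperCaseCharStream(CharStreams.fromString("select * from foo")))
val parser = new SqlBaseParser(new CommonTokenStream(lexer))
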
On Fri, Jan 25, 2019 at 6:07 AM <l...@china-inv.cn> wrote:

> Hi, All,
>
> I tried the suggested approach and it works, but it requires 'running' the
> SQL statement first.
>
> I just want to parse the SQL statement without running it, so I can do
> this in my laptop without connecting to our production environment.
>
> I tried to write a tool that uses the SqlBase.g4 bundled with SPARK SQL
> to extract the names of the input tables, and it works as expected.
>
> But I have a question:
>
> The parser generated from SqlBase.g4 only accepts 'select' statements whose
> keywords (such as 'SELECT' and 'FROM') and table names are capitalized,
> e.g. it accepts 'SELECT * FROM FOO' but it doesn't accept 'select * from
> foo'.
>
> But I can run spark.sql("select * from foo") in the spark2-shell
> without any problem.
>
> Is there another 'layer' in SPARK SQL that capitalizes those 'tokens'
> before invoking the parser?
>
> If so, why not just modify SqlBase.g4 to accept lower-case keywords?
>
> Thanks
>
> Boying
>
>
>
> From: "Shahab Yunus" <shahab.yu...@gmail.com>
> To: "Ramandeep Singh Nanda" <ramannan...@gmail.com>
> Cc: "Tomas Bartalos" <tomas.barta...@gmail.com>, l...@china-inv.cn, "user
> @spark/'user @spark'/spark users/user@spark" <user@spark.apache.org>
> Date: 2019/01/24 06:45
> Subject: Re: How to get all input tables of a SPARK SQL 'select' statement
> ------------------------------
>
>
>
> Could be a tangential idea but might help: why not use the queryExecution and
> logicalPlan objects that are available when you execute a query using
> SparkSession and get a DataFrame back? The JSON representation contains
> almost all the info that you need, and you don't need to go to Hive to get
> this info.
>
> Some details here:
> https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-Dataset.html#queryExecution
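>
> A rough sketch of that approach (the table names are only examples, and it
> needs a live SparkSession: the plan is analyzed when spark.sql runs, so the
> tables have to be resolvable). The node types below are from Spark 2.2+ and
> may differ in other versions:
>
> import org.apache.spark.sql.catalyst.catalog.HiveTableRelation
> import org.apache.spark.sql.execution.datasources.LogicalRelation
>
> val df = spark.sql("select * from foo join bar on foo.id = bar.id")
>
> // The analyzed plan holds the resolved relations; collect the tables it reads.
> val inputTables = df.queryExecution.analyzed.collect {
>   case h: HiveTableRelation => h.tableMeta.identifier.unquotedString
>   case l: LogicalRelation =>
>     l.catalogTable.map(_.identifier.unquotedString).getOrElse("<non-catalog relation>")
> }.distinct
>
> println(inputTables)
> println(df.queryExecution.logical.prettyJson)  // the JSON representation mentioned above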
>
> On Wed, Jan 23, 2019 at 5:35 PM Ramandeep Singh Nanda <ramannan...@gmail.com>
> wrote:
> 'Explain extended' or 'explain' would list the plan along with the tables. I'm
> not aware of any statement that explicitly lists dependencies or tables
> directly.
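>
> For example (just a sketch; the table name foo and the DataFrame df are
> illustrative):
>
> spark.sql("EXPLAIN EXTENDED select * from foo").show(false)
>
> // or, on a DataFrame you already have:
> df.explain(true)   // extended explain: parsed, analyzed, optimized and physical plans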
>
> Regards,
> Ramandeep Singh
>
> On Wed, Jan 23, 2019, 11:05 Tomas Bartalos <tomas.barta...@gmail.com> wrote:
> This might help:
> show tables;
>
> On Wed, 23 Jan 2019 at 10:43, <l...@china-inv.cn> wrote:
> Hi, All,
>
> We need to get all input tables of several SPARK SQL 'select' statements.
>
> We can get that information for Hive SQL statements by using 'explain
> dependency select....'.
> But I can't find an equivalent command for SPARK SQL.
>
> Does anyone know how to get this information for a SPARK SQL 'select'
> statement?
>
> Thanks
>
> Boying
>
>
>
> ------------------------------
>
> This email message may contain confidential and/or privileged information.
> If you are not the intended recipient, please do not read, save, forward,
> disclose or copy the contents of this email or open any file attached to
> this email. We will be grateful if you could advise the sender immediately
> by replying this email, and delete this email and any attachment or links
> to this email completely and immediately from your computer system.
>
> ------------------------------
>

-- 
Regards,
Ramandeep Singh
http://orastack.com
+13474792296
ramannan...@gmail.com
