Re: pyspark execution

2018-04-17 Thread hemant singh
If the file contains only SQL, then you can use a function like the one
below -

import subprocess

def run_sql(sql_file_path, db_name, location):
    # -S: silent mode; each --hivevar KEY=VALUE fills ${hivevar:KEY} in the file.
    # (The variable names DB_NAME and LOCATION are illustrative.)
    subprocess.call(["spark-sql", "-S", "--hivevar", "DB_NAME=" + db_name,
                     "--hivevar", "LOCATION=" + location, "-f", sql_file_path])

If you have other pieces like Spark code, and not only SQL, in that file -

Write a parse function which parses your SQL and replaces the placeholders
(like the DB name) in it, then execute the newly formed SQL - see the
sketch below.
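A minimal sketch of that approach, assuming a SparkSession named spark
already exists and the file uses ${VAR}-style placeholders (the function
name run_sql_inline and the variable names are illustrative):

from string import Template

def run_sql_inline(spark, sql_file_path, **variables):
    # Read the raw SQL and fill ${VAR} placeholders with the supplied
    # keyword-argument values.
    with open(sql_file_path) as f:
        rendered = Template(f.read()).substitute(variables)
    # Naively split on ';' and run each statement; keep the last result.
    result = None
    for statement in rendered.split(";"):
        if statement.strip():
            result = spark.sql(statement)
    return result

The actual values can be passed on the spark-submit command line and read
from sys.argv, e.g. run_sql_inline(spark, "queries.sql",
DB_NAME=sys.argv[1], LOCATION=sys.argv[2]).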

Maintaining your SQL in a separate file nevertheless decouples the code
from the SQL and makes things easier from a maintenance perspective.

On Tue, Apr 17, 2018 at 8:11 AM, anudeep wrote:

> Hi All,
>
> I have a python file which I am executing directly with spark-submit
> command.
>
> Inside the python file, I have SQL written using a Hive context. I
> created a generic variable for the database name inside the SQL.
>
> The problem is: how can I pass the value for this variable dynamically,
> just as we do in Hive with the --hivevar parameter?
>
> Thanks!
> Anudeep


pyspark execution

2018-04-16 Thread anudeep
Hi All,

I have a python file which I am executing directly with spark-submit
command.

Inside the python file, I have SQL written using a Hive context. I created
a generic variable for the database name inside the SQL.

The problem is: how can I pass the value for this variable dynamically,
just as we do in Hive with the --hivevar parameter?

Thanks!
Anudeep