Right, nothing wrong with a for loop here. Seems like just the right thing.
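
For what it's worth, a minimal sketch of how that could look, assuming the
models arrive as one comma-separated argument and each model's table follows
the model1_table / model2_table naming from your example (the SELECT list and
the output path are placeholders, as in your pseudocode):

    import sys
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def run_model(model):
        # Assumption: tables are named model1_table, model2_table, ...
        table = model + "_table"
        # The SELECT list is elided here, as in the original pseudocode
        qry = "SELECT <quite complex query> FROM {} WHERE model = '{}'".format(table, model)
        # Hypothetical sink; substitute whatever write you already use
        spark.sql(qry).write.mode("overwrite").parquet("/output/" + model)

    # e.g. spark-submit withloops.py model1,model2,model3
    for model in sys.argv[1].split(","):
        run_model(model)

The loop only submits a handful of SQL statements from the driver; it is not a
loop over the data itself, so there is no performance concern. Putting the body
in a function also means a future change to the query happens in exactly one
place.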

On Fri, Jan 6, 2023, 3:20 PM Joris Billen <joris.bil...@bigindustries.be>
wrote:

> Hello Community,
> I am working in pyspark with sparksql and have a very similar, very complex
> chain of dataframes that I'll have to execute several times, once for each
> of the "models" I have.
> Suppose the code is exactly the same for all models; only the table it
> reads from and some values in the where statements contain the model name.
> My question is how to avoid repetitive code.
> So instead of doing something like this (this is pseudocode; in reality it
> makes use of lots of complex dataframes), which would also require me to
> change the code in every copy whenever I change it in the future:
>
> dfmodel1 = sqlContext.sql("SELECT <quite complex query> FROM model1_table WHERE model = 'model1'").write()
> dfmodel2 = sqlContext.sql("SELECT <quite complex query> FROM model2_table WHERE model = 'model2'").write()
> dfmodel3 = sqlContext.sql("SELECT <quite complex query> FROM model3_table WHERE model = 'model3'").write()
>
>
> For loops in Spark sound like a bad idea, but that is mainly in terms of
> looping over the data; maybe there is nothing against looping over sql
> statements. Is it allowed to do something like this?
>
>
> spark-submit withloops.py model1,model2,model3
>
> code withloops.py
> models = sys.argv[1].split(",")
> qry = """SELECT <quite complex query> FROM {} WHERE model = '{}'"""
> for i in models:
>   table_model = i + "_table"
>   sqlContext.sql(qry.format(table_model, i)).write()
>
>
>
> I was trying to look up how to refactor pyspark code to avoid redundancy,
> but didn't find any relevant links.
>
>
>
> Thanks for input!
>
