Right, nothing wrong with a for loop here. Seems like just the right thing.
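Something along these lines should work. This is only a minimal sketch, assuming a SparkSession named `spark`, source tables named `<model>_table` as in your examples, and an illustrative Parquet output path; the `SELECT *` is a placeholder for your complex query:

    import sys

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("withloops").getOrCreate()

    # Model names arrive as plain command-line arguments:
    #   spark-submit withloops.py model1 model2 model3
    models = sys.argv[1:]

    for model in models:
        table = f"{model}_table"  # table name derived from the model name
        # Placeholder query; the real "quite complex query" goes here.
        df = spark.sql(f"SELECT * FROM {table} WHERE model = '{model}'")
        # .write is a property returning a DataFrameWriter; pick a concrete
        # format and path for the actual job (this path is illustrative).
        df.write.mode("overwrite").parquet(f"/tmp/models/{model}")

The loop itself runs on the driver and only submits one query per model, so there is no per-row Python overhead; each `spark.sql(...)` is still planned and executed by Spark as usual.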
On Fri, Jan 6, 2023, 3:20 PM Joris Billen <joris.bil...@bigindustries.be> wrote:

> Hello Community,
>
> I am working in pyspark with sparksql and have a very complex list of
> dataframes that I'll have to execute several times, once for each of the
> "models" I have. Suppose the code is exactly the same for all models;
> only the table it reads from and some values in the where statements
> contain the model name. My question is how to prevent repetitive code.
> So instead of doing something like this (this is pseudocode; in reality
> it makes use of lots of complex dataframes), which would also require me
> to change the code in every copy whenever I change it in the future:
>
>     dfmodel1 = sqlContext.sql("SELECT <quite complex query> FROM model1_table WHERE model = 'model1'").write()
>     dfmodel2 = sqlContext.sql("SELECT <quite complex query> FROM model2_table WHERE model = 'model2'").write()
>     dfmodel3 = sqlContext.sql("SELECT <quite complex query> FROM model3_table WHERE model = 'model3'").write()
>
> For loops in Spark sound like a bad idea, but that is mainly in terms of
> data; maybe there is nothing against looping over SQL statements. Is it
> allowed to do something like this?
>
>     spark-submit withloops.py model1 model2 model3
>
> withloops.py:
>
>     models = sys.argv[1:]
>     qry = """SELECT <quite complex query> FROM {}_table WHERE model = '{}'"""
>     for m in models:
>         sqlContext.sql(qry.format(m, m)).write()
>
> I was trying to look up how to refactor pyspark code to prevent this
> kind of redundancy but didn't find any relevant links.
>
> Thanks for input!