Hey Mich
glad to know you got to the bottom of it.
In Python, if you want to run a module as a script - same as you would in
Java/Scala - you will have to define a main() function.
You'll notice that the snippet I sent you had this syntax:
if __name__ == "__main__":
    main()
I am guessing you just chose an unfo
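A minimal sketch (not the actual runner.py from the thread) of why class
main() broke the lambda while def main() works: names assigned in a class
body do not form an enclosing scope for functions defined inside it, but a
def's locals do.

class Runner:
    numRows = 10
    # Uncommenting the next line raises NameError as the class body runs,
    # because the lambda cannot see class-body names like numRows:
    # rows = list(map(lambda x: (x, numRows), range(3)))

def main():
    numRows = 10
    # Works: the lambda closes over main()'s local numRows.
    print(list(map(lambda x: (x, numRows), range(3))))

if __name__ == "__main__":
    main()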
Thanks all.
Found out the problem :(
In runner.py I had defined main as
class main()
I replaced it with
def main():
and it worked without declaring numRows as global.
I am still wondering why it works with def main()?
regards,
Mich
I don't believe you'll be able to use globals in a Spark task, as they
won't exist on the remote executor machines.
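For what it's worth, a common alternative to relying on globals is to share
the value explicitly through a broadcast variable. A minimal sketch,
assuming a live SparkContext named sc as elsewhere in the thread:

numRows = 10
b_numRows = sc.broadcast(numRows)  # shipped once to each executor
rdd = sc.parallelize(range(1, 5)).map(lambda x: (x, b_numRows.value))
print(rdd.collect())  # [(1, 10), (2, 10), (3, 10), (4, 10)]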
On Sun, Dec 13, 2020 at 3:46 AM Mich Talebzadeh wrote:
> thanks Marco.
>
> When I stripped down spark etc and ran your map, it came back OK (no
> errors) WITHOUT global numRows
>
>
Sure Mich... let me try to run your code in my IDE. I'm intrigued by the
error.
I will report back whether or not I find something.
Kind regards
On Sun, Dec 13, 2020, 9:46 AM Mich Talebzadeh wrote:
> thanks Marco.
>
> When I stripped down spark etc and ran your map, it came back OK (no
Hi Mich
I don't think it's a good idea... I believe your IDE is playing tricks on
you.
Take Spark out of the equation - this is a Python issue only. I am guessing
your IDE is somehow messing up your environment.
If you take out the whole Spark code and replace it with this code
list(map(lambda x: (x, clustered(x, numRows)), [1, 2, 3, 4]))
I solved the issue of the variable numRows not being defined within the
lambda function by declaring it as a global variable
global numRows
numRows = 10  ## do in increments of 50K rows otherwise you blow up
driver memory!
#
Then I could call it within the lambda function as follows
rdd = sc.parallelize(Range). \
    map(lambda x: (x, uf.clustered(x, numRows), \
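For reference, a self-contained sketch of that pattern without Spark;
clustered here is a hypothetical stand-in, since the thread truncates the
real UsedFunctions definitions:

import math

numRows = 10

def clustered(x, numRows):
    # hypothetical body for illustration only; not the thread's definition
    return math.floor(x - 1) / numRows

Range = range(1, 5)
# The lambda resolves numRows from the enclosing module scope.
print(list(map(lambda x: (x, clustered(x, numRows)), Range)))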
Many thanks, KR.
If I call the clustered function on its own it works
numRows = 10
print(uf.clustered(200, numRows))
and returns
0.00199
If I run it all in one, including the UsedFunctions class, in the same .py
file, it works. The code is attached.
However, in PyCharm, I do the following
UsedFunc
Copying and pasting your code in a Jupyter notebook works fine; that is,
using my own version of Range, which is simply a list of numbers.
How about this... does this work fine?
list(map(lambda x: (x, clustered(x, numRows)), [1, 2, 3, 4]))
If it does, I'd look at what's inside your Range and what you get
Sorry, part of the code is not that visible
rdd = sc.parallelize(Range). \
    map(lambda x: (x, uf.clustered(x, numRows), \
        uf.scattered(x, 1), \
        uf.randomised(x, 1), \
        uf.randomString(50), \
Thanks Sean,
This is the code
numRows = 10  ## do in increments of 50K rows otherwise you blow up
driver memory!
#
## Check if the table exists, otherwise create it
rows = 0
sqltext = ""
if (spark.sql(f"SHOW TABLES IN {DB} like '{tableName}'").count() == 1):
    rows = spark.sql(f"""SELECT COUNT(1
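The snippet cuts off above; a hedged sketch of how the exists-or-create
branch typically completes (the COUNT target and the CREATE statement are
placeholders, not the actual DDL from the thread):

if spark.sql(f"SHOW TABLES IN {DB} like '{tableName}'").count() == 1:
    rows = spark.sql(f"SELECT COUNT(1) FROM {DB}.{tableName}").collect()[0][0]
else:
    # placeholder schema; the real table definition is not shown
    sqltext = f"CREATE TABLE {DB}.{tableName} (id INT) STORED AS PARQUET"
    spark.sql(sqltext)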
Looks like a simple Python error - you haven't shown the code that produces
it. Indeed, I suspect you'll find there is no such symbol.
On Fri, Dec 11, 2020 at 9:09 AM Mich Talebzadeh wrote:
> Hi,
>
> This used to work but not anymore.
>
> I have a UsedFunctions.py file that has these functions
>
>
Hi,
This used to work but not anymore.
I have a UsedFunctions.py file that has these functions
import random
import string
import math
def randomString(length):
    letters = string.ascii_letters
    result_str = ''.join(random.choice(letters) for i in range(length))
    return result_str
def cl
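For a quick sanity check of the helper above, assuming UsedFunctions.py is
importable as uf the way the thread does:

import UsedFunctions as uf

# prints a 50-character string of random letters, matching the
# uf.randomString(50) calls used elsewhere in the thread
print(uf.randomString(50))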