Hey Mich
glad to know you got to the bottom of it.
In Python, if you want to run a module as a script - same as you would in
Java/Scala - you will have to define a main() function.
You'll notice that the snippet I sent you had this syntax:
if __name__ == "__main__":
    main()
I am guessing you just chose an unfo
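A minimal sketch (not the actual runner.py from the thread) of why class
main() broke the lambda while def main() works: names assigned in a class
body do not form an enclosing scope for functions defined inside it, but a
def's locals do.

class Runner:
    numRows = 10
    # Uncommenting the next line raises NameError as the class body runs,
    # because the lambda cannot see class-body names like numRows:
    # rows = list(map(lambda x: (x, numRows), range(3)))

def main():
    numRows = 10
    # Works: the lambda closes over main()'s local numRows.
    print(list(map(lambda x: (x, numRows), range(3))))

if __name__ == "__main__":
    main()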
Thanks all.
Found out the problem :(
In runner.py I had defined main as
class main()
I replaced it with
def main():
and it worked without declaring numRows as global.
I am still wondering why it works with def main()?
regards,
Mich
I don't believe you'll be able to use globals in a Spark task, as they
won't exist on the remote executor machines.
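For what it's worth, a common alternative to relying on globals is to share
the value explicitly through a broadcast variable. A minimal sketch,
assuming a live SparkContext named sc as elsewhere in the thread:

numRows = 10
b_numRows = sc.broadcast(numRows)  # shipped once to each executor
rdd = sc.parallelize(range(1, 5)).map(lambda x: (x, b_numRows.value))
print(rdd.collect())  # [(1, 10), (2, 10), (3, 10), (4, 10)]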
On Sun, Dec 13, 2020 at 3:46 AM Mich Talebzadeh wrote:
> thanks Marco.
>
> When I stripped down spark etc and ran your map, it came back OK (no
> errors) WITHOUT global numRows
>
>
Sure Mich... let me try to run your code in my IDE. I'm intrigued by the
error.
I will report back whether or not I find something.
Kind regards
On Sun, Dec 13, 2020, 9:46 AM Mich Talebzadeh wrote:
> thanks Marco.
>
> When I stripped down spark etc and ran your map, it came back OK (no
Hi Mich
I don't think it's a good idea... I believe your IDE is playing tricks on
you.
Take Spark out of the equation - this is a Python issue only. I am guessing
your IDE is somehow messing up your environment.
If you take out the whole Spark code and replace it with this code
list(map(lambda x: (x, clustered(x, numRows)), [1, 2, 3, 4]))
I solved the issue of the variable numRows not being defined within the
lambda function by declaring it as a global variable
global numRows
numRows = 10  ## do in increments of 50K rows otherwise you blow up
driver memory!
#
Then I could call it within the lambda function as follows
rdd = sc.parallelize(Range). \
    map(lambda x: (x, uf.clustered(x, numRows), \
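For reference, a self-contained sketch of that pattern without Spark;
clustered here is a hypothetical stand-in, since the thread truncates the
real UsedFunctions definitions:

import math

numRows = 10

def clustered(x, numRows):
    # hypothetical body for illustration only; not the thread's definition
    return math.floor(x - 1) / numRows

Range = range(1, 5)
# The lambda resolves numRows from the enclosing module scope.
print(list(map(lambda x: (x, clustered(x, numRows)), Range)))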
Many thanks, KR.
If I call the clustered function on its own it works
numRows = 10
print(uf.clustered(200, numRows))
and returns
0.00199
If I run it all in one, including the UsedFunctions class, in the same .py
file, it works. The code is attached.
However, in PyCharm, I do the following
UsedFunc
Copying and pasting your code in a Jupyter notebook works fine; that is,
using my own version of Range, which is simply a list of numbers.
How about this... does this work fine?
list(map(lambda x: (x, clustered(x, numRows)), [1, 2, 3, 4]))
If it does, I'd look at what's inside your Range and what you get
Sorry, part of the code is not that visible
rdd = sc.parallelize(Range). \
    map(lambda x: (x, uf.clustered(x, numRows), \
        uf.scattered(x, 1), \
        uf.randomised(x, 1), \
        uf.randomString(50), \
Thanks Sean,
This is the code
numRows = 10  ## do in increments of 50K rows otherwise you blow up
driver memory!
#
## Check if the table exists, otherwise create it
rows = 0
sqltext = ""
if (spark.sql(f"SHOW TABLES IN {DB} like '{tableName}'").count() == 1):
    rows = spark.sql(f"""SELECT COUNT(1
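The snippet cuts off above; a hedged sketch of how the exists-or-create
branch typically completes (the COUNT target and the CREATE statement are
placeholders, not the actual DDL from the thread):

if spark.sql(f"SHOW TABLES IN {DB} like '{tableName}'").count() == 1:
    rows = spark.sql(f"SELECT COUNT(1) FROM {DB}.{tableName}").collect()[0][0]
else:
    # placeholder schema; the real table definition is not shown
    sqltext = f"CREATE TABLE {DB}.{tableName} (id INT) STORED AS PARQUET"
    spark.sql(sqltext)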
Looks like a simple Python error - you haven't shown the code that produces
it. Indeed, I suspect you'll find there is no such symbol.
On Fri, Dec 11, 2020 at 9:09 AM Mich Talebzadeh wrote:
> Hi,
>
> This used to work but not anymore.
>
> I have a UsedFunctions.py file that has these functions
>
>
Hi,
This used to work but not anymore.
I have a UsedFunctions.py file that has these functions
import random
import string
import math
def randomString(length):
    letters = string.ascii_letters
    result_str = ''.join(random.choice(letters) for i in range(length))
    return result_str
def cl
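For a quick sanity check of the helper above, assuming UsedFunctions.py is
importable as uf the way the thread does:

import UsedFunctions as uf

# prints a 50-character string of random letters, matching the
# uf.randomString(50) calls used elsewhere in the thread
print(uf.randomString(50))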