I think that submitting the Spark job on behalf of user01 (for example, with
spark-submit's --proxy-user option) will solve the problem.
You may also try setting the sticky bit on the /data/user01/rdd folder if you
want to allow multiple users to write to /data/user01/rdd at the same time,
but I would not recommend allowing multiple users to write to the same dir at
*exactly* the same time.
A. Global scope and global variables are bad habits in Python (this is
about the 'rdd' and 'i' variables used in the lambda).
B. Lambdas are often misused and abused in Python, especially when they are
used in a global context: ideally you would use pure functions and write
something like:
```
def my_rdd_fn(record):  # hypothetical completion; the original snippet is cut off here
    return record * 2   # pure function: the result depends only on its argument
```
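A named function like this can then be passed to the transformation
directly, e.g. rdd.map(my_rdd_fn), instead of a lambda that closes over
globals such as 'rdd' and 'i' (my_rdd_fn is only the hypothetical name used
in the sketch above).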
Hi all,
I have a Spark application with Arbitrary Stateful Aggregation implemented
with FlatMapGroupsWithStateFunction.
I want to compute some statistics about incoming events inside the
FlatMapGroupsWithStateFunction.
The statistics are derived from an event property which, on the one hand,
has dynamic values
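A minimal sketch of what per-key statistics inside a
FlatMapGroupsWithStateFunction can look like (the String key/value types,
the HashMap state, and the PropertyStats/STATS_FUNC names are assumptions,
since the original message is cut off):

```
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;

import org.apache.spark.api.java.function.FlatMapGroupsWithStateFunction;
import org.apache.spark.sql.streaming.GroupState;

public class PropertyStats {

  // Keeps, per group key, a running count of each distinct value of the
  // event property, and emits a one-line summary per key and micro-batch.
  public static final FlatMapGroupsWithStateFunction<String, String, HashMap<String, Long>, String>
      STATS_FUNC =
          (String key, Iterator<String> propertyValues, GroupState<HashMap<String, Long>> state) -> {
            HashMap<String, Long> counts = state.exists() ? state.get() : new HashMap<>();
            while (propertyValues.hasNext()) {
              counts.merge(propertyValues.next(), 1L, Long::sum);
            }
            state.update(counts);  // persist the updated statistics for the next batch
            return Collections.singletonList(key + " -> " + counts).iterator();
          };

  // The function is wired in via KeyValueGroupedDataset.flatMapGroupsWithState(
  //     STATS_FUNC, outputMode, stateEncoder, outputEncoder, timeoutConf).
}
```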
Hi all, I have the following question:
What happens to the state (in terms of expiration) if I'm updating the
state without setting a timeout?
E.g. in a FlatMapGroupsWithStateFunction:
1. first batch:
state.update(myObj)
state.setTimeoutDuration(timeout)
2. second batch:
state.update(myObj)
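For reference, a compilable sketch of that scenario (MyObj's fields, the
String key/value types, and the '10 minutes' duration are assumptions; the
query is assumed to be configured with
GroupStateTimeout.ProcessingTimeTimeout(), since setTimeoutDuration only
applies to processing-time timeouts):

```
import java.io.Serializable;
import java.util.Collections;
import java.util.Iterator;

import org.apache.spark.api.java.function.FlatMapGroupsWithStateFunction;
import org.apache.spark.sql.streaming.GroupState;

public class TimeoutScenario {

  // Placeholder state object standing in for "myObj" from the question.
  public static class MyObj implements Serializable {
    public long lastSeenMillis;
  }

  public static final FlatMapGroupsWithStateFunction<String, String, MyObj, String> FUNC =
      (String key, Iterator<String> values, GroupState<MyObj> state) -> {
        boolean firstBatchForKey = !state.exists();
        MyObj myObj = firstBatchForKey ? new MyObj() : state.get();
        myObj.lastSeenMillis = System.currentTimeMillis();

        state.update(myObj);                       // every batch: update the state
        if (firstBatchForKey) {
          state.setTimeoutDuration("10 minutes");  // first batch only: set the timeout
        }
        // Later batches deliberately do NOT call setTimeoutDuration, which is
        // exactly the situation the question asks about.

        return Collections.singletonList(key).iterator();
      };
}
```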