Re: Is it enable to use Multiple UGIs in One Spark Context?

2021-03-25 Thread יורי אולייניקוב
I think that submitting the Spark job on behalf of user01 will solve the problem. You may also try to set a sticky bit on the /data/user01/rdd folder if you want to allow multiple users to write to /data/user01/rdd at the same time, but I'd not recommend allowing multiple users to write to the same dir *exac

Re: RDD filter in for loop gave strange results

2021-01-20 Thread יורי אולייניקוב
A. Global scope and global variables are bad habits in Python (this is about the 'rdd' and 'i' variables used in the lambda). B. Lambdas are often misused and abused in Python, especially when they are used in a global context: ideally you'd use pure functions and write something like: ``` def my_rdd_f
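For readers of this preview, here is a minimal sketch of the late-binding pitfall being discussed. It is not the code from the original message: plain Python's lazy `filter` stands in for a lazily evaluated RDD, and the names `keep_not_equal`, `data`, `late_bound` and `early_bound` are hypothetical, chosen only for illustration.

```python
from functools import partial


def keep_not_equal(excluded, value):
    """Pure predicate: everything it needs is passed in as an argument."""
    return value != excluded


data = [0, 1, 2, 3]

# Pitfall: each lambda looks up `i` only when it is finally called.
# filter() is lazy, so nothing runs until list() is invoked, and by
# then the loop has finished and `i` holds its last value (1).
lazy = iter(data)
for i in range(2):
    lazy = filter(lambda x: x != i, lazy)
late_bound = list(lazy)
print(late_bound)  # [0, 2, 3] -> only 1 was removed, twice

# Fix: bind the current value of `i` when the predicate is created.
lazy = iter(data)
for i in range(2):
    lazy = filter(partial(keep_not_equal, i), lazy)
early_bound = list(lazy)
print(early_bound)  # [2, 3] -> both 0 and 1 were removed
```

Assuming the RDD case behaves the same way (transformations are lazy and the lambda reads `i` at evaluation time), binding the value up front with `partial`, a default argument, or a small factory function should avoid the surprise.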

Dynamic Spark metrics creation

2021-01-16 Thread יורי אולייניקוב
Hi all, I have a Spark application with Arbitrary Stateful Aggregation implemented with FlatMapGroupsWithStateFunction. I want to collect some statistics about incoming events inside FlatMapGroupsWithStateFunction. The statistics are derived from an event property which on the one hand has dynamic val

Arbitrary stateful aggregation: updating state without setting timeout

2020-10-05 Thread יורי אולייניקוב
Hi all, I have the following question: what happens to the state (in terms of expiration) if I update the state without setting a timeout? E.g. in FlatMapGroupsWithStateFunction: 1. first batch: state.update(myObj) state.setTimeoutDuration(timeout) 2. second batch: state.update(myObj)