Re: Resolving all JIRAs affecting EOL releases

2019-05-19 Thread Hyukjin Kwon
Thanks, Shane. The URL I linked somehow didn't work in other people's browsers. Hope this link works:

Re: Access to live data of cached dataFrame

2019-05-19 Thread Tomas Bartalos
I'm trying to re-read, but I'm getting cached data (which is a bit confusing). For the re-read I'm issuing: spark.read.format("delta").load("/data").groupBy(col("event_hour")).count The cache seems to be global, also influencing new dataframes. So the question is how should I re-read without

Object serialization for workers

2019-05-19 Thread R. Tyler Croy
Greetings! I am looking into the possibility of JRuby support for Spark, and could use some pointers (references?) to orient myself a bit better within the codebase. JRuby fat jars load just fine in Spark but where things start to get predictably dicey is with object serialization for RDDs

Re: Resolving all JIRAs affecting EOL releases

2019-05-19 Thread Hyukjin Kwon
I will add one more condition for "updated". So it will additionally skip issues that were updated within the last year, even if they are left open against EOL releases. project = SPARK AND status in (Open, "In Progress", Reopened) AND ( affectedVersion = EMPTY OR NOT (affectedVersion in versionMatch("^3.*")
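
For readers who want to see the selection logic spelled out, here is a rough Python sketch of the visible part of the query above. It is not the actual filter: the JQL in this snippet is cut off, so only the status check, the single "^3.*" version pattern, and the one-year "updated" condition are modeled. The issue dictionary layout and the names is_candidate and quiet_period are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Statuses listed in the visible part of the query.
OPEN_STATUSES = {"Open", "In Progress", "Reopened"}

# The only version pattern visible in the truncated query; the real
# query may list more patterns after the cut-off (assumption).
NON_EOL_PATTERN = re.compile(r"^3.*")


def is_candidate(issue, now, quiet_period=timedelta(days=365)):
    """Return True if a (hypothetical) issue dict would be selected for resolving."""
    if issue["status"] not in OPEN_STATUSES:
        return False
    # The extra "updated" condition: leave recently touched issues alone.
    if now - issue["updated"] < quiet_period:
        return False
    versions = issue["affected_versions"]
    # Selected when no affected version is set, or none matches the pattern.
    return not versions or not any(NON_EOL_PATTERN.match(v) for v in versions)


now = datetime(2019, 5, 19)
stale = {"status": "Open", "updated": datetime(2017, 1, 1), "affected_versions": ["2.1.0"]}
fresh = {"status": "Open", "updated": datetime(2019, 5, 17), "affected_versions": ["2.1.0"]}
print(is_candidate(stale, now), is_candidate(fresh, now))
```

Passing a shorter quiet_period, for example timedelta(days=30), would correspond to the "updated in the last month" variant suggested elsewhere in the thread.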

Re: Resolving all JIRAs affecting EOL releases

2019-05-19 Thread Sean Owen
I'd only tweak this to perhaps not close JIRAs that have been updated recently -- even just avoiding things updated in the last month. For example, this would close https://issues.apache.org/jira/browse/SPARK-27758 which was opened Friday (though for other reasons it should probably be closed).