SPARK-23443 - Spark with Glue as external catalog

2020-05-22 Thread Edgar Klerks
Hi there, I am a potentially new contributor, so don't spend too much time on me. However, I would like to give this a try. The reason is that it would be nice to have at my work (the connection between Glue and Spark). We run our own Spark clusters and don't use EMR, and right now our Spark jobs

Re: [VOTE] Apache Spark 3.0 RC2

2020-05-22 Thread Koert Kuipers
i would like to point out that SPARK-27194 is a fault tolerance bug that causes jobs to fail when any single task is retried. for us this is a major headache because we have to keep restarting jobs (and explain that spark is really fault tolerant generally, just not here).

Re: Weird ClassCastException when using generics from Java

2020-05-22 Thread Sean Owen
I don't immediately see what the issue could be - try .count()-ing the individual RDDs to narrow it down? What code change made it work? Also I think this could probably be a few lines of SQL with an aggregate, collect_list(), and joins. On Thu, May 21, 2020 at 11:27 PM Stephen Coy wrote: > >

Re: [VOTE] Apache Spark 3.0 RC2

2020-05-22 Thread Xiao Li
Thanks for reporting these issues! Please continue to test RC2 and report more issues. Cheers, Xiao On Fri, May 22, 2020 at 7:40 AM Koert Kuipers wrote: > i would like to point out that SPARK-27194 is a fault tolerance bug that > causes jobs to fail when any single task is retried. for us

Spark behavior with changing data source

2020-05-22 Thread Vipul Rajan
I have a use case where I am joining a streamingDataFrame with a static DataFrame. The static DataFrame is read from a parquet table (a directory containing parquet files). This parquet data is updated by another process once a day. I am using structured streaming for the streaming DataFrame. My

Re: [VOTE] Apache Spark 3.0 RC2

2020-05-22 Thread Hyukjin Kwon
Ryan, > I'm fine with the commit, other than the fact that it violated ASF norms to commit without waiting for a review. It looks like it became a different proposal as you and other people discussed and suggested there, which you didn't technically vote

Re: [VOTE] Apache Spark 3.0 RC2

2020-05-22 Thread 王斐
Hi all, can we help review this PR and resolve this issue before Spark 3.0 RC3? This is a fault tolerance bug in Spark, not as serious as a correctness issue, but pretty high up (I am just citing the comment, https://github.com/apache/spark/pull/26339#issuecomment-632707720).