Why Dataset.hint uses logicalPlan (= analyzed not planWithBarrier)?

2018-01-25 Thread Jacek Laskowski
Hi, I've just noticed that every time Dataset.hint is used it triggers execution of logical commands, their unions and hint resolution (among other things that analyzer does). Why? Why does hint trigger hint resolution (through QueryExecution.analyzed)? [1] And moreover why not to use

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Joseph Torres
SPARK-23221 fixes an issue specific to KafkaContinuousSourceStressForDontFailOnDataLossSuite; I don't think it could cause other suites to deadlock. Do note that the previous hang issues we saw caused by SPARK-23055 were correctly marked as failures.
On Thu, Jan 25, 2018 at 3:40 PM,

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Shixiong(Ryan) Zhu
+ Jose
On Thu, Jan 25, 2018 at 2:18 PM, Dongjoon Hyun wrote:
> SPARK-23221 is one of the reasons for Kafka-test-suite deadlock issue.
>
> For the hang issues, it seems not to be marked as a failure correctly in
> Apache Spark Jenkins history.
>
> On Thu, Jan 25, 2018

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Dongjoon Hyun
SPARK-23221 is one of the reasons for Kafka-test-suite deadlock issue. For the hang issues, it seems not to be marked as a failure correctly in Apache Spark Jenkins history.
On Thu, Jan 25, 2018 at 1:03 PM, Marcelo Vanzin wrote:
> On Thu, Jan 25, 2018 at 12:29 PM, Sean

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen wrote:
> I am still seeing these tests fail or hang:
>
> - subscribing topic by name from earliest offsets (failOnDataLoss: false)
> - subscribing topic by name from earliest offsets (failOnDataLoss: true)
This is something that we

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
> Most tests pass on RC2, except I'm still seeing the timeout caused by
> https://issues.apache.org/jira/browse/SPARK-23055; the tests never
> finish. I followed the thread a bit further and wasn't clear whether it was
> subsequently re-fixed for 2.3.0 or not. It says it's resolved along with

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
I'm a -1 too. In addition to SPARK-23207, we've recently merged two codegen fixes (SPARK-23208 and SPARK-21717) that address a major

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sean Owen
Most tests pass on RC2, except I'm still seeing the timeout caused by https://issues.apache.org/jira/browse/SPARK-23055; the tests never finish. I followed the thread a bit further and wasn't clear whether it was subsequently re-fixed for 2.3.0 or not. It says it's resolved along with

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Nick Pentreath
I think this has come up before (and Sean mentions it above), but the sub-items on: SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella are actually marked as Blockers, but are not targeted to 2.3.0. I think they should be, and I'm not comfortable with those not being resolved before voting

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
Sorry, have to change my vote again. Hive guys ran into SPARK-23209 and that's a regression we need to fix. I'll post a patch soon. So -1 (although others have already -1'ed).
On Wed, Jan 24, 2018 at 11:42 AM, Marcelo Vanzin wrote:
> Given that the bugs I was worried about

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread 蒋星博
I'm sorry to post -1 on this, since there is a non-trivial correctness issue that I believe we should fix in 2.3. TL;DR of the issue: a certain pattern of shuffle+repartition in a query may produce wrong results if some downstream stages fail and trigger a retry of the repartition; the reason for this