Re: FW: Email to Spark Org please

2021-04-01 Thread Sean Owen
Yes, that's a great option when the modeling process itself doesn't really need Spark. You can use any old modeling tool you want and get the parallelism in tuning via hyperopt's Spark integration.
On Thu, Apr 1, 2021 at 10:50 AM Williams, David (Risk Value Stream) wrote: > Classification: Public
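For readers following the thread, here is a minimal sketch of what "parallelism in tuning via hyperopt's Spark integration" can look like. It assumes hyperopt's `SparkTrials` class together with scikit-learn; the dataset, the model, the search space, and the function names `objective` and `tune_in_parallel` are illustrative assumptions, not something taken from the thread. Each trial fits an ordinary single-machine sklearn model, and Spark is only used to run several trials at once.

```python
def objective(params):
    """Loss for one hyperparameter setting (hypothetical example model).

    Imports are kept inside the function so that the function is
    self-contained when it is shipped to Spark executors.
    """
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)  # stand-in for a small in-memory dataset
    model = RandomForestClassifier(
        n_estimators=int(params["n_estimators"]),  # quniform yields floats
        max_depth=int(params["max_depth"]),
        random_state=0,
    )
    # hyperopt minimizes, so return the negated mean CV accuracy.
    return -cross_val_score(model, X, y, cv=3).mean()


def tune_in_parallel(parallelism=4, max_evals=32):
    """Run the search with trials distributed over a Spark cluster.

    Requires hyperopt and pyspark; the values of parallelism and
    max_evals here are assumptions chosen for illustration.
    """
    from hyperopt import SparkTrials, fmin, hp, tpe

    space = {
        "n_estimators": hp.quniform("n_estimators", 10, 200, 10),
        "max_depth": hp.quniform("max_depth", 2, 12, 1),
    }
    # SparkTrials runs each evaluation of `objective` as a Spark task.
    trials = SparkTrials(parallelism=parallelism)
    return fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=trials,
    )
```

Calling `tune_in_parallel()` on a machine with a Spark session available would return the best parameter values found; the model fitting itself never touches Spark ML.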

RE: FW: Email to Spark Org please

2021-04-01 Thread Williams, David (Risk Value Stream)
Value Stream) Cc: user@spark.apache.org Subject: Re: FW: Email to Spark Org please -- This email has reached the Bank via an external source -- Right, it could also be the case that the overhead of distributing it is just dominating. You wouldn't use sklearn with Spark; just use sklearn at this sc

Re: FW: Email to Spark Org please

2021-03-26 Thread Sean Owen
uster. So if we get that working in distributed, will we get > benefits similar to spark ML? > > > > Best Regards, > > Dave Williams > > > > *From:* Sean Owen > *Sent:* 26 March 2021 13:20 > *To:* Williams, David (Risk Value Stream) > > *Cc:* user@spar

RE: FW: Email to Spark Org please

2021-03-26 Thread Williams, David (Risk Value Stream)
if we get that working in distributed, will we get benefits similar to spark ML? Best Regards, Dave Williams From: Sean Owen Sent: 26 March 2021 13:20 To: Williams, David (Risk Value Stream) Cc: user@spark.apache.org Subject: Re: FW: Email to Spark Org please -- This email has reached the Bank v

Re: FW: Email to Spark Org please

2021-03-26 Thread Sean Owen
ent:* 25 March 2021 16:40 > *To:* Williams, David (Risk Value Stream) < > david.willi...@lloydsbanking.com> > *Cc:* user@spark.apache.org > *Subject:* Re: FW: Email to Spark Org please > > > > > *-- This email has reached the Bank via an external source -- * > > Spark is overk

RE: FW: Email to Spark Org please

2021-03-26 Thread Williams, David (Risk Value Stream)
David (Risk Value Stream) <david.willi...@lloydsbanking.com> Cc: user@spark.apache.org Subject: Re: FW: Email to Spark Org please -- This email has reached the Bank via an external source -- Spark is overkill for this problem; use sklearn. But I

Re: FW: Email to Spark Org please

2021-03-25 Thread Sean Owen
Spark is overkill for this problem; use sklearn. But I'd suspect that you are using just 1 partition for such a small data set and are getting no parallelism from Spark. Repartition your input into many more partitions, but it's unlikely to get much faster than in-core sklearn for this task. On Thu, Mar 2
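The repartitioning advice above can be sketched as a small helper. This is only an illustration under stated assumptions: `ensure_min_partitions` and the default of 32 partitions are hypothetical, and the helper relies solely on the PySpark DataFrame methods `df.rdd.getNumPartitions()` and `df.repartition(n)`.

```python
def ensure_min_partitions(df, min_partitions=32):
    """Return df with at least min_partitions partitions.

    df is assumed to be a pyspark.sql.DataFrame. A small input often
    arrives as a single partition, in which case only one task runs
    and Spark provides no parallelism.
    """
    current = df.rdd.getNumPartitions()
    if current >= min_partitions:
        return df  # already spread widely enough; avoid a needless shuffle
    return df.repartition(min_partitions)  # triggers a full shuffle


# Usage with a real session (requires pyspark; the path is hypothetical):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = spark.read.parquet("training_data.parquet")
#   df = ensure_min_partitions(df, min_partitions=64)
```

As the message notes, more partitions only remove the single-task bottleneck; for a data set this small, in-core sklearn is still likely to be faster.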