+1 on exposing the APIs for columnar processing support.
I understand that the scope of this SPIP doesn't cover AI/ML use cases, but I
saw a good performance gain when I converted data from rows to columns to
leverage SIMD architectures in a POC ML application.
With the exposed columnar processing APIs, that kind of conversion would be
much easier to support.
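For reference, this is roughly the shape of the conversion I mean, using
Spark's existing columnar classes. Note this is just a sketch: OnHeapColumnVector
lives in an internal package and these are Spark 2.4-era APIs, so the exact
classes and signatures may differ in other versions.

import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
import org.apache.spark.sql.vectorized.{ColumnVector, ColumnarBatch}

val schema   = StructType(Seq(StructField("feature", DoubleType)))
val capacity = 4096
val vectors  = OnHeapColumnVector.allocateColumns(capacity, schema)

// Filling a flat, per-column array in a tight loop is what gives the JIT
// a chance to emit SIMD instructions (the gain I saw in the POC).
var i = 0
while (i < capacity) {
  vectors(0).putDouble(i, i.toDouble)
  i += 1
}

val batch = new ColumnarBatch(vectors.asInstanceOf[Array[ColumnVector]])
batch.setNumRows(capacity)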
Btw, the heuristics for batch mode (
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L289)
vs streaming (
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala)
are quite different.
I am on k8s, where there is no support yet AFAIK; there is WIP wrt the
shuffle service. So, from your experience, are there no issues with using the
batch dynamic allocation version, like there were before with DStreams, as
described in the related JIRA?
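For anyone following along, these are the two sets of knobs being compared,
as I understand them (Spark 2.4-era config names; please double-check against
your version):

import org.apache.spark.SparkConf

// Core (batch) dynamic allocation: scales on the pending-task backlog.
val batchConf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")

// DStreams dynamic allocation (SPARK-12133): scales on the ratio of batch
// processing time to batch interval. It requires the core mechanism to be
// disabled, as the two cannot be enabled together.
val streamingConf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "false")
  .set("spark.streaming.dynamicAllocation.enabled", "true")
  .set("spark.streaming.dynamicAllocation.scalingUpRatio", "0.9")
  .set("spark.streaming.dynamicAllocation.scalingDownRatio", "0.3")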
On Fri, 24 May 2019, 20:28, Gabor Somogyi wrote:
Yes, nothing happens. In this case it could propagate info to the resource
manager to scale down the number of executors, no? Just a thought.
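Something like a StreamingQueryListener could be the place to detect the idle
condition; the actual scale-down call would be the missing piece. A rough
sketch below, where notifyResourceManager is purely hypothetical (no such hook
exists today):

import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

class IdleQueryListener extends StreamingQueryListener {
  private var idleBatches = 0

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    if (event.progress.numInputRows == 0) {
      idleBatches += 1
      // After enough consecutive empty micro-batches, this is where one
      // could signal the cluster manager (hypothetical hook):
      // if (idleBatches >= 10) notifyResourceManager()
    } else {
      idleBatches = 0
    }
  }
}

// Registered via: spark.streams.addListener(new IdleQueryListener)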
On Fri, 24 May 2019, 19:17, Gabor Somogyi <
gabor.g.somo...@gmail.com> wrote:
> Structured Streaming works differently. If no data arrives, no tasks are
> executed.
It scales down with YARN. Not sure how you've tested.
On Fri, 24 May 2019, 19:10, Stavros Kontopoulos <
stavros.kontopou...@lightbend.com> wrote:
> Yes, nothing happens. In this case it could propagate info to the resource
> manager to scale down the number of executors, no? Just a thought.
>
Structured Streaming works differently. If no data arrives, no tasks are
executed (just had a case in this area).
BR,
G
On Fri, 24 May 2019, 18:14, Stavros Kontopoulos <
stavros.kontopou...@lightbend.com> wrote:
> Hi,
>
> A while ago, streaming dynamic allocation was added to DStreams
Hi,
A while ago, streaming dynamic allocation was added to DStreams (
https://issues.apache.org/jira/browse/SPARK-12133) to improve on the issues
with the batch-based one. Should this be ported to Structured Streaming?
AFAIK there is no support for it in SS. Thoughts?
Best,
Stavros
Hi experts,
I am trying to create a custom Spark DataSource (v1) to read from a
transactional data endpoint, and I need to acquire a lock with the endpoint
before fetching data and release the lock after reading. Note that the lock
acquisition and release need to happen in the driver JVM.
I have
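To make the setup concrete, here is a rough sketch of the shape I have in
mind; LockClient and fetchRows are hypothetical stand-ins for the endpoint's
API, not anything that exists in Spark:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical client for the transactional endpoint, for illustration only.
object LockClient {
  def acquire(endpoint: String): LockHandle = new LockHandle
}
class LockHandle { def release(): Unit = () }

class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new LockingRelation(sqlContext, parameters("endpoint"))
}

class LockingRelation(val sqlContext: SQLContext, endpoint: String)
    extends BaseRelation with TableScan {

  override def schema: StructType =
    StructType(Seq(StructField("value", StringType)))

  // buildScan runs on the driver, so the lock never leaves the driver JVM.
  override def buildScan(): RDD[Row] = {
    val lock = LockClient.acquire(endpoint)
    try {
      // Fetch eagerly while the lock is held, so it can be released before
      // executors start consuming the RDD (which is evaluated lazily).
      val rows: Seq[Row] = fetchRows(endpoint)
      sqlContext.sparkContext.parallelize(rows)
    } finally {
      lock.release()
    }
  }

  // Hypothetical driver-side fetch from the endpoint.
  private def fetchRows(endpoint: String): Seq[Row] = Seq(Row("example"))
}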