I have one observation: is "Python UDF is slow due to deserialization
penalty" still relevant, even after Arrow is used for in-memory data
management, and after the heavy investment from the Spark dev community in
making pandas a first-class citizen, including UDFs?
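For what it's worth, the difference between the two UDF styles can be sketched in plain pandas (no cluster needed; function names here are made up for illustration). In PySpark you would register the vectorized version with `pandas_udf`, which ships data to the Python worker in Arrow column batches instead of pickling values row by row:

```python
import pandas as pd

# Row-at-a-time, like a plain Python UDF: one Python call per value,
# each value serialized individually across the JVM/Python boundary.
def plus_one_scalar(x):
    return x + 1.0

# Vectorized, like a pandas UDF: one call per Arrow batch, operating
# on a whole pandas Series at once.
def plus_one_vectorized(s: pd.Series) -> pd.Series:
    return s + 1.0

s = pd.Series([1.0, 2.0, 3.0])
print([plus_one_scalar(v) for v in s])      # per-row calls
print(plus_one_vectorized(s).tolist())      # one batched call
```

The pandas UDF path avoids the per-row pickling cost, but it doesn't remove Python from the loop entirely, so for pure column arithmetic the built-in Spark SQL functions are still the cheapest option.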
As I work with multiple clients, my experience is that org culture
Not quite sure how meaningful this discussion is, but in case someone is
really faced with this query, the question still is: what is the use case?
I am just a bit confused by the one-size-fits-all deterministic approach
here; I thought those days were over almost 10 years ago.
Regards
Gourav
I agree with Wim's assessment of data engineering/ETL vs. data science.
I wrote pipelines/frameworks for large companies, and Scala was a much
better choice. But for ad-hoc work interfacing directly with data science
experiments, PySpark presents less friction.
On Sat, 10 Oct 2020 at 13:03, Mich Ta
Many thanks to everyone for their valuable contributions.
We all started with Spark a few years ago, when Scala was the talk of the
town. I agree with the note that as long as Spark stayed niche and elite,
someone with Scala knowledge was attracting premiums. In fairness, in
2014-2015 there was no
I would not leave it to data scientists unless they will maintain it.
The key decision in the cases I've seen was usually people
cost/availability, with ETL operations cost taken into account.
Often the situation is that the ETL cloud cost is small and you will not
save much. Then it is just skills cost/a
Is Spark a compute engine only, or is it also a cluster that comes with a
set of hardware/nodes? What exactly is a Spark cluster?
Dear experts, please help.
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
It really depends on what your data scientists use. I don't think it makes
sense to impose a language on ad-hoc data science work; let them choose.
For more complex AI engineering, though, you can apply different standards
and criteria. And then it really depends on the architecture
If it works without Arrow optimization, it's likely a bug. Please feel free
to file a JIRA for that.
On Wed, 7 Oct 2020, 22:44 Jacek Pliszka wrote:
> Hi!
>
> Is there any place I can find information how to use gapply with arrow?
>
> I've tried something very simple
>
> collect(gapply(
> df,
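The quoted snippet is truncated, but the grouped-map pattern behind gapply (and its PySpark cousin, `groupBy().applyInPandas`) takes each group as a data frame and returns one. A minimal pandas-only sketch, so it runs without a cluster (column names and the centering function are hypothetical):

```python
import pandas as pd

# The function receives one group as a pandas DataFrame and returns a
# DataFrame; here it centers column "v" within each group.
def subtract_group_mean(pdf: pd.DataFrame) -> pd.DataFrame:
    return pdf.assign(v=pdf["v"] - pdf["v"].mean())

df = pd.DataFrame({"id": [1, 1, 2, 2], "v": [1.0, 3.0, 5.0, 7.0]})
out = (
    df.groupby("id", group_keys=False)
      .apply(subtract_group_mean)
      .reset_index(drop=True)
)
# Group means were 2.0 and 6.0, so each group ends up centered at 0.
```

In PySpark the same function would be passed to `df.groupBy("id").applyInPandas(subtract_group_mean, schema=...)`, with Arrow handling the per-group transfer; the SparkR gapply call is the R-side equivalent.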
Hey Mich,
This is a very fair question. I've seen many data engineering teams start
out with Scala because technically it is the best choice for many good
reasons, and it is what Spark itself is written in.
On the other hand, almost all use cases we see these days are data science
use cases where people
What is the use case?
Unless you have unlimited funding and time to waste, you would usually
start with that.
Regards,
Gourav
On Fri, Oct 9, 2020 at 10:29 PM Russell Spitzer wrote:
> Spark in Scala (or Java) is much more performant if you are using RDDs;
> those operations basically force you to