Re: Dataframes - sole data structure for parallel computations?

Michael Armbrust Thu, 08 Oct 2015 10:57:06 -0700

Don't worry, the ability to work with domain objects and lambda functions
is not going to go away.  However, we are looking at ways to leverage
Tungsten's improved performance when processing structured data.

More details can be found here:
https://issues.apache.org/jira/browse/SPARK-9999

On Thu, Oct 8, 2015 at 7:40 AM, Tracewski, Lukasz <
lukasz.tracew...@credit-suisse.com> wrote:

> Hi,
>
> Many people interpret this slide from Databricks
>
> https://ogirardot.files.wordpress.com/2015/05/future-of-spark.png
>
> as an indication that the DataFrames API is going to be the main processing
> unit of Spark and the sole access point to MLlib, Streaming and such. Is that
> true? My impression was that DataFrames are an additional abstraction layer
> with some promising optimisations coming from the Tungsten project, but
> that’s all. RDDs are here to stay. They are a natural choice when it comes
> to e.g. processing images.
>
> Here is one article that advertises Dataframes as a “sole data structure
> for parallel computations”:
>
> https://ogirardot.wordpress.com/2015/05/29/rdds-are-the-new-bytecode-of-apache-spark/
> (paragraph 4)
>
> Cheers,
>
> Lucas
