Haha... the optimiser looks interesting. I looked at the Nephele paper, "Nephele Streaming: Stream processing under QoS constraints at scale", and there are some very interesting ideas around using different QoS metrics to determine the "correct" plan. It would be interesting to see a comparison on that front. Is there a similar planner in Spark? Or is Spark a little too low-level to be thinking about plans, so that this is left to the user?
-- Ankur

On 26 Nov 2013, at 11:42, Dmitriy Lyubimov <[email protected]> wrote:

> sounds... formidable, Sebastian!
>
>
> On Tue, Nov 26, 2013 at 12:27 AM, Sebastian Schelter <[email protected]> wrote:
> Stratosphere is a massively parallel data processing system that is
> heavily inspired by database technology. It is based on research
> published at leading international scientific conferences (VLDB, SIGMOD,
> SoCC, CIKM).
>
> It is similar to Spark in many aspects, e.g. it has a Scala API, it
> supports complex data flows and very efficiently executes iterative
> programs.
>
> A core difference is that it features an optimizer that will, for
> example, automatically choose data shipping and execution strategies for
> joins (broadcast/repartition, sort-merge/hybrid-hash join). Another
> difference is that its operators are designed to work in memory but
> gracefully go out of core under memory pressure.
>
> Check out the feature overview on the start page of http://stratosphere.eu/
>
> On 23.11.2013 01:17, Ankur Chauhan wrote:
> > Hi,
> >
> > That's what I thought, but as per the slides on http://www.stratosphere.eu
> > they seem to "know" about Spark, and the Scala API does look similar.
> > I found the PACT model interesting. Would like to know if Matei or other
> > core committers have something to weigh in on.
> >
> > -- Ankur
> > On 22 Nov 2013, at 16:05, Patrick Wendell <[email protected]> wrote:
> >
> >> I've never seen that project before; it would be interesting to get a
> >> comparison. It seems to offer a much lower-level API. For instance, this
> >> is a wordcount program:
> >>
> >> https://github.com/stratosphere/stratosphere/blob/master/pact/pact-examples/src/main/java/eu/stratosphere/pact/example/wordcount/WordCount.java
> >>
> >> On Thu, Nov 21, 2013 at 3:15 PM, Ankur Chauhan <[email protected]>
> >> wrote:
> >>> Hi,
> >>>
> >>> I was just curious about https://github.com/stratosphere/stratosphere
> >>> and how Spark compares to it. Does anyone have any experience with it
> >>> and any comments to make?
> >>>
> >>> -- Ankur
