FWIW, Cloudera and LucidWorks believe Solr belongs with Hadoop (Cloudera Search and LucidWorks Big Data). So while the Solr project is lagging a bit in Hadoop integration points, other projects have reached out to pull it in (DigitalPebble Behemoth, Cascading, Twitter's Elephant Bird Pig/Lucene integration). That, to me, proves the usefulness of including Solr within Bigtop.

-Scott

On Jun 12, 2013, at 11:50 AM, Andrew Purtell <[email protected]> wrote:

> It's a fair point that SOLR might not be part of the Hadoop ecosystem.
> However, Bigtop is a top level project at Apache, and so maybe casting a
> wider net makes sense. It would be the Bigtop community's consensus what to
> include versus not. I like the Bigtop community's inclusiveness to date - the
> strategy might be described as "building the Apache Big Data Operating
> System". That's fantastic.
>
>
> On Tue, Jun 11, 2013 at 10:27 PM, Konstantin Boudnik <[email protected]> wrote:
> On Mon, Jun 10, 2013 at 06:23PM, Jay Vyas wrote:
> > Thanks Roman :) Yes, we once lived and died by the SolrOutputFormat for
> > some time. It is a very nice extension to Hadoop's reduce outputs - but
> > what I mean is that SOLR is not part of the Hadoop ecosystem, in the sense
> > that it doesn't natively depend on HDFS. Rather, it uses the standard file
> > system and is a memory-intensive app, scaling via more cores, not more data
> > nodes or task trackers.
> >
> > I think of "Hadoop ecosystem" tools as tools which rely on HDFS, or
> > MapReduce, in order to run.
>
> HDFS largely yes. YARN (not MR per se) not so much. Say, Bigtop is (or is
> about to be) integrating in-memory analytic systems (Spark, Shark) that don't
> rely on MR at all, and only somewhat benefit from YARN.
>
> > But maybe the definition of the "Hadoop ecosystem" is broadening in the
> > YARN / ZooKeeper era?
>
> See above.
>
> Cos
>
>
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via
> Tom White)
