Hi Samuel,

You may wish to evaluate Presto (https://prestodb.io/), which has the added advantage of being faster than conventional Hive because it does not launch any MapReduce jobs. It does depend on the Hive metastore, though, from which it gets the table metadata it needs to run queries directly on the source files. The only downside I found was the lack of support for complex SQL syntax, which means creating a lot of intermediate tables for even slightly complicated calculations (and IMHO, all calculations become complex sooner than we intend them to).

Regards,
Devopam

On Tue, Feb 3, 2015 at 10:30 AM, Samuel Marks <samuelma...@gmail.com> wrote:

> Alexander: So would you recommend using Phoenix for all but those kind of
> queries, and switching to Hive+Tez for the rest? - Is that feasible?
>
> Checking their documentation, it looks like it just might be:
> https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
>
> There is some early work on a Hive + Phoenix integration on GitHub:
> https://github.com/nmaillard/Phoenix-Hive
>
> Saurabh: I am sure there are a variety of very good non open-source
> products on the market :) - However in this thread I am only looking at
> open-source options. Additionally I am planning on open-sourcing this
> project I am building using these tools, so it makes even more sense that
> the entire toolset and their dependencies are also open-source.
>
> Best,
>
> Samuel Marks
> http://linkedin.com/in/samuelmarks
>
> On Tue, Feb 3, 2015 at 2:33 PM, Saurabh B <saurabh.wri...@gmail.com>
> wrote:
>
>> This is not open source but we are using Vertica and it works very nicely
>> for us. There is a 1TB community edition but above that it costs money.
>> It has really advanced SQL (analytical functions, etc), works like an
>> RDBMS, has R/Java/C++ SDK and scales nicely. There is a similar option of
>> Redshift available but Vertica has more features (pattern matching
>> functions, etc).
>>
>> Again, not open source so I would be interested to know what you end up
>> going with and what your experience is.
>>
>> On Mon, Feb 2, 2015 at 12:08 AM, Samuel Marks <samuelma...@gmail.com>
>> wrote:
>>
>>> Well what I am seeking is a Big Data database that can work with Small
>>> Data also. I.e.: scaleable from one node to vast clusters; whilst
>>> maintaining relatively low latency throughout.
>>>
>>> Which fit into this category?
>>>
>>> Samuel Marks
>>> http://linkedin.com/in/samuelmarks
>>>
>>
>>
>

--
Devopam Mittra
Life and Relations are not binary