Jianshi: How many column families and columns are you expecting (maximum) in your largest table?
Cheers

On Tue, Jun 24, 2014 at 7:29 AM, Jianshi Huang <[email protected]> wrote:

> Hi David,
>
> I did, it's a wonderful piece of work, and for reviewing facts in a
> network it's a great tool. (And Lumify looks really nice)
>
> However, my queries are mostly time-bound (from time A to time B), and to
> make some queries real-time (< 50ms), I have to roll out my own schema and
> index, to denormalize properties and to incrementally do aggregations. I
> don't think there are existing solutions in graph databases that can do these.
>
> And it's really fun to implement it myself. :)
>
> Please correct me if I'm wrong
>
> Jianshi
>
>
> On Tue, Jun 24, 2014 at 10:10 PM, David Medinets <[email protected]> wrote:
>
>> Did you get a chance to review http://securegraph.org/? SecureGraph is
>> an API to manipulate graphs, similar to Blueprints. Unlike Blueprints,
>> every SecureGraph method requires authorizations and visibilities.
>> SecureGraph also supports multivalued properties as well as property
>> metadata.
>>
>>
>> On Tue, Jun 24, 2014 at 9:51 AM, Jianshi Huang <[email protected]> wrote:
>>
>>> Wow, so many replies and very educational. Thank you all!
>>>
>>> I'm working on a graph backend, and I hope the same infrastructure can
>>> support:
>>>
>>> 1) interactive graph exploration and queries
>>>
>>> Answering: what are the interactions among N users from time A to time B,
>>> and how are users connected (now and before)?
>>>
>>> 2) real-time (<100ms) feature calculation (aggregation, matching) in a
>>> network of accounts
>>>
>>> Answering questions like: what's the ratio of newly registered accounts
>>> in my 'connected' (needs a flexible definition) network, and how fast does
>>> it change? Does the network have a path satisfying A(CN) -> B(IT) -> C(US)
>>> where the age of the path is less than 3 days? etc.
>>>
>>> 3) offline simulation of events or offline calculation of new features
>>> (used for building models), so I need to take snapshots and also save
>>> point-in-time data
>>>
>>> Having them all-in-one in the same infrastructure will greatly simplify
>>> the implementation.
>>>
>>> BTW, I'm working for PayPal, Risk Data Science. (All questions above are
>>> fake and are not related to PayPal :)
>>>
>>> I made a prototype in the last two weeks for purpose 1), and my feeling
>>> about Accumulo is exactly what many of you have said: it just works! Very
>>> little admin work, clean and clear documentation and APIs. One thing I
>>> haven't got right is high-speed ingestion; I only got 100K rows/sec/node,
>>> but that's already very satisfying. :)
>>>
>>> BTW, from Mike's slides it seems HBase is much faster in read throughput
>>> if the number of columns is small. Any comments? What about latency? Can I
>>> cache all data in memory in Accumulo to reduce latency for cold data (say I
>>> just restarted my cluster)?
>>>
>>>
>>> Jianshi
>>>
>>>
>>> On Tue, Jun 24, 2014 at 10:41 AM, William Slacum <[email protected]> wrote:
>>>
>>>> I think first and foremost, how has writing your application been? Is
>>>> it something you can easily onboard other people for? Does it seem stable
>>>> enough? If you can answer those questions positively, I think you have a
>>>> winning situation.
>>>>
>>>> The big three Hadoop vendors (Cloudera, Hortonworks and MapR) all
>>>> provide some level of support for Accumulo, so it has the pedigree of other
>>>> members of the Hadoop ecosystem.
>>>>
>>>> Regarding the performance, I think Mike's presentation needs some
>>>> context. He can definitely provide more context than the rest of us (and
>>>> possibly Sean or Bill |-|), but I think one thing he was driving home is
>>>> that out of the box, Accumulo is configured to run on someone's laptop.
>>>> There are adjustments to be made when running at any scale greater than a
>>>> dev machine, and they may not be documented clearly.
>>>>
>>>>
>>>> On Mon, Jun 23, 2014 at 8:16 PM, Tejinder S Luthra <[email protected]> wrote:
>>>>
>>>>> Mike did a pretty good presentation comparing Accumulo and HBase
>>>>> performance. Again, not official IMO, but it is pretty detailed in the
>>>>> approach taken and is an apples-to-apples comparison:
>>>>> http://www.slideshare.net/AccumuloSummit/10-30-drob
>>>>>
>>>>>
>>>>> From: Jeremy Kepner <[email protected]>
>>>>> To: <[email protected]>
>>>>> Date: 06/23/2014 07:42 PM
>>>>> Subject: Re: How does Accumulo compare to HBase
>>>>> ------------------------------
>>>>>
>>>>> Performance is probably the largest difference between Accumulo and
>>>>> HBase.
>>>>>
>>>>> Accumulo can ingest/scan at a rate of 800K entries/sec/node.
>>>>> This performance scales well into the hundreds of nodes to deliver
>>>>> 100M+ entries/sec.
>>>>>
>>>>> There are no recent HBase benchmarks and none in the peer-reviewed
>>>>> literature.
>>>>> Old data suggests that HBase performance is ~1% of Accumulo
>>>>> performance.
>>>>>
>>>>> In short, one can often replace a 20+ node database with
>>>>> a single node Accumulo database.
>>>>>
>>>>> On Tue, Jun 24, 2014 at 01:55:54AM +0800, Jianshi Huang wrote:
>>>>> > Er... basically I need to explain to my manager why we should choose
>>>>> > Accumulo instead of HBase.
>>>>> >
>>>>> > So what are the pros and cons of Accumulo vs. HBase?
>>>>> > (btw HBase 0.98 also got cell-level security, modeled after Accumulo)
>>>>> >
>>>>> > --
>>>>> > Jianshi Huang
>>>>> >
>>>>> > LinkedIn: jianshi
>>>>> > Twitter: @jshuang
>>>>> > Github & Blog: http://huangjs.github.com/
>>>>>
>>>>
>>>
>>> --
>>> Jianshi Huang
>>>
>>> LinkedIn: jianshi
>>> Twitter: @jshuang
>>> Github & Blog: http://huangjs.github.com/
>>>
>>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
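The thread raises time-bound queries ("from time A to time B") and a time-limited path question (A(CN) -> B(IT) -> C(US), with the age of the path under 3 days). Both map naturally onto a sorted key-value layout such as Accumulo's. The following is a minimal, hypothetical Python sketch. It is not Accumulo's client API and not Jianshi's actual schema. `edge_key`, `SortedEdgeIndex` and `find_paths` are illustrative names, and an in-memory sorted list stands in for a table. The sketch makes two assumptions. First, keys take the form `source|zero-padded epoch millis|destination`, so all edges leaving one vertex inside a time window form a single contiguous range scan. Second, "age of the path is less than 3 days" means every edge falls within the last 3 days.

```python
import bisect

TS_WIDTH = 13  # zero-pad epoch millis so lexicographic order matches numeric order


def edge_key(src: str, ts_ms: int, dst: str) -> str:
    """Hypothetical key layout: source vertex, padded timestamp, destination."""
    return f"{src}|{ts_ms:0{TS_WIDTH}d}|{dst}"


class SortedEdgeIndex:
    """In-memory stand-in for a sorted key-value table (not the Accumulo API)."""

    def __init__(self) -> None:
        self._keys: list[str] = []

    def put(self, src: str, ts_ms: int, dst: str) -> None:
        # Keep keys sorted on insert, as a sorted store would.
        bisect.insort(self._keys, edge_key(src, ts_ms, dst))

    def scan(self, src: str, t_start: int, t_end: int):
        """Yield (ts, dst) for edges out of src with t_start <= ts < t_end."""
        lo = bisect.bisect_left(self._keys, f"{src}|{t_start:0{TS_WIDTH}d}|")
        hi = bisect.bisect_left(self._keys, f"{src}|{t_end:0{TS_WIDTH}d}|")
        for key in self._keys[lo:hi]:
            _, ts, dst = key.split("|")
            yield int(ts), dst


def find_paths(index, country, start, now_ms, window_ms, pattern=("CN", "IT", "US")):
    """Return 2-hop paths start -> b -> c whose vertex countries match
    pattern and whose edges all fall in [now_ms - window_ms, now_ms)."""
    t0 = now_ms - window_ms
    if country.get(start) != pattern[0]:
        return []
    paths = []
    for _, b in index.scan(start, t0, now_ms):
        if country.get(b) != pattern[1]:
            continue
        for _, c in index.scan(b, t0, now_ms):
            if country.get(c) == pattern[2]:
                paths.append((start, b, c))
    return paths


# Usage with made-up vertices and timestamps.
DAY_MS = 24 * 60 * 60 * 1000
now = 10 * DAY_MS
country = {"a": "CN", "b": "IT", "c": "US", "d": "US"}
idx = SortedEdgeIndex()
idx.put("a", now - 1 * DAY_MS, "b")  # inside the 3-day window
idx.put("b", now - 2 * DAY_MS, "c")  # inside the window
idx.put("b", now - 5 * DAY_MS, "d")  # too old: excluded by the range scan
print(find_paths(idx, country, "a", now, 3 * DAY_MS))  # [('a', 'b', 'c')]
```

Putting the timestamp directly after the source vertex is what turns "from time A to time B" into one bounded scan per vertex instead of a filter over all of that vertex's edges. The sketch does not handle several cases. Vertex IDs containing the `|` separator would break the key layout. Nothing requires the second hop to happen after the first. Repeated edges in the window would produce duplicate paths.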
