Hi Ted,

CF: maybe dozens
Columns: billions (rowkey = nodeId, CF = event type, CQ = Index+eventId)
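
As a rough illustration of the layout above (rowkey = nodeId, CF = event type, CQ = Index+eventId), here is a minimal Python sketch. It assumes that "Index" is a zero-padded event timestamp, which the message does not actually state. The names make_key and scan_time_range are hypothetical, and a sorted in-memory list stands in for the table's sorted key space; none of this is Accumulo API. It only shows why a sortable time prefix in the CQ turns a "from time A to time B" query into one contiguous range scan.

```python
from bisect import bisect_left


def make_key(node_id, event_type, ts_millis, event_id):
    """Build a (row, CF, CQ) tuple.

    Assumption: the "Index" part of the CQ is a timestamp, zero-padded to
    13 digits so that lexicographic order matches chronological order.
    """
    cq = f"{ts_millis:013d}:{event_id}"
    return (node_id, event_type, cq)


def scan_time_range(sorted_keys, node_id, event_type, start_ms, end_ms):
    """Return keys for one node and event type with start_ms <= ts < end_ms.

    sorted_keys must be sorted; tuples compare element by element, which
    mimics the row, then CF, then CQ ordering of a sorted key-value store.
    """
    lo = (node_id, event_type, f"{start_ms:013d}")
    hi = (node_id, event_type, f"{end_ms:013d}")
    return sorted_keys[bisect_left(sorted_keys, lo):bisect_left(sorted_keys, hi)]


# Made-up sample data, for illustration only.
events = sorted([
    make_key("n1", "login", 1000, "e1"),
    make_key("n1", "login", 2000, "e2"),
    make_key("n1", "login", 3000, "e3"),
    make_key("n1", "payment", 1500, "e4"),
    make_key("n2", "login", 1500, "e5"),
])

# All "login" events of node n1 in the half-open window [1000, 3000).
hits = scan_time_range(events, "n1", "login", 1000, 3000)
```

With this layout, one row holds all events of a node, so a row can grow to a very large number of columns, which is in line with the "billions" estimate above.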
Make sense?

Jianshi


On Tue, Jun 24, 2014 at 10:33 PM, Ted Yu <[email protected]> wrote:

> Jianshi:
> How many column families and columns are you expecting (maximum) in your
> largest table?
>
> Cheers
>
>
> On Tue, Jun 24, 2014 at 7:29 AM, Jianshi Huang <[email protected]>
> wrote:
>
>> Hi David,
>>
>> I did, it's a wonderful piece of work and for reviewing facts in a
>> network it's a great tool. (And Lumify looks really nice)
>>
>> However, my queries are mostly time-bound (from time A to time B), and to
>> make some queries real-time (< 50ms), I have to roll out my own schema and
>> index, to denormalize properties and to incrementally do aggregations. I
>> don't think there are existing solutions in graph databases that can do
>> these.
>>
>> And it's really fun to implement it myself. :)
>>
>> Please correct me if I'm wrong
>>
>> Jianshi
>>
>>
>>
>> On Tue, Jun 24, 2014 at 10:10 PM, David Medinets <
>> [email protected]> wrote:
>>
>>> Did you get a chance to review http://securegraph.org/? SecureGraph is
>>> an API to manipulate graphs, similar to Blueprints. Unlike Blueprints,
>>> every SecureGraph method requires authorizations and visibilities.
>>> SecureGraph also supports multivalued properties as well as property
>>> metadata.
>>>
>>>
>>> On Tue, Jun 24, 2014 at 9:51 AM, Jianshi Huang <[email protected]>
>>> wrote:
>>>
>>>> Wow, so many replies and very educational. Thank you all!
>>>>
>>>> I'm working on a graph backend where I hope the same infrastructure
>>>> can support
>>>>
>>>> 1) interactive graph exploration and queries
>>>>
>>>> Answering what are the interactions among N users from time A to time
>>>> B, and how are users connected (now and before).
>>>>
>>>> 2) real-time (<100ms) feature calculation (aggregation, matching) in a
>>>> network of accounts
>>>>
>>>> Answering questions like: what's the ratio of newly registered accounts
>>>> in my 'connected' (need flexible definition) network, how fast does it
>>>> change; does the network have a path satisfying A(CN) -> B(IT) -> C(US)
>>>> where the age of the path is less than 3 days; etc.
>>>>
>>>> 3) offline simulation of events or offline calculation of new features
>>>> (used for building models), so I need to take snapshots and also save
>>>> point-in-time data
>>>>
>>>> Having them all-in-one in the same infrastructure will greatly simplify
>>>> the implementation.
>>>>
>>>> BTW, I'm working for PayPal, Risk Data Science. (All questions above
>>>> are fake and are not related to PayPal :)
>>>>
>>>> I made a prototype in the last two weeks for purpose 1) and my feeling
>>>> about Accumulo is exactly what many of you have said: it just works! Very
>>>> little admin work, clean and clear documentation and APIs. One thing I
>>>> haven't got right was high-speed ingestion, I only got 100K rows/sec/node,
>>>> but it's already very satisfying. :)
>>>>
>>>> BTW, from Mike's slides it seems HBase is much faster in read
>>>> throughput if the number of columns is small. Any comments? What about
>>>> latency? Can I cache all data in memory in Accumulo to reduce latency for
>>>> cold data (say I just restarted my cluster)?
>>>>
>>>>
>>>> Jianshi
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jun 24, 2014 at 10:41 AM, William Slacum <
>>>> [email protected]> wrote:
>>>>
>>>>> I think first and foremost, how has writing your application been? Is
>>>>> it something you can easily onboard other people for? Does it seem stable
>>>>> enough? If you can answer those questions positively, I think you have a
>>>>> winning situation.
>>>>>
>>>>> The big three Hadoop vendors (Cloudera, Hortonworks and MapR) all
>>>>> provide some level of support for Accumulo, so it has the pedigree of
>>>>> other members of the Hadoop ecosystem.
>>>>>
>>>>> Regarding the performance, I think Mike's presentation needs some
>>>>> context. He can definitely provide more context than the rest of us (and
>>>>> possibly Sean or Bill |-|), but I think one thing he was driving home is
>>>>> that out of the box, Accumulo is configured to run on someone's laptop.
>>>>> There are adjustments to be made when running at any scale greater than a
>>>>> dev machine and they may not be documented clearly.
>>>>>
>>>>>
>>>>> On Mon, Jun 23, 2014 at 8:16 PM, Tejinder S Luthra <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Mike did a pretty good presentation on performance comparison between
>>>>>> Accumulo / HBase. Again not official IMO but is pretty detailed in the
>>>>>> approach taken and apples-to-apples comparison
>>>>>> http://www.slideshare.net/AccumuloSummit/10-30-drob
>>>>>>
>>>>>>
>>>>>> From: Jeremy Kepner <[email protected]>
>>>>>> To: <[email protected]>
>>>>>> Date: 06/23/2014 07:42 PM
>>>>>> Subject: Re: How does Accumulo compare to HBase
>>>>>> ------------------------------
>>>>>>
>>>>>>
>>>>>> Performance is probably the largest difference between Accumulo and
>>>>>> HBase.
>>>>>>
>>>>>> Accumulo can ingest/scan at a rate of 800K entries/sec/node.
>>>>>> This performance scales well into the hundreds of nodes to deliver
>>>>>> 100M+ entries/sec.
>>>>>>
>>>>>> There are no recent HBase benchmarks and none in the peer-reviewed
>>>>>> literature.
>>>>>> Old data suggests that HBase performance is ~1% of Accumulo
>>>>>> performance.
>>>>>>
>>>>>> In short, one can often replace a 20+ node database with
>>>>>> a single node Accumulo database.
>>>>>>
>>>>>> On Tue, Jun 24, 2014 at 01:55:54AM +0800, Jianshi Huang wrote:
>>>>>> > Er... basically I need to explain to my manager why choosing Accumulo,
>>>>>> > instead of HBase.
>>>>>> >
>>>>>> > So what are the pros and cons of Accumulo vs. HBase? (btw HBase 0.98 also
>>>>>> > got cell-level security, modeled after Accumulo)
>>>>>> >
>>>>>> > --
>>>>>> > Jianshi Huang
>>>>>> >
>>>>>> > LinkedIn: jianshi
>>>>>> > Twitter: @jshuang
>>>>>> > Github & Blog: http://huangjs.github.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jianshi Huang
>>>>
>>>> LinkedIn: jianshi
>>>> Twitter: @jshuang
>>>> Github & Blog: http://huangjs.github.com/
>>>>
>>>
>>>
>>
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>
>
>

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
