Hi David,

I did. It's a wonderful piece of work, and for reviewing facts in a
network it's a great tool. (And Lumify looks really nice.)

However, my queries are mostly time-bound (from time A to time B), and to
make some queries real-time (< 50ms), I have to roll out my own schema and
index, denormalize properties, and do aggregations incrementally. I don't
think any existing graph database solution can do these. And it's really
fun to implement it myself. :)

Please correct me if I'm wrong.

Jianshi


On Tue, Jun 24, 2014 at 10:10 PM, David Medinets <[email protected]> wrote:

> Did you get a chance to review http://securegraph.org/? SecureGraph is an
> API to manipulate graphs, similar to Blueprints. Unlike Blueprints, every
> SecureGraph method requires authorizations and visibilities. SecureGraph
> also supports multivalued properties as well as property metadata.
>
>
> On Tue, Jun 24, 2014 at 9:51 AM, Jianshi Huang <[email protected]>
> wrote:
>
>> Wow, so many replies and very educational. Thank you all!
>>
>> I'm working on a graph backend where I hope the same infrastructure can
>> support:
>>
>> 1) interactive graph exploration and queries
>>
>> Answering what the interactions are among N users from time A to time B,
>> and how users are connected (now and before).
>>
>> 2) real-time (<100ms) feature calculation (aggregation, matching) in a
>> network of accounts
>>
>> Answering questions like: what's the ratio of newly registered accounts
>> in my 'connected' (need flexible definition) network, and how fast does
>> it change; does the network have a path satisfying
>> A(CN) -> B(IT) -> C(US) where the age of the path is less than 3 days;
>> etc.
>>
>> 3) offline simulation of events or offline calculation of new features
>> (used for building models), so I need to take snapshots and also save
>> point-in-time data
>>
>> Having them all-in-one in the same infrastructure will greatly simplify
>> the implementation.
>>
>> BTW, I'm working for PayPal, Risk Data Science. (All questions above are
>> fake and are not related to PayPal :)
>>
>> I made a prototype in the last two weeks for purpose 1) and my feeling
>> about Accumulo is exactly what many of you have said: it just works! Very
>> little admin work, clean and clear documentation and APIs. One thing I
>> haven't got right is high-speed ingestion; I only got 100K
>> rows/sec/node, but it's already very satisfying. :)
>>
>> BTW, from Mike's slides it seems HBase is much faster in read throughput
>> if the number of columns is small. Any comments? What about latency? Can
>> I cache all data in memory in Accumulo to reduce latency for cold data
>> (say I just restarted my cluster)?
>>
>>
>> Jianshi
>>
>>
>> On Tue, Jun 24, 2014 at 10:41 AM, William Slacum <
>> [email protected]> wrote:
>>
>>> I think first and foremost, how has writing your application been? Is it
>>> something you can easily onboard other people for? Does it seem stable
>>> enough? If you can answer those questions positively, I think you have a
>>> winning situation.
>>>
>>> The big three Hadoop vendors (Cloudera, Hortonworks and MapR) all
>>> provide some level of support for Accumulo, so it has the pedigree of
>>> other members of the Hadoop ecosystem.
>>>
>>> Regarding the performance, I think Mike's presentation needs some
>>> context. He can definitely provide more context than the rest of us (and
>>> possibly Sean or Bill |-|), but I think one thing he was driving home is
>>> that out of the box, Accumulo is configured to run on someone's laptop.
>>> There are adjustments to be made when running at any scale greater than
>>> a dev machine and they may not be documented clearly.
>>>
>>>
>>> On Mon, Jun 23, 2014 at 8:16 PM, Tejinder S Luthra <[email protected]>
>>> wrote:
>>>
>>>> Mike did a pretty good presentation on performance comparison between
>>>> Accumulo / HBase. Again not official IMO but is pretty detailed in the
>>>> approach taken and apples-to-apples comparison
>>>> http://www.slideshare.net/AccumuloSummit/10-30-drob
>>>>
>>>>
>>>> From: Jeremy Kepner <[email protected]>
>>>> To: <[email protected]>
>>>> Date: 06/23/2014 07:42 PM
>>>> Subject: Re: How does Accumulo compare to HBase
>>>> ------------------------------
>>>>
>>>>
>>>> Performance is probably the largest difference between Accumulo and
>>>> HBase.
>>>>
>>>> Accumulo can ingest/scan at a rate of 800K entries/sec/node.
>>>> This performance scales well into the hundreds of nodes to deliver
>>>> 100M+ entries/sec.
>>>>
>>>> There are no recent HBase benchmarks and none in the peer-reviewed
>>>> literature.
>>>> Old data suggests that HBase performance is ~1% of Accumulo
>>>> performance.
>>>>
>>>> In short, one can often replace a 20+ node database with
>>>> a single node Accumulo database.
>>>>
>>>> On Tue, Jun 24, 2014 at 01:55:54AM +0800, Jianshi Huang wrote:
>>>> > Er... basically I need to explain to my manager why I'm choosing
>>>> > Accumulo instead of HBase.
>>>> >
>>>> > So what are the pros and cons of Accumulo vs. HBase? (btw HBase 0.98
>>>> > also got cell-level security, modeled after Accumulo)
>>>> >
>>>> > --
>>>> > Jianshi Huang
>>>> >
>>>> > LinkedIn: jianshi
>>>> > Twitter: @jshuang
>>>> > Github & Blog: http://huangjs.github.com/
>>>>
>>>
>>
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>
>

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
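
For readers following the thread, here is one way the "own schema and index" idea for time-bound queries could look. It is only a sketch under assumptions: the thread does not show the actual schema, this is not the Accumulo API, and the class and method names (`TimeIndexedEdges`, `add_edge`, `edges_between`, `count_between`) are hypothetical. An in-memory sorted list stands in for a sorted key-value table. Keys are ordered by (account, timestamp, peer), so a query from time A to time B becomes one contiguous range scan, and per-bucket counters are updated at write time so aggregations do not need a scan at all.

```python
import bisect
from collections import defaultdict


class TimeIndexedEdges:
    """Toy in-memory stand-in for a sorted key-value table (hypothetical)."""

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        # Sorted list of (account, timestamp, peer) keys.
        self._keys = []
        # Denormalized aggregate: (account, bucket index) -> edge count.
        self._bucket_counts = defaultdict(int)

    def add_edge(self, account, timestamp, peer):
        # Sorted insert keeps one account's edges contiguous and time-ordered.
        bisect.insort(self._keys, (account, timestamp, peer))
        # Incremental aggregation: bump the bucket counter at write time.
        bucket = timestamp // self.bucket_seconds
        self._bucket_counts[(account, bucket)] += 1

    def edges_between(self, account, start, end):
        """Return (timestamp, peer) pairs with start <= timestamp < end."""
        # "" sorts before every peer name, so these bounds bracket the range.
        lo = bisect.bisect_left(self._keys, (account, start, ""))
        hi = bisect.bisect_left(self._keys, (account, end, ""))
        return [(ts, peer) for _, ts, peer in self._keys[lo:hi]]

    def count_between(self, account, start, end):
        """Count edges in [start, end) from the precomputed buckets only."""
        size = self.bucket_seconds
        if start % size or end % size:
            raise ValueError("start and end must be aligned to bucket_seconds")
        return sum(
            self._bucket_counts.get((account, bucket), 0)
            for bucket in range(start // size, end // size)
        )
```

The trade-off in this sketch is the usual one for denormalization: writes do a little more work (one extra counter update per edge) so that reads touch only a narrow key range or a handful of counters.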
