Definitely interesting possibilities, thank you! Ultimately, I think coin() doesn't decrease our traversal time; it just gets us a more random sample. If we're looking to get 10k results with a 50% coin() bias, we'd end up traversing ~20k edges. Which of those edges we keep would be random, but the runtime shouldn't be much better.
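To make the cost argument concrete, here is a minimal Java sketch of a coin-style Bernoulli filter. It is not TinkerPop's coin() implementation: it assumes edges can be modeled as a plain index range, and the class and method names are hypothetical. Every edge examined costs a visit whether or not the toss keeps it, so stopping at 10k kept edges with p = 0.5 takes roughly 20k visits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a coin(p)-style filter over an edge stream.
// Edges are modeled as the indices 0..edgeCount-1; this is NOT TinkerPop code.
public class CoinFilterSketch {

    /** Outcome of one pass: how many edges were visited and which were kept. */
    static final class Pass {
        final int visited;
        final List<Integer> kept;

        Pass(int visited, List<Integer> kept) {
            this.visited = visited;
            this.kept = kept;
        }
    }

    /** Keeps each edge with probability p until `limit` edges are kept or the edges run out. */
    static Pass coinFilter(int edgeCount, double p, int limit, Random rng) {
        List<Integer> kept = new ArrayList<>();
        int visited = 0;
        for (int edge = 0; edge < edgeCount && kept.size() < limit; edge++) {
            visited++;                  // every edge examined costs a visit
            if (rng.nextDouble() < p) { // the biased "coin toss"
                kept.add(edge);
            }
        }
        return new Pass(visited, kept);
    }

    public static void main(String[] args) {
        Pass pass = coinFilter(100_000, 0.5, 10_000, new Random(42));
        // With p = 0.5, collecting 10k edges takes roughly 20k visits.
        System.out.println("visited=" + pass.visited + ", kept=" + pass.kept.size());
    }
}
```

The number of visits is roughly limit / p: a higher p means fewer visits per kept edge but a less random sample, so the filter trades randomness against traversal cost rather than removing it.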
I'm not sure about shuffle. It may do it, but I believe the end result of shuffling your results is that the sample you've already taken gets shuffled, not that the set of documents you go through to get the result gets shuffled. By that I mean: given the set [0,1,2,3,4,5,6,7,8,9], selecting 4 records gives you [0,1,2,3]. I think with shuffle you'd get something like [2,3,0,1], not [6,2,8,3].

I really like the last possibility: we get the edge count, select either a set percentage or a set number of random ints from the range [0, size()), then use those to get what we need. I'm not 100% sure this is easy to implement through the Java API, but we'll see! Still, a SQL query with a randomized selection would be a great thing to have.

Ultimately what we need is a weighted random: all our edges have weights, and I need to traverse the edges in a weighted random fashion. If we're able to implement this in a server-side function, we'd be in a good spot for our query run time.

On Monday, September 7, 2015 at 2:26:02 AM UTC-7, MV-dev1 wrote:
>
> OK - first, take this information with a grain of salt because I'm new to OrientDB and haven't actually rolled out a successful release but.....
>
> *Thought #1: Get 'Groovy' with it....*
>
> I've been reading about all the ways you can write server functions (stored procedures), and one is Groovy, which seems to relate to or also be called 'Gremlin' or 'TinkerPop' or 'Blueprints'.
>
> https://github.com/tinkerpop/blueprints/wiki/OrientDB-Implementation - "Blueprints is the default Java API for OrientDB, so you don’t need to include additional modules. For more information look at OrientDB Blueprints API."
>
> I ran across this 'Coin Step' yesterday when scanning the TinkerPop3 documentation.
>
> From http://tinkerpop.incubator.apache.org/docs/3.0.0-incubating/#coin-step:
>
> Coin Step
>
> To randomly filter out a traverser, use the coin()-step (*filter*).
> The provided double argument biases the "coin toss."
>
> e.g. gremlin> g.V().coin(0.5)
>
> Order Step
>
> When the objects of the traversal stream need to be sorted, order()-step (*map*) can be leveraged.
>
> I noticed 'shuffle': "Randomizing the order of the traversers at a particular point in the traversal is possible with Order.shuffle."
>
> e.g. gremlin> g.V().hasLabel('person').order().by(shuffle)
>
> Groovy statements do work as server functions, so maybe this is something that you could use?
>
> *Thought #2: 'JavaScript'*
>
> You can always write a stored procedure that generates a random number from 0 to size() and steps ahead that number of steps.
>
> *For anyone that knows, please correct me.*

--
You received this message because you are subscribed to the Google Groups "OrientDB" group.
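The shuffle concern in the reply above is about where the randomization happens relative to the limit. This small Java sketch uses plain lists standing in for documents (the names are hypothetical, and this is not the Gremlin or OrientDB API) to contrast shuffling after taking the first k records with shuffling the whole source before taking k:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch: the position of the shuffle relative to the limit
// decides whether you get a random sample or just a reordered prefix.
public class ShuffleOrderSketch {

    /** Builds the document ids 0..n-1. */
    static List<Integer> source(int n) {
        return IntStream.range(0, n).boxed().collect(Collectors.toList());
    }

    /** Takes the first k, then shuffles: a reordering of [0..k-1], not a random sample. */
    static List<Integer> limitThenShuffle(List<Integer> docs, int k, Random rng) {
        List<Integer> sample = new ArrayList<>(docs.subList(0, k));
        Collections.shuffle(sample, rng);
        return sample;
    }

    /** Shuffles the whole source, then takes the first k: a uniform random sample. */
    static List<Integer> shuffleThenLimit(List<Integer> docs, int k, Random rng) {
        List<Integer> copy = new ArrayList<>(docs); // every document is touched here
        Collections.shuffle(copy, rng);
        return new ArrayList<>(copy.subList(0, k));
    }

    public static void main(String[] args) {
        List<Integer> docs = source(10);
        Random rng = new Random(7);
        System.out.println("limit then shuffle: " + limitThenShuffle(docs, 4, rng));
        System.out.println("shuffle then limit: " + shuffleThenLimit(docs, 4, rng));
    }
}
```

Only the second version yields something like [6,2,8,3], and it has to see every element of the source before picking, so on its own it doesn't reduce traversal cost either.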
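The "last possibility" in the reply (and Thought #2 in the quoted message) amounts to picking random offsets in [0, size()) and reading only those positions. This is a minimal Java sketch assuming the edges are reachable only through an Iterator. It uses Floyd's algorithm to draw k distinct offsets without replacement. The class and method names are illustrative, not part of OrientDB's Java API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: choose k random offsets in [0, n), then walk an
// iterator once and keep only the elements at those offsets.
public class RandomOffsetSketch {

    /** Floyd's algorithm: k distinct ints drawn uniformly from [0, n). */
    static Set<Integer> randomOffsets(int n, int k, Random rng) {
        if (k < 0 || k > n) throw new IllegalArgumentException("need 0 <= k <= n");
        Set<Integer> chosen = new HashSet<>();
        for (int j = n - k; j < n; j++) {
            int t = rng.nextInt(j + 1); // uniform in [0, j]
            if (!chosen.add(t)) {       // t already taken: take j instead (never taken yet)
                chosen.add(j);
            }
        }
        return chosen;
    }

    /** Walks the iterator in order, stopping once every chosen offset has been read. */
    static <T> List<T> pickAtOffsets(Iterator<T> it, Set<Integer> offsets) {
        TreeSet<Integer> pending = new TreeSet<>(offsets);
        List<T> picked = new ArrayList<>();
        int position = 0;
        while (it.hasNext() && !pending.isEmpty()) {
            T item = it.next();
            if (position == pending.first()) {
                picked.add(item);
                pending.pollFirst();
            }
            position++;
        }
        return picked;
    }

    public static void main(String[] args) {
        List<String> edges = new ArrayList<>();
        for (int i = 0; i < 10; i++) edges.add("edge" + i);
        Set<Integer> offsets = randomOffsets(edges.size(), 4, new Random(3));
        System.out.println(pickAtOffsets(edges.iterator(), offsets));
    }
}
```

Because the loop stops after the largest chosen offset, the walk costs at most that offset plus one step rather than size(). With a random-access source you could look the offsets up directly and skip the walk entirely.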
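For the weighted random traversal itself, one common technique is to build a cumulative-weight array over a vertex's outgoing edges and binary-search a uniform draw in it, which selects edge i with probability weights[i] / sum(weights). The Java sketch below assumes an in-memory adjacency map with a hypothetical Edge type standing in for OrientDB edges and their weight property. It is not an OrientDB server-side function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: weighted random choice of the next edge, plus a short
// random walk over an in-memory adjacency map standing in for the graph.
public class WeightedEdgePickSketch {

    /** Hypothetical edge: target vertex id and a non-negative weight. */
    static final class Edge {
        final String to;
        final double weight;

        Edge(String to, double weight) {
            this.to = to;
            this.weight = weight;
        }
    }

    private final double[] cumulative; // cumulative[i] = weights[0] + ... + weights[i]

    WeightedEdgePickSketch(double[] weights) {
        if (weights.length == 0) throw new IllegalArgumentException("no edges");
        cumulative = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (!(weights[i] >= 0.0)) throw new IllegalArgumentException("bad weight at " + i); // rejects NaN too
            total += weights[i];
            cumulative[i] = total;
        }
        if (total <= 0.0) throw new IllegalArgumentException("weights sum to zero");
    }

    /** Returns index i with probability weights[i] / sum(weights). */
    int pick(Random rng) {
        double total = cumulative[cumulative.length - 1];
        // Keep r strictly below total even if the multiplication rounds up.
        double r = Math.min(rng.nextDouble() * total, Math.nextDown(total));
        // Binary search for the first index whose cumulative weight exceeds r;
        // zero-weight edges add no width to the range and are never chosen.
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > r) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    /** Walks up to `steps` edges from `start`, choosing each edge by weight. */
    static List<String> walk(Map<String, List<Edge>> graph, String start, int steps, Random rng) {
        List<String> path = new ArrayList<>();
        path.add(start);
        String current = start;
        for (int s = 0; s < steps; s++) {
            List<Edge> out = graph.getOrDefault(current, List.of());
            if (out.isEmpty()) break; // dead end: stop the walk early
            double[] w = new double[out.size()];
            for (int i = 0; i < w.length; i++) w[i] = out.get(i).weight;
            current = out.get(new WeightedEdgePickSketch(w).pick(rng)).to;
            path.add(current);
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, List<Edge>> graph = Map.of(
            "a", List.of(new Edge("b", 3.0), new Edge("c", 1.0)),
            "b", List.of(new Edge("a", 1.0)),
            "c", List.of(new Edge("a", 1.0)));
        System.out.println(walk(graph, "a", 6, new Random(5)));
    }
}
```

Building the cumulative array costs O(degree) per step, and each pick is O(log degree). If the same vertices are visited repeatedly, caching one picker per vertex avoids the rebuild.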
