Re: Combining output of multiple filters/iterators
You cannot feasibly hold an intermediate batch of nodes in memory. Doing so invalidates the general premise of how Accumulo iterators are meant to work. Further, an iterator can _only_ safely operate within one row of a table, since two adjacent rows may be located on two different physical machines.

I would suggest you read through this presentation and take some time to understand why they did it this way: http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf

You might also be able to take something from Shana Hutchison's work on Graphulo: https://arxiv.org/abs/1606.07085

On 3/29/19 2:20 PM, Enas Alkawasmi wrote:
> Thank you for this suggestion. I have one question: can I pass options to
> the new source that come from the result of the current iterator? The new
> iterator needs to get the parent nodes from the current one. How can I
> force the iterator to wait for the result from its preceding iterator?
>
> --
> Sent from: http://apache-accumulo.1065345.n5.nabble.com/Developers-f3.html
Re: Combining output of multiple filters/iterators
Thank you for this suggestion. I have one question: can I pass options to the new source that come from the result of the current iterator? The new iterator needs to get the parent nodes from the current one. How can I force the iterator to wait for the result from its preceding iterator?
Re: Combining output of multiple filters/iterators
You may be able to use the deepCopy() method. It allows an iterator to create multiple copies of its source, and it can then seek each copy separately. Deep copies should be created in the init method. The following is an example of this:

https://github.com/apache/accumulo/blob/rel/1.9.2/core/src/main/java/org/apache/accumulo/core/iterators/user/IntersectingIterator.java#L502

On Thu, Mar 28, 2019 at 9:44 AM Enas Alkawasmi wrote:
> I have the following problem model:
> I have a graph stored in accumulo and I want to design iterators that
> retrieve all nodes siblings of a given node.
> [...]
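As a rough illustration of the deepCopy() suggestion above, here is a minimal sketch in plain Java. It does not use the Accumulo API: SketchSource is a hypothetical, simplified stand-in for a source iterator backed by an in-memory sorted map (the real deepCopy() takes an IteratorEnvironment argument), and the "child:"/"parent:" key layout is an assumption made only for this example. It also assumes all of the data is reachable from one source, which sidesteps the row-locality concern raised elsewhere in this thread. The point is only the pattern: create the copies in init, then seek each copy independently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a source iterator; NOT the Accumulo interface.
class SketchSource {
    private final NavigableMap<String, String> table; // shared, read-only data
    private Iterator<Map.Entry<String, String>> cursor;
    private Map.Entry<String, String> top;

    SketchSource(NavigableMap<String, String> table) {
        this.table = table;
    }

    // Each copy has its own cursor over the same underlying data.
    SketchSource deepCopy() {
        return new SketchSource(table);
    }

    // Position this copy at the first key >= startKey.
    void seek(String startKey) {
        cursor = table.tailMap(startKey, true).entrySet().iterator();
        next();
    }

    boolean hasTop() { return top != null; }
    String getTopKey() { return top.getKey(); }
    String getTopValue() { return top.getValue(); }

    void next() {
        top = (cursor != null && cursor.hasNext()) ? cursor.next() : null;
    }
}

class SiblingLookupSketch {
    private SketchSource parentLookup;
    private SketchSource childScan;

    // Copies are created once, at initialization time.
    void init(SketchSource source) {
        parentLookup = source.deepCopy();
        childScan = source.deepCopy();
    }

    // Assumed layout: "child:<node>" -> parent id, "parent:<parent>/<child>" -> "".
    List<String> siblings(String node) {
        List<String> result = new ArrayList<>();
        String childKey = "child:" + node;
        parentLookup.seek(childKey); // first copy: find the parent
        if (!parentLookup.hasTop() || !parentLookup.getTopKey().equals(childKey)) {
            return result; // unknown node
        }
        String prefix = "parent:" + parentLookup.getTopValue() + "/";
        childScan.seek(prefix); // second copy: scan that parent's children
        while (childScan.hasTop() && childScan.getTopKey().startsWith(prefix)) {
            String child = childScan.getTopKey().substring(prefix.length());
            if (!child.equals(node)) {
                result.add(child);
            }
            childScan.next();
        }
        return result;
    }

    // Small made-up graph: a, b, c share parent p1; d has parent p2.
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> t = new TreeMap<>();
        t.put("child:a", "p1");
        t.put("child:b", "p1");
        t.put("child:c", "p1");
        t.put("child:d", "p2");
        t.put("parent:p1/a", "");
        t.put("parent:p1/b", "");
        t.put("parent:p1/c", "");
        t.put("parent:p2/d", "");
        return t;
    }

    public static void main(String[] args) {
        SiblingLookupSketch lookup = new SiblingLookupSketch();
        lookup.init(new SketchSource(sampleTable()));
        System.out.println(lookup.siblings("b")); // prints [a, c]
    }
}
```

Because the two copies keep separate cursors, seeking the child scan does not disturb the position of the parent lookup, which is what lets one iterator answer a two-step question without a second iterator waiting on it.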
Re: Combining output of multiple filters/iterators
What do you mean by

Christopher Tubbs-2 wrote
> single iterator as a composition
> of other, smaller components.

What type of components are those? Can you provide a template code structure that I can follow in my code? Do you think MapReduce can fit my problem?
Re: Combining output of multiple filters/iterators
You could set up your iterators to communicate with each other when they are initialized, via the "source" parameter to the init() method. However, because your iterators seem to be so dependent upon one another, it might be better for you to implement this as a single iterator. But you can implement this single iterator as a composition of other, smaller components, if necessary. A single iterator would probably perform better anyway.

On Thu, Mar 28, 2019 at 9:44 AM Enas Alkawasmi wrote:
> I have the following problem model:
> I have a graph stored in accumulo and I want to design iterators that
> retrieve all nodes siblings of a given node.
> [...]
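One possible reading of "a single iterator as a composition of other, smaller components" is sketched below in plain Java. This is not the Accumulo Filter API: UnionFilterSketch, parentIs, and the sample data are hypothetical names invented for this illustration. Each small component is a predicate applied to the same original entries, and an entry is kept if any component accepts it, so the outputs of the components are united in a single pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical sketch: one pass over sorted entries, several small filter
// components, matches united. Not an Accumulo class.
class UnionFilterSketch {
    private final List<Predicate<Map.Entry<String, String>>> components;

    UnionFilterSketch(List<Predicate<Map.Entry<String, String>>> components) {
        this.components = components;
    }

    // An entry is accepted if at least one component accepts it.
    boolean accept(Map.Entry<String, String> entry) {
        for (Predicate<Map.Entry<String, String>> component : components) {
            if (component.test(entry)) {
                return true;
            }
        }
        return false;
    }

    // Every component sees the same original data; nothing is written back.
    List<String> scan(TreeMap<String, String> table) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, String> entry : table.entrySet()) {
            if (accept(entry)) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    // Small component: keep nodes whose value (assumed to be the parent id) matches.
    static Predicate<Map.Entry<String, String>> parentIs(String parent) {
        return entry -> entry.getValue().equals(parent);
    }

    // Made-up data: node id -> parent id.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> t = new TreeMap<>();
        t.put("a", "p1");
        t.put("b", "p1");
        t.put("c", "p2");
        t.put("d", "p3");
        return t;
    }

    public static void main(String[] args) {
        UnionFilterSketch filter =
            new UnionFilterSketch(Arrays.asList(parentIs("p1"), parentIs("p3")));
        System.out.println(filter.scan(sampleTable())); // prints [a, b, d]
    }
}
```

Keeping the components as ordinary objects inside one iterator means they can share state directly, rather than trying to pass options between separately configured iterators.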
Combining output of multiple filters/iterators
I have the following problem model: I have a graph stored in Accumulo, and I want to design iterators that retrieve all sibling nodes of a given node.

I am thinking of nested iterators (filters) that filter the graph table based on the node_ID, where each filter passes its resulting nodes to the iterator that comes after it as a condition. I need to accumulate all the extracted nodes and return them to the client; that is, I need all the processing to be done on the server side.

My question is: how can I make one filter get its options from the iterator executed before it? I also want each filter to be applied to the same original data set, with the outputs of the iterators united after they are all done. In other words, I need the iterators to call each other in a nested way and pass options from their initial output to the next iterator, but I need to keep the result of each iterator aside, without affecting the original table, and then combine the results at the end.

I read thoroughly through the MapReduce examples with no luck. I also read about iterators and filters and learned that I cannot directly control their execution, but I can use priorities to control them. I still have the challenge of interdependence between options. I need help coding this in Java, if someone can guide me to achieve what is mentioned here:

" So it means if I set an Iterator and creates a buffer in memory with in the iterator it will be created on each tablet server, right? This is the map like function but how can I return and combine all the buffers client side (the reduce)? Does Iterator has some functionality to make this process easy? Further it would be a great help if you can provide some sample code for the same or you have some similar implementation using iterators or MapReduce"

I quoted from: https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo