Re: Filtering on column qualifier
Extending looked like a bit of a boondoggle, because all of the useful fields in the class are private, not protected. I also ran into another architectural question: how does one pass a value (à la constructor) into one of these classes? If I'm going to use this to filter based on a threshold, I'd need to pass that threshold in somehow.

On Wed, Aug 21, 2013 at 9:49 AM, John Vines vi...@apache.org wrote:
> There's no way to extend the ColumnQualifierFilter via configuration, but it sounds like you are on top of it. You just need to extend the class, possibly copy a bit of code, and change the equality check to a compareTo after converting the Strings to Doubles.
Re: Filtering on column qualifier
Have you thought of writing a filter class that takes a bit of Groovy and executes it inside the accept method? Whether that fits depends on how efficient you need to be and how changeable your constraints are.

On Thu, Aug 22, 2013 at 10:19 AM, Marc Reichman mreich...@pixelforensics.com wrote:
> Extending looked like a bit of a boondoggle, because all of the useful fields in the class are private, not protected. I also ran into another architectural question: how does one pass a value (à la constructor) into one of these classes? If I'm going to use this to filter based on a threshold, I'd need to pass that threshold in somehow.
Re: Filtering on column qualifier
I haven't considered that. Would that allow me to specify it in the client-side code and not worry about spreading JARs around? It is a very basic need. In my scan iterator loop right now I have:

    String matchScoreString = key.getColumnQualifier().toString();
    Double score = Double.parseDouble(matchScoreString);
    if (threshold != null && threshold > score) {
        // TODO: figure out if this is possible to do via data-local scan iterator
        continue;
    }

What is the pattern for including a Groovy snippet for a scan iterator?

On Thu, Aug 22, 2013 at 11:16 AM, David Medinets david.medin...@gmail.com wrote:
> Have you thought of writing a filter class that takes a bit of Groovy and executes it inside the accept method? Whether that fits depends on how efficient you need to be and how changeable your constraints are.
Re: Filtering on column qualifier
The advantage is that you'd only write the iterator once and deploy it to the cluster. After that, the Groovy snippet changes its behavior. You'd save passing the data to your client code, but more work would be done by the Accumulo cluster.

On Thu, Aug 22, 2013 at 12:33 PM, Marc Reichman mreich...@pixelforensics.com wrote:
> I haven't considered that. Would that allow me to specify it in the client-side code and not worry about spreading JARs around? It is a very basic need. In my scan iterator loop right now I have:
>
>     String matchScoreString = key.getColumnQualifier().toString();
>     Double score = Double.parseDouble(matchScoreString);
>     if (threshold != null && threshold > score) {
>         // TODO: figure out if this is possible to do via data-local scan iterator
>         continue;
>     }
>
> What is the pattern for including a Groovy snippet for a scan iterator?
Re: Filtering on column qualifier
I apologize for my denseness, but could you walk me through this? Is there some form of existing scan iterator which interprets Groovy? Or is this something I would build?

On Thu, Aug 22, 2013 at 12:10 PM, David Medinets david.medin...@gmail.com wrote:
> The advantage is that you'd only write the iterator once and deploy it to the cluster. After that, the Groovy snippet changes its behavior. You'd save passing the data to your client code, but more work would be done by the Accumulo cluster.
Re: Filtering on column qualifier
We've done something similar with Clojure as a lark, passing in custom map or filter functions. But we've never deployed it because of the security risk of running arbitrary user code on tservers, unsandboxed.

On Thu, Aug 22, 2013 at 1:10 PM, David Medinets david.medin...@gmail.com wrote:
> The advantage is that you'd only write the iterator once and deploy it to the cluster. After that, the Groovy snippet changes its behavior. You'd save passing the data to your client code, but more work would be done by the Accumulo cluster.

--
John Stoneham
ly...@lyrically.net
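One way to keep the filter configurable without running arbitrary user code on the tservers is to accept only a small, fixed comparison grammar. The following is a minimal sketch under that assumption, not anything proposed in the thread: the class name `ComparisonSpec` and the spec format (an operator followed by a number, such as `>= 0.75`) are hypothetical, and the code uses only the Java standard library rather than any Accumulo API.

```java
/** Hypothetical restricted comparison: an operator plus a numeric operand. */
class ComparisonSpec {
    enum Op { GT, GE, LT, LE }

    private final Op op;
    private final double operand;

    private ComparisonSpec(Op op, double operand) {
        this.op = op;
        this.operand = operand;
    }

    /** Parses specs such as ">= 0.75"; anything else is rejected. */
    static ComparisonSpec parse(String spec) {
        String s = spec.trim();
        Op op;
        int opLength;
        // Check two-character operators first so ">=" is not read as ">".
        if (s.startsWith(">=")) { op = Op.GE; opLength = 2; }
        else if (s.startsWith("<=")) { op = Op.LE; opLength = 2; }
        else if (s.startsWith(">")) { op = Op.GT; opLength = 1; }
        else if (s.startsWith("<")) { op = Op.LT; opLength = 1; }
        else throw new IllegalArgumentException("unsupported comparison: " + spec);
        // Double.parseDouble throws NumberFormatException on a bad operand.
        return new ComparisonSpec(op, Double.parseDouble(s.substring(opLength).trim()));
    }

    /** Applies the parsed comparison to a score. */
    boolean test(double score) {
        switch (op) {
            case GT: return score > operand;
            case GE: return score >= operand;
            case LT: return score < operand;
            case LE: return score <= operand;
            default: throw new AssertionError("unknown operator " + op);
        }
    }
}
```

The spec string would travel as an ordinary iterator option, so changing the constraint needs no new JAR, and the server only ever evaluates one of four numeric comparisons.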
Re: Filtering on column qualifier
You don't need to write a special language to make it configurable. You can pass configuration to iterators (and Filters) via an IteratorSetting; its options are then handed to the iterator in init. You may need to override init to get that information out, though.

On Thu, Aug 22, 2013 at 3:09 PM, John Stoneham ly...@lyrically.net wrote:
> We've done something similar with Clojure as a lark, passing in custom map or filter functions. But we've never deployed it because of the security risk of running arbitrary user code on tservers, unsandboxed.
>
> --
> John Stoneham
> ly...@lyrically.net
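The option-passing idea can be sketched without any Accumulo dependency. Iterator options are plain string key/value pairs, so the threshold travels as a string and is parsed once when the filter is configured. In the sketch below the class name `ThresholdOptions`, the option key `threshold`, and the method `configure` are all hypothetical stand-ins; as far as I know the options map reaches a real Accumulo iterator through its init method, but treat that detail as an assumption to check against your version.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Filter that reads its threshold from options. */
class ThresholdOptions {
    // Hypothetical option key; the client and the filter must agree on it.
    static final String THRESHOLD_OPT = "threshold";

    // null means no threshold was configured, so everything passes.
    private Double threshold;

    /** Plays the role of the iterator's setup step: parse options once. */
    void configure(Map<String, String> options) {
        String raw = options.get(THRESHOLD_OPT);
        // Double.valueOf throws NumberFormatException on a malformed value,
        // which surfaces a bad configuration immediately.
        threshold = (raw == null) ? null : Double.valueOf(raw);
    }

    /** Plays the role of accept(): keep entries at or above the threshold. */
    boolean accept(String qualifier) {
        if (threshold == null) {
            return true;
        }
        try {
            return Double.parseDouble(qualifier) >= threshold;
        } catch (NumberFormatException e) {
            return false; // drop qualifiers that are not numeric scores
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(THRESHOLD_OPT, "0.75");

        ThresholdOptions filter = new ThresholdOptions();
        filter.configure(options);
        System.out.println(filter.accept("0.9")); // true
        System.out.println(filter.accept("0.5")); // false
    }
}
```

Parsing in the setup step rather than in accept keeps the per-entry work down to one parse of the qualifier and one comparison.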
Filtering on column qualifier
I have some data stored in Accumulo with some scores stored as column qualifiers (there was an older thread about this). I would like to find a way to do thresholding when retrieving the data without retrieving it all and then manually filtering out items below my threshold.

I know I can fetch column qualifiers which are exact.

I've seen the ColumnQualifierFilter, which I assume is what's in play when I fetch qualifiers. Is there a reasonable pattern to extend this and try to use it as a scan iterator so I can do things like greater than a value which will be interpreted as a Double vs. the string equality going on now?

Thanks,
Marc
Re: Filtering on column qualifier
There's no way to extend the ColumnQualifierFilter via configuration, but it sounds like you are on top of it. You just need to extend the class, possibly copy a bit of code, and change the equality check to a compareTo after converting the Strings to Doubles.

On Wed, Aug 21, 2013 at 10:00 AM, Marc Reichman mreich...@pixelforensics.com wrote:
> I have some data stored in Accumulo with some scores stored as column qualifiers (there was an older thread about this). I would like to find a way to do thresholding when retrieving the data without retrieving it all and then manually filtering out items below my threshold.
>
> I know I can fetch column qualifiers which are exact.
>
> I've seen the ColumnQualifierFilter, which I assume is what's in play when I fetch qualifiers. Is there a reasonable pattern to extend this and try to use it as a scan iterator so I can do things like greater than a value which will be interpreted as a Double vs. the string equality going on now?
>
> Thanks,
> Marc
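The change from string equality to a numeric comparison can be sketched in plain Java. This is a minimal illustration, assuming scores are stored as decimal strings and that entries at or above the threshold should be kept; the class name `QualifierThreshold` and its `accept` method are hypothetical, and in a real custom Filter the same check would sit inside the accept method, applied to the key's column qualifier.

```java
/** Hypothetical helper showing the numeric check a custom Filter would make. */
class QualifierThreshold {

    /** Keeps a qualifier only if it parses as a double at or above threshold. */
    static boolean accept(String qualifier, double threshold) {
        try {
            double score = Double.parseDouble(qualifier);
            // Primitive comparison: a NaN score compares false and is dropped.
            return score >= threshold;
        } catch (NumberFormatException e) {
            return false; // not a numeric score, so filter it out
        }
    }

    public static void main(String[] args) {
        // String ordering is lexicographic, so "10.0" sorts before "9.5".
        System.out.println("10.0".compareTo("9.5") < 0); // true
        // Numeric comparison gives the intended result.
        System.out.println(accept("10.0", 9.5)); // true
        System.out.println(accept("0.42", 9.5)); // false
        System.out.println(accept("n/a", 9.5));  // false
    }
}
```

The first line of output shows why the strings have to be converted before comparing: lexicographic order disagrees with numeric order as soon as the scores differ in digit count.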
RE: Filtering on column qualifier
Would this require extending the BatchScanner as well to make use of the extended ColumnQualifierFilter, or could this be done with an unmodified BatchScanner?

From: John Vines [mailto:vi...@apache.org]
Sent: Wednesday, August 21, 2013 10:49 AM
To: user@accumulo.apache.org
Subject: Re: Filtering on column qualifier

There's no way to extend the ColumnQualifierFilter via configuration, but it sounds like you are on top of it. You just need to extend the class, possibly copy a bit of code, and change the equality check to a compareTo after converting the Strings to Doubles.
Re: Filtering on column qualifier
Nope, it's just a custom Filter, which is a type of iterator. You can attach iterators to run on a scanner; you just need to make sure you deploy the jar with the custom iterator to all of the tservers.

On Wed, Aug 21, 2013 at 7:58 PM, Slater, David M. david.sla...@jhuapl.edu wrote:
> Would this require extending the BatchScanner as well to make use of the extended ColumnQualifierFilter, or could this be done with an unmodified BatchScanner?