Re: Filtering on column qualifier

2013-08-22 Thread Marc Reichman
Extending looked like a bit of a boondoggle, because all of the useful
fields in the class are private, not protected. I also ran into another
architectural question: how does one pass a value (a la constructor) into
one of these classes? If I'm going to use this to filter based on a
threshold, I'd need to pass that threshold in somehow.









Re: Filtering on column qualifier

2013-08-22 Thread David Medinets
Have you thought of writing a filter class that takes a bit of Groovy to
execute inside the accept method? Whether that makes sense depends on how
efficient you need to be and how changeable your constraints are.








Re: Filtering on column qualifier

2013-08-22 Thread Marc Reichman
I haven't considered that. Would that allow me to specify it in the
client-side code and not worry about spreading JARs around? It is a very
basic need; in my scan loop right now I have:

    String matchScoreString = key.getColumnQualifier().toString();
    double score = Double.parseDouble(matchScoreString);

    if (threshold != null && threshold > score) {
        // TODO: figure out if this is possible to do via
        // data-local scan iterator
        continue;
    }

What is the pattern for including a Groovy snippet for a scan iterator?









Re: Filtering on column qualifier

2013-08-22 Thread David Medinets
The advantage is that you'd only write the iterator once and deploy it to
the cluster. After that, the Groovy snippet changes its behavior. You'd save
passing the data to your client code, but more work would be done by the
Accumulo cluster.










Re: Filtering on column qualifier

2013-08-22 Thread Marc Reichman
I apologize for my denseness, but could you walk me through this? Is there
some form of existing scan iterator which interprets Groovy? Or is this
something I would build?











Re: Filtering on column qualifier

2013-08-22 Thread John Stoneham
We've done similar with Clojure as a lark, passing in custom map or filter
functions. But we've never deployed it because of the security risk of
running arbitrary user code on tservers, unsandboxed.











-- 
John Stoneham
ly...@lyrically.net


Re: Filtering on column qualifier

2013-08-22 Thread John Vines
You don't need to write a special language to make it configurable. You can
pass configuration to iterators (and Filters) via an IteratorSetting; its
options are then handed to the iterator in the options map passed to init.
You may need to override init to get that information out, though.
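
A rough sketch of that idea, kept free of Accumulo dependencies so it runs on
its own. The class name and the "threshold" option key are made up for
illustration. In a real iterator you would extend
org.apache.accumulo.core.iterators.Filter, override
init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
IteratorEnvironment env) (calling super.init first) and accept(Key k, Value v);
the two methods below only stand in for those. Dropping entries whose qualifier
does not parse as a number is an assumption, not something decided in this
thread.

```java
import java.util.HashMap;
import java.util.Map;

public class ThresholdFilterSketch {
    // Hypothetical option name; client and iterator only need to agree on it.
    static final String THRESHOLD_OPT = "threshold";

    // With no option set, every numeric qualifier passes.
    private double threshold = Double.NEGATIVE_INFINITY;

    // Stands in for Filter.init(source, options, env): read settings from the options map.
    void init(Map<String, String> options) {
        String raw = options.get(THRESHOLD_OPT);
        if (raw != null) {
            threshold = Double.parseDouble(raw);
        }
    }

    // Stands in for Filter.accept(Key, Value); the argument plays the role of
    // key.getColumnQualifier().toString().
    boolean accept(String columnQualifier) {
        try {
            return Double.parseDouble(columnQualifier) >= threshold;
        } catch (NumberFormatException e) {
            return false; // assumption: drop entries whose qualifier is not a number
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        // On a real client this value would be set with IteratorSetting.addOption(...).
        options.put(THRESHOLD_OPT, "0.8");

        ThresholdFilterSketch filter = new ThresholdFilterSketch();
        filter.init(options);

        System.out.println(filter.accept("0.95")); // kept: at or above the threshold
        System.out.println(filter.accept("0.42")); // dropped: below the threshold
    }
}
```

On the client side the threshold would travel as a string option on the
IteratorSetting that is attached to the scanner with addScanIterator, so no
constructor argument is needed.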





Filtering on column qualifier

2013-08-21 Thread Marc Reichman
I have some data stored in Accumulo with some scores stored as column
qualifiers (there was an older thread about this). I would like to find a
way to do thresholding when retrieving the data without retrieving it all
and then manually filtering out items below my threshold.

I know I can fetch column qualifiers which are exact.

I've seen the ColumnQualifierFilter, which I assume is what's in play when
I fetch qualifiers. Is there a reasonable pattern to extend this and try to
use it as a scan iterator so I can do things like greater than a value
which will be interpreted as a Double vs. the string equality going on now?

Thanks,
Marc


Re: Filtering on column qualifier

2013-08-21 Thread John Vines
There's no way to extend the ColumnQualifierFilter via configuration, but it
sounds like you are on top of it. You just need to extend the class,
possibly copy a bit of code, and change the equality check to a compareTo
after converting the Strings to Doubles.
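
To show why the conversion matters, here is a minimal standalone sketch (the
class and method names are invented for illustration and are not part of
Accumulo). Compared as strings, qualifiers are ordered lexicographically, so
"10.5" sorts before "9.2"; parsing to doubles first gives the numeric ordering
a threshold check needs.

```java
public class QualifierCompare {
    // Hypothetical helper: true when the qualifier, read as a number,
    // is at or above the threshold.
    static boolean meetsThreshold(String qualifier, double threshold) {
        return Double.compare(Double.parseDouble(qualifier), threshold) >= 0;
    }

    public static void main(String[] args) {
        // String comparison: '1' sorts before '9', so "10.5" comes before "9.2".
        System.out.println("10.5".compareTo("9.2") < 0); // prints true

        // Numeric comparison after parsing the strings.
        System.out.println(meetsThreshold("10.5", 9.2)); // prints true
        System.out.println(meetsThreshold("0.75", 0.8)); // prints false
    }
}
```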





RE: Filtering on column qualifier

2013-08-21 Thread Slater, David M.
Would this require extending the BatchScanner as well to make use of the 
extended ColumnQualifierFilter, or could this be done with an unmodified 
BatchScanner?




Re: Filtering on column qualifier

2013-08-21 Thread John Vines
Nope, it's just a custom Filter, which is a type of iterator. You can
attach iterators to run on a scanner; you just need to make sure you deploy
the jar with the custom iterator to all of the tservers.

