Thanks for your response.
You're definitely right about the separate config file for job-specific
parameters.
About the filter configuration, I was planning on extending the Scan
object so that it is able to parse a Configuration (i.e.
scan.setConf(conf)). Startrow, stoprow and filters can be parsed out of
the configuration - only when specified (otherwise default behaviour is
kept). As a side effect, generic frameworks such as the TableMapper
could use this method, so that every job extending it can benefit from
the ability to configure filters at runtime.
Now startrow and stoprow are not that difficult to configure (we could
use properties 'scan.startrow' and 'scan.stoprow' for example). The only
real difficulty is creating a nice way to configure a filter. Of course
this is because there are several implementations
(SingleColumnValueFilter, FilterList), each with its own specific options.
So I was just figuring out how to provide a clean way to implement
filter options in configuration.
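
To make the idea concrete, here is a rough sketch of how such properties
might be read. It is only an illustration under assumptions: it uses
java.util.Properties as a stand-in for the Hadoop Configuration, and the
ScanSettings class and the numbered 'scan.filter.N.*' keys are hypothetical
names, not part of HBase. Only 'scan.startrow' and 'scan.stoprow' come from
the proposal above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical holder for scan options read from configuration; not an HBase class. */
final class ScanSettings {
    final String startRow;  // null: property absent, keep the default behaviour
    final String stopRow;   // null: property absent, keep the default behaviour
    final List<Map<String, String>> filters;  // one option map per configured filter

    private ScanSettings(String startRow, String stopRow, List<Map<String, String>> filters) {
        this.startRow = startRow;
        this.stopRow = stopRow;
        this.filters = filters;
    }

    /** Reads only the properties that are present; everything else stays at its default. */
    static ScanSettings fromProperties(Properties props) {
        String start = props.getProperty("scan.startrow");
        String stop = props.getProperty("scan.stoprow");

        // Assumed key layout: scan.filter.0.type, scan.filter.0.<option>, scan.filter.1.type, ...
        List<Map<String, String>> filters = new ArrayList<>();
        for (int i = 0; props.getProperty("scan.filter." + i + ".type") != null; i++) {
            String prefix = "scan.filter." + i + ".";
            Map<String, String> options = new TreeMap<>();
            for (String key : props.stringPropertyNames()) {
                if (key.startsWith(prefix)) {
                    options.put(key.substring(prefix.length()), props.getProperty(key));
                }
            }
            filters.add(options);
        }
        return new ScanSettings(start, stop, Collections.unmodifiableList(filters));
    }
}
```

Each option map would still have to be turned into a concrete filter, which
is exactly the part where every implementation needs its own options.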
Michael Segel wrote:
Ferdy,
I don't think you understand.
What you're asking for doesn't make sense.
Your filters could be built dynamically, so you code it once and based on the
parameters passed in, you build a filter and apply it to the scan.
Whether you pass in the parameters in a configuration file or from a GUI
attached to the client code doesn't matter.
Just a clarification ... I've seen some developers do this and it's not a good
practice... You want to avoid putting job-specific parameters in a hadoop
config file.
Use the config file as a way to pass in cloud-specific parameters that you want
to override, and use a separate config file or command line options to pass in
the application-specific parameters. (I'm sure someone is going
to argue a counter point to this.)
But getting back to your point. You just need to write some dynamic code for
your filters and then you can pass in your column list to filter on as a
parameter.
HTH
-Mike
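
A rough sketch of the "code it once, build the filter from parameters" idea
described above. To stay self-contained it does not use the HBase classes:
a row is modelled as a plain column-to-value map and the filter as a
java.util.function.Predicate. The DynamicFilters class and the
"column=value,column=value" parameter format are hypothetical; in real code
each pair would presumably become a SingleColumnValueFilter inside a
FilterList.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helper: builds a row predicate from a column list passed in at run time. */
final class DynamicFilters {

    /** Parses e.g. "cf:a=1,cf:b=2" into a predicate that passes rows matching every pair. */
    static Predicate<Map<String, String>> mustMatchAll(String spec) {
        List<Predicate<Map<String, String>>> parts = new ArrayList<>();
        for (String item : spec.split(",")) {
            String trimmed = item.trim();
            if (trimmed.isEmpty()) {
                continue;  // tolerate an empty spec or trailing commas
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("expected column=value: " + trimmed);
            }
            String column = trimmed.substring(0, eq);
            String value = trimmed.substring(eq + 1);
            parts.add(row -> value.equals(row.get(column)));
        }
        // With no conditions every row passes, like a scan without filters.
        return row -> parts.stream().allMatch(p -> p.test(row));
    }
}
```

The spec string can come from a command line option, a separate config file
or a GUI; the code that builds the filter stays the same.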
Date: Tue, 29 Jun 2010 09:17:55 +0200
From: [email protected]
To: [email protected]
Subject: Re: how to specify filters in configuration
The point is that instead of coding a scan with its filters, we would
like a way to do this in configuration. Different jobs could be run more
ad hoc.
On 06/28/2010 05:55 PM, Michael Segel wrote:
I'm not sure I understand the question...
Configurations are meant for your application to have additional/changed cloud
configuration at run time.
Scan filters are specific to the job you're running.
As to making your scans more dynamic, you should be able to do this already
within your code.
Date: Mon, 28 Jun 2010 11:28:51 +0200
From: [email protected]
To: [email protected]
Subject: how to specify filters in configuration
Currently it's possible to specify filters on a Scan object. Is there a
way to specify them in configuration instead, so that they aren't hardcoded?
Generic HBase tools (extending TableMapper) could benefit a lot from such
a configuration. For example, we would like to import/export data with
user-specified filters, but we would not want to code a specific Tool every
time.
What are the current possibilities?