Ferdy,

I think you still misunderstand...

The basic scanner API can already do this for you.

It's a question of how the class that builds your scanner and filter 
objects is written.

It doesn't make sense to put this data into the configuration object, unless 
you're specifically talking about your input going into a map/reduce job where 
the scan runs within each mapper instance.

Then you'll have a different issue.
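
As a rough sketch of what I mean by building things dynamically (plain Java, 
no HBase dependency; the ScanParams, ScanSpec and ColumnCondition classes and 
the 'scan.filter.columns' property are made-up names for illustration, while 
'scan.startrow' and 'scan.stoprow' are the property names you suggested):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Hypothetical stand-in for the class that builds the scan and its filters. */
final class ScanParams {

    /** One "family:qualifier=value" equality condition (illustrative only). */
    static final class ColumnCondition {
        final String family;
        final String qualifier;
        final String value;

        ColumnCondition(String family, String qualifier, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Plain description of a scan; a null row means "keep the default". */
    static final class ScanSpec {
        final String startRow;
        final String stopRow;
        final List<ColumnCondition> conditions;

        ScanSpec(String startRow, String stopRow, List<ColumnCondition> conditions) {
            this.startRow = startRow;
            this.stopRow = stopRow;
            this.conditions = Collections.unmodifiableList(conditions);
        }
    }

    /** Reads the parameters; where they come from (file, CLI, GUI) does not matter. */
    static ScanSpec fromProperties(Properties props) {
        String startRow = props.getProperty("scan.startrow");
        String stopRow = props.getProperty("scan.stoprow");

        List<ColumnCondition> conditions = new ArrayList<>();
        // Assumed format: comma-separated "family:qualifier=value" entries.
        String raw = props.getProperty("scan.filter.columns", "");
        for (String entry : raw.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // property absent or empty entry: no filter added
            }
            int colon = trimmed.indexOf(':');
            int equals = trimmed.indexOf('=', colon + 1);
            if (colon <= 0 || equals <= colon + 1) {
                throw new IllegalArgumentException(
                        "Expected family:qualifier=value but got: " + trimmed);
            }
            conditions.add(new ColumnCondition(
                    trimmed.substring(0, colon),
                    trimmed.substring(colon + 1, equals),
                    trimmed.substring(equals + 1)));
        }
        return new ScanSpec(startRow, stopRow, conditions);
    }
}
```

In a real job you would presumably turn each condition into a 
SingleColumnValueFilter, collect them in a FilterList, and set that on the 
Scan along with the start and stop rows. The parsing above is only one 
possible layout for the parameters, not an existing HBase convention.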

> Date: Wed, 30 Jun 2010 14:53:03 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: how to specify filters in configuration
> 
> Thanks for your response.
> 
> You're definitely right about the separate config file for job specific 
> parameters.
> 
> About the filter configuration, I was planning on extending the Scan 
> object so that it is able to parse a Configuration (i.e. 
> scan.setConf(conf)).  Startrow, stoprow and filters can be parsed out of 
> the configuration - only when specified (otherwise default behaviour is 
> kept). As a side effect, generic frameworks such as the TableMapper 
> could use this method, so that every job extending it can benefit from 
> the ability to configure filters at runtime.
> 
> Now startrow and stoprow are not that difficult to configure (we could 
> use properties 'scan.startrow' and 'scan.stoprow' for example). The only 
> real difficulty is creating a nice way to configure a filter. Of course 
> this is because there are several implementations 
> (SingleColumnValueFilter, FilterList), each with its own specific options.
> 
> So I was just figuring out how to provide a clean way to implement 
> filter options in configuration.
> 
> Michael Segel wrote:
> > Ferdy,
> >
> > I don't think you understand.
> >
> > What you're asking for doesn't make sense.
> >
> > Your filters could be built dynamically, so you code it once and based on 
> > the parameters passed in, you build a filter and apply it to the scan.
> > Whether you pass in the parameters in a configuration file or from a GUI 
> > attached to the client code doesn't matter.
> >
> > Just a clarification ... I've seen some developers do this and it's not a 
> > good practice... You want to avoid putting job specific parameters in a 
> > hadoop config file.
> > Use the config file as a way to pass in cloud specific parameters that you 
> > want to override and use a separate config file to pass in the application 
> > specific command line options or use command line options.  (I'm sure 
> > someone is going to argue a counter point to this.)
> >
> > But getting back to your point. You just need to write some dynamic code 
> > for your filters and then you can pass in your column list to filter on as 
> > a parameter.
> >
> > HTH
> >
> > -Mike
> >
> >> Date: Tue, 29 Jun 2010 09:17:55 +0200
> >> From: [email protected]
> >> To: [email protected]
> >> Subject: Re: how to specify filters in configuration
> >>
> >> The point is that instead of coding a scan with its filters, we would 
> >> like a way to do this in configuration. Different jobs could be run more 
> >> ad hoc.
> >>
> >> On 06/28/2010 05:55 PM, Michael Segel wrote:
> >>> I'm not sure I understand the question...
> >>>
> >>> Configurations are meant for your application to have additional/changed 
> >>> cloud configuration at run time.
> >>> Scan filters are specific to the job you're running.
> >>>
> >>> As to making your scans more dynamic, you should be able to do this 
> >>> already within your code.
> >>>
> >>>> Date: Mon, 28 Jun 2010 11:28:51 +0200
> >>>> From: [email protected]
> >>>> To: [email protected]
> >>>> Subject: how to specify filters in configuration
> >>>>
> >>>> Currently it's possible to specify filters on a Scan object. Is there a
> >>>> way to specify them in configuration instead? So that they aren't 
> >>>> hardcoded?
> >>>>
> >>>> Generic HBase tools (extending TableMapper) could benefit a lot from such
> >>>> a configuration. For example, we would like to import/export data with
> >>>> user-specified filters, but we would not want to code a specific Tool every
> >>>> time.
> >>>>
> >>>> What are the current possibilities?
> >>>>