EvalFunc has no method that gets called on the backend before the first exec call. You can keep a flag that tracks whether you have done the initialization, so that you only do it on the first call to exec.
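A minimal sketch of the flag approach, not a definitive implementation. To keep it self-contained it does not depend on Pig: LazyInitSketch, its index map, and the "orders.res" entry are hypothetical stand-ins. In a real UDF the class would extend org.apache.pig.EvalFunc<String>, and exec would take a Tuple and may throw IOException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Pig UDF. A real one would extend
// org.apache.pig.EvalFunc<String> and implement exec(Tuple).
class LazyInitSketch {
    // Flag recording whether the expensive setup has already run.
    private boolean initialized = false;

    // Placeholder for the expensive state (parser, file index, ...).
    private Map<String, String> index;

    // Counts how often init() ran; only here to make the effect visible.
    private int initCount = 0;

    // Expensive one-time setup; here it just fills a tiny placeholder index.
    private void init() {
        index = new HashMap<>();
        index.put("orders", "orders.res"); // assumed resource entry
        initCount++;
        initialized = true;
    }

    // Called once per input record; setup happens only on the first call.
    public String exec(String query) {
        if (!initialized) {
            init();
        }
        if (query == null) {
            return null;
        }
        // Placeholder for the real query transformation.
        return query.trim() + " /* " + index.size() + " resource(s) indexed */";
    }

    public int initCount() {
        return initCount;
    }
}
```

The flag is per instance, so each instance of the function still initializes once. If several instances run in the same JVM, a static field with synchronized initialization would let them share the state, but whether that happens depends on how the job is run.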

Alan.

On Mar 2, 2011, at 5:29 AM, Lai Will wrote:

Hello,

I wrote an EvalFunc implementation that


1) Parses a SQL Query

2) Scans a folder for resource files and creates an index on these files

3) Depending on certain properties of the SQL Query, accesses the corresponding file and creates a Java object holding the relevant information from the file (for reuse).

4) Does some computation with the SQL Query and the information found in the file

5) Outputs a transformed SQL Query

Currently I'm doing local tests without Hadoop and the code works fine.

The problem I see is that right now I initialize my parser in the EvalFunc, so every time it gets instantiated a new instance of the parser is created. Ideally only one instance per machine would be created. Even worse, right now I create the index and parse the corresponding resource file once per call to exec in the EvalFunc, and therefore do a lot of redundant computation.

I simply don't know where and how to put this shared computation.
Does anybody have a solution on that?

Best,
Will
