EvalFunc has no method that gets called on the backend before the
first exec call. You can keep a flag that tracks whether you have
done the initialization, so that you only do it on the first call.
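
A minimal sketch of that flag approach follows. It is plain Java with
no Pig dependency so that it runs on its own; in a real UDF the class
would extend EvalFunc and exec would take a Tuple. The names
LazyInitFunc, initialize, index and initCount are made up for
illustration, and the map stands in for the parser and file index.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: stands in for a class extending Pig's EvalFunc.
class LazyInitFunc {
    // Tracks whether the one-time setup has already run.
    private boolean initialized = false;

    // Hypothetical stand-in for the index built over the resource files.
    private Map<String, String> index;

    // Counts how often the setup ran; only here to make the effect visible.
    int initCount = 0;

    // Expensive one-time work (parser creation, folder scan) would go here.
    private void initialize() {
        index = new HashMap<>();
        index.put("orders", "orders.res");
        initCount++;
    }

    // Called once per input record; does the setup on the first call only.
    public String exec(String query) {
        if (!initialized) {
            initialize();
            initialized = true;
        }
        String resource = index.getOrDefault(query, "unknown");
        return query + " -> " + resource;
    }

    public static void main(String[] args) {
        LazyInitFunc f = new LazyInitFunc();
        System.out.println(f.exec("orders"));
        System.out.println(f.exec("orders"));
        System.out.println("initialize ran " + f.initCount + " time(s)");
    }
}
```

With instance fields, as above, the setup runs once per instance of
the function. Making the flag and the index static would share them
across all instances in the same JVM, at the cost of having to guard
the initialization if more than one thread can call exec.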
Alan.
On Mar 2, 2011, at 5:29 AM, Lai Will wrote:
Hello,
I wrote an EvalFunc implementation that
1) Parses a SQL Query
2) Scans a folder for resource files and creates an index on
these files
3) According to certain properties of the SQL Query, accesses
the corresponding file and creates a Java object holding the
relevant information from the file (for reuse).
4) Does some computation with the SQL Query and the information
found in the file
5) Outputs a transformed SQL Query
Currently I'm doing local tests without Hadoop and the code works
fine.
The problem I see is that right now I initialize my parser in the
EvalFunc, so that every time it gets instantiated a new instance of
the parser is created. Ideally only one instance per machine would
be created.
Even worse, right now I create the index and parse the corresponding
resource file once per call to exec in the EvalFunc and therefore do
a lot of redundant computation, simply because I don't know where and
how to put this shared computation.
Does anybody have a solution for that?
Best,
Will