Hello,

I wrote an EvalFunc implementation that:


1) Parses a SQL query

2) Scans a folder for resource files and builds an index over them

3) Depending on certain properties of the SQL query, accesses the
corresponding file and creates Java objects holding the relevant
information from the file (for reuse)

4) Does some computation with the SQL query and the information found in
the file

5) Outputs a transformed SQL query
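In code, the flow of those five steps looks roughly like this; the class name, the toy "parser", and the `.props` lookup are illustrative stand-ins, not my actual code:

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Illustrative outline of the five steps above (stand-in types, not the real parser).
public class QueryRewriteSketch {

    // 2) Scan a folder and index resource files by name.
    static Map<String, File> buildIndex(File folder) {
        Map<String, File> index = new HashMap<>();
        File[] files = folder.listFiles();
        if (files != null) {
            for (File f : files) index.put(f.getName(), f);
        }
        return index;
    }

    // 1) + 3)-5): "parse" the query, look up the matching resource, transform.
    static String rewrite(String sql, Map<String, File> index) {
        String table = sql.replaceAll(".*FROM\\s+(\\w+).*", "$1"); // toy stand-in for the parser
        File resource = index.get(table + ".props");               // 3) lookup in the index
        // 4)-5): the real computation would use the file's contents; here we only tag the query.
        return sql + (resource != null ? " /* enriched */" : " /* no resource */");
    }

    public static void main(String[] args) {
        Map<String, File> index = buildIndex(new File("."));
        System.out.println(rewrite("SELECT x FROM orders", index));
    }
}
```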

Currently I'm running local tests without Hadoop, and the code works fine.

The problem I see is that right now I initialize my parser in the EvalFunc,
so every time it is instantiated a new instance of the parser is created;
ideally only one instance per machine would be created.
Even worse, right now I build the index and parse the corresponding resource
file once per call to exec in the EvalFunc, and therefore do a lot of
redundant computation.
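What I would like is roughly the pattern below: build the heavy state lazily in a static field so it is created once per JVM and reused across calls. This is only a sketch with illustrative names (the map is a stand-in for my parser plus file index, and exec takes a String instead of a Pig Tuple), and I don't know whether this is the right place to hook it into Pig:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of per-JVM lazy initialization (illustrative names, not Pig API).
public class LazyInitSketch {
    // Shared across all instances in the same JVM; built at most once.
    private static volatile Map<String, String> sharedIndex;
    private static final AtomicInteger initCount = new AtomicInteger();

    static Map<String, String> getIndex() {
        if (sharedIndex == null) {
            synchronized (LazyInitSketch.class) {
                if (sharedIndex == null) {
                    initCount.incrementAndGet();       // the expensive parse/index work
                    Map<String, String> idx = new HashMap<>();
                    idx.put("orders", "orders.props"); // stand-in for scanning the folder
                    sharedIndex = idx;
                }
            }
        }
        return sharedIndex;
    }

    public static int getInitCount() { return initCount.get(); }

    // Stand-in for exec(Tuple): every call reuses the shared index.
    public static String exec(String sql) {
        Map<String, String> index = getIndex();
        return sql + " /* index size=" + index.size() + " */";
    }

    public static void main(String[] args) {
        exec("SELECT a FROM orders");
        exec("SELECT b FROM orders");
        System.out.println("initialized " + getInitCount() + " time(s)");
    }
}
```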

The reason is simply that I don't know where and how to put this shared
computation. Does anybody have a solution for that?

Best,
Will
