So I still get the redundant work whenever the same cluster node/VM creates multiple instances of my EvalFunc? And is it usual for several instances of the EvalFunc to exist on the same cluster node/VM?
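If several EvalFunc instances do end up in the same JVM, one common way to avoid repeating the expensive setup per instance is to hold the shared state in a static field, so every instance loaded by the same classloader reuses it. A minimal sketch of that pattern, where `SqlParser` is a hypothetical stand-in for the real parser (not Pig API):

```java
// Sketch: once-per-JVM initialization via a static field with
// double-checked locking. All instances in this JVM share one parser.
public class SharedParserSketch {
    static class SqlParser {
        static int constructions = 0; // counts how often the parser is built
        SqlParser() { constructions++; }
        String transform(String query) { return query.trim(); }
    }

    // Shared across all instances; volatile so the double-check is safe.
    private static volatile SqlParser parser;

    private static SqlParser getParser() {
        if (parser == null) {
            synchronized (SharedParserSketch.class) {
                if (parser == null) {
                    parser = new SqlParser(); // expensive setup runs once
                }
            }
        }
        return parser;
    }

    // Analogous to EvalFunc.exec(): reuse the shared parser on every call.
    public String exec(String query) {
        return getParser().transform(query);
    }

    public static void main(String[] args) {
        SharedParserSketch a = new SharedParserSketch();
        SharedParserSketch b = new SharedParserSketch(); // second instance
        a.exec(" SELECT 1 ");
        b.exec(" SELECT 2 ");
        // Built once despite two instances and two calls.
        System.out.println(SqlParser.constructions);
    }
}
```

Note this only helps within one JVM; separate task JVMs on the cluster still each build their own parser.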
Will

-----Original Message-----
From: Alan Gates [mailto:[email protected]]
Sent: Wednesday, March 02, 2011 4:49 PM
To: [email protected]
Subject: Re: Shared resources

There is no method in the eval func that gets called on the backend before any exec calls. You can keep a flag that tracks whether you have done the initialization, so that you only do it the first time.

Alan.

On Mar 2, 2011, at 5:29 AM, Lai Will wrote:

> Hello,
>
> I wrote an EvalFunc implementation that
>
> 1) Parses a SQL query
> 2) Scans a folder for resource files and creates an index on these files
> 3) According to certain properties of the SQL query, accesses the corresponding file and creates a Java object holding the relevant information from the file (for reuse)
> 4) Does some computation with the SQL query and the information found in the file
> 5) Outputs a transformed SQL query
>
> Currently I'm doing local tests without Hadoop and the code works fine.
>
> The problem I see is that right now I initialize my parser in the EvalFunc, so that every time it gets instantiated a new instance of the parser is created. Ideally only one instance per machine would be created.
> Even worse, right now I create the index and parse the corresponding resource file once per exec call in the EvalFunc, and therefore do a lot of redundant computation.
>
> I just don't know where and how to put this shared computation.
> Does anybody have a solution for that?
>
> Best,
> Will
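The flag Alan describes can be sketched roughly like this: an instance field checked at the top of exec(), so the expensive setup runs on the first call rather than once per tuple. The class and method names below are illustrative stand-ins, not Pig API:

```java
// Sketch of Alan's suggestion: a boolean flag so per-instance setup runs
// only on the first exec() call, not on every call.
public class FlagInitSketch {
    static int initCount = 0;        // counts how often setup actually ran
    private boolean initialized = false;

    private void initOnce() {
        // Expensive work would go here: build the parser, scan the
        // resource folder, build the index, etc.
        initCount++;
        initialized = true;
    }

    // Analogous to EvalFunc.exec(Tuple): check the flag on every call.
    public String exec(String query) {
        if (!initialized) {
            initOnce();
        }
        return query.trim();
    }

    public static void main(String[] args) {
        FlagInitSketch f = new FlagInitSketch();
        for (int i = 0; i < 5; i++) {
            f.exec("SELECT " + i);   // five exec calls on one instance
        }
        System.out.println(initCount); // setup ran once, not five times
    }
}
```

This removes the once-per-exec redundancy within a single EvalFunc instance; it does not by itself share state between instances or machines.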
