All RETE implementations use RAM these days. There are older rule engines that used databases or file systems when there wasn't enough RAM. The key to scaling rulebase systems or expert systems efficiently is loading only the data you need. An expert system is inference engine + rules + functions + facts. Some products shamelessly promote their rule engine as an expert system when they don't understand what the term means. Some rule engines are expert system shells, which provide a full programming environment without needing an IDE and a bunch of other tooling. CLIPS, JESS and Haley come to mind.
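As a minimal illustration of what a rule in one of these shells looks like, here is a hypothetical CLIPS-style rule (JESS uses a very similar syntax); the templates and slot names are made up for this example:

(deftemplate customer (slot id) (slot status))
(deftemplate order (slot customer) (slot total))

(defrule flag-large-gold-order
  "Fire when a gold customer places an order over 1000."
  (customer (id ?cid) (status gold))
  (order (customer ?cid) (total ?t&:(> ?t 1000)))
  =>
  (assert (review-order ?cid ?t)))

The shared variable ?cid between the two patterns is what the RETE network compiles into a join node; joins come up again below.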
I would suggest reading Gary Riley's book: http://www.amazon.com/Expert-Systems-Principles-Programming-Fourth/dp/0534384471/ref=sr_1_1?s=books&ie=UTF8&qid=1350743551&sr=1-1&keywords=giarratano+and+riley+expert+systems

In terms of nodes, the count actually doesn't matter much, due to the discrimination network produced by the RETE algorithm. What matters more is the number of facts and the percentage of facts that match some of the patterns declared in the rules. Most RETE implementations materialize the join results, so that is the biggest factor in memory consumption. For example, if you had 1000 rules but only 3 of them have joins, it doesn't make much difference. In contrast, if you had 200 rules and each has 4 joins, the engine will consume more memory for the same dataset (see the short sketch below the quoted text).

Proper scaling of rulebase systems requires years of experience and expertise, so it's not something one should rush. It's best to study the domain and methodically develop the rulebase so that it is efficient. I would recommend you use JESS. Feel free to email me directly if your company wants to hire an experienced rule developer to assist with your project. RETE rule engines are powerful tools, but it does take experience to scale them properly.

On Sat, Oct 20, 2012 at 10:24 AM, Luangsay Sourygna <[email protected]> wrote:
> In your RETE implementation, did you just relied on RAM to store the
> alpha and beta memories?
> What if there is a huge number of facts/WME/nodes and that you have to
> retain them for quite a long period (I mean: what happens if the
> alpha&beta memories gets higher than the RAM of your server?) ?
>
> HBase seemed interesting to me because it enables me to "scale out"
> this amount of memory and gives me the MR boost. Maybe there is a more
> interesting database/distributed cache for that?
>
> A big thank you anyway for your reply: I have googled a bit on your
> name and found many papers that should help me in going to the right
> direction (from this link:
> http://www.thecepblog.com/2010/03/06/rete-engines-must-forwards-and-backwards-chain/).
> Till now, the only paper I had found was:
> http://reports-archive.adm.cs.cmu.edu/anon/1995/CMU-CS-95-113.pdf
> (found on wikipedia) which I started to read.
>
> On Fri, Oct 19, 2012 at 10:30 PM, Peter Lin <[email protected]> wrote:
>> Since I've implemented RETE algorithm, that is a terrible idea and
>> wouldn't be efficient.
>>
>> storing alpha and beta memories in HBase is technically feasible, but
>> it would be so slow as to be useless.
>>
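A minimal CLIPS-style sketch of the join point above; the templates and slot names here are hypothetical and only meant to show the contrast:

(deftemplate customer     (slot id) (slot rating))
(deftemplate invoice      (slot id) (slot customer) (slot status))
(deftemplate dispute      (slot invoice))
(deftemplate payment-plan (slot customer) (slot active))

;; No join: a single pattern essentially only uses alpha memory,
;; so memory grows roughly with the number of matching facts.
(defrule flag-overdue
  (invoice (id ?id) (status overdue))
  =>
  (assert (needs-reminder ?id)))

;; Three joins: RETE materializes the partial matches between the
;; patterns in beta memory, so memory grows with the number of fact
;; combinations that satisfy the shared ?cust and ?id bindings.
(defrule flag-risky-account
  (customer (id ?cust) (rating poor))
  (invoice  (customer ?cust) (id ?id) (status overdue))
  (dispute  (invoice ?id))
  (payment-plan (customer ?cust) (active FALSE))
  =>
  (assert (escalate ?cust)))

With the same fact base, a rulebase full of rules like the second one holds far more materialized join results than one made of rules like the first.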
