Matthias Brantner has proposed merging lp:~zorba-coders/zorba/caching into 
lp:zorba.

Requested reviews:
  Matthias Brantner (matthias-brantner)
  Markos Zaharioudakis (markos-za)

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/caching/+merge/85423

- automatic caching of recursive, non-sequential, and deterministic functions 
with atomic parameter and return types
- %ann:cache and %ann:no-cache for controlling function result caching
-- 
https://code.launchpad.net/~zorba-coders/zorba/caching/+merge/85423
Your team Zorba Coders is subscribed to branch lp:zorba.
=== modified file 'ChangeLog'
--- ChangeLog	2011-12-10 00:10:52 +0000
+++ ChangeLog	2011-12-13 03:41:26 +0000
@@ -1,6 +1,9 @@
 Zorba - The XQuery Processor
 
 version 2.2
+
+  * Caching of results for recursive functions with atomic parameter and return types.
+  * Added %ann:cache and %ann:no-cache to enable or disable caching of results of functions with atomic parameter and return types.
   * Added index management function to the C++ api's StaticCollectionManager.
 
 version 2.1
@@ -77,6 +80,7 @@
   * General index cannot be declared as unique if the type of its key is
   xs:anyAtomicType or xs:untypedAtomic.
   * Added undo for node revalidation
+  * Optimization for count(collection()) expressions
   * Fixed bug #867133 (SWIG PHP build failure on Mac OSX)
   * Fixed bug #872796 (validate-in-place can interfere with other update primitives)
   * Fixed bug #872799 (validate-in-place can set incorrect types)

=== modified file 'doc/zorba/options.dox'
--- doc/zorba/options.dox	2011-09-14 06:15:19 +0000
+++ doc/zorba/options.dox	2011-12-13 03:41:26 +0000
@@ -278,6 +278,41 @@
 In order to be able to use the value twice, the <tt>string:materialize</tt> function must be used to materialize the entire contents of the file <tt>myfile.txt</tt> in memory.
 Otherwise, the error zerr:ZSTR0055 is raised.
 
+
+\paragraph caching_annotation Caching Results of Functions
+
+Caching of function results may improve performance when computationally 
+expensive functions are invoked multiple times with the same arguments.
+
+If the optimization level is O1 or higher, Zorba automatically caches results
+of recursive, deterministic, and non-sequential functions whose parameter and 
+return types are subtypes of xs:anyAtomicType. Specifically, if such a function
+is called more than once with the same arguments, the result of the first call
+will be cached and subsequent calls will return the cached value without 
+re-evaluating the function.
+
+For example, in the following recursive function computing a fibonacci number, 
+each result is automatically cached and, hence, dramatically improves performance.
+
+\include zorba/udf/udf-fib-rec.xq
+
+Specifically, this optimization reduces the complexity of the function from
+O(1.6^n) to O(n).
+
+In order to explicitly disable function caching, the user can specify the 
+<tt>%ann:no-cache</tt> annotation.
+
+In addition, the user can use the <tt>%ann:cache</tt> annotation to cache the 
+results of functions other than the ones that are automatically cached. However, 
+this will only work if the function is not updating and its parameter and return
+types are subtypes of xs:anyAtomicType; otherwise, Zorba will raise a warning 
+(zwarn:ZWST0005) and simply ignore the %ann:cache annotation.
+
+Please note, that explicitly enforcing caching for sequential or nondeterministic
+functions might not give the intended result. In such cases, Zorba will raise a
+warning (zwarn:ZWST0006) but obey the %ann:cache annotation.
+
+
 \paragraph collection_index_annotations Annotations on Collections and Indexes
 
 The \ref xqddf uses annotations to assign properties to collections and indexes.

=== modified file 'include/zorba/pregenerated/diagnostic_list.h'
--- include/zorba/pregenerated/diagnostic_list.h	2011-11-15 08:23:20 +0000
+++ include/zorba/pregenerated/diagnostic_list.h	2011-12-13 03:41:26 +0000
@@ -600,6 +600,8 @@
 
 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0034_INDEX_RANGE_VALUE_PROBE_BAD_KEY_TYPES;
 
+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0035_INDEX_GENERAL_INSERT;
+
 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0031_IC_NOT_DECLARED;
 
 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0032_IC_NOT_ACTIVATED;
@@ -752,6 +754,10 @@
 
 extern ZORBA_DLL_PUBLIC ZorbaWarningCode ZWST0004_AMBIGUOUS_SEQUENTIAL_FLWOR;
 
+extern ZORBA_DLL_PUBLIC ZorbaWarningCode ZWST0005_CACHING_NOT_POSSIBLE;
+
+extern ZORBA_DLL_PUBLIC ZorbaWarningCode ZWST0006_CACHING_MIGHT_NOT_BE_INTENDED;
+
 } // namespace zwarn
 } // namespace zorba
 #endif /* ZORBA_DIAGNOSTIC_LIST_API_H */

=== modified file 'modules/com/zorba-xquery/www/modules/pregenerated/errors.xq'
--- modules/com/zorba-xquery/www/modules/pregenerated/errors.xq	2011-11-15 08:23:20 +0000
+++ modules/com/zorba-xquery/www/modules/pregenerated/errors.xq	2011-12-13 03:41:26 +0000
@@ -501,6 +501,10 @@
 
 (:~
 :)
+declare variable $zerr:ZDDY0035 as xs:QName := fn:QName($zerr:NS, "zerr:ZDDY0035");
+
+(:~
+:)
 declare variable $zerr:ZDDY0031 as xs:QName := fn:QName($zerr:NS, "zerr:ZDDY0031");
 
 (:~

=== modified file 'modules/com/zorba-xquery/www/modules/pregenerated/warnings.xq'
--- modules/com/zorba-xquery/www/modules/pregenerated/warnings.xq	2011-11-15 08:10:49 +0000
+++ modules/com/zorba-xquery/www/modules/pregenerated/warnings.xq	2011-12-13 03:41:26 +0000
@@ -53,4 +53,23 @@
 
 (:~
 :)
-declare variable $zwarn:ZWST0004 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0004");
\ No newline at end of file
+declare variable $zwarn:ZWST0004 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0004");
+
+(:~
+ :
+ : This warning is raised if the user explicitly enables caching
+ : of function results (using the %ann:cache annotation) but the function
+ : is updating or its parameter and return types are not subtypes of
+ : xs:anyAtomicType.
+ : 
+:)
+declare variable $zwarn:ZWST0005 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0005");
+
+(:~
+ :
+ : This warning is raised if the user explicitly enables caching
+ : of function results (using the %ann:cache annotation) and the function
+ : is annotated as sequential or nondeterministic.
+ : 
+:)
+declare variable $zwarn:ZWST0006 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0006");
\ No newline at end of file

=== modified file 'src/annotations/annotations.cpp'
--- src/annotations/annotations.cpp	2011-11-07 06:32:00 +0000
+++ src/annotations/annotations.cpp	2011-12-13 03:41:26 +0000
@@ -100,6 +100,9 @@
 
   ZANN(streamable, streamable);
 
+  ZANN(cache, cache);
+  ZANN(no-cache, nocache);
+
   //
   // Zorba annotations - xqddf
   //
@@ -168,6 +171,10 @@
       ZANN(zann_nonsequential));
 
   theRuleSet.push_back(
+      ZANN(zann_cache) |
+      ZANN(zann_nocache));
+
+  theRuleSet.push_back(
       ZANN(fn_private) |
       ZANN(fn_public));
 
@@ -433,6 +440,5 @@
 }
 
 
-
 } /* namespace zorba */
 /* vim:set et sw=2 ts=2: */

=== modified file 'src/annotations/annotations.h'
--- src/annotations/annotations.h	2011-11-01 13:47:10 +0000
+++ src/annotations/annotations.h	2011-12-13 03:41:26 +0000
@@ -55,6 +55,8 @@
     zann_nonassignable,
     zann_sequential,
     zann_nonsequential,
+    zann_cache,
+    zann_nocache,
     zann_variadic,
     zann_streamable,
     zann_unique,

=== modified file 'src/compiler/codegen/plan_visitor.cpp'
--- src/compiler/codegen/plan_visitor.cpp	2011-12-07 20:46:23 +0000
+++ src/compiler/codegen/plan_visitor.cpp	2011-12-13 03:41:26 +0000
@@ -98,6 +98,7 @@
 #endif
 
 #include "functions/function.h"
+#include "functions/udf.h"
 #include "functions/library.h"
 
 #include "types/typeops.h"
@@ -2276,7 +2277,7 @@
 {
   CODEGEN_TRACE_OUT("");
 
-  const function* func = v.get_func();
+  function* func = v.get_func();
 
   std::vector<PlanIter_t> argv;
 
@@ -2320,45 +2321,12 @@
           dynamic_cast<EnclosedIterator*>(iter.getp())->setInUpdateExpr();
       }
     }
-#if 0
     else if (func->isUdf())
     {
-      const user_function* udf = static_cast<const user_function*>(func);
-
-      if (udf->isExiting())
-      {
-        TypeManager* tm = v.get_type_manager();
-
-        const xqtref_t& udfType = udf->getSignature().returnType();
-
-        expr* body = udf->getBody();
-
-        std::vector<expr*> exitExprs;
-        ulong numExitExprs;
-        ulong i;
-
-        body->get_exprs_of_kind(exit_expr_kind, exitExprs);
-
-        for (i = 0; i < numExitExprs; ++i)
-        {
-          if (!TypeOps::is_subtype(tm,
-                                   *exitExprs[i]->get_return_type(),
-                                   *udfType,
-                                   loc))
-            break;
-        }
-
-        if (i < numExitExprs)
-        {
-          UDFunctionCallIterator* udfIter = 
-          dynamic_cast<UDFunctionCallIterator*>(iter.getp());
-
-          ZORBA_ASSERT(udfIter != NULL);
-          udfIter->setCheckType();
-        }
-      }
+      // need to computeResultCaching here for iterprint to work
+      user_function* udf = static_cast<user_function*>(func);
+      udf->computeResultCaching(theCCB->theXQueryDiagnostics);
     }
-#endif
   }
   else
   {

=== modified file 'src/diagnostics/diagnostic_en.xml'
--- src/diagnostics/diagnostic_en.xml	2011-12-07 20:46:23 +0000
+++ src/diagnostics/diagnostic_en.xml	2011-12-13 03:41:26 +0000
@@ -2013,6 +2013,10 @@
       <value>"$1": index range-value probe has search keys with incompatible types</value>
     </diagnostic>
 
+    <diagnostic code="ZDDY0035" name="INDEX_GENERAL_INSERT">
+      <value>"$1": index inserting more than one key not allowed for general index</value>
+    </diagnostic>
+
     <diagnostic code="ZDDY0031" name="IC_NOT_DECLARED">
       <value>"$1": integrity constraint is not declared</value>
     </diagnostic>
@@ -2323,6 +2327,37 @@
       <value>Sequential FLWOR expr may not have the semantics you expect</value>
     </diagnostic>
 
+    <diagnostic code="ZWST0005" name="CACHING_NOT_POSSIBLE">
+      <comment>
+        This warning is raised if the user explicitly enables caching
+        of function results (using the %ann:cache annotation) but the function
+        is updating or its parameter and return types are not subtypes of
+        xs:anyAtomicType.
+      </comment>
+      <value>"$1": function caching not possible; $2</value>
+      <entry key="RETURN_TYPE">
+        <value>return type ($3) is not subtype of xs:anyAtomicType</value>
+      </entry>
+      <entry key="PARAM_TYPE">
+        <value>type of parameter $3 is $4 which is not a subtype of xs:anyAtomicType</value>
+      </entry>
+      <entry key="UPDATING">
+        <value>function is updating</value>
+      </entry>
+      <entry key="VARIADIC">
+        <value>function is variadic</value>
+      </entry>
+    </diagnostic>
+
+    <diagnostic code="ZWST0006" name="CACHING_MIGHT_NOT_BE_INTENDED">
+      <comment>
+        This warning is raised if the user explicitly enables caching
+        of function results (using the %ann:cache annotation) and the function
+        is annotated as sequential or nondeterministic.
+      </comment>
+      <value>"$1": function caching might not give the intended result because the function is declared as $2</value>
+    </diagnostic>
+
   </namespace>
 
   <!--////////// Subvalues /////////////////////////////////////////////////-->

=== modified file 'src/diagnostics/pregenerated/diagnostic_list.cpp'
--- src/diagnostics/pregenerated/diagnostic_list.cpp	2011-11-15 08:23:20 +0000
+++ src/diagnostics/pregenerated/diagnostic_list.cpp	2011-12-13 03:41:26 +0000
@@ -879,6 +879,9 @@
 ZorbaErrorCode ZDDY0034_INDEX_RANGE_VALUE_PROBE_BAD_KEY_TYPES( "ZDDY0034" );
 
 
+ZorbaErrorCode ZDDY0035_INDEX_GENERAL_INSERT( "ZDDY0035" );
+
+
 ZorbaErrorCode ZDDY0031_IC_NOT_DECLARED( "ZDDY0031" );
 
 
@@ -1104,6 +1107,12 @@
 ZorbaWarningCode ZWST0004_AMBIGUOUS_SEQUENTIAL_FLWOR( "ZWST0004" );
 
 
+ZorbaWarningCode ZWST0005_CACHING_NOT_POSSIBLE( "ZWST0005" );
+
+
+ZorbaWarningCode ZWST0006_CACHING_MIGHT_NOT_BE_INTENDED( "ZWST0006" );
+
+
 } // namespace zwarn
 
 } // namespace zorba

=== modified file 'src/diagnostics/pregenerated/dict_en.cpp'
--- src/diagnostics/pregenerated/dict_en.cpp	2011-12-01 16:19:52 +0000
+++ src/diagnostics/pregenerated/dict_en.cpp	2011-12-13 03:41:26 +0000
@@ -298,6 +298,7 @@
   { "ZDDY0032", "\"$1\": integrity constraint is not activated" },
   { "ZDDY0033", "\"$1\": integrity constraint not met for collection \"$2\"" },
   { "ZDDY0034", "\"$1\": index range-value probe has search keys with incompatible types" },
+  { "ZDDY0035", "\"$1\": index inserting more than one key not allowed for general index" },
   { "ZDST0001", "\"$1\": collection already declared" },
   { "ZDST0002", "\"$1\": collection already imported into module \"$2\"" },
   { "ZDST0003", "\"$1\": collection declaration not allowed in main module" },
@@ -360,6 +361,8 @@
   { "ZWST0002", "\"$1\": unknown or unsupported annotation" },
   { "ZWST0003", "\"$1\": function declared sequential, but has non-sequential body" },
   { "ZWST0004", "Sequential FLWOR expr may not have the semantics you expect" },
+  { "ZWST0005", "\"$1\": function caching not possible; $2" },
+  { "ZWST0006", "\"$1\": function caching might not give the intended result because the function is declared as $2" },
   { "ZXQD0001", "\"$1\": prefix not declared when calling function \"$2\" from $3" },
   { "ZXQD0002", "\"$1\": $2" },
   { "ZXQD0003", "inconsistent options to the parse-xml-fragment() function: $1" },
@@ -664,6 +667,10 @@
   { "~XUST0002_UDF_2", "\"$2\": function declared updating but body is not updating or vacuous" },
   { "~ZDST0060_unknown_localname", "unknown localname ($3)" },
   { "~ZDST0060_unknown_namespace", "unknown namespace ($3)" },
+  { "~ZWST0005_PARAM_TYPE", "type of parameter $3 is $4 which is not a subtype of xs:anyAtomicType" },
+  { "~ZWST0005_RETURN_TYPE", "return type ($3) is not subtype of xs:anyAtomicType" },
+  { "~ZWST0005_UPDATING", "function is updating" },
+  { "~ZWST0005_VARIADIC", "function is variadic" },
   { "~ZXQD0004_NON_NEGATIVE", "given value must be non-negative ($2)" },
   { "~ZXQD0004_NOT_WITHIN_RANGE", "not within allowed range ($2)" },
   { "~ZXQP0004_TypeOps_is_in_scope_ForFunctionItemTypes", "TypeOps::is_in_scope() for function-item types" },

=== modified file 'src/functions/udf.cpp'
--- src/functions/udf.cpp	2011-10-18 17:37:41 +0000
+++ src/functions/udf.cpp	2011-12-13 03:41:26 +0000
@@ -26,10 +26,15 @@
 #include "compiler/rewriter/framework/rewriter.h"
 
 #include "functions/udf.h"
+#include "annotations/annotations.h"
 #include "functions/function_impl.h"
 
+#include "diagnostics/xquery_warning.h"
+
 #include "types/typeops.h"
 
+#include "store/api/index.h" // needed for destruction of the cache
+
 
 namespace zorba
 {
@@ -54,7 +59,10 @@
   theIsExiting(false),
   theIsLeaf(true),
   theIsOptimized(false),
-  thePlanStateSize(0)
+  thePlanStateSize(0),
+  theCache(0),
+  theCacheResults(false),
+  theCacheComputed(false)
 {
   setFlag(FunctionConsts::isUDF);
   resetFlag(FunctionConsts::isBuiltin);
@@ -318,7 +326,7 @@
 /*******************************************************************************
 
 ********************************************************************************/
-  PlanIter_t user_function::getPlan(CompilerCB* ccb, uint32_t& planStateSize)
+PlanIter_t user_function::getPlan(CompilerCB* ccb, uint32_t& planStateSize)
 {
   if (thePlan == NULL)
   {
@@ -372,9 +380,185 @@
 /*******************************************************************************
 
 ********************************************************************************/
-CODEGEN_DEF(user_function)
-{
-  return new UDFunctionCallIterator(aSctx, aLoc, aArgs, this);
+store::Index* user_function::getCache() const
+{
+  return theCache.getp();
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+void user_function::setCache(store::Index* aCache)
+{
+  theCache = aCache;
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+bool user_function::cacheResults() const
+{
+  return theCacheResults;
+}
+
+
+/*******************************************************************************
+ only cache recursive (non-sequential, non-updating, deterministic)
+ functions with singleton atomic input and output
+********************************************************************************/
+void user_function::computeResultCaching(XQueryDiagnostics* diag)
+{
+  if (theCacheComputed)
+  {
+    return; 
+  }
+
+  struct OnExit 
+  {
+  private:
+    bool& theResult;
+    bool& theCacheComputed;
+
+  public:
+    OnExit(bool& aResult, bool& aCacheComputed)
+      :
+      theResult(aResult),
+      theCacheComputed(aCacheComputed) {}
+
+    void cache() { theResult = true; }
+
+    ~OnExit()
+    {
+      theCacheComputed = true;
+    }
+  };
+
+  // will be destroyed when the function is exited
+  // set caching to true if cache() is called
+  OnExit lExit(theCacheResults, theCacheComputed);
+
+  // check necessary conditions
+  // %ann:cache or not %ann:no-cache
+  if (theAnnotationList &&
+      theAnnotationList->contains(AnnotationInternal::zann_nocache))
+  {
+    return;
+  }
+
+  // was the %ann:cache annotation given explicitly by the user
+  bool lExplicitCacheRequest = 
+    (theAnnotationList ?
+     theAnnotationList->contains(AnnotationInternal::zann_cache) :
+     false);
+
+  if (isVariadic())
+  {
+    if (lExplicitCacheRequest)
+    {
+      diag->add_warning(
+        NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE,
+        WARN_PARAMS(getName()->getStringValue(), ZED(ZWST0005_VARIADIC)),
+        WARN_LOC(theLoc)));
+    }
+    return;
+  }
+
+  // parameter and return types are subtype of xs:anyAtomicType?
+  const xqtref_t& lRes = theSignature.returnType();
+  TypeManager* tm = lRes->get_manager();
+
+  if (!TypeOps::is_subtype(tm,
+                           *lRes,
+                           *GENV_TYPESYSTEM.ANY_ATOMIC_TYPE_ONE,
+                           theLoc))
+  {
+    if (lExplicitCacheRequest)
+    {
+      diag->add_warning(
+        NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE,
+        WARN_PARAMS(getName()->getStringValue(),
+                    ZED(ZWST0005_RETURN_TYPE),
+                    lRes->toString()),
+        WARN_LOC(theLoc)));
+    }
+    return;
+  }
+
+  csize lArity = theSignature.paramCount();
+  for (csize i = 0; i < lArity; ++i)
+  {
+    const xqtref_t& lArg = theSignature[i];
+    if (!TypeOps::is_subtype(tm,
+                             *lArg,
+                             *GENV_TYPESYSTEM.ANY_ATOMIC_TYPE_ONE,
+                             theLoc))
+    {
+      if (lExplicitCacheRequest)
+      {
+        diag->add_warning(
+            NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE,
+            WARN_PARAMS(getName()->getStringValue(),
+                        ZED(ZWST0005_PARAM_TYPE),
+                        i+1,
+                        lArg->toString()),
+            WARN_LOC(theLoc)));
+      }
+      return;
+    }
+  }
+
+  // function updating?
+  if (isUpdating())
+  {
+    if (lExplicitCacheRequest)
+    {
+      diag->add_warning(
+        NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE,
+        WARN_PARAMS(getName()->getStringValue(), ZED(ZWST0005_UPDATING)),
+        WARN_LOC(theLoc)));
+    }
+    return;
+  }
+
+  if (isSequential() || !isDeterministic())
+  {
+    if (lExplicitCacheRequest)
+    {
+      diag->add_warning(
+        NEW_XQUERY_WARNING(zwarn::ZWST0006_CACHING_MIGHT_NOT_BE_INTENDED,
+        WARN_PARAMS(getName()->getStringValue(),
+                    (isSequential()?"sequential":"non-deterministic")),
+        WARN_LOC(theLoc)));
+
+      lExit.cache();
+    }
+    return;
+  }
+  
+
+  // optimization is prerequisite before invoking isRecursive
+  if (!lExplicitCacheRequest && isOptimized() && !isRecursive())
+  {
+    return;
+  }
+
+  lExit.cache();
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+PlanIter_t user_function::codegen(
+      CompilerCB* cb,
+      static_context* sctx,
+      const QueryLoc& loc,
+      std::vector<PlanIter_t>& argv,
+      AnnotationHolder& ann) const
+{
+  return new UDFunctionCallIterator(sctx, loc, argv, this);
 }
 
 

=== modified file 'src/functions/udf.h'
--- src/functions/udf.h	2011-10-18 17:37:41 +0000
+++ src/functions/udf.h	2011-12-13 03:41:26 +0000
@@ -25,6 +25,12 @@
 namespace zorba 
 {
 
+  namespace store
+  {
+    class Index;
+    typedef rchandle<Index> Index_t;
+  }
+
 
 /*******************************************************************************
   A udf with params $x1, $x2, ..., $xn and a body_expr is translated into a
@@ -78,6 +84,25 @@
   references of an arg var, these references are "mutually exclusive", ie, 
   at most one of the references will actually be reached during each particular
   execution of the body.
+
+  theCache:
+  ---------
+  Maps the arg values of an invocation to the result of that invocation.
+  If an invocation uses the same arg values as a previous invocation, the cached
+  result is simply returned without re-evaluating the udf.
+
+  theCacheResults:
+  ----------------
+  Tells whether caching should be done for this udf or not.
+
+  theCacheComputed:
+  -----------------
+  Tells whether theCacheResults has been computed already or not.
+  theCacheResults is computed by the computeResultCaching() method, which is
+  invoked during codegen every time a udf call is encountered. The same udf may
+  be invoked multiple times, but the computation of theCacheResults needs to
+  be done only once. So, during the 1st invcocation of computeResultCaching(),
+  theCacheComputed is set to true, and subsequent invocations are noops.
 ********************************************************************************/
 class user_function : public function 
 {
@@ -102,6 +127,10 @@
   uint32_t                    thePlanStateSize;
   std::vector<ArgVarRefs>     theArgVarsRefs;
 
+  store::Index_t              theCache;
+  bool                        theCacheResults;
+  bool                        theCacheComputed;
+
 public:
   SERIALIZABLE_CLASS(user_function)
   user_function(::zorba::serialization::Archiver& ar);
@@ -160,6 +189,14 @@
 
   const std::vector<ArgVarRefs>& getArgVarsRefs() const;
 
+  store::Index* getCache() const;
+
+  void setCache(store::Index* aCache);
+
+  bool cacheResults() const;
+
+  void computeResultCaching(XQueryDiagnostics*);
+
   PlanIter_t codegen(
         CompilerCB* cb,
         static_context* sctx,

=== modified file 'src/runtime/core/fncall_iterator.cpp'
--- src/runtime/core/fncall_iterator.cpp	2011-10-30 00:18:34 +0000
+++ src/runtime/core/fncall_iterator.cpp	2011-12-13 03:41:26 +0000
@@ -48,6 +48,11 @@
 
 #include "util/string_util.h"
 
+#include "store/api/index.h"
+#include "store/api/store.h"
+#include "store/api/iterator_factory.h"
+#include "store/api/temp_seq.h"
+
 #ifdef ZORBA_WITH_DEBUGGER
 #include "debugger/debugger_commons.h"
 
@@ -93,7 +98,8 @@
   thePlanState(NULL),
   thePlanStateSize(0),
   theLocalDCtx(NULL),
-  thePlanOpen(false)
+  thePlanOpen(false),
+  theCache(0)
 {
 }
 
@@ -153,6 +159,8 @@
   {
     thePlan->reset(*thePlanState);
   }
+
+  theArgValues.clear();
 }
 
 
@@ -198,6 +206,113 @@
 /*******************************************************************************
 
 ********************************************************************************/
+bool UDFunctionCallIterator::isCached() const
+{
+  return theUDF->cacheResults();
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+void UDFunctionCallIterator::createCache(
+    PlanState& planState,
+    UDFunctionCallIteratorState* state)
+{
+  store::Index_t lIndex = theUDF->getCache();
+
+  if (!lIndex && theUDF->cacheResults())
+  {
+    const signature& sig = theUDF->getSignature();
+
+    csize numArgs = theChildren.size();
+
+    store::IndexSpecification lSpec;
+    lSpec.theNumKeyColumns = numArgs;
+    lSpec.theKeyTypes.resize(numArgs);
+    lSpec.theCollations.resize(numArgs);
+    lSpec.theIsTemp = true;
+    lSpec.theIsUnique = true;
+
+    for (csize i = 0; i < numArgs; ++i)
+    {
+      lSpec.theKeyTypes[i] = sig[i]->getBaseBuiltinType()->get_qname().getp();
+    }
+
+    lIndex = GENV_STORE.createIndex(theUDF->getName(), lSpec, 0);
+
+    theUDF->setCache(lIndex.getp()); // cache the cache in the function itself
+
+    state->theArgValues.reserve(numArgs);
+  }
+
+  state->theCache = lIndex.getp();
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+bool UDFunctionCallIterator::probeCache(
+    PlanState& planState,
+    UDFunctionCallIteratorState* state,
+    store::Item_t& result,
+    std::vector<store::Item_t>& aKey) const
+{
+  if (!state->theCache)
+    return false;
+
+  store::IndexCondition_t lCond =
+  state->theCache->createCondition(store::IndexCondition::POINT_VALUE);
+
+  std::vector<store::Iterator_t>::iterator lIter = state->theArgWrappers.begin();
+
+  for (; lIter != state->theArgWrappers.end(); ++lIter)
+  {
+    store::Iterator_t& argWrapper = (*lIter);
+    store::Item_t lArg;
+    if (argWrapper) // might be 0 if argument is not used
+    {
+      argWrapper->next(lArg); // guaranteed to have exactly one result
+    }
+    aKey.push_back(lArg);
+    lCond->pushItem(lArg);
+  }
+
+  store::IndexProbeIterator_t probeIte = 
+  GENV_STORE.getIteratorFactory()->createIndexProbeIterator(state->theCache);
+
+  probeIte->init(lCond);
+  probeIte->open();
+  return probeIte->next(result);
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+void UDFunctionCallIterator::insertCacheEntry(
+  UDFunctionCallIteratorState* state,
+  std::vector<store::Item_t>& aKey,
+  store::Item_t& aValue) const
+{
+  if (state->theCache)
+  {
+    std::auto_ptr<store::IndexKey> k(new store::IndexKey());
+    store::IndexKey* k2 = k.get();
+    k->theItems = aKey;
+    store::Item_t lTmp = aValue; // insert will eventually transfer the Item_t
+    if (!state->theCache->insert(k2, lTmp))
+    {
+      k.release();
+    }
+  }
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
 void UDFunctionCallIterator::openImpl(PlanState& planState, uint32_t& offset)
 {
   UDFunctionCallIteratorState* state;
@@ -229,6 +344,11 @@
   // the plan state (but not the state block) and dynamic context.
   state->open(planState, theUDF);
 
+  // if the results of the function should be cached (prereq: atomic in and out)
+  // this functions stores an index in the dynamic context that contains
+  // the cached results. The name of the index is the name of the function.
+  createCache(planState, state);
+
   // Create a wrapper over each subplan that computes an argument expr, if the
   // associated param is actually used anywhere in the function body.
   csize numArgs = theChildren.size();
@@ -301,6 +421,9 @@
 {
   try
   {
+    std::vector<store::Item_t> lKey;
+    bool lCacheHit;
+
     UDFunctionCallIteratorState* state;
     DEFAULT_STACK_INIT(UDFunctionCallIteratorState, state, planState);
 
@@ -314,19 +437,31 @@
       state->thePlanOpen = true;
     }
 
-    // Bind the args.
+    // check if there is a cache and the result is already in the cache
+    lCacheHit = probeCache(planState, state, result, lKey);
+
+    // if not in the cache, we bind the arguments to the function
+    if (!lCacheHit)
     {
       const std::vector<ArgVarRefs>& argsRefs = theUDF->getArgVarsRefs();
-      std::vector<ArgVarRefs>::const_iterator argsRefsIte = argsRefs.begin();
-      std::vector<ArgVarRefs>::const_iterator argsRefsEnd = argsRefs.end();
-
-      std::vector<store::Iterator_t>::iterator argWrapsIte = 
-      state->theArgWrappers.begin();
-
-      for (; argsRefsIte != argsRefsEnd; ++argsRefsIte, ++argWrapsIte)
+      const std::vector<store::Iterator_t>& argWraps = state->theArgWrappers;
+
+      for (size_t i = 0; i < argsRefs.size(); ++i)
       {
-        store::Iterator_t& argWrapper = (*argWrapsIte);
-        const ArgVarRefs& argVarRefs = (*argsRefsIte);
+        const ArgVarRefs& argVarRefs = argsRefs[i];
+        store::Iterator_t argWrapper;
+        if (state->theCache)
+        {
+          std::vector<store::Item_t> lParam(1, lKey[i]);
+          state->theArgValues.push_back(GENV_STORE.createTempSeq(lParam));
+          argWrapper = state->theArgValues.back()->getIterator();
+          argWrapper->open();
+        }
+        else
+        {
+          argWrapper = argWraps[i];
+        }
+
         ArgVarRefs::const_iterator argVarRefsIte = argVarRefs.begin();
         ArgVarRefs::const_iterator argVarRefsEnd = argVarRefs.end();
 
@@ -343,26 +478,35 @@
       }
     }
 
-#ifdef ZORBA_WITH_DEBUGGER
-    DEBUGGER_PUSH_FRAME;
-#endif
-
-    while (consumeNext(result, state->thePlan, *state->thePlanState))
+    if (lCacheHit)
     {
-#ifdef ZORBA_WITH_DEBUGGER
-      DEBUGGER_POP_FRAME;
-#endif
       STACK_PUSH(true, state);
-
+    }
+    else
+    {
 #ifdef ZORBA_WITH_DEBUGGER
       DEBUGGER_PUSH_FRAME;
 #endif
+      while (consumeNext(result, state->thePlan, *state->thePlanState))
+      {
+#ifdef ZORBA_WITH_DEBUGGER
+        DEBUGGER_POP_FRAME;
+#endif
+
+        insertCacheEntry(state, lKey, result);
+        STACK_PUSH(true, state);
+
+#ifdef ZORBA_WITH_DEBUGGER
+        DEBUGGER_PUSH_FRAME;
+#endif
+      }
+
+#ifdef ZORBA_WITH_DEBUGGER
+      DEBUGGER_POP_FRAME;
+#endif
+
     }
 
-#ifdef ZORBA_WITH_DEBUGGER
-    DEBUGGER_POP_FRAME;
-#endif
-
     STACK_END(state);
 
   }

=== modified file 'src/runtime/core/fncall_iterator.h'
--- src/runtime/core/fncall_iterator.h	2011-10-30 00:18:34 +0000
+++ src/runtime/core/fncall_iterator.h	2011-12-13 03:41:26 +0000
@@ -69,16 +69,31 @@
   the body. So, it is never the case that the arg expr will have more than one 
   consumers, and as a result we can bind all those V references to the same arg
   wrapper.
+
+  theCache:
+  ---------
+  Is an Index which is set in the state if caching for the invoked function
+  should be done. The cache is owned by the UDF itself and shared across
+  all function invocations.
+
+  theCacheHits:
+  -------------
+  If caching is used, this vector contains the results of all arguments
+  of the function evaluation. It's used to bind the variables if the
+  cache didn't give a result in order to avoid duplicate evaluation of
+  the arguments.
 ********************************************************************************/
 class UDFunctionCallIteratorState : public PlanIteratorState 
 {
 public:
-  PlanIterator                 * thePlan;
-  PlanState                    * thePlanState;
-  uint32_t                       thePlanStateSize;
-  dynamic_context              * theLocalDCtx;
-  bool                           thePlanOpen;
-  std::vector<store::Iterator_t> theArgWrappers;
+  PlanIterator                   * thePlan;
+  PlanState                      * thePlanState;
+  uint32_t                         thePlanStateSize;
+  dynamic_context                * theLocalDCtx;
+  bool                             thePlanOpen;
+  std::vector<store::Iterator_t>   theArgWrappers;
+  store::Index                   * theCache;
+  std::vector<store::TempSeq_t>    theArgValues;
 
   UDFunctionCallIteratorState();
 
@@ -130,6 +145,8 @@
 
   void setDynamic() { theIsDynamic = true; }
 
+  bool isCached() const;
+
   void accept(PlanIterVisitor& v) const;
 
   void openImpl(PlanState& planState, uint32_t& offset);
@@ -139,6 +156,22 @@
   void closeImpl(PlanState& planState);
 
   bool nextImpl(store::Item_t& result, PlanState& planState) const;
+
+protected:
+  void createCache(
+    PlanState& planState,
+    UDFunctionCallIteratorState* state);
+
+  bool probeCache(
+    PlanState& planState,
+    UDFunctionCallIteratorState* state,
+    store::Item_t& result,
+    std::vector<store::Item_t>& aKey) const;
+
+  void insertCacheEntry(
+    UDFunctionCallIteratorState* state,
+    std::vector<store::Item_t>& aKey,
+    store::Item_t& aValue) const;
 };
 
 

=== modified file 'src/runtime/visitors/printer_visitor_impl.cpp'
--- src/runtime/visitors/printer_visitor_impl.cpp	2011-12-01 11:02:25 +0000
+++ src/runtime/visitors/printer_visitor_impl.cpp	2011-12-13 03:41:26 +0000
@@ -1209,7 +1209,23 @@
 
 #undef TYPED_VAL_CMP
 
-  PRINTER_VISITOR_DEFINITION (UDFunctionCallIterator)
+  void PrinterVisitor::beginVisit ( const UDFunctionCallIterator& a )
+  {
+    thePrinter.startBeginVisit("UDFunctionCallIterator", ++theId);
+    printCommons(  &a, theId );
+    if (a.isCached())
+    {
+      thePrinter.addAttribute("cached", "true");
+    }
+    thePrinter.endBeginVisit( theId);
+  }
+
+  void PrinterVisitor::endVisit ( const UDFunctionCallIterator& )
+  {
+    thePrinter.startEndVisit();
+    thePrinter.endEndVisit();
+  }
+
   PRINTER_VISITOR_DEFINITION (ExtFunctionCallIterator)
   PRINTER_VISITOR_DEFINITION (FnBooleanIterator)
   PRINTER_VISITOR_DEFINITION (OrIterator)

=== modified file 'src/store/api/index.h'
--- src/store/api/index.h	2011-10-11 12:11:48 +0000
+++ src/store/api/index.h	2011-12-13 03:41:26 +0000
@@ -415,6 +415,18 @@
    * Returns all keys stored in this index
    */
   virtual KeyIterator_t keys() const = 0;
+
+  /**
+   * Insert the given item in the value set of the given key. If the key is not
+   * in the index already, then the key itself is inserted as well. Return true
+   * if the key was already in the index, false otherwise
+   * The index wil take the ownership of the key if it was not already in the
+   * index.
+   *
+   * @error ZDDY0035 if a key with more than one item is inserted into
+   *  a general index
+   */
+  virtual bool insert(store::IndexKey*& key, store::Item_t& item) = 0;
 };
 
 

=== modified file 'src/store/naive/simple_index_general.cpp'
--- src/store/naive/simple_index_general.cpp	2011-11-15 18:48:54 +0000
+++ src/store/naive/simple_index_general.cpp	2011-12-13 03:41:26 +0000
@@ -236,6 +236,20 @@
 /******************************************************************************
 
 *******************************************************************************/
+bool GeneralIndex::insert(store::IndexKey*& key, store::Item_t& value)
+{
+  if (key->size() != 1)
+  {
+    RAISE_ERROR_NO_LOC(zerr::ZDDY0035_INDEX_GENERAL_INSERT,
+    ERROR_PARAMS(getName()->getStringValue()));
+  }
+  return insert((*key)[0], value);
+}
+
+
+/******************************************************************************
+
+*******************************************************************************/
 bool GeneralIndex::insert(store::Item_t& key, store::Item_t& node)
 {
   bool lossy = false;

=== modified file 'src/store/naive/simple_index_general.h'
--- src/store/naive/simple_index_general.h	2011-11-15 18:48:54 +0000
+++ src/store/naive/simple_index_general.h	2011-12-13 03:41:26 +0000
@@ -174,6 +174,8 @@
 
   bool insert(store::Item_t& key, store::Item_t& node);
 
+  bool insert(store::IndexKey*& key, store::Item_t& value);
+
   virtual bool remove(const store::Item_t& key, store::Item_t& item, bool all) = 0;
 };
 

=== modified file 'src/store/naive/simple_store.cpp'
--- src/store/naive/simple_store.cpp	2011-11-02 17:19:09 +0000
+++ src/store/naive/simple_store.cpp	2011-12-13 03:41:26 +0000
@@ -555,6 +555,9 @@
     store::Iterator* aSourceIter,
     ulong aNumColumns)
 {
+  if (!aSourceIter)
+    return;
+
   store::Item_t domainItem;
   store::IndexKey* key = NULL;
 

=== modified file 'src/store/naive/simple_temp_seq.cpp'
--- src/store/naive/simple_temp_seq.cpp	2011-08-19 17:04:48 +0000
+++ src/store/naive/simple_temp_seq.cpp	2011-12-13 03:41:26 +0000
@@ -53,27 +53,6 @@
 /*******************************************************************************
 
 ********************************************************************************/
-store::Item_t SimpleTempSeq::operator[](xs_integer aIndex) 
-{
-  uint64_t lIndex;
-  try {
-    lIndex = to_xs_unsignedLong(aIndex);
-  } catch (std::range_error& e)
-  {
-    throw ZORBA_EXCEPTION(
-        zerr::ZSTR0060_RANGE_EXCEPTION,
-        ERROR_PARAMS(
-          BUILD_STRING("access out of bounds " << e.what() << ")")
-        )
-      );
-  }
-  return theItems[lIndex];
-}
-
-
-/*******************************************************************************
-
-********************************************************************************/
 xs_integer SimpleTempSeq::getSize() 
 {
   return theItems.size();
@@ -312,14 +291,15 @@
   case none:
     if ( theCurPos < to_xs_unsignedLong(theTempSeq->getSize()) ) 
     {
-      result = (*theTempSeq)[theCurPos];
+      result = theTempSeq->theItems[theCurPos];
       return true;
     }
     break;
+
   case startEnd:
     if ( theCurPos < theEndPos ) 
     {
-      result = (*theTempSeq)[theCurPos];
+      result = theTempSeq->theItems[theCurPos];
       return true;
     }
     break;
@@ -336,17 +316,17 @@
 	{
   case none:
     if ( theCurPos < to_xs_unsignedLong(theTempSeq->getSize()) ) 
-      return (*theTempSeq)[theCurPos];
+      return theTempSeq->theItems[theCurPos];
     break;
   case startEnd:
     if ( theCurPos < theEndPos ) 
-      return (*theTempSeq)[theCurPos];
+      return theTempSeq->theItems[theCurPos];
     break;
   }
 
   return NULL;
 }
-  
+
 
 void SimpleTempSeqIter::reset()
 {

=== modified file 'src/store/naive/simple_temp_seq.h'
--- src/store/naive/simple_temp_seq.h	2011-07-29 23:01:30 +0000
+++ src/store/naive/simple_temp_seq.h	2011-12-13 03:41:26 +0000
@@ -35,6 +35,8 @@
 ********************************************************************************/
 class SimpleTempSeq : public store::TempSeq
 {
+  friend class SimpleTempSeqIter;
+
 private:
   std::vector<store::Item_t> theItems;
 
@@ -55,8 +57,6 @@
 
   void append(store::Iterator_t iter, bool copy);
 
-  store::Item_t operator[](xs_integer aIndex);
-
   void getItem(xs_integer position, store::Item_t& res);
 
   bool containsItem(xs_integer position);
@@ -102,9 +102,9 @@
   SimpleTempSeq_t     theTempSeq;
   BorderType          theBorderType;
 
-  uint64_t               theCurPos;
-  uint64_t               theStartPos;
-  uint64_t               theEndPos;
+  uint64_t            theCurPos;
+  uint64_t            theStartPos;
+  uint64_t            theEndPos;
 
  public:
   SimpleTempSeqIter() {}

=== added file 'test/rbkt/ExpCompilerResults/IterPlan/zorba/udf/udf-fib-rec.iter'
--- test/rbkt/ExpCompilerResults/IterPlan/zorba/udf/udf-fib-rec.iter	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpCompilerResults/IterPlan/zorba/udf/udf-fib-rec.iter	2011-12-13 03:41:26 +0000
@@ -0,0 +1,42 @@
+Iterator tree for main query:
+<UDFunctionCallIterator cached="true">
+  <SingletonIterator value="xs:integer(100)"/>
+</UDFunctionCallIterator>
+
+Iterator tree for local:fib:
+<flwor::FLWORIterator>
+  <ForVariable name="n">
+    <LetVarIterator varname="n"/>
+  </ForVariable>
+  <ReturnClause>
+    <IfThenElseIterator>
+      <TypedValueCompareIterator_INTEGER>
+        <ForVarIterator varname="n"/>
+        <SingletonIterator value="xs:integer(0)"/>
+      </TypedValueCompareIterator_INTEGER>
+      <SingletonIterator value="xs:integer(0)"/>
+      <IfThenElseIterator>
+        <TypedValueCompareIterator_INTEGER>
+          <ForVarIterator varname="n"/>
+          <SingletonIterator value="xs:integer(1)"/>
+        </TypedValueCompareIterator_INTEGER>
+        <SingletonIterator value="xs:integer(1)"/>
+        <SpecificNumArithIterator_AddOperation_INTEGER>
+          <UDFunctionCallIterator cached="true">
+            <SpecificNumArithIterator_SubtractOperation_INTEGER>
+              <ForVarIterator varname="n"/>
+              <SingletonIterator value="xs:integer(1)"/>
+            </SpecificNumArithIterator_SubtractOperation_INTEGER>
+          </UDFunctionCallIterator>
+          <UDFunctionCallIterator cached="true">
+            <SpecificNumArithIterator_SubtractOperation_INTEGER>
+              <ForVarIterator varname="n"/>
+              <SingletonIterator value="xs:integer(2)"/>
+            </SpecificNumArithIterator_SubtractOperation_INTEGER>
+          </UDFunctionCallIterator>
+        </SpecificNumArithIterator_AddOperation_INTEGER>
+      </IfThenElseIterator>
+    </IfThenElseIterator>
+  </ReturnClause>
+</flwor::FLWORIterator>
+

=== added file 'test/rbkt/ExpQueryResults/zorba/udf/udf-fib-rec.xml.res'
--- test/rbkt/ExpQueryResults/zorba/udf/udf-fib-rec.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/udf/udf-fib-rec.xml.res	2011-12-13 03:41:26 +0000
@@ -0,0 +1,1 @@
+3736710778780434371

=== added file 'test/rbkt/Queries/zorba/udf/udf-fib-rec.xq'
--- test/rbkt/Queries/zorba/udf/udf-fib-rec.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/udf/udf-fib-rec.xq	2011-12-13 03:41:26 +0000
@@ -0,0 +1,8 @@
+declare function local:fib($n as xs:integer) as xs:integer
+{
+  if ($n eq 0) then 0 
+  else if ($n eq 1) then 1 
+  else local:fib($n - 1) + local:fib($n - 2)
+};
+
+local:fib(100)

-- 
Mailing list: https://launchpad.net/~zorba-coders
Post to     : zorba-coders@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zorba-coders
More help   : https://help.launchpad.net/ListHelp

Reply via email to