Matthias Brantner has proposed merging lp:~zorba-coders/zorba/caching into lp:zorba.
Requested reviews: Matthias Brantner (matthias-brantner) Markos Zaharioudakis (markos-za) For more details, see: https://code.launchpad.net/~zorba-coders/zorba/caching/+merge/85423 - automatic caching of recursive, non-sequential, and deterministic functions with atomic parameter and return types - %ann:cache and %ann:no-cache for controlling function result caching -- https://code.launchpad.net/~zorba-coders/zorba/caching/+merge/85423 Your team Zorba Coders is subscribed to branch lp:zorba.
=== modified file 'ChangeLog' --- ChangeLog 2011-12-10 00:10:52 +0000 +++ ChangeLog 2011-12-13 03:41:26 +0000 @@ -1,6 +1,9 @@ Zorba - The XQuery Processor version 2.2 + + * Caching of results for recursive functions with atomic parameter and return types. + * Added %ann:cache and %ann:no-cache to enable or disable caching of results of functions with atomic parameter and return types. * Added index management function to the C++ api's StaticCollectionManager. version 2.1 @@ -77,6 +80,7 @@ * General index cannot be declared as unique if the type of its key is xs:anyAtomicType or xs:untypedAtomic. * Added undo for node revalidation + * Optimization for count(collection()) expressions * Fixed bug #867133 (SWIG PHP build failure on Mac OSX) * Fixed bug #872796 (validate-in-place can interfere with other update primitives) * Fixed bug #872799 (validate-in-place can set incorrect types) === modified file 'doc/zorba/options.dox' --- doc/zorba/options.dox 2011-09-14 06:15:19 +0000 +++ doc/zorba/options.dox 2011-12-13 03:41:26 +0000 @@ -278,6 +278,41 @@ In order to be able to use the value twice, the <tt>string:materialize</tt> function must be used to materialize the entire contents of the file <tt>myfile.txt</tt> in memory. Otherwise, the error zerr:ZSTR0055 is raised. + +\paragraph caching_annotation Caching Results of Functions + +Caching of function results may improve performance when computationally +expensive functions are invoked multiple times with the same arguments. + +If the optimization level is O1 or higher, Zorba automatically caches results +of recursive, deterministic, and non-sequential functions whose parameter and +return types are subtypes of xs:anyAtomicType. Specifically, if such a function +is called more than once with the same arguments, the result of the first call +will be cached and subsequent calls will return the cached value without +re-evaluating the function. + +For example, in the following recursive function computing a fibonacci number, +each result is automatically cached and, hence, dramatically improves performance. + +\include zorba/udf/udf-fib-rec.xq + +Specifically, this optimization reduces the complexity of the function from +O(1.6^n) to O(n). + +In order to explicitly disable function caching, the user can specify the +<tt>%ann:no-cache</tt> annotation. + +In addition, the user can use the <tt>%ann:cache</tt> annotation to cache the +results of functions other than the ones that are automatically cached. However, +this will only work if the function is not updating and its parameter and return +types are subtypes of xs:anyAtomicType; otherwise, Zorba will raise a warning +(zwarn:ZWST0005) and simply ignore the %ann:cache annotation. + +Please note, that explicitly enforcing caching for sequential or nondeterministic +functions might not give the intended result. In such cases, Zorba will raise a +warning (zwarn:ZWST0006) but obey the %ann:cache annotation. + + \paragraph collection_index_annotations Annotations on Collections and Indexes The \ref xqddf uses annotations to assign properties to collections and indexes. === modified file 'include/zorba/pregenerated/diagnostic_list.h' --- include/zorba/pregenerated/diagnostic_list.h 2011-11-15 08:23:20 +0000 +++ include/zorba/pregenerated/diagnostic_list.h 2011-12-13 03:41:26 +0000 @@ -600,6 +600,8 @@ extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0034_INDEX_RANGE_VALUE_PROBE_BAD_KEY_TYPES; +extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0035_INDEX_GENERAL_INSERT; + extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0031_IC_NOT_DECLARED; extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0032_IC_NOT_ACTIVATED; @@ -752,6 +754,10 @@ extern ZORBA_DLL_PUBLIC ZorbaWarningCode ZWST0004_AMBIGUOUS_SEQUENTIAL_FLWOR; +extern ZORBA_DLL_PUBLIC ZorbaWarningCode ZWST0005_CACHING_NOT_POSSIBLE; + +extern ZORBA_DLL_PUBLIC ZorbaWarningCode ZWST0006_CACHING_MIGHT_NOT_BE_INTENDED; + } // namespace zwarn } // namespace zorba #endif /* ZORBA_DIAGNOSTIC_LIST_API_H */ === modified file 'modules/com/zorba-xquery/www/modules/pregenerated/errors.xq' --- modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2011-11-15 08:23:20 +0000 +++ modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2011-12-13 03:41:26 +0000 @@ -501,6 +501,10 @@ (:~ :) +declare variable $zerr:ZDDY0035 as xs:QName := fn:QName($zerr:NS, "zerr:ZDDY0035"); + +(:~ +:) declare variable $zerr:ZDDY0031 as xs:QName := fn:QName($zerr:NS, "zerr:ZDDY0031"); (:~ === modified file 'modules/com/zorba-xquery/www/modules/pregenerated/warnings.xq' --- modules/com/zorba-xquery/www/modules/pregenerated/warnings.xq 2011-11-15 08:10:49 +0000 +++ modules/com/zorba-xquery/www/modules/pregenerated/warnings.xq 2011-12-13 03:41:26 +0000 @@ -53,4 +53,23 @@ (:~ :) -declare variable $zwarn:ZWST0004 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0004"); \ No newline at end of file +declare variable $zwarn:ZWST0004 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0004"); + +(:~ + : + : This warning is raised if the user explicitly enables caching + : of function results (using the %ann:cache annotation) but the function + : is updating or its parameter and return types are not subtypes of + : xs:anyAtomicType. + : +:) +declare variable $zwarn:ZWST0005 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0005"); + +(:~ + : + : This warning is raised if the user explicitly enables caching + : of function results (using the %ann:cache annotation) and the function + : is annotated as sequential or nondeterministic. + : +:) +declare variable $zwarn:ZWST0006 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0006"); \ No newline at end of file === modified file 'src/annotations/annotations.cpp' --- src/annotations/annotations.cpp 2011-11-07 06:32:00 +0000 +++ src/annotations/annotations.cpp 2011-12-13 03:41:26 +0000 @@ -100,6 +100,9 @@ ZANN(streamable, streamable); + ZANN(cache, cache); + ZANN(no-cache, nocache); + // // Zorba annotations - xqddf // @@ -168,6 +171,10 @@ ZANN(zann_nonsequential)); theRuleSet.push_back( + ZANN(zann_cache) | + ZANN(zann_nocache)); + + theRuleSet.push_back( ZANN(fn_private) | ZANN(fn_public)); @@ -433,6 +440,5 @@ } - } /* namespace zorba */ /* vim:set et sw=2 ts=2: */ === modified file 'src/annotations/annotations.h' --- src/annotations/annotations.h 2011-11-01 13:47:10 +0000 +++ src/annotations/annotations.h 2011-12-13 03:41:26 +0000 @@ -55,6 +55,8 @@ zann_nonassignable, zann_sequential, zann_nonsequential, + zann_cache, + zann_nocache, zann_variadic, zann_streamable, zann_unique, === modified file 'src/compiler/codegen/plan_visitor.cpp' --- src/compiler/codegen/plan_visitor.cpp 2011-12-07 20:46:23 +0000 +++ src/compiler/codegen/plan_visitor.cpp 2011-12-13 03:41:26 +0000 @@ -98,6 +98,7 @@ #endif #include "functions/function.h" +#include "functions/udf.h" #include "functions/library.h" #include "types/typeops.h" @@ -2276,7 +2277,7 @@ { CODEGEN_TRACE_OUT(""); - const function* func = v.get_func(); + function* func = v.get_func(); std::vector<PlanIter_t> argv; @@ -2320,45 +2321,12 @@ dynamic_cast<EnclosedIterator*>(iter.getp())->setInUpdateExpr(); } } -#if 0 else if (func->isUdf()) { - const user_function* udf = static_cast<const user_function*>(func); - - if (udf->isExiting()) - { - TypeManager* tm = v.get_type_manager(); - - const xqtref_t& udfType = udf->getSignature().returnType(); - - expr* body = udf->getBody(); - - std::vector<expr*> exitExprs; - ulong numExitExprs; - ulong i; - - body->get_exprs_of_kind(exit_expr_kind, exitExprs); - - for (i = 0; i < numExitExprs; ++i) - { - if (!TypeOps::is_subtype(tm, - *exitExprs[i]->get_return_type(), - *udfType, - loc)) - break; - } - - if (i < numExitExprs) - { - UDFunctionCallIterator* udfIter = - dynamic_cast<UDFunctionCallIterator*>(iter.getp()); - - ZORBA_ASSERT(udfIter != NULL); - udfIter->setCheckType(); - } - } + // need to computeResultCaching here for iterprint to work + user_function* udf = static_cast<user_function*>(func); + udf->computeResultCaching(theCCB->theXQueryDiagnostics); } -#endif } else { === modified file 'src/diagnostics/diagnostic_en.xml' --- src/diagnostics/diagnostic_en.xml 2011-12-07 20:46:23 +0000 +++ src/diagnostics/diagnostic_en.xml 2011-12-13 03:41:26 +0000 @@ -2013,6 +2013,10 @@ <value>"$1": index range-value probe has search keys with incompatible types</value> </diagnostic> + <diagnostic code="ZDDY0035" name="INDEX_GENERAL_INSERT"> + <value>"$1": index inserting more than one key not allowed for general index</value> + </diagnostic> + <diagnostic code="ZDDY0031" name="IC_NOT_DECLARED"> <value>"$1": integrity constraint is not declared</value> </diagnostic> @@ -2323,6 +2327,37 @@ <value>Sequential FLWOR expr may not have the semantics you expect</value> </diagnostic> + <diagnostic code="ZWST0005" name="CACHING_NOT_POSSIBLE"> + <comment> + This warning is raised if the user explicitly enables caching + of function results (using the %ann:cache annotation) but the function + is updating or its parameter and return types are not subtypes of + xs:anyAtomicType. + </comment> + <value>"$1": function caching not possible; $2</value> + <entry key="RETURN_TYPE"> + <value>return type ($3) is not subtype of xs:anyAtomicType</value> + </entry> + <entry key="PARAM_TYPE"> + <value>type of parameter $3 is $4 which is not a subtype of xs:anyAtomicType</value> + </entry> + <entry key="UPDATING"> + <value>function is updating</value> + </entry> + <entry key="VARIADIC"> + <value>function is variadic</value> + </entry> + </diagnostic> + + <diagnostic code="ZWST0006" name="CACHING_MIGHT_NOT_BE_INTENDED"> + <comment> + This warning is raised if the user explicitly enables caching + of function results (using the %ann:cache annotation) and the function + is annotated as sequential or nondeterministic. + </comment> + <value>"$1": function caching might not give the intended result because the function is declared as $2</value> + </diagnostic> + </namespace> <!--////////// Subvalues /////////////////////////////////////////////////--> === modified file 'src/diagnostics/pregenerated/diagnostic_list.cpp' --- src/diagnostics/pregenerated/diagnostic_list.cpp 2011-11-15 08:23:20 +0000 +++ src/diagnostics/pregenerated/diagnostic_list.cpp 2011-12-13 03:41:26 +0000 @@ -879,6 +879,9 @@ ZorbaErrorCode ZDDY0034_INDEX_RANGE_VALUE_PROBE_BAD_KEY_TYPES( "ZDDY0034" ); +ZorbaErrorCode ZDDY0035_INDEX_GENERAL_INSERT( "ZDDY0035" ); + + ZorbaErrorCode ZDDY0031_IC_NOT_DECLARED( "ZDDY0031" ); @@ -1104,6 +1107,12 @@ ZorbaWarningCode ZWST0004_AMBIGUOUS_SEQUENTIAL_FLWOR( "ZWST0004" ); +ZorbaWarningCode ZWST0005_CACHING_NOT_POSSIBLE( "ZWST0005" ); + + +ZorbaWarningCode ZWST0006_CACHING_MIGHT_NOT_BE_INTENDED( "ZWST0006" ); + + } // namespace zwarn } // namespace zorba === modified file 'src/diagnostics/pregenerated/dict_en.cpp' --- src/diagnostics/pregenerated/dict_en.cpp 2011-12-01 16:19:52 +0000 +++ src/diagnostics/pregenerated/dict_en.cpp 2011-12-13 03:41:26 +0000 @@ -298,6 +298,7 @@ { "ZDDY0032", "\"$1\": integrity constraint is not activated" }, { "ZDDY0033", "\"$1\": integrity constraint not met for collection \"$2\"" }, { "ZDDY0034", "\"$1\": index range-value probe has search keys with incompatible types" }, + { "ZDDY0035", "\"$1\": index inserting more than one key not allowed for general index" }, { "ZDST0001", "\"$1\": collection already declared" }, { "ZDST0002", "\"$1\": collection already imported into module \"$2\"" }, { "ZDST0003", "\"$1\": collection declaration not allowed in main module" }, @@ -360,6 +361,8 @@ { "ZWST0002", "\"$1\": unknown or unsupported annotation" }, { "ZWST0003", "\"$1\": function declared sequential, but has non-sequential body" }, { "ZWST0004", "Sequential FLWOR expr may not have the semantics you expect" }, + { "ZWST0005", "\"$1\": function caching not possible; $2" }, + { "ZWST0006", "\"$1\": function caching might not give the intended result because the function is declared as $2" }, { "ZXQD0001", "\"$1\": prefix not declared when calling function \"$2\" from $3" }, { "ZXQD0002", "\"$1\": $2" }, { "ZXQD0003", "inconsistent options to the parse-xml-fragment() function: $1" }, @@ -664,6 +667,10 @@ { "~XUST0002_UDF_2", "\"$2\": function declared updating but body is not updating or vacuous" }, { "~ZDST0060_unknown_localname", "unknown localname ($3)" }, { "~ZDST0060_unknown_namespace", "unknown namespace ($3)" }, + { "~ZWST0005_PARAM_TYPE", "type of parameter $3 is $4 which is not a subtype of xs:anyAtomicType" }, + { "~ZWST0005_RETURN_TYPE", "return type ($3) is not subtype of xs:anyAtomicType" }, + { "~ZWST0005_UPDATING", "function is updating" }, + { "~ZWST0005_VARIADIC", "function is variadic" }, { "~ZXQD0004_NON_NEGATIVE", "given value must be non-negative ($2)" }, { "~ZXQD0004_NOT_WITHIN_RANGE", "not within allowed range ($2)" }, { "~ZXQP0004_TypeOps_is_in_scope_ForFunctionItemTypes", "TypeOps::is_in_scope() for function-item types" }, === modified file 'src/functions/udf.cpp' --- src/functions/udf.cpp 2011-10-18 17:37:41 +0000 +++ src/functions/udf.cpp 2011-12-13 03:41:26 +0000 @@ -26,10 +26,15 @@ #include "compiler/rewriter/framework/rewriter.h" #include "functions/udf.h" +#include "annotations/annotations.h" #include "functions/function_impl.h" +#include "diagnostics/xquery_warning.h" + #include "types/typeops.h" +#include "store/api/index.h" // needed for destruction of the cache + namespace zorba { @@ -54,7 +59,10 @@ theIsExiting(false), theIsLeaf(true), theIsOptimized(false), - thePlanStateSize(0) + thePlanStateSize(0), + theCache(0), + theCacheResults(false), + theCacheComputed(false) { setFlag(FunctionConsts::isUDF); resetFlag(FunctionConsts::isBuiltin); @@ -318,7 +326,7 @@ /******************************************************************************* ********************************************************************************/ - PlanIter_t user_function::getPlan(CompilerCB* ccb, uint32_t& planStateSize) +PlanIter_t user_function::getPlan(CompilerCB* ccb, uint32_t& planStateSize) { if (thePlan == NULL) { @@ -372,9 +380,185 @@ /******************************************************************************* ********************************************************************************/ -CODEGEN_DEF(user_function) -{ - return new UDFunctionCallIterator(aSctx, aLoc, aArgs, this); +store::Index* user_function::getCache() const +{ + return theCache.getp(); +} + + +/******************************************************************************* + +********************************************************************************/ +void user_function::setCache(store::Index* aCache) +{ + theCache = aCache; +} + + +/******************************************************************************* + +********************************************************************************/ +bool user_function::cacheResults() const +{ + return theCacheResults; +} + + +/******************************************************************************* + only cache recursive (non-sequential, non-updating, deterministic) + functions with singleton atomic input and output +********************************************************************************/ +void user_function::computeResultCaching(XQueryDiagnostics* diag) +{ + if (theCacheComputed) + { + return; + } + + struct OnExit + { + private: + bool& theResult; + bool& theCacheComputed; + + public: + OnExit(bool& aResult, bool& aCacheComputed) + : + theResult(aResult), + theCacheComputed(aCacheComputed) {} + + void cache() { theResult = true; } + + ~OnExit() + { + theCacheComputed = true; + } + }; + + // will be destroyed when the function is exited + // set caching to true if cache() is called + OnExit lExit(theCacheResults, theCacheComputed); + + // check necessary conditions + // %ann:cache or not %ann:no-cache + if (theAnnotationList && + theAnnotationList->contains(AnnotationInternal::zann_nocache)) + { + return; + } + + // was the %ann:cache annotation given explicitly by the user + bool lExplicitCacheRequest = + (theAnnotationList ? + theAnnotationList->contains(AnnotationInternal::zann_cache) : + false); + + if (isVariadic()) + { + if (lExplicitCacheRequest) + { + diag->add_warning( + NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE, + WARN_PARAMS(getName()->getStringValue(), ZED(ZWST0005_VARIADIC)), + WARN_LOC(theLoc))); + } + return; + } + + // parameter and return types are subtype of xs:anyAtomicType? + const xqtref_t& lRes = theSignature.returnType(); + TypeManager* tm = lRes->get_manager(); + + if (!TypeOps::is_subtype(tm, + *lRes, + *GENV_TYPESYSTEM.ANY_ATOMIC_TYPE_ONE, + theLoc)) + { + if (lExplicitCacheRequest) + { + diag->add_warning( + NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE, + WARN_PARAMS(getName()->getStringValue(), + ZED(ZWST0005_RETURN_TYPE), + lRes->toString()), + WARN_LOC(theLoc))); + } + return; + } + + csize lArity = theSignature.paramCount(); + for (csize i = 0; i < lArity; ++i) + { + const xqtref_t& lArg = theSignature[i]; + if (!TypeOps::is_subtype(tm, + *lArg, + *GENV_TYPESYSTEM.ANY_ATOMIC_TYPE_ONE, + theLoc)) + { + if (lExplicitCacheRequest) + { + diag->add_warning( + NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE, + WARN_PARAMS(getName()->getStringValue(), + ZED(ZWST0005_PARAM_TYPE), + i+1, + lArg->toString()), + WARN_LOC(theLoc))); + } + return; + } + } + + // function updating? + if (isUpdating()) + { + if (lExplicitCacheRequest) + { + diag->add_warning( + NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE, + WARN_PARAMS(getName()->getStringValue(), ZED(ZWST0005_UPDATING)), + WARN_LOC(theLoc))); + } + return; + } + + if (isSequential() || !isDeterministic()) + { + if (lExplicitCacheRequest) + { + diag->add_warning( + NEW_XQUERY_WARNING(zwarn::ZWST0006_CACHING_MIGHT_NOT_BE_INTENDED, + WARN_PARAMS(getName()->getStringValue(), + (isSequential()?"sequential":"non-deterministic")), + WARN_LOC(theLoc))); + + lExit.cache(); + } + return; + } + + + // optimization is prerequisite before invoking isRecursive + if (!lExplicitCacheRequest && isOptimized() && !isRecursive()) + { + return; + } + + lExit.cache(); +} + + +/******************************************************************************* + +********************************************************************************/ +PlanIter_t user_function::codegen( + CompilerCB* cb, + static_context* sctx, + const QueryLoc& loc, + std::vector<PlanIter_t>& argv, + AnnotationHolder& ann) const +{ + return new UDFunctionCallIterator(sctx, loc, argv, this); } === modified file 'src/functions/udf.h' --- src/functions/udf.h 2011-10-18 17:37:41 +0000 +++ src/functions/udf.h 2011-12-13 03:41:26 +0000 @@ -25,6 +25,12 @@ namespace zorba { + namespace store + { + class Index; + typedef rchandle<Index> Index_t; + } + /******************************************************************************* A udf with params $x1, $x2, ..., $xn and a body_expr is translated into a @@ -78,6 +84,25 @@ references of an arg var, these references are "mutually exclusive", ie, at most one of the references will actually be reached during each particular execution of the body. + + theCache: + --------- + Maps the arg values of an invocation to the result of that invocation. + If an invocation uses the same arg values as a previous invocation, the cached + result is simply returned without re-evaluating the udf. + + theCacheResults: + ---------------- + Tells whether caching should be done for this udf or not. + + theCacheComputed: + ----------------- + Tells whether theCacheResults has been computed already or not. + theCacheResults is computed by the computeResultCaching() method, which is + invoked during codegen every time a udf call is encountered. The same udf may + be invoked multiple times, but the computation of theCacheResults needs to + be done only once. So, during the 1st invcocation of computeResultCaching(), + theCacheComputed is set to true, and subsequent invocations are noops. ********************************************************************************/ class user_function : public function { @@ -102,6 +127,10 @@ uint32_t thePlanStateSize; std::vector<ArgVarRefs> theArgVarsRefs; + store::Index_t theCache; + bool theCacheResults; + bool theCacheComputed; + public: SERIALIZABLE_CLASS(user_function) user_function(::zorba::serialization::Archiver& ar); @@ -160,6 +189,14 @@ const std::vector<ArgVarRefs>& getArgVarsRefs() const; + store::Index* getCache() const; + + void setCache(store::Index* aCache); + + bool cacheResults() const; + + void computeResultCaching(XQueryDiagnostics*); + PlanIter_t codegen( CompilerCB* cb, static_context* sctx, === modified file 'src/runtime/core/fncall_iterator.cpp' --- src/runtime/core/fncall_iterator.cpp 2011-10-30 00:18:34 +0000 +++ src/runtime/core/fncall_iterator.cpp 2011-12-13 03:41:26 +0000 @@ -48,6 +48,11 @@ #include "util/string_util.h" +#include "store/api/index.h" +#include "store/api/store.h" +#include "store/api/iterator_factory.h" +#include "store/api/temp_seq.h" + #ifdef ZORBA_WITH_DEBUGGER #include "debugger/debugger_commons.h" @@ -93,7 +98,8 @@ thePlanState(NULL), thePlanStateSize(0), theLocalDCtx(NULL), - thePlanOpen(false) + thePlanOpen(false), + theCache(0) { } @@ -153,6 +159,8 @@ { thePlan->reset(*thePlanState); } + + theArgValues.clear(); } @@ -198,6 +206,113 @@ /******************************************************************************* ********************************************************************************/ +bool UDFunctionCallIterator::isCached() const +{ + return theUDF->cacheResults(); +} + + +/******************************************************************************* + +********************************************************************************/ +void UDFunctionCallIterator::createCache( + PlanState& planState, + UDFunctionCallIteratorState* state) +{ + store::Index_t lIndex = theUDF->getCache(); + + if (!lIndex && theUDF->cacheResults()) + { + const signature& sig = theUDF->getSignature(); + + csize numArgs = theChildren.size(); + + store::IndexSpecification lSpec; + lSpec.theNumKeyColumns = numArgs; + lSpec.theKeyTypes.resize(numArgs); + lSpec.theCollations.resize(numArgs); + lSpec.theIsTemp = true; + lSpec.theIsUnique = true; + + for (csize i = 0; i < numArgs; ++i) + { + lSpec.theKeyTypes[i] = sig[i]->getBaseBuiltinType()->get_qname().getp(); + } + + lIndex = GENV_STORE.createIndex(theUDF->getName(), lSpec, 0); + + theUDF->setCache(lIndex.getp()); // cache the cache in the function itself + + state->theArgValues.reserve(numArgs); + } + + state->theCache = lIndex.getp(); +} + + +/******************************************************************************* + +********************************************************************************/ +bool UDFunctionCallIterator::probeCache( + PlanState& planState, + UDFunctionCallIteratorState* state, + store::Item_t& result, + std::vector<store::Item_t>& aKey) const +{ + if (!state->theCache) + return false; + + store::IndexCondition_t lCond = + state->theCache->createCondition(store::IndexCondition::POINT_VALUE); + + std::vector<store::Iterator_t>::iterator lIter = state->theArgWrappers.begin(); + + for (; lIter != state->theArgWrappers.end(); ++lIter) + { + store::Iterator_t& argWrapper = (*lIter); + store::Item_t lArg; + if (argWrapper) // might be 0 if argument is not used + { + argWrapper->next(lArg); // guaranteed to have exactly one result + } + aKey.push_back(lArg); + lCond->pushItem(lArg); + } + + store::IndexProbeIterator_t probeIte = + GENV_STORE.getIteratorFactory()->createIndexProbeIterator(state->theCache); + + probeIte->init(lCond); + probeIte->open(); + return probeIte->next(result); +} + + +/******************************************************************************* + +********************************************************************************/ +void UDFunctionCallIterator::insertCacheEntry( + UDFunctionCallIteratorState* state, + std::vector<store::Item_t>& aKey, + store::Item_t& aValue) const +{ + if (state->theCache) + { + std::auto_ptr<store::IndexKey> k(new store::IndexKey()); + store::IndexKey* k2 = k.get(); + k->theItems = aKey; + store::Item_t lTmp = aValue; // insert will eventually transfer the Item_t + if (!state->theCache->insert(k2, lTmp)) + { + k.release(); + } + } +} + + +/******************************************************************************* + +********************************************************************************/ void UDFunctionCallIterator::openImpl(PlanState& planState, uint32_t& offset) { UDFunctionCallIteratorState* state; @@ -229,6 +344,11 @@ // the plan state (but not the state block) and dynamic context. state->open(planState, theUDF); + // if the results of the function should be cached (prereq: atomic in and out) + // this functions stores an index in the dynamic context that contains + // the cached results. The name of the index is the name of the function. + createCache(planState, state); + // Create a wrapper over each subplan that computes an argument expr, if the // associated param is actually used anywhere in the function body. csize numArgs = theChildren.size(); @@ -301,6 +421,9 @@ { try { + std::vector<store::Item_t> lKey; + bool lCacheHit; + UDFunctionCallIteratorState* state; DEFAULT_STACK_INIT(UDFunctionCallIteratorState, state, planState); @@ -314,19 +437,31 @@ state->thePlanOpen = true; } - // Bind the args. + // check if there is a cache and the result is already in the cache + lCacheHit = probeCache(planState, state, result, lKey); + + // if not in the cache, we bind the arguments to the function + if (!lCacheHit) { const std::vector<ArgVarRefs>& argsRefs = theUDF->getArgVarsRefs(); - std::vector<ArgVarRefs>::const_iterator argsRefsIte = argsRefs.begin(); - std::vector<ArgVarRefs>::const_iterator argsRefsEnd = argsRefs.end(); - - std::vector<store::Iterator_t>::iterator argWrapsIte = - state->theArgWrappers.begin(); - - for (; argsRefsIte != argsRefsEnd; ++argsRefsIte, ++argWrapsIte) + const std::vector<store::Iterator_t>& argWraps = state->theArgWrappers; + + for (size_t i = 0; i < argsRefs.size(); ++i) { - store::Iterator_t& argWrapper = (*argWrapsIte); - const ArgVarRefs& argVarRefs = (*argsRefsIte); + const ArgVarRefs& argVarRefs = argsRefs[i]; + store::Iterator_t argWrapper; + if (state->theCache) + { + std::vector<store::Item_t> lParam(1, lKey[i]); + state->theArgValues.push_back(GENV_STORE.createTempSeq(lParam)); + argWrapper = state->theArgValues.back()->getIterator(); + argWrapper->open(); + } + else + { + argWrapper = argWraps[i]; + } + ArgVarRefs::const_iterator argVarRefsIte = argVarRefs.begin(); ArgVarRefs::const_iterator argVarRefsEnd = argVarRefs.end(); @@ -343,26 +478,35 @@ } } -#ifdef ZORBA_WITH_DEBUGGER - DEBUGGER_PUSH_FRAME; -#endif - - while (consumeNext(result, state->thePlan, *state->thePlanState)) + if (lCacheHit) { -#ifdef ZORBA_WITH_DEBUGGER - DEBUGGER_POP_FRAME; -#endif STACK_PUSH(true, state); - + } + else + { #ifdef ZORBA_WITH_DEBUGGER DEBUGGER_PUSH_FRAME; #endif + while (consumeNext(result, state->thePlan, *state->thePlanState)) + { +#ifdef ZORBA_WITH_DEBUGGER + DEBUGGER_POP_FRAME; +#endif + + insertCacheEntry(state, lKey, result); + STACK_PUSH(true, state); + +#ifdef ZORBA_WITH_DEBUGGER + DEBUGGER_PUSH_FRAME; +#endif + } + +#ifdef ZORBA_WITH_DEBUGGER + DEBUGGER_POP_FRAME; +#endif + } -#ifdef ZORBA_WITH_DEBUGGER - DEBUGGER_POP_FRAME; -#endif - STACK_END(state); } === modified file 'src/runtime/core/fncall_iterator.h' --- src/runtime/core/fncall_iterator.h 2011-10-30 00:18:34 +0000 +++ src/runtime/core/fncall_iterator.h 2011-12-13 03:41:26 +0000 @@ -69,16 +69,31 @@ the body. So, it is never the case that the arg expr will have more than one consumers, and as a result we can bind all those V references to the same arg wrapper. + + theCache: + --------- + Is an Index which is set in the state if caching for the invoked function + should be done. The cache is owned by the UDF itself and shared across + all function invocations. + + theCacheHits: + ------------- + If caching is used, this vector contains the results of all arguments + of the function evaluation. It's used to bind the variables if the + cache didn't give a result in order to avoid duplicate evaluation of + the arguments. ********************************************************************************/ class UDFunctionCallIteratorState : public PlanIteratorState { public: - PlanIterator * thePlan; - PlanState * thePlanState; - uint32_t thePlanStateSize; - dynamic_context * theLocalDCtx; - bool thePlanOpen; - std::vector<store::Iterator_t> theArgWrappers; + PlanIterator * thePlan; + PlanState * thePlanState; + uint32_t thePlanStateSize; + dynamic_context * theLocalDCtx; + bool thePlanOpen; + std::vector<store::Iterator_t> theArgWrappers; + store::Index * theCache; + std::vector<store::TempSeq_t> theArgValues; UDFunctionCallIteratorState(); @@ -130,6 +145,8 @@ void setDynamic() { theIsDynamic = true; } + bool isCached() const; + void accept(PlanIterVisitor& v) const; void openImpl(PlanState& planState, uint32_t& offset); @@ -139,6 +156,22 @@ void closeImpl(PlanState& planState); bool nextImpl(store::Item_t& result, PlanState& planState) const; + +protected: + void createCache( + PlanState& planState, + UDFunctionCallIteratorState* state); + + bool probeCache( + PlanState& planState, + UDFunctionCallIteratorState* state, + store::Item_t& result, + std::vector<store::Item_t>& aKey) const; + + void insertCacheEntry( + UDFunctionCallIteratorState* state, + std::vector<store::Item_t>& aKey, + store::Item_t& aValue) const; }; === modified file 'src/runtime/visitors/printer_visitor_impl.cpp' --- src/runtime/visitors/printer_visitor_impl.cpp 2011-12-01 11:02:25 +0000 +++ src/runtime/visitors/printer_visitor_impl.cpp 2011-12-13 03:41:26 +0000 @@ -1209,7 +1209,23 @@ #undef TYPED_VAL_CMP - PRINTER_VISITOR_DEFINITION (UDFunctionCallIterator) + void PrinterVisitor::beginVisit ( const UDFunctionCallIterator& a ) + { + thePrinter.startBeginVisit("UDFunctionCallIterator", ++theId); + printCommons( &a, theId ); + if (a.isCached()) + { + thePrinter.addAttribute("cached", "true"); + } + thePrinter.endBeginVisit( theId); + } + + void PrinterVisitor::endVisit ( const UDFunctionCallIterator& ) + { + thePrinter.startEndVisit(); + thePrinter.endEndVisit(); + } + PRINTER_VISITOR_DEFINITION (ExtFunctionCallIterator) PRINTER_VISITOR_DEFINITION (FnBooleanIterator) PRINTER_VISITOR_DEFINITION (OrIterator) === modified file 'src/store/api/index.h' --- src/store/api/index.h 2011-10-11 12:11:48 +0000 +++ src/store/api/index.h 2011-12-13 03:41:26 +0000 @@ -415,6 +415,18 @@ * Returns all keys stored in this index */ virtual KeyIterator_t keys() const = 0; + + /** + * Insert the given item in the value set of the given key. If the key is not + * in the index already, then the key itself is inserted as well. Return true + * if the key was already in the index, false otherwise + * The index wil take the ownership of the key if it was not already in the + * index. + * + * @error ZDDY0035 if a key with more than one item is inserted into + * a general index + */ + virtual bool insert(store::IndexKey*& key, store::Item_t& item) = 0; }; === modified file 'src/store/naive/simple_index_general.cpp' --- src/store/naive/simple_index_general.cpp 2011-11-15 18:48:54 +0000 +++ src/store/naive/simple_index_general.cpp 2011-12-13 03:41:26 +0000 @@ -236,6 +236,20 @@ /****************************************************************************** *******************************************************************************/ +bool GeneralIndex::insert(store::IndexKey*& key, store::Item_t& value) +{ + if (key->size() != 1) + { + RAISE_ERROR_NO_LOC(zerr::ZDDY0035_INDEX_GENERAL_INSERT, + ERROR_PARAMS(getName()->getStringValue())); + } + return insert((*key)[0], value); +} + + +/****************************************************************************** + +*******************************************************************************/ bool GeneralIndex::insert(store::Item_t& key, store::Item_t& node) { bool lossy = false; === modified file 'src/store/naive/simple_index_general.h' --- src/store/naive/simple_index_general.h 2011-11-15 18:48:54 +0000 +++ src/store/naive/simple_index_general.h 2011-12-13 03:41:26 +0000 @@ -174,6 +174,8 @@ bool insert(store::Item_t& key, store::Item_t& node); + bool insert(store::IndexKey*& key, store::Item_t& value); + virtual bool remove(const store::Item_t& key, store::Item_t& item, bool all) = 0; }; === modified file 'src/store/naive/simple_store.cpp' --- src/store/naive/simple_store.cpp 2011-11-02 17:19:09 +0000 +++ src/store/naive/simple_store.cpp 2011-12-13 03:41:26 +0000 @@ -555,6 +555,9 @@ store::Iterator* aSourceIter, ulong aNumColumns) { + if (!aSourceIter) + return; + store::Item_t domainItem; store::IndexKey* key = NULL; === modified file 'src/store/naive/simple_temp_seq.cpp' --- src/store/naive/simple_temp_seq.cpp 2011-08-19 17:04:48 +0000 +++ src/store/naive/simple_temp_seq.cpp 2011-12-13 03:41:26 +0000 @@ -53,27 +53,6 @@ /******************************************************************************* ********************************************************************************/ -store::Item_t SimpleTempSeq::operator[](xs_integer aIndex) -{ - uint64_t lIndex; - try { - lIndex = to_xs_unsignedLong(aIndex); - } catch (std::range_error& e) - { - throw ZORBA_EXCEPTION( - zerr::ZSTR0060_RANGE_EXCEPTION, - ERROR_PARAMS( - BUILD_STRING("access out of bounds " << e.what() << ")") - ) - ); - } - return theItems[lIndex]; -} - - -/******************************************************************************* - -********************************************************************************/ xs_integer SimpleTempSeq::getSize() { return theItems.size(); @@ -312,14 +291,15 @@ case none: if ( theCurPos < to_xs_unsignedLong(theTempSeq->getSize()) ) { - result = (*theTempSeq)[theCurPos]; + result = theTempSeq->theItems[theCurPos]; return true; } break; + case startEnd: if ( theCurPos < theEndPos ) { - result = (*theTempSeq)[theCurPos]; + result = theTempSeq->theItems[theCurPos]; return true; } break; @@ -336,17 +316,17 @@ { case none: if ( theCurPos < to_xs_unsignedLong(theTempSeq->getSize()) ) - return (*theTempSeq)[theCurPos]; + return theTempSeq->theItems[theCurPos]; break; case startEnd: if ( theCurPos < theEndPos ) - return (*theTempSeq)[theCurPos]; + return theTempSeq->theItems[theCurPos]; break; } return NULL; } - + void SimpleTempSeqIter::reset() { === modified file 'src/store/naive/simple_temp_seq.h' --- src/store/naive/simple_temp_seq.h 2011-07-29 23:01:30 +0000 +++ src/store/naive/simple_temp_seq.h 2011-12-13 03:41:26 +0000 @@ -35,6 +35,8 @@ ********************************************************************************/ class SimpleTempSeq : public store::TempSeq { + friend class SimpleTempSeqIter; + private: std::vector<store::Item_t> theItems; @@ -55,8 +57,6 @@ void append(store::Iterator_t iter, bool copy); - store::Item_t operator[](xs_integer aIndex); - void getItem(xs_integer position, store::Item_t& res); bool containsItem(xs_integer position); @@ -102,9 +102,9 @@ SimpleTempSeq_t theTempSeq; BorderType theBorderType; - uint64_t theCurPos; - uint64_t theStartPos; - uint64_t theEndPos; + uint64_t theCurPos; + uint64_t theStartPos; + uint64_t theEndPos; public: SimpleTempSeqIter() {} === added file 'test/rbkt/ExpCompilerResults/IterPlan/zorba/udf/udf-fib-rec.iter' --- test/rbkt/ExpCompilerResults/IterPlan/zorba/udf/udf-fib-rec.iter 1970-01-01 00:00:00 +0000 +++ test/rbkt/ExpCompilerResults/IterPlan/zorba/udf/udf-fib-rec.iter 2011-12-13 03:41:26 +0000 @@ -0,0 +1,42 @@ +Iterator tree for main query: +<UDFunctionCallIterator cached="true"> + <SingletonIterator value="xs:integer(100)"/> +</UDFunctionCallIterator> + +Iterator tree for local:fib: +<flwor::FLWORIterator> + <ForVariable name="n"> + <LetVarIterator varname="n"/> + </ForVariable> + <ReturnClause> + <IfThenElseIterator> + <TypedValueCompareIterator_INTEGER> + <ForVarIterator varname="n"/> + <SingletonIterator value="xs:integer(0)"/> + </TypedValueCompareIterator_INTEGER> + <SingletonIterator value="xs:integer(0)"/> + <IfThenElseIterator> + <TypedValueCompareIterator_INTEGER> + <ForVarIterator varname="n"/> + <SingletonIterator value="xs:integer(1)"/> + </TypedValueCompareIterator_INTEGER> + <SingletonIterator value="xs:integer(1)"/> + <SpecificNumArithIterator_AddOperation_INTEGER> + <UDFunctionCallIterator cached="true"> + <SpecificNumArithIterator_SubtractOperation_INTEGER> + <ForVarIterator varname="n"/> + <SingletonIterator value="xs:integer(1)"/> + </SpecificNumArithIterator_SubtractOperation_INTEGER> + </UDFunctionCallIterator> + <UDFunctionCallIterator cached="true"> + <SpecificNumArithIterator_SubtractOperation_INTEGER> + <ForVarIterator varname="n"/> + <SingletonIterator value="xs:integer(2)"/> + </SpecificNumArithIterator_SubtractOperation_INTEGER> + </UDFunctionCallIterator> + </SpecificNumArithIterator_AddOperation_INTEGER> + </IfThenElseIterator> + </IfThenElseIterator> + </ReturnClause> +</flwor::FLWORIterator> + === added file 'test/rbkt/ExpQueryResults/zorba/udf/udf-fib-rec.xml.res' --- test/rbkt/ExpQueryResults/zorba/udf/udf-fib-rec.xml.res 1970-01-01 00:00:00 +0000 +++ test/rbkt/ExpQueryResults/zorba/udf/udf-fib-rec.xml.res 2011-12-13 03:41:26 +0000 @@ -0,0 +1,1 @@ +3736710778780434371 === added file 'test/rbkt/Queries/zorba/udf/udf-fib-rec.xq' --- test/rbkt/Queries/zorba/udf/udf-fib-rec.xq 1970-01-01 00:00:00 +0000 +++ test/rbkt/Queries/zorba/udf/udf-fib-rec.xq 2011-12-13 03:41:26 +0000 @@ -0,0 +1,8 @@ +declare function local:fib($n as xs:integer) as xs:integer +{ + if ($n eq 0) then 0 + else if ($n eq 1) then 1 + else local:fib($n - 1) + local:fib($n - 2) +}; + +local:fib(100)
-- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp