@David,

Thanks for the links. Indeed, I thought that the MarkLogic DB was not free
to use! I will give it a try. You are right: being precise about the map
implementation is important, since it can change performance considerably.

Concerning your profiling tests for maps, a quick answer. If your numbers
are correct, the BaseX map seems to be about 10x faster than the MarkLogic
one for 1,000,000 insertions, but it behaves as n·log(n) (it is based on
Phil Bagwell's hash tries, AFAIK). I will try to profile the MarkLogic map
more precisely; the difference might be due to the hash function. Note that
it is interesting to keep imperative-language performance in mind: a
standard C++ std::map takes about 0.5 sec to insert 2^22 (i.e. roughly
4,000,000) entries on my PC, single-threaded. As a rule of thumb, a C++
std::map (which is a basic red/black tree, AFAIK) should be around 25x
faster for insertion than the BaseX one.
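
For reference, below is a minimal sketch of the kind of single-threaded
std::map micro-benchmark I have in mind; the integer key type, ordered
insertion, and std::chrono timing are my own illustrative choices, not a
definitive setup.

-------   C++ code (sketch)

// Minimal single-threaded std::map insertion micro-benchmark.
// Inserts 2^22 integer keys and prints the elapsed wall-clock time.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>

int main() {
    const std::uint64_t n = std::uint64_t(1) << 22;  // 4 194 304 insertions
    std::map<std::uint64_t, std::uint64_t> m;        // red/black tree in practice

    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i) {
        m.emplace(i, i);                             // O(log n) per insertion
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = stop - start;
    std::cout << "inserted " << m.size() << " entries in "
              << elapsed.count() << " s\n";
}

Swapping std::map for std::unordered_map in the same loop gives amortized
constant-time insertions (roughly linear total time over n operations),
which is the kind of behaviour one would hope for from a hash-based map.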





2014/1/1 David Lee <[email protected]>

>
>
> @jean-marc
>
>    "@David, indeed, I am a little bit reluctant to use non standardized
> tools. Anyhow, I am quite surprised by your sentence : there exists map
> containers working in linear access ? Do you have any references to point
> out ?
>
> "
>
>
>
> Just for reference, and to be clear that my goal is not to push a particular
> vendor but to make the point that making generalized statements about the
> performance characteristics of generic data types, without respect to the
> implementation, is not wise.
>
> Having done performance work professionally for decades, I can attest that
> the general rule is: "It Depends".
>
>
>
>
>
> But for reference, here is one data point demonstrating linear access to
> both maps and arrays in XQuery (using a vendor extension).
>
>
>
>
>
> Using the developer (free for development use) version of MarkLogic 7.0.1
>
> Downloaded from here:
>
> http://developer.marklogic.com/products
>
>
> On my Windows desktop (a fairly high-end machine, but not unusual by any
> means).
>
>
>
> map test:
>
> using map:map object
>
> http://docs.marklogic.com/map:map?q=map:map
>
>
>
>
>
> I did not have the code return any results, so that the timing was not
> affected by the result size.
>
> You are welcome to try it for yourself to validate that the optimizer was
> not so clever as to actually optimize away the code (it would have taken
> constant time if it had realized the result was always ()).
>
>
>
> This was just quickly done in "Query Console" using the built-in profiler.
>
> A slightly better test would be to invoke this as a stored XQuery module,
> to avoid the interaction of the profiler and the JavaScript console,
>
> but you can see that, within reason, the results are linear.
>
>
>
>
>
> Note this also incurs a number-to-string conversion, because map:map keys
> are strings.
>
> Also note that in this case the map was not pre-sized; it grows on demand.
>
>
>
> Tested by increasing $max from 100 to 10000000 in factors of 10.
>
>
>
> -------   XQuery code
>
> declare variable $map := map:map();
> declare variable $max := 10000000;
>
> for $i in 1 to $max return
>    map:put( $map , ""||$i , $i ),
> for $i in 1 to $max
>    let $_ :=  map:get( $map , ""||$i )
>    return ()
>
>
>
>
>
> Results ($max vs. elapsed time)
>
> 100        .000/.001 sec - limits of the time precision used
> 1000       .005 sec
> 10000      .047 sec
> 100000     .46  sec
> 1000000   4.8  sec
> 10000000 49.49 sec
>
>
>
>
>
> Array test using json:array
>
> http://docs.marklogic.com/json:array?q=json:array
>
>
>
> Note in this case I do pre-size the array to $max to avoid a resize. It's
> a minor optimization, but useful.
>
>
>
> ------ XQuery Code
>
> declare variable $max := 10000000;
> declare variable $array := json:to-array( () , $max );
>
> for $i in 1 to $max return
>    $array[$i] = $i ,
> for $i in 1 to $max
>    let $_ :=  $array[$i]
>    return ()
>
>
>
> ---------  Results ($max vs. elapsed time)
>
> 100        - under timer precision
> 1000       .002 sec
> 10000      .02  sec
> 100000     .23  sec
> 1000000   2.06  sec
> 10000000 20.65  sec
>
> *From:* jean-marc Mercier [mailto:[email protected]]
> *Sent:* Wednesday, January 01, 2014 11:05 AM
> *To:* David Lee
> *Cc:* Adam Retter; [email protected]; Andrew Welch; ihe onwuka; Michael
> Sokolov
>
> *Subject:* Re: [xquery-talk] Matrix Multiplication
>
>
>
> @David, indeed, I am a little bit reluctant to use non standardized tools.
> Anyhow, I am quite surprised by your sentence : there exists map containers
> working in linear access ? Do you have any references to point out ?
>
>
>
> 2014/1/1 David Lee <[email protected]>
>
> On 31 Dec 2013 17:03, "jean-marc Mercier" <[email protected]>
> wrote:
>
> @David, pairs are also basically needed to write a linear algebra module,
> the topic of this thread. And XQuery doesn't provide any efficient pair.
> You can't use the MarkLogic map, or any other vendor map, to store
> vectors, for performance reasons (a map is really slow).
>
>
> *[DAL:] *
>
> *Just an FYI, but MarkLogic also has native arrays (in addition to maps)
> which are extremely efficient (they are stored internally as C arrays).*
>
> *But if you have to do a lot of iterating through the arrays - even
> though the accessors are very efficient - the surrounding FLWOR code is
> still XQuery, which slows things down a bit. Maybe or maybe not enough to
> make them not useful for you.*
>
> *Also, the MarkLogic maps are not the same as the XQuery 3 maps; they are
> a hash map under the hood, with typically linear access.*
>
>
>
> *But back to your original issue: these are vendor extensions and hence
> not portable to other implementations (until XQuery itself standardizes
> on arrays and maps - then it is likely that the vendor extension
> implementations will be used to expose the standard interface).*
>
> *If what you want is pure XQuery ... it doesn't matter how fast these are
> if you can't use them, because they don't exist in all implementations.*
>
>
>
>
>
> *-David*
>
>
>
>
>
>
>
>
>