@David, Thx for the links. Indeed I thought that Marklogic DB was not free to use ! I will give it a try. You are right: precising maps implementation is important, since it can change a lot performances issues.
Concerning your profile tests for maps, a quick answer. If your numbers are correct, the BaseX map seems to be 10x faster than MarkLogic one for 1000000 insertions, but behave as n log(n) (it is based over Phil Bagwell’s hash tries AFAIK), I will try to profile MarkLogic map more precisely. This might be due to the hash function ? Note that it is interesting to have imperative language performances in mind. A standard C++ std::map takes about 0.5 sec to insert 2^22 (i.e. 4 000 000) on my PC, single thread. As a rule of thumb, a C++ std::map (that is a basic red/black tree AFAIK) should be around 25 x faster for insertion than BaseX one. 2014/1/1 David Lee <[email protected]> > > > @jean-marc > > "@David, indeed, I am a little bit reluctant to use non standardized > tools. Anyhow, I am quite surprised by your sentence : there exists map > containers working in linear access ? Do you have any references to point > out ? > > " > > > > Just for reference, and to disclose my goal is not to push a particular > vendor but to make the point that to make generalized statements about > performance characteristics of generic data types without respect to the > implementation is not wise. > > Having done performance work professionally for decades I can attest this > is a general rule - "It Depends". > > > > > > But for reference here is one data point demonstrating linear access to > both maps and arrays in XQuery (using a vendor extension). > > > > > > Using the developer (free for development use) version of MarkLogic 7.0.1 > > Downloaded from here: > > http://developer.marklogic.com/products > > > On my windows desktop (a fairly high end machine, but not unusual by any > means). > > > > map test: > > using map:map object > > http://docs.marklogic.com/map:map?q=map:map > > > > > > I did not have the code return any results so that the timing was affected > by the result size. > > You are welcome to try for yourself to validate that the optimizer was not > so incredible as to actually optimize away the code (or it would have taken > constant time if it realized the result was always () ). > > > > This was just quickly done in "Query Console" using the built in profiler. > > Slightly better tests would be to invoke this as a stored XQuery module to > avoid the interaction of the profiler and javascript console, > > but you can see within reason the results are linear. > > > > > > Note this also incurs a number to string conversion because map:map keys > are strings. > > also note in this case the map was not pre-sized, it grows on demand. > > > > tested by incrementing $max from 100 to 10000000 in factors of 10 > > > > ------- XQuery code > > declare variable $map := map:map(); > > declare variable $max := 10000000; > > > > for $i in 1 to $max return > > map:put( $map , ""||$i , $i ), > > for $i in 1 to $max > > let $_ := map:get( $map , ""||$i ) > > return () > > > > > > Results > > > > 100 .000/.001 sec - limits of the time precision used > > 1000 .005 sec > > 10000 .047 sec > > 100000 .46 sec > > 1000000 4.8 sec > > 10000000 49.49 sec > > > > > > Array test using json:array > > http://docs.marklogic.com/json:array?q=json:array > > > > Note in this case I do pre-size the array to $max to avoid a resize. Its > a minor optimization but useful. > > > > ------ XQuery Code > > > > declare variable $max := 10000000; > > declare variable $array := json:to-array( () , $max ); > > > > for $i in 1 to $max return > > $array[$i] = $i , > > for $i in 1 to $max > > let $_ := $array[$i] > > return () > > > > --------- Results > > 100 - under timer percision > > 1000 .002 sec > > 10000 .02 sec > > 100000 .23 sec > > 1000000 2.06 sec > > 10000000 20.65 sec > > > > > > > > > > > > > > > > > > > > *From:* jean-marc Mercier [mailto:[email protected]] > *Sent:* Wednesday, January 01, 2014 11:05 AM > *To:* David Lee > *Cc:* Adam Retter; [email protected]; Andrew Welch; ihe onwuka; Michael > Sokolov > > *Subject:* Re: [xquery-talk] Matrix Multiplication > > > > @David, indeed, I am a little bit reluctant to use non standardized tools. > Anyhow, I am quite surprised by your sentence : there exists map containers > working in linear access ? Do you have any references to point out ? > > > > 2014/1/1 David Lee <[email protected]> > > On 31 Dec 2013 17:03, "jean-marc Mercier" <[email protected]> > wrote: > > @David pairs are also basically needed to write a linear algebra modulus, > the topic of this thread. And XQUERY don't provide any efficient pair. You > can't use Marklogic map, or any other vendor map to store vectors for > performance issues (a map is really slow). > > http://x-query.com/mailman/listinfo/talk > > *[DAL:] * > > *Just an FYI but MarkLogic also have native arrays (in addition to maps) > which are extremely efficient (they are stored internally as C arrays).* > > *But if you have to do a lot of iterating through the arrays - even > though the accessors are very efficient ,* > > *the surrounding FLOWR code is still XQuery which slows things down a > bit. Maybe or maybe not enough to make them not useful for you.* > > *Also the marklogic maps are not the same as XQuery 3 implemented maps, > they are a hash map under the hood and typically linear access.* > > > > *But back to your original issue, these are vendor extensions and hence > not portable to other implementations (Until XQuery itself * > > *standardizes on arrays and maps - then it is likely that vendor extension > implementations will be used to expose the standard interface).* > > *If what you want is pure XQuery ... it doesnt matter how fast these are > if you cant use them because they dont exist in all implementations.* > > > > > > *-David* > > > > > > > > >
_______________________________________________ [email protected] http://x-query.com/mailman/listinfo/talk
