[Monetdb-developers] other optimizer pipelines
Dear M5 developers,

Yesterday I reported a bug regarding the minimal optimizer pipe (ID: 2983773), and, as expected, I was asked to add a test for the issue, which I would like to do. The bug is triggered as follows:

- start an mserver5 instance with the optimizer pipe set to the minimal pipe: "inline,remap,deadcode,multiplex"
- connect with an mclient
- connect again with an mclient

This is obviously very easy to reproduce. But I couldn't find any other test that sets a different optimizer upon startup of the mserver. I found a few tests that change the optimizer within a SQL script, but this will very likely not trigger bug 2983773.

Then I was wondering: there are currently a few different optimizer pipelines defined in monetdb5.conf. Wouldn't it be a good idea to run at least a few (preferably all) tests for each of those optimizer pipelines? I can imagine that testing all optimizer pipelines would take too much time for nightly testing, but running the testweb with different pipelines would probably trigger most of the obvious bugs that are currently found one by one. Bug #2983773 would most probably have been detected too.

Anyway, I would like to add a test, and I guess testing multiple optimizer pipelines won't be high on the priority list. Therefore, could somebody perhaps point out how I can specify in the 'prologue' which optimizer pipeline to use? I did find a Python file "sql/src/test/Connections/Tests/connections.py" which perhaps could be used? What is the preferred way to add a test with a different optimizer path?

Kind regards,
Wouter

___
Monetdb-developers mailing list
Monetdb-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-developers
Re: [Monetdb-developers] Search MonetDB Source - character processing at light speed :-)
Dear Stefan,

Thank you for your interest. Currently, this service is offered as a test, and is for MonetDB source code/developers only. The software behind the service is not publicly available, at least not for now. If there is interest from other (software) projects in having a similar interface to their source code or other data, don't hesitate to contact us.

We are aware of some flaws (for example, the links to the source files are not correct) that we need to fix first. At this point in time we would just like to learn from user experiences, and improve the service where needed.

Kind regards,
Wouter Alink

On Mon, Mar 21, 2011 at 3:56 PM, Stefan de Konink wrote:
> On Mon, 21 Mar 2011, Arjen P. de Vries wrote:
>
>> Feel free to use the suffix array search demo of the MonetDB source
>> tree where you see fit:
>> http://devel.spinque.com/SearchMonetDBSource/
>> The index is refreshed every night!
>
> Is there a nice tutorial on how to set up this index for other projects?
>
> Stefan
[Monetdb-developers] compiling MonetDB4
Hello developers,

I (successfully) compiled buildtools and MonetDB (today's CVS head, after the recent bugfixes by Sjoerd). I then tried to compile MonetDB4, but the following thing bothered me:

- src/tool/embeddedclient.mx refers to Mapi.h, but the compiler cannot find it (and neither can I).

Am I doing something wrong? (My OS is Fedora Core 6.)

Greetings,
Wouter
Re: [Monetdb-developers] multiple XQuery statements in one xq file
A function would be the counterexample:

declare function x() as node { () }; x()

myXQ,
wouter

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Fabian Groffen
Sent: vrijdag 13 april 2007 12:44
To: [EMAIL PROTECTED]
Subject: Re: [Monetdb-developers] multiple XQuery statements in one xq file

On 13-04-2007 12:33:09 +0200, Djoerd Hiemstra wrote:
> Dear Sjoerd or other developers,
>
> Could you please change the "MapiClient -lx" protocol such that the end
> of query does not coincide with end-of-file? We would like to provide
> little XQuery scripts with multiple XQuery statements, but of course I
> cannot put the end-of-file mark into that file without ending the file
> (well, you know what I mean). Any end-of-query marker will do, but ';'

JdbcClient used to use this "statement separator", but in XQuery it is not correct, as ';' is used in XQuery itself. Wouter and Jens can probably easily come up with an example of ';' being incorrect.

> would be preferred I guess, to stay in line with MIL and SQL.
>
> For instance (similar to having multiple SQL insert statements):
>
> pf:add-doc("http://www.utwente.nl/a1.xml", "a1.xml");
> pf:add-doc("http://www.utwente.nl/a2.xml", "a2.xml");
> pf:add-doc("http://www.utwente.nl/a3.xml", "a3.xml");
>
> I know I can do this in one transaction, but I do not want to.

I believe Peter implemented something like <> as separator, but I'm not sure on that one.
Re: [Monetdb-developers] [Monetdb-pf-checkins] pathfinder/tests/StandOff StandOff.py, , 1.10, 1.11
> Was/is it your intention to force all StandOff testing through the Algebra
> back-end (added "-A" option for pf; see below), i.e., ignoring/overruling
> the compile-time default/choice (which is indeed the Algebra back-end) as
> well as the choice on the Mtest.py command line?
>
> If so, why? Does MPS no longer support StandOff (or v.v.)?
>
> If not, you should remove the "-A" switch for pf again.

It was my intention to switch to the Algebra backend, but from the testweb I probably incorrectly assumed that the milprint-summer version was still the default for pf. (I thought I saw only the artists query failing when I looked at the testweb this morning.) I will remove the -A again. The StandOff cases should still work with MPS.

> > - observation: order of attributes seems to have changed in some
> > tests, the test output has been changed accordingly
>
> Serialization in MonetDB/XQuery has no feature (yet?) to enforce a
> particular attribute order.
> The order of attributes is determined only by the very implementation and
> (physical) order of the input data, and can hence change.
> If it happens to differ between the MPS & ALG back-ends for (some) StandOff
> tests, we could consider approving back-end-specific (ALG or MPS) stable
> output for these tests.
> (See `Mtest.py --help` and/or
> http://monetdb.cwi.nl/Development/TestWeb/Mtest/ for details and/or feel
> free to ask for advice/help.)

I assumed (again probably incorrectly) that the milprint-summer version is deprecated. I'll create separate test results.

Thanks for pointing out my errors.
Wouter
Re: [Monetdb-developers] [Monetdb-pf-checkins] pathfinder/tests/StandOff StandOff.py, , 1.10, 1.11
Ah... I figured out why now... I hadn't seen your recent changes to main.c.

Thanks,
Wouter

2008/12/23 Wouter Alink :
>> Was/is it your intention to force all StandOff testing through the Algebra
>> back-end (added "-A" option for pf; see below), i.e., ignoring/overruling
>> the compile-time default/choice (which is indeed the Algebra back-end) as
>> well as the choice on the Mtest.py command line?
>>
>> If so, why? Does MPS no longer support StandOff (or v.v.)?
>>
>> If not, you should remove the "-A" switch for pf again.
>
> It was my intention to switch to the Algebra backend, but from the
> testweb I probably incorrectly assumed that the milprint-summer
> version was still the default for pf. (I thought I saw only the
> artists query failing when I looked at the testweb this morning.) I
> will remove the -A again. The StandOff cases should still work with
> MPS.
>
>> > - observation: order of attributes seems to have changed in some
>> > tests, the test output has been changed accordingly
>>
>> Serialization in MonetDB/XQuery has no feature (yet?) to enforce a
>> particular attribute order.
>> The order of attributes is determined only by the very implementation and
>> (physical) order of the input data, and can hence change.
>> If it happens to differ between the MPS & ALG back-ends for (some) StandOff
>> tests, we could consider approving back-end-specific (ALG or MPS) stable
>> output for these tests.
>> (See `Mtest.py --help` and/or
>> http://monetdb.cwi.nl/Development/TestWeb/Mtest/ for details and/or feel
>> free to ask for advice/help.)
>
> I assumed (again probably incorrectly) that the milprint-summer
> version is deprecated. I'll create separate test results.
>
> Thanks for pointing out my errors.
> Wouter
[Monetdb-developers] XQ: unaligned access
Hello,

I currently get many of the following messages on the Mserver console while shredding a collection of XML documents with yesterday's stable nightly build (using 32-bit OIDs on a 64-bit machine):

Mserver(31123): unaligned access to 0x2000de29726f, ip=0x2133c101
Mserver(31123): unaligned access to 0x2000de1f746e, ip=0x2133c101
Mserver(31123): unaligned access to 0x2000de1f746e, ip=0x2133c0f0
Mserver(31123): unaligned access to 0x20013013696d, ip=0x2133c101
Mserver(31123): unaligned access to 0x20013013696d, ip=0x2133c0f0
Mserver(31123): unaligned access to 0x2000de29726f, ip=0x2133c101

A new one appears every few seconds. There seem to be only 4 or 5 different addresses for which this message appears. Is this a bug, a feature, or debug info? Can it be safely ignored?

Cheers,
Wouter
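For what it's worth, messages of this shape normally mean some code reads a multi-byte value through a pointer that is not naturally aligned for its type; on strict-alignment CPUs the kernel emulates the access and prints such a warning, so the shredding still works but each occurrence is slow. A minimal sketch of the pattern and its portable fix (illustrative C only, not the actual MonetDB code; read_u32_unaligned is a made-up name):

```c
#include <stdint.h>
#include <string.h>

/* Dereferencing a cast like *(uint32_t *)(buf + 1) is what makes a
 * strict-alignment CPU trap; the kernel then fixes the access up and
 * logs an "unaligned access" warning, so the result is correct but
 * slow. memcpy() carries no alignment requirement and compiles to a
 * plain fast load on CPUs that allow unaligned access. */
static uint32_t read_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Whether the shredder can switch to the memcpy() form at the offending spot is for the developers to judge; the ip= values in the messages point at the instruction to look at.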
[Monetdb-developers] mclient mem-usage during --dump
Hello,

Question: is there any reason for mclient to use (large) amounts of memory during a dump of an SQL database?

Syntax used:

$ mclient -lsql -D -dsomedatabase > dump.sql

I observe >12 GB of resident memory use when dumping a 2 GB (in dump text format) database (it grows steadily), using the May2009 stable branch (of last week).

top shows:

28371 walink 16 0 12.2g 12g 2944 R 87 4.0 10:48.58 mclient

I haven't investigated it any further, but I was first of all wondering whether it actually needs these amounts of memory?

Greetings,
Wouter
Re: [Monetdb-developers] mclient mem-usage during --dump
Hello,

I had a look at the code just now, looking for why so much memory was used (I think mclient was using 100 GB of memory in the end). I am not familiar with the Mapi client, but perhaps the following diff is a solution?

Index: src/mapiclient/MapiClient.mx
===================================================================
RCS file: /cvsroot/monetdb/clients/src/mapiclient/MapiClient.mx,v
retrieving revision 1.141
diff -u -r1.141 MapiClient.mx
--- src/mapiclient/MapiClient.mx	19 May 2009 12:02:59 -	1.141
+++ src/mapiclient/MapiClient.mx	27 May 2009 22:25:24 -
@@ -2048,7 +2048,7 @@
 		fprintf(stderr, "%s\n", mapi_error_str(mid));
 		exit(2);
 	}
-	mapi_cache_limit(mid, -1);
+	/* mapi_cache_limit(mid, -1); */
 	if (dump) {
 		if (mode == SQL) {
 			dump_tables(mid, toConsole, 0);

This seems to work for me (at least mclient's memory consumption now remains constant), but I can't oversee the consequences. Could somebody perhaps say something sensible about it?

Reasoning behind it: this call to mapi_cache_limit makes rowlimit == -1, and this, together with cacheall == 0, makes mapi_extend_cache (in Mapi.mx) allocate more memory each time it is called (so the cache becomes as large as the largest table). Without the call "mapi_cache_limit(mid, -1);" the default rowlimit is 100 lines, so with this change the cache gets flushed every 100 lines.

I think I should have filed a bug :)
Wouter

P.S. While investigating this issue I tried to limit the amount of memory that mclient would get using "ulimit -v $((256*1024))". This revealed that there are a number of places in Mapi.mx where a (m)alloc call goes unchecked. I don't know the MonetDB coding policy here, but perhaps they should all at least have an accompanying assert? The following one-liner in the clients package reveals some issues:

$ grep "alloc(" -A2 src/mapilib/Mapi.mx

2009/5/25 Wouter Alink :
> Hello,
>
> Question: is there any reason for mclient to use (large) amounts of
> memory during a dump of an SQL database?
>
> Syntax used:
> $ mclient -lsql -D -dsomedatabase > dump.sql
>
> I observe >12 GB of resident memory use when dumping a 2 GB (in dump
> text format) database (it grows steadily), using the May2009 stable
> branch (of last week).
>
> top shows:
> 28371 walink 16 0 12.2g 12g 2944 R 87 4.0 10:48.58 mclient
>
> I haven't investigated it any further, but I was first of all
> wondering whether it actually needs these amounts of memory?
>
> Greetings,
> Wouter
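For illustration (not Mapi code; the struct and function names below are made up), the difference between the two settings can be modelled by a toy row cache that either flushes every `limit` rows or never flushes, while doubling its backing store on demand as mapi_extend_cache does:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the two caching policies discussed above; 'rowcache'
 * and 'cache_add_row' are illustrative names, not the Mapi API. */
struct rowcache {
    long limit;    /* flush after this many rows; < 0 means never flush */
    long nrows;    /* rows currently buffered */
    long capacity; /* allocated slots, doubled on demand */
    long peak;     /* high-water mark of the allocation */
};

static void cache_add_row(struct rowcache *c)
{
    if (c->limit >= 0 && c->nrows == c->limit)
        c->nrows = 0;                        /* flush rows to the caller */
    if (c->nrows == c->capacity)             /* grow the backing store */
        c->capacity = c->capacity ? c->capacity * 2 : 64;
    c->nrows++;
    if (c->capacity > c->peak)
        c->peak = c->capacity;
}
```

Feeding a million rows through both policies keeps the bounded cache's high-water mark at 128 slots while the unbounded one grows past a million slots — the behaviour the patch above switches between.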
Re: [Monetdb-developers] [ monetdb-Bugs-2787825 ] mclient: stdin + statement
The bug (#2787825) seems to be closed for comments, but I think this bug should not be closed until the documentation gets updated.

Typing "man mclient" on the stable branch tells me:

--statement=stmt (-s stmt)
Execute the specified query. The query is run before any queries from files specified on the command line are run, and before the interactive session is started (if the --interactive option is given).

This is not in line with Martin's latest comment. Martin, could you re-open the bug (I don't have the permissions to do so)?

Greetings,
Wouter

2009/7/20 SourceForge.net :
> Bugs item #2787825, was opened at 2009-05-06 14:21
> Message generated for change (Comment added) made by mlkersten
> You can respond by visiting:
> https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2787825&group_id=56967
>
> Please note that this message will contain a full copy of the comment thread,
> including the initial issue submission, for this request,
> not just the latest update.
> Category: Mapi
> Group: Clients CVS Head
>>Status: Closed
>>Resolution: Wont Fix
> Priority: 5
> Private: No
> Submitted By: Wouter Alink (vzzzbx)
> Assigned to: Nobody/Anonymous (nobody)
> Summary: mclient: stdin + statement
>
> Initial Comment:
> It seems that there is a problem with both providing data via stdin and via
> the -s feature in mclient (see example below).
>
> A possible solution could perhaps be to forbid this use. Another solution
> would be to define a behaviour: either read the '-s' first or the stdin
> first.
(perhaps this already is the case, but I couldn't find any > documentation about it) > > $ cat data.dat > 1 > 2 > 3 > 4 > 5 > $ N=4; head -n $N data.dat | mclient -lsql -p50151 -dtest -s "copy $N records > into aap from STDIN;" > MAPI = mone...@localhost:50151 > QUERY = copy 4 records into aap from STDIN; > ERROR = !SQLException:sql:value ';' while parsing ';' from line 0 field 0 not > inserted, expecting type int > !SQLException:importTable:failed to import table > > > -- > >>Comment By: Martin Kersten (mlkersten) > Date: 2009-07-20 21:48 > > Message: > Standard input is ignored in combination with -s. > Closing it as at best it can be considered a niche feature request. > > -- > > Comment By: Wouter Alink (vzzzbx) > Date: 2009-05-07 21:55 > > Message: > as discussed on the monetdb-users list, using either the -s _or_ the stdin > works fine (except for other reported/unreported bugs), but the combination > fails. (stefan's example works fine). > > I can very well imagine that using a combination should not be allowed > (and should not even become a feature request), but I feel that the current > message is not very helpful. > > And, actually (I hadn't thought of this option before), if I would have > specified "-i" then the documentation (mclient --help) says it reads from > stdin _after_ reading the command line args, but it generates the same > error. > > After some more tests I discovered that: > - when using the command line args + stdin + mentioning '-i', the > semi-colon after "copy $N records into aap from STDIN;" should be left out, > so the following does work: > > $ echo "1 > 2 > 3 > 4 > 5" | mclient -lsql -dtest -hskadi -p50151 -i -s "COPY 5 RECORDS INTO aap > FROM STDIN" > > (notice the omission of ';' after the COPY statement) > > If I do exactly the same, but leave out the '-i', no error is displayed, > but nothing gets inserted either. 
> > If I use only stdin only: > > $ echo "COPY 5 RECORDS INTO aap FROM STDIN; > 1 > 2 > 3 > 4 > 5" | mclient -lsql -dtest -hskadi -p50151 > > then this works (only if the ';' after the COPY statement is present). > > I don't know whether there are two different bugs mentioned in this > explanation, but I think there definitely is something wrong. > > by the way: the create statement for aap is: "CREATE TABLE aap (x int);" > > -- > > Comment By: Stefan Manegold (stmane) > Date: 2009-05-07 19:30 > > Message: > What about: > > { N=4 ; echo "copy $N records into aap from STDIN;" ; head -n $N data.dat > ; } | mclient -lsql -p50151 -dtest > > ? > > > -- > > Comment By: Sjoerd Mullender (sjoerd) > Date:
Re: [Monetdb-developers] [ monetdb-Bugs-2787825 ] mclient: stdin + statement
To get back to the original issue:

$ cat data.dat
1
2
3
4
5
$ N=4; head -n $N data.dat | mclient -lsql -p50151 -dtest -s "copy $N records into aap from STDIN;"

Am I correct that the above is not allowed because it doesn't specify "-i", so it won't read stdin after "-s"? This is indeed what I would expect. Initially I wasn't aware of the "-i" feature; that was the reason for the original request.

But it confuses me that even when specifying "-i" it wouldn't be correct, as the copy command should not be followed by a semicolon? That seems odd to me; why is a semicolon not allowed? Am I missing something?

Wouter

2009/7/23 Sjoerd Mullender :
> Wouter Alink wrote:
>> The bug (#2787825) seems to be closed for comments, but I think this
>> bug should not be closed until the documentation gets updated.
>>
>> Typing "man mclient" on the stable branch tells me:
>>
>> --statement=stmt (-s stmt)
>> Execute the specified query. The query is run before any queries
>> from files specified on the command line are run, and before the
>> interactive session is started (if the --interactive option is given).
>>
>> This is not in line with Martin's latest comment. Martin, could you
>> re-open the bug (I don't have the permissions to do so)?
>
> Standard input is not ignored if the -i (--interactive) flag is passed.
> However, you cannot start a query with -s and finish it from stdin,
> which is what you originally wanted. And I don't see in the
> documentation that you can. If you see it, please point it out.
>
> As far as I can see, the text you quoted above is correct.
> >> Greetings, >> Wouter >> >> >> >> 2009/7/20 SourceForge.net : >>> Bugs item #2787825, was opened at 2009-05-06 14:21 >>> Message generated for change (Comment added) made by mlkersten >>> You can respond by visiting: >>> https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2787825&group_id=56967 >>> >>> Please note that this message will contain a full copy of the comment >>> thread, >>> including the initial issue submission, for this request, >>> not just the latest update. >>> Category: Mapi >>> Group: Clients CVS Head >>>> Status: Closed >>>> Resolution: Wont Fix >>> Priority: 5 >>> Private: No >>> Submitted By: Wouter Alink (vzzzbx) >>> Assigned to: Nobody/Anonymous (nobody) >>> Summary: mclient: stdin + statement >>> >>> Initial Comment: >>> It seems that there is a problem with both providing data via stdin and >>> via the -s feature in mclient. (see example below). >>> >>> A possible solution could perhaps be to forbid this use. Another solution >>> would be to define a behaviour: either read the '-s' first or the stdin >>> first. (perhaps this already is the case, but I couldn't find any >>> documentation about it) >>> >>> $ cat data.dat >>> 1 >>> 2 >>> 3 >>> 4 >>> 5 >>> $ N=4; head -n $N data.dat | mclient -lsql -p50151 -dtest -s "copy $N >>> records into aap from STDIN;" >>> MAPI = mone...@localhost:50151 >>> QUERY = copy 4 records into aap from STDIN; >>> ERROR = !SQLException:sql:value ';' while parsing ';' from line 0 field 0 >>> not inserted, expecting type int >>> !SQLException:importTable:failed to import table >>> >>> >>> -- >>> >>>> Comment By: Martin Kersten (mlkersten) >>> Date: 2009-07-20 21:48 >>> >>> Message: >>> Standard input is ignored in combination with -s. >>> Closing it as at best it can be considered a niche feature request. 
>>> >>> -- >>> >>> Comment By: Wouter Alink (vzzzbx) >>> Date: 2009-05-07 21:55 >>> >>> Message: >>> as discussed on the monetdb-users list, using either the -s _or_ the stdin >>> works fine (except for other reported/unreported bugs), but the combination >>> fails. (stefan's example works fine). >>> >>> I can very well imagine that using a combination should not be allowed >>> (and should not even become a feature request), but I feel that the current >>> message is not very helpful. >>> >>> And, actually (I hadn't thought of this o
[Monetdb-developers] MonetDB/XQuery: reading XML files from TAR archives
Hello devs,

Roberto and I yesterday discussed that it would be useful to be able to load (compressed) XML collections directly into MonetDB/XQuery. The attached diff provides a new feature for loading multiple XML docs directly from tar files. Usage: "mclient -lxq -C" with the collection name, passing a tarfile via stdin; see the example below.

My question: is this useful enough to make it into MonetDB? And if so, is the current syntax appropriate? Comments are appreciated.

Greetings,
Wouter

$ mkdir xmlfiles
$ echo "" > xmlfiles/aap.xml
$ echo "" > xmlfiles/beer.xml
$ tar cf xmlfiles.tar xmlfiles
$ mclient -lxq -C xmlfiles < xmlfiles.tar
Copying TAR file into collection: 'xmlfiles'
Name: xmlfiles/beer.xml Length: 7
Name: xmlfiles/aap.xml Length: 7
$ echo 'pf:documents("xmlfiles")' | mclient -lxq
xmlfiles/aap.xml, xmlfiles/beer.xml
$

tarpatch.diff Description: Binary data
Re: [Monetdb-developers] MonetDB/XQuery: reading XML files from TAR archives
Hello Djoerd,

Thanks for the feedback. One reason (that I can see) to do it from stdin is so that compression can be used (without having to be aware of it), for example:

bzcat collection.tar.bz2 | mclient -lxq -C collection

But I do agree with you that it would be useful to have an XQuery function too, as not everyone is using the mclient interface.

Greetings,
Wouter

Oh yes, and I forgot CVS does not unify its diffs by default... hereby the unified diff for the clients package. (Sooner or later I will learn to do things right the first time :)

2009/8/27 Djoerd Hiemstra :
> Hi Wouter,
>
> Sounds very useful to me!
> Why is it not simply changed in pf:add-doc(), or put in a new function
> pf:add-archive()?
>
> Best, Djoerd.
>
> Wouter Alink schreef:
>> Hello devs,
>>
>> Roberto and I yesterday discussed that it would be useful to be able
>> to load (compressed) XML collections directly into MonetDB/XQuery.
>> The attached diff provides a new feature for loading multiple XML docs
>> directly from tar files.
>> Usage: "mclient -lxq -C" and pass a tarfile via stdin, see
>> example below.
>>
>> My question: is this useful enough to make it into MonetDB? And if so,
>> is the current syntax appropriate. Comments are appreciated.
>>
>> Greetings,
>> Wouter
>>
>> $ mkdir xmlfiles
>> $ echo "" > xmlfiles/aap.xml
>> $ echo "" > xmlfiles/beer.xml
>> $ tar cf xmlfiles.tar xmlfiles
>> $ mclient -lxq -C xmlfiles < xmlfiles.tar
>> Copying TAR file into collection: 'xmlfiles'
>> Name: xmlfiles/beer.xml Length: 7
>> Name: xmlfiles/aap.xml Length: 7
>> $ echo 'pf:documents("xmlfiles")' | mclient -lxq
>> xmlfiles/aap.xml,
>> xmlfiles/beer.xml
>> $

tarpatch.diff Description: Binary data
[Monetdb-developers] other than bug things...
Hello Developers,

Usually I try to report bugs (and try to find the worst of the system), but just this morning I noticed that my mserver5 instance (Aug2009), which I have been querying intensively over the last days with tens of thousands of reasonably complex queries, sometimes with more than 20 queries in parallel, just passed the 4800 minutes of actual CPU time (= 80 hours of hard work) and is still going strong. I thought it was worth mentioning!

Cheers,
Wouter

P.S. Roberto, actually it's your mserver instance...
[Monetdb-developers] hashjoin and strHash
Dear developers,

I would like to propose a change in GDK and hear opinions. It is about the following issue: in the BATjoin code, if there is no possibility to do a fetch or merge join, a hash join is performed, and a hash table is created for the smallest BAT. The reasons (I could think of) for choosing the smallest BAT for the hash table are that less space is required for the hash table (which in turn causes fewer cache misses when doing a lookup) and also that the hash function used is assumed to be very inexpensive (it needs to be calculated for each item in the large BAT each time a join is performed).

I can see that the hash function can be very efficient for data types without indirection, but I feel that for data types like strings this is a little different in some cases. If a string BAT contains many different values (i.e., is not a BAT containing enumeration values), the hash function is no longer inexpensive (many cache misses), as each call needs to hash a whole (arbitrarily long) string at an arbitrary place in the heap.

Is it perhaps possible to specify that, when a BAT of type 'str' has many different values, a hash table may be built on the large BAT instead of on the small BAT?

Reason that I ask this: I was analysing the costs of a query in which I had a few short strings (26 tuples, 1-column table: varchar) which I wanted to look up in a dictionary (9M tuples, 2-column table: int, varchar):

SELECT a.id FROM longlist AS a JOIN smalllist AS b ON a.strvalue = b.strvalue;

The result is a small list of integers (26 or fewer tuples). This operation currently takes roughly 1.5 seconds for a hot run, mostly due to 9M strHash operations. By applying the patch below, the execution time for a hot run dropped to 0.01 seconds. The performance gain comes from only having to perform strHash on the items in the small BAT once the hash table for the large BAT has been created.

Any suggestions whether such a change is useful?
Which benchmarks will be influenced? I guess this code change is probably not useful for large string BATs with only a few different values, but perhaps a guess could be made how diverse the strings in a BAT are (by taking a sample, or perhaps simply by looking at the ratio batsize/heapsize), and based on that determine whether to build the hash table on the large or the small BAT?

Greetings,
Wouter

Index: src/gdk/gdk_relop.mx
===================================================================
RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_relop.mx,v
retrieving revision 1.167.2.4
diff -u -r1.167.2.4 gdk_relop.mx
--- src/gdk/gdk_relop.mx	20 Nov 2009 13:04:06 -	1.167.2.4
+++ src/gdk/gdk_relop.mx	18 Dec 2009 14:59:13 -
@@ -1232,7 +1232,12 @@
 @- hash join: the bread&butter join of monet
 @c
-	/* Simple rule, always build hash on the smallest */
+	/* Simple rule, always build hash on the smallest,
+	   except when it is a string-join, then we do the opposite */
+	if (swap && rcount < lcount && l->ttype == TYPE_str) {
+		ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate);
+		return BATmirror(BAThashjoin(BATmirror(r), BATmirror(l), estimate));
+	}
 	if (swap && rcount > lcount) {
 		ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate);
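To make the trade-off concrete, here is a toy model (illustrative C, not the GDK API; str_hash and hashjoin are made-up names) that counts hash-function calls for the two build-side policies when the same join runs twice:

```c
#include <stddef.h>

/* Illustrative sketch, not GDK code: count string-hash calls for the
 * two build-side policies discussed above. */
static long hash_calls;

static unsigned str_hash(const char *s)
{
    unsigned h = 5381;
    hash_calls++;                        /* the cost we care about */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* One hash join: hash every build-side key (unless a hash table is
 * already cached on that side), then hash every probe-side key. */
static void hashjoin(const char **build, size_t nbuild, int cached,
                     const char **probe, size_t nprobe)
{
    size_t i;

    if (!cached)
        for (i = 0; i < nbuild; i++)
            (void)str_hash(build[i]);
    for (i = 0; i < nprobe; i++)
        (void)str_hash(probe[i]);
}
```

Per single cold join the call count is the same either way (nbuild + nprobe), matching the argument that the build-side choice only matters for locality; the win in the 1.5 s vs. 0.01 s example comes entirely from reusing the hash table on the 9M-string side in the hot run, since only the 26 small-side strings are hashed then.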
Re: [Monetdb-developers] hashjoin and strHash
Lefteris, you are correct that I meant 'second time the query was run' when I wrote 'hot run'. I see that at the GDK level reuse cannot be estimated. Still, given that current hardware has an abundance of memory, and that strings take up much more storage than a single BUN (so a hash entry is usually relatively small compared to its data), GDK might weigh the additional costs differently. GDK also decides which things to keep in memory and which to throw out, which in turn is also based on reuse.

The costs for performing the initial join are dominated by the strHash function, and building the hash table on the big BAT or the smaller BAT makes (almost) no difference, except for the additional memory use. If a join is performed again on such a big BAT, it will be beneficial to have the hash table in place. What I was hoping for were explanations of situations where it makes no sense to build the hash table on the bigger string BAT, but a good counter-example I haven't seen. In general, I can see, it would not be beneficial if the big BAT is not joined twice, but if it doesn't hurt too much, couldn't it just be the default?

Eventually I would like to be using the SQL layer only. There would be plenty of tables with string columns there, and some will be joined against. Should a MAL optimizer detect that I am about to join two string BATs, and that one BAT is bigger than the other and has many different values, and therefore build a hash table on the bigger one? The MAL optimizer can only guess about my next query (although I agree that it could do a better job at guessing), and calculating heapsize/batsize seems to be an operation that is also difficult to do at the MAL layer.

Is really nobody in favour of changing the behaviour of joining string BATs for large BATs with many different values? Well, then I give up.
Wouter

2009/12/18 Stefan Manegold :
> Hi Wouter,
>
> in the lines of Lefteris' reply:
> for a single join with no hash table present a priori, the number of hash
> function calls is equal to the number of BUNs in both BATs; for each inner
> BUN the function needs to be called to build the hash table, for each outer
> BUN it needs to be called to probe the hash table. Hence, the pure hashing
> costs are independent of which BAT is inner and which outer.
> Given that, the reason to choose the smaller as inner is indeed to increase
> spatial locality (and thus performance) of the inherently random access
> while building and probing the hashtable.
>
> As Lefteris pointed out, the "operational optimization" in GDK is a pure
> peephole optimization dealing only with the very operation at hand. I.e., in
> general it cannot anticipate the benefits of future re-use of efforts, like
> investing in the (more expensive) building of a larger hash table to be able
> to re-use this in several later operations --- which IMHO is independent of
> the data type. Such decisions need to be made at higher levels, either in
> MAL optimizers or in the front-end that generates the MAL plan.
>
> Stefan
>
> On Fri, Dec 18, 2009 at 05:01:07PM +0100, Lefteris wrote:
>> Hi Wouter,
>>
>> funny thing, I had the same exact problem and we were thinking about
>> this issue. The idea here is that this observation for strings might
>> not always be true, and it is a situation that cannot always be
>> determined at the kernel level. Correct me if I am wrong, but your
>> benefit on the query comes because the hash on the large BAT is already
>> there; that's why the second time you get 0.01? You mention a hot run, so
>> I assume the BAT is already there with a hash index, while in the
>> original situation the hash is on the small BAT, so you don't benefit
>> from the hot run. But whether a big BAT of strings is to be used again
>> is unknown at the GDK level.
>> So, I solved the problem by forcing the
>> hash index on the big BAT at a higher level (in Monet5), where it knows
>> something more about the application (in my case an RDF store). Can you
>> do that instead: force the hash index at a higher level for your
>> application? If GDK sees a hash index already there, it will
>> choose that side independent of the size.
>>
>> lefteris
>>
>> On Fri, Dec 18, 2009 at 4:22 PM, Wouter Alink wrote:
>> > Dear developers,
>> >
>> > I would like to propose a change in GDK and hear opinions. It is about
>> > the following issue:
>> >
>> > in the BATjoin code, if there is no possibility to do a fetch or merge
>> > join, a hashjoin is performed. A hashtable is created for the smallest
>> > BAT. The reasons (i could think of) for choosing the smallest BAT for
>> > the hashtable are that less space is required for the h
Re: [Monetdb-developers] Memory use
Hello Guido,

At the end of your COPY INTO transaction, your database will be saved on disk, to give some guarantee that the data is on a sort of non-volatile storage (see also http://en.wikipedia.org/wiki/ACID#Durability). Besides storing the data on disk, MonetDB tries to fully exploit your available (volatile) main memory to answer your queries quickly, and tries to keep as much of the data as possible in main memory (this is managed by MonetDB internally).

There is one way you could trick a DBMS into using memory only: make volatile storage appear as non-volatile storage (for example by creating a RAM disk). You could also use the database in such a way that you never commit a transaction (leave the transaction open, and roll back at the end), although the DBMS might still decide at some point to flush the data to disk. None of these tricks is recommended. For a DBMS (and I think this holds for any proper DBMS) to function correctly, you should provide some non-volatile storage, so that durability can be guaranteed.

Hope this answers your question,
Wouter

2009/12/18 Voornaam Achternaam :
>
> When I try to fill a database with the COPY INTO command, the data will
> (depending on the file used) either go:
> - Fully into Memory.
> - To HDD.
> - A combination of Memory and HDD space.
>
> Is there a way to configure MonetDB so that it always uses Memory only?
>
> Thanks in advance.
> --
> View this message in context:
> http://old.nabble.com/Memory-use-tp26843876p26843876.html
> Sent from the monetdb-developers mailing list archive at Nabble.com.
Re: [Monetdb-developers] [Monetdb-sql-checkins] sql/src/server rel_schema.mx, Feb2010, 1.2, 1.2.2.1
Fantastic!

2010/2/5 Niels Nes :
> Update of /cvsroot/monetdb/sql/src/server
> In directory
> sfp-cvsdas-1.v30.ch3.sourceforge.com:/tmp/cvs-serv25790/src/server
>
> Modified Files:
>      Tag: Feb2010
>         rel_schema.mx
> Log Message:
> fixed bug in handling topn in create table as select with data.
>
> Index: rel_schema.mx
> ===
> RCS file: /cvsroot/monetdb/sql/src/server/rel_schema.mx,v
> retrieving revision 1.2
> retrieving revision 1.2.2.1
> diff -u -d -r1.2 -r1.2.2.1
> --- rel_schema.mx  11 Jan 2010 10:29:17 -  1.2
> +++ rel_schema.mx  5 Feb 2010 10:18:16 -  1.2.2.1
> @@ -127,9 +127,14 @@
>  static char *
>  as_subquery( mvc *sql, sql_table *t, sql_rel *sq, dlist *column_spec )
>  {
> +    sql_rel *r = sq;
> +
> +    if (is_topn(r->op))
> +        r = sq->l;
> +
>      if (column_spec) {
>          dnode *n = column_spec->h;
> -        node *m = sq->exps->h;
> +        node *m = r->exps->h;
>
>          for (; n; n = n->next, m = m->next) {
>              char *cname = n->data.sval;
> @@ -143,7 +148,7 @@
>      } else {
>          node *m;
>
> -        for (m = sq->exps->h; m; m = m->next) {
> +        for (m = r->exps->h; m; m = m->next) {
>              sql_exp *e = m->data;
>              char *cname = exp_name(e);
>              sql_subtype *tp = exp_subtype(e);
>
> --
> The Planet: dedicated and managed hosting, cloud storage, colocation
> Stay online with enterprise data centers and the best network in the business
> Choose flexible plans and management services without long-term contracts
> Personal 24x7 support from experience hosting pros just a phone call away.
> http://p.sf.net/sfu/theplanet-com
> ___
> Monetdb-sql-checkins mailing list
> monetdb-sql-check...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/monetdb-sql-checkins