[Monetdb-developers] other optimizer pipelines

2010-04-09 Thread Wouter Alink
Dear M5 developers,

Yesterday I reported a bug regarding the minimal optimizer pipe (ID:
2983773). And, as expected, I was asked to add a test for the issue,
which I would like to do.

The bug is triggered as follows:
- start an mserver5 instance with the optimizer pipe set to the
minimal pipe: "inline,remap,deadcode,multiplex"
- connect with an mclient
- connect again with an mclient

This is obviously very easy to reproduce.
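
A minimal sketch of these steps as command lines, assuming the pipeline can be selected with "--set sql_optimizer=..." when starting mserver5. The option name, the database name "demo", and the helper names are assumptions for illustration and are not taken from the existing test suite. The sketch only builds the commands, so it can be inspected without a running server.

```python
import shlex

MINIMAL_PIPE = "inline,remap,deadcode,multiplex"


def server_command(pipeline, dbname="demo", option="sql_optimizer"):
    """Build an mserver5 command line that selects an optimizer pipeline.

    The option name and the database name are assumptions for illustration.
    """
    return ["mserver5", "--dbname=" + dbname, "--set", option + "=" + pipeline]


def client_commands(dbname="demo", connections=2):
    """Build one mclient command per connection; the report needs two."""
    return [["mclient", "-lsql", "-d" + dbname] for _ in range(connections)]


if __name__ == "__main__":
    print(shlex.join(server_command(MINIMAL_PIPE)))
    for command in client_commands():
        print(shlex.join(command))
```

A test script could start the first command in the background and then run the two client commands one after the other.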

However, I couldn't find any other test that sets a different optimizer
upon startup of the mserver. I found a few tests that change the
optimizer within an SQL script, but this will very likely not trigger
the bug (ID: 2983773).

Then I was wondering: there are currently a few different optimizer
pipelines defined in monetdb5.conf. Wouldn't it be a good idea to run at
least a few (preferably all) tests for each of those optimizer
pipelines? I can imagine that testing all optimizer pipelines would
take too much time for nightly testing, but running the testweb with
different pipelines would probably trigger most of the obvious bugs that
are currently found one by one. Bug #2983773 would most probably have
been detected too.

Anyway, I would like to add a test, and I guess testing multiple
optimizer pipelines won't be high on the priority list. Therefore,
could somebody perhaps point out how I could specify in the 'prologue'
which optimizer pipeline to use? I did find a python file
"sql/src/test/Connections/Tests/connections.py" which perhaps could be
used? What is the preferred way of adding a test with a different
optimizer pipeline?

Kind regards,
Wouter

___
Monetdb-developers mailing list
Monetdb-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-developers


Re: [Monetdb-developers] Search MonetDB Source - character processing at light speed :-)

2011-03-21 Thread Wouter Alink
Dear Stefan,

Thank you for your interest. Currently, this service is offered as a
test, and is for MonetDB source code and developers only. The software
behind the service is not publicly available, at least not for now. If
there is interest from other (software) projects to have a similar
interface to their source code or other data, don't hesitate to
contact us. We are aware of some flaws (for example, the links to the
source files are not correct) that we need to fix first. At
this point in time we would just like to learn from user experiences
and improve the service where needed.

Kind regards,
Wouter Alink

On Mon, Mar 21, 2011 at 3:56 PM, Stefan de Konink wrote:
> On Mon, 21 Mar 2011, Arjen P. de Vries wrote:
>
>> Feel free to use the suffix array search demo of the MonetDB source
>> tree where you see fit:
>>  http://devel.spinque.com/SearchMonetDBSource/
>> The index is refreshed every night!
>
> Is there a nice tutorial how to setup this index for other projects?
>
>
> Stefan


[Monetdb-developers] compiling MonetDB4

2007-01-19 Thread Wouter Alink
Hello developers,

I (successfully) compiled buildtools and MonetDB (today's CVS head, after
the recent bugfixes by Sjoerd).

I tried to compile MonetDB4, but the following bothered me:
- src/tool/embeddedclient.mx refers to Mapi.h, but the compiler cannot find
it (and neither can I).

Am I doing something wrong? (My OS is Fedora Core 6.)

Greetings,
Wouter
 


Re: [Monetdb-developers] multiple XQuery statements in one xq file

2007-04-13 Thread Wouter Alink
A function declaration would be the counterexample:

declare function x() as node { () };
x()
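
To illustrate why ';' cannot serve as a statement separator here, a small sketch that splits a script on ';' the way a naive client might. The splitting function is hypothetical and is not how MapiClient or JdbcClient work internally.

```python
def naive_split(script, separator=";"):
    """Split a script on every separator, as a naive client might."""
    parts = [part.strip() for part in script.split(separator)]
    return [part for part in parts if part]


xquery = "declare function x() as node { () };\nx()"

# The declaration and the query body form ONE XQuery query, but the
# naive splitter cuts them into two separate "statements".
statements = naive_split(xquery)
for statement in statements:
    print(repr(statement))
```

The second fragment, x(), refers to a function that is no longer declared in the same query, so sending the fragments separately cannot work.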

myXQ,
wouter


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Fabian Groffen
Sent: Friday, 13 April 2007 12:44
To: [EMAIL PROTECTED]
Subject: Re: [Monetdb-developers] multiple XQuery statements in one xq file


On 13-04-2007 12:33:09 +0200, Djoerd Hiemstra wrote:
> Dear Sjoerd or other developers,
> 
> Could you please change the "MapiClient -lx" protocol such that the end
> of query does not coincide with end-of-file? We would like to provide
> little XQuery scripts with multiple XQuery statements, but of course I
> cannot put the end-of-file mark into that file without ending the file
> (well, you know what I mean). Any end-of-query marker will do, but ';'

JdbcClient used to use this "statement separator", but in XQuery it is
not correct, as ; is used in XQuery itself.  Wouter and Jens probably
can easily come up with an example of ; being not correct.

> would be preferred I guess, to stay in line with MIL and SQL.
> 
> For instance (similar to having multiple SQL insert statements):
> 
> pf:add-doc("http://www.utwente.nl/a1.xml", "a1.xml");
> pf:add-doc("http://www.utwente.nl/a2.xml", "a2.xml");
> pf:add-doc("http://www.utwente.nl/a3.xml", "a3.xml");
> 
> I know I can do this in one transaction, but I do not want to.

I believe Peter implemented something like <> as separator, but I'm not
sure on that one.



Re: [Monetdb-developers] [Monetdb-pf-checkins] pathfinder/tests/StandOff StandOff.py, , 1.10, 1.11

2008-12-30 Thread Wouter Alink
> was/is it your intention to force all StandOff testing through the Algebra
> back-end (added "-A" option for pf; see below), i.e., ignoring/overruling
> the compile time default / choice (which is indeed the Algebra back-end) as
> well as the choice on the Mtest.py command line?
>
> If so, why? Does MPS no longer support StandOff (or v.v.)?
>
> If not, you should remove the "-A" switch for pf, again.

It was my intention to switch to the algebra backend, but from the
testweb I (probably incorrectly) assumed that the milprint-summer
version was still the default for pf. (I thought I saw only the
artists query failing when I looked at the testweb this morning.) I
will remove the -A again. The StandOff axes should still work with
MPS.

>
> > - observation: order of attributes seems to have changed in some
> >   tests, the testoutput has been changed accordingly
>
> Serialization in MonetDB/XQuery has no feature (yet?) to enforce a
> particular attribute order.
> The order of attributes is only determined by the very implementation and
> (physical) order of input data, and can hence change.
> If it happens to differ between the MPS & ALG back-end for (some) StandOff
> tests, we could consider approving back-end specific (ALG or MPS) stable
> output for these tests.
> (See `Mtest.py --help` and/or
> http://monetdb.cwi.nl/Development/TestWeb/Mtest/ for details and/or feel
> free to ask for advice/help.)
>

I assumed (again probably incorrectly) that the milprint-summer
version is deprecated. I'll create separate test-results.

Thanks for observing my errors.
Wouter



Re: [Monetdb-developers] [Monetdb-pf-checkins] pathfinder/tests/StandOff StandOff.py, , 1.10, 1.11

2008-12-30 Thread Wouter Alink
Ah...

I figured out why now... I hadn't seen your recent changes to main.c.

Thanks,
Wouter





[Monetdb-developers] XQ: unaligned access

2009-04-17 Thread Wouter Alink
Hello,

I currently get many of the following messages on the Mserver console
while shredding a collection of XML documents with yesterday's stable
nightly build (using 32-bit oids on a 64-bit machine):

Mserver(31123): unaligned access to 0x2000de29726f, ip=0x2133c101
Mserver(31123): unaligned access to 0x2000de1f746e, ip=0x2133c101
Mserver(31123): unaligned access to 0x2000de1f746e, ip=0x2133c0f0
Mserver(31123): unaligned access to 0x20013013696d, ip=0x2133c101
Mserver(31123): unaligned access to 0x20013013696d, ip=0x2133c0f0
Mserver(31123): unaligned access to 0x2000de29726f, ip=0x2133c101

A new one appears every few seconds. There seem to be only 4 or 5
distinct addresses for which this message appears.
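
A quick check of the addresses in the excerpt above, as a sketch: it counts the distinct addresses and reports whether each is a multiple of 2, 4, or 8 bytes. The access width used by the faulting instruction is not known from the log, so the widths tested here are assumptions.

```python
import re

LOG = """\
Mserver(31123): unaligned access to 0x2000de29726f, ip=0x2133c101
Mserver(31123): unaligned access to 0x2000de1f746e, ip=0x2133c101
Mserver(31123): unaligned access to 0x2000de1f746e, ip=0x2133c0f0
Mserver(31123): unaligned access to 0x20013013696d, ip=0x2133c101
Mserver(31123): unaligned access to 0x20013013696d, ip=0x2133c0f0
Mserver(31123): unaligned access to 0x2000de29726f, ip=0x2133c101
"""


def addresses(text):
    """Extract the accessed addresses from the console messages."""
    pattern = r"unaligned access to (0x[0-9a-f]+)"
    return [int(match, 16) for match in re.findall(pattern, text)]


def alignment(address, widths=(2, 4, 8)):
    """Report for each width whether the address is a multiple of it."""
    return {width: address % width == 0 for width in widths}


if __name__ == "__main__":
    for address in sorted(set(addresses(LOG))):
        print(hex(address), alignment(address))
```

None of the addresses in the excerpt is a multiple of 4, so any 4-byte or 8-byte access at these addresses would be unaligned.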

Is this a bug, a feature, or debug info? Can it be safely ignored?

Cheers,
Wouter



[Monetdb-developers] mclient mem-usage during --dump

2009-05-25 Thread Wouter Alink
Hello,

Question: is there any reason for mclient to use large amounts of
memory during a dump of an SQL database?

Syntax used:
$ mclient -lsql -D -dsomedatabase > dump.sql

I observe >12 GB of resident memory use when dumping a 2 GB (in dump
text format) database (it grows steadily), using the May2009 stable
branch (of last week).

Top shows:
28371 walink    16   0 12.2g  12g 2944 R   87  4.0  10:48.58 mclient

I haven't investigated it any further, but I was first of all
wondering whether it actually needs these amounts of memory?

Greetings,
Wouter



Re: [Monetdb-developers] mclient mem-usage during --dump

2009-05-27 Thread Wouter Alink
Hello,

I had a look at the code just now, looking for why so much memory
was used (I think mclient was using 100 GB of memory in the end).

I am not familiar with the mapiclient, but perhaps the following
diff is a solution?

Index: src/mapiclient/MapiClient.mx
===
RCS file: /cvsroot/monetdb/clients/src/mapiclient/MapiClient.mx,v
retrieving revision 1.141
diff -u -r1.141 MapiClient.mx
--- src/mapiclient/MapiClient.mx    19 May 2009 12:02:59 -  1.141
+++ src/mapiclient/MapiClient.mx    27 May 2009 22:25:24 -
@@ -2048,7 +2048,7 @@
         fprintf(stderr,"%s\n",mapi_error_str(mid));
         exit(2);
     }
-    mapi_cache_limit(mid, -1);
+    /* mapi_cache_limit(mid, -1); */
     if (dump) {
         if (mode == SQL) {
             dump_tables(mid, toConsole, 0);


This seems to work for me (at least for the moment mclient's memory
consumption remains constant), but I cannot foresee the consequences.
Could somebody perhaps say something sensible about it?
The reasoning behind it: this call to mapi_cache_limit sets rowlimit to -1,
which, together with cacheall=0, makes mapi_extend_cache (in
Mapi.mx) allocate more memory each time it is called (so the cache
becomes as large as the largest table).
Without the call "mapi_cache_limit(mid, -1);" the rowlimit keeps its
default of 100 lines, so with this change the cache gets
flushed every 100 lines.
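
The reasoning above can be sketched with a toy model of a row cache. This is not the Mapi.mx code; the class and function names are made up for illustration, and the only behaviour modelled is "grow without bound when the limit is -1, flush when a positive limit is reached".

```python
class RowCache:
    """Toy model of a client-side row cache with an optional row limit."""

    def __init__(self, rowlimit):
        self.rowlimit = rowlimit  # -1 means "no limit" in this model
        self.rows = []
        self.peak = 0  # largest number of rows held at any time

    def add(self, row):
        if self.rowlimit > 0 and len(self.rows) >= self.rowlimit:
            self.rows.clear()  # flush rows that were already consumed
        self.rows.append(row)
        self.peak = max(self.peak, len(self.rows))


def peak_rows(total_rows, rowlimit):
    """Feed total_rows rows through a cache and return its peak size."""
    cache = RowCache(rowlimit)
    for number in range(total_rows):
        cache.add(number)
    return cache.peak


if __name__ == "__main__":
    print("unlimited:", peak_rows(10000, -1))
    print("limit 100:", peak_rows(10000, 100))
```

In this model the unlimited cache peaks at the size of the whole table, while the limited one never holds more than 100 rows.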

I think I should have filed a bug :)
Wouter

P.S. While investigating this issue I tried to limit the amount of
memory that mclient would get using "ulimit -v $((256*1024))". This
revealed that there are a number of places in Mapi.mx where the result
of a (m)alloc call goes unchecked. I don't know the MonetDB coding policy
here, but perhaps they should all at least have an accompanying
assert? The following one-liner in the clients package reveals some
of the issues:
$ grep "alloc(" -A2 src/mapilib/Mapi.mx
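
A slightly more selective variant of the grep above, as a rough sketch: it flags assignments from an *alloc call that are not followed, within the next two lines, by an if or assert mentioning the assigned variable. The regular expressions are a heuristic made up for this illustration and will miss or misreport some cases.

```python
import re

# variable (possibly a->b), '=', optional cast, then malloc/realloc/calloc(...)
ALLOC = re.compile(r"(\w+(?:->\w+)*)\s*=\s*(?:\([^)]*\)\s*)?\w*alloc\s*\(")


def unchecked_allocs(source, window=2):
    """Return (line number, line) for allocations with no nearby check."""
    lines = source.splitlines()
    findings = []
    for index, line in enumerate(lines):
        match = ALLOC.search(line)
        if not match:
            continue
        variable = match.group(1)
        following = "\n".join(lines[index + 1 : index + 1 + window])
        check = r"(?:if|assert)\s*\(.*" + re.escape(variable)
        if not re.search(check, following):
            findings.append((index + 1, line.strip()))
    return findings


SAMPLE = """\
buf = malloc(len);
if (buf == NULL)
    return -1;
p->data = (char *) realloc(p->data, size);
memcpy(p->data, src, size);
"""

if __name__ == "__main__":
    for number, text in unchecked_allocs(SAMPLE):
        print(number, text)
```

The sample is invented C code, not an excerpt from Mapi.mx.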





Re: [Monetdb-developers] [ monetdb-Bugs-2787825 ] mclient: stdin + statement

2009-07-23 Thread Wouter Alink
The bug (#2787825) seems to be closed for comments, but I think this
bug should not be closed until the documentation gets updated.

typing "man mclient" on the stable branch tells me:

--statement=stmt (-s stmt)
Execute the specified query. The query is run before any queries
from files specified on the command line are run, and before the
interactive session is started (if the --interactive option is given).

This is not in line with Martin's latest comment. Martin, could you
re-open the bug (I don't have the permissions to do so)?

Greetings,
Wouter



2009/7/20 SourceForge.net:
> Bugs item #2787825, was opened at 2009-05-06 14:21
> Message generated for change (Comment added) made by mlkersten
> You can respond by visiting:
> https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2787825&group_id=56967
>
> Please note that this message will contain a full copy of the comment thread,
> including the initial issue submission, for this request,
> not just the latest update.
> Category: Mapi
> Group: Clients CVS Head
>>Status: Closed
>>Resolution: Wont Fix
> Priority: 5
> Private: No
> Submitted By: Wouter Alink (vzzzbx)
> Assigned to: Nobody/Anonymous (nobody)
> Summary: mclient: stdin + statement
>
> Initial Comment:
> It seems that there is a  problem with both providing data via stdin and via 
> the -s feature in mclient.  (see example below).
>
> A possible solution could perhaps be to forbid this use. Another solution 
> would be to define a behaviour: either read the '-s' first or the stdin 
> first. (perhaps this already is the case, but I couldn't find any 
> documentation about it)
>
> $ cat data.dat
> 1
> 2
> 3
> 4
> 5
> $ N=4; head -n $N data.dat | mclient -lsql -p50151 -dtest -s "copy $N records 
> into aap from STDIN;"
> MAPI  = mone...@localhost:50151
> QUERY = copy 4 records into aap from STDIN;
> ERROR = !SQLException:sql:value ';' while parsing ';' from line 0 field 0 not 
> inserted, expecting type int
>        !SQLException:importTable:failed to import table
>
>
> --
>
>>Comment By: Martin Kersten (mlkersten)
> Date: 2009-07-20 21:48
>
> Message:
> Standard input is ignored in combination with -s.
> Closing it as at best it can be considered a niche feature request.
>
> --
>
> Comment By: Wouter Alink (vzzzbx)
> Date: 2009-05-07 21:55
>
> Message:
> as discussed on the monetdb-users list, using either the -s _or_ the stdin
> works fine (except for other reported/unreported bugs), but the combination
> fails. (stefan's example works fine).
>
> I can very well imagine that using a combination should not be allowed
> (and should not even become a feature request), but I feel that the current
> message is not very helpful.
>
> And, actually (I hadn't thought of this option before), if I had
> specified "-i" then the documentation (mclient --help) says it reads from
> stdin _after_ reading the command line args, but it generates the same
> error.
>
> After some more tests I discovered that:
> - when using the command line args + stdin + mentioning '-i', the
> semi-colon after "copy $N records into aap from STDIN;" should be left out,
> so the following does work:
>
> $ echo "1
> 2
> 3
> 4
> 5" | mclient -lsql -dtest -hskadi -p50151 -i -s "COPY 5 RECORDS INTO aap
> FROM STDIN"
>
> (notice the omission of ';' after the COPY statement)
>
> If I do exactly the same, but leave out the '-i', no error is displayed,
> but nothing gets inserted either.
>
> If I use stdin only:
>
> $ echo "COPY 5 RECORDS INTO aap FROM STDIN;
> 1
> 2
> 3
> 4
> 5" | mclient -lsql -dtest -hskadi -p50151
>
> then this works (only if the ';' after the COPY statement is present).
>
> I don't know whether there are two different bugs mentioned in this
> explanation, but I think there definitely is something wrong.
>
> by the way: the create statement for aap is: "CREATE TABLE aap (x int);"
>
> --
>
> Comment By: Stefan Manegold (stmane)
> Date: 2009-05-07 19:30
>
> Message:
> What about:
>
> { N=4 ; echo "copy $N records into aap from STDIN;" ; head -n $N data.dat
> ; } | mclient -lsql -p50151 -dtest
>
> ?
>
>
> --
>
> Comment By: Sjoerd Mullender (sjoerd)
> Date: 

Re: [Monetdb-developers] [ monetdb-Bugs-2787825 ] mclient: stdin + statement

2009-07-23 Thread Wouter Alink
To get back to the original issue:

$ cat data.dat
1
2
3
4
5
$ N=4; head -n $N data.dat | mclient -lsql -p50151 -dtest -s "copy $N
records into aap from STDIN;"

Am I correct that the above is not allowed because it doesn't
specify "-i", so it won't read stdin after "-s"? This is indeed what I
would expect. Initially I wasn't aware of the "-i" feature; that was
the reason for the original request.

But it confuses me that even when "-i" is specified it wouldn't be
correct, because the copy command should not be followed by a semicolon.
This seems odd to me: why is a semicolon not allowed? Am I missing
something?
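
For reference, the variant that does work according to the tracker comments (statement and data both sent over stdin) can be sketched as follows. The helper only builds the text that would be piped into mclient; the table name aap comes from the report, the function name is made up for illustration.

```python
def copy_payload(table, rows):
    """Build the text to pipe into mclient: COPY statement, then the rows."""
    header = "COPY {} RECORDS INTO {} FROM STDIN;".format(len(rows), table)
    return "\n".join([header] + [str(row) for row in rows]) + "\n"


if __name__ == "__main__":
    # Same idea as: { echo "copy 4 records ..."; head -n 4 data.dat; } | mclient ...
    print(copy_payload("aap", [1, 2, 3, 4]), end="")
```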

Wouter


2009/7/23 Sjoerd Mullender:
> Wouter Alink wrote:
>> The bug (#2787825) seems to be closed for comments, but I think this
>> bug should not be closed until the documentation gets updated.
>>
>> typing "man mclient" on the stable branch tells me:
>>
>> --statement=stmt (-s stmt)
>>     Execute the specified query. The query is run before any queries
>> from files specified on the command line are run, and before the
>> interactive session is started (if the --interactive option is given).
>>
>> This is not in line with Martin's latest comment. Martin, could you
>> re-open the bug (I don't have the permissions to do so)?
>
> Standard input is not ignored if the -i (--interactive) flag is passed.
>  However, you cannot start a query with -s and finish it from stdin
> which is what you originally wanted.  And I don't see in the
> documentation that you can.  If you see it, please point it out.
>
> As far as I can see, the text you quoted above is correct.

[Monetdb-developers] MonetDB/XQuery: reading XML files from TAR archives

2009-08-27 Thread Wouter Alink
Hello devs,

Yesterday Roberto and I discussed that it would be useful to be able
to load (compressed) XML collections directly into MonetDB/XQuery.
The attached diff provides a new feature for loading multiple XML docs
directly from tar files.
Usage: "mclient -lxq -C " and pass a tarfile via stdin, see
example below.

My question: is this useful enough to make it into MonetDB? And if so,
is the current syntax appropriate? Comments are appreciated.

Greetings,
Wouter


$ mkdir xmlfiles
$ echo "" > xmlfiles/aap.xml
$ echo "" > xmlfiles/beer.xml
$ tar cf xmlfiles.tar xmlfiles
$ mclient -lxq -C xmlfiles < xmlfiles.tar
Copying TAR file into collection: 'xmlfiles'
Name: xmlfiles/beer.xml Length: 7
Name: xmlfiles/aap.xml Length: 7
$ echo 'pf:documents("xmlfiles")' | mclient -lxq
xmlfiles/aap.xml,
xmlfiles/beer.xml
$
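
What the client does with the archive can be sketched roughly as follows. This is not the attached patch (which is C code in the clients package); it only illustrates walking a tar file and reporting the name and length of each XML member. The helper names and the file contents in the demo are invented.

```python
import io
import tarfile


def list_xml_members(fileobj):
    """Return (name, size) for every regular .xml file in a tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:") as archive:
        return [
            (member.name, member.size)
            for member in archive
            if member.isfile() and member.name.endswith(".xml")
        ]


def build_example_tar(files):
    """Create an uncompressed in-memory tar from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = build_example_tar(
        {"xmlfiles/one.xml": b"<one/>\n", "xmlfiles/notes.txt": b"skip me"}
    )
    for name, size in list_xml_members(demo):
        print("Name:", name, "Length:", size)
```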


tarpatch.diff
Description: Binary data


Re: [Monetdb-developers] MonetDB/XQuery: reading XML files from TAR archives

2009-08-27 Thread Wouter Alink
Hello Djoerd,

Thanks for the feedback. One reason (that I can see) to do it from
stdin is so that compression can be used (without having to be aware
of it), for example:

bzcat collection.tar.bz2 | mclient -lxq -C collection
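
The same "read from a stream without caring about compression" idea, as a sketch using Python's tarfile module rather than the patch itself: mode "r|*" reads the archive sequentially, without seeking, and detects the compression from the data. The helper names and demo contents are invented for illustration.

```python
import io
import tarfile


def stream_members(stream):
    """Yield (name, content) for regular files in a tar read as a stream.

    Mode "r|*" reads sequentially (no seeking) and detects gzip, bzip2
    or xz compression from the data itself.
    """
    with tarfile.open(fileobj=stream, mode="r|*") as archive:
        for member in archive:
            if member.isfile():
                yield member.name, archive.extractfile(member).read()


def make_bz2_tar(files):
    """Build a bzip2-compressed tar in memory from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return io.BytesIO(buffer.getvalue())


if __name__ == "__main__":
    demo = make_bz2_tar({"collection/a.xml": b"<a/>", "collection/b.xml": b"<b/>"})
    for name, content in stream_members(demo):
        print(name, len(content))
```

Passing sys.stdin.buffer instead of the in-memory demo would read the archive from a pipe.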

But I do agree with you that it would be useful to have an XQuery
function too, as not everyone is using the mclient interface.

Greetings,
Wouter

Oh yes, and I forgot that cvs diff does not produce a unified diff by
default... attached is the unified diff for the clients package.
(Sooner or later I will learn to do things right the first time :)

2009/8/27 Djoerd Hiemstra:
> Hi Wouter,
>
> Sounds very useful to me!
> Why is it not simply changed in pf:add-doc(), or put in a new function
> pf:add-archive()?
>
> Best,  Djoerd.
>
> Wouter Alink wrote:
>> Hello devs,
>>
>> Roberto and I yesterday discussed that it would be useful to be able
>> to load (compressed) XML collections directly into MonetDB/XQuery.
>> The attached diff provides a new feature for loading multiple XML docs
>> directly from tar files.
>> Usage: "mclient -lxq -C " and pass a tarfile via stdin, see
>> example below.
>>
>> My question: is this useful enough to make it into MonetDB? And if so,
>> is the current syntax appropriate. Comments are appreciated.
>>
>> Greetings,
>> Wouter
>>
>>
>> $ mkdir xmlfiles
>> $ echo "" > xmlfiles/aap.xml
>> $ echo "" > xmlfiles/beer.xml
>> $ tar cf xmlfiles.tar xmlfiles
>> $ mclient -lxq -C xmlfiles < xmlfiles.tar
>> Copying TAR file into collection: 'xmlfiles'
>> Name: xmlfiles/beer.xml Length: 7
>> Name: xmlfiles/aap.xml Length: 7
>> $ echo 'pf:documents("xmlfiles")' | mclient -lxq
>> > collection="xmlfiles">xmlfiles/aap.xml,
>> > collection="xmlfiles">xmlfiles/beer.xml
>> $
>>


tarpatch.diff
Description: Binary data


[Monetdb-developers] other than bug things...

2009-11-27 Thread Wouter Alink
Hello Developers,

Usually I report bugs (and try to find the worst in the
system), but this morning I noticed that my mserver5 instance
(Aug2009), which I have been querying intensively over the last few days
with tens of thousands of reasonably complex queries, sometimes with
more than 20 queries in parallel, just passed 4800 minutes of
actual CPU time (= 80 hours of hard work) and is still going strong. I
thought it was worth mentioning!

Cheers,
Wouter

P.S. Roberto, actually it's your mserver instance...



[Monetdb-developers] hashjoin and strHash

2009-12-18 Thread Wouter Alink
Dear developers,

I would like to propose a change in GDK and hear opinions. It is about
the following issue:

In the BATjoin code, if there is no possibility to do a fetch or merge
join, a hashjoin is performed. A hashtable is created for the smallest
BAT. The reasons (I could think of) for choosing the smallest BAT for
the hashtable are that less space is required for the hashtable (which
in turn causes fewer cache misses when doing a lookup) and that
the hash function used is assumed to be very inexpensive (it needs to
be calculated for each item in the large BAT each time a join is
performed).
I can see that the hash function can be very efficient for data types
without indirection, but I feel that for data types like strings
this is a little different in some cases. If a string BAT, for example,
contains many different values (i.e. is not a BAT which contains
enumeration values), the hash function will not be inexpensive anymore
(many cache misses), as each hash function call needs to hash a whole
(arbitrarily long) string at an arbitrary place in the heap.

Would it perhaps be possible to specify that, when a BAT of type 'str' has
many different values, a hashtable may be built on the large BAT
instead of on the small BAT?

Reason that I ask this: I was analysing costs of a query in which I
had a few short strings (26 tuples, 1-column table: varchar) which I
wanted to look up in a dictionary (9M tuples, 2-column table:
int,varchar).  "SELECT a.id FROM longlist AS a JOIN smalllist as b ON
a.strvalue=b.strvalue;"
The result is a small list of integers (26 or less tuples). This
operation currently takes roughly 1.5 seconds for a hot run, mostly
due to 9M strHash operations. By applying the patch below the
execution time for a hot run dropped down to .01 seconds. The
performance gain is caused by only having to perform strHash on the
items in the small bat once the hashtable for the large bat has been
created.

Any suggestions whether such a change is useful? Which benchmarks will
be influenced?

I guess this code change is probably not useful for large string BATs
with only a few different values, but perhaps a guess could be made of
how diverse the strings in a BAT are (by taking a sample, or perhaps
simply by looking at the ratio batsize/heapsize), and based on that it
could be determined whether to build the hashtable on the large or the
small BAT?
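
As a rough illustration of the sampling idea (not of anything that exists in GDK), the sketch below takes an evenly strided sample of a string column and reports the fraction of distinct values in it. The function names, the sample size of 64 and the threshold are all hypothetical, and a strided sample can be badly biased on sorted or clustered data, so this is only a crude guess.

```c
#include <stddef.h>
#include <string.h>

#define SAMPLE_MAX 64   /* hypothetical sample size */

/* Fraction of distinct values in an evenly strided sample of
 * vals[0..n-1]; returns 0.0 for an empty column. */
static double sample_distinct_fraction(const char **vals, size_t n)
{
    const char *seen[SAMPLE_MAX];
    size_t nseen = 0, taken = 0, i, j, stride;

    if (n == 0)
        return 0.0;
    stride = (n + SAMPLE_MAX - 1) / SAMPLE_MAX;  /* ceil(n / SAMPLE_MAX), at least 1 */
    for (i = 0; i < n; i += stride) {
        /* quadratic duplicate check; fine for a sample this small */
        for (j = 0; j < nseen; j++)
            if (strcmp(seen[j], vals[i]) == 0)
                break;
        if (j == nseen)
            seen[nseen++] = vals[i];
        taken++;
    }
    return (double) nseen / (double) taken;
}

/* Hypothetical decision rule: build the hashtable on the large BAT
 * only if the sample suggests many different values. */
static int build_hash_on_large(const char **large, size_t nlarge, double threshold)
{
    return sample_distinct_fraction(large, nlarge) >= threshold;
}
```

An enumeration-like column would give a low fraction and keep the current behaviour, while a dictionary-like column would give a fraction near 1 and switch the build side.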

Greetings,
Wouter


Index: src/gdk/gdk_relop.mx
===
RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_relop.mx,v
retrieving revision 1.167.2.4
diff -u -r1.167.2.4 gdk_relop.mx
--- src/gdk/gdk_relop.mx  20 Nov 2009 13:04:06 -  1.167.2.4
+++ src/gdk/gdk_relop.mx  18 Dec 2009 14:59:13 -
@@ -1232,7 +1232,12 @@
 @-
 hash join: the bread&butter join of monet
 @c
-       /* Simple rule, always build hash on the smallest */
+       /* Simple rule, always build hash on the smallest,
+          except when it is a string-join, then we do the opposite */
+       if (swap && rcount < lcount && l->ttype == TYPE_str) {
+               ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate);
+               return BATmirror(BAThashjoin(BATmirror(r), BATmirror(l), estimate));
+       }
        if (swap && rcount > lcount) {
                ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate);



Re: [Monetdb-developers] hashjoin and strHash

2009-12-19 Thread Wouter Alink
Lefteris, you are correct that I meant 'the second time the query was
run' when I wrote 'hot run'.

I see that at the GDK level reuse cannot be estimated. However, given
that current hardware has an abundance of memory, and that strings
take up much more storage than a single BUN (so a hash entry is
usually relatively small compared to its data), GDK might weigh the
additional costs. GDK also decides which things to keep in memory and
which to throw out, which in turn is also based on reuse.
The costs of performing the initial join are dominated by the strHash
function, and building the hashtable on the big BAT or on the smaller
BAT makes (almost) no difference, except for the additional memory
use. If a join is performed on such a big BAT again, it will be
beneficial to have the hashtable in place.
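
The arithmetic behind this can be sketched as follows. This is a simplified cost model in C that counts only hash function calls and assumes the build-side hashtable is kept between runs in both cases; the function names are hypothetical, and 9,000,000 is used as a round figure for the "9M" tuples mentioned earlier.

```c
/* Hash calls for k runs of the same join when the hashtable is built
 * on the small side: build once (n_small), then hash every value of
 * the large side on each run to probe. */
static unsigned long long hash_calls_build_small(unsigned long long n_small,
                                                 unsigned long long n_large,
                                                 unsigned long long k)
{
    return n_small + k * n_large;
}

/* Same, with the hashtable built on the large side: build once
 * (n_large), then hash only the small side on each run. */
static unsigned long long hash_calls_build_large(unsigned long long n_small,
                                                 unsigned long long n_large,
                                                 unsigned long long k)
{
    return n_large + k * n_small;
}
```

With n_small = 26 and n_large = 9,000,000, a single run costs 9,000,026 hash calls either way, which matches the point that the first join is indifferent to the build side. For two runs the model gives 18,000,026 calls when building on the small BAT against 9,000,052 when building on the large one, and every further run adds 9,000,000 calls in the first case but only 26 in the second.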

What I was hoping for were explanations of situations where it makes
no sense to build the hashtable on the bigger string BAT, but I
haven't seen a good counter-example. In general, I can see, it would
not be beneficial if the big BAT is not joined twice, but if it
doesn't hurt too much, couldn't it just be the default?

Eventually I would like to use the SQL layer only. There would be
plenty of tables with string columns there, and some will be joined
against. Should a MAL optimizer detect that I am about to join two
string BATs, that one BAT is bigger than the other and has many
different values, and that it should therefore build a hashtable on
the bigger one? The MAL optimizer can only guess about my next query
(although I agree that it could do a better job at guessing), and
calculating heapsize/batsize seems to be an operation that is also
difficult to do at the MAL layer.

Is really nobody in favour of changing the behavior of joining string
BATs for large BATs with many different values? Well, then I give up.
Wouter


2009/12/18 Stefan Manegold :
> Hi Wouter,
>
> along the lines of Lefteris' reply:
> for a single join with no hash table present a priori, the number of hash
> function calls is equal to the number of BUNs in both BATs; for each inner
> BUN the function needs to be called to build the hash table, for each outer
> BUN it needs to be called to probe the hash table. Hence, the pure hashing
> costs are independent of which BAT is inner and which outer.
> Given that, the reason to choose the smaller as inner is indeed to increase
> spatial locality (and thus performance) of the inherently random access
> while building and probing the hashtable.
>
> As Lefteris pointed out, the "operational optimization" in GDK is a pure
> peephole optimization dealing only with the very operation at hand. I.e., in
> general it cannot anticipate the benefits of future re-use of efforts, like
> investing in the (more expensive) building of a larger hash table to be able
> to re-use this in several later operations --- which IMHO is independent of
> the data type. Such decisions need to be made at higher levels, either in
> MAL optimizers or in the front-end that generates the MAL plan.
>
> Stefan
>
>
> On Fri, Dec 18, 2009 at 05:01:07PM +0100, Lefteris wrote:
>> Hi Wouter,
>>
>> funny thing, I had the exact same problem and we were thinking about
>> this issue. The idea here is that this observation for strings might
>> not always be true, and it is a situation that cannot always be
>> determined at the kernel level. Correct me if I am wrong, but your
>> benefit on the query comes because the hash on the large BAT is already
>> there, and that's why the second time you get 0.01? You mention a hot
>> run, so I assume the BAT is already there with a hash index. In the
>> original situation, by contrast, the hash is on the small BAT, so you
>> don't benefit from the hot run. But whether a big BAT of strings will
>> be used again is unknown at the GDK level. So, I solved the problem by
>> forcing the hash index on the big BAT at a higher level (in Monet5),
>> which knows something more about the application (in my case an RDF
>> store). Can you do that instead? Force the hash index at a higher
>> level for your application? If GDK sees a hash index already there,
>> then it will choose that independent of the size.
>>
>> lefteris
>>
>> On Fri, Dec 18, 2009 at 4:22 PM, Wouter Alink  wrote:
>> > Dear developers,
>> >
>> > I would like to propose a change in GDK and hear opinions. It is about
>> > the following issue:
>> >
>> > in the BATjoin code, if there is no possibility to do a fetch or merge
>> > join, a hashjoin is performed. A hashtable is created for the smallest
>> > BAT. The reasons (i could think of) for choosing the smallest BAT for
>> > the hashtable are that less space is required for the h

Re: [Monetdb-developers] Memory use

2009-12-19 Thread Wouter Alink
Hello Guido,

At the end of your COPY INTO transaction, your database will be saved
on disk, to give some guarantee that the data is on a sort of
non-volatile storage (see also
http://en.wikipedia.org/wiki/ACID#Durability). Besides storing the
data on disk, MonetDB tries to fully exploit your available (volatile)
main memory to answer your queries quickly, and tries to keep as much
of the data as possible in main memory (this is managed by MonetDB
internally).

There is one way you could trick a DBMS into using memory only: make
volatile storage appear as non-volatile storage (for example by
creating a RAM disk).
You could also use the database in such a way that you never commit a
transaction (leave the transaction open, and roll back at the end),
although the DBMS might still decide at some point to flush the data
to disk.

None of these tricks are recommended. For a DBMS (and I think this
holds for any proper DBMS) to function correctly, you should provide
some non-volatile storage, so that durability can be guaranteed.

Hope this answers your question,
Wouter


2009/12/18 Voornaam Achternaam :
>
> When I try to fill a database with the COPY INTO command, the data will
> (depending on the file used) either go:
> - Fully into Memory.
> - To HDD.
> - A combination of Memory and HDD space.
>
> Is there a way to configure MonetDB so that it always uses Memory only?
>
>
> Thanks in advance.



Re: [Monetdb-developers] [Monetdb-sql-checkins] sql/src/server rel_schema.mx, Feb2010, 1.2, 1.2.2.1

2010-02-05 Thread Wouter Alink
Fantastic!

2010/2/5 Niels Nes :
> Update of /cvsroot/monetdb/sql/src/server
> In directory 
> sfp-cvsdas-1.v30.ch3.sourceforge.com:/tmp/cvs-serv25790/src/server
>
> Modified Files:
>      Tag: Feb2010
>        rel_schema.mx
> Log Message:
> fixed bug in handling topn in create table as select with data.
>
>
>
> Index: rel_schema.mx
> ===
> RCS file: /cvsroot/monetdb/sql/src/server/rel_schema.mx,v
> retrieving revision 1.2
> retrieving revision 1.2.2.1
> diff -u -d -r1.2 -r1.2.2.1
> --- rel_schema.mx       11 Jan 2010 10:29:17 -      1.2
> +++ rel_schema.mx       5 Feb 2010 10:18:16 -       1.2.2.1
> @@ -127,9 +127,14 @@
>  static char *
>  as_subquery( mvc *sql, sql_table *t, sql_rel *sq, dlist *column_spec )
>  {
> +        sql_rel *r = sq;
> +
> +        if (is_topn(r->op))
> +                r = sq->l;
> +
>        if (column_spec) {
>                dnode *n = column_spec->h;
> -               node *m = sq->exps->h;
> +               node *m = r->exps->h;
>
>                for (; n; n = n->next, m = m->next) {
>                        char *cname = n->data.sval;
> @@ -143,7 +148,7 @@
>        } else {
>                node *m;
>
> -               for (m = sq->exps->h; m; m = m->next) {
> +               for (m = r->exps->h; m; m = m->next) {
>                        sql_exp *e = m->data;
>                        char *cname = exp_name(e);
>                        sql_subtype *tp = exp_subtype(e);
>
>
