[basex-talk] Controlling json serialization of JSON-XML

2018-05-25 Thread Hondros, Constantine (ELS-AMS)
Hello all, I'm expecting the answer to be no, but is it possible to influence the JSON serialization of elements at all? xml-to-json( http://www.w3.org/2005/xpath-functions";> 10111234 ) Results in {"number":1.0111234E7} I read this in the spec, so it looks like the number is always

Re: [basex-talk] Distributed XML processing on Apache Spark

2018-05-08 Thread Hondros, Constantine (ELS-AMS)
You want to run the same query over a large set of XML chunks and persist the result as a Dataset or DataFrame? Just store the XML chunks in a sequence file or parquet and use BaseX as a query processor. Map over your input partitions and create an in-memory database from each chunk and apply t

[basex-talk] Basex API: repeated execution of same query over different input

2017-12-27 Thread Hondros, Constantine (ELS-AMS)
I want to be able to execute the same XQuery hundreds of thousands (even millions) of times over different XML input. Is there a way to avoid re-compiling the query itself every time? It looks to me like using either XQuery.execute(ctx) or the QueryProcessor class will end up re-parsing the que

Re: [basex-talk] XQuery 3.1/Build array of maps dynamically

2017-12-12 Thread Hondros, Constantine (ELS-AMS)
Ah, fantastic. Thank you Christian (and Martin). Yes, I was using an incorrect constructor for Arrays in this instance. Many thanks, Constantine -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 12 December 2017 13:41 To: Hondros, Constantine (ELS-AMS) Cc
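
For readers following this thread: the preview does not show which constructor fixed the problem, so the following is only a minimal sketch of one likely reading. In XQuery 3.1 the square-bracket form `[ ... ]` makes one array member per comma-separated operand, so a FLWOR expression inside it lands in a single member, whereas the curly form `array { ... }` makes one member per item. The sample `$authors` document is hypothetical.

```xquery
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'json';

(: Hypothetical sample input; element names mirror the thread's snippet. :)
let $authors :=
  <authors>
    <author><given-name>Ada</given-name><surname>Lovelace</surname></author>
    <author><given-name>Alan</given-name><surname>Turing</surname></author>
  </authors>
return map {
  "authors": array {
    (: array { } creates one array member per item of the enclosed sequence :)
    for $b in $authors/author
    return map {
      "fn": string($b/given-name),
      "ln": string($b/surname)
    }
  }
}
```

With the curly constructor each author map becomes its own array member, which is the shape a JSON consumer such as Spark would normally expect.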

Re: [basex-talk] XQuery 3.1/Build array of maps dynamically

2017-12-12 Thread Hondros, Constantine (ELS-AMS)
; map { "authors": [ (for $b in $authors/author return map{"fn":$b/*:given-name/text(), "ln":$b/*:surname/text()}) ] } -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 12 December 2017 13:20 To:

[basex-talk] XQuery 3.1/Build array of maps dynamically

2017-12-12 Thread Hondros, Constantine (ELS-AMS)
Hello all, I want to dynamically build a map where the value is an array of maps. I am having a problem with the syntax. (My use-case is to deliver this as JSON for simple ingestion to Spark, by the way). Here's my test code, which translates cleanly into JSON: declare option output:method 'js

Re: [basex-talk] How to save Table views as spreadsheet

2016-12-14 Thread Hondros, Constantine (ELS-AMS)
Hi there, Unless I'm mistaken, I think the easiest thing to do is to serialise as CSV and import into something like Excel. The CSV module can help you here: http://docs.basex.org/wiki/CSV_Module Although thinking about it, perhaps other folks have tried wrapping a library like Apache POI (htt
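
As an illustration of the CSV route suggested here, a small sketch using `csv:serialize` from the BaseX CSV Module. The `csv`/`record` layout follows the module's default format; the sample rows and the `header` option value are assumptions made for the example.

```xquery
(: Hypothetical table data in the csv/record layout of the CSV Module. :)
let $table :=
  <csv>
    <record><name>Ada</name><year>1843</year></record>
    <record><name>Alan</name><year>1936</year></record>
  </csv>
(: header = true() asks for the element names to be written as the first row :)
return csv:serialize($table, map { 'header': true() })
```

The resulting string can be written to disk with `file:write-text` and opened in a spreadsheet application.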

[basex-talk] Serialize a string to unencode XML entities?

2016-08-26 Thread Hondros, Constantine (ELS-AMS)
Ok, this must be possible. Can someone point me towards the correct serialization $params such that let $doi := 10.1002/1521-4141(200207)32:7<2021::AID-IMMU2021>3.0.CO;2-J return serialize($doi, $params) will return (to a calling Java application) the UTF-8 string 10.1002/1521-4141(200207)32:7
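
One option that may fit this question, offered as a sketch rather than as the answer given on the list, is the `text` output method, which writes string values without entity escaping. The DOI is the one quoted in the message; the map form of the serialization parameters is standard XQuery 3.1.

```xquery
(: The DOI from the question, with < and > written as entity references. :)
let $doi := '10.1002/1521-4141(200207)32:7&lt;2021::AID-IMMU2021&gt;3.0.CO;2-J'
(: The text output method emits the string value as-is, without escaping. :)
return serialize($doi, map { 'method': 'text' })
```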

Re: [basex-talk] BaseX GUI doesn't show combining glyphs properly

2016-06-17 Thread Hondros, Constantine (ELS-AMS)
Hi Kristian, This came up a few times before on the list: it's an issue with the default font that BaseX uses in the GUI (Consolas for the GUI on Windows). Try Options -> Fonts and switching to a more complete font set like Arial Unicode Cheers, C. -Original Message- From: basex-talk-b

Re: [basex-talk] Problems with IPA diacritics display

2016-04-14 Thread Hondros, Constantine (ELS-AMS)
Hello Jack, Probably down to the default font that the Basex GUI uses, which is Consolas. In the past I've simply set the GUI to use a Unicode font like Arial Unicode MS and characters display fine. Options => fonts Cheers, Constantine From: basex-talk-boun...@mailman.uni-konstanz.de [mailto

Re: [basex-talk] out of main memory

2016-03-01 Thread Hondros, Constantine (ELS-AMS)
Hi Michele, 1. Go to /bin 2. Open basexgui.bat (Windows) or basexgui (*nix) 3. Increase value of -Xmx in BASEX_JVM variable, from 512 MB to something that your system can support 4. Restart the GUI For example, here is the line from my config - I allocate almost 5 Gi

Re: [basex-talk] Indenting XML query result from a function

2016-02-16 Thread Hondros, Constantine (ELS-AMS)
D'oh. Of course :) Thanks. -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 16 February 2016 16:00 To: Hondros, Constantine (ELS-AMS) Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Indenting XML query result from a function For p

Re: [basex-talk] Indenting XML query result from a function

2016-02-16 Thread Hondros, Constantine (ELS-AMS)
t - beyond rolling my own function. -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 16 February 2016 13:59 To: Hondros, Constantine (ELS-AMS) Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Indenting XML query result from a function Hi C

[basex-talk] Indenting XML query result from a function

2016-02-16 Thread Hondros, Constantine (ELS-AMS)
This is probably easy, but I can't find an inbuilt method or function to do this easily. I have a function that will return certain nodes from a DB. The DB was built with CHOP off - plenty of white-space present between nodes. declare function dq:errorGivenName($db as xs:string) as element()* {

Re: [basex-talk] Spectactularly slow performance with db:open vs. db:text

2016-02-15 Thread Hondros, Constantine (ELS-AMS)
t: 15 February 2016 10:44 To: Hondros, Constantine (ELS-AMS) Cc: BaseX Subject: Re: [basex-talk] Spectactularly slow performance with db:open vs. db:text Hi Constantine, > for $a in (db:open('DB1')/item/order-id) return > if (db:open('DB2')//order-id[. = $a]) then >

Re: [basex-talk] JavaDoc link?

2016-02-15 Thread Hondros, Constantine (ELS-AMS)
This is where I tend to get the Javadoc from – though I note it’s a few releases out of date. http://docs.basex.org/javadoc/ C. From: basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of buddyonweb-softw...@yahoo.com Sent: 13 February 201

[basex-talk] Spectactularly slow performance with db:open vs. db:text

2016-02-15 Thread Hondros, Constantine (ELS-AMS)
Hello Basexers, I'm getting such a low performance on a relatively simple join between two databases that I feel there must be something going wrong here. I can provide the sources if necessary, but basically DB1 is 26 MB, about 80,000 small documents; DB2 is 47 MB, about 18,500 small documents

[basex-talk] Reference the currently open database?

2016-02-11 Thread Hondros, Constantine (ELS-AMS)
Hello all, Maybe a dumb question, but I need to reference the currently open database. Is there a more elegant way than this? db:name((*)[1]) Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677, Registe

[basex-talk] tumbling window and db:add in same expression

2016-01-29 Thread Hondros, Constantine (ELS-AMS)
Hello all, BaseX doesn't seem to like the query below. Any particular reason - it makes sense to me. I'm trying to add items to a DB incrementally to (attempt to) avoid running into out of memory exceptions when cutting a slice of a large DB into a smaller one. for tumbling window $w in db:ope

Re: [basex-talk] group-by behaviour for clustering XML fragments

2016-01-11 Thread Hondros, Constantine (ELS-AMS)
) => results in "Institut für Organische Chemie der Universität Heidelberg" as a grouping key. Thanks for pointing me in the right direction. C. -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 09 January 2016 18:49 To: Hondros, Constantine (ELS-A

[basex-talk] group-by behaviour for clustering XML fragments

2016-01-08 Thread Hondros, Constantine (ELS-AMS)
Hello all, I'm using BaseX to cluster a set of millions of small XML fragments which look something like this: Institut für Organische Chemie der Universität Heidelberg I need to cluster based on fragment similarity - so taking into account elements, attributes and text nodes. If I
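
A note on why the grouping key in the follow-up came out as plain text: `group by` atomizes its key, so an element used as a key is reduced to its string value. A minimal sketch of one workaround, assuming exact-match clustering is what is wanted, is to group on the serialized fragment instead. The `affiliation` and `org` element names and the `country` attribute are hypothetical.

```xquery
(: Hypothetical fragments; the second differs from the first only by an attribute. :)
let $fragments := (
  <affiliation country="de"><org>Institut für Organische Chemie</org></affiliation>,
  <affiliation country="at"><org>Institut für Organische Chemie</org></affiliation>,
  <affiliation country="de"><org>Institut für Organische Chemie</org></affiliation>
)
for $f in $fragments
(: serialize() folds element names, attributes and text into one string key :)
group by $key := serialize($f)
return <cluster size="{ count($f) }">{ $f[1] }</cluster>
```

Grouping on `string($f)` instead would put all three fragments in one cluster, since their text content is identical.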

Re: [basex-talk] db:add speed proportional to DB size?

2016-01-06 Thread Hondros, Constantine (ELS-AMS)
an.gr...@gmail.com] Sent: 06 January 2016 13:57 To: Hondros, Constantine (ELS-AMS) Cc: BaseX Subject: Re: [basex-talk] db:add speed proportional to DB size? Hi Hondros, > Processing hundreds of thousands of zips, using db:add to append > small XML fragments from each into a single DB, I noti

[basex-talk] db:add speed proportional to DB size?

2016-01-06 Thread Hondros, Constantine (ELS-AMS)
Hello all, Processing hundreds of thousands of zips, using db:add to append small XML fragments from each into a single DB, I notice that the process becomes successively slower. Without having done any proper profiling, and aware that I might be looking in the wrong direction here, would it

[basex-talk] db:create, namespaces not stripped when input is variable?

2015-11-26 Thread Hondros, Constantine (ELS-AMS)
Hello all, Any particular reason behind this observed behaviour? (Basex 8.3). .. [1] .. declare variable $test := http://www.elsevier.com/xml/ani/common"; xmlns="http://www.else

Re: [basex-talk] zip:update-entries - unexpected results

2015-11-18 Thread Hondros, Constantine (ELS-AMS)
riginal Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 17 November 2015 19:31 To: Hondros, Constantine (ELS-AMS) Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] zip:update-entries - unexpected results We didn’t spend energy in the ZIP Module for a long ti

[basex-talk] zip:update-entries - unexpected results

2015-11-16 Thread Hondros, Constantine (ELS-AMS)
Hi all, I'm attempting to create a modified copy of a zipfile using zip:update-entries, but my output is a zip structure with zero-byte files. Am I missing a step? I've followed the example in: http://docs.basex.org/wiki/ZIP_Module#zip:update-entries Code snippet as follows: declare function uti

Re: [basex-talk] Best way to batch up a set of DB creates, updates and queries?

2015-11-01 Thread Hondros, Constantine (ELS-AMS)
Hi Fabrice, Nice tip, I can see this working. Thanks. From: Etanchaud Fabrice [mailto:fabrice.etanch...@horanet.com] Sent: 30 October 2015 14:39 To: Hondros, Constantine (ELS-AMS); 'basex-talk@mailman.uni-konstanz.de' Subject: RE: [basex-talk] Best way to batch up a set of DB create

Re: [basex-talk] Best way to batch up a set of DB creates, updates and queries?

2015-10-30 Thread Hondros, Constantine (ELS-AMS)
ent: 30 October 2015 12:38 To: Hondros, Constantine (ELS-AMS); 'basex-talk@mailman.uni-konstanz.de' Subject: RE: [basex-talk] Best way to batch up a set of DB creates, updates and queries? Hello Constantine, Facing the same problem, as commands can be written in xml, I found the foll

Re: [basex-talk] Best way to batch up a set of DB creates, updates and queries?

2015-10-30 Thread Hondros, Constantine (ELS-AMS)
onstanz.de] On Behalf Of Hondros, Constantine (ELS-AMS) Sent: 29 October 2015 18:12 To: 'basex-talk@mailman.uni-konstanz.de' Subject: [basex-talk] Best way to batch up a set of DB creates, updates and queries? I'm looking for the best way to batch up a series of updating operation

[basex-talk] Best way to batch up a set of DB creates, updates and queries?

2015-10-29 Thread Hondros, Constantine (ELS-AMS)
I'm looking for the best way to batch up a series of updating operations into a single pipeline that can be run at once. To give you an idea, we have to create multiple databases, perform a number of updating post-processes on each, before finally pulling certain fields out. I've got a lot of p

Re: [basex-talk] Which zip did BaseX choke on?

2015-10-19 Thread Hondros, Constantine (ELS-AMS)
onstantine -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 19 October 2015 13:58 To: Hondros, Constantine (ELS-AMS) Cc: Tim Thompson; BaseX Subject: Re: [basex-talk] Which zip did BaseX choke on? Hi Constantine, I guess that try/catch won't help you here, be

Re: [basex-talk] Which zip did BaseX choke on?

2015-10-16 Thread Hondros, Constantine (ELS-AMS)
ts, BaseX users? C. From: Tim Thompson [mailto:timat...@gmail.com] Sent: 16 October 2015 16:40 To: Hondros, Constantine (ELS-AMS) Cc: Alexander Holupirek; BaseX Subject: Re: [basex-talk] Which zip did BaseX choke on? I may be wrong, but I think you need to wrap the $err:code in db:output(): see

Re: [basex-talk] Which zip did BaseX choke on?

2015-10-16 Thread Hondros, Constantine (ELS-AMS)
irek [mailto:a...@holupirek.de] Sent: 16 October 2015 15:23 To: Hondros, Constantine (ELS-AMS) Cc: BaseX Subject: Re: [basex-talk] Which zip did BaseX choke on? Have you tried using try/catch? for zip in zips return try { unzip() } catch ... Am 16.10.2015 um 15:16 schrieb Hondros, Constantine

[basex-talk] Which zip did BaseX choke on?

2015-10-16 Thread Hondros, Constantine (ELS-AMS)
I love BaseX for the simplicity it brings to XML handling. But this is a problem I have not encountered before. I am creating a DB from about 17,000 small zipfiles, each containing a directory structure and somewhere within each, some XML. BaseX chokes on one of these files giving the error: "

Re: [basex-talk] Create DB from large XML extraction without root element?

2015-05-19 Thread Hondros, Constantine (ELS-AMS)
al Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 19 May 2015 12:14 To: Hondros, Constantine (ELS-AMS) Cc: Dirk Kirsten; basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Create DB from large XML extraction without root element? Hi Constantine, you could use som

Re: [basex-talk] Create DB from large XML extraction without root element?

2015-05-19 Thread Hondros, Constantine (ELS-AMS)
Thanks, I suspected I would have to tokenise on an end element. Cheers, Constantine From: Dirk Kirsten [mailto:d...@basex.org] Sent: 19 May 2015 11:46 To: Hondros, Constantine (ELS-AMS) Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Create DB from large XML extraction without

[basex-talk] Create DB from large XML extraction without root element?

2015-05-19 Thread Hondros, Constantine (ELS-AMS)
Hello all, I have a huge extraction of XML documents in a single file, without a root element, something along these lines: ... Other than by editing it to add a root element, is there a clever way of creating a Basex DB from this? Thanks for any pointers, C.

[basex-talk] DB size sanity check

2015-05-18 Thread Hondros, Constantine (ELS-AMS)
Hi all, Database created from 17692 MB: resulting DB size 30450 MB. CHOP is set to false, but for good reasons. TEXTINDEX and ATTRINDEX are true, but FTINDEX false. Can I just sanity check this size of DB? It surprises me a little. (Basex 8.0.2 by the way). Thanks in advance, Constantine

Re: [basex-talk] Pulling files from multiple zips into one DB

2015-05-04 Thread Hondros, Constantine (ELS-AMS)
brice Etanchaud [mailto:fetanch...@questel.com] Sent: 04 May 2015 14:19 To: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz.de Subject: RE: Pulling files from multiple zips into one DB If your archives contain a mix of raw and xml files, Have a look at the old zip module, that may

[basex-talk] Pulling files from multiple zips into one DB

2015-05-04 Thread Hondros, Constantine (ELS-AMS)
Hello all, I need to merge any XML files located in 500 GB of zips into a single DB for further analysis. Is there any faster or more efficient way to do it in BaseX than this? TIA. for $zip in file:list($src, false(), '*.zip') let $arch := file:read-binary(concat($src, '\', $zip)) for $a in

[basex-talk] Basex 8.0: white-spaces left after delete node?

2015-02-11 Thread Hondros, Constantine (ELS-AMS)
Hi all, Using Basex 8.0, I notice a difference in behaviour compared to Basex 7.82 that I thought worth mentioning. After creating a 5 GB db, then pruning it down to roughly 2GB using Xquery 'delete node', and after running db:optimize, in the GUI Result window I see vast areas of white-spac

Re: [basex-talk] Pruned, optimized DB shows same document count, db size

2015-01-16 Thread Hondros, Constantine (ELS-AMS)
Yup - and for the record, BaseX simply rocks. Great work, guys! From: Fabrice Etanchaud [mailto:fetanch...@questel.com] Sent: 12 January 2015 17:24 To: Hondros, Constantine (ELS-AMS) Cc: basex-talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] Pruned, optimized DB shows same document count

Re: [basex-talk] Pruned, optimized DB shows same document count, db size

2015-01-12 Thread Hondros, Constantine (ELS-AMS)
Hi Fabrice, I used the Xquery delete node command, rather than the (document-oriented) db:delete. I guess that explains the difference. C. From: Fabrice Etanchaud [mailto:fetanch...@questel.com] Sent: 12 January 2015 17:01 To: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz.de

Re: [basex-talk] Pruned, optimized DB shows same document count, db size

2015-01-12 Thread Hondros, Constantine (ELS-AMS)
Aha, of course, the all boolean. I did not try that, and will. DB metadata I get via the GUI menu : Database -> Open and Manage Cheers, C. From: Fabrice Etanchaud [mailto:fetanch...@questel.com] Sent: 12 January 2015 16:14 To: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz

[basex-talk] Pruned, optimized DB shows same document count, db size

2015-01-12 Thread Hondros, Constantine (ELS-AMS)
Hello all, I've pruned about half the documents of a +-5GB database using XQuery delete. I then optimised the database using db:optimize. The database metadata still shows the original number of documents, and the overall db filesize remains roughly the same. I was sort of hoping this pruning
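
The replies in this thread mention the "all" boolean of `db:optimize`. As a hedged sketch of what that call looks like in the BaseX Database Module (the database name `mydb` is a placeholder):

```xquery
(: 'mydb' is a placeholder database name. :)
(: The second argument requests a full optimization, which rebuilds the
   database structures instead of only refreshing indexes and statistics. :)
db:optimize('mydb', true())
```

Separately, as noted later in the thread, `delete node` removes nodes inside documents while `db:delete` removes whole documents, so the document count can stay unchanged after the former.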

[basex-talk] CSV multiple character separators

2014-04-14 Thread Hondros, Constantine (ELS-AMS)
A small request for your growing backlog ... it would be wonderful if your CSV import wizard was able to accept multiple-character separators. We have oodles of data in which the double at-sign '@@' signifies a field separator. No ... don't ask me why! Keep up the good work! C.

Re: [basex-talk] XQuery comments in Return clause?

2014-04-09 Thread Hondros, Constantine (ELS-AMS)
XML comments ... of course ;-) Thanks for your incredibly fast response, Christian. -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 09 April 2014 23:58 To: Hondros, Constantine (ELS-AMS) Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk

[basex-talk] XQuery comments in Return clause?

2014-04-09 Thread Hondros, Constantine (ELS-AMS)
Hi all, I can't tell if this is a bug or not. I commented out a line in my Return clause so that I could test performance. To my surprise, the comment was interpreted as part of the literal response, and the expression within the commented line was evaluated. If this is expected behaviour, is
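
The behaviour described here follows from how direct element constructors are parsed: inside element content, `(: ... :)` is ordinary text and any enclosed expression in it is still evaluated. A small self-contained sketch of the three cases:

```xquery
<result>
  (: inside element content this is literal text, so { 1 + 1 } is still evaluated :)
  <!-- an XML comment: emitted as a comment node, and { 1 + 1 } is not evaluated -->
  { (: a real XQuery comment has to sit inside an enclosed expression :) 2 + 2 }
</result>
```

To disable a line inside a constructor, wrap it in an XML comment, or move the XQuery comment inside the curly braces.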

Re: [basex-talk] Efficient query for duplicates

2014-04-09 Thread Hondros, Constantine (ELS-AMS)
to approve it when I tried it once, and the habit stuck ;-) Cheers, Constantine -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 09 April 2014 14:51 To: Dirk Kirsten Cc: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz.de Subject: Re: [

[basex-talk] Efficient query for duplicates

2014-04-09 Thread Hondros, Constantine (ELS-AMS)
I'm running out of memory (1.5 GB allocated) when querying for duplicate node values over a fairly flat XML database of approximately 450 MB. Can anyone suggest a more memory-efficient approach to framing this query than iterating over distinct-values as I do below? I'm hoping that there are so
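
One alternative to looping over `distinct-values` that may be worth trying is a single `group by` pass, which avoids re-scanning the input once per distinct value. This is a sketch under assumptions: the `records`/`id` structure is hypothetical, and whether it actually uses less memory on a 450 MB database would need to be measured.

```xquery
(: Hypothetical flat input; in the thread the values come from a database. :)
let $records :=
  <records>
    <id>a1</id><id>b2</id><id>a1</id><id>c3</id><id>b2</id>
  </records>
for $id in $records/id
group by $value := string($id)
(: after grouping, $id holds every node that shares the same value :)
where count($id) > 1
return <duplicate value="{ $value }" count="{ count($id) }"/>
```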

[basex-talk] Performance of matches() vs castable as xs:integer

2014-04-03 Thread Hondros, Constantine (ELS-AMS)
Hello all, I'm scanning millions of XML records imported from CSV looking for instances of 'bib_rec_id' which are non-numeric. Which of these two if-statements is likely to complete earlier? for $a in (/csv/record/bib_rec_id) return if ($a castable as xs:integer) then ... blah or if
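
Apart from speed, the two tests are not equivalent, which may matter when choosing between them. The sketch below runs both side by side on hypothetical sample values; the regular expression `^[0-9]+$` is an assumption, since the preview cuts off before the pattern used in the original message.

```xquery
(: Hypothetical sample records in the csv/record layout from the question. :)
let $csv :=
  <csv>
    <record><bib_rec_id>12345</bib_rec_id></record>
    <record><bib_rec_id>12A45</bib_rec_id></record>
    <record><bib_rec_id> 678 </bib_rec_id></record>
    <record><bib_rec_id>-9</bib_rec_id></record>
  </csv>
for $a in $csv/record/bib_rec_id
(: report both checks for each value so the differences are visible :)
return <check value="{ $a }"
              castable="{ $a castable as xs:integer }"
              digits-only="{ matches($a, '^[0-9]+$') }"/>
```

`castable as xs:integer` also accepts surrounding whitespace and a leading sign, so the last two values pass the cast test but fail the digits-only pattern.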

Re: [basex-talk] Accessing DOCTYPE information after DB creation?

2014-03-28 Thread Hondros, Constantine (ELS-AMS)
Thanks all, Unfortunately this is legacy content – and there is an unbelievable amount of it too. So, I will probably pre-process the content and write the DTD info out into an element or PI node. org.basex.core.Command.setInput(org.xml.sax.InputSource is) looks like a probable place to do it

[basex-talk] Accessing DOCTYPE information after DB creation?

2014-03-28 Thread Hondros, Constantine (ELS-AMS)
Hi all, I would really like to be able to query a large corpus of documents to get names and counts of the DTDs which are declared in the (somewhat old-fashioned now) DOCTYPE declaration: Is there any way to get BaseX to preserve this information? Can I rewrite the doctype declaration int

Re: [basex-talk] Bug in tar reading support

2014-03-07 Thread Hondros, Constantine (ELS-AMS)
Many thanks for your swift attention to this problem, Christian! I will download the snapshot and have a good test. Cheers, Constantine -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 07 March 2014 12:52 To: Hondros, Constantine (ELS-AMS) Cc: basex-talk

[basex-talk] Bug in tar reading support

2014-03-05 Thread Hondros, Constantine (ELS-AMS)
Hi Basex team, I have discovered an apparent bug in the tar-reading support new in Basex 7.8.1. Roughly 50% of the files in my tars are not being imported. When the tar is unpacked and imported from the file-system, 100% of files are imported. @Christian: I will mail you privately the location

Re: [basex-talk] Return file source of a given node?

2014-02-28 Thread Hondros, Constantine (ELS-AMS)
Thanks - I should have guessed it would be so easy with Basex. Cheers, Constantine -Original Message- From: Fabrice Etanchaud [mailto:fetanch...@questel.com] Sent: 28 February 2014 11:53 To: Arve Gengelbach; Hondros, Constantine (ELS-AMS) Cc: basex-talk@mailman.uni-konstanz.de Subject

[basex-talk] Return file source of a given node?

2014-02-28 Thread Hondros, Constantine (ELS-AMS)
Hello all, So, I am using Basex to inspect a gargantuan archive of XML records split over thousands of .tar files (thanks Christian for the recent fix allowing me to create a DB from tar input!). In some cases, I would like to know which file, in which tar archive, was the source for a given q
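
The reply is not visible in this preview, so the following is only a guess at the kind of answer involved: the BaseX Database Module offers `db:path`, which returns the database path of the document a node belongs to. The database name `archive` and the element name `record` are placeholders.

```xquery
(: 'archive' and 'record' are placeholder names for this sketch. :)
for $hit in db:open('archive')//record
(: db:path returns the path of the document that contains the node :)
return db:path($hit)
```

The standard function `base-uri($hit)` is a related option when the database name should be part of the result.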

Re: [basex-talk] Creating a Basex database directly from tarred XML?

2014-02-19 Thread Hondros, Constantine (ELS-AMS)
or. If you are interested, I can make an example available to you. Kind regards, Constantine Hondros -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: 15 February 2014 14:03 To: Hondros, Constantine (ELS-AMS) Cc: basex-talk@mailman.uni-konstanz.de Subject:

[basex-talk] Creating a Basex database directly from tarred XML?

2014-02-14 Thread Hondros, Constantine (ELS-AMS)
Hello all, Is there a known workaround for creating a db from tarred XML using the Database -> New menu? I can always untar the tarfile first, but it would be sort-of nice just to point at the tarfile and load. TIA, Constantine