Hello all,
I'm expecting the answer to be no, but is it possible to influence the JSON
serialization of elements at all?
xml-to-json(
  <map xmlns="http://www.w3.org/2005/xpath-functions">
    <number key="number">10111234</number>
  </map>
)
Results in
{"number":1.0111234E7}
I read this in the spec, so it looks like the number is always converted to
xs:double before serialization.
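If plain digits are needed, one workaround (a sketch only, not from the thread)
is to hand xml-to-json a string element instead of a number element, so the
lexical form survives verbatim:
xml-to-json(
  <map xmlns="http://www.w3.org/2005/xpath-functions">
    <string key="number">10111234</string>
  </map>
)
(: returns {"number":"10111234"} :)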
You want to run the same query over a large set of XML chunks and persist the
result as a Dataset or DataFrame?
Just store the XML chunks in a sequence file or Parquet and use BaseX as a
query processor. Map over your input partitions, create an in-memory
database from each chunk, and apply the query.
I want to be able to execute the same XQuery hundreds of thousands (even
millions) of times over different XML input.
Is there a way to avoid re-compiling the query itself every time? It looks to
me like using either XQuery.execute(ctx) or the QueryProcessor class will end
up re-parsing the query.
Ah, fantastic. Thank you Christian (and Martin).
Yes, I was using an incorrect constructor for Arrays in this instance.
Many thanks,
Constantine
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 12 December 2017 13:41
To: Hondros, Constantine (ELS-AMS)
Cc:
map {
  "authors": [
    (for $b in $authors/author
     return map { "fn": $b/*:given-name/text(), "ln": $b/*:surname/text() })
  ]
}
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 12 December 2017 13:20
To:
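For the record, the "incorrect constructor" mentioned above is presumably the
square-bracket form: [ expr ] always builds a one-member array, even when expr
is a whole sequence, while array { expr } makes each item its own member. A
sketch of the fixed query (field names copied from the snippet above):
map {
  "authors": array {
    for $b in $authors/author
    return map { "fn": $b/*:given-name/text(), "ln": $b/*:surname/text() }
  }
}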
Hello all,
I want to dynamically build a map where the value is an array of maps. I am
having a problem with the syntax. (My use-case is to deliver this as JSON for
simple ingestion to Spark, by the way).
Here's my test code, which translates cleanly into JSON:
declare option output:method 'json';
Hi there,
Unless I'm mistaken, I think the easiest thing to do is to serialise as CSV and
import into something like Excel.
The CSV module can help you here: http://docs.basex.org/wiki/CSV_Module
Although thinking about it, perhaps other folks have tried wrapping a library
like Apache POI (htt
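To illustrate the CSV route, a minimal sketch (element layout as expected by
the BaseX CSV module; field names invented):
csv:serialize(
  <csv>
    <record>
      <entry>given-name</entry>
      <entry>surname</entry>
    </record>
  </csv>,
  map { 'separator': ',' }
)
(: returns the line: given-name,surname :)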
Ok, this must be possible. Can someone point me towards the correct
serialization $params such that
let $doi := "10.1002/1521-4141(200207)32:7<2021::AID-IMMU2021>3.0.CO;2-J"
return serialize($doi, $params)
will return (to a calling Java application) the UTF-8 string
10.1002/1521-4141(200207)32:7
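Assuming the goal is simply to suppress XML escaping of < and >, one candidate
for $params (a sketch, not necessarily the answer given in the thread) is the
text output method, which writes the string value unescaped:
serialize($doi, map { 'method': 'text' })
(: 10.1002/1521-4141(200207)32:7<2021::AID-IMMU2021>3.0.CO;2-J :)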
Hi Kristian,
This came up a few times before on the list: it's an issue with the default
font that BaseX uses in the GUI (Consolas on Windows).
Try Options -> Fonts and switch to a more complete font set like Arial
Unicode.
Cheers,
C.
-Original Message-
From: basex-talk-b
Hello Jack,
Probably down to the default font that the BaseX GUI uses, which is Consolas.
In the past I've simply set the GUI to use a Unicode font like Arial Unicode MS
and characters display fine.
Options => fonts
Cheers,
Constantine
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto
Hi Michele,
1. Go to /bin
2. Open basexgui.bat (Windows) or basexgui (*nix)
3. Increase the value of -Xmx in the BASEX_JVM variable from 512 MB to
something your system can support
4. Restart the GUI
For example, here is the line from my config - I allocate almost 5 GB:
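A hypothetical version of such a line (values illustrative only, on Windows):
set BASEX_JVM=-Xmx4800m %BASEX_JVM%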
D'oh. Of course :)
Thanks.
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 16 February 2016 16:00
To: Hondros, Constantine (ELS-AMS)
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Indenting XML query result from a function
For p
t - beyond rolling my own
function.
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 16 February 2016 13:59
To: Hondros, Constantine (ELS-AMS)
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Indenting XML query result from a function
Hi C
This is probably easy, but I can't find a built-in method or function to do
it.
I have a function that will return certain nodes from a DB. The DB was built
with CHOP off - plenty of white-space present between nodes.
declare function dq:errorGivenName($db as xs:string) as element()* {
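If the answer was the 'indent' serialization parameter (an assumption; the
reply above is cut off), a sketch with a hypothetical DB name would be:
declare option output:indent 'yes';
(: or, per call: :)
serialize(dq:errorGivenName('mydb'), map { 'indent': 'yes' })
(: note: with CHOP off, the stored whitespace text nodes are real content,
   so the serializer may keep them rather than re-indent :)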
Sent: 15 February 2016 10:44
To: Hondros, Constantine (ELS-AMS)
Cc: BaseX
Subject: Re: [basex-talk] Spectacularly slow performance with db:open vs.
db:text
Hi Constantine,
> for $a in (db:open('DB1')/item/order-id) return
> if (db:open('DB2')//order-id[. = $a]) then
>
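A sketch of the index-friendly rewrite that the subject line hints at
(db:text instead of a full //order-id scan; names copied from the query above):
for $a in db:open('DB1')/item/order-id
where db:text('DB2', string($a))[parent::order-id]
return $a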
This is where I tend to get the Javadoc from – though I note it’s a few
releases out of date.
http://docs.basex.org/javadoc/
C.
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of
buddyonweb-softw...@yahoo.com
Sent: 13 February 201
Hello Basexers,
I'm getting such low performance on a relatively simple join between two
databases that I feel there must be something going wrong here. I can provide
the sources if necessary, but basically DB1 is 26 MB, about 80,000 small
documents; DB2 is 47 MB, about 18,500 small documents.
Hello all,
Maybe a dumb question, but I need to reference the currently open database.
Is there a more elegant way than this?
db:name((*)[1])
Hello all,
BaseX doesn't seem to like the query below. Any particular reason? It makes
sense to me. I'm trying to add items to a DB incrementally to (attempt to)
avoid running into out of memory exceptions when cutting a slice of a large DB
into a smaller one.
for tumbling window $w in db:ope
)
=> results in "Institut für Organische Chemie der Universität Heidelberg" as a
grouping key.
Thanks for pointing me in the right direction.
C.
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 09 January 2016 18:49
To: Hondros, Constantine (ELS-A
Hello all,
I'm using BaseX to cluster a set of millions of small XML fragments which look
something like this:
Institut für Organische Chemie der Universität Heidelberg
I need to cluster based on fragment similarity - so taking into account
elements, attributes and text nodes.
If I
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 06 January 2016 13:57
To: Hondros, Constantine (ELS-AMS)
Cc: BaseX
Subject: Re: [basex-talk] db:add speed proportional to DB size?
Hi Hondros,
> Processing hundreds of thousands of zips, using db:add to append
> small XML fragments from each into a single DB, I noti
Hello all,
Processing hundreds of thousands of zips, using db:add to append small XML
fragments from each into a single DB, I notice that the process becomes
successively slower. Without having done any proper profiling, and aware that I
might be looking in the wrong direction here, would it
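One knob worth checking in such a scenario (an assumption on my part, not
necessarily the answer given in the thread) is AUTOFLUSH: with it enabled,
every addition flushes to disk. A command-script sketch that disables it and
flushes once at the end (DB and input names invented):
SET AUTOFLUSH false
OPEN mydb
ADD doc1.xml
ADD doc2.xml
FLUSH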
Hello all,
Any particular reason behind this observed behaviour? (BaseX 8.3).
..
[1]
..
declare variable $test :=
  [element constructor garbled by the archive; it used the namespace
  http://www.elsevier.com/xml/ani/common]
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 17 November 2015 19:31
To: Hondros, Constantine (ELS-AMS)
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] zip:update-entries - unexpected results
We didn’t spend energy in the ZIP Module for a long ti
Hi all,
I'm attempting to create a modified copy of a zipfile using zip:update-entries,
but my output is a zip structure with zero-byte files. Am I missing a step?
I've followed the example in:
http://docs.basex.org/wiki/ZIP_Module#zip:update-entries
Code snippet as follows:
declare function uti
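Since the ZIP module is unmaintained (per the reply quoted above), a sketch of
the same update via the newer Archive Module (file and entry names invented):
let $arch := file:read-binary('in.zip')
let $new  := archive:update($arch, 'entry.xml', serialize(<doc>new content</doc>))
return file:write-binary('out.zip', $new)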
Hi Fabrice,
Nice tip, I can see this working.
Thanks.
From: Etanchaud Fabrice [mailto:fabrice.etanch...@horanet.com]
Sent: 30 October 2015 14:39
To: Hondros, Constantine (ELS-AMS); 'basex-talk@mailman.uni-konstanz.de'
Subject: RE: [basex-talk] Best way to batch up a set of DB create
Sent: 30 October 2015 12:38
To: Hondros, Constantine (ELS-AMS); 'basex-talk@mailman.uni-konstanz.de'
Subject: RE: [basex-talk] Best way to batch up a set of DB creates, updates and
queries?
Hello Constantine,
Facing the same problem, and since commands can be written in XML,
I found the following approach.
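The script itself is not shown above; a hypothetical XML command script of
this kind (names invented; runnable as a .bxs file) might look like:
<commands>
  <create-db name='db1'>input1.xml</create-db>
  <optimize/>
  <xquery>db:info('db1')</xquery>
</commands>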
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Hondros,
Constantine (ELS-AMS)
Sent: 29 October 2015 18:12
To: 'basex-talk@mailman.uni-konstanz.de'
Subject: [basex-talk] Best way to batch up a set of DB creates, updates and
queries?
I'm looking for the best way to batch up a series of updating operation
I'm looking for the best way to batch up a series of updating operations into a
single pipeline that can be run at once.
To give you an idea, we have to create multiple databases, perform a number of
updating post-processes on each, before finally pulling certain fields out.
I've got a lot of p
Constantine
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 19 October 2015 13:58
To: Hondros, Constantine (ELS-AMS)
Cc: Tim Thompson; BaseX
Subject: Re: [basex-talk] Which zip did BaseX choke on?
Hi Constantine,
I guess that try/catch won't help you here, be
Thoughts, BaseX users?
C.
From: Tim Thompson [mailto:timat...@gmail.com]
Sent: 16 October 2015 16:40
To: Hondros, Constantine (ELS-AMS)
Cc: Alexander Holupirek; BaseX
Subject: Re: [basex-talk] Which zip did BaseX choke on?
I may be wrong, but I think you need to wrap the $err:code in db:output(): see
From: Alexander Holupirek [mailto:a...@holupirek.de]
Sent: 16 October 2015 15:23
To: Hondros, Constantine (ELS-AMS)
Cc: BaseX
Subject: Re: [basex-talk] Which zip did BaseX choke on?
Have you tried using try/catch?
for $zip in $zips return try { unzip($zip) } catch * { ... }
On 16.10.2015 at 15:16, Hondros, Constantine wrote:
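Fleshing that suggestion out into something runnable (a sketch; $src is
assumed to end with a path separator), reporting which zip fails and why:
for $zip in file:list($src, false(), '*.zip')
return
  try {
    $zip || ': ' || count(archive:entries(file:read-binary($src || $zip))) || ' entries'
  } catch * {
    $zip || ' FAILED: ' || $err:code || ' ' || $err:description
  }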
I love BaseX for the simplicity it brings to XML handling. But this is a
problem I have not encountered before.
I am creating a DB from about 17,000 small zipfiles, each containing a
directory structure and somewhere within each, some XML. BaseX chokes on one of
these files giving the error: "
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 19 May 2015 12:14
To: Hondros, Constantine (ELS-AMS)
Cc: Dirk Kirsten; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Create DB from large XML extraction without root
element?
Hi Constantine,
you could use som
Thanks, I suspected I would have to tokenise on an end element.
Cheers,
Constantine
From: Dirk Kirsten [mailto:d...@basex.org]
Sent: 19 May 2015 11:46
To: Hondros, Constantine (ELS-AMS)
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Create DB from large XML extraction without
Hello all,
I have a huge extraction of XML documents in a single file, without a root
element, something along these lines:
...
Other than by editing it to add a root element, is there a clever way of
creating a BaseX DB from this?
Thanks for any pointers,
C.
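A sketch of the tokenise-on-an-end-element idea from the reply above (the
element name </record> and the DB name are hypothetical; assumes the extract
fits in memory and carries no per-document XML declarations):
let $text := file:read-text('extract.xml')
for $chunk at $i in tokenize($text, '</record>')[normalize-space()]
return db:add('slice', parse-xml($chunk || '</record>'), 'doc' || $i || '.xml')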
Hi all,
Database created from 17692 MB: resulting DB size 30450 MB.
CHOP is set to false, but for good reasons. TEXTINDEX and ATTRINDEX are true,
but FTINDEX false.
Can I just sanity check this size of DB? It surprises me a little.
(BaseX 8.0.2 by the way).
Thanks in advance,
Constantine
From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 04 May 2015 14:19
To: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz.de
Subject: RE: Pulling files from multiple zips into one DB
If your archives contain a mix of raw and XML files,
have a look at the old ZIP module, that may
Hello all,
I need to merge any XML files located in 500 GB of zips into a single DB for
further analysis. Is there any faster or more efficient way to do it in BaseX
than this? TIA.
for $zip in file:list($src, false(), '*.zip')
let $arch := file:read-binary(concat($src, '\', $zip))
for $a in
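One way the loop can continue (a sketch; DB name hypothetical, and only .xml
entries are pulled in via the Archive Module):
for $zip in file:list($src, false(), '*.zip')
let $arch := file:read-binary(concat($src, '\', $zip))
for $entry in archive:entries($arch)[ends-with(., '.xml')]
return db:add('merged', parse-xml(archive:extract-text($arch, $entry)), concat($zip, '/', $entry))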
Hi all,
Using BaseX 8.0, I notice a difference in behaviour compared to BaseX 7.8.2 that
I thought worth mentioning.
After creating a 5 GB DB, then pruning it down to roughly 2 GB using XQuery
'delete node', and after running db:optimize, in the GUI Result window I see
vast areas of white-space
Yup - and for the record, BaseX simply rocks. Great work, guys!
From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 12 January 2015 17:24
To: Hondros, Constantine (ELS-AMS)
Cc: basex-talk@mailman.uni-konstanz.de
Subject: RE: [basex-talk] Pruned, optimized DB shows same document count
Hi Fabrice,
I used the XQuery delete node command, rather than the (document-oriented)
db:delete.
I guess that explains the difference.
C.
From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 12 January 2015 17:01
To: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz.de
Aha, of course, the all boolean. I did not try that, but will.
I get the DB metadata via the GUI menu: Database -> Open and Manage.
Cheers,
C.
From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 12 January 2015 16:14
To: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz
Hello all,
I've pruned about half the documents of a +-5GB database using XQuery delete. I
then optimised the database using db:optimize.
The database metadata still shows the original number of documents, and the
overall db filesize remains roughly the same.
I was sort of hoping this pruning
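The "all" boolean mentioned in the replies above corresponds to a full
rebuild; a sketch (DB name hypothetical):
db:optimize('mydb', true())
(: or, on the command line: OPTIMIZE ALL :)
Unlike the plain variant, the full rebuild reconstructs the database files
themselves, which is what actually reclaims the space of deleted documents.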
A small request for your growing backlog ... it would be wonderful if your CSV
import wizard was able to accept multiple-character separators. We have oodles
of data in which the double at-sign '@@' signifies a field separator. No ...
don't ask me why!
Keep up the good work!
C.
XML comments ... of course ;-)
Thanks for your incredibly fast response, Christian.
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 09 April 2014 23:58
To: Hondros, Constantine (ELS-AMS)
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk
Hi all,
I can't tell if this is a bug or not.
I commented out a line in my return clause so that I could test performance. To
my surprise, the comment was interpreted as part of the literal response, and
the expression within the commented line was evaluated. If this is expected
behaviour, is
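Given the resolution above ("XML comments ... of course"), the gotcha is
presumably the difference between the two comment syntaxes, shown side by side:
(
  (: an XQuery comment - stripped by the parser, never part of the result :)
  <!-- an XML comment constructor - a real expression that builds a comment node, so it travels into the output -->
)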
to approve it when I tried it once, and the habit stuck ;-)
Cheers,
Constantine
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 09 April 2014 14:51
To: Dirk Kirsten
Cc: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz.de
Subject: Re: [
I'm running out of memory (1.5 GB allocated) when querying for duplicate node
values over a fairly flat XML database of approximately 450 MB.
Can anyone suggest a more memory-efficient approach to framing this query than
iterating over distinct-values as I do below? I'm hoping that there are so
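One memory-friendlier framing (a sketch with a hypothetical path; whether it
beats distinct-values depends on the optimizer) is to let group by collect the
duplicates in a single pass:
for $v in db:open('mydb')//record/value
group by $key := string($v)
where count($v) > 1
return $key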
Hello all,
I'm scanning millions of XML records imported from CSV looking for instances of
'bib_rec_id' which are non-numeric. Which of these two if-statements is likely
to complete earlier?
for $a in (/csv/record/bib_rec_id)
return
if ($a castable as xs:integer) then ... blah
or
if
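Side by side, the two tests from the question (a sketch; note they are not
quite equivalent: castable as xs:integer also accepts values like '-7' or
' 42 ', which a strict digits-only regex rejects):
let $ids := /csv/record/bib_rec_id
return (
  $ids[not(. castable as xs:integer)],   (: cast-based test :)
  $ids[not(matches(., '^[0-9]+$'))]      (: regex-based test :)
)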
Thanks all,
Unfortunately this is legacy content – and there is an unbelievable amount of
it too.
So, I will probably pre-process the content and write the DTD info out into an
element or PI node.
org.basex.core.Command.setInput(org.xml.sax.InputSource is) looks like a
probable place to do it
Hi all,
I would really like to be able to query a large corpus of documents to get
names and counts of the DTDs which are declared in the (somewhat old-fashioned
now) DOCTYPE declaration:
Is there any way to get BaseX to preserve this information? Can I rewrite the
doctype declaration int
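A sketch of the pre-processing idea from the reply above ($dir is assumed to
end with a separator; naive about internal DTD subsets):
for $f in file:list($dir, true(), '*.xml')
let $txt  := file:read-text($dir || $f)
let $decl := normalize-space(substring-before(substring-after($txt, '<!DOCTYPE'), '>'))
where $decl != ''
return <doctype file="{$f}">{$decl}</doctype>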
Many thanks for your swift attention to this problem, Christian! I will
download the snapshot and have a good test.
Cheers,
Constantine
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 07 March 2014 12:52
To: Hondros, Constantine (ELS-AMS)
Cc: basex-talk
Hi BaseX team,
I have discovered an apparent bug in the tar-reading support new in BaseX
7.8.1. Roughly 50% of the files in my tars are not being imported. When the tar
is unpacked and imported from the file-system, 100% of files are imported.
@Christian: I will mail you privately the location
Thanks - I should have guessed it would be so easy with BaseX.
Cheers,
Constantine
-Original Message-
From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 28 February 2014 11:53
To: Arve Gengelbach; Hondros, Constantine (ELS-AMS)
Cc: basex-talk@mailman.uni-konstanz.de
Subject
Hello all,
So, I am using BaseX to inspect a gargantuan archive of XML records split over
thousands of .tar files (thanks Christian for the recent fix allowing me to
create a DB from tar input!).
In some cases, I would like to know which file, in which tar archive, was the
source for a given q
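Assuming the easy answer referenced in the reply above was db:path (an
assumption; the preview does not show it), a sketch with invented names:
for $hit in db:open('archive')//record[@id = 'some-id']
return db:path($hit)
(: returns the path the document was stored under, e.g. the tar entry name :)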
or.
If you are interested, I can make an example available to you.
Kind regards,
Constantine Hondros
-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 15 February 2014 14:03
To: Hondros, Constantine (ELS-AMS)
Cc: basex-talk@mailman.uni-konstanz.de
Subject:
Hello all,
Is there a known workaround for creating a DB from tarred XML using the
Database -> New menu? I can always untar the tarfile first, but it would be
sort-of nice just to point at the tarfile and load.
TIA,
Constantine