Re: [basex-talk] Reg : Taking time xmls to collections

2019-05-03 Thread Fabrice ETANCHAUD
Hi Chandra,

Performance is a very subjective matter that deserves measurements.
Did you set the UPDINDEX or AUTOOPTIMIZE options?
In your opinion, what would good performance be for a 1 GB file ingestion?

Did you try to compare with an embedded BaseX db (in standalone mode)?

Best regards,
Fabrice Etanchaud


From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of chandra Sekhar
Sent: Friday, 3 May 2019 09:04
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Reg : Taking time xmls to collections

Hi BaseX Team,

We have already created a collection and are now adding XMLs to it.
This process is taking a long time, and the total number of XMLs is nearly
200,000 (2 lakhs). How can we improve the performance of adding XMLs to
collections?

We are using BaseX (version 7.9) in a client/server architecture with Java
to create the collection and add XMLs to it.
We use the following two ways to add an XML to the collection:

1) Using FileInputStream (when the file size is small, less than 1 GB)
2) Using the ADD command (when the file size is large, greater than 1 GB)


[basex-talk] FTP client module

2019-04-12 Thread Fabrice ETANCHAUD
Hi all !

I am happy to get back to basics, after a lot of SQL ;-)

In November 2017, Christian proposed to broadcast an FTP client module to the
list.
I must have missed the link; I cannot find the module.

Is it still possible to get it?

Thank you,

Best regards,
Fabrice


Re: [basex-talk] upper limits on storage; database admin

2019-01-03 Thread Fabrice ETANCHAUD
Hi Thufir,

You will find the current BaseX limitations here [1].

The only limit I found was the maximum number of XML nodes in a collection.
But I never handled more than 100 GB of content.
You can safely add documents by enabling the ADDCACHE option [2]; should the
collection overflow, the operation will fail gracefully without corrupting it.
As indexes are mono-collection structures, it is all about dispatching your
data wisely across collections.

And you are right: IMHO BaseX can fit in many places, and shines in
prototyping!

Best regards,

[1] http://docs.basex.org/wiki/Statistics
[2] http://docs.basex.org/wiki/Options#ADDCACHE


-----Original Message-----
From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of thufir
Sent: Wednesday, 2 January 2019 22:39
To: BaseX
Subject: [basex-talk] upper limits on storage; database admin

What are the upper bounds on BaseX database size?  Assuming it's just text
xml, gigabytes is quite a bit to my thinking.  At a certain point, it's "big
data" -- but how do you know when you're approaching that point?

Or is the bottleneck more about read/write and consistency problems?  What
little I know of RDBMSs is that master/slave replication can alleviate some
bottlenecks.

To put this another way: I'm so enthusiastic about BaseX that I'm
having trouble finding a place it doesn't fit.  As you approach
terabytes and beyond, what database administration approaches are employed?



-Thufir


Re: [basex-talk] Bulk import (moving from eXist to BaseX)

2019-01-02 Thread Fabrice ETANCHAUD
My mistake, Andreas, I completely forgot raw content...

By the way, Happy new year to all happy BaseX users, the BaseX team, and to 
you, Christian !

Best regards,
Fabrice

-Message d'origine-
De : BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part 
de Christian Grün
Envoyé : mercredi 2 janvier 2019 17:05
À : Andreas Jung
Cc : BaseX
Objet : Re: [basex-talk] Bulk import (moving from eXist to BaseX)

Hi Andreas,

In BaseX, you can use the EXPORT command or the db:export function to
write all database contents to disk [1,2]. I know too little about
eXist-db; maybe they offer a similar feature?

After that, as Fabrice indicated, you can use the CREATE DB command to
create a database by pointing to that directory. If the database
contains non-XML resources, you may need to write a little XQuery
script or a BaseX command script that imports the data in the format
you prefer.

Hope this helps
Christian

[1] http://docs.basex.org/wiki/Commands#EXPORT
[2] http://docs.basex.org/wiki/Database_Module#db:export
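The export/create round trip described above can be sketched in XQuery. The database names and the dump directory below are placeholders, and the two steps must run as separate queries (db:create is an updating function, db:export is not):

```xquery
(: step 1: write all resources of a database to a directory :)
db:export('my-db', '/tmp/dump')
```

```xquery
(: step 2: create a new database from the exported directory :)
db:create('imported-db', '/tmp/dump')
```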


On Tue, Jan 1, 2019 at 4:21 PM Andreas Jung  wrote:
>
> Hi there,
>
> I am considering moving my XML stuff for a particular project from eXist to BaseX.
>
> I have a full dump from eXist with all related content, and I want to import
> it 1:1
> into BaseX… is there a bulk loader, or what is the preferred way to import an
> existing folder structure?
> Of course I could create my own script that would import the data through
> WebDAV, but the dump has about
> 50,000 files.
>
> Andreas


Re: [basex-talk] Bulk import (moving from eXist to BaseX)

2019-01-02 Thread Fabrice ETANCHAUD
Hi Andreas,

The CREATE DB command can create a collection from a directory structure [1].

Best regards,
Fabrice ETANCHAUD

[1] http://docs.basex.org/wiki/Commands#CREATE_DB


-----Original Message-----
From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of Andreas Jung
Sent: Tuesday, 1 January 2019 16:22
To: BaseX
Subject: [basex-talk] Bulk import (moving from eXist to BaseX)

Hi there,

I am considering moving my XML stuff for a particular project from eXist to BaseX.

I have a full dump from eXist with all related content, and I want to import it
1:1 into BaseX… is there a bulk loader, or what is the preferred way to import
an existing folder structure?
Of course I could create my own script that would import the data through
WebDAV, but the dump has about
50,000 files.

Andreas


Re: [basex-talk] Perceived performance of REST API

2018-11-07 Thread Fabrice ETANCHAUD
Hi Sebastian,

You should switch to the RESTXQ interface;
there is a %rest:single annotation [1] dedicated to that problem.

Best regards,
Fabrice Etanchaud
CERFrance PCH

[1] http://docs.basex.org/wiki/RESTXQ
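A minimal sketch of such a RESTXQ function follows. The module URI, path, database name 'mydb' and the query logic are placeholders; the point is the %rest:single annotation, which stops a still-running invocation of the same function when a new request from the same client arrives:

```xquery
module namespace search = 'http://example.com/search';

declare
  %rest:GET
  %rest:path('/search')
  %rest:query-param('q', '{$q}', '')
  %rest:single
function search:results($q as xs:string) {
  <results>{
    (: placeholder query against a hypothetical database 'mydb' :)
    db:open('mydb')//entry[. contains text { $q }]
  }</results>
};
```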



From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of Sebastian Zimmer
Sent: Wednesday, 7 November 2018 14:32
To: 'BaseX'
Subject: [basex-talk] Perceived performance of REST API


Hi,

I'm looking for a way to increase the *perceived* performance of the REST API.

On my web page there's a results list that is updated automatically when the
text content of an input field changes. This means that on each keystroke by
the user, a GET request is sent to the REST API to obtain the new results
list.

One GET request takes about 800 ms, which is acceptable to me given the 
complexity of the query and the amount of data.

The problem is that when the user types faster than the queries are executed, 
the app starts to respond very slowly, because there are now several requests 
pending. BaseX seems to execute all of them in parallel and it may well take up 
to 14 seconds until it returns the results of the most recent search query.

Is there a way to tell BaseX to abort running queries (by a specific user) in
order to prioritize the most recent query (of this user)? Could this be done
with the Jobs module? Or is there a better way to implement such a search?

Best regards,
Sebastian
--
Sebastian Zimmer
sebastian.zim...@uni-koeln.de

Cologne Center for eHumanities (http://cceh.uni-koeln.de)
DH Center at the University of Cologne
@CCeHum (https://twitter.com/CCeHum)


Re: [basex-talk] Database Updates

2018-10-15 Thread Fabrice ETANCHAUD
Hi Dave,

Like you, because of my RDBMS background, I had to get a feel for the
'document' paradigm in order to obtain good performance results.

I suggest you do not try to update your catalog document in place using the
XQuery Update Facility.

A simple solution is to add each daily update, and annotate it with a
my-sort-value value you can sort on (the date in your use case).
When looking for a given SKU, use an index to obtain the list of entry
elements, ordered by descending my-sort-value, and take the first item.
When you hit performance or storage limits, create a collection containing only
the latest version of each entry element.
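A sketch of that lookup in XQuery: the database name 'catalog' and the
@my-sort-value attribute are assumptions, only ENTRY and SKU come from the
original mail.

```xquery
(: return the most recent ENTRY for a given SKU;
   'catalog' and @my-sort-value are placeholder names :)
declare function local:latest-entry($sku as xs:string) as element(ENTRY)? {
  (for $e in db:open('catalog')//ENTRY[SKU = $sku]
   order by $e/@my-sort-value descending
   return $e)[1]
};

local:latest-entry('1')
```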

But as your data is not really document-oriented, did you consider using JSON
repositories like Couchbase?
You could choose to store each entry element in a separate document, with SKU
as the key.
Coupled with just a little preprocessing step - which could easily be written
in XQuery - transforming your entries element into a JSON array of entry
objects, you would obtain incredible performance.
Couchbase has a 'SQL-like' language, called N1QL, to query your bucket of
documents.

Best regards,
Fabrice


From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of Dave Kopecek
Sent: Sunday, 14 October 2018 00:10
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Database Updates

Here are some tweaked examples & a better problem definition:

Given a database created from catalog.xml below, what's the best way to update
& add records received daily as in the file daily-update.xml?

For every <ENTRY> node in daily-update.xml:
-- If ENTRY/SKU is found in catalog.xml, replace the entire <ENTRY> node in
catalog.xml with the one from daily-update.xml
-- If ENTRY/SKU is not found in catalog.xml, add the <ENTRY> node from
daily-update.xml to catalog.xml

catalog.xml

<CATALOG>
   <ENTRY>
      <SKU>1</SKU>
      <TITLE>Empire Burlesque</TITLE>
      <ARTIST>Dylan</ARTIST>
      <PRICE>10.90</PRICE>
   </ENTRY>
   <ENTRY>
      <SKU>2</SKU>
      <TITLE>Hide your heart</TITLE>
      <ARTIST>Bonnie Tyler</ARTIST>
      <PRICE>9.90</PRICE>
   </ENTRY>
</CATALOG>

daily-update.xml

<CATALOG>
   <ENTRY>
      <SKU>1</SKU>
      <TITLE>Empire Burlesque</TITLE>
      <ARTIST>Bob Dylan</ARTIST>
      <PRICE>29.99</PRICE>
      <NOTE>some value</NOTE>
   </ENTRY>
   <ENTRY>
      <SKU>5</SKU>
      <TITLE>Tupelo Honey</TITLE>
      <ARTIST>Van Morrison</ARTIST>
      <PRICE>8.20</PRICE>
   </ENTRY>
</CATALOG>

catalog.xml after daily update:

<CATALOG>
   <ENTRY>
      <SKU>1</SKU>
      <TITLE>Empire Burlesque</TITLE>
      <ARTIST>Bob Dylan</ARTIST>
      <PRICE>29.99</PRICE>
      <NOTE>some value</NOTE>
   </ENTRY>
   <ENTRY>
      <SKU>2</SKU>
      <TITLE>Hide your heart</TITLE>
      <ARTIST>Bonnie Tyler</ARTIST>
      <PRICE>9.90</PRICE>
   </ENTRY>
   <ENTRY>
      <SKU>5</SKU>
      <TITLE>Tupelo Honey</TITLE>
      <ARTIST>Van Morrison</ARTIST>
      <PRICE>8.20</PRICE>
   </ENTRY>
</CATALOG>


Thanks,
-Dave

On Fri, Aug 17, 2018 at 11:27 AM Dave Kopecek
<dave.kope...@gmail.com> wrote:
Hi All,

Given a database created from catalog.xml below, what's the best way to update
& add records received daily as in the file daily-update.xml?

Coming from relational DBs & new to this. I ultimately need to script/automate
this. Hoping there's some magic command I'm missing & looking for the best way
to approach the problem.

Thanks,
-Dave


catalog.xml

<CATALOG>
   <ENTRY>
      <SKU>1</SKU>
      <TITLE>Empire Burlesque</TITLE>
      <ARTIST>Dylan</ARTIST>
      <PRICE>10.90</PRICE>
   </ENTRY>
   <ENTRY>
      <SKU>2</SKU>
      <TITLE>Hide your heart</TITLE>
      <ARTIST>Bonnie Tyler</ARTIST>
      <PRICE>9.90</PRICE>
   </ENTRY>
</CATALOG>

daily-update.xml

<CATALOG>
   <ENTRY>
      <SKU>1</SKU>
      <PRICE>19.90</PRICE>
      <NOTE>I wasn't here before the update</NOTE>
   </ENTRY>
   <ENTRY>
      <SKU>1971</SKU>
      <TITLE>Tupelo Honey</TITLE>
      <ARTIST>Van Morrison</ARTIST>
      <PRICE>8.20</PRICE>
   </ENTRY>
</CATALOG>



--
DAVE KOPECEK OFFICE 607-431-8565 CELL 607-267-3449
6 CROSS STREET, DELHI NY 13753


Re: [basex-talk] .basex setup

2018-09-24 Thread Fabrice ETANCHAUD
Hi Jan,

Did you read that page ?

http://docs.basex.org/wiki/Options

Best regards,
Fabrice


-----Original Message-----
From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of kali...@centrum.cz
Sent: Monday, 24 September 2018 13:33
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] .basex setup

Hi all,
I'm wondering with my colleague Radim how to set up the settings in the
.basex file.

We have not found any detailed documentation on what the concrete fields set.

We had problems with file uploads to the database being interrupted, so we
played around, and changing

PARALLEL from 8 to 60(!) helped. We also increased CACHETIMEOUT and TIMEOUT
for different reasons.

Does anybody have detailed documentation of what we set up, what it affects,
what the ranges of values are, and so on for these settings?

Regards,
Jan


Re: [basex-talk] dB:update()

2018-09-12 Thread Fabrice ETANCHAUD
Hi Michael,


IMHO, I don't think it is the right way to handle data changes in a
document-oriented database.

An efficient way may be to add new versions as they come.

There is always a way to sort the related documents - sometimes with an
attribute in the data,

or with a part of the filename.

If not, you might have to build an index database containing
(object-id, node pre) tuples (because a node's pre value is constant in an
append-only db).

Then I would write a simple function(object-id) returning the top element in
the versions' list, ordered by descending version (using hof:top-k-by, for
example).
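Such a lookup could look like this sketch; the database name 'versions', the object element, and the @id/@version attributes are all assumed names:

```xquery
(: return the latest version of an object, using hof:top-k-by;
   'versions', 'object', @id and @version are placeholder names :)
declare function local:latest($id as xs:string) as element(object)? {
  hof:top-k-by(
    db:open('versions')//object[@id = $id],
    function($o) { $o/@version/string() },
    1
  )
};

local:latest('42')
```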



You can also split your data in two:

a big read-only database containing the data before one point in time (indexes
already set up);

a light append-only database containing the data after that point in time
(where index updates are fast, or where the UPDINDEX option is even set).

On schedule, you would construct a new read-only database aggregating the back
and front data.

Note that with two (or even more!) databases, you would have to add the
database name to the index tuples.

I had success with that update strategy when working with the EPO DOCDB
collection
(https://www.epo.org/searching-for-patents/data/bulk-data-sets/docdb.html#tab-2).

Thanks to Christian for giving me the right pointers when I needed them!



Hoping it helps,



Best regards,



Fabrice ETANCHAUD

From: @pyschny.de
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] dB:update()
Date: 11/09/2018 16:09:01 CEST

I want to solve the following problem:
For each $doc in $list-of-docs,
detect differences of the doc against the BaseX db and add the changed records
to the BaseX db.
After the differences of each doc are added to the BaseX db, create a new
index for the BaseX db, which is required for the next $doc.

How can I solve the problem that the added records are not visible for the
index creation?
Michael





Re: [basex-talk] Strange index values with numerics

2018-08-28 Thread Fabrice ETANCHAUD
Hi all,

This constant's name is somewhat misleading,
because it actually seems to contain the smallest positive value,
not the biggest negative one [1]:

[1] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#MIN_VALUE

Hoping it helps,

Best regards,
Fabrice


-----Original Message-----
From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of Christian Grün
Sent: Tuesday, 28 August 2018 09:07
To: Zachary N. Dean
Cc: BaseX
Subject: Re: [basex-talk] Strange index values with numerics

Hi Zack,

It helps indeed! I learnt that -Double.MAX_VALUE is smaller than
Double.MIN_VALUE in Java. The fix turned out to be pretty
straightforward [1]; a new stable snapshot is available [2].

Have fun,
Christian

[1] https://github.com/BaseXdb/basex/issues/1616
[2] http://files.basex.org/releases/latest/


On Mon, Aug 27, 2018 at 10:41 PM Zachary N. Dean  wrote:
>
> Hi,
>
> I was recently taking a look into the index data files (for reasons) and came 
> across something that I found strange...
>
>
>
> When numeric values are in nodes they are put into the index with min/max and 
> distinct token values, which is cool…
>
> What's strange is, when negative integer values are in text and attribute 
> nodes the index contains the minimum value correctly, but the maximum value 
> is '4.9E-324' ([0,0,0,0,0,0,0,1]).
>
> This doesn't seem to happen with positive values.
>
> Now, with small value ranges I assume this is okay, but with many values I 
> would imagine it could slow things down.
>
>
>
> Not sure if this is a bug or a feature, so I figured I'd bring it up.
>
>
>
> Here an example:
>
> <r>
>
>   <a>-1</a>
>
>   <b>0</b>
>
>   <c>1</c>
>
>   <d>-5</d>
>
>   <d>-49000</d>
>
>   <e>2</e>
>
>   <e>3</e>
>
>   <f a="-1"/>
>
>   <g>-1</g>
>
>   <g>1</g>
>
> </r>
>
>
>
> I would have assumed that the index would see that element "d" has a min of 
> -5 and a max of -49000.
>
>
>
> Here the index infos:
>
>
>
> Elements
>
> - Structure: Hash
>
> - Entries: 8
>
>   g  2x, 2 distinct integers [-1, 1], leaf
>
>   e  2x, 2 distinct integers [2, 3], leaf
>
>   d  2x, 2 distinct integers [-5, 4.9E-324], leaf
>
>   c  1x, integer [1, 1], leaf
>
>   f  1x, leaf
>
>   r  1x
>
>   a  1x, integer [-1, 4.9E-324], leaf
>
>   b  1x, integer [0, 4.9E-324], leaf
>
>
>
> Attributes
>
> - Structure: Hash
>
> - Entries: 1
>
>   a  1x, integer [-1, 4.9E-324], leaf
>
>
>
>
>
> If it's a feature, then cool. Keep on rockin'! If not, then I hope this helps 
> a little.
>
>
>
> Thanks,
>
>
>
> Zack Dean


Re: [basex-talk] archive

2018-08-22 Thread Fabrice ETANCHAUD
Hi Vladimir,

If your question is: "is there a compression option to reduce database files
and still use the built-in index features?", the answer is no.
I can remember that BaseX uses custom compression for text nodes [1].

Christian, I hope you are doing well?
Did I lose my mind, or did BaseX have a compressed database option a long
time ago?

Best regards,

[1] http://basex.org/2018/03/23/basex-9.0--the-spring-edition/




-----Original Message-----
From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of Christian Grün
Sent: Wednesday, 22 August 2018 17:08
To: Ветошкин Владимир
Cc: BaseX
Subject: Re: [basex-talk] archive

> I can compress xml-data and store it in db.
> But then how can I search inside that db using index? Is it possible?

I am not sure if I understand. How do you proceed? Could you possibly
give us a step-by-step explanation?


Re: [basex-talk] Add line-number function

2018-07-05 Thread Fabrice ETANCHAUD
As BaseX does not work on the XML textual representation, it might not be 
possible.


From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of ? ??
Sent: Thursday, 5 July 2018 17:10
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Add line-number function



Hello, could the $err:line-number [1] variable help you?

[1] http://docs.basex.org/wiki/XQuery_3.0#Try.2FCatch

Best regards,

Fabrice ETANCHAUD
cerfrancepch
No, $err:line-number shows the line number in the XQuery file.
I want this:

Example.xml ->
1: <root>
2:    <child>
3:       <grandchild>text1</grandchild>
4:       <grandchild>text2</grandchild>
5:       <grandchild>text3</grandchild>
6:       <grandchild>text4</grandchild>
7:    </child>
8: </root>

XQuery ->
let $f := doc("example.xml")
let $e := $f/root/child[1]/grandchild[3]

let $line := line-number($e)

And I want to get $line = 5!


Re: [basex-talk] Add line-number function

2018-07-05 Thread Fabrice ETANCHAUD
Hello, could the $err:line-number [1] variable help you?

[1] http://docs.basex.org/wiki/XQuery_3.0#Try.2FCatch

Best regards,

Fabrice ETANCHAUD
cerfrancepch


From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of ? ??
Sent: Thursday, 5 July 2018 06:39
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Add line-number function

Hello!

Can you add to BaseX (as a module) a function "line-number" which retrieves
the line number of an expression?
Same as in eXist-db
(http://exist-db.org/exist/apps/fundocs/view.html?uri=http://exist-db.org/xquery/util=java:org.exist.xquery.functions.util.UtilModule=true)
 or Saxon
(http://www.saxonica.com/html/documentation/functions/saxon/line-number.html).

It is very important for us, because our users want to see the error line
number in XML docs when our XQuery scripts validate them.



Re: [basex-talk] basex csv import cmd

2018-07-04 Thread Fabrice ETANCHAUD
Hi Maike,

In order to automate the import of non-XML files,
you could write and run a command file (.bxs) [2] containing two things:

-  the needed parsing options [1]

-  a CREATE or ADD command

[1] http://docs.basex.org/wiki/Options#Parsing
[2] http://docs.basex.org/wiki/Commands

BXS files can easily be written in XML, so you can even automate the
automation process,
by writing parametrized XQuery that generates BXS XML running BaseX commands...
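As a rough sketch of that idea, the following query writes a command script that imports a CSV directory with explicit parsing options. The database name, the paths, and the exact CSV option values are placeholders to adapt to your setup:

```xquery
(: generate a command script that imports a CSV directory;
   'mycsvdb' and the paths are placeholder values :)
let $bxs :=
  <commands>
    <set option='parser'>csv</set>
    <set option='csvparser'>header=true,separator=semicolon</set>
    <create-db name='mycsvdb'>/path/to/csv-files</create-db>
  </commands>
return file:write('import.bxs', $bxs)
```

The generated file could then be executed on the command line with `basex import.bxs`.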

Best regards,

Fabrice Etanchaud
Cerfrance PCH


From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of Kittelmann, Maike
Sent: Wednesday, 4 July 2018 11:31
To: 'basex-talk@mailman.uni-konstanz.de'
Subject: [basex-talk] basex csv import cmd

Dear BaseX users,


I often import files into BaseX via the command line. For XML this
works fine. Currently, I work with CSV files that I import into BaseX via the
GUI.

I would like to automate this process as I do with XML imports.
Unfortunately, I haven't found any instructions so far on how I can apply the
same GUI CSV import settings to an import via the command line. Is this
possible?


Best regards,

Maike Kittelmann


--

Maike Kittelmann
Metadaten und Datenkonversion

Georg-August-Universität Göttingen
Niedersächsische Staats- und Universitätsbibliothek Göttingen
D-37070 Göttingen

Papendiek 14 (Historisches Gebäude, Raum 1.617)
+49 551 39-10249

kittelm...@sub.uni-goettingen.de
http://www.sub.uni-goettingen.de



Re: [basex-talk] Tracing query execution

2018-06-28 Thread Fabrice ETANCHAUD
Hi Iwan,

IMHO it is more a design issue than a tool issue.
If you need to know exactly where a boolean expression is decided,
You might have to implement a boolean algebra interpreter.
You could even describe your questions in xml format, to be interpreted by a 
recursive function against your hardware corpus.

That way you could implement rules like :
All ancestors are ‘and’ operators and my current node is ‘false’ => ‘false’
All ancestors are ‘or’ operators’ and my current node is ‘true’ => ‘true’
And detect exactly where your expression is decided.

This make me think of the MarkLogic stored query feature.

Sorry I just thought about that a few minutes,
I hope it helps,

Best regards,
Fabrice

From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of Iwan Briquemont
Sent: Wednesday, 27 June 2018 23:04
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Tracing query execution

Hello,

I use BaseX mostly for boolean queries: I have a hardware database, and I
check whether specific hardware supports the features I need.

It works great, but when something is not as expected, it's hard to find out
the reason.

E.g. given an expression like:

$some-value = $some-list and (custom:predicate() or $x > $y)

I would like to know why it's true or false.
For now I add trace() manually on the subexpressions, rerun, add trace() to
more specific parts, etc.

Ideally I would want to break down the query execution so it shows the values 
of subparts of the query to pinpoint why it is false, e.g. have an output like:

$some-value = $some-list and (custom:predicate() or $x > $y) -> false()

$some-value = $some-list -> true()

$some-value -> 1

$some-list -> (1, 2, ..., 10)

custom:predicate() or $x > $y -> false()

custom:predicate() -> false()

... # It should also go inside the function

$x > $y -> false()

$x -> 10

$y -> 11


Any ideas how it could be achieved?
Looking at the code, maybe a debug() method (like iter() or item()) could be
added to Expr objects, which would trace the expression's query, file, line
and the result of the expression (or probably the first x characters of the
result, to avoid huge output), with an XQuery function to trigger it.

I also thought of modifying the query programmatically to add trace() calls,
but that seems overly complicated.

Best regards,
Iwan


Re: [basex-talk] Full-Text

2018-06-25 Thread Fabrice ETANCHAUD
Hi Vladimir,

So what about storing those specific files in an ad-hoc collection, and only
indexing that one?
Is there a partition pattern you could use to split these specific files into
several collections of medium size?

If you do not need full-text capabilities, you can use the incremental text()
or attribute indexes.
Do you need to index only a specific set of text() nodes or attributes? If so,
you can also specify a list of element or attribute names to be indexed, and
thereby reduce the time needed to reindex.
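That selective indexing can be sketched with the index options of db:optimize; the database name and the name lists below are placeholders:

```xquery
(: rebuild only text and attribute indexes, restricted to a few names;
   'docs' and the include lists are placeholder values :)
db:optimize('docs', true(), map {
  'textindex': true(),  'textinclude': 'title author',
  'attrindex': true(),  'attrinclude': 'id',
  'ftindex': false()
})
```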

Best regards,
Fabrice

From: Ветошкин Владимир [mailto:en-tra...@yandex.ru]
Sent: Monday, 25 June 2018 09:42
To: Fabrice ETANCHAUD; BaseX
Subject: Re: [basex-talk] Full-Text

Hi, Fabrice!
Thank you.

All databases change constantly. That is why there is no way to single out "a
big readonly collection" :(
Maybe it is possible to use some other incremental indexes?
I have to index specific XML files, not all files in the database.

21.06.2018, 17:16, "Fabrice ETANCHAUD" :

Hi Vladimir,



I don't think there is something like an incremental full-text index for the
moment [1].

As the index is per collection, the recommended way would be to split your
data in two collections:

-  a big read-only collection of all the past updates, indexed once

-  a small/medium-sized collection whose full-text index can be
recreated in an acceptable time after each update.

At the end of a predefined time period, you have to add the live collection to
the read-only one, reindex it, and truncate the live one.



Best regards from France,

Fabrice Etanchaud



[1] http://docs.basex.org/wiki/Indexes#Updates









From: BaseX-Talk
[mailto:basex-talk-boun...@mailman.uni-konstanz.de]
On behalf of
Sent: Thursday, 21 June 2018 16:02
To: BaseX
Subject: [basex-talk] Full-Text



Hi, everyone!



Is there any way to index only the imported XML files?

Now, when I import XML files, the full-text index is deleted.

After importing, I recreate the whole full-text index, and it takes too much
time :(



--

Best regards,

Ветошкин Владимир Владимирович




--
Best regards,
Ветошкин Владимир Владимирович



Re: [basex-talk] Full-Text

2018-06-21 Thread Fabrice ETANCHAUD
Hi Vladimir,

I don't think there is something like an incremental full-text index for the
moment [1].
As the index is per collection, the recommended way would be to split your
data in two collections:

-  a big read-only collection of all the past updates, indexed once

-  a small/medium-sized collection whose full-text index can be
recreated in an acceptable time after each update.
At the end of a predefined time period, you have to add the live collection to
the read-only one, reindex it, and truncate the live one.
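Queries can then address both collections at once; 'archive' and 'live' below are placeholder database names, and the element name and search term are invented:

```xquery
(: search the indexed read-only data and the fresh live data together :)
for $e in (db:open('archive'), db:open('live'))//entry
where $e contains text 'honey'
return $e
```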

Best regards from France,
Fabrice Etanchaud

[1] http://docs.basex.org/wiki/Indexes#Updates




From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of
Sent: Thursday, 21 June 2018 16:02
To: BaseX
Subject: [basex-talk] Full-Text

Hi, everyone!

Is there any way to index only the imported XML files?
Now, when I import XML files, the full-text index is deleted.
After importing, I recreate the whole full-text index, and it takes too much
time :(

--
Best regards,
Ветошкин Владимир Владимирович



Re: [basex-talk] Usage of doc's in BaseX

2018-06-14 Thread Fabrice ETANCHAUD
Hello Bram,

IMHO the main argument for data/index separation is the ease of index
recreation, and the ease of reindexing your index database.
Is there still a need for ad-hoc indexing, now that BaseX lets us index only a
selection of node names? I guess you need to index computed values?

For the current BaseX limitations, you will find them in [1], but you might
have already read that page.
I once hit the database node-number limit while working with the European
Patent Office DOCDB collection, so I had to set up a database naming policy to
dispatch the documents.

Hoping it helps,

Best regards,

Fabrice Etanchaud
Senior Data Specialist
CERFrance PCH

[1] http://docs.basex.org/wiki/Statistics


From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf
of Bram Vanroy
Sent: Thursday, 14 June 2018 10:47
To: BaseX
Subject: [basex-talk] Usage of doc's in BaseX

Dear BaseX team

I am planning an update on our previous custom indexing system [1]. But to do 
this I have a couple of questions. The major ones will be how to write an 
efficient custom indexing query in XQuery, but that'll be for another email. 
(In fact, we have a dual indexing system, so two index files per main file.) 
For now I am mainly interested in different documents in a single database,
and the doc() functionality.

Intuitively, I'd say that documents that are related to each other should be
put in the same database. E.g. one database with different documents for
plants, and one database with different documents for animals. But when I was
scrolling through the documentation of BaseX, I noticed that when creating
custom indices you do not put those in the same database as the original
content, so you have one database for the content and one for the index [2].
Is this the way it's typically done?

More generally, the questions that I have are the following:

* What is the actual difference in BaseX between using separate
documents in a single database, and using different databases altogether?

* Is there a performance difference between putting my index file in
the same database as the content, and using different databases altogether?

* What are the maximum allowed sizes for a document in a database and for a
database itself, respectively? (I have files that are hundreds of GB in size.
It might not be feasible to have a file and its index file in the same
database.)


Thank you in advance
Kind regards

Bram Vanroy
Doctoral Research at Ghent University, Belgium
https://www.lt3.ugent.be/people/bram-vanroy/


[1] https://biblio.ugent.be/publication/8534144
[2] http://docs.basex.org/wiki/Indexes#Custom_Index_Structures


Re: [basex-talk] database creation baseX.8.6.7

2018-04-23 Thread Fabrice ETANCHAUD
Dear Christian,
I am a curious man,
Is the Pending Update List bypassed in some way when using addcache ?

Best regards,
Fabrice

-----Original Message-----
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Giuseppe
Celano
Sent: Monday, 23 April 2018 16:53
To: Christian Grün
Cc: BaseX
Subject: Re: [basex-talk] database creation baseX.8.6.7

Yes, with ADDCACHE it works! Thanks to both of you.
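For reference, a sketch of the working call, keeping Giuseppe's placeholder names and adding only the ADDCACHE option:

```xquery
(: cache the parsed input on disk instead of in main memory :)
db:create("myDB",
  "sourceDirectory",
  "destinationDirectory",
  map { "ftindex": true(), "language": false(), "addcache": true() }
)
```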


> On Apr 23, 2018, at 3:56 PM, Christian Grün  wrote:
> 
> Hi Giuseppe,
> 
> Apart from Fabrice's helpful hints, you could try to enable the
> ADDCACHE option [1].
> 
> Cheers,
> Christian
> 
> [1] http://docs.basex.org/wiki/Database_Module#db:create
> 
> On Mon, Apr 23, 2018 at 3:03 PM, Giuseppe Celano
>  wrote:
>> Hi All,
>> 
>> I can create a database via the GUI, but if I use db:create [1] I get the 
>> message "out of main memory": why? Thanks!
>> 
>> db:create("myDB",
>> "sourceDirectory",
>> "destinationDirectory",
>> map{"ftindex": true(), "language": false()}
>> )
>> 
>> Best,
>> Giuseppe
>> 
> 



Re: [basex-talk] database creation baseX.8.6.7

2018-04-23 Thread Fabrice ETANCHAUD
Hi Giuseppe,

I think it is because you are using XQuery and your input file is huge with
respect to your Java memory settings.
The db:create function is an updating function, so all your inserted data has
to go to the pending update list (see [1] for an explanation) before database
creation, causing the out-of-memory error.
If you want to automate database creation/updates, you could have a look at
XML command files, a way to create a batch of BaseX commands ([2]).
In my use cases, I found it easier to write XQuery queries which generate XML
command files, to get rid of memory concerns.

Hoping it helps,

Best regards,

[1] http://docs.basex.org/wiki/XQuery_Update#Pending_Update_List

[2] http://docs.basex.org/wiki/Commands


-----Original Message-----
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Giuseppe
Celano
Sent: Monday, 23 April 2018 15:03
To: BaseX
Subject: [basex-talk] database creation baseX.8.6.7

Hi All,

I can create a database via the GUI, but if I use db:create [1] I get the 
message "out of main memory": why? Thanks! 

db:create("myDB",
"sourceDirectory",
"destinationDirectory",
map{"ftindex": true(), "language": false()}
 )

Best,
Giuseppe



[basex-talk] TR: Marklogic XXE and XML Bomb prevention

2018-03-14 Thread Fabrice ETANCHAUD
Hello,

I found this MarkLogic post interesting,
so I am forwarding it to the BaseX users.
I do not remember loading data I did not trust, but did somebody experience
this kind of issue?

Best regards,
Fabrice Etanchaud

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On behalf of Marcel de Kleine
Sent: Wednesday, 14 March 2018 13:43
To: gene...@developer.marklogic.com
Subject: [MarkLogic Dev General] Marklogic XXE and XML Bomb prevention

Hello,

We have noticed MarkLogic is vulnerable to XXE (entity expansion) and XML bomb
attacks. When loading a malicious document using xdmp:document-insert, it
won't catch these, causing either the loading of unwanted external documents
(XXE) or a lockup of the system (XML bomb).

For example, if I load this document :


   " >]>


The file test.xml gets nicely added to the xml document.

See OWASP and others for examples.

This is clearly an XML processing issue, so the question is: can we disable 
this? And if so, at what levels would this be possible? Ideally it should be 
system-wide.
(And if it cannot be disabled, I think this is something ML should address 
immediately.)

Thank you in advance,
Marcel de Kleine, EPAM

Marcel de Kleine
Senior Software Engineer

Office: +31 20 241 6134 x 30530<tel:+31%2020%20241%206134;ext=30530>   Cell: 
+31 6 14806016<tel:+31%206%2014806016>   Email: 
marcel_de_kle...@epam.com<mailto:marcel_de_kle...@epam.com>
Delft, Netherlands   epam.com<http://www.epam.com>

CONFIDENTIALITY CAUTION AND DISCLAIMER
This message is intended only for the use of the individual(s) or entity(ies) 
to which it is addressed and contains information that is legally privileged 
and confidential. If you are not the intended recipient, or the person 
responsible for delivering the message to the intended recipient, you are 
hereby notified that any dissemination, distribution or copying of this 
communication is strictly prohibited. All unintended recipients are obliged to 
delete this message and destroy any printed copies.



Re: [basex-talk] One document per database or multiple?

2018-02-06 Thread Fabrice ETANCHAUD
Jonathan, in my humble opinion, here are the main reasons you may need several 
collections :


-  Full-text indexing in several languages (because the language option is 
collection-wide): a per-language partition of your data

-  Size limitation (usually in number of nodes)

-  Huge updates: a read-only backlog collection + a read/write front 
collection of fresh data + queries tailored to read both collections.

Best regards,
And maybe good night ?

Fabrice Etanchaud

De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Jonathan Robie
Envoyé : mardi 6 février 2018 23:31
À : BaseX
Objet : [basex-talk] One document per database or multiple?

If I have a set of related documents (a text, a lexicon, frequency counts, a 
discourse analysis, etc), how should I decide when to put more than one 
document in a single database at different paths, as opposed to putting one 
document in each database?

When I create a database from the GUI, it seems to prefer one document per 
database.  Should I take that as a hint?

Jonathan


Re: [basex-talk] how to declare an updating function parameter in a high-order function signature ?

2018-02-06 Thread Fabrice ETANCHAUD
Thank you Christian.

There is something faster than BaseX : its support !

Best regards,

-Message d'origine-
De : Christian Grün [mailto:christian.gr...@gmail.com] 
Envoyé : mardi 6 février 2018 15:44
À : Fabrice ETANCHAUD
Cc : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] how to declare an updating function parameter in a 
high-order function signature ?

Hi Fabrice,

I believe it should be sufficient to prepend the "updating" keyword before your 
function call [1]:

  updating $builder(...)
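
As a minimal sketch (names are illustrative, not Fabrice's actual module), the 
annotation in the signature and the keyword at the call site go together like this:

```xquery
declare %updating function local:store(
  $builder as %updating function(*)
) {
  (: the 'updating' keyword marks the dynamic function call itself as updating :)
  updating $builder('docs/doc1.xml', <data/>)
};

local:store(db:replace('test', ?, ?))
```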

Greetings from the (yes, currently) sunny Lake Constance, Christian

[1] http://docs.basex.org/wiki/XQuery_Update#User-Defined_Functions



On Tue, Feb 6, 2018 at 3:37 PM, Fabrice ETANCHAUD <fetanch...@pch.cerfrance.fr> 
wrote:
> Dear all,
>
>
>
> I cannot find a way to tell the XQuery engine that a function parameter
> of my higher-order function is updating :
>
>
>
> declare %updating function etebac:build-updating($file as xs:string, 
> $builder as %updating function(*) ) {
>
>   for tumbling window $line in (file:read-text-lines($file))
>
>   start $start-item when etebac:record-header($start-item) = '01'
>
>   end $end-item when etebac:record-header($end-item) = '07'
>
>   let $doc := etebac:build-json($line)
>
>   let $start := format-date(xs:date($doc/date-ancien-solde),
> '[Y0001][M01][D01]')
>
>   let $stop := format-date(xs:date($doc/date), '[Y0001][M01][D01]')
>
>   let $doc-id := string-join(($doc/rib, $start, $stop), '-')
>
>   let $doc-path := string-join(
>
> ($doc/(banque, guichet, compte), $start || '-' || $stop), '/')
>
>   let $out-file := concat($doc-path, '.json')
>
>   where count($doc/operations/_) > 0
>
>   return
>
> $builder(
>
>   $doc-path,
>
>   element json {
>
> attribute type { 'object' },
>
> element meta {
>
>   attribute type { 'object' },
>
>   element type { 'releve-operations' },
>
>   element id { $doc-id }
>
> },
>
> element data {
>
>   $doc/(*|@*)
>
> }
>
>   }
>
>)
>
> };
>
>
>
> I have a [XPTY0004] Function must not be updating: %updating
> function($path,$doc) a... error.
>
>
>
> The outer call is  :
>
>
>
> import module namespace etebac =
> 'http://pch.cerfrance.fr/basex/modules/etebac' at 'etebac.xqm';
>
>
>
> etebac:build-updating('D:\edi\banque\input\10278\10278_CMLACO_20160701
> .183',
>
> db:replace('test', ?, ?)
>
> )
>
>
>
> What have I to do to tag my $builder function reference as updating in 
> the main body ?
>
>
>
> Best regards from the snowy french atlantic coast,
>
> Fabrice


[basex-talk] how to declare an updating function parameter in a high-order function signature ?

2018-02-06 Thread Fabrice ETANCHAUD
Dear all,

I cannot find a way to tell the XQuery engine that a function parameter of my 
higher-order function is updating :

declare %updating function etebac:build-updating($file as xs:string, $builder 
as %updating function(*) ) {
  for tumbling window $line in (file:read-text-lines($file))
  start $start-item when etebac:record-header($start-item) = '01'
  end $end-item when etebac:record-header($end-item) = '07'
  let $doc := etebac:build-json($line)
  let $start := format-date(xs:date($doc/date-ancien-solde), 
'[Y0001][M01][D01]')
  let $stop := format-date(xs:date($doc/date), '[Y0001][M01][D01]')
  let $doc-id := string-join(($doc/rib, $start, $stop), '-')
  let $doc-path := string-join(
($doc/(banque, guichet, compte), $start || '-' || $stop), '/')
  let $out-file := concat($doc-path, '.json')
  where count($doc/operations/_) > 0
  return
$builder(
  $doc-path,
  element json {
attribute type { 'object' },
element meta {
  attribute type { 'object' },
  element type { 'releve-operations' },
  element id { $doc-id }
},
element data {
  $doc/(*|@*)
}
  }
   )
};

I have a [XPTY0004] Function must not be updating: %updating 
function($path,$doc) a... error.

The outer call is  :

import module namespace etebac = 'http://pch.cerfrance.fr/basex/modules/etebac' 
at 'etebac.xqm';

etebac:build-updating('D:\edi\banque\input\10278\10278_CMLACO_20160701.183',
db:replace('test', ?, ?)
)

What do I have to do to tag my $builder function reference as updating in the main 
body ?

Best regards from the snowy french atlantic coast,
Fabrice


Re: [basex-talk] How to send Mails

2018-02-05 Thread Fabrice ETANCHAUD
Hi Navin,

You can use the mailing list archive to search for undocumented information.
This answer from Christian will help you :

https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg08412.html

Best regards,
Fabrice


De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Navin Rawat
Envoyé : lundi 5 février 2018 15:41
À : BaseX
Objet : [basex-talk] How to send Mails

Hi Team,

Is there any way to send mails through BaseX to the users?

Regards,
Navin


Re: [basex-talk] Bulk adding with "add" in client

2017-12-19 Thread Fabrice ETANCHAUD
Hi Robert,

CREATEFILTER may be the option you are looking for :

http://docs.basex.org/wiki/Options#CREATEFILTER
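
For instance, in a client session something like this should restrict which files 
the subsequent bulk add picks up (a sketch; see the linked page for the exact 
pattern syntax):

```
SET CREATEFILTER *.dita*
ADD /opt/sources
```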

Best regards,
Fabrice Etanchaud
Cerfrance.pch

-Message d'origine-
De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Robert Crews
Envoyé : mardi 19 décembre 2017 03:54
À : basex-talk@mailman.uni-konstanz.de
Objet : [basex-talk] Bulk adding with "add" in client

With a database created and opened, I can recursively bulk add all the 
*.xml files in a directory called /opt/sources with this command

 > add /opt/sources

http://docs.basex.org/wiki/Commands#ADD

In the GUI, I can change the pattern to, for example, "*.dita*", to load all 
the *.dita and *.ditamap files instead of the *.xml files.

Is there a way I can define a glob pattern for bulk loads using the add command 
or some other command from the client?

Thanks,
Robert



Re: [basex-talk] Update/Delete JSON

2017-11-15 Thread Fabrice ETANCHAUD
Hi all,

I think we should distinguish between in-memory and persisted JSON 
representations.

As far as I know, only the JSON document representation will be persisted;
map representations are in-memory only.

Just use the map (or xquery, starting with v9) format option at parsing time.
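
A small sketch of that (the values shown are made up; the option names follow the 
JSON Module):

```xquery
(: parse JSON text into an in-memory XQuery map, not an XML document :)
let $obj := json:parse('{ "name": "basex", "count": 2 }', map { 'format': 'map' })
return $obj?name
```

With the default format, the same call would instead return an XML representation 
of the JSON object, which is what can be persisted in a database.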

Best regards,

Fabrice


De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de 
m.vankeu...@utwente.nl
Envoyé : mercredi 15 novembre 2017 08:30
À : wray.john...@gmail.com; christian.gr...@gmail.com
Cc : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] Update/Delete JSON

Christian,

The JSON Module suggests that JSON objects are represented as XML-documents 
with a certain structure, not as immutable maps. Is there support in BaseX for 
json:parse and json:serialize for parsing JSON in textual form into immutable 
maps and back to JSON in textual form?

Kind regards,
Maurice van Keulen

--

--

Dr.Ir. M. van Keulen - Associate Professor, Data Management Technology

Univ. of Twente, Dept of EEMCS, POBox 217, 7500 AE Enschede, Netherlands

Email: m.vankeu...@utwente.nl, Phone: +31 
534893688, Fax: +31 534892927

Room: ZI 2013, WWW: http://www.cs.utwente.nl/~keulen

On 13 Nov 2017 23:31 +0100, Christian Grün 
>, wrote:

In XQuery, JSON objects are represented as immutable maps. Please
check out the documentations to learn how these data structures can be
changed [1,2].
___

[1] http://docs.basex.org/wiki/Map_Module
[2] https://www.w3.org/TR/xpath-functions-31/#maps-and-arrays
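
By way of illustration, "updating" an immutable map means deriving a new one, 
e.g. with map:put from the linked Map Module (values here are made up):

```xquery
let $obj     := map { 'id': 17, 'status': 'new' }
let $updated := map:put($obj, 'status', 'done')  (: returns a fresh map; $obj is unchanged :)
return ($obj?status, $updated?status)
```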



On Sun, Nov 12, 2017 at 3:42 AM, E. Wray Johnson 
> wrote:

What is the best way to update JSON objects (only some object values) by unique 
id?

What is the best way to delete JSON objects by unique id?

Wray Johnson
(m) 704-293-9008


Re: [basex-talk] TR: Options for creating database...

2017-10-20 Thread Fabrice ETANCHAUD
Which version are you using ?
It seems to me we have already heard about that issue; it was a display issue in 
a previous version.
But I can be wrong.

Best regards,

De : France Baril [mailto:france.ba...@architextus.com]
Envoyé : vendredi 20 octobre 2017 13:30
À : Fabrice ETANCHAUD
Cc : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] TR: Options for creating database...

Ok so the DB gets created. I don't expect to run into issue with large 
documents since our project is many small snippets.

However, I run into another issue: I go into the GUI, I see in Manage database 
> Information that autooptimize is set to true. I modify a document through a 
webdav connection. I look in the GUI. My Autooptimize is set to false. I 
thought that autooptimize would be a persistent value. If I open the db and 
force optimization, then autooptimize is set back to true. The 'auto' part 
of the option name does not seem to apply in real life.

On Fri, Oct 20, 2017 at 11:47 AM, Fabrice ETANCHAUD 
<fetanch...@pch.cerfrance.fr<mailto:fetanch...@pch.cerfrance.fr>> wrote:


De : Fabrice ETANCHAUD
Envoyé : vendredi 20 octobre 2017 11:46
À : 'France Baril'
Objet : RE: [basex-talk] Options for creating database...

The XQuery will succeed if you remove the INDENT option (it is a serialization 
option) and lower-case all your option names.
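
Concretely, a corrected version of the call might look like this (a sketch based on 
the snippet quoted below, with the keys lower-cased and INDENT dropped):

```xquery
db:create(
  $db-name,
  $src-folder,
  (),
  map {
    'chop':         false(),
    'stripns':      false(),
    'intparse':     true(),
    'dtd':          false(),
    'xinclude':     false(),
    'updindex':     true(),
    'autooptimize': true()
  }
)
```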

You are right, you may not have a memory overflow issue on creation.
But I had the problem with db:add with several huge files, before I switched to 
BXS.

Best regards,
Fabrice

De : France Baril [mailto:france.ba...@architextus.com]
Envoyé : vendredi 20 octobre 2017 11:36
À : Fabrice ETANCHAUD
Cc : BaseX
Objet : Re: [basex-talk] Options for creating database...

In db:create I don't think the issue is memory. I get: [bxerr:BASX0002] Unknown 
database option 'UPDINDEX'.

My function is:


let $options := map:merge((
  map:entry("CHOP", false()),
  map:entry('INDENT', false()),
  map:entry('STRIPNS', false()),
  map:entry('INTPARSE', true()),
  map:entry('DTD', false()),
  map:entry('XINCLUDE', false()),
  map:entry('UPDINDEX', true()),
  map:entry('AUTOOPTIMIZE', true())
))
return (
  db:create($db-name, $src-folder, (), $options),
  db:output(done)
)





On Fri, Oct 20, 2017 at 11:29 AM, Fabrice ETANCHAUD 
<fetanch...@pch.cerfrance.fr<mailto:fetanch...@pch.cerfrance.fr>> wrote:
Bonjour France,

Didn’t you find Updindex and autooptimize options in the ‘Options’  panel of 
the db creation window ?

Db:create last parameter is the place to put all your options :

http://docs.basex.org/wiki/Database_Module#db:create

But the Pending update list may overflow memory during db creation/update.

You should definitively have a look at the BaseX Scripts (BXS).
You can declare a batch of commands in XML, and ask BaseX to run it :

http://docs.basex.org/wiki/Commands#Command_Scripts

This is the way to set options before invoking the CREATE-DB command :

http://docs.basex.org/wiki/Commands#SET

I usually write XQuery to generate a BXS that will do the job.

Cordialement,

Fabrice
CERFrance Poitou-Charentes

De : 
basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>
 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>]
 De la part de France Baril
Envoyé : vendredi 20 octobre 2017 10:44
À : BaseX
Objet : [basex-talk] Options for creating database...

Hi, I usually create dbs using the gui. I'm now trying to create asb without it 
to be able to set the updindex and autooptimize options, which I can't find in 
the GUI.

I couldn't figure how to set them out using db:create either, so I switch to 
command line.  Now I'm feeling dumb, I can't find how to set up 
parsing/indexing options for the command line. I'm looking at: 
http://docs.basex.org/wiki/Command-Line_Options. I found -sindent and -wchop 
only.

What am I missing?

Here are all the options that I'd want to set:


   let $options := map:merge((
  map:entry("CHOP", false()),
  map:entry('INDENT', false()),
  map:entry('STRIPNS', false()),
  map:entry('INTPARSE', true()),
  map:entry('DTD', false()),
  map:entry('XINCLUDE', false()),
  map:entry('UPDINDEX', true()),
  map:entry('AUTOOPTIMIZE', true())
   ))


​I don't want to set UPDINDEX and AUTOOPTIMIZE in .basex because I only want 
them to be true() on one of my DBs. Other DBs should remain as is. ​

--
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com<mailto:france.ba...@architextus.com>



--
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com<mailto:france.ba...@architextus.com>



--
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com<mailto:france.ba...@architextus.com>


[basex-talk] TR: Options for creating database...

2017-10-20 Thread Fabrice ETANCHAUD


De : Fabrice ETANCHAUD
Envoyé : vendredi 20 octobre 2017 11:46
À : 'France Baril'
Objet : RE: [basex-talk] Options for creating database...

The XQuery will succeed if you remove the INDENT option (serialization), and 
lower case all your option names.

You are right, you may not have a memory overflow issue on creation.
But I had the problem with db:add with several huge files, before I switched to 
BXS.

Best regards,
Fabrice

De : France Baril [mailto:france.ba...@architextus.com]
Envoyé : vendredi 20 octobre 2017 11:36
À : Fabrice ETANCHAUD
Cc : BaseX
Objet : Re: [basex-talk] Options for creating database...

In db:create I don't think the issue is memory. I get: [bxerr:BASX0002] Unknown 
database option 'UPDINDEX'.

My function is:


let $options := map:merge((
  map:entry("CHOP", false()),
  map:entry('INDENT', false()),
  map:entry('STRIPNS', false()),
  map:entry('INTPARSE', true()),
  map:entry('DTD', false()),
  map:entry('XINCLUDE', false()),
  map:entry('UPDINDEX', true()),
  map:entry('AUTOOPTIMIZE', true())
))
return (
  db:create($db-name, $src-folder, (), $options),
  db:output(done)
)





On Fri, Oct 20, 2017 at 11:29 AM, Fabrice ETANCHAUD 
<fetanch...@pch.cerfrance.fr<mailto:fetanch...@pch.cerfrance.fr>> wrote:
Bonjour France,

Didn’t you find Updindex and autooptimize options in the ‘Options’  panel of 
the db creation window ?

Db:create last parameter is the place to put all your options :

http://docs.basex.org/wiki/Database_Module#db:create

But the Pending update list may overflow memory during db creation/update.

You should definitively have a look at the BaseX Scripts (BXS).
You can declare a batch of commands in XML, and ask BaseX to run it :

http://docs.basex.org/wiki/Commands#Command_Scripts

This is the way to set options before invoking the CREATE-DB command :

http://docs.basex.org/wiki/Commands#SET

I usually write XQuery to generate a BXS that will do the job.

Cordialement,

Fabrice
CERFrance Poitou-Charentes

De : 
basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>
 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>]
 De la part de France Baril
Envoyé : vendredi 20 octobre 2017 10:44
À : BaseX
Objet : [basex-talk] Options for creating database...

Hi, I usually create dbs using the gui. I'm now trying to create asb without it 
to be able to set the updindex and autooptimize options, which I can't find in 
the GUI.

I couldn't figure how to set them out using db:create either, so I switch to 
command line.  Now I'm feeling dumb, I can't find how to set up 
parsing/indexing options for the command line. I'm looking at: 
http://docs.basex.org/wiki/Command-Line_Options. I found -sindent and -wchop 
only.

What am I missing?

Here are all the options that I'd want to set:


   let $options := map:merge((
  map:entry("CHOP", false()),
  map:entry('INDENT', false()),
  map:entry('STRIPNS', false()),
  map:entry('INTPARSE', true()),
  map:entry('DTD', false()),
  map:entry('XINCLUDE', false()),
  map:entry('UPDINDEX', true()),
  map:entry('AUTOOPTIMIZE', true())
   ))


​I don't want to set UPDINDEX and AUTOOPTIMIZE in .basex because I only want 
them to be true() on one of my DBs. Other DBs should remain as is. ​

--
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com<mailto:france.ba...@architextus.com>



--
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com<mailto:france.ba...@architextus.com>


Re: [basex-talk] Options for creating database...

2017-10-20 Thread Fabrice ETANCHAUD
Bonjour France,

Didn’t you find the UPDINDEX and AUTOOPTIMIZE options in the ‘Options’ panel of 
the db creation window ?

db:create’s last parameter is the place to put all your options :

http://docs.basex.org/wiki/Database_Module#db:create

But the pending update list may overflow memory during db creation/update.

You should definitely have a look at BaseX command scripts (BXS).
You can declare a batch of commands in XML, and ask BaseX to run it :

http://docs.basex.org/wiki/Commands#Command_Scripts

This is the way to set options before invoking the CREATE DB command :

http://docs.basex.org/wiki/Commands#SET

I usually write XQuery to generate a BXS that will do the job.
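
As a sketch of that approach (folder names and paths are made up), an XQuery can 
build the command XML and write it out for later execution:

```xquery
(: generate a command script that creates one database per source folder :)
let $script :=
  <commands>{
    for $dir in ('orders', 'invoices')
    return (
      <set option='updindex'>true</set>,
      <create-db name='{$dir}'>/data/src/{$dir}</create-db>
    )
  }</commands>
return file:write('/data/load.bxs', $script)
```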

Cordialement,

Fabrice
CERFrance Poitou-Charentes

De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de France Baril
Envoyé : vendredi 20 octobre 2017 10:44
À : BaseX
Objet : [basex-talk] Options for creating database...

Hi, I usually create dbs using the GUI. I'm now trying to create a db without it, 
to be able to set the UPDINDEX and AUTOOPTIMIZE options, which I can't find in 
the GUI.

I couldn't figure out how to set them using db:create either, so I switched to 
the command line. Now I'm feeling dumb: I can't find how to set up 
parsing/indexing options for the command line. I'm looking at: 
http://docs.basex.org/wiki/Command-Line_Options. I found -sindent and -wchop 
only.

What am I missing?

Here are all the options that I'd want to set:


let $options := map:merge((
  map:entry("CHOP", false()),
  map:entry('INDENT', false()),
  map:entry('STRIPNS', false()),
  map:entry('INTPARSE', true()),
  map:entry('DTD', false()),
  map:entry('XINCLUDE', false()),
  map:entry('UPDINDEX', true()),
  map:entry('AUTOOPTIMIZE', true())
))


I don't want to set UPDINDEX and AUTOOPTIMIZE in .basex because I only want 
them to be true() on one of my DBs. Other DBs should remain as is.

--
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com


Re: [basex-talk] OutOfMemoryError at Query#more()

2017-09-22 Thread Fabrice ETANCHAUD
Be warned: by using XQuery and BaseX, you are going to feel your coworkers’ 
fear of your new gain in productivity!
And your management’s fear of such a powerful and underrated technology! ;-)

Best regards,
Fabrice



De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Simon 
Chatelain
Envoyé : vendredi 22 septembre 2017 16:45
À : BaseX
Objet : Re: [basex-talk] OutOfMemoryError at Query#more()

Hello,

Excellent, thank you very much.
It does work, and quite fast it seems.

Now I'll go and read some documentation on xquery...

Merci encore, et bon week-end

Simon

On 22 September 2017 at 14:58, Fabrice ETANCHAUD 
<fetanch...@pch.cerfrance.fr<mailto:fetanch...@pch.cerfrance.fr>> wrote:
Bonjour à nouveau, Simon,

I think that tumbling windows could be of great help in your use case :

Let consider the following test db :


1.   Creation

db:create(‘test’)


2.   Documents insertion (in @ts descending order to check that the 
solution is working whatever the document physical order)

for $i in 1 to 100
let $ts := current-dateTime() + xs:dayTimeDuration('PT'||(100-$i+1)||'S')
let $flag := random:integer(2)
return
  db:add(
    'test',
    <notif ts='{$ts}'>
      <flag>{$flag}</flag>
    </notif>,
    'notif' || $i || '.xml')

Then the following query should do the job :

for tumbling window $i in sort(
  db:open('test'),
  (),
  function($doc) {
$doc/notif/@ts/data()
  })
start $s when fn:true()
end $e next $n when $e/notif/flag != $n/notif/flag
return
  $i[1]

It iterate on the sorted documents (by ascending @ts),
And output the first document of each monotonic flag group.

Hoping I did it right,
Best regards,

Fabrice
CERFrance Poitou-Charentes

De : 
basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>
 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>]
 De la part de Simon Chatelain
Envoyé : vendredi 22 septembre 2017 13:32
À : BaseX
Objet : Re: [basex-talk] OutOfMemoryError at Query#more()

Bonjour Fabrice,

Thanks for the suggestion. I did try that (sending a query for each document), 
and it does work … sort of. Performance wise, it's really slow even if the 
database is fully optimized.

As for writing my process in xquery, that’s a good question. Honestly I don’t 
know as I am quite new at xquery, I lack the expertise.

I’ll try to give more detail about what I am trying to achieve.

In my database I have a series of XML documents, which, once really simplified, 
look like that.


<doc ts="t1" name="name1"><flag>0</flag></doc>    <- first document
<doc ts="t2" name="name1"><flag>0</flag></doc>
<doc ts="t3" name="name1"><flag>0</flag></doc>
...
<doc ts="tn" name="name1"><flag>1</flag></doc>    <- next document with flag 1

<doc ts="tn+1" name="name1"><flag>0</flag></doc>  <- next document with flag 0
<doc ts="tn+2" name="name1"><flag>0</flag></doc>
<doc ts="tn+3" name="name1"><flag>0</flag></doc>
...
<doc ts="tm" name="name1"><flag>1</flag></doc>    <- next document with flag 1

What I need to get is:
The first XML document (first as in smallest @ts value)
Then the next document with flag 1 (again next in the @ts order)
Then the next document with flag 0
And so on…

That would be the documents marked with arrows in the above example.
Roughly only 1 out of 1000 documents has flag 1

I tried several approaches to do that, but the faster one I found is to iterate 
through all documents with a very simple xquery and keep only the ones I need,
for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ return $d
 Another approach was to first select all documents with flag 1
for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ and $d/flag = 1 
return $d
then for each of those get the next document
(for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ and $d/flag = 0 and 
$d/@ts > ‘[ts of previous document]’ return $d)[1]

Or select the first document,
(for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ return $d)[1]
then query the next
 (for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ and $d/flag = 1 
and $d/@ts > ‘[ts of previous document]’ return $d)[1]
And the next…
(for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ and $d/flag = 0 and 
$d/@ts > ‘[ts of previous document]’ return $d)[1]
And so on.

But none of those is as fast as the first one, and then I hit this OutOfMemory 
issue.

So if there is a way to rewrite all that process in xquery that could be an 
option worth trying, or if there is a more efficient way to write the query
(for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ and $d/flag = 0 and 
$d/@ts > ‘[ts of previous document]’ return $d)[1]
That could also solve my problem.

Regards

Simon



On 22 September 2017 at 09:53, Fabrice ETANCHAUD 
<fetanch...@pch.cerfrance.fr<mailto:fetanch...@pch.cerfrance.fr>> wrote:
Bonjour  Simon,

I would send a query for each document,
externalizing the loop in java.

A question : could you process be written in xquery ? That way you might not 
face memory overflow.

Best regards,
Fabrice Etanchaud
CERFrance Poitou-Charentes

De : 
basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>
 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>]
 De la part de

Re: [basex-talk] OutOfMemoryError at Query#more()

2017-09-22 Thread Fabrice ETANCHAUD
Bonjour à nouveau, Simon,

I think that tumbling windows could be of great help in your use case :

Let's consider the following test db :


1.   Creation

db:create(‘test’)


2.   Document insertion (in descending @ts order, to check that the 
solution works whatever the physical document order)

for $i in 1 to 100
let $ts := current-dateTime() + xs:dayTimeDuration('PT'||(100-$i+1)||'S')
let $flag := random:integer(2)
return
  db:add(
    'test',
    <notif ts='{$ts}'>
      <flag>{$flag}</flag>
    </notif>,
    'notif' || $i || '.xml')

Then the following query should do the job :

for tumbling window $i in sort(
  db:open('test'),
  (),
  function($doc) {
$doc/notif/@ts/data()
  })
start $s when fn:true()
end $e next $n when $e/notif/flag != $n/notif/flag
return
  $i[1]

It iterates over the documents sorted by ascending @ts,
and outputs the first document of each monotonic flag group.

Hoping I did it right,
Best regards,

Fabrice
CERFrance Poitou-Charentes

De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Simon 
Chatelain
Envoyé : vendredi 22 septembre 2017 13:32
À : BaseX
Objet : Re: [basex-talk] OutOfMemoryError at Query#more()

Bonjour Fabrice,

Thanks for the suggestion. I did try that (sending a query for each document), 
and it does work … sort of. Performance wise, it's really slow even if the 
database is fully optimized.

As for writing my process in XQuery, that’s a good question. Honestly I don’t 
know; I am quite new to XQuery and lack the expertise.

I’ll try to give more detail about what I am trying to achieve.

In my database I have a series of XML documents, which, once really simplified, 
look like that.


<doc ts="t1" name="name1"><flag>0</flag></doc>    <- first document
<doc ts="t2" name="name1"><flag>0</flag></doc>
<doc ts="t3" name="name1"><flag>0</flag></doc>
...
<doc ts="tn" name="name1"><flag>1</flag></doc>    <- next document with flag 1

<doc ts="tn+1" name="name1"><flag>0</flag></doc>  <- next document with flag 0
<doc ts="tn+2" name="name1"><flag>0</flag></doc>
<doc ts="tn+3" name="name1"><flag>0</flag></doc>
...
<doc ts="tm" name="name1"><flag>1</flag></doc>    <- next document with flag 1

What I need to get is:
The first XML document (first as in smallest @ts value)
Then the next document with flag 1 (again next in the @ts order)
Then the next document with flag 0
And so on…

That would be the documents marked with arrows in the above example.
Roughly only 1 out of 1000 documents has flag 1

I tried several approaches to do that, but the faster one I found is to iterate 
through all documents with a very simple xquery and keep only the ones I need,
for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ return $d
 Another approach was to first select all documents with flag 1
for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ and $d/flag = 1 
return $d
then for each of those get the next document
(for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ and $d/flag = 0 and 
$d/@ts > ‘[ts of previous document]’ return $d)[1]

Or select the first document,
(for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ return $d)[1]
then query the next
 (for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ and $d/flag = 1 
and $d/@ts > ‘[ts of previous document]’ return $d)[1]
And the next…
(for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ and $d/flag = 0 and 
$d/@ts > ‘[ts of previous document]’ return $d)[1]
And so on.

But none of those is as fast as the first one, and then I hit this OutOfMemory 
issue.

So if there is a way to rewrite all that process in xquery that could be an 
option worth trying, or if there is a more efficient way to write the query
(for $d in collection(‘1234567’)/* where $d/@name = ‘name1’ and $d/flag = 0 and 
$d/@ts > ‘[ts of previous document]’ return $d)[1]
That could also solve my problem.

Regards

Simon



On 22 September 2017 at 09:53, Fabrice ETANCHAUD 
<fetanch...@pch.cerfrance.fr<mailto:fetanch...@pch.cerfrance.fr>> wrote:
Bonjour  Simon,

I would send a query for each document,
externalizing the loop in java.

A question : could you process be written in xquery ? That way you might not 
face memory overflow.

Best regards,
Fabrice Etanchaud
CERFrance Poitou-Charentes

De : 
basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>
 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>]
 De la part de Simon Chatelain
Envoyé : vendredi 22 septembre 2017 09:34
À : BaseX
Objet : [basex-talk] OutOfMemoryError at Query#more()

Hello,
I am facing an issue while retrieving some big amount of XML documents from a 
BaseX collection.
Each document (as an XML file) is around 10 KB, and in the problematic case I 
must retrieve around 7 of them.
I am using Session#query(String query) then Query#more() and Query#next() to 
iterate through the result of my query.

try (final Query query = l_Session.query("query")) {
    while (query.more()) {
        String xml = query.next();
    }
}
If there is more than a certain amount of XML document in the result of my 
query I get a OutOfMemoryError (full stack trace in attached file) when 
executing query.more().

I did the test with BaseX 8.6.6 and 8.6.7, Java 8, VM arguments –Xmx1024m

Increasing the Xmx value is not a solution as I don’t kno

Re: [basex-talk] OutOfMemoryError at Query#more()

2017-09-22 Thread Fabrice ETANCHAUD
Bonjour Simon,

I would send a query for each document,
externalizing the loop in Java.

A question: could your process be written in XQuery? That way you might not 
face memory overflow.

Best regards,
Fabrice Etanchaud
CERFrance Poitou-Charentes

De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Simon 
Chatelain
Envoyé : vendredi 22 septembre 2017 09:34
À : BaseX
Objet : [basex-talk] OutOfMemoryError at Query#more()

Hello,
I am facing an issue while retrieving a large number of XML documents from a 
BaseX collection.
Each document (as an XML file) is around 10 KB, and in the problematic case I 
must retrieve around 7 of them.
I am using Session#query(String query) then Query#more() and Query#next() to 
iterate through the result of my query.

try (final Query query = l_Session.query("query")) {
    while (query.more()) {
        String xml = query.next();
    }
}
If there are more than a certain number of XML documents in the result of my 
query, I get an OutOfMemoryError (full stack trace in attached file) when 
executing query.more().

I did the test with BaseX 8.6.6 and 8.6.7, Java 8, VM arguments –Xmx1024m

Increasing the Xmx value is not a solution as I don’t know what the maximum 
amount of data I will have to retrieve in the future. So what I need is a 
reliable way of executing such queries and iterate through the result without 
exploding the heap size.
I also try to use QueryProcessor and QueryProcessor#iter() instead of 
Session#query(String query). But is it safe to use it knowing that my 
application is multithreaded and that each thread has its own session to query 
or add elements from/to multiple collections?
Moreover, for now all access to BaseX are done through a session, so my 
application can run with an embedded BaseX or with a BaseX server. If I start 
using QueryProcessor, then it will be embedded BaseX only, right?

I also attached a simple example showing the problem.

Any advice would be much appreciated

Thanks
Simon





Re: [basex-talk] Server Variables, cached vars, etc

2017-09-18 Thread Fabrice ETANCHAUD
Hello Christian !

Yes, a -c option for the basexhttp would help, as mentioned earlier, for 
example creating a shared mainmem collection.

Best regards,
Fabrice

-Message d'origine-
De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Christian Grün
Envoyé : lundi 18 septembre 2017 16:20
À : Kendall Shaw; coach3pete
Cc : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] Server Variables, cached vars, etc

Hi Erik,

I think that Xavier-Laurent, Marco, Fabrice and Kendall have already given 
excellent feedback.

In our own projects, we store all global data in databases, or in local 
configuration files. One advantage is that this data requires no initialization 
and will automatically be available after a restart.

I don’t know anything about »server variables« in MarkLogic so far, so
@Erik: feel free to pass me on a link to the documentation, and I can check if 
a similar solution could make sense for BaseX.

Talking about the start script server option: The basexserver command comes 
with a -c flag, which allows you run initial commands [1]. We could add such a 
flag for basexhttp, or even allow an initial input for both startup commands 
(similar to basex/basexclient). Would this be helpful for some of you reading 
this? Quite obviously, this requires BaseX to be run via these scripts (it 
wouldn’t have any effect if BaseX is deployed as servlet).

Cheers,
Christian

[1] http://docs.basex.org/wiki/Command-Line_Options#Server
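For illustration, the kind of initial command sequence discussed here could live in a command script (and be passed at startup); the database name and file path below are invented placeholders:

```
SET MAINMEM true
CREATE DB shared /data/shared.xml
```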



On Sun, Sep 10, 2017 at 1:56 AM, Kendall Shaw <kendall.s...@workday.com> wrote:
> The servlet could populate your singleton just once upon startup, or 
> run xquery etc. The load-on-startup configuration means that the 
> servlet is initialized after basex has been loaded. So, if you restart 
> jetty or whatever web server/web container you are using basex 
> restarts and then your servlet’s init method is invoked.
>
>
>
> Kendall
>
>
>
> From: Erik Peterson <e...@ardec.com>
> Date: Saturday, September 9, 2017 at 4:16 AM
> To: Kendall Shaw <kendall.s...@workday.com>
> Cc: "basex-talk@mailman.uni-konstanz.de"
> <basex-talk@mailman.uni-konstanz.de>
>
>
> Subject: Re: [basex-talk] Server Variables, cached vars, etc
>
>
>
> Thanks Kendal for your reply. What would be the advantage of creating 
> a servlet over a singleton class to do the same thing?
>
>
>
> On Fri, Sep 8, 2017 at 11:12 AM, Kendall Shaw 
> <kendall.s...@workday.com>
> wrote:
>
> I thought it might be useful to mention advice I was given about 
> startup
> hooks:
>
>
>
>> From: "Kirsten, Dirk" dirk.kirs...@senacor.com
>
> ...
>
>> there is currently no way to do this using BaseX itself. But I also 
>> don’t think that should be the job of BaseX. Instead you can write a 
>> servlet and deploy it using Tomcat which runs some Java application, 
>> e.g. which could trigger some BaseXX command. See 
>> http://crunchify.com/how-to-run-java-program-automatically-on-tomcat-
>> startup/
>> for an example how to do this.
>
>
>
> I switched from using a cron job to doing this in order to schedule jobs.
> I have a very simple servlet that is configured with a load-on-startup
> value higher than basex's, so it is initialized after basex. It runs a
> shell script which schedules the jobs soon after basex is loaded.
>
>
>
> Kendall
>
>
>
> From: <basex-talk-boun...@mailman.uni-konstanz.de> on behalf of Erik 
> Peterson <e...@ardec.com>
> Date: Tuesday, September 5, 2017 at 7:02 AM
> To: Fabrice ETANCHAUD <fetanch...@pch.cerfrance.fr>, 
> "basex-talk@mailman.uni-konstanz.de" 
> <basex-talk@mailman.uni-konstanz.de>
> Subject: Re: [basex-talk] Server Variables, cached vars, etc
>
>
>
> Thank you all for your replies. It looks like a main memory database
> is the best "built in" option. However, I have created a Jar file to
> drop into the lib directory with a Java singleton object holding a map.
> That should be accessible across requests and sessions. The question is
> how to populate this just once upon startup? Perhaps I could do a job
> that would do that? Also I could memoize the variables in a global
> script. That way the expensive operation is only run the first time it
> is needed.
>
>
>
> Any other suggestions welcome.  Recommend that a standard built-in 
> feature be added to handle these scenarios.
>
>
>
> On Tue, Sep 5, 2017 at 1:33 AM Fabrice ETANCHAUD 
> <fetanch...@pch.cerfrance.fr> wrote:
>
> To be confirmed : there is no 'start script' server option.
> I do manually create and populate the mainmem db in the dba query interface.
>
> Best regards,
> Fabrice
>

Re: [basex-talk] Basex Inner Workings

2017-09-18 Thread Fabrice ETANCHAUD
Hi Athanasios,

Could you please give us an idea of your resulting document size after 1.5 
minutes of BaseX time ?

Best regards,
Fabrice

De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Anastasiou A.
Envoyé : lundi 18 septembre 2017 14:47
À : 'Graydon Saunders'; basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] Basex Inner Workings

Hello

Many thanks, Dirk, Fabrice and Graydon.

I was going to look up ways of enabling the server to run as fast as possible 
anyway later on, so it is always good to know how BaseX is “thinking”.

I can see what you mean Graydon. This is a simple nested `for` to denormalise 
some of the structures of the XML file, where “some” is defined by
an XPath expression.

As far as I can tell, there is nothing being re-evaluated repeatedly within the 
inner loop that could be brought outside.

I have gone through the dot plans of the quickest and slowest versions of the 
query and the only thing they differ is in the addition of the CElems.

The “scaling” of the timings, in case it helps, is as follows:

Simple query, returning elements: 1100-1500 ms

Adding an `element` to what is returned just by the innermost `for`: 7500-9311 ms
This means:

    for ...
      for ...
        return element item { someElement | someOtherElement }

Adding an `element` to the whole block (no `element` on the innermost `for`): 49000-67000 ms
This means:

    element items {
      for ...
        for ...
          return someElement | someOtherElement
    }

Adding an `element` in both places: 5-8ms
This means:

    element items {
      for ...
        for ...
          return element item { someElement | someOtherElement }
    }


I don’t mind the ~8sec time but when we get to 1.5min, then yes…that’s going to 
be a bit annoying.

All the best







From: Graydon Saunders [mailto:graydon...@gmail.com]
Sent: 15 September 2017 17:04
To: Anastasiou A.; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings

As a follow-on to Dirk, it's amazing how much of a performance difference it 
can make to use typed variables when you're constructing something for output.  
(So far as I can tell, variables declarations function as an "optimize this!" 
flag for BaseX.)

If you get good performance when you're just throwing the resulting nodes and 
lose it massively by adding structure, as you relate up there somewhere:
The change was to go from simply returning the nodes themselves with a `return 
thisnode | thatnode | theothernode` to a "formatted" document that has an outer 
wrapper element with a number of `return element item 
{thisNode|thatNode|theOtherNode}` constructors inside it.

my immediate thought was "it's querying the same thing multiple times".

Most programming languages it's good practice to not create variables when you 
can inline.  XQuery does not appear to be one of those languages. :)  I try to 
think of this as "how can I make things easy for the optimizer?"

-- Graydon
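A sketch of the typed-variable pattern described above, with invented element names (an illustration of the idea, not the actual query from the thread):

```xquery
element items {
  for $outer in //record
  for $inner in $outer/entry
  (: the type annotation gives the optimizer a concrete static type :)
  let $payload as element()* := ($inner/someElement | $inner/someOtherElement)
  return element item { $payload }
}
```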

On Fri, Sep 15, 2017 at 11:55 AM, Kirsten, Dirk <dirk.kirs...@senacor.com> wrote:
Hello Athanasios,

I think you should really check the actual query plan which is executed. If you 
have such a huge spike in performance surely they processor will be executing 
it differently. I don't think looking into file access patterns BaseX 
internally uses is very useful for an end user. You should let BaseX handle 
that (but of course, if you find better/more efficient ways I am sure 
Christian' gladly accepts Pull Requests). But the pattern you describe sounds 
very much excepted, so reads if you open databases seem logical and short write 
operations are also expected when just reading a database, because e.g. BaseX 
has to lock the databases.

So I think it would be more useful to look into the query plan. Of course you 
are more than welcome to ask about what is going on there on this list. I would 
expect that because of your rewrite maybe some indexes are not applied anymore 
(or if your rewrite is simply very big that most of the time is spent 
serializing the data).

Cheers
Dirk

Senacor Technologies Aktiengesellschaft - Sitz: Eschborn - Amtsgericht 
Frankfurt am Main - Reg.-Nr.: HRB 105546
Vorstand: Matthias Tomann, Marcus Purzer - Aufsichtsratsvorsitzender: Daniel 
Grözinger

-Ursprüngliche Nachricht-
Von: basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] Im Auftrag von Fabrice ETANCHAUD
Gesendet: Freitag, 15. September 2017 17:35
An: 'Anastasiou A.' <a.anastas...@swansea.ac.uk>; basex-talk@mailman.uni-konstanz.de
Betreff: Re: [basex-talk] Basex Inner Workings


You can find the time spent in each step in the query info bar graph.

If you are looking for the schema and the

Re: [basex-talk] Basex Inner Workings

2017-09-15 Thread Fabrice ETANCHAUD
You can find the time spent in each step in the query info bar graph.

If you are looking for the schema and the facets of your dataset, you should 
have a look at the index module, and for sure at index:facets()
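A minimal illustration of the facets lookup, assuming an existing database named 'mydb' (a placeholder) with up-to-date index structures:

```xquery
(: returns an outline of the database's document structure and value facets :)
index:facets('mydb')
```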

Best regards,
Fabrice

-Message d'origine-
De : Anastasiou A. [mailto:a.anastas...@swansea.ac.uk] 
Envoyé : vendredi 15 septembre 2017 17:23
À : Fabrice ETANCHAUD; basex-talk@mailman.uni-konstanz.de
Objet : RE: Basex Inner Workings

Thank you Fabrice. I understand.

I have not tried querying from the command prompt or sending the output to a 
file directly, which I could also work with. But, my understanding is that the 
time we are being quoted by the gui is the DB time, not taking into account the 
time it takes for the list to be pushed into whatever data structures the list 
boxes might be supporting (?).

I am trying to get a better understanding of the dataset at the moment and I 
have short and long queries which depending on the results I get from this step 
could be optimised further.

All the best

-Original Message-
From: Fabrice ETANCHAUD [mailto:fetanch...@pch.cerfrance.fr]
Sent: 15 September 2017 16:17
To: Anastasiou A.; basex-talk@mailman.uni-konstanz.de
Subject: RE: Basex Inner Workings

I understand that you are reformatting a lot of data, aren't you ?
I will have only little advice, because this is not my use case.

From what I know, the resulting document will be materialized entirely in memory 
before presentation or export.
You should export your results to disk, in order not to lose time in BaseXGUI 
rendering.

To reformat very big amounts of data, you might have a look at saxon streaming 
features (not in the free version).

But usually, big results are not requested frequently.

Best regards,
Fabrice

-Message d'origine-
De : Anastasiou A. [mailto:a.anastas...@swansea.ac.uk]
Envoyé : vendredi 15 septembre 2017 16:39 À : Fabrice ETANCHAUD; 
basex-talk@mailman.uni-konstanz.de
Objet : RE: Basex Inner Workings

Hello Fabrice

Yes, I am having a query which jumped from ~1500 ms to about a minute with a 
tiny little change...

The DB is about 2GB and it is my test set before putting the query to work on 
the full dataset.

The change was to go from simply returning the nodes themselves with a `return 
thisnode | thatnode | theothernode` to a "formatted" document that has an outer 
wrapper element with a number of `return element item 
{thisNode|thatNode|theOtherNode}` constructors inside it.

I understand that the new query might be creating some new entities but 
compared to the element content, these few extra characters are not THAT many 
more.

The query jumps from ~1500 ms when using plain XML, to ~55000 ms with the 
addition of the collection and item wrapper nodes, to ~57000 ms with the 
addition of CSV exporting via the CSV module. These are "informal average" 
values. So, I have 
not run the same query a few times and then obtain the average, but that's the 
sort of vicinity I have seen numbers in from the times I have run the queries 
so far.

The database itself is "static", there are no update/insert transactions at the 
moment, the only thing that I am trying to do is extract some data in a 
different format from it.

I have Text, Attribute and Token indexes on that database (optimised right 
after importing) but no further options enabled. I also have not experimented 
with the SPLITSIZE (?). I have 32GB of memory and it should be enough to handle 
this 2GB test dataset (?). I will have a go with DEBUG on.

Did you have to enable any additional options for indexes to work faster?

All the best





-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Fabrice 
ETANCHAUD
Sent: 15 September 2017 13:27
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings

Hi Athanasios,

Did you experience slow queries ?
Are you sure to use all the index features ?
Are these queries operational ones (direct access on a key value) or analytics ?

I never experienced slow queries, even on huge xml corpus (patent 
registrations), But this is at the cost of longer indexing times on updates.

Best regards,


-Message d'origine-
De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Anastasiou A.
Envoyé : vendredi 15 septembre 2017 14:01 À : basex-talk@mailman.uni-konstanz.de
Objet : [basex-talk] Basex Inner Workings

Hello everyone

Quick question: Is there any document / URL where I could find out more about 
how BaseX accesses the disk during its operation?

For example, are there any reads to be expected during executing a query?

Through iotop, I can see 3-4 processes reading during startup, then another 2, 
very briefly firing when opening the database and then during querying there 
are periodic writes (?) but of very brief duration.

I was wondering if there is anything that 

Re: [basex-talk] Basex Inner Workings

2017-09-15 Thread Fabrice ETANCHAUD
I understand that you are reformatting a lot of data, aren't you ?
I will have only little advice, because this is not my use case.

From what I know, the resulting document will be materialized entirely in memory 
before presentation or export.
You should export your results to disk, in order not to lose time in BaseXGUI 
rendering.

To reformat very big amounts of data, you might have a look at saxon streaming 
features (not in the free version).

But usually, big results are not requested frequently.

Best regards,
Fabrice

-Message d'origine-
De : Anastasiou A. [mailto:a.anastas...@swansea.ac.uk] 
Envoyé : vendredi 15 septembre 2017 16:39
À : Fabrice ETANCHAUD; basex-talk@mailman.uni-konstanz.de
Objet : RE: Basex Inner Workings

Hello Fabrice

Yes, I am having a query which jumped from ~1500 ms to about a minute with a 
tiny little change...

The DB is about 2GB and it is my test set before putting the query to work on 
the full dataset.

The change was to go from simply returning the nodes themselves with a `return 
thisnode | thatnode | theothernode` to a "formatted" document that has an outer 
wrapper element with a number of `return element item 
{thisNode|thatNode|theOtherNode}` constructors inside it.

I understand that the new query might be creating some new entities but 
compared to the element content, these few extra characters are not THAT many 
more.

The query jumps from ~1500 ms when using plain XML, to ~55000 ms with the 
addition of the collection and item wrapper nodes, to ~57000 ms with the 
addition of CSV exporting via the CSV module. These are "informal average" 
values. So, I have 
not run the same query a few times and then obtain the average, but that's the 
sort of vicinity I have seen numbers in from the times I have run the queries 
so far.

The database itself is "static", there are no update/insert transactions at the 
moment, the only thing that I am trying to do is extract some data in a 
different format from it.

I have Text, Attribute and Token indexes on that database (optimised right 
after importing) but no further options enabled. I also have not experimented 
with the SPLITSIZE (?). I have 32GB of memory and it should be enough to handle 
this 2GB test dataset (?). I will have a go with DEBUG on.

Did you have to enable any additional options for indexes to work faster?

All the best





-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Fabrice 
ETANCHAUD
Sent: 15 September 2017 13:27
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Basex Inner Workings

Hi Athanasios,

Did you experience slow queries ?
Are you sure to use all the index features ?
Are these queries operational ones (direct access on a key value) or analytics ?

I never experienced slow queries, even on huge xml corpus (patent 
registrations), But this is at the cost of longer indexing times on updates.

Best regards,


-Message d'origine-
De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Anastasiou A.
Envoyé : vendredi 15 septembre 2017 14:01 À : basex-talk@mailman.uni-konstanz.de
Objet : [basex-talk] Basex Inner Workings

Hello everyone

Quick question: Is there any document / URL where I could find out more about 
how BaseX accesses the disk during its operation?

For example, are there any reads to be expected during executing a query?

Through iotop, I can see 3-4 processes reading during startup, then another 2, 
very briefly firing when opening the database and then during querying there 
are periodic writes (?) but of very brief duration.

I was wondering if there is anything that could be done from the point of view 
of the hardware to speed up queries (?) (except a more powerful machine at the 
moment)

All  the best
Athanasios Anastasiou


Re: [basex-talk] Basex Inner Workings

2017-09-15 Thread Fabrice ETANCHAUD
Hi Athanasios,

Did you experience slow queries ?
Are you sure to use all the index features ?
Are these queries operational ones (direct access on a key value) or analytics ?

I never experienced slow queries, even on huge xml corpus (patent 
registrations),
But this is at the cost of longer indexing times on updates.

Best regards,


-Message d'origine-
De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Anastasiou A.
Envoyé : vendredi 15 septembre 2017 14:01
À : basex-talk@mailman.uni-konstanz.de
Objet : [basex-talk] Basex Inner Workings

Hello everyone

Quick question: Is there any document / URL where I could find out more about 
how BaseX accesses the disk during its operation?

For example, are there any reads to be expected during executing a query?

Through iotop, I can see 3-4 processes reading during startup, then another 2, 
very briefly firing when opening the database and then during querying there 
are periodic writes (?) but of very brief duration.

I was wondering if there is anything that could be done from the point of view 
of the hardware to speed up queries (?) (except a more powerful machine at the 
moment)

All  the best
Athanasios Anastasiou


Re: [basex-talk] Possible Bug in BaseX 8.2.3 when importing XML (Was RE: A few general questions about BaseX)

2017-09-14 Thread Fabrice ETANCHAUD
Oops : 

De : Fabrice ETANCHAUD
Envoyé : jeudi 14 septembre 2017 10:26
À : basex-talk@mailman.uni-konstanz.de
Objet : RE: Possible Bug in BaseX 8.2.3 when importing XML (Was RE: 
[basex-talk] A few general questions about BaseX)

Hi Athanasios,

Did you set the DEBUG option to get detailed information ?

Could you confirm you are creating a db from a directory content ?
If this is the case, as suggested, you should generate a command script to 
force the loading order, and use this script to load the data in forced order 
to detect where it fails.
You can easily create such a bxs file in xquery with a for file:list() loop.

This should look like :

    <commands>
      <set option="addcache">true</set>
      <create-db name="mydb"/>
      <add>myphysicalpath</add>
      <add>myphysicalpath</add>
      ..
    </commands>

Best regards,
Fabrice Etanchaud
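The file:list() loop mentioned above could be sketched like this; the directory, database name, and output path are invented placeholders:

```xquery
(: generate a command script from a directory listing, in sorted order :)
let $dir := '/data/xml/'
return file:write('/data/load.bxs',
  <commands>
    <set option='addcache'>true</set>
    <create-db name='mydb'/>
    {
      for $f in file:list($dir, false(), '*.xml')
      order by $f
      return <add>{ $dir || $f }</add>
    }
  </commands>
)
```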

De : Anastasiou A. [mailto:a.anastas...@swansea.ac.uk]
Envoyé : mercredi 13 septembre 2017 11:23
À : basex-talk@mailman.uni-konstanz.de
Cc : 'Alexander Holupirek'; 'Michael Seiferle'; Fabrice ETANCHAUD; 'Bridger 
Dyson-Smith'
Objet : Possible Bug in BaseX 8.2.3 when importing XML (Was RE: [basex-talk] A 
few general questions about BaseX)

Hello everyone

Many thanks to Alexander, Bridger, Fabrice, Michael for getting back to me with 
very detailed responses, these have been really helpful.

A few notes:


1)  The name is Athanasios :D. Sorry, just couldn’t help it, it seemed 
incredibly formal to be addressed via the surname in our communications.
Our mail server advertises the “Surname. Initial” pattern, so I can see where 
the confusion came from.

2)  I think that there is scope for adding some sort of “logging” to all 
actions of the server in general because I think I may have hit a bug but I 
cannot
provide any more illuminating comments. Here is what is happening:

a.   During import, I get an error that file somethingsomething140.xml has 
an incredibly long element that cannot be imported at line (blahblah). The 
whole process just dies there.

b.  This is a bug, because if I simply imported JUST the offending file 
itself, a single file database is created without any problems and I can query 
it and all. So, maybe, the error is caused because of the previous file OR 
because of the way the files are loaded. But I have absolutely no way of 
knowing the “load history” of the files or the exception that was caught or 
anything else. In fact, once you press “OK” in the error dialog box, any 
database files that have been created are lost. In addition to this, the XML 
files to import are enumerated in a random order. So, I had to run the import 
again and stay there looking at each one of the files loading, to witness that 
the system “breaks” after 254 files (which is suspiciously close to 256). None 
of the files around the vicinity of the offending file caused any problems, so 
this may be a more difficult to catch bug (but it is thrown with both the 
internal and external parsers). Following this, I created smaller databases 
with 250 XML files and then got “predictable” errors on running out of memory 
and not creating indexes which I can solve more easily.

3)  It’s good to know that I don’t need the original files because that’s a 
lot of space I can get rid of. Thank you.

4)  Seems like the ADDCACHE would have saved me some trouble here, many 
thanks for that, but of course, if you don’t know the file enumeration order, 
you are still stuck in not knowing which files have already been imported.

5)  Michael, logging won’t help with the internal import procedure, except 
of course if you were implying writing a quick script to do the import 
“manually”?

6)  Michael, the fork-join and “client connect” are really interesting and 
worth a try before I start connecting things together via Hadoop. Are these 
modules already available to BaseX? Do I simply import their namespace or is it 
not even needed?

Many thanks again.

All the best






From: Bridger Dyson-Smith [mailto:bdysonsm...@gmail.com]
Sent: 12 September 2017 16:53
To: Anastasiou A.
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] A few general questions about BaseX

Hi Anastasiou,
Hopefully some of these answers are somewhat helpful.

On Tue, Sep 12, 2017 at 4:54 AM, Anastasiou A. <a.anastas...@swansea.ac.uk> wrote:
Hello everyone

I am trying to load BaseX with a large number of XML files (~500), each one a 
few hundred MB in size.
BaseX fails with a message along the lines of “This is too big for one database”.

Can I please ask:


1)  Are there any logs, beyond the DB logs? If yes, where can I find them?

a.  The reason I am asking is because once basexgui gives the message, 
there is no indication about the error.
Ideally, I would like to know if this is a limitation on memory amount or 
number of items (?).
I'm not sure how to enable more verbose logging with 

Re: [basex-talk] Possible Bug in BaseX 8.2.3 when importing XML (Was RE: A few general questions about BaseX)

2017-09-14 Thread Fabrice ETANCHAUD
Hi Athanasios,

Did you set the DEBUG option to get detailed information ?

Could you confirm you are creating a db from a directory content ?
If this is the case, as suggested, you should generate a command script to 
force the loading order, and use this script to load the data in forced order 
to detect where it fails.
You can easily create such a bxs file in xquery with a for file:list() loop.

This should look like :

    <commands>
      <set option="addcache">true</set>
      <create-db name="mydb"/>
      <add>myphysicalpath</add>
      <add>myphysicalpath</add>
      ..
    </commands>

Best regards,
Fabrice Etanchaud

De : Anastasiou A. [mailto:a.anastas...@swansea.ac.uk]
Envoyé : mercredi 13 septembre 2017 11:23
À : basex-talk@mailman.uni-konstanz.de
Cc : 'Alexander Holupirek'; 'Michael Seiferle'; Fabrice ETANCHAUD; 'Bridger 
Dyson-Smith'
Objet : Possible Bug in BaseX 8.2.3 when importing XML (Was RE: [basex-talk] A 
few general questions about BaseX)

Hello everyone

Many thanks to Alexander, Bridger, Fabrice, Michael for getting back to me with 
very detailed responses, these have been really helpful.

A few notes:


1)  The name is Athanasios :D. Sorry, just couldn’t help it, it seemed 
incredibly formal to be addressed via the surname in our communications.
Our mail server advertises the “Surname. Initial” pattern, so I can see where 
the confusion came from.

2)  I think that there is scope for adding some sort of “logging” to all 
actions of the server in general because I think I may have hit a bug but I 
cannot
provide any more illuminating comments. Here is what is happening:

a.   During import, I get an error that file somethingsomething140.xml has 
an incredibly long element that cannot be imported at line (blahblah). The 
whole process just dies there.

b.  This is a bug, because if I simply imported JUST the offending file 
itself, a single file database is created without any problems and I can query 
it and all. So, maybe, the error is caused because of the previous file OR 
because of the way the files are loaded. But I have absolutely no way of 
knowing the “load history” of the files or the exception that was caught or 
anything else. In fact, once you press “OK” in the error dialog box, any 
database files that have been created are lost. In addition to this, the XML 
files to import are enumerated in a random order. So, I had to run the import 
again and stay there looking at each one of the files loading, to witness that 
the system “breaks” after 254 files (which is suspiciously close to 256). None 
of the files around the vicinity of the offending file caused any problems, so 
this may be a more difficult to catch bug (but it is thrown with both the 
internal and external parsers). Following this, I created smaller databases 
with 250 XML files and then got “predictable” errors on running out of memory 
and not creating indexes which I can solve more easily.

3)  It’s good to know that I don’t need the original files because that’s a 
lot of space I can get rid of. Thank you.

4)  Seems like the ADDCACHE would have saved me some trouble here, many 
thanks for that, but of course, if you don’t know the file enumeration order, 
you are still stuck in not knowing which files have already been imported.

5)  Michael, logging won’t help with the internal import procedure, except 
of course if you were implying writing a quick script to do the import 
“manually”?

6)  Michael, the fork-join and “client connect” are really interesting and 
worth a try before I start connecting things together via Hadoop. Are these 
modules already available to BaseX? Do I simply import their namespace or is it 
not even needed?

Many thanks again.

All the best






From: Bridger Dyson-Smith [mailto:bdysonsm...@gmail.com]
Sent: 12 September 2017 16:53
To: Anastasiou A.
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] A few general questions about BaseX

Hi Anastasiou,
Hopefully some of these answers are somewhat helpful.

On Tue, Sep 12, 2017 at 4:54 AM, Anastasiou A. <a.anastas...@swansea.ac.uk> wrote:
Hello everyone

I am trying to load BaseX with a large number of XML files (~500), each one a 
few hundred MB in size.
BaseX fails with a message along the lines of “This is too big for one database”.

Can I please ask:


1)  Are there any logs, beyond the DB logs? If yes, where can I find them?

a.  The reason I am asking is because once basexgui gives the message, 
there is no indication about the error.
Ideally, I would like to know if this is a limitation on memory amount or 
number of items (?).
I'm not sure how to enable more verbose logging with the GUI -- hopefully one 
of the devs or power users can weigh in on this.

2)  The parser options include reading XML files from archives, which is 
very convenient, but once the file has been
parsed, does BaseX require the “originals” for queries / returning results?

Re: [basex-talk] A few general questions about BaseX

2017-09-12 Thread Fabrice ETANCHAUD
Hi Anastasiou,

When adding many big documents, I usually set the ADDCACHE option [1], and add 
files sequentially (for example in a BaseX command script).
So, when I hit the db size limit, no work is lost and I can continue adding the 
remaining files in a new db.

Best regards,
Fabrice Etanchaud

[1] http://docs.basex.org/wiki/Options#ADDCACHE
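A command script applying this advice might look as follows; the database name and file paths are invented placeholders:

```
SET ADDCACHE true
CREATE DB part1
ADD /data/file1.xml
ADD /data/file2.xml
```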



De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Anastasiou A.
Envoyé : mardi 12 septembre 2017 11:01
À : 'basex-talk@mailman.uni-konstanz.de'
Objet : [basex-talk] FW: A few general questions about BaseX

I am sorry, turns out the error is probably due to malformed input in one of 
the files which I will have to look into, not BaseX, would however still 
appreciate some indication regarding the rest of the questions.

All the best



From: Anastasiou A.
Sent: 12 September 2017 09:54
To: basex-talk@mailman.uni-konstanz.de
Subject: A few general questions about BaseX

Hello everyone

I am trying to load BaseX with a large number of XML files (~500), each one a 
few hundred MB in size.
BaseX fails with a message along the lines of "This is too big for one database".

Can I please ask:


1)  Are there any logs, beyond the DB logs? If yes, where can I find them?

a.   The reason I am asking is because once basexgui gives the message, 
there is no indication about the error.
Ideally, I would like to know if this is a limitation on memory amount or 
number of items (?).

2)  The parser options include reading XML files from archives, which is 
very convenient, but once the file has been
parsed, does BaseX require the "originals" for queries / returning results?

3)  Is it possible to do federation with BaseX? In other words, let's say I 
split a database in two large parts (as per #1),
is it possible to launch two baseX servers and then have them talk to each 
other so that ultimately I just query one of
them and get back unified results?

All the best


Re: [basex-talk] Issue with Full Text Retrieval

2017-09-11 Thread Fabrice ETANCHAUD
Hello Ron,

I don’t know how ft operators behave on document nodes.
Supposing documents are converted to their data() representation, your query 
would yield the same negative answer.
You should consider applying ft operators on text nodes like this :

for $trial in db:open('NCT00473512')//text()
  (: [clinical_study/id_info/nct_id='NCT00473512'] :)
return $trial[. contains text { 'neoplasms' }]

Best regards,
Fabrice Etanchaud


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Ron Katriel
Sent: Monday, 11 September 2017 00:42
To: BaseX
Subject: [basex-talk] Issue with Full Text Retrieval

Hi,

I am seeing strange behavior with Full Text retrieval. The following query 
fails for a number of words that are in the XML document (see attached):

for $trial in db:open('CTGovDebug')
  (: [clinical_study/id_info/nct_id='NCT00473512'] :)
return $trial contains text { 'neoplasms' }

It fails on a good number of words including neoplasms, cougar, industry, yes, 
completed, november, 2005, interventional, single, male, female, assignment, 
none, research, principal, primary, secondary, age, years, gender, etc. But it 
matches most of the words in the file.

Observation: The words that fail are located at the beginning and/or end of the 
text and do not occur anywhere else in the middle of any text.

The document is the only one in the database. It does not make a difference 
whether full text indexing is on or off. My BaseX version is 8.6.4.

Thanks,
Ron


Ron Katriel, Ph.D. | Principal Data Scientist | Medidata 
Solutions<http://www.mdsol.com/>
350 Hudson Street, 7th Floor, New York, NY 10014
rkatr...@mdsol.com<mailto:tbro...@mdsol.com> | direct: +1 201 337 
3622 | mobile: +1 201 675 
5598 | main: +1 212 918 
1800


Re: [basex-talk] Server Variables, cached vars, etc

2017-09-05 Thread Fabrice ETANCHAUD
To be confirmed: there is no 'start script' server option.
I manually create and populate the main-memory db in the DBA query interface.

Best regards,
Fabrice

-Original Message-
From: Fabrice ETANCHAUD 
Sent: Tuesday, 5 September 2017 09:29
To: 'Marco Lettere'; basex-talk@mailman.uni-konstanz.de
Subject: RE: [basex-talk] Server Variables, cached vars, etc

Hi all,

Another solution is to share a main-memory database, which behaves like a memory 
cache.
In client/server mode, any main-memory database created by one client is 
available to all the other ones.
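For reference, such a main-memory database can be created with a short command script like the following (run once on the server; the database name is made up):

```
SET MAINMEM true
CREATE DB cache
```

Once created, other clients connected to the same server can open 'cache' and read or update it like any other database.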

Best regards,
Fabrice


-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Marco Lettere 
Sent: Tuesday, 5 September 2017 09:14 To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Server Variables, cached vars, etc

On 05/09/2017 01:37, Erik Peterson wrote:
> How can I create a variable that is evaluated only once but accessed 
> across many RestXQ requests and sessions. I'm trying to cache data 
> that comes from an integration with an expensive operation. Does BaseX 
> support something similar to server variables like Mark Logic?

Hi Erik,

AFAIK you have the following possibilities to keep a variable live across 
multiple RestXQ calls:

1) Use session http://docs.basex.org/wiki/Session_Module

2) Use a database, which is built into BaseX and very lightweight. 
Especially if your data is serializable to XML, you can also benefit from 
indexes to speed up access to your cached objects.

3) Use the file system.

Hope this helps [cit] ;-)

Marco.



Re: [basex-talk] Server Variables, cached vars, etc

2017-09-05 Thread Fabrice ETANCHAUD
Hi all,

Another solution is to share a main-memory database, which behaves like a memory 
cache.
In client/server mode, any main-memory database created by one client is 
available to all the other ones.

Best regards,
Fabrice


-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Marco Lettere
Sent: Tuesday, 5 September 2017 09:14
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Server Variables, cached vars, etc

On 05/09/2017 01:37, Erik Peterson wrote:
> How can I create a variable that is evaluated only once but accessed 
> across many RestXQ requests and sessions. I'm trying to cache data 
> that comes from an integration with an expensive operation. Does BaseX 
> support something similar to server variables like Mark Logic?

Hi Erik,

AFAIK you have the following possibilities to keep a variable live across 
multiple RestXQ calls:

1) Use session http://docs.basex.org/wiki/Session_Module

2) Use a database, which is built into BaseX and very lightweight. 
Especially if your data is serializable to XML, you can also benefit from 
indexes to speed up access to your cached objects.

3) Use the file system.

Hope this helps [cit] ;-)

Marco.



Re: [basex-talk] file:name -> admin permission required

2017-09-01 Thread Fabrice ETANCHAUD
Hi Guenter,

Your file:name() usage is a bit unconventional (outside the File Module's 
standard usage);
could it be that File Module usage is restricted in a webapp context for 
security reasons?

Is the document serialization leading to a file path?
It seems you are giving a document node to the file:name function; did you try 
file:name(db:path($doc))?


Usually I just use something like tokenize(db:path($doc), '/')[last()]
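Applied to Guenter's query, that gives (a sketch using the collection path from the original message):

```xquery
for $doc in collection("data/stories")
return tokenize(db:path($doc), '/')[last()]
```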

Best regards,
Fabrice


-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Günter 
Dunz-Wolff
Sent: Friday, 1 September 2017 14:11
To: BaseX

Hi all,

I updated my older basex-version, now I get in my RESTXQ-App the error 'admin 
permission required‘ with

let $collection := collection("data/stories")
for $doc in $collection
let $file_name := file:name($doc)
return $file_name

What's going wrong?

Thanks for any help.
Guenter



Re: [basex-talk] Implement read lock and write lock

2017-08-17 Thread Fabrice ETANCHAUD
Hi Dhamendra Kumar,

XQuery and BaseX are extremely productive technologies,
but they do require some initial learning time.

I am sorry I cannot help more at this point;
you should take the time to read a few tutorials and get an overview of all 
the BaseX capabilities by reading the documentation's starter sections.
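For reference, a minimal sketch based on the documented option, assuming the database containing /bloomsbury/config/audit.xml is named 'bloomsbury' (BaseX locks are taken per database, not per file; the db and resource names here are guesses):

```xquery
(: take an exclusive lock on the 'bloomsbury' database for this query :)
declare option query:write-lock "bloomsbury";
db:open("bloomsbury", "config/audit.xml")
```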

Best regards,
Fabrice


From: Dharmendra Singh [mailto:dharam.m...@gmail.com]
Sent: Thursday, 17 August 2017 11:04
To: Fabrice ETANCHAUD; BaseX
Subject: Re: [basex-talk] Implement read lock and write lock

Thanks for your reply, Fabrice,

I am new to BaseX, so could you explain a little more?


I have gone through the documentation; in it, these lines are given:

declare option query:read-lock "foo,bar";
declare option query:read-lock "batz";
declare option query:write-lock "quix";

I have the file structure:

/bloomsbury/config/audit.xml

so I have to lock audit.xml; how can I do that?


Regards
Dharmendra Kumar Singh

On Thursday, 17 August 2017 2:01 PM, Fabrice ETANCHAUD 
<fetanch...@pch.cerfrance.fr<mailto:fetanch...@pch.cerfrance.fr>> wrote:

Hi Dharmendra Kumar,

Maybe you shall have a look at the following resource :

http://docs.basex.org/wiki/Transaction_Management


Regards,
Fabrice Etanchaud

"Perfection is finally attained not when there is no longer anything to add,
but when there is no longer anything to take away."

Saint-Exupéry


From: 
basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>
 [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of DK Singh
Sent: Thursday, 17 August 2017 09:53
To: BaseX
Subject: [basex-talk] Implement read lock and write lock

Hi All,

How can I implement a read lock and a write lock? Are there any functions to do this?

Regards
Dharmendra Kumar Singh



Re: [basex-talk] Implement read lock and write lock

2017-08-17 Thread Fabrice ETANCHAUD
Hi Dharmendra Kumar,

Maybe you shall have a look at the following resource :

http://docs.basex.org/wiki/Transaction_Management


Regards,
Fabrice Etanchaud

"Perfection is finally attained not when there is no longer anything to add,
but when there is no longer anything to take away."

Saint-Exupéry


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of DK Singh
Sent: Thursday, 17 August 2017 09:53
To: BaseX
Subject: [basex-talk] Implement read lock and write lock

Hi All,

How can I implement a read lock and a write lock? Are there any functions to do this?

Regards
Dharmendra Kumar Singh


Re: [basex-talk] Connect a BaseX web application to an already running BaseX server

2017-08-02 Thread Fabrice ETANCHAUD
Thank you, Christian, for your fast reply, as usual!

I wish I could use BaseX more often:

For now I am writing hundreds of lines of annotated Java with Jersey/JPA/Hibernate 
just to expose a few REST resources…
Not to mention the ETL stuff to move data from one RDBMS to the others…
And the Liquibase changelogs to maintain the schemas…
And…
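For contrast, a complete RESTXQ resource in BaseX fits in a handful of lines (a sketch; module namespace and path are made up):

```xquery
module namespace page = 'http://example.org/page';

(: GET /hello/{name} returns a small XML greeting :)
declare
  %rest:path('/hello/{$name}')
  %rest:GET
function page:hello($name as xs:string) {
  <greeting>Hello, { $name }!</greeting>
};
```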

Best regards!

Fabrice


From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: Wednesday, 2 August 2017 12:04
To: Fabrice ETANCHAUD
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Connect a BaseX web application to an already running 
BaseX server

Hi Fabrice,

I’m sorry that’s not possible: If you start a BaseX HTTP server with the 
embedded BaseX server instance, it will not communicate with this server via 
protocols/APIs, but share the same context.

Salutations,
Christian


On Wed, Aug 2, 2017 at 11:48 AM, Fabrice ETANCHAUD 
<fetanch...@pch.cerfrance.fr<mailto:fetanch...@pch.cerfrance.fr>> wrote:
Dear all,

I would like to connect a BaseX web application to an already running BaseX 
server, but I cannot figure out how to do it without using the Client module.
I tried to set the -n (HOST) option at webapp startup, but it did not help.

Best regards,



Fabrice ETANCHAUD
Analyste Developpeur

Les Rocs, Chavagné CS 40070
79260 LA CRECHE
Tél. 05 49 76 45 45- Fax 05 49 25 83 87
Email : fetanch...@pch.cerfrance.fr<mailto:fetanch...@pch.cerfrance.fr>

poitoucharentes.cerfrance.fr<http://www.poitoucharentes.cerfrance.fr>







[basex-talk] Connect a BaseX web application to an already running BaseX server

2017-08-02 Thread Fabrice ETANCHAUD
Dear all,

I would like to connect a BaseX web application to an already running BaseX 
server, but I cannot figure out how to do it without using the Client module.
I tried to set the -n (HOST) option at webapp startup, but it did not help.

Best regards,



Fabrice ETANCHAUD
Analyste Developpeur

Les Rocs, Chavagné CS 40070
79260 LA CRECHE
Tél. 05 49 76 45 45- Fax 05 49 25 83 87
Email : fetanch...@pch.cerfrance.fr

poitoucharentes.cerfrance.fr<http://www.poitoucharentes.cerfrance.fr>






Re: [basex-talk] quick and dirty guide to increasing memory capacity?

2017-07-12 Thread Fabrice ETANCHAUD
Hello,

In the BaseXGUI start script (BaseXGUI.sh or BaseXGUI.bat),

you need to raise the java -Xmx parameter value:

https://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/jrdocs/refman/optionX.html
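For example, to grant the GUI 8 GB of heap, the java invocation in the start script would look something like this (the exact classpath depends on your distribution):

```
java -Xmx8g -cp BaseX.jar org.basex.BaseXGUI
```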

Best regards,
Fabrice Etanchaud


-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of C. M. 
Sperberg-McQueen
Sent: Wednesday, 12 July 2017 06:17
To: basex-talk@mailman.uni-konstanz.de
Cc: C. M. Sperberg-McQueen
Objet : [basex-talk] quick and dirty guide to increasing memory capacity?

I’m working on some XQuery code for language corpora and testing a function 
that reads a corpus and gathers information about its contents.  The function 
works fine on the one-million word Brown Corpus, but it’s failing with an 
out-of-memory error on the five-million-word Hamburg Dependency Treebank (the 
XML for which is somewhat more verbose than Brown’s).

Before I spend a lot of time trying to rewrite the code to reduce memory usage: 
 Is there a quick way to give the BaseX GUI interface more memory?

Once the code is more complete, it will make sense to try to improve its speed 
and lower its resource consumption (and at that point I’ll surely have lots of 
questions for the list), but for the moment I would prefer to postpone such 
considerations, and focus on writing code I understand.

Thanks!


C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cms...@blackmesatech.com
http://www.blackmesatech.com




Re: [basex-talk] fetch:content-type file: customisation

2017-06-12 Thread Fabrice ETANCHAUD
Hi Christian,

Yes, I have already used RESTXQ to partially implement JSON API; you are right, 
RESTXQ is the only way server-side.
The need for REST was only for blind storage of redirected JSON API responses, 
which can also be done with RESTXQ.

Thank you, Christian, for providing us with such flexible software!

Best regards,
Fabrice

From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: Sunday, 11 June 2017 22:00
To: Fabrice ETANCHAUD
Cc: Andy Bunce; BaseX
Subject: RE: [basex-talk] fetch:content-type file: customisation

Hi Fabrice,

I guess it would be better to use RESTXQ for building responses that match the 
JSON API specification. Did you try that already?

Cheers
Christian



On 09.06.2017 at 15:11, "Fabrice ETANCHAUD" 
<fetanch...@pch.cerfrance.fr<mailto:fetanch...@pch.cerfrance.fr>> wrote:
Hi all,

Dear Christian, I hope you are doing well.

On the same subject, would it be possible for the REST interface to handle PUT 
requests of

application/vnd.api+json<http://www.iana.org/assignments/media-types/application/vnd.api+json>

resources (http://jsonapi.org/) like JSON ones ?


Best regards,

Fabrice


From: 
basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>
 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>]
 On behalf of Andy Bunce
Sent: Friday, 9 June 2017 12:38
To: Christian Grün
Cc: 
basex-talk@mailman.uni-konstanz.de<mailto:basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] fetch:content-type file: customisation

Hi Christian,
Just these two I am thinking about at the moment.

Is there some reason not to go with "application/xproc+xml"? [1]
Many in the existing list are in this style e.g.
atom=application/atom+xml
svg=image/svg+xml
lostxml=application/lost+xml
And it looks like BaseX will treat it as XML [2]
/Andy
[1] https://www.w3.org/XML/XProc/docs/langspec.html#media-type-registration
[2] 
https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/http/MediaType.java#L148

On 9 June 2017 at 11:01, Christian Grün 
<christian.gr...@gmail.com<mailto:christian.gr...@gmail.com>> wrote:
Hi Andy,

Adding the extensions statically is surely the least effort for now.

I can add the following two mappings:

  xpl=application/xml
  xproc=application/xml

Are some more that I should include?

Thanks,
Christian



On Fri, Jun 9, 2017 at 11:49 AM, Andy Bunce 
<bunce.a...@gmail.com<mailto:bunce.a...@gmail.com>> wrote:
> Hi,
>
> I notice that fetch:content-type()[1] returns "application/octet-stream" for
> files with the commonly used XProc file extensions *.xpl and *.xproc.
> This seems to be driven from the list in
> src/main/resources/media-types.properties [2]
> Would it be possible to add these extensions as "application/xproc+xml" or
> add a mechanism to allow extension/customisation? Maybe similar to [3]
> For static files served from jetty adding mime-mapping elements to web.xml
> works [4]
>
> /Andy
>
> [1] http://docs.basex.org/wiki/Fetch_Module#fetch:content-type
> [2]
> https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/resources/media-types.properties
> [3]
> http://docs.oracle.com/javase/7/docs/api/javax/activation/MimetypesFileTypeMap.html
> [4]
> https://stackoverflow.com/questions/33803109/how-can-i-set-mime-mapping-to-a-file-served-as-static-content-by-jetty-runner/33809187#33809187




Re: [basex-talk] fetch:content-type file: customisation

2017-06-09 Thread Fabrice ETANCHAUD
Hi all,

Dear Christian, I hope you are doing well.

On the same subject, would it be possible for the REST interface to handle PUT 
requests of

application/vnd.api+json

resources (http://jsonapi.org/) like JSON ones ?


Best regards,

Fabrice


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Andy Bunce
Sent: Friday, 9 June 2017 12:38
To: Christian Grün
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] fetch:content-type file: customisation

Hi Christian,
Just these two I am thinking about at the moment.

Is there some reason not to go with "application/xproc+xml"? [1]
Many in the existing list are in this style e.g.
atom=application/atom+xml
svg=image/svg+xml
lostxml=application/lost+xml
And it looks like BaseX will treat it as XML [2]
/Andy
[1] https://www.w3.org/XML/XProc/docs/langspec.html#media-type-registration
[2] 
https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/http/MediaType.java#L148

On 9 June 2017 at 11:01, Christian Grün wrote:
Hi Andy,

Adding the extensions statically is surely the least effort for now.

I can add the following two mappings:

  xpl=application/xml
  xproc=application/xml

Are some more that I should include?

Thanks,
Christian



On Fri, Jun 9, 2017 at 11:49 AM, Andy Bunce wrote:
> Hi,
>
> I notice that fetch:content-type()[1] returns "application/octet-stream" for
> files with the commonly used XProc file extensions *.xpl and *.xproc.
> This seems to be driven from the list in
> src/main/resources/media-types.properties [2]
> Would it be possible to add these extensions as "application/xproc+xml" or
> add a mechanism to allow extension/customisation? Maybe similar to [3]
> For static files served from jetty adding mime-mapping elements to web.xml
> works [4]
>
> /Andy
>
> [1] http://docs.basex.org/wiki/Fetch_Module#fetch:content-type
> [2]
> https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/resources/media-types.properties
> [3]
> http://docs.oracle.com/javase/7/docs/api/javax/activation/MimetypesFileTypeMap.html
> [4]
> https://stackoverflow.com/questions/33803109/how-can-i-set-mime-mapping-to-a-file-served-as-static-content-by-jetty-runner/33809187#33809187



Re: [basex-talk] server and/or web application up times

2017-04-07 Thread Fabrice ETANCHAUD
Thank you, Andy!

Best regards,
Fabrice


From: Andy Bunce [mailto:bunce.a...@gmail.com]
Sent: Friday, 7 April 2017 10:10
To: Fabrice ETANCHAUD <fetanch...@groupefbo.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] server and/or web application up times

Hi Fabrice,

One option is to use the Java API to get the jvm uptime[1]. This requires admin 
permission and the assumption that the server started at the same time as the 
jvm instance.
/Andy
[1] 
https://github.com/Quodatum/openshift-basex-quick-start/blob/master/basex/repo/quodatum/basex/env.xqm#L90-L105
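In XQuery, the same JVM uptime can be reached through BaseX's Java bindings, e.g. (a sketch; requires admin permission, and assumes the usual one-namespace-per-Java-class mapping):

```xquery
declare namespace Factory = 'java:java.lang.management.ManagementFactory';
declare namespace Bean    = 'java:java.lang.management.RuntimeMXBean';

(: milliseconds since the JVM hosting the server was started :)
Bean:getUptime(Factory:getRuntimeMXBean())
```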

On 7 April 2017 at 08:45, Fabrice ETANCHAUD 
<fetanch...@groupefbo.com<mailto:fetanch...@groupefbo.com>> wrote:
Dear all,

Is there a means of obtaining server and/or web application uptimes?


Best regards,
Fabrice



[basex-talk] server and/or web application up times

2017-04-07 Thread Fabrice ETANCHAUD
Dear all,

Is there a means of obtaining server and/or web application uptimes?


Best regards,
Fabrice


Re: [basex-talk] Performance: associating one with many, or many with one?

2017-03-29 Thread Fabrice ETANCHAUD
Hi Jay,

You are asking for advice, so here is mine :

Working with BaseX, you should get rid of any entity/relationship model (or 
stick to Java JPA plumbing), and focus on the documents or messages your 
application will exchange.

Each document might be identified either by path and/or a unique attribute 
or element, for cross-references (maybe with an xml:base attribute?).

At first glance, one could think of the following document types:

-  Artist or Band (auto referencing Artists)

-  Album (including Songs)

-  Tour

As I write, I just realize that your question could be all about the 
distinction between aggregation and reference relations in the object world 
(just think of the Eiffel expanded keyword).

Good designing !

Best regards,

Fabrice ETANCHAUD
SPI Informatique


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Jay Straw
Sent: Wednesday, 29 March 2017 03:16
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Performance: associating one with many, or many with one?

Hi List!

I'm building an application in base-x that's sort of like a local music wiki: 
bands, members, venues, promoters, and albums and songs are some of what I'm 
representing in XML.

And what I'm wondering is that, in a situation where there's one of something 
(a musician), and a bunch of something else they're associated with (song 
writting or performance credits), which node should get the reference?

The numerous node gets the reference back to the one:





[ etc ]

Or the one refers to the many:








Example 1, I could use something like:

track[@written-by="Joe Schmoe"]

Example 2:

person[@name="Joe Schmoe"]/songs-written/song

My other thought was putting the references in both, but I'll have to do this 
with many types of objects I'm representing and I don't want things to get out 
of control (then again, if I go to export an  or  and I'm using 
the second method, I'll have to do a join)

I have a feeling in a large set, the second way wins out. Even where I live, 
with only around 750,000 people in our entire state (Alaska), we have easily 
100 - 200 acts, including bands, solo performers, traditional acts, etc, with 
thousands of individuals that compose them. But I want bands that are extinct, 
bands that are on hiatus, etc, and I hope my software is useful to others in 
larger cities as well.

So, maybe a pointless question because my db will be so tiny compared to some 
folks who post on here, but just curious :-)

Also, if this problem has a name I can research myself, that'd be great!

Be well,
Jay Straw


Re: [basex-talk] Index on specific node

2017-03-20 Thread Fabrice ETANCHAUD
Hello Tushar,
It’s all in the doc:

http://docs.basex.org/wiki/Index#Selective_Indexing
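For reference, the selective indexing options can also be applied from XQuery when optimizing (a sketch; the database and element names are made up):

```xquery
(: rebuild indexes, restricting the text index to selected elements :)
db:optimize("mydb", true(), map {
  'textindex': true(),
  'textinclude': 'title,author'
})
```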

Best regards,
Fabrice Etanchaud
groupefbo


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Tushar 
Deshmukh
Sent: Monday, 20 March 2017 01:48
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Index on specific node

Namaste All,

Can we index the database on only one or a few specific nodes?

I was not able to find this in the help; I may be searching with the wrong 
terms.

Thanks

Tushar Deshmukh


Re: [basex-talk] Simple query

2017-03-09 Thread Fabrice ETANCHAUD
Hello Aaron,

You would learn faster by using the BaseXGUI application:
you will benefit from syntax highlighting, real-time execution, and hints on 
function signatures.

Best regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Aaron Weber
Sent: Thursday, 9 March 2017 00:31
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Simple query

Newbie alert.

I'm trying to get my feet wet with BaseX, and in doing so, am trying to 
understand XQuery and how to apply it to a database full of documents (not just 
a single document that is typically queried).

I am using Java and can post my code, but with a LocalSession, and a query, the 
following produces 0 results.

For $doc in collection() return $doc

I realize there's no "where", and in the SQL world that would match all. Maybe 
not in XQuery?

Obviously just a test query, but I need to start somewhere. :-)

Thanks for any help!
--
AJ


Re: [basex-talk] [XPTY0004] Item expected, sequence found

2017-03-03 Thread Fabrice ETANCHAUD
Hello Bram,

Sorry I mean it might not be xquery related, but xml related (in your dataset).

Maybe running a modified version of your script twice,
replacing each bold statement below (separately) with

and count(../node[
@rel="obj1" and @cat="np"
  ]/node[
@rel="mod" and @cat="pp"
  ]/node[
@rel="hd" and @pt="vz"
  ]/@begin) > 1

will help you identify where you have more than one matching element?

Best regards,
Fabrice

//node[
  @cat="pp"
  and node[
@rel="hd"
and @pt="vz"
and number(@begin) < number(
  ../node[
@rel="obj1" and @cat="np"
  ]/node[
@rel="mod" and @cat="pp"
  ]/node[
@rel="hd" and @pt="vz"
  ]/@begin)
]
  and node[
@rel="obj1"
and @cat="np"
and node[
  @rel="mod"
  and @cat="pp"
  and node[
@rel="hd"
and @pt="vz"
and number(@begin) < number(
  ../node[
@rel="obj1"
and @cat="np"
  ]/node[
@rel="mod"
and @cat="pp"
  ]/node[
@rel="hd"
and @pt="vz"
  ]/@begin)
]
and node[
  @rel="obj1"
  and @cat="np"
      and node[
@rel="mod"
and @cat="pp"
and node[
  @rel="hd"
  and @pt="vz"
]
  ]
]
  ]
]
  ]




From: Bram Vanroy [mailto:bram.vanr...@student.kuleuven.be]
Sent: Friday, 3 March 2017 12:52
To: Fabrice ETANCHAUD <fetanch...@groupefbo.com>; 'BaseX' 
<basex-talk@mailman.uni-konstanz.de>
Subject: RE: [basex-talk] [XPTY0004] Item expected, sequence found

Hi Fabrice thank you for the quick reply.

Unfortunately I do not understand what you mean. I have queried other XPath 
codes with the number() function and the begin attribute without fault. 
However, it does seem to point in that direction as the error explicitly 
mentions that attribute:

attribute begin {"6"},

Because other lines in the benchmark with the number() function and begin 
attribute do not throw this error, I do not know where to look.

From: Fabrice ETANCHAUD [mailto:fetanch...@groupefbo.com]
Sent: Friday, 3 March 2017 12:33
To: Bram Vanroy 
<bram.vanr...@student.kuleuven.be<mailto:bram.vanr...@student.kuleuven.be>>; 
'BaseX' 
<basex-talk@mailman.uni-konstanz.de<mailto:basex-talk@mailman.uni-konstanz.de>>
Subject: RE: [basex-talk] [XPTY0004] Item expected, sequence found

Hi Bram

From what one can read there:

http://docs.basex.org/wiki/XQuery_Errors

It seems to me that the number() function could be called with a sequence of @begin.
There might be a place in your query where multiple node elements are returned, 
leading to a node/@begin sequence.
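For illustration, the error can be reproduced in isolation (a minimal sketch; the element names are made up):

```xquery
(: two nodes match, so number() receives a two-item sequence
   and raises XPTY0004 "Item expected, sequence found" :)
let $doc := <root><node begin="1"/><node begin="2"/></root>
return number($doc/node/@begin)
```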

Best regards,
Fabrice


From: 
basex-talk-boun...@mailman.uni-konstanz.de<mailto:basex-talk-boun...@mailman.uni-konstanz.de>
 [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Bram Vanroy
Sent: Friday, 3 March 2017 12:19
To: 'BaseX' 
<basex-talk@mailman.uni-konstanz.de<mailto:basex-talk@mailman.uni-konstanz.de>>
Subject: [basex-talk] [XPTY0004] Item expected, sequence found

Hi BaseX peeps

I'm running BaseX 8.6.1 on Windows, and I'm running into an issue I can't 
explain. I am using BaseX to run some benchmarks. As a benchmark I am running 
different XPath structures to match on the same database. In other words, in a 
file of a hundred lines each line is an XPath structure, and with Perl I query 
each line and measure the time it takes. Simple enough, right?

The strange thing is that my script runs fine but it crashes on line 17, which 
is this XPath code:

//node[@cat="pp" and node[@rel="hd" and @pt="vz" and number(@begin) < 
number(../node[@rel="obj1" and @cat="np"]/node[@rel="mod" and 
@cat="pp"]/node[@rel="hd" and @pt="vz"]/@begin)] and node[@rel="obj1" and 
@cat="np" and node[@rel="mod" and @cat="pp" and node[@rel="hd" and @pt="vz" and 
number(@begin) < number(../node[@rel="obj1" and @cat="np"]/node[@rel="mod" and 
@cat="pp"]/node[@rel="hd" and @pt="vz"]/@begin)] and node[@rel="obj1" and 
@cat="np" and node[@rel="mod" and @cat="pp" and node[@rel="hd" and 
@pt="vz"]]

Admittedly, it's quite a long string, but I don't see anything wrong with it 
and I don't think the error lies in the XPath. The full trace is

Re: [basex-talk] [XPTY0004] Item expected, sequence found

2017-03-03 Thread Fabrice ETANCHAUD
Hi Bram

From what one can read there:

http://docs.basex.org/wiki/XQuery_Errors

It seems to me that the number() function could be called with a sequence of @begin.
There might be a place in your query where multiple node elements are returned, 
leading to a node/@begin sequence.

Best regards,
Fabrice


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Bram Vanroy
Sent: Friday, 3 March 2017 12:19
To: 'BaseX' 
Subject: [basex-talk] [XPTY0004] Item expected, sequence found

Hi BaseX peeps

I'm running BaseX 8.6.1 on Windows, and I'm running into an issue I can't 
explain. I am using BaseX to run some benchmarks. As a benchmark I am running 
different XPath structures to match on the same database. In other words, in a 
file of a hundred lines each line is an XPath structure, and with Perl I query 
each line and measure the time it takes. Simple enough, right?

The strange thing is that my script runs fine but it crashes on line 17, which 
is this XPath code:

//node[@cat="pp" and node[@rel="hd" and @pt="vz" and number(@begin) < 
number(../node[@rel="obj1" and @cat="np"]/node[@rel="mod" and 
@cat="pp"]/node[@rel="hd" and @pt="vz"]/@begin)] and node[@rel="obj1" and 
@cat="np" and node[@rel="mod" and @cat="pp" and node[@rel="hd" and @pt="vz" and 
number(@begin) < number(../node[@rel="obj1" and @cat="np"]/node[@rel="mod" and 
@cat="pp"]/node[@rel="hd" and @pt="vz"]/@begin)] and node[@rel="obj1" and 
@cat="np" and node[@rel="mod" and @cat="pp" and node[@rel="hd" and 
@pt="vz"]]

Admittedly, it's quite a long string, but I don't see anything wrong with it 
and I don't think the error lies in the XPath. The full trace is as follows:

[XPTY0004] Item expected, sequence found: (attribute begin {"6"}, ...). at 
C:\xampp\htdocs\grinding\BaseXClient.pm line 213.
at C:\xampp\htdocs\grinding\BaseXClient.pm line 213.
Query::exc("\x{5}", 133) called at 
C:\xampp\htdocs\grinding\BaseXClient.pm line 177
Query::execute(Query=HASH(0x3adac40)) called at SonarBenchNew.pl line 
249
main::query_sonar("for \$node in 
db:open(\"WRPEE000treebank\")/treebank//node[\@"...) called at 
SonarBenchNew.pl line 176
main::loop_databases(ARRAY(0x31a2ae0), "//node[\@cat=\"pp\" and 
node[\@rel=\"hd\" and \@pt=\"vz\" and number(\@"...) called at SonarBenchNew.pl 
line 140
main::regular_sonar("//node[\@cat=\"pp\" and node[\@rel=\"hd\" and 
\@pt=\"vz\" and number(\@"...) called at SonarBenchNew.pl line 110

I have found that the XPTY0004 error is often caused by an order-by clause in 
XQuery but I don't have that. The full XQuery for the XPath is as follows:

for $node in db:open("WRPEE000treebank")/treebank//node[@cat="pp" and 
node[@rel="hd" and @pt="vz" and number(@begin) < number(../node[@rel="obj1" and 
@cat="np"]/node[@rel="mod" and @cat="pp"]/node[@rel="hd" and @pt="vz"]/@begin)] 
and node[@rel="obj1" and @cat="np" and node[@rel="mod" and @cat="pp" and 
node[@rel="hd" and @pt="vz" and number(@begin) < number(../node[@rel="obj1" and 
@cat="np"]/node[@rel="mod" and @cat="pp"]/node[@rel="hd" and @pt="vz"]/@begin)] 
and node[@rel="obj1" and @cat="np" and node[@rel="mod" and @cat="pp" and 
node[@rel="hd" and @pt="vz"]]let $sentid := 
($node/ancestor::alpino_ds/@id)let $sentence := 
($node/ancestor::alpino_ds/sentence)let $tb := \'WRPEE000treebank\' return 
{data($sentid)}||{data($sentence)}||{data($tb)}

I don't know if you guys can give me directions solely based on this. But if 
you could point me to a likely cause, e.g. input OR database contents, that 
would be great.


Thanks in advance!

Bram Vanroy
https://bramvanroy.be


[basex-talk] [REST-XQ] cookies and path cookie

2017-03-02 Thread Fabrice ETANCHAUD
Dear all (again ;))

Since BaseX as a Web Application can serve several 'sub' applications,
I ran into the following issue:


1. Logged in to the DBA interface

2. session:close() in another rest-xq end point (in the same browser)

3. I got disconnected from the DBA interface

It seems to be normal behavior, because the cookie is shared among all BaseX
applications in the same Web Application (the cookie path is the same for all
sub-applications).
I cannot figure out how we could easily have private cookies for each
'sub-application'.

Best regards,
Fabrice Etanchaud


[basex-talk] [DBA] jobs-users/server-sessions

2017-03-02 Thread Fabrice ETANCHAUD
Dear all @BaseX,

Could you please explain the exact meaning of the server sessions table in the 
dba jobs-users window ?
I expected to find a list of (servlet) sessions, but instead one can see all 
the session variables (and among them the session ids).
Is it expected behavior ?
In that case what does it mean to 'kill' a session variable ?

Thank you for showing session ids in the log !

Best regards,
Fabrice Etanchaud
Groupe FBO



Re: [basex-talk] querying xml with namespaces.

2017-03-01 Thread Fabrice ETANCHAUD
You are right,

BaseX and XQuery are awesome !

And one day you should take a look at REST-XQ and XForms (like Orbeon) to 
embrace the whole picture...

Best regards,
Fabrice

-Original Message-
From: Witold E Wolski [mailto:wewol...@gmail.com]
Sent: Wednesday, 1 March 2017 10:58
To: Fabrice ETANCHAUD <fetanch...@groupefbo.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] querying xml with namespaces.

Awesome. Both work! Thank you.

On 1 March 2017 at 10:30, Fabrice ETANCHAUD <fetanch...@groupefbo.com> wrote:
> Hi, from what I remember,
>
> You can get rid of namespace wildcards if you just declare a default element 
> namespace for your query :
>
> declare default element namespace 
> 'http://psidev.info/psi/pi/mzIdentML/1.1';
>
> db:open("myrimatchSubset","20160312_17_B3_myrimatch_2_2_140.xml")/MzIdentML/SequenceCollection/Peptide
>
> Another way is to declare explicitly your namespace (here named mz but you 
> could choose any other name) :
>
> declare namespace mz = 'http://psidev.info/psi/pi/mzIdentML/1.1';
>
> db:open("myrimatchSubset","20160312_17_B3_myrimatch_2_2_140.xml")/mz:MzIdentML/mz:SequenceCollection/mz:Peptide
>
> Best regards,
> Fabrice Etanchaud
>
> -Original Message-
> From: basex-talk-boun...@mailman.uni-konstanz.de
> [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of
> Witold E Wolski
> Sent: Wednesday, 1 March 2017 10:18
> To: basex-talk@mailman.uni-konstanz.de
> Subject: [basex-talk] querying xml with namespaces.
>
> Apologies if I am hard to understand but I am new to the XML querying.
> I would like to query an XML with namespaces.
>
> As I did learn, you can do it in BaseX by including an *: in your 
> path, e.g.,
>
> db:open("myrimatchSubset","20160312_17_B3_myrimatch_2_2_140.xml")/*:Mz
> IdentML/*:SequenceCollection/*:Peptide
>
>
> But I would like to be more specific than *. I did search the schema for the
> XML, and all I found related to the search term namespace was not a brief
> name, but:
> targetNamespace="http://psidev.info/psi/pi/mzIdentML/1.1"
>
>  I also found the following entry on Stack Overflow:
> http://stackoverflow.com/questions/5239685/xml-namespace-breaking-my-xpath
>
> And if I interpret the answer there correctly, one can somehow register
> their own prefix for a namespace?
> Is this correct? So I am wondering if I could do the same somehow in BaseX?
>
> Alternatively, I am wondering if I could tell BaseX to disable namespace 
> checking?
>
> Thank you a lot
>
> --
> Witold Eryk Wolski



--
Witold Eryk Wolski


Re: [basex-talk] querying xml with namespaces.

2017-03-01 Thread Fabrice ETANCHAUD
Hi, from what I remember,

You can get rid of namespace wildcards if you just declare a default element 
namespace for your query :

declare default element namespace 'http://psidev.info/psi/pi/mzIdentML/1.1';

db:open("myrimatchSubset","20160312_17_B3_myrimatch_2_2_140.xml")/MzIdentML/SequenceCollection/Peptide

Another way is to declare explicitly your namespace (here named mz but you 
could choose any other name) :

declare namespace mz = 'http://psidev.info/psi/pi/mzIdentML/1.1';

db:open("myrimatchSubset","20160312_17_B3_myrimatch_2_2_140.xml")/mz:MzIdentML/mz:SequenceCollection/mz:Peptide

Best regards,
Fabrice Etanchaud

-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Witold E
Wolski
Sent: Wednesday, 1 March 2017 10:18
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] querying xml with namespaces.

Apologies if I am hard to understand but I am new to the XML querying.
I would like to query an XML with namespaces.

As I did learn, you can do it in BaseX by including an *: in your path, e.g.,

db:open("myrimatchSubset","20160312_17_B3_myrimatch_2_2_140.xml")/*:MzIdentML/*:SequenceCollection/*:Peptide


But I would like to be more specific than *. I did search the schema for the
XML, and all I found related to the search term namespace was not a brief
name, but:
targetNamespace="http://psidev.info/psi/pi/mzIdentML/1.1"

 I also found the following entry on Stack Overflow:
http://stackoverflow.com/questions/5239685/xml-namespace-breaking-my-xpath

And if I interpret the answer there correctly, one can somehow register their
own prefix for a namespace?
Is this correct? So I am wondering if I could do the same somehow in BaseX?

Alternatively, I am wondering if I could tell BaseX to disable namespace 
checking?

Thank you a lot

--
Witold Eryk Wolski


Re: [basex-talk] DFDL module and XSLT module

2017-02-13 Thread Fabrice ETANCHAUD
Dear Kristian,

Did you have a look here ?

https://github.com/BaseXdb/basex/tree/master/basex-core/src/main/java/org/basex/query/func/xslt


Best regards,
Fabrice Etanchaud
Groupe FBO


-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Kristian
Kankainen
Sent: Monday, 13 February 2017 11:30
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] DFDL module and XSLT module

Hello all.

I am making a simple wrapper java module for Daffodil so that I can run DFDL 
parsing and unparsing from XQuery. I got communication working between BaseX 
and my module already, but I can only send and return Strings.

Because the module is so similar to the XSLT module of BaseX, I wonder if the 
java source code is available. I can't find it in the GitHub repository.

Thanks
Kristian K


Re: [basex-talk] RestXQ returning JSON

2017-01-23 Thread Fabrice ETANCHAUD
Hi Marco,

What about using a root element like :


...
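The element itself was stripped by the archive, but for illustration, a wrapping root element in BaseX's direct JSON serialization format could look like the following sketch (the element names and payload are purely illustrative, not Marco's actual code):

```xquery
declare
  %rest:path("/f")
  %rest:GET
  %output:method("json")
function a:f() {
  (: one single root element wraps the whole payload, so the JSON
     serializer receives exactly one item :)
  <json type="array">
    <_ type="object"><id>1</id></_>
    <_ type="object"><id>2</id></_>
  </json>
};
```

With %output:method("json") and the default direct format, a single root like this avoids the "only one item can be serialized" error.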


Best regards,
Fabrice ETANCHAUD


-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Marco Lettere
Sent: Monday, 23 January 2017 09:32
To: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: [basex-talk] RestXQ returning JSON

Hello all,

this morning just stumbled into this issue and I'm wondering whether it could 
be somehow smoothed up ...

I have the following RestXQ function:

declare
   %rest:path("/f")
   %rest:GET
   %output:method("json")
function a:f() {
   (:code ...:)
};

1) Replacing comment with the following code correctly returns an empty
object:

   let $out := ()
   return $out

2) Replacing comment with the following code correctly returns an expected 
exception ("[SERE0023] Only one item can be serialized with JSON."):

let $out := (, )
   return $out

3) Replacing comment with the following code returns a somewhat less expected 
null-pointer exception:

let $out := (, ) return 
array{$out}

4) In order to get the expected Json array I need to write the code like the 
following:

  let $out := (, )
  return array{ $out ! json:parse(json:serialize(.), map{ "format" : 
"map"}) }

I'm not sure whether the array could be inferred from the sequence directly in 
a pattern like 2), but I would expect that 3) should work.
Shouldn't it?

Thanks for any explanation.

Regards,

Marco.



Re: [basex-talk] Performance and heavy load

2015-07-28 Thread Fabrice Etanchaud
Another idea:
If you never replace a file,
you may get better performance by setting up a REST-XQ function that simply
calls db:add.
The documentation explicitly mentions that REST PUT tests for the existence
of the file, which is time-consuming.
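A minimal sketch of such an endpoint, assuming a RESTXQ library module (the namespace URI, prefix, path, and function name are all made up for illustration):

```xquery
(: illustrative RESTXQ library module :)
module namespace page = 'urn:example:add-log';

declare
  %rest:path('/logs/{$db}/{$name}')
  %rest:PUT('{$doc}')
  %updating
function page:add-log(
  $db   as xs:string,
  $name as xs:string,
  $doc  as document-node()
) {
  (: db:add stores the document without checking for an existing resource,
     unlike the REST PUT service :)
  db:add($db, $doc, $name)
};
```

Each PUT then appends a new resource unconditionally, which is cheaper when files are never replaced.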

Best regards,
Fabrice

-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Fabrice
Etanchaud
Sent: Tuesday, 28 July 2015 11:36
To: Maximilian Gärber; Martín Ferrari
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Performance and heavy load

Dear Martin,

Which version are you using ?

With 8.2.3,
I can put 10 000 simple xml files via the REST interface in 120 secs (with 10 
parallel requests), without any error message.

Maybe PARALLEL=1 could help you.

Are you sure your database is not meanwhile being opened directly by another
process, rather than exclusively via the server?

Best regards,
Fabrice
 

-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Maximilian
Gärber
Sent: Tuesday, 28 July 2015 09:34
To: Martín Ferrari
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Performance and heavy load

Hi Martin,

how do you spread the log files? All into one db or do you create new dbs?

If you keep on adding all files to the same database, the add times will slow 
down over time. Please keep in mind that you can query multiple databases at 
once, so I would rather have more databases.

With 8.3 setting http://docs.basex.org/wiki/Options#CACHERESTXQ should help.

Finally, for storing very large number of log files I'd consider using a Job 
Queue for throttling or switching to append-only capable data stores like 
couchDB or redis.

Regards,

Max





2015-07-28 3:34 GMT+02:00 Martín Ferrari ferrari_mar...@hotmail.com:
 Hi guys,
 I'm quite new to BaseX. I've read a bit already, but perhaps you 
 can help so I can investigate further. We are having a performance 
 problem with our BaseX server. We're running it on a VM, and hitting 
 it from around 5 web servers.

 Under no stress, I get this timing from the log for a 1191 bytes file.

 00:01:23.526ww.aa.yy.xx:56312 admin   REQUEST [PUT]
 http://basex.xx:8984/rest/PaymentLogs_1/WRP.BR-4273791-1_PaymentGateway_Response_20150728000116.xml
 00:01:24.967ww.aa.yy.xx:56312 admin   201 1 resource(s) replaced in
 1401.17 ms.   1441.24 ms

 A call to /rest takes about 4-5 ms (it's called around once each 2 
 seconds, though it's not needed):

 00:01:23.520ww.aa.yy.zz:56312 admin   REQUEST [GET]
 http://basex.:8984/rest
 00:01:23.524ww.aa.yy.xx:56312 admin   200 4.67 ms


  Is the 1400 ms time normal for storing one xml file less than 2kb 
 (storing a 10kb file took 1200 ms, so I'm not sure size mattered that much)?

  And also, when the load starts to get heavier, from 7 to 12 files 
 per second, BaseX server quickly starts to get slower, then taking 
 minutes to respond, until finally it starts giving errors about the 
 database being currently opened by another process, and too many open 
 files. Many connections remain in the CLOSE_WAIT state, and the server 
 is no longer usable.

 Is it reasonable to expect to [PUT] more than 10 files per second, 
 some of them taking more than 10kb? We're using it for logging, so 
 that's a lot of xml files. If it's reasonable to use it that way, I'll 
 dig more into optimizing it. Is anyone using it in a similar way?

 Thanks,
  Martín.


Re: [basex-talk] Performance and heavy load

2015-07-28 Thread Fabrice Etanchaud
Dear Martin,

Which version are you using ?

With 8.2.3,
I can put 10 000 simple xml files via the REST interface in 120 secs (with 10 
parallel requests),
without any error message.

Maybe PARALLEL=1 could help you.

Are you sure your database is not meanwhile being opened directly by another
process, rather than exclusively via the server?

Best regards,
Fabrice
 

-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Maximilian
Gärber
Sent: Tuesday, 28 July 2015 09:34
To: Martín Ferrari
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Performance and heavy load

Hi Martin,

how do you spread the log files? All into one db or do you create new dbs?

If you keep on adding all files to the same database, the add times will slow 
down over time. Please keep in mind that you can query multiple databases at 
once, so I would rather have more databases.

With 8.3 setting http://docs.basex.org/wiki/Options#CACHERESTXQ should help.

Finally, for storing very large number of log files I'd consider using a Job 
Queue for throttling or switching to append-only capable data stores like 
couchDB or redis.

Regards,

Max





2015-07-28 3:34 GMT+02:00 Martín Ferrari ferrari_mar...@hotmail.com:
 Hi guys,
 I'm quite new to BaseX. I've read a bit already, but perhaps you 
 can help so I can investigate further. We are having a performance 
 problem with our BaseX server. We're running it on a VM, and hitting 
 it from around 5 web servers.

 Under no stress, I get this timing from the log for a 1191 bytes file.

 00:01:23.526ww.aa.yy.xx:56312 admin   REQUEST [PUT]
 http://basex.xx:8984/rest/PaymentLogs_1/WRP.BR-4273791-1_PaymentGateway_Response_20150728000116.xml
 00:01:24.967ww.aa.yy.xx:56312 admin   201 1 resource(s) replaced in
 1401.17 ms.   1441.24 ms

 A call to /rest takes about 4-5 ms (it's called around once each 2 
 seconds, though it's not needed):

 00:01:23.520ww.aa.yy.zz:56312 admin   REQUEST [GET]
 http://basex.:8984/rest
 00:01:23.524ww.aa.yy.xx:56312 admin   200 4.67 ms


  Is the 1400 ms time normal for storing one xml file less than 2kb 
 (storing a 10kb file took 1200 ms, so I'm not sure size mattered that much)?

  And also, when the load starts to get heavier, from 7 to 12 files 
 per second, BaseX server quickly starts to get slower, then taking 
 minutes to respond, until finally it starts giving errors about the 
 database being currently opened by another process, and too many open 
 files. Many connections remain in the CLOSE_WAIT state, and the server 
 is no longer usable.

 Is it reasonable to expect to [PUT] more than 10 files per second, 
 some of them taking more than 10kb? We're using it for logging, so 
 that's a lot of xml files. If it's reasonable to use it that way, I'll 
 dig more into optimizing it. Is anyone using it in a similar way?

 Thanks,
  Martín.


Re: [basex-talk] Optimization of a slow query with `//`

2015-06-12 Thread Fabrice Etanchaud
Gioele, did you check in the execution plan that your query does use an index?

One way to force the use of the text index could be to start your query with:
db:text('your-collection-name', 'arci')/parent::tei:orth, and so on.
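Spelled out against Gioele's query, a sketch of that index-first rewrite (the collection name is a placeholder, and the xml:lang predicate is moved up to the entry level; untested):

```xquery
declare namespace tei = 'http://www.tei-c.org/ns/1.0';

(: start from the text-index entries for 'arci' and walk back up to
   the enclosing tei:entry or tei:re elements :)
db:text('your-collection-name', 'arci')
  /parent::tei:orth
  /parent::tei:form
  /parent::*[self::tei:entry or self::tei:re]
  [ancestor-or-self::*[@xml:lang][1][starts-with(@xml:lang, 'san')]]
```

Starting from db:text guarantees the text index is used, instead of relying on the optimizer to rewrite the descendant step.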

Regards,

-Original Message-
From: Fabrice Etanchaud
Sent: Friday, 12 June 2015 11:13
To: basex-talk@mailman.uni-konstanz.de
Subject: RE: [basex-talk] Optimization of a slow query with `//`

Hello Gioele,

I recall that the use of namespaces was slowing down (or maybe 
invalidating) the structure index.
Someone @BaseX will certainly correct me if I am wrong, but if your data uses
a single namespace, what about reloading the data with the skip-namespaces
option enabled and testing whether performance improves?

Another solution could be to create an index collection, where the keys would
be your search terms and the values the node pre or node-id values of your
(sub-)documents.
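As a sketch of that second approach (database, collection, and element names are illustrative; db:node-id yields persistent identifiers that db:open-id can resolve later):

```xquery
(: build a lookup collection mapping each search term to the node id
   of the element that contains it :)
let $terms :=
  for $orth in db:open('my-db')//*:orth
  return <term key="{$orth}" id="{db:node-id($orth)}"/>
return db:create(
  'my-index',
  document { <index>{$terms}</index> },
  'index.xml'
)
```

Queries can then look up a term in 'my-index' with an index-backed comparison and jump straight to the matching nodes via db:open-id.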

Best regards,
Fabrice


-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Gioele
Barabucci
Sent: Friday, 12 June 2015 10:42
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Optimization of a slow query with `//`

Hello,

I am working on an application that retrieves its data from a TEI XML file via 
BaseX. The following query lies at the core of this application but is too slow 
to be used in production: on a modern PC it requires about 600 ms to run over a 
4MB file (1/10 of the complete dataset). Any suggestion on how to improve its 
performance (without changing the underlying TEI files) would be much 
appreciated.

Here is the query:

 declare namespace tei='http://www.tei-c.org/ns/1.0';

 /tei:TEI/tei:text/tei:body//
   *[self::tei:entry or self::tei:re]
   [./tei:form/tei:orth[. = "arci"]
 [ancestor-or-self::*
   [@xml:lang][1]
   [(starts-with(@xml:lang, "san"))]
 ]
   ]

In human terms it should return all the `tei:entry` or `tei:re` elements that

* have the word "arci" in their `/tei:form/tei:orth` element,
* their nearest `xml:lang` attribute starts with 'san'.

I made some tests and it turned out that the main culprit is the use of `//` in 
the first line. (_Main_ culprit, not the only one...)

I use the `//` axis because I do not know what is the structure of the 
underlying TEI file. I expect BaseX to keep track of all the `tei:entry` and 
`tei:re` elements and their parents, so selecting the correct ones should be 
quite fast anyway. But the measurements disagree with my assumptions...

What could I do to improve the performance of this query?


Now, some remarks based on some small tests I have done:

1. Removing the

 [ancestor-or-self::*[]]

predicate slashes the run time in half, but the query is still way too slow.

2. Changing

 ./tei:form/tei:orth[. = "arci"]

to

 ./tei:form[1]/tei:orth[1][. = "arci"]

makes the query even slower.

3. changing `starts-with(@xml:lang, "san")` to `@xml:lang = 'san-xxx'` has a 
negligible effect.

4. Dropping the `[1]` from

 [@xml:lang][1]

makes the whole query twice as fast.

Regards,

--
Gioele Barabucci gio...@svario.it



Re: [basex-talk] Optimization of a slow query with `//`

2015-06-12 Thread Fabrice Etanchaud
Hello Gioele,

I recall that the use of namespaces was slowing down (or maybe 
invalidating) the structure index.
Someone @BaseX will certainly correct me if I am wrong,
but if your data uses a single namespace, what about reloading the data with
the skip-namespaces option enabled and testing whether performance improves?

Another solution could be to create an index collection, where the keys would
be your search terms and the values the node pre or node-id values of your
(sub-)documents.

Best regards,
Fabrice


-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Gioele
Barabucci
Sent: Friday, 12 June 2015 10:42
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Optimization of a slow query with `//`

Hello,

I am working on an application that retrieves its data from a TEI XML file via 
BaseX. The following query lies at the core of this application but is too slow 
to be used in production: on a modern PC it requires about 600 ms to run over a 
4MB file (1/10 of the complete dataset). Any suggestion on how to improve its 
performance (without changing the underlying TEI files) would be much 
appreciated.

Here is the query:

 declare namespace tei='http://www.tei-c.org/ns/1.0';

 /tei:TEI/tei:text/tei:body//
   *[self::tei:entry or self::tei:re]
   [./tei:form/tei:orth[. = "arci"]
 [ancestor-or-self::*
   [@xml:lang][1]
   [(starts-with(@xml:lang, "san"))]
 ]
   ]

In human terms it should return all the `tei:entry` or `tei:re` elements that

* have the word "arci" in their `/tei:form/tei:orth` element,
* their nearest `xml:lang` attribute starts with 'san'.

I made some tests and it turned out that the main culprit is the use of `//` in 
the first line. (_Main_ culprit, not the only one...)

I use the `//` axis because I do not know what is the structure of the 
underlying TEI file. I expect BaseX to keep track of all the `tei:entry` and 
`tei:re` elements and their parents, so selecting the correct ones should be 
quite fast anyway. But the measurements disagree with my assumptions...

What could I do to improve the performance of this query?


Now, some remarks based on some small tests I have done:

1. Removing the

 [ancestor-or-self::*[]]

predicate slashes the run time in half, but the query is still way too slow.

2. Changing

 ./tei:form/tei:orth[. = "arci"]

to

 ./tei:form[1]/tei:orth[1][. = "arci"]

makes the query even slower.

3. changing `starts-with(@xml:lang, "san")` to `@xml:lang = 'san-xxx'` has a 
negligible effect.

4. Dropping the `[1]` from

 [@xml:lang][1]

makes the whole query twice as fast.

Regards,

--
Gioele Barabucci gio...@svario.it



Re: [basex-talk] Pulling files from multiple zips into one DB

2015-05-04 Thread Fabrice Etanchaud
Dear Constantine,

In my experience, commands are always faster than db:* calls.
Maybe someone @basex could confirm that, and that commands do not use the 
Pending Update List ?

Are you sure you disabled ADDRAW?
If there are many raw files alongside the xml files, you may get better
results by extracting and re-archiving only the xml files first.
I have the same problem with patent archives, where each xml file may come with 
many pdf and gif.

Best regards,
Fabrice

From: Hondros, Constantine (ELS-AMS) [mailto:c.hond...@elsevier.com]
Sent: Monday, 4 May 2015 14:01
To: Fabrice Etanchaud
Subject: RE: Pulling files from multiple zips into one DB

Is that going to be any faster do you think? I tried it and it took a long 
time to read through the zips, so I am hoping there might be a faster more 
direct way of doing it.

From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 04 May 2015 13:56
To: Hondros, Constantine (ELS-AMS)
Subject: RE: Pulling files from multiple zips into one DB

Hello Constantine,

Why don't you simply create a new collection with ADDARCHIVES=true ?

Best regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Hondros,
Constantine (ELS-AMS)
Sent: Monday, 4 May 2015 13:50
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Pulling files from multiple zips into one DB

Hello all,
I need to merge any XML files located in 500 GB of zips into a single DB for 
further analysis. Is there any faster or more efficient way to do it in BaseX 
than this? TIA.

for $zip in file:list($src, false(), '*.zip')
  let $arch := file:read-binary(concat($src, '\', $zip))
  for $a in archive:entries($arch)[ends-with(., 'xml')]
  return db:add('my_db', archive:extract-text($arch, $a), $a)


TIA,
Constantine




Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The 
Netherlands, Registration No. 33156677, Registered in The Netherlands.





Re: [basex-talk] Pulling files from multiple zips into one DB

2015-05-04 Thread Fabrice Etanchaud
If your archives contain a mix of raw and xml files,
have a look at the old ZIP module, which may avoid reading the entire archive.

Best regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Fabrice
Etanchaud
Sent: Monday, 4 May 2015 14:12
To: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Pulling files from multiple zips into one DB

Dear Constantine,

In my experience, commands are always faster than db:* calls.
Maybe someone @basex could confirm that, and that commands do not use the 
Pending Update List ?

Are you sure you disabled ADDRAW?
If there are many raw files alongside the xml files, you may get better
results by extracting and re-archiving only the xml files first.
I have the same problem with patent archives, where each xml file may come with 
many pdf and gif.

Best regards,
Fabrice

From: Hondros, Constantine (ELS-AMS) [mailto:c.hond...@elsevier.com]
Sent: Monday, 4 May 2015 14:01
To: Fabrice Etanchaud
Subject: RE: Pulling files from multiple zips into one DB

Is that going to be any faster do you think? I tried it and it took a long 
time to read through the zips, so I am hoping there might be a faster more 
direct way of doing it.

From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 04 May 2015 13:56
To: Hondros, Constantine (ELS-AMS)
Subject: RE: Pulling files from multiple zips into one DB

Hello Constantine,

Why don't you simply create a new collection with ADDARCHIVES=true ?

Best regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Hondros,
Constantine (ELS-AMS)
Sent: Monday, 4 May 2015 13:50
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Pulling files from multiple zips into one DB

Hello all,
I need to merge any XML files located in 500 GB of zips into a single DB for 
further analysis. Is there any faster or more efficient way to do it in BaseX 
than this? TIA.

for $zip in file:list($src, false(), '*.zip')
  let $arch := file:read-binary(concat($src, '\', $zip))
  for $a in archive:entries($arch)[ends-with(., 'xml')]
  return db:add('my_db', archive:extract-text($arch, $a), $a)


TIA,
Constantine




Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The 
Netherlands, Registration No. 33156677, Registered in The Netherlands.





Re: [basex-talk] Pulling files from multiple zips into one DB

2015-05-04 Thread Fabrice Etanchaud
Constantine,

I guess it's because commands do not have to maintain a pending update list,
and can insert data directly into the collection.

A mixed approach could be to use your *.zip iteration to build a command file
that runs, one archive at a time, the piece of XQuery adding a single archive,
in order to shorten the PUL.
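For illustration, a generated BaseX command script along those lines could look like this (the database and archive names are hypothetical):

```
SET ADDARCHIVES true
SET ADDRAW false
OPEN my_db
ADD /data/zips/batch-0001.zip
ADD /data/zips/batch-0002.zip
```

Running one ADD per archive keeps each transaction, and hence the pending update list, small.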

And you are speaking of more than 500 Gb of data !

Best regards,
Fabrice

From: Hondros, Constantine (ELS-AMS) [mailto:c.hond...@elsevier.com]
Sent: Monday, 4 May 2015 14:36
To: Fabrice Etanchaud; basex-talk@mailman.uni-konstanz.de
Subject: RE: Pulling files from multiple zips into one DB

Hi Fabrice,

Indeed my archives contain massive amounts of PDF. However I did a quick 
benchmark and the GUI, using standard options (parse archives, don't add raw 
files) is over 10 (!) times faster to create a DB than my code sample below. 
Not sure why that would be the case.

C.

From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 04 May 2015 14:19
To: Hondros, Constantine (ELS-AMS); 
basex-talk@mailman.uni-konstanz.de
Subject: RE: Pulling files from multiple zips into one DB

If your archives contain a mix of raw and xml files,
have a look at the old ZIP module, which may avoid reading the entire archive.

Best regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Fabrice
Etanchaud
Sent: Monday, 4 May 2015 14:12
To: Hondros, Constantine (ELS-AMS);
basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Pulling files from multiple zips into one DB

Dear Constantine,

In my experience, commands are always faster than db:* calls.
Maybe someone @basex could confirm that, and that commands do not use the 
Pending Update List ?

Are you sure you disabled ADDRAW?
If there are many raw files alongside the xml files, you may get better
results by extracting and re-archiving only the xml files first.
I have the same problem with patent archives, where each xml file may come with 
many pdf and gif.

Best regards,
Fabrice

From: Hondros, Constantine (ELS-AMS) [mailto:c.hond...@elsevier.com]
Sent: Monday, 4 May 2015 14:01
To: Fabrice Etanchaud
Subject: RE: Pulling files from multiple zips into one DB

Is that going to be any faster do you think? I tried it and it took a long 
time to read through the zips, so I am hoping there might be a faster more 
direct way of doing it.

From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 04 May 2015 13:56
To: Hondros, Constantine (ELS-AMS)
Subject: RE: Pulling files from multiple zips into one DB

Hello Constantine,

Why don't you simply create a new collection with ADDARCHIVES=true ?

Best regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Hondros,
Constantine (ELS-AMS)
Sent: Monday, 4 May 2015 13:50
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Pulling files from multiple zips into one DB

Hello all,
I need to merge any XML files located in 500 GB of zips into a single DB for 
further analysis. Is there any faster or more efficient way to do it in BaseX 
than this? TIA.

for $zip in file:list($src, false(), '*.zip')
  let $arch := file:read-binary(concat($src, '\', $zip))
  for $a in archive:entries($arch)[ends-with(., 'xml')]
  return db:add('my_db', archive:extract-text($arch, $a), $a)


TIA,
Constantine




Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The 
Netherlands, Registration No. 33156677, Registered in The Netherlands.








Re: [basex-talk] multi-language full-text indexing

2015-04-22 Thread Fabrice Etanchaud
Dear Goetz,

I have the same requirement (patent documents containing text in different 
languages).
I ended up splitting/filtering each original document into localized parts,
inserted into different collections (each collection having its own full-text
index configuration).
BaseX is as flexible as our data!
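A rough sketch of that splitting step (the collection names and the xml:lang-based dispatch are illustrative, not the actual production code):

```xquery
(: copy the parts of each document written in a given language into a
   per-language collection, keeping the original path for later joins :)
for $doc in db:open('patents')
let $path := db:path($doc)
for $lang in distinct-values($doc//*[@xml:lang]/@xml:lang)
return db:add(
  'patents-' || $lang,
  document { <part lang="{$lang}">{ $doc//*[@xml:lang = $lang] }</part> },
  $path
)
```

Each 'patents-xx' collection can then be created with the stemming, stopword, and language options appropriate for that language.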

Best regards,


From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Goetz Heller
Sent: Wednesday, 22 April 2015 10:50
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] multi-language full-text indexing

I'm working with documents destined to be consumed anywhere in the European 
Community. Many of them have the same tags multiple times but with a different 
language attribute. It therefore does not make sense to create a full-text
index for the whole of these documents. It is desirable to have documents
indexed by locale-specific parts, e.g.

CREATE FULL-TEXT INDEX ON DATABASE XY STARTING WITH (
(path_a)/LOCALIZED_PART_A[@LANG=$lang],
(path_b)/LOCALIZED_PART_B[@LG=$lang],...
) FOR LANGUAGE $lang IN (
BG,
DN,
DE WITH STOPWORDS filepath_de WITH STEM = YES,
EN WITH STOPWORDS filepath_en,
FR, ...
)  [USING language_code_map]
and then to write full-text retrieval queries with a clause such as 'FOR 
LANGUAGE BG', for example. The index parts would be much smaller and full-text 
retrieval therefore much faster. The language codes would be mapped somehow to 
standard values recognized by BaseX in the language_code_map file.
Are there any efforts towards such a feature?


Re: [basex-talk] multi-language full-text indexing

2015-04-22 Thread Fabrice Etanchaud
Great, Goetz !

A last thing :
If you need to rebuild the original document from parts, be sure to have a way 
to retrieve them all (by document path, attribute index, or separate index 
collection with node-id/pre values).

If disk space is not an issue, you could store the original document as it is, 
and create localized collection for full text indexing purposes.

Hoping it helps,

Best regards,
Fabrice
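A sketch of the dispatching step, in case it is useful to others (the source database 'docs', the LANG attribute, and the per-language database names are illustrative assumptions, not the actual schema):

```xquery
(: Copy each document's language-specific parts into one database per
   language, e.g. 'docs-de', 'docs-en', keeping the original path. :)
for $doc in db:open('docs')
for $lang in distinct-values($doc//*/@LANG)
return db:add(
  'docs-' || lower-case($lang),
  document { <part lang="{ $lang }">{ $doc//*[@LANG = $lang] }</part> },
  db:path($doc) || '.' || $lang
)
```

Each target database can then carry its own language-specific full-text index options.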

De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Goetz Heller
Envoyé : mercredi 22 avril 2015 11:20
À : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] multi-language full-text indexing

Fabrice,
For the time being, this sounds quite nice. I'd have to split up the files into a common part and a set of satellites, one satellite for each language present 
in the document.

Thanks!

Kind regards,

Goetz

Von: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Gesendet: Mittwoch, 22. April 2015 11:04
An: Goetz Heller; basex-talk@mailman.uni-konstanz.de
Betreff: RE: [basex-talk] multi-language full-text indexing

Dear Goetz,

I have the same requirement (patent documents containing text in different 
languages).
I ended up splitting/filtering each original document in localized parts 
inserted in different collections (each collection having its own full text 
index configuration).
BaseX is as flexible as our data !

Best regards,


De : basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Goetz Heller
Envoyé : mercredi 22 avril 2015 10:50
À : basex-talk@mailman.uni-konstanz.de
Objet : [basex-talk] multi-language full-text indexing

I'm working with documents destined to be consumed anywhere in the European 
Community. Many of them have the same tags multiple times but with a different 
language attribute. It does not make sense to create a full-text index for the 
whole of these documents therefore. It is desirable to have documents indexed 
by locale-specific parts, e.g.

CREATE FULL-TEXT INDEX ON DATABASE XY STARTING WITH (
(path_a)/LOCALIZED_PART_A[@LANG=$lang],
(path_b)/LOCALIZED_PART_B[@LG=$lang],...
) FOR LANGUAGE $lang IN (
BG,
DN,
DE WITH STOPWORDS filepath_de WITH STEM = YES,
EN WITH STOPWORDS filepath_en,
FR, ...
)  [USING language_code_map]
and then to write full-text retrieval queries with a clause such as 'FOR 
LANGUAGE BG', for example. The index parts would be much smaller and full-text 
retrieval therefore much faster. The language codes would be mapped somehow to 
standard values recognized by BaseX in the language_code_map file.
Are there any efforts towards such a feature?


Re: [basex-talk] Path Summary order

2015-04-02 Thread Fabrice Etanchaud
Hi Cecil,

Maybe the following code could help you :

declare function local:ordered-facets($facets) {
  element { $facets/name() } {
$facets/@name,
for $node in $facets/(attribute|element)
order by $node/name(), $node/@name
return
  local:ordered-facets($node)
  }
};

local:ordered-facets(index:facets('my collection')/document-node)


But be careful because in XML, elements' order is meaningful.

Everybody would be lost without basex, not only newbies !

Best regards,
Fabrice

De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Cecil Tarasoff
Envoyé : jeudi 2 avril 2015 07:17
À : basex-talk@mailman.uni-konstanz.de
Objet : [basex-talk] Path Summary order

Hi BaseX Gurus,

Thanks for such a powerful reporting tool.   I am still a newbie to XML and I 
would be lost without BaseX.  Thank-you!

I was wondering whether there is any way to have the path summary output its 
result in alphabetical sequence of attributes and then elements?  I export the 
path summary from different versions of the XML and use BeyondCompare to reveal 
the differences but the inconsistent ordering of the elements and attributes 
makes this a bit of a challenge.

It would be really handy for me if there were an option available where I could 
elect to output the Path Summary with the attributes sorted alphabetically 
followed by the elements also in alphabetical order (within their parent).   If 
you could consider this for a future enhancement I would be forever grateful.

Of course, if there is an easier way for me to compare XML structure, I would 
love to know the tricks!

Thanks again for your dedication to this os product.

Cecil Tarasoff


Re: [basex-talk] How to associate catalog.xml with validate function

2015-01-26 Thread Fabrice Etanchaud
Hello Marc,

Did you already take a look at the CATFILE property ?

http://docs.basex.org/wiki/Catalog_Resolver

Best regards,
Fabrice

-Message d'origine-
De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Marc van 
Grootel
Envoyé : lundi 26 janvier 2015 13:56
À : BaseX
Objet : Re: [basex-talk] How to associate catalog.xml with validate function

Sorry, for the no subject, lost it somewhere 


On Mon, Jan 26, 2015 at 1:53 PM, Marc van Grootel marc.van.groo...@gmail.com 
wrote:
 Hi,

 I'm trying to run validations on XML files that use DTD doctype 
 declarations (DITA), I have a catalog.xml that contains the correct 
 references for the entity resolver. DITA schemas refer to other 
 entities but I have no control over where it loads them from so I get 
 Failed to read schema document 
 'urn:oasis:names:tc:dita:xsd:commonElementGrp.xsd:1.2', because 1) 
 could not find the document; 

 btw I also have Saxon present.

 I have no clue what's the easiest way of using this catalog from 
 XQuery code. In Ant I always used xmlcatalog element as part of the 
 xmlvalidate task.

 Or, maybe I could set some system property that forces the use of a 
 specific catalog (I hope to avoid using one fixed catalog though).

 Ideas?


 --Marc



--
--Marc


Re: [basex-talk] Pruned, optimized DB shows same document count, db size

2015-01-12 Thread Fabrice Etanchaud
Dear Constantine,

From what I know, optimize only rebuilds indexes and metadata.
Did you try 'optimize all' to compact the database files?

But documents' number should be updated.
How do you get metadata information ?

Best regards,

Fabrice Etanchaud
Questel/Orbit

http://docs.basex.org/wiki/Database_Module#db:optimize
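The 'optimize all' variant is also available from XQuery (a sketch; 'my_db' is a placeholder): passing true() as the second argument rebuilds the database files themselves, reclaiming the space of deleted nodes.

```xquery
(: Equivalent of the 'OPTIMIZE ALL' command for database 'my_db'. :)
db:optimize('my_db', true())
```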


De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Hondros, 
Constantine (ELS-AMS)
Envoyé : lundi 12 janvier 2015 14:29
À : basex-talk@mailman.uni-konstanz.de
Objet : [basex-talk] Pruned, optimized DB shows same document count, db size

Hello all,

I've pruned about half the documents of a +-5GB database using XQuery delete. I 
then optimised the database using db:optimize.

The database metadata still shows the original number of documents, and the 
overall db filesize remains roughly the same.

I was sort of hoping this pruning would improve performance, but if there's any 
difference it's negligible. Am I missing something obvious?

Thanks in advance,

Constantine Hondros







Re: [basex-talk] Pruned, optimized DB shows same document count, db size

2015-01-12 Thread Fabrice Etanchaud
Constantine,

It seems Questel and Elsevier are competitors...
But using the same great piece of software !

Best regards,
Fabrice


De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Hondros, 
Constantine (ELS-AMS)
Envoyé : lundi 12 janvier 2015 17:10
À : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] Pruned, optimized DB shows same document count, db size

Hi Fabrice,

I used the Xquery delete node command, rather than the (document-oriented) 
db:delete.

I guess that explains the difference.

C.

From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 12 January 2015 17:01
To: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz.de
Subject: RE: [basex-talk] Pruned, optimized DB shows same document count, db 
size

Constantine,

Did you mean db :delete or delete nodes ?

I'm using 8.0 769b53b and after db:delete, even without optimize, I get correct 
documents' count in the GUI window.

Best,
Fabrice
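To make the difference concrete (a sketch with placeholder database and path names):

```xquery
(: db:delete drops whole resources, so the document count decreases: :)
db:delete('my_db', 'path/to/old-docs/'),

(: delete nodes removes matching nodes inside documents; the documents
   themselves remain, so the document count stays the same: :)
delete nodes db:open('my_db')//obsolete
```

After 'delete nodes', a full optimize is what actually reclaims the space on disk.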

De : basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Hondros, Constantine (ELS-AMS)
Envoyé : lundi 12 janvier 2015 16:32
À : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] Pruned, optimized DB shows same document count, db size

Aha, of course, the all boolean. I did not try that, and will.

DB metadata I get via the GUI menu : Database - Open and Manage

Cheers,
C.

From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 12 January 2015 16:14
To: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz.de
Subject: RE: Pruned, optimized DB shows same document count, db size

Dear Constantine,

From what I know, optimize only rebuilt indexes and metadata.
Did you try optimize all to compact db files ?

But documents' number should be updated.
How do you get metadata information ?

Best regards,

Fabrice Etanchaud
Questel/Orbit

http://docs.basex.org/wiki/Database_Module#db:optimize


De : basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Hondros, Constantine (ELS-AMS)
Envoyé : lundi 12 janvier 2015 14:29
À : basex-talk@mailman.uni-konstanz.de
Objet : [basex-talk] Pruned, optimized DB shows same document count, db size

Hello all,

I've pruned about half the documents of a +-5GB database using XQuery delete. I 
then optimised the database using db:optimize.

The database metadata still shows the original number of documents, and the 
overall db filesize remains roughly the same.

I was sort of hoping this pruning would improve performance, but if there's any 
difference it's negligible. Am I missing something obvious?

Thanks in advance,

Constantine Hondros







Re: [basex-talk] Pruned, optimized DB shows same document count, db size

2015-01-12 Thread Fabrice Etanchaud
Constantine,

Did you mean db :delete or delete nodes ?

I'm using 8.0 769b53b and after db:delete, even without optimize, I get correct 
documents' count in the GUI window.

Best,
Fabrice

De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Hondros, 
Constantine (ELS-AMS)
Envoyé : lundi 12 janvier 2015 16:32
À : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] Pruned, optimized DB shows same document count, db size

Aha, of course, the all boolean. I did not try that, and will.

DB metadata I get via the GUI menu : Database - Open and Manage

Cheers,
C.

From: Fabrice Etanchaud [mailto:fetanch...@questel.com]
Sent: 12 January 2015 16:14
To: Hondros, Constantine (ELS-AMS); basex-talk@mailman.uni-konstanz.de
Subject: RE: Pruned, optimized DB shows same document count, db size

Dear Constantine,

From what I know, optimize only rebuilt indexes and metadata.
Did you try optimize all to compact db files ?

But documents' number should be updated.
How do you get metadata information ?

Best regards,

Fabrice Etanchaud
Questel/Orbit

http://docs.basex.org/wiki/Database_Module#db:optimize


De : basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Hondros, Constantine (ELS-AMS)
Envoyé : lundi 12 janvier 2015 14:29
À : basex-talk@mailman.uni-konstanz.de
Objet : [basex-talk] Pruned, optimized DB shows same document count, db size

Hello all,

I've pruned about half the documents of a +-5GB database using XQuery delete. I 
then optimised the database using db:optimize.

The database metadata still shows the original number of documents, and the 
overall db filesize remains roughly the same.

I was sort of hoping this pruning would improve performance, but if there's any 
difference it's negligible. Am I missing something obvious?

Thanks in advance,

Constantine Hondros







Re: [basex-talk] RESTXQ - access to InitParameters of ServletContext and/or ServletConfig

2014-12-16 Thread Fabrice Etanchaud
Dear Christian,
Thank you !

I use RESTXQ to easily transform RDBMS data into XML,
And am looking for a way to retrieve configuration parameters like jdbc url, 
username and password, for example.

How would you initialize persistent variables like a jdbc connection?

Thank you again for your fantastic job.

Bien à vous,
Fabrice


-Message d'origine-
De : Christian Grün [mailto:christian.gr...@gmail.com] 
Envoyé : lundi 15 décembre 2014 20:25
À : Fabrice Etanchaud
Cc : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] RESTXQ - access to InitParameters of ServletContext 
and/or ServletConfig

Hi Fabrice,

 I would like to get InitParameters of the ServletContext or 
 ServletConfig instances,

The servlet init parameters are currently accessed by BaseX itself [1,2], 
but there is no way to retrieve them from RESTXQ.

Well, that's only partially true. One way to reach the ServletContext init 
parrameters is to:

• Write a Java class that extends QueryModule [3]
• Access queryContext.http in that class (see e.g. [4])
• Call http.req.getServletContext().getInitParameterNames()

I am not sure, though, if that also works for the ServletConfig parameters 
(maybe you know?).

What do you want to do with that information? What parameters are you 
interested in?
Christian

[1] 
https://github.com/BaseXdb/basex/blob/master/basex-api/src/main/java/org/basex/http/BaseXServlet.java#L38
[2] 
https://github.com/BaseXdb/basex/blob/master/basex-api/src/main/java/org/basex/http/HTTPContext.java#L421
[3] http://docs.basex.org/wiki/Java_Bindings#Context-Awareness
[4] 
https://github.com/BaseXdb/basex/blob/master/basex-api/src/main/java/org/basex/modules/Session.java#L142


Re: [basex-talk] RESTXQ - access to InitParameters of ServletContext and/or ServletConfig

2014-12-16 Thread Fabrice Etanchaud
Maybe it could help other BaseX users :

package questel.basex.modules;

import org.basex.http.HTTPContext;
import org.basex.query.*;

public class Parameter extends QueryModule {

  @Requires(Permission.NONE)
  @Deterministic
  @ContextDependent
  public String get(String key) {
    return ((HTTPContext) queryContext.http).req.getServletContext().getInitParameter(key);
  }
  
}

Don't forget to add the Main-Class: questel.basex.modules.Parameter line in the 
jar's MANIFEST.
And finally install it in the repository : REPO INSTALL 'your jar filepath'
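Once installed, such a class can (as an alternative to the repository route) also be reached via BaseX's direct Java binding; the 'java:' URI scheme maps onto the fully qualified class name. A hypothetical usage, with 'jdbc-url' as a made-up parameter name:

```xquery
(: Direct Java binding onto the Parameter class shown above. :)
import module namespace parameter = 'java:questel.basex.modules.Parameter';
parameter:get('jdbc-url')
```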

Best regards,
Fabrice

-Message d'origine-
De : Christian Grün [mailto:christian.gr...@gmail.com] 
Envoyé : mardi 16 décembre 2014 16:10
À : Fabrice Etanchaud
Cc : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] RESTXQ - access to InitParameters of ServletContext 
and/or ServletConfig

Hi Fabrice,

 How would you initialize persisting variables like a jdbc connection ?

You could simply use the doc(...) function in XQuery and access the document 
from your web application directory. One more way would be to store the data in 
a database instance, but this is probably too much of an overhead in your case.
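For instance (a sketch; the file name 'config.xml' and its layout are made up):

```xquery
(: Read JDBC settings from a small XML file shipped with the web app. :)
let $cfg := doc('config.xml')/config
return sql:connect(
  $cfg/url/string(),
  $cfg/user/string(),
  $cfg/password/string()
)
```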

J'espère avoir pu t'aider,
Christian



 Thank you again for your fantastic job.

 Bien à vous,
 Fabrice


 -Message d'origine-
 De : Christian Grün [mailto:christian.gr...@gmail.com]
 Envoyé : lundi 15 décembre 2014 20:25
 À : Fabrice Etanchaud
 Cc : basex-talk@mailman.uni-konstanz.de
 Objet : Re: [basex-talk] RESTXQ - access to InitParameters of 
 ServletContext and/or ServletConfig

 Hi Fabrice,

 I would like to get InitParameters of the ServletContext or 
 ServletConfig instances,

 The servlet init parameters are currently accessed by the BaseX itself [1,2], 
 but there is no way to retrieve them from RESTXQ.

 Well, that's only partially true. One way to reach the ServletContext init 
 parrameters is to:

 • Write a Java class that extends QueryModule [3] • Access 
 queryContext.http in that class (see e.g. [4]) • Call 
 http.req.getServletContext().getInitParameterNames()

 I am not sure, though, if that also works for the ServletConfig parameters 
 (maybe you know?).

 What do you want to do with that information? What parameters are you 
 interested in?
 Christian

 [1] 
 https://github.com/BaseXdb/basex/blob/master/basex-api/src/main/java/o
 rg/basex/http/BaseXServlet.java#L38
 [2] 
 https://github.com/BaseXdb/basex/blob/master/basex-api/src/main/java/o
 rg/basex/http/HTTPContext.java#L421
 [3] http://docs.basex.org/wiki/Java_Bindings#Context-Awareness
 [4] 
 https://github.com/BaseXdb/basex/blob/master/basex-api/src/main/java/o
 rg/basex/modules/Session.java#L142


Re: [basex-talk] RESTXQ - access to InitParameters of ServletContext and/or ServletConfig

2014-12-16 Thread Fabrice Etanchaud
Merci Christian, I coded my first java module extending QueryModule, and 
installed it in the repository !

Sorry, I can't speak German, you know how French people are :-(

Best regards,
Fabrice

-Message d'origine-
De : Christian Grün [mailto:christian.gr...@gmail.com] 
Envoyé : mardi 16 décembre 2014 16:10
À : Fabrice Etanchaud
Cc : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] RESTXQ - access to InitParameters of ServletContext 
and/or ServletConfig

Hi Fabrice,

 How would you initialize persisting variables like a jdbc connection ?

You could simply use the doc(...) function in XQuery and access the document 
from your web application directory. One more way would be to store the data in 
a database instance, but this is probably too much of an overhead in your case.

J'espère avoir pu t'aider,
Christian



 Thank you again for your fantastic job.

 Bien à vous,
 Fabrice


 -Message d'origine-
 De : Christian Grün [mailto:christian.gr...@gmail.com]
 Envoyé : lundi 15 décembre 2014 20:25
 À : Fabrice Etanchaud
 Cc : basex-talk@mailman.uni-konstanz.de
 Objet : Re: [basex-talk] RESTXQ - access to InitParameters of 
 ServletContext and/or ServletConfig

 Hi Fabrice,

 I would like to get InitParameters of the ServletContext or 
 ServletConfig instances,

 The servlet init parameters are currently accessed by the BaseX itself [1,2], 
 but there is no way to retrieve them from RESTXQ.

 Well, that's only partially true. One way to reach the ServletContext init 
 parrameters is to:

 • Write a Java class that extends QueryModule [3] • Access 
 queryContext.http in that class (see e.g. [4]) • Call 
 http.req.getServletContext().getInitParameterNames()

 I am not sure, though, if that also works for the ServletConfig parameters 
 (maybe you know?).

 What do you want to do with that information? What parameters are you 
 interested in?
 Christian

 [1] 
 https://github.com/BaseXdb/basex/blob/master/basex-api/src/main/java/o
 rg/basex/http/BaseXServlet.java#L38
 [2] 
 https://github.com/BaseXdb/basex/blob/master/basex-api/src/main/java/o
 rg/basex/http/HTTPContext.java#L421
 [3] http://docs.basex.org/wiki/Java_Bindings#Context-Awareness
 [4] 
 https://github.com/BaseXdb/basex/blob/master/basex-api/src/main/java/o
 rg/basex/modules/Session.java#L142


[basex-talk] RESTXQ - access to InitParameters of ServletContext and/or ServletConfig

2014-12-15 Thread Fabrice Etanchaud
Hi all BaseX users ,

Inside a RESTXQ function,
I would like to get InitParameters of the ServletContext or ServletConfig 
instances,
I cannot find that in a module.
Did I miss something ?

Best regards,
Fabrice Etanchaud




Re: [basex-talk] BaseX 8.0: DBA

2014-12-12 Thread Fabrice Etanchaud
Dear Christian,

Do I have to download a module ?
I have the following error :

HTTP ERROR 400

Problem accessing /dba. Reason:

Stopped at C:/Program Files (x86)/BaseX/webapp/dba/modules/html.xqm, 8/66:
[XQST0059] Module 'http://exquery.org/ns/request' not found.


Best regards,
Fabrice

-Message d'origine-
De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Christian Grün
Envoyé : vendredi 12 décembre 2014 00:30
À : BaseX; BaseX
Objet : [basex-talk] BaseX 8.0: DBA

Hi all,

This is yet another update on the next, upcoming version of BaseX:

We have added a new database administration (DBA) web frontend to our ZIP and 
EXE distributions. It allows you to create and administrate databases, evaluate 
queries in realtime, view log files, manage users, and more. It was completely 
written with XQuery and RESTXQ, apart from a few lines of Javascript code.

The new frontend was added due to the known shortcomings of the Java GUI, which 
cannot be used to run queries on other servers. As a consequence, we have 
removed the server dialog from the desktop application. We hope that DBA will 
help you to write your own web applications based on RESTXQ!

Some more infos and design goals can be found in our Wiki [1].

Have fun, we are looking forward to your feedback, Christian

PS: NO, the web frontend does not provide all the cute features of the desktop 
gui, like the treemap, scatterplot, syntax highlighting, etc… But it may 
provide some of them in future. Your contributions are welcome!

[1] http://docs.basex.org/wiki/DBA


Re: [basex-talk] BaseX 8.0: DBA

2014-12-12 Thread Fabrice Etanchaud
Sorry,
The upgrade was not consistent.
Everything is ok now.
Thank you for the web administration console !

Best regards,

-Message d'origine-
De : Fabrice Etanchaud 
Envoyé : vendredi 12 décembre 2014 09:18
À : BaseX
Objet : RE: [basex-talk] BaseX 8.0: DBA

Dear Christian,

Do I have to download a module ?
I have the following error :

HTTP ERROR 400

Problem accessing /dba. Reason:

Stopped at C:/Program Files (x86)/BaseX/webapp/dba/modules/html.xqm, 8/66:
[XQST0059] Module 'http://exquery.org/ns/request' not found.


Best regards,
Fabrice

-Message d'origine-
De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Christian 
Grün Envoyé : vendredi 12 décembre 2014 00:30 À : BaseX; BaseX Objet : 
[basex-talk] BaseX 8.0: DBA

Hi all,

This is yet another update on the next, upcoming version of BaseX:

We have added a new database administration (DBA) web frontend to our ZIP and 
EXE distributions. It allows you to create and administrate databases, evaluate 
queries in realtime, view log files, manage users, and more. It was completely 
written with XQuery and RESTXQ, apart from a few lines of Javascript code.

The new frontend was added due to the known shortcomings of the Java GUI, which 
cannot be used to run queries on other servers. As a consequence, we have 
removed the server dialog from the desktop application. We hope that DBA will 
help you to write your own web applications based on RESTXQ!

Some more infos and design goals can be found in our Wiki [1].

Have fun, we are looking forward to your feedback, Christian

PS: NO, the web frontend does not provide all the cute features of the desktop 
gui, like the treemap, scatterplot, syntax highlighting, etc… But it may 
provide some of them in future. Your contributions are welcome!

[1] http://docs.basex.org/wiki/DBA


Re: [basex-talk] BaseX 8.0: DBA

2014-12-12 Thread Fabrice Etanchaud
And maybe an indication of the currently administered instance?

Best regards,
Fabrice

De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de Andy Bunce
Envoyé : vendredi 12 décembre 2014 11:01
À : Christian Grün
Cc : BaseX; BaseX
Objet : Re: [basex-talk] BaseX 8.0: DBA

Hi Christian,

The DBA looks good.
One comment inspired by a quick play with it. It does not show the name of the 
logged in user.  Indeed I have previously looked to get this information and it 
does not seem to be available via any BaseX function. Is this intentional?

/Andy

On 11 December 2014 at 23:30, Christian Grün christian.gr...@gmail.com wrote:
Hi all,

This is yet another update on the next, upcoming version of BaseX:

We have added a new database administration (DBA) web frontend to our
ZIP and EXE distributions. It allows you to create and administrate
databases, evaluate queries in realtime, view log files, manage users,
and more. It was completely written with XQuery and RESTXQ, apart from
a few lines of Javascript code.

The new frontend was added due to the known shortcomings of the Java
GUI, which cannot be used to run queries on other servers. As a
consequence, we have removed the server dialog from the desktop
application. We hope that DBA will help you to write your own web
applications based on RESTXQ!

Some more infos and design goals can be found in our Wiki [1].

Have fun, we are looking forward to your feedback,
Christian

PS: NO, the web frontend does not provide all the cute features of the
desktop gui, like the treemap, scatterplot, syntax highlighting, etc…
But it may provide some of them in future. Your contributions are
welcome!

[1] http://docs.basex.org/wiki/DBA


Re: [basex-talk] How to use ft:search() for querying 2 or more phrases?

2014-11-28 Thread Fabrice Etanchaud
Hi John,

What about using the intersect operator on the two result sets ?
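Spelled out as a sketch ('Articles' and the two phrases are taken from the question):

```xquery
(: Keep only the Doc elements that contain both phrases. :)
ft:search('Articles', 'electromagnetic waves',
  map { 'mode': 'phrase' })/ancestor::Doc
intersect
ft:search('Articles', 'electromagnetic particles',
  map { 'mode': 'phrase' })/ancestor::Doc
```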

Best regards,
Fabrice Etanchaud
Questel/Orbit


De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de John Best
Envoyé : vendredi 28 novembre 2014 10:35
À : basex-talk@mailman.uni-konstanz.de
Objet : [basex-talk] How to use ft:search() for querying 2 or more phrases?

Hi BaseX Team,
The subject itself is self-explanatory. I have the following query -

for $x in ft:search('Articles', 'electromagnetic waves', map { 'mode':='phrase' })/ancestor::Doc
return $x
This query searches for "electromagnetic waves" as a phrase. I want to search for another phrase, "electromagnetic particles", together with the previous one.

List only those Articles having both these phrases.

--
Have a nice day
JBest



Re: [basex-talk] CSV : escape character feature

2014-11-28 Thread Fabrice Etanchaud
Thank you so much, Christian.

-Message d'origine-
De : Christian Grün [mailto:christian.gr...@gmail.com] 
Envoyé : vendredi 28 novembre 2014 01:15
À : Fabrice Etanchaud
Cc : basex-talk@mailman.uni-konstanz.de
Objet : Re: [basex-talk] CSV : escape character feature

Hi Fabrice,

my first escaping solution was a bit shortsighted. I have now added a new 
BACKSLASHES option, which allows you (and everyone else in this little XQuery 
world) to explicitly turn on backslash escaping [1].
This works both for parsing and serializing CSV.

The new snapshot is available in appr. 10 minutes.

Have fun,
Christian

[1] http://docs.basex.org/wiki/CSV_Module
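For Fabrice's pipe-separated sample, the call might then look like this (a sketch; $input holds the raw CSV string):

```xquery
csv:parse($input, map {
  'separator': '|',
  'quotes': true(),       (: fields may be quoted :)
  'backslashes': true()   (: treat \ as the escape character :)
})
```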




On Sat, Nov 15, 2014 at 9:56 PM, Fabrice Etanchaud fetanch...@questel.com 
wrote:
 Thank you so much Christian !

 -Message d'origine-
 De : Christian Grün [mailto:christian.gr...@gmail.com]
 Envoyé : vendredi 14 novembre 2014 20:56 À : Fabrice Etanchaud Cc : 
 basex-talk@mailman.uni-konstanz.de
 Objet : Re: [basex-talk] CSV : escape character feature

 Hi Fabrice,

 I decided to change the default behavior of the BaseX CSV parser:
 backslashes will now always be treated as escape characters. \r, \n and \t 
 will be encoded as CR, NL, and TAB, and other characters will be returned 
 literally. A new snapshot is online.

 Everyone: please report if this new default causes surprises in your setting.

 Best,
 Christian


 On Thu, Nov 13, 2014 at 10:34 AM, Fabrice Etanchaud fetanch...@questel.com 
 wrote:
 Dear all,

 I did not find a way to specify how to escape quotes when importing a 
 csv file with quotes=yes.

 Here is an example :

 12345|TOTO LE HERO|Toto le héro 3 \A''
 LP|67|8051|4000|XX||LU|||ITE|||GB||20.10

 where quotes are escaped with a leading \ (the '' after the A are two 
 single quotes).

But it seems BaseX detects a doubled quote (“”) as an escaped quote.

 Would it be possible to have an option in the future to override the 
 default behavior?

 Best regards,
 Fabrice Etanchaud
 Questel/Orbit


Re: [basex-talk] How to use ft:search() for querying 2 or more phrases?

2014-11-28 Thread Fabrice Etanchaud
Dear John

Maybe you should add the following option :

map { 'mode':'all' }

because 'any' is the default.
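Spelled out (a sketch; note that with 'all', all strings must occur in the same text node, which is one reason counts can differ from a document-level intersect):

```xquery
ft:search('Articles',
  ('electromagnetic waves', 'electromagnetic particles'),
  map { 'mode': 'all' }
)/ancestor::Doc
```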

Best regards,
Fabrice

De : basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] De la part de John Best
Envoyé : vendredi 28 novembre 2014 11:32
À : Christian Grün
Cc : BaseX
Objet : Re: [basex-talk] How to use ft:search() for querying 2 or more phrases?

Hi Christian,
Thanks for the reply.
Using intersect, results in 5 Articles. Whereas you suggestion with -

ft:search(Articles,
(electromagnetic waves, electromagnetic particles)
  )/ancestor::Doc
results in 218 Articles.
Why the difference of 213 ?
I am checking some of these additional, do they have both phrases. Will come 
back again.



On Fri, Nov 28, 2014 at 3:52 PM, Christian Grün christian.gr...@gmail.com wrote:
Hi John,

You can specify more than one search string with ft:search [2]:

  ft:search(Articles,
(electromagnetic waves, electromagnetic particles)
  )/ancestor::Doc

In your case, the phrase option is not required, because it creates
a single string from all of your search strings. Please see [1] for
more information on the search modes.

Best,
Christian

[1] http://docs.basex.org/wiki/Full-Text_Module#ft:search
[2] http://docs.basex.org/wiki/Full-Text#Combining_Results



On Fri, Nov 28, 2014 at 10:34 AM, John Best johnbest5...@gmail.com wrote:
 Hi BaseX Team,

 The subject itself is explanatory. I have following query -

 ft:search(Articles, (electromagnetic waves), map { 'mode':='Phrase'
 })/ancestor::Doc
 return $x

 This query searches for "electromagnetic waves" as a phrase. I want to
 search for another phrase, "electromagnetic particles", along with the previous phrase.


 List only those Articles having both these phrases.

 --
 Have a nice day
 JBest





--
Have a nice day
JBest



Re: [basex-talk] BaseX and MySQL

2014-11-20 Thread Fabrice Etanchaud
Dear all,

You might have to add the following call before sql:connect():

sql:init('com.mysql.jdbc.Driver')

I found it mandatory in .xqm modules run inside Tomcat, for example.

If you start playing with the SQL module,
you will realize that BaseX also rocks on SQL repositories!
With RESTXQ and the SQL module you can write a REST service exposing SQL
data in a few lines.
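
A sketch of such a service (the module URI, table, column names, and connection details below are all made up; a prepared statement is used instead of string concatenation to avoid SQL injection):

```xquery
module namespace app = 'http://example.org/app';

(: GET /customers/42 returns the matching rows as XML :)
declare
  %rest:path('/customers/{$id}')
  %rest:GET
function app:customer($id as xs:string) {
  let $conn := sql:connect('jdbc:mysql://localhost/mydb', 'user', 'password')
  let $stmt := sql:prepare($conn, 'select name, city from customers where id = ?')
  return sql:execute-prepared($stmt,
    <sql:parameters>
      <sql:parameter type='int'>{ $id }</sql:parameter>
    </sql:parameters>)
};
```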

Best regards,
Fabrice Etanchaud
Questel/Orbit


From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Lizzi, Vincent
Sent: Wednesday, 19 November 2014 23:31
To: Hans-Juergen Rennau; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX and MySQL

Hans,

I had to do this a few days ago, so it's fresh in my mind. Download Connector/J
from the MySQL website and place the .jar file in the basex/lib folder. Restart
BaseX (if you're using the server version) so it picks up the new .jar.

Queries can then be run like:

let $c := sql:connect('jdbc:mysql://localhost/database', 'user', 'password')
return sql:execute($c, 'select …')

sql:execute($c, 'call stored_procedure…();')


Cheers,
Vincent


From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Hans-Juergen
Rennau
Sent: Wednesday, November 19, 2014 5:20 PM
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] BaseX and MySQL

Dear BaseX team,

can you point me to some information about which steps I must take in order to 
start accessing MySQL databases via the sql module?

Thank you for help,
Hans-Juergen


Re: [basex-talk] Opened by another process

2014-11-17 Thread Fabrice Etanchaud
Hi Paul,
Is there any basexhttp instance that could have opened the db ?

Best regards,
Fabrice
Questel/Orbit

-----Original Message-----
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Paul
Swennenhuis
Sent: Monday, 17 November 2014 22:44
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Opened by another process

Why would I get a bxerr:BXDB0007 error "Database 'profiles' cannot be updated,
as it is opened by another process"
when executing these commands from a BaseX client:

open profiles;
xquery insert node <profile>abc</profile> into /profiles

Where profiles is an existing database, and /profiles an existing root 
element, and I am quite positive that the database is NOT being used in another 
process?

When I issue these commands on localhost it is working fine.

Paul


Re: [basex-talk] CSV : escape character feature

2014-11-15 Thread Fabrice Etanchaud
Thank you so much Christian !

-----Original Message-----
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: Friday, 14 November 2014 20:56
To: Fabrice Etanchaud
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] CSV : escape character feature

Hi Fabrice,

I decided to change the default behavior of the BaseX CSV parser:
backslashes will now always be treated as escape characters. \r, \n and \t will
be encoded as CR, NL, and TAB, and other characters will be returned literally.
A new snapshot is online.

Everyone: please report if this new default causes surprises in your setting.

Best,
Christian


On Thu, Nov 13, 2014 at 10:34 AM, Fabrice Etanchaud fetanch...@questel.com 
wrote:
 Dear all,

 I did not find a way to specify how to escape quotes when importing a 
 csv file with quotes=yes.

 Here is an example :

 12345|TOTO LE HERO|Toto le héro 3 \A''
 LP|67|8051|4000|XX||LU|||ITE|||GB||20.10

 where quotes are escaped with a leading \ (the '' after the A are two 
 single quotes).

 But it seems BaseX only detects a doubled quote (“”) as an escaped quote.

 Would it be possible to add an option in the future to override the
 default behavior?

 Best regards,
 Fabrice Etanchaud
 Questel/Orbit


[basex-talk] CSV : escape character feature

2014-11-13 Thread Fabrice Etanchaud
Dear all,
I did not find a way to specify how to escape quotes when importing a csv file 
with quotes=yes.

Here is an example :

12345|TOTO LE HERO|Toto le héro 3 \A'' 
LP|67|8051|4000|XX||LU|||ITE|||GB||20.10

where quotes are escaped with a leading \
(the '' after the A are two single quotes).

But it seems BaseX only detects a doubled quote (“”) as an escaped quote.
Would it be possible to add an option in the future to override the default
behavior?

Best regards,
Fabrice Etanchaud
Questel/Orbit



Re: [basex-talk] Out Of Memory

2014-11-07 Thread Fabrice Etanchaud
Hi Mansi,

From what I can see,
for each "pqr" value you could use db:attribute-range to retrieve all the file
names, then group by/count to obtain statistics.
You could also create a new collection from an extraction of only the data you
need, changing @name into an element, and use full-text fuzzy matching.
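
A sketch of the first idea (the database name "mydb" and the range bounds are illustrative; db:attribute-range performs an index-backed range scan over attribute values):

```xquery
(: per-file counts of @name values in a given prefix range :)
for $a in db:attribute-range("mydb", "pqr", "pqrzzzz")/self::attribute(name)
let $file := db:path($a)
group by $file
order by count($a) descending
return <file path="{$file}" hits="{count($a)}"/>
```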

Hoping it helps

Regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Mansi Sheth
Sent: Thursday, 6 November 2014 20:55
To: Christian Grün
Cc: BaseX
Subject: Re: [basex-talk] Out Of Memory

I would be doing tons of post-processing. I never use the GUI; I either use REST
through cURL or the command line.

I would basically need data in below format:

XML File Name, @name

I am trying to whitelist, picking up values only for starts-with(@name, "pqr"),
where "pqr" is one of a list of 150-odd values.

My file names are essentially some IDs/keys, which I would need to map
further to some values using SQLite, and maybe group by them, etc.

So, basically I am trying to visualize some data based on which XML files it
exists in. So yes, count(query) would be fine, but it won't serve much
purpose, since I still need the "pqr" values.

- Mansi


On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün
christian.gr...@gmail.com wrote:
 Query: /A/*//E/@name/string()

In the GUI, all results will be cached, so you could think about
switching to command line.

Do you really need to output all results, or do you do some further
processing with the intermediate results?

For example, the query count(/A/*//E/@name/string()) will probably
run without getting stuck.



 This query, was going OOM, within few mins.

 I tried a few ways, of whitelisting, with contain clause, to truncate the
 result set. That didn't help too. So, now I am out of ideas. This is giving
 JVM 10GB of dedicated memory.

 Once, above query works and doesn't go Out Of Memory, I also need
 corresponding file names too:

 XYZ.xml //E/@name
 PQR.xml //E/@name

 Let me know if you would need more details, to appreciate the issue ?
 - Mansi

  On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün
  christian.gr...@gmail.com
 wrote:

 Hi Mansi,

 I think we need more information on the queries that are causing the
 problems.

 Best,
 Christian



  On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth
  mansi.sh...@gmail.com wrote:
  Hello,
 
  I have a use case where I have to extract lots of information from each
  XML in each DB, something like the attribute values of most of the nodes
  in an XML. Such queries go Out Of Memory with the exception below. I am
  giving it ~12GB of RAM on an i7 processor. Well, I can't complain here,
  since I am most definitely asking for loads of data, but is there any way
  I can get these kinds of data successfully?
 
  mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
  BaseX 8.0 beta b45c1e2 [Server]
  Server was started (port: 1984)
  HTTP Server was started (port: 8984)
  Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java
  heap
  space
  at
 
  java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
  at
 
  java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
  at
 
  org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
  at
 
  org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
  at
 
  org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
  at
 
  org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
  at java.lang.Thread.run(Thread.java:744)
 
 
  --
  - Mansi




 --
 - Mansi



--
- Mansi


Re: [basex-talk] Out Of Memory

2014-11-06 Thread Fabrice Etanchaud
Hi Mansi,

Here you have a natural partition of your data : the files you ingested.
So my first suggestion would be to query your data on a file basis:

for $doc in db:open('your_collection_name')
let $file-name := db:path($doc)
return
  file:write(
    $file-name,
    <names>
    {
      for $name in $doc//E/@name/data()
      return
        <name>{$name}</name>
    }
    </names>
  )

Is it for indexing ?

Hope it helps,

Best regards,

Fabrice Etanchaud
Questel/Orbit

From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Mansi Sheth
Sent: Thursday, 6 November 2014 16:33
To: Christian Grün
Cc: BaseX
Subject: Re: [basex-talk] Out Of Memory

This would need a lot of details, so bear with me below:

Briefly my XML files look like:

<A name="">
  <B name="">
    <C name="">
      <D name="">
        <E name=""/>

"A" can contain B, C or D, and B, C or D can contain E. We have thousands
(currently 3000 in my test data set) of such XML files, of 50 MB average size.
It's tons of data! Currently, my database is ~18GB in size.

Query: /A/*//E/@name/string()

This query was going OOM within a few minutes.

I tried a few ways of whitelisting, with a contains clause, to truncate the
result set. That didn't help either, so now I am out of ideas. This is with the
JVM given 10GB of dedicated memory.

Once the above query works and doesn't go Out Of Memory, I will also need the
corresponding file names:

XYZ.xml //E/@name
PQR.xml //E/@name

Let me know if you need more details to appreciate the issue.
- Mansi

On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün
christian.gr...@gmail.com wrote:
Hi Mansi,

I think we need more information on the queries that are causing the problems.

Best,
Christian



On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth
mansi.sh...@gmail.com wrote:
 Hello,

 I have a use case where I have to extract lots of information from each XML
 in each DB, something like the attribute values of most of the nodes in an XML.
 Such queries go Out Of Memory with the exception below. I am giving
 it ~12GB of RAM on an i7 processor. Well, I can't complain here since I am most
 definitely asking for loads of data, but is there any way I can get these
 kinds of data successfully?

 mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
 BaseX 8.0 beta b45c1e2 [Server]
 Server was started (port: 1984)
 HTTP Server was started (port: 8984)
 Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap
 space
 at
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
 at
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
 at
 org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
 at java.lang.Thread.run(Thread.java:744)


 --
 - Mansi



--
- Mansi


Re: [basex-talk] Out Of Memory

2014-11-06 Thread Fabrice Etanchaud
The solution depends on the usage you will have of your extraction.
May I ask what your extraction is for?

Best regards,
Fabrice

From: Mansi Sheth [mailto:mansi.sh...@gmail.com]
Sent: Thursday, 6 November 2014 17:11
To: Fabrice Etanchaud
Cc: Christian Grün; BaseX
Subject: Re: [basex-talk] Out Of Memory

Interesting idea, I thought of using db partition, but didn't pursue it 
further, mainly due to below thought process.

Currently, I am ingesting ~3000 xml files, storing ~50 xml files per db, which 
would be growing quickly. So, below approach would lead to ~3000 more files 
(which would be increasing), increasing I/O operations considerably for further 
pre-processing.

However, I don't really care if process takes few minutes to few hours (as long 
as its not day(s) ;)). Given the situation and my options, I would surely try 
this.

The database is currently indexed at the attribute level, as that's what I
would be querying the most. Do you think I should do anything differently?

Thanks,
- Mansi

On Thu, Nov 6, 2014 at 10:48 AM, Fabrice Etanchaud
fetanch...@questel.com wrote:
Hi Mansi,

Here you have a natural partition of your data : the files you ingested.
So my first suggestion would be to query your data on a file basis:

for $doc in db:open('your_collection_name')
let $file-name := db:path($doc)
return
  file:write(
    $file-name,
    <names>
    {
      for $name in $doc//E/@name/data()
      return
        <name>{$name}</name>
    }
    </names>
  )

Is it for indexing ?

Hope it helps,

Best regards,

Fabrice Etanchaud
Questel/Orbit

From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Mansi Sheth
Sent: Thursday, 6 November 2014 16:33
To: Christian Grün
Cc: BaseX
Subject: Re: [basex-talk] Out Of Memory

This would need a lot of details, so bear with me below:

Briefly my XML files look like:

<A name="">
  <B name="">
    <C name="">
      <D name="">
        <E name=""/>

"A" can contain B, C or D, and B, C or D can contain E. We have thousands
(currently 3000 in my test data set) of such XML files, of 50 MB average size.
It's tons of data! Currently, my database is ~18GB in size.

Query: /A/*//E/@name/string()

This query was going OOM within a few minutes.

I tried a few ways of whitelisting, with a contains clause, to truncate the
result set. That didn't help either, so now I am out of ideas. This is with the
JVM given 10GB of dedicated memory.

Once the above query works and doesn't go Out Of Memory, I will also need the
corresponding file names:

XYZ.xml //E/@name
PQR.xml //E/@name

Let me know if you need more details to appreciate the issue.
- Mansi

On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün
christian.gr...@gmail.com wrote:
Hi Mansi,

I think we need more information on the queries that are causing the problems.

Best,
Christian



On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth
mansi.sh...@gmail.com wrote:
 Hello,

 I have a use case where I have to extract lots of information from each XML
 in each DB, something like the attribute values of most of the nodes in an XML.
 Such queries go Out Of Memory with the exception below. I am giving
 it ~12GB of RAM on an i7 processor. Well, I can't complain here since I am most
 definitely asking for loads of data, but is there any way I can get these
 kinds of data successfully?

 mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
 BaseX 8.0 beta b45c1e2 [Server]
 Server was started (port: 1984)
 HTTP Server was started (port: 8984)
 Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap
 space
 at
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
 at
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
 at
 org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
 at java.lang.Thread.run(Thread.java:744)


 --
 - Mansi



--
- Mansi



--
- Mansi


Re: [basex-talk] Import a large XML file and add an attribute on all children

2014-10-31 Thread Fabrice Etanchaud
Such a preprocessing transformation step would be very useful.

In order to do that entirely in BaseX, I load the data twice.
Once into a temporary collection (which may be an in-memory collection), where
I perform the transformations.
Then I load the transformed data into the final collection.
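
A sketch of this two-pass approach; the names ("tmp", "final", element item, attribute checked, input.xml) are placeholders. A first query stages the raw file, e.g. db:create('tmp', 'input.xml', 'input.xml'); a second query then transforms a copy and adds it:

```xquery
(: second pass: open the staged document, annotate a copy, add it to the final db :)
db:add(
  "final",
  copy $doc := db:open("tmp", "input.xml")
  modify (
    for $e in $doc//item
    return insert node attribute checked { "yes" } into $e
  )
  return $doc,
  "input.xml"
)
```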

A great improvement would be to provide the import step with an iteration
XPath and an XQuery transformation function,
so that the transformed documents are loaded instead (Zorba and Saxon
Enterprise have a streaming mode like this).

Best regards,
Fabrice Etanchaud
Data integration team
Questel/Orbit


From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Mallika Jimmy
Sent: Friday, 31 October 2014 10:08
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Import a large XML file and add an attribute on all
children

Hello,

We have a requirement to create a database by importing a large XML file.
During this process, we also want to insert a particular attribute on all child
nodes. What is an efficient method to do this? The file size is nearly
1GB.

Thanks in advance,
Mallika Jacob


Re: [basex-talk] Import a large XML file and add an attribute on all children

2014-10-31 Thread Fabrice Etanchaud
If you use the XQuery Update Facility,
I agree that adding an attribute on every element will lead to a huge pending
update list that could fill your RAM.

I suggest processing your data in smaller partitions.
For each partition,
you could transform your data with a recursive function and store it in a
temporary file.
Then you can create a new collection from those files.

In order to transform each partition,
you can use a copy clause / XQuery Update, or a recursive XQuery function.
Then you can file:write() / put the result.
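
A sketch of such a recursive function (the attribute name added and the file names are illustrative):

```xquery
(: rebuild every element, keeping its attributes and adding one more :)
declare function local:annotate($n as node()) as node() {
  typeswitch($n)
    case element() return
      element { node-name($n) } {
        $n/@*,
        attribute added { "true" },
        for $child in $n/node() return local:annotate($child)
      }
    default return $n
};

(: write the transformed partition to a temporary file :)
file:write("part1-out.xml", local:annotate(doc("part1.xml")/*))
```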

Regards,
Fabrice

From: Mallika Jimmy [mailto:mallikaji...@gmail.com]
Sent: Friday, 31 October 2014 11:00
To: Fabrice Etanchaud
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Import a large XML file and add an attribute on all
children

Hello Etanchaud,

 Such a preprocessing transformation step would be very useful.
This means that BaseX does not have such a transformation step at present.

To achieve this in the current BaseX version, what would be the efficient 
method?
I have now tried creating the database by importing the XML file first, then
inserting the attribute on all child nodes using an XQuery recursive function.
It works on small files, but gives Out of Memory on large files, say 1GB.

Please help.

Thanks in advance,
Mallika Jacob

On Fri, Oct 31, 2014 at 3:07 PM, Fabrice Etanchaud
fetanch...@questel.com wrote:
Such a preprocessing transformation step would be very useful.

In order to do that entirely in BaseX, I load the data twice.
Once into a temporary collection (which may be an in-memory collection), where
I perform the transformations.
Then I load the transformed data into the final collection.

A great improvement would be to provide the import step with an iteration
XPath and an XQuery transformation function,
so that the transformed documents are loaded instead (Zorba and Saxon
Enterprise have a streaming mode like this).

Best regards,
Fabrice Etanchaud
Data integration team
Questel/Orbit


From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Mallika Jimmy
Sent: Friday, 31 October 2014 10:08
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Import a large XML file and add an attribute on all
children

Hello,

We have a requirement to create a database by importing a large XML file.
During this process, we also want to insert a particular attribute on all child
nodes. What is an efficient method to do this? The file size is nearly
1GB.

Thanks in advance,
Mallika Jacob



Re: [basex-talk] Import a large XML file and add an attribute on all children

2014-10-31 Thread Fabrice Etanchaud
Mallika,

Is this attribute a kind of annotation or index?
If you need to annotate/index your data,
I suggest not mixing source data and annotations,
but creating an annotation/indexation collection afterwards.

Maybe the node-pre functions could help you?
You could create a separate collection containing the (attribute,
node-pre(element)) mappings.

See : http://docs.basex.org/wiki/Database_Module#Read_Operations

Regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Fabrice
Etanchaud
Sent: Friday, 31 October 2014 11:22
To: Mallika Jimmy
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Import a large XML file and add an attribute on all
children

If you use the XQuery Update Facility,
I agree that adding an attribute on every element will lead to a huge pending
update list that could fill your RAM.

I suggest processing your data in smaller partitions.
For each partition,
you could transform your data with a recursive function and store it in a
temporary file.
Then you can create a new collection from those files.

In order to transform each partition,
you can use a copy clause / XQuery Update, or a recursive XQuery function.
Then you can file:write() / put the result.

Regards,
Fabrice

From: Mallika Jimmy [mailto:mallikaji...@gmail.com]
Sent: Friday, 31 October 2014 11:00
To: Fabrice Etanchaud
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Import a large XML file and add an attribute on all
children

Hello Etanchaud,

 Such a preprocessing transformation step would be very useful.
This means that BaseX does not have such a transformation step at present.

To achieve this in the current BaseX version, what would be the efficient 
method?
I have now tried creating the database by importing the XML file first, then
inserting the attribute on all child nodes using an XQuery recursive function.
It works on small files, but gives Out of Memory on large files, say 1GB.

Please help.

Thanks in advance,
Mallika Jacob

On Fri, Oct 31, 2014 at 3:07 PM, Fabrice Etanchaud
fetanch...@questel.com wrote:
Such a preprocessing transformation step would be very useful.

In order to do that entirely in BaseX, I load the data twice.
Once into a temporary collection (which may be an in-memory collection), where
I perform the transformations.
Then I load the transformed data into the final collection.

A great improvement would be to provide the import step with an iteration
XPath and an XQuery transformation function,
so that the transformed documents are loaded instead (Zorba and Saxon
Enterprise have a streaming mode like this).

Best regards,
Fabrice Etanchaud
Data integration team
Questel/Orbit


From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Mallika Jimmy
Sent: Friday, 31 October 2014 10:08
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Import a large XML file and add an attribute on all
children

Hello,

We have a requirement to create a database by importing a large XML file.
During this process, we also want to insert a particular attribute on all child
nodes. What is an efficient method to do this? The file size is nearly
1GB.

Thanks in advance,
Mallika Jacob



Re: [basex-talk] Text Index just over some elements

2014-09-25 Thread Fabrice Etanchaud
Dear Oscar,

From what I read, I'm not sure you have had a look at the underlying BaseX data
structure yet.

XML files in BaseX are digested into a binary format:

http://docs.basex.org/wiki/Node_Storage

but ‘stored’ raw files are simply copied to the filesystem.

You can only index digested data.
Best regards,
Fabrice
Questel/Orbit


From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Oscar Herrera
Sent: Wednesday, 24 September 2014 19:41
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Text Index just over some elements

== The Scenario ==
What we have is a dynamic collection with information from people who registers 
on the site. Basically the information is retrieved from third party companies 
that provide us the information on XML via WebService calls, so we do request 
the person information to these third parties on the moment people gets 
registered. So that's how we got into BaseX since we consider is inconvenient 
to store large XML files on a RDBMS and I don't see the point on having to 
parse all the information when we receive it to re-organize it mostly because 
from my point of view the information is already well structured via these 
large XML files.

These XML files are 2 MB each on average. Of course, some are very small
(80 KB), and we have been advised that some might get up to 500 MB.

So, from all the information we receive, at this moment I estimate we only need
around 25%. I thought about having different databases with full and partial
information, but the thing is that on one hand the requirements are not entirely
defined, and on the other, there's information that we use in the
queries and other information that we still need to display to its owner, which
we're displaying using XSLT.
== Question 1: Indexes are only required for some fields ==
We usually need to locate the records by some ID, or query over some of the
elements available in the XML files, but those are pretty much always the same,
so those are the elements that I'd like to have indexed. That's why I don't see
a reason for having indexes over the contents of all the elements, since it is
unlikely (at least right now) that we'll make use of those, and instead they
consume a lot of hard drive space.
== Question 2: to store files on the filesystem or as raw on BaseX? ==
Right now, we're storing the information we receive as XML files on the file
system on a RAID 10. Anyway, what's your advice: to keep the files stored on
the filesystem directly, or to let BaseX handle them (I think this is the
difference between the add/replace and store commands, right?). Is there any
article you could point me to that I could use for reference? As I see it,
BaseX is handling the queries and the index information right now, but depends
on the filesystem to retrieve the entire document; am I right?
== Question 3: dynamic optimize and index updates? ==
As you can imagine, I'll need to have the indexes updated, since data mining
will be done with the information from the people registered on it. I've seen
it is not possible to run the optimize command while the app is up; I'm not
sure about the indexes getting updated in real time either. This somehow is
troubling me, since the idea is to have the app running 24x7, and if we get to
have a lot of registered users, updating the indexes or optimizing the db
will take a long time, won't it? So, any strategies on this?
== Question 4: connection pooling ==
I have only found XQJ-Pool to be used with BaseX, does anybody know about any 
other pooling mechanism available for BaseX?
Thank you so much for your help with this subject, and sorry for the long long 
email ;)
Oscar H




2014-09-24 3:21 GMT-05:00 Fabrice Etanchaud
fetanch...@questel.com:
Hi Oscar,

You will have to maintain a separate collection in order to do that.

That separate collection will contain the node-pre or node-id of each value to
be indexed.
Storing the node-pre is the faster way, but it requires an append-only main
collection if you do not want to have to recreate the entire separate
collection after each main collection update.


1.   Add the new map entries (value-to-be-indexed, node-pre or node-id) in
the separate collection

2.   Reindex the separate collection

An even faster solution is to store the values in text nodes and the node-pre
or node-id in attributes, in order to create only a text index (or vice versa).
That will speed up the reindexation.

To use this custom index:

1.   Use the db:attribute or db:text function on the separate collection
to obtain the list of node-id or node-pre values associated with a given value,

2.   For each node-xx, use the db:open-xx function on the main collection
to obtain the real node.
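
A sketch of the lookup, assuming the side collection stores each indexed value in a text node and the main node's pre value in a @pre attribute on its parent element (the database names here are illustrative):

```xquery
(: 1) index-backed lookup in the side collection, 2) resolve pre values in the main db :)
for $hit in db:text("side-index", "some value")
let $pre := xs:integer($hit/parent::*/@pre)
return db:open-pre("main-db", $pre)
```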

If you are familiar with CouchBase/CouchDB, it’s a little like creating a view 
;-)

But such a built-in feature would be great !

Best regards,
Fabrice
