from:"Giuseppe Celano"

Re: [basex-talk] archive

2018-08-23 Thread Giuseppe Celano

This possibility to open zipped XMLs via doc() is awesome.


Universität Leipzig
Institute of Computer Science, NLP
Augustusplatz 10
04109 Leipzig
Deutschland
E-mail: cel...@informatik.uni-leipzig.de
E-mail: giuseppegacel...@gmail.com
Web site 1: http://asv.informatik.uni-leipzig.de/en/staff/Giuseppe_Celano 
Web site 2: https://sites.google.com/site/giuseppegacelano/

> On Aug 22, 2018, at 4:28 PM, Christian Grün  wrote:
> 
>> Because of the size, is there a way to store xml-data as archive
> 
> If you want to compress your XML documents with XQuery, you can have a
> look at the Archive or ZIP Module of BaseX. I am not sure, however, if
> this is what you are looking for?
> 
>> (with possibility to search) ?
> 
> The BaseX implementation of fn:doc() and fn:collection() supports
> zipped files as arguments. The contents will be unzipped
> automatically:
> 
>  collection('my-xml-files.zip')
> 
> If you create databases, you can specify ZIP files as input, too.
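A minimal sketch of the Archive Module route, assuming BaseX 9.x with the archive: and file: modules available. It compresses two small XML documents into a ZIP archive held in memory, then lists the entries and parses each one again. The entry names and document contents are invented for illustration.

```xquery
(: Hypothetical documents; in practice they would come from a database or the file system. :)
let $zip := archive:create(
  ('a.xml', 'b.xml'),
  ('<doc><title>First</title></doc>', '<doc><title>Second</title></doc>')
)
(: Iterate over the ZIP entries, decompress each one and parse it as XML. :)
for $entry in archive:entries($zip)
let $doc := parse-xml(archive:extract-text($zip, $entry))
return string($entry) || ': ' || string($doc//title)
```

To keep the archive, it could be written to disk with file:write-binary('my-xml-files.zip', $zip). After that, collection('my-xml-files.zip') should expose the same documents, as described above.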

Re: [basex-talk] Huge CSV

2018-08-12 Thread Giuseppe Celano

Yes, I build them, but I do not use them explicitly all the time.  

> On Aug 13, 2018, at 12:04 AM, Liam R. E. Quin  wrote:
> 
> On Sun, 2018-08-12 at 23:58 +0200, Giuseppe Celano wrote:
>> more documents accessed sequentially is better than one
>> big file.
> 
> Are you building indexes in the database? Do your queries make use of
> them?
> 
> You may find using the full text extensions useful.
> 
> Liam
> 
> 
> -- 
> Liam Quin, https://www.holoweb.net/liam/cv/
> Web slave for vintage clipart http://www.fromoldbooks.org/
> Available for XML/Document/Information Architecture/
> XSL/XQuery/Web/Text Processing/A11Y work & consulting.
>
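As an illustration of Liam's suggestion, here is a sketch of a full-text lookup. It assumes a database named 'huge-csv' (a hypothetical name) that was built from the CSV data with the full-text index enabled, and records in the <csv>/<record> layout that BaseX uses for CSV imports.

```xquery
(: ft:search looks the term up in the full-text index of the database
   'huge-csv' (a placeholder name) and returns the matching text nodes;
   ancestor::record climbs back to the CSV record containing each hit. :)
ft:search('huge-csv', 'leipzig')/ancestor::record
```

If the database was created without that index, it can presumably be added afterwards with db:optimize('huge-csv', true(), map { 'ftindex': true() }). As far as I know, ft:search raises an error when no full-text index exists.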

Re: [basex-talk] Huge CSV

2018-08-12 Thread Giuseppe Celano

Hi Liam,

Thanks for answering. The problem is not only the XML transformation per se, 
but also the subsequent query of the documents. I see that if I parcel the big 
csv into smaller (XML) documents and query them sequentially, I have no
performance problems. This is also the case in the database, as far as I can 
see: more documents accessed sequentially is better than one big file.

Ciao,
Giuseppe 

> On Aug 10, 2018, at 9:09 PM, Liam R. E. Quin  wrote:
> 
> On Fri, 2018-08-10 at 13:43 +0200, Giuseppe Celano wrote:
>> I uploaded the file, as it is, in the database,
> 
> I'd probably look for an XSLT transformation to turn it into XML (or
> a Python or Perl script, or some other program that can do it)
> and then load the result into a database.
> 
> It's not all that large a file, so maybe it'd help if you described the
> exact problems you were having -- what did you try, what did you expect
> to happen, what actually happened, what steps did you take to
> investigate...
> 
> Liam
> 
> 
> -- 
> Liam Quin, https://www.holoweb.net/liam/cv/
> Web slave for vintage clipart http://www.fromoldbooks.org/
> Available for XML/Document/Information Architecture/
> XSL/XQuery/Web/Text Processing/A11Y work & consulting.
>

Re: [basex-talk] Huge CSV

2018-08-10 Thread Giuseppe Celano

I uploaded it as CSV (it is CSV) via the GUI, and it is then converted into XML 
(this conversion probably makes it too big).


> On Aug 10, 2018, at 1:50 PM, Christian Grün  wrote:
> 
>> I uploaded the file, as it is, in the database
> 
> So you uploaded the file as binary? Did you try to import it as XML,
> too? Does »upload« mean that you used the simple REST API?
>

Re: [basex-talk] Huge CSV

2018-08-10 Thread Giuseppe Celano

I uploaded the file, as it is, in the database, but this does not help. The 
idea was to first transform the file into XML and then query it, but this 
cannot be done on the fly. So the only thing I can think of is to split the 
original CSV file into multiple CSV files, transform each of them into XML, 
and then query the resulting documents. Are there alternatives? Thanks.
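One alternative to splitting the file by hand is to do the split inside BaseX. A rough sketch, assuming the first line of the CSV holds the column names; the file name 'huge.csv', the database name 'huge-csv' and the chunk size are placeholders. The whole file still has to be parsed in main memory once, so the JVM may need a larger heap for a 380 MB file.

```xquery
(: Parse the CSV once; with 'header', the first line supplies the element names. :)
let $records := csv:parse(file:read-text('huge.csv'), map { 'header': true() })/csv/record
let $size := 10000
let $chunks := (count($records) + $size - 1) idiv $size
return db:create(
  'huge-csv',
  (: one <csv> document per chunk of $size records :)
  for $i in 1 to $chunks
  return <csv>{ subsequence($records, ($i - 1) * $size + 1, $size) }</csv>,
  (: matching database paths: part-1.xml, part-2.xml, ... :)
  for $i in 1 to $chunks
  return 'part-' || $i || '.xml'
)
```

Queries can then address all parts at once with db:open('huge-csv'), or iterate over the documents one by one, which matches the observation in this thread that many smaller documents behave better than one big one.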

Giuseppe

> On Aug 10, 2018, at 1:37 PM, Christian Grün  wrote:
> 
> As there are many different ways to process large CSV data with BaseX…
> What did you try so far?

[basex-talk] Huge CSV

2018-08-10 Thread Giuseppe Celano

Hi,

I am trying to work with a huge CSV file (about 380 MB), but if I build the 
database, it seems that even simple operations cannot be evaluated. Is splitting 
the CSV file the only option, or am I missing something here? Thanks.

Giuseppe

Re: [basex-talk] Parallelization

2018-07-24 Thread Giuseppe Celano

I have to experiment more, but since I tried to copy many xml files (which can 
take some time) and did not see a difference, I would be tempted to say that 
maybe the problem is something else. But as soon as I have some time, I will 
test it again and let you know.
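A small sketch along the lines of Christian's prof:sleep suggestion in this thread: four dummy tasks that each sleep for one second are evaluated once sequentially and once with xquery:fork-join, and prof:time reports both durations (in the GUI's info view, or on standard error on the command line). The task bodies are placeholders for real work.

```xquery
(: Four dummy tasks; prof:sleep simulates work that takes one second. :)
let $tasks := for $i in 1 to 4
              return function() { prof:sleep(1000), $i }
return (
  (: sequential evaluation: roughly 4 seconds :)
  prof:time($tasks ! .()),
  (: parallel evaluation via xquery:fork-join :)
  prof:time(xquery:fork-join($tasks))
)
```

Sleeping threads use no CPU, so this mainly shows whether tasks actually overlap; how much the second timing drops will depend on how many worker threads the fork-join pool uses on the given machine.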


> On Jul 24, 2018, at 9:55 AM, Christian Grün  wrote:
> 
>> I tried with and without xquery:fork-join and I do not see any real 
>> difference as far as evaluation time is concerned. When it works, time gets, 
>> approximately, halved.
>> In my "activity monitor" I can actually see more R processes started by 
>> BaseX, but in the other case I cannot see any new process (but I guess this 
>> is expected for parallelization inside the BaseX process).
> 
> I guess that the process of writing files is pretty fast, so there may
> be no real threading. You can use prof:sleep to delay the process.

Re: [basex-talk] Parallelization

2018-07-24 Thread Giuseppe Celano

I tried with and without xquery:fork-join and I do not see any real difference 
as far as evaluation time is concerned. When it works, time gets, 
approximately, halved. 

In my "activity monitor" I can actually see more R processes started by BaseX, 
but in the other case I cannot see any new process (but I guess this is 
expected for parallelization inside the BaseX process).

> On Jul 24, 2018, at 9:39 AM, Christian Grün  wrote:
> 
> Thanks.
> 
>> However, if I change the content of function() with only a file:write() 
>> function (see below), it does not seem to work in parallel
> : do you know why?
> 
> How did you find out?

Re: [basex-talk] Parallelization

2018-07-24 Thread Giuseppe Celano

Hi Christian,

Thanks for the reply. My query is of the type (simplified (pseudo)code):

let $u := for $r in (list of document names)
          let $dirToWrite := "/directory/" || $r
          return function() {
            file:write($dirToWrite, "a=5;a"),
            proc:system("dir/Rscript", $dirToWrite)
          }
return xquery:fork-join($u)

This allows me to run R scripts in parallel (which can also write something). 
However, if I change the content of function() with only a file:write() 
function (see below), it does not seem to work in parallel: do you know why?


let $u := for $r at $pos in db:open("mio")
          let $dirToWrite := "/directory/" || db:list("mio")[$pos]
          return function() {
            file:write($dirToWrite, $r)
          }
return xquery:fork-join($u)

> On Jul 23, 2018, at 4:34 PM, Christian Grün  wrote:
> 
> Hi again,
> 
>> I am having fun with xquery:fork-join() and I see that it really reduces 
>> evaluation time (!)
> 
> Excellent!
> 
>> My computer has two cores. I was wondering what would happen if a computer 
>> had more cores/CPUs (or if the script were run on a computer cluster): could 
>> the function take advantage of all of CPUs/cores?
>> In the future, will there be the possibility to maybe control this via 
>> parameters to pass to the function?
> 
> Yes, more cores are supported. It may be possible to enhance the
> function signature and provide options for controlling the number of
> concurrent threads. Currently, we simply rely on Java’s ForkJoinPool
> to distribute threads [1].
> 
> Feel free to send us the query patterns that benefit from multi-threading.
> 
> Best,
> Christian
> 
> [1] 
> https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/func/xquery/XQueryForkJoin.java
>

Re: [basex-talk] Db directory

2018-07-23 Thread Giuseppe Celano

OK. Since I would like to create different databases for different 
projects, it seems that the best strategy is to have a new (complete) BaseX 
folder for each project. BaseX is so lightweight that this does not seem to be 
an issue, but I was still not sure this was the way to go.

Thanks!
Giuseppe


> On Jul 23, 2018, at 5:13 PM, Christian Grün  wrote:
> 
>> I see now that I can change the path in the GUI under preferences, but once 
>> I change it I cannot access databases in the "data" folder anymore.
> 
> Right. In a single BaseX instance, you can only work with a single
> database directory.
> 
>> It seems that in db:open I can specify to open a particular file of a 
>> database, but I cannot specify the path to a database in a folder different 
>> from the default one: is that right? Thanks!
> 
> Right.

Re: [basex-talk] Db directory

2018-07-23 Thread Giuseppe Celano

Hi,

I see now that I can change the path in the GUI under preferences, but once I 
change it I cannot access databases in the "data" folder anymore. It seems that 
in db:open I can specify to open a particular file of a database, but I cannot 
specify the path to a database in a folder different from the default one: is 
that right? Thanks!


> On Jul 23, 2018, at 4:30 PM, Christian Grün  wrote:
> 
> Hi Giuseppe,
> 
> As DBPATH is an option, not a command, you will need to set it as detailed in
> the Options page of our documentation [1]. Please note that DBPATH is
> a global option. It cannot be assigned at runtime; instead, it must be
> assigned before BaseX is started.
> 
> Ciao,
> Christian
> 
> [1] http://docs.basex.org/wiki/Options
> 
> 
> 
> On Mon, Jul 23, 2018 at 4:19 PM Giuseppe Celano
>  wrote:
>> 
>> Hi,
>> 
>> I would like to create a database in a directory which is not "data" within 
>> the BaseX folder. I used (within the GUI) the command
>> 
>> 
>>  
>> 
>> 
>> but it does not work. How can I specify that? Thanks.
>> 
>> Ciao,
>> Giuseppe
>>

[basex-talk] Db directory

2018-07-23 Thread Giuseppe Celano

Hi,

I would like to create a database in a directory which is not "data" within the 
BaseX folder. I used (within the GUI) the command


  
  

but it does not work. How can I specify that? Thanks.

Ciao,
Giuseppe

[basex-talk] Parallelization

2018-07-22 Thread Giuseppe Celano

I am having fun with xquery:fork-join() and I see that it really reduces 
evaluation time (!): I apply the same script to a collection of files, and if I 
use xquery:fork-join() it takes about half of the time.
My computer has two cores. I was wondering what would happen if a computer had 
more cores/CPUs (or if the script were run on a computer cluster): could the 
function take advantage of all the CPUs/cores? In the future, will it be 
possible to control this via parameters passed to the function?

Ciao,
Giuseppe

[basex-talk] serialize vs csv:serialize

2018-07-20 Thread Giuseppe Celano

Hi All,

I am not sure whether the serialize function is working properly: the first 
example works, but the second does not, because I get commas instead of tabs, 
and there seems to be no way to request a header.



f
f
f
f
f
f

 => csv:serialize(map{"header":"yes", "separator": "  "})   

return
a   b   c   d   e   f
f   f   f   f   f   f



f
f
f
f
f
f

 => serialize(map{"method":"csv", "item-separator": " "})

f,f,f,f,f,f


Thanks!
Giuseppe
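A hedged sketch of the two routes side by side. The input is an assumption, since the XML of the examples above did not survive in the archive; it uses the <csv>/<record> layout that BaseX produces for CSV data. The second call assumes that your BaseX version accepts CSV options as a nested map in the 'csv' serialization parameter; if it does not, the Serialization page of the documentation lists the accepted form. In either case, 'item-separator' only separates the items of a sequence, not the fields of a record.

```xquery
(: Hypothetical input in the BaseX CSV layout. :)
let $data := <csv><record><a>f</a><b>f</b><c>f</c></record></csv>
return (
  (: CSV module: options are passed directly :)
  csv:serialize($data, map { 'header': true(), 'separator': 'tab' }),
  (: fn:serialize: CSV-specific options go into the 'csv' parameter :)
  serialize($data, map {
    'method': 'csv',
    'csv': map { 'header': true(), 'separator': 'tab' }
  })
)
```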

Re: [basex-talk] maps

2018-07-13 Thread Giuseppe Celano

> Which ordering criteria does this particular dictionary use?

It is the insertion order. I am just converting this code for fun and running 
some tests. I will definitely have a look at Leo's code as well!

> On Jul 13, 2018, at 11:16 AM, Christian Grün  
> wrote:
> 
> Hi Giuseppe,
> 
>> I was wondering why to change the original order anyway.
> 
> As maps are defined to be unordered in the spec, it is perfectly legal
> for XQuery processors to change the order of map entries when
> optimizing the query and constructing the map, and there is no
> guarantee that the “original order”, which may be derived from the
> string representation of the query, will be preserved in the compiled
> query. For example, in the following query, …
> 
>map:merge((1 to 10) ! map:entry(., .))
> 
> …an implementation may decide to add the 10 input maps in parallel, or
> to reorder them before adding them to the final map.
> 
>> What is the BaseX rationale for order?
> 
> BaseX uses a hash-based map implementation, which has no notion of
> ordering [1]. While the entries of this data structure do have a
> deterministic order when serialized, this order bears no relation to
> the order in which the map entries were written down in the textual
> query.
> 
>> I was also trying to hack it adding initial numbers in the key names, but it 
>> does not work (but it does if key names are only numbers).
> 
> It might look like that ;) The chosen order (for parts of the data
> structure) depends on the computed hash code [2].
> 
>> Bottom line: I am trying to "closely" reproduce some Python code involving 
>> dictionaries, where original order in dictionary.keys() is now interestingly 
>> kept (from Python 3.6, as far as I know). Unfortunately, the computation on 
>> the dictionary values assumes a specific order of the keys (which is not 
>> even an alphabetical one )...
> 
> Which ordering criteria does this particular dictionary use?
> 
> If performance is not critical (i.e., if you do not plan to store
> millions of items in the map), you could write a simple XQuery wrapper
> module that does what you need. If performance is an important factor,
> and if you have some (more) spare time, you could dive into the
> fascinating world of Leo’s XQuery data structures [3] and adapt the
> provided map implementations to your needs…
> 
> Christian
> 
> [1] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
> [2] 
> https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/value/item/ANum.java#L115
> [3] https://github.com/LeoWoerteler/xq-modules
>
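Following the idea of a simple wrapper, here is a minimal sketch of an insertion-ordered map, assuming performance is not critical. A plain map holds the values, and a separate sequence records the keys in the order they were first added. The function name local:put and the 'keys'/'values' layout are invented for this example.

```xquery
(: Hypothetical "ordered map": a plain map for the values plus a sequence
   that records the keys in insertion order. :)
declare function local:put(
  $omap  as map(*),
  $key   as xs:anyAtomicType,
  $value as item()*
) as map(*) {
  map {
    (: only remember the key the first time it is added :)
    'keys'  : if (map:contains($omap?values, $key)) then $omap?keys
              else ($omap?keys, $key),
    'values': map:put($omap?values, $key, $value)
  }
};

let $omap := fold-left(
  (['b', 2], ['c', 2], ['a', 3]),
  map { 'keys': (), 'values': map {} },
  function($acc, $pair) { local:put($acc, $pair(1), $pair(2)) }
)
(: iterate in insertion order, not in hash order :)
return $omap?keys ! (. || ': ' || map:get($omap?values, .))
```

With the keys from the original question, this returns "b: 2", "c: 2" and "a: 3", in that order. Lookups still go through the hash map, so only the iteration order is affected.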

Re: [basex-talk] maps

2018-07-13 Thread Giuseppe Celano

Hi Christian,

Thanks. I know the order is not significant in maps/json by definition, but I 
was wondering why to change the original order anyway. I was also trying to 
hack it adding initial numbers in the key names, but it does not work (but it 
does if key names are only numbers). What is the BaseX rationale for order?

Bottom line: I am trying to "closely" reproduce some Python code involving 
dictionaries, where original order in dictionary.keys() is now interestingly 
kept (from Python 3.6, as far as I know). Unfortunately, the computation on the 
dictionary values assumes a specific order of the keys (which is not even an 
alphabetical one )... 

Universität Leipzig
Institute of Computer Science, NLP
Augustusplatz 10
04109 Leipzig
Deutschland
E-mail: cel...@informatik.uni-leipzig.de
E-mail: giuseppegacel...@gmail.com
Web site 1: http://www.dh.uni-leipzig.de/wo/team/
Web site 2: https://sites.google.com/site/giuseppegacelano/

[basex-talk] maps

2018-07-12 Thread Giuseppe Celano

Hi

Is it possible to preserve the order of the keys in a map when the map is 
returned?:

map{"b": 2, "c": 2, "a": 3} 

return

map {
  "a": 3,
  "b": 2,
  "c": 2
} 

Thanks!
Giuseppe

Re: [basex-talk] slash operator in Basex 9.0.2

2018-07-06 Thread Giuseppe Celano

Here's the query plan (it actually wrongly translates / into !):

Optimized Query:
(element j { ("fgrtu") }/data(.) ! replace(., "g", "h"))
Query:
<j>fgrtu</j>/data(.)/replace(., "g", "h")


> On Jul 7, 2018, at 1:25 AM, Christian Grün  wrote:
> 
> Looks like a too eager optimization. Did you have a chance to look at the 
> resulting query plan?
> 
> 
> 
> Giuseppe Celano <cel...@informatik.uni-leipzig.de> wrote on Fri, 6 July 2018, 22:40:
> I have noticed that in BaseX 9.0.2 a query like
> 
> <j>fgrtu</j>/data(.)/replace(., "g", "h")
> 
> gets evaluated (returning "fhrtu"), while in BaseX 8.x, eXist, and Zorba I 
> get an error message (since, as expected, replace() is preceded not by a node 
> but by a string). 
> 
> Is this a bug?
> 
> Ciao,
> Giuseppe
>
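For reference, a sketch of two variants that should behave the same on any XQuery 3.1 processor. In a path expression E1/E2, E1 must yield nodes (otherwise the type error XPTY0019 is raised), so after data(.) the simple map operator ! has to be used instead of /.

```xquery
(
  (: '/' is fine as long as every step except the last yields nodes :)
  <j>fgrtu</j>/replace(., "g", "h"),
  (: data(.) yields a string, so the next step uses '!' instead of '/' :)
  <j>fgrtu</j>/data(.) ! replace(., "g", "h")
)
```

Both expressions return "fhrtu".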

[basex-talk] slash operator in Basex 9.0.2

2018-07-06 Thread Giuseppe Celano

I have noticed that in BaseX 9.0.2 a query like

<j>fgrtu</j>/data(.)/replace(., "g", "h")

gets evaluated (returning "fhrtu"), while in BaseX 8.x, eXist, and Zorba I get 
an error message (since, as expected, replace() is preceded not by a node but by 
a string). 

Is this a bug?

Ciao,
Giuseppe

Re: [basex-talk] Add line-number function

2018-07-06 Thread Giuseppe Celano

Thanks, I will experiment with this.


> On Jul 6, 2018, at 2:32 PM, Christian Grün  wrote:
> 
> This is definitely something you can do in XQuery itself. Just a
> little example, to get you started (shorter suggestions are welcome):
> 
>   let $text := text { 'this is an example' }
>   let $snippets := analyze-string($text, ' ')/*
>   let $starts := (0, fold-left($snippets, (), function($list, $result) {
>     let $length := string-length($result)
>     return if(empty($list)) then (
>       $length
>     ) else (
>       ($list, $list[last()] + $length)
>     )
>   }))
>   for $snippet at $pos in $snippets
>   where local-name($snippet) = 'non-match'
>  return {
> $snippet/text() }
> 
> Cheers,
> Christian

Re: [basex-talk] Add line-number function

2018-07-06 Thread Giuseppe Celano

Yes, fn:path (not fn:node)!

the following works

this is an example/nom/fn:path(.)  

with the useful result

Q{http://www.w3.org/2005/xpath-functions}root()/Q{}nom[1]

but the following does not (because tokenize() does not return a node)

this is an example/tokenize(nom, " ")/fn:path(.)

what I was looking for is a function returning something like

http://www.w3.org/2005/xpath-functions}root()/Q{}nom[1]" start="0" 
end="3">this
http://www.w3.org/2005/xpath-functions}root()/Q{}nom[1]" start="5" 
end="6">is
http://www.w3.org/2005/xpath-functions}root()/Q{}nom[1]" start="8" 
end="9">an
http://www.w3.org/2005/xpath-functions}root()/Q{}nom[1]" start="11" 
end="17">example
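A sketch of one way to produce such output in plain XQuery. Because tokenize() discards the separators, it uses analyze-string() instead and computes each token's 0-based start offset from the lengths of all preceding parts; end is inclusive, as in the example above. The input and the names <root>, <nom> and <token> are hypothetical.

```xquery
(: Hypothetical input; <root>, <nom> and <token> are illustrative names. :)
let $root := <root><nom>this is an example</nom></root>
for $nom in $root/nom
let $path := path($nom)
(: analyze-string keeps the separators as fn:match elements, so the
   offsets of the tokens (fn:non-match elements) can be computed :)
let $parts := analyze-string(string($nom), ' ')/*
for $part at $pos in $parts
where local-name($part) = 'non-match'
(: 0-based start offset = total length of all preceding parts :)
let $start := sum(subsequence($parts, 1, $pos - 1) ! string-length(.))
return <token path="{ $path }" start="{ $start }"
              end="{ $start + string-length($part) - 1 }">{ string($part) }</token>
```

For the sample input, this yields four <token> elements with start/end pairs 0/3, 5/6, 8/9 and 11/17, each carrying the path Q{http://www.w3.org/2005/xpath-functions}root()/Q{}nom[1]. Recomputing the prefix sum for every token is quadratic, which is fine for short texts; for long texts, a running total via fold-left, as in Christian's reply, scales better.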


> On Jul 6, 2018, at 1:39 PM, Christian Grün  wrote:
> 
> Hi Giuseppe,
> 
>> fn:node() returns the path to a node (including the text node): Is there a 
>> similar function to get character offsets within a text node?
> 
> I am not sure what you need. Do you talk about fn:path? What could the
> character offset be / do you have an example?
> 
> Thanks,
> Christian

Re: [basex-talk] Add line-number function

2018-07-06 Thread Giuseppe Celano

fn:node() returns the path to a node (including the text node): Is there a 
similar function to get character offsets within a text node? 

I am thinking of a case where, for example, one tokenizes a text within an 
element and would like to get the xpath + offsets for every token.


> On Jul 6, 2018, at 10:24 AM, Christian Grün  wrote:
> 
> Hi Symantis,
> 
> The original line numbers are not stored in XML databases (they may
> change after updates, and would consume additional memory), so you
> won’t be able to retrieve them with XQuery.
> 
> As far as I know, this does not work in eXist-db either; the eXist
> link you referenced gives you the line of the util:line-number
> expression in your XQuery module. As Fabrice pointed out (thanks!),
> this could also be realized with $err:line-number.
> 
> With Saxon, it works indeed. However, you’ll need to use the -l
> command line option (otherwise, due to performance considerations,
> line numbers will be discarded as well).
> 
> On query/database level, there are two ways to get a direct reference:
> • With fn:path, you get an XPath expression that points to your node.
> • With db:node-pre [1], you get a direct reference to the node in a database.
> 
> Best,
> Christian
> 
> [1] http://docs.basex.org/wiki/Database_Module#db:node-id
> 
> 
> On Thu, Jul 5, 2018 at 5:49 PM Fabrice ETANCHAUD
>  wrote:
>> 
>> As BaseX does not work on the XML textual representation, it might not be 
>> possible.
>> 
>> From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of ? ??
>> Sent: Thursday, 5 July 2018 17:10
>> To: basex-talk@mailman.uni-konstanz.de
>> Subject: [basex-talk] Add line-number function
>> 
>> Hello, could the $err:line-number [1] variable help you ?
>> 
>> [1] http://docs.basex.org/wiki/XQuery_3.0#Try.2FCatch
>> 
>> Best regards,
>> 
>> Fabrice ETANCHAUD
>> cerfrancepch
>> 
>> No, $err:line-number shows the line number of the XQuery file.
>> I want this:
>> 
>> Example.xml ->
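>> 1: <root>
>> 2:   <child>
>> 3:     <grandchild>text1</grandchild>
>> 4:     <grandchild>text2</grandchild>
>> 5:     <grandchild>text3</grandchild>
>> 6:     <grandchild>text4</grandchild>
>> 7:   </child>
>> 8: </root>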
>> 
>> Xquery ->
>> let $f := doc("example.xml")
>> let $e := $f/root/child[1]/grandchild[3]
>> 
>> let $line := line-number($e)
>> 
>> And I want to get $line = 5!
>

Re: [basex-talk] file:write and arrow operator

2018-07-04 Thread Giuseppe Celano

Thanks to both of you! This is very helpful. I will experiment with both 
solutions.

Ciao,
Giuseppe

[basex-talk] file:write and arrow operator

2018-07-04 Thread Giuseppe Celano

Hi All,

I was wondering if there is a way to take full advantage of the arrow operator 
with file:write(). If I want to write the results of a query, it would be 
ideal, I think, if the first parameter of file:write() were the content to 
write and the second the path: then I could write my query => 
file:write("myPath"), and easily comment out "=> file:write("myPath")" if 
needed. However, the first parameter of file:write() is the path, so I end up 
using the function the way I usually use all the other functions. This is not 
a big problem, but I was wondering if I could take advantage of the arrow 
operator here: wouldn't it be better to have the path as the second parameter 
of file:write()?

Ciao,
Giuseppe
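With the current signature, one workaround is a small wrapper that takes the content first and the path second, so that it can sit at the end of an arrow chain and be commented out as a whole. The function name local:write-to and the output path 'result.xml' are hypothetical.

```xquery
(: Hypothetical helper: same as file:write, but with the content first,
   so that it can be used at the end of an arrow chain. :)
declare function local:write-to(
  $content as item()*,
  $path    as xs:string
) as empty-sequence() {
  file:write($path, $content)
};

<result>{ 1 to 3 }</result>
=> local:write-to('result.xml')
```

Commenting out the last line returns the query result instead of writing it to disk.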

Re: [basex-talk] GUI

2018-06-29 Thread Giuseppe Celano

Yes, the problem with the GUI started with Java 10. I was reluctant to upgrade 
to Java 10, but I was assuming I could still use BaseXGUI 8.x.


> On Jun 29, 2018, at 7:59 PM, Christian Grün  wrote:
> 
> Hi Giuseppe,
> 
> You’ll probably need to wait until we have worked on the issue that Alex has 
> referenced in an earlier reply. For now, I recommend you to stick with Java 
> 8. As Oracle will only provide LTS versions for Java 8 and 11 (but not for 
> version 9 and 10), Java 11 will be the next version that we’ll officially 
> support.
> 
> Did you have time to check if the errors you reported in your last mail are 
> dependent on the Java version you are using?
> 
> Best,
> Christian
> 
> 
> 
> 
> 
> On Fri, 29 June 2018 at 19:55, Giuseppe Celano <cel...@informatik.uni-leipzig.de> wrote:
> Hi Christian,
> 
> After installing Java 10, I can no longer use BaseXGUI 8.x. I can still run
> BaseX 8.x in standalone mode, but not the GUI. This is the error message
> (which is partially the same as the one I get with BaseXGUI 9.0, although in
> that case I can still open the GUI):
> 
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.basex.gui.GUIMacOSX (file:/Users/mycomputer/Desktop/basex%20867/BaseX.jar) to method com.apple.eawt.Application.getApplication()
> WARNING: Please consider reporting this to the maintainers of org.basex.gui.GUIMacOSX
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> Failed to initialize native Mac OS X interface
> 
> 
> Thanks!
> Giuseppe
> 
> 
>> On Jun 29, 2018, at 6:04 PM, Christian Grün <christian.gr...@gmail.com> wrote:
>> 
>> Hi Giuseppe,
>> 
>> Did this happen with BaseX 8, too? Does it make a difference which
>> Java version you are using?
>> 
>> Cheers,
>> Christian
>> 
>> 
>> On Fri, Jun 29, 2018 at 5:04 PM Giuseppe Celano
>> <cel...@informatik.uni-leipzig.de> wrote:
>>> 
>>> Hi,
>>> 
>>> I have also noticed that after evaluating a query, BaseXGUI continues to 
>>> "work", absorbing CPU  (as I can see in my "Activity monitor" and in the 
>>> "Used Memory" field of the GUI). I do not know if this is somehow related 
>>> to the org.basex.gui.GUIMacOSX problem, but it happens regularly.
>>> 
>>> Best,
>>> Giuseppe
>>> 
>>> 
>>> Universität Leipzig
>>> Institute of Computer Science, NLP
>>> Augustusplatz 10
>>> 04109 Leipzig
>>> Deutschland
>>> E-mail: cel...@informatik.uni-leipzig.de
>>> E-mail: giuseppegacel...@gmail.com
>>> Web site 1: http://www.dh.uni-leipzig.de/wo/team/
>>> Web site 2: https://sites.google.com/site/giuseppegacelano/
>>> 
>>> On Jun 21, 2018, at 9:48 AM, Alexander Holupirek <a...@holupirek.de> wrote:
>>> 
>>> Hi Giuseppe,
>>> 
>>> it is a known issue and refers to macOS-specific features that have been 
>>> removed, starting in JDK 9.
>>> 
>>> We've already started to prepare a fix and I've just filed an issue [1] for 
>>> it.
>>> 
>>> All the best,
>>> Alex
>>> 
>>> [1] https://github.com/BaseXdb/basex/issues/1582
>>> 
>>> On 20. Jun 2018, at 21:28, Giuseppe Celano
>>> <cel...@informatik.uni-leipzig.de> wrote:
>>> 
>>> Hi,
>>> 
>>> I have updated Java (from 8 to 10), and apparently I can no longer customize
>>> the GUI on my Mac (if I click on BaseXGUI > aboutBaseXGUI, I cannot access 
>>> the relevant tabs). Is this a known issue? Moreover, if I start the GUI 
>>> from the command line, I keep getting the warning message "Illegal 
>>> reflective access by org.basex.gui.GUIMacOSX". Is there a way to avoid 
>>> that? Thanks.
>>> 
>>> Ciao,
>>> Giuseppe
>>> 
>> 
> 
>

Re: [basex-talk] GUI

2018-06-29 Thread Giuseppe Celano

Hi Christian,

After installing Java 10, I can no longer use BaseXGUI 8.x. I can still run 
BaseX 8.x in standalone mode, but not the GUI. This is the error message (which 
is partially the same as the one I get with BaseXGUI 9.0, although in that case 
I can still open the GUI):

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.basex.gui.GUIMacOSX (file:/Users/mycomputer/Desktop/basex%20867/BaseX.jar) to method com.apple.eawt.Application.getApplication()
WARNING: Please consider reporting this to the maintainers of org.basex.gui.GUIMacOSX
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Failed to initialize native Mac OS X interface


Thanks!
Giuseppe


> On Jun 29, 2018, at 6:04 PM, Christian Grün  wrote:
> 
> Hi Giuseppe,
> 
> Did this happen with BaseX 8, too? Does it make a difference which
> Java version you are using?
> 
> Cheers,
> Christian
> 
> 
> On Fri, Jun 29, 2018 at 5:04 PM Giuseppe Celano
>  wrote:
>> 
>> Hi,
>> 
>> I have also noticed that after evaluating a query, BaseXGUI continues to 
>> "work", absorbing CPU  (as I can see in my "Activity monitor" and in the 
>> "Used Memory" field of the GUI). I do not know if this is somehow related to 
>> the org.basex.gui.GUIMacOSX problem, but it happens regularly.
>> 
>> Best,
>> Giuseppe
>> 
>> On Jun 21, 2018, at 9:48 AM, Alexander Holupirek  wrote:
>> 
>> Hi Giuseppe,
>> 
>> it is a known issue and refers to macOS-specific features that have been 
>> removed, starting in JDK 9.
>> 
>> We've already started to prepare a fix and I've just filed an issue [1] for 
>> it.
>> 
>> All the best,
>> Alex
>> 
>> [1] https://github.com/BaseXdb/basex/issues/1582
>> 
>> On 20. Jun 2018, at 21:28, Giuseppe Celano 
>>  wrote:
>> 
>> Hi,
>> 
>> I have updated Java (from 8 to 10), and apparently I can no longer customize
>> the GUI on my Mac (if I click on BaseXGUI > aboutBaseXGUI, I cannot access 
>> the relevant tabs). Is this a known issue? Moreover, if I start the GUI from 
>> the command line, I keep getting the warning message "Illegal reflective 
>> access by org.basex.gui.GUIMacOSX". Is there a way to avoid that? Thanks.
>> 
>> Ciao,
>> Giuseppe
>> 
>

Re: [basex-talk] GUI

2018-06-29 Thread Giuseppe Celano

Hi,

I have also noticed that after a query has been evaluated, BaseXGUI keeps 
"working" and consuming CPU (as I can see in my Activity Monitor and in the 
"Used Memory" field of the GUI). I do not know if this is related to the 
org.basex.gui.GUIMacOSX problem, but it happens regularly.

Best,
Giuseppe 


> On Jun 21, 2018, at 9:48 AM, Alexander Holupirek  wrote:
> 
> Hi Giuseppe,
> 
> it is a known issue and refers to macOS-specific features that have been 
> removed, starting in JDK 9.
> 
> We've already started to prepare a fix and I've just filed an issue [1] for 
> it.
> 
> All the best,
>   Alex
> 
> [1] https://github.com/BaseXdb/basex/issues/1582
> 
>> On 20. Jun 2018, at 21:28, Giuseppe Celano 
>>  wrote:
>> 
>> Hi,
>> 
>> I have updated Java (from 8 to 10), and apparently I can no longer customize
>> the GUI on my Mac (if I click on BaseXGUI > aboutBaseXGUI, I cannot access 
>> the relevant tabs). Is this a known issue? Moreover, if I start the GUI from 
>> the command line, I keep getting the warning message "Illegal reflective 
>> access by org.basex.gui.GUIMacOSX". Is there a way to avoid that? Thanks.
>> 
>> Ciao,
>> Giuseppe
> 
>

[basex-talk] GUI

2018-06-20 Thread Giuseppe Celano

Hi,

I have updated Java (from 8 to 10), and apparently I can no longer customize the 
GUI on my Mac (if I click on BaseXGUI > aboutBaseXGUI, I cannot access the 
relevant tabs). Is this a known issue? Moreover, if I start the GUI from the 
command line, I keep getting the warning message "Illegal reflective access by 
org.basex.gui.GUIMacOSX". Is there a way to avoid that? Thanks.

Ciao,
Giuseppe


Re: [basex-talk] Data analysis

2018-05-26 Thread Giuseppe Celano

Hi Tim,

You can serialize your data as you prefer in BaseX [1]: therefore you can 
easily make your computations in XML and then output whatever format is 
required for your visualization tool. 
For a fully automated approach, you can also take advantage of the Process 
Module [2], which lets you invoke data analysis/visualization functions written 
in other programming languages, passing what you create in BaseX as their input.
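A minimal sketch of such a pipeline, assuming the external tool can read JSON: the data are assembled as an XQuery map, serialized with the JSON output method via file:write(), and then passed to an external program with proc:system() from the Process Module. The sample words, the counts.json file and the plot.py script are hypothetical placeholders.

```xquery
(: Hypothetical sample data: word lengths and how often each length occurs. :)
let $words   := ("alpha", "beta", "gamma", "pi", "rho")
let $lengths := distinct-values($words ! string-length(.))
let $data    := map {
  "length": array { $lengths },
  "count":  array { for $l in $lengths return count($words[string-length(.) = $l]) }
}
return (
  (: Serialize the map as JSON, so that a non-XML tool can read it. :)
  file:write("counts.json", $data, map { "method": "json" }),
  (: Run a (hypothetical) external script on the JSON file; proc:system()
     returns the script's standard output and raises an error on failure. :)
  proc:system("python3", ("plot.py", "counts.json"))
)
```

The sketch assumes that the two expressions of the final sequence are evaluated from left to right; if in doubt, the writing step and the external call can be run as two separate queries.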

Ciao,
Giuseppe

[1] http://docs.basex.org/wiki/Serialization 

[2] http://docs.basex.org/wiki/Process_Module 

> On May 26, 2018, at 4:36 PM, Tim Clement  wrote:
> 
> Power BI

[basex-talk] -math:log10(1)

2018-05-26 Thread Giuseppe Celano

-math:log10(1) returns -0, but -0 returns 0: is there a reason for that?
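A likely explanation, based on the standard XQuery type rules rather than on anything BaseX-specific: math:log10() returns an xs:double, and IEEE 754 doubles distinguish +0 from -0, whereas the literal 0 is an xs:integer, which has no negative zero. The query below checks the types involved.

```xquery
(
  (: math:log10(1) is the xs:double zero, so negating it yields the double -0 :)
  -math:log10(1) instance of xs:double,
  (: the literal 0 is an xs:integer; negating an integer zero simply gives 0 :)
  -0 instance of xs:integer,
  (: a double literal shows the same behavior as -math:log10(1) :)
  string(-0e0),
  (: the two zeros still compare as equal :)
  -math:log10(1) = 0
)
```

If the types behave as described, the query should return true, true, -0 and true.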

Thanks!
Giuseppe


Re: [basex-talk] Atomization

2018-05-24 Thread Giuseppe Celano

Hi  Christian,

Thank you for your help! To summarize (also for the benefit of other users): 
while in XQuery data($d/aspect-values/@sign) = "yes" and 
$d/aspect-values/@sign = "yes" are equivalent (because of atomization), using 
data() lets the user prevent BaseX from applying a certain index (so this is a 
BaseX-specific behavior). Paying attention to how BaseX uses indexes (which can 
be seen in the Info panel of the GUI) seems to be particularly important when 
joining documents: as far as I understand, which indexes BaseX applies 
automatically, and how, cannot be predicted in advance, so one can experiment 
with data() to test which index use performs best (especially when a query 
evaluates slowly).
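To make the comparison concrete, here is the query from the original report with the last condition wrapped in data(). Following Christian's explanation, the result should be the same as without data(); the difference is that BaseX no longer picks this static comparison for index access, which can be checked in the Info panel.

```xquery
for $s in doc("doc1")//s//t
for $d in doc("doc2")//case
where $d/verb_lemma = $s/@l
  and $d//verb_form/@value = $s/@f
  (: Same meaning as $d/aspect-values/@sign = "yes", because the attribute is
     atomized anyway; data() only keeps BaseX from rewriting this condition
     for index access, so one of the other conditions can use the index. :)
  and data($d/aspect-values/@sign) = "yes"
return $s
```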

Is this correct?

Thank you again!
Giuseppe 


> On May 23, 2018, at 2:44 PM, Christian Grün <christian.gr...@gmail.com> wrote:
> 
> Hi Giuseppe,
> 
> I think your observation was related to another issue that has already
> been fixed recently. Did you try the latest snapshot [1]?
> 
> Btw, in your specific query I noticed that data() may indeed be
> helpful to suppress the index rewriting for the last condition. As
> it’s the only one with a static comparison string, it will be the
> one chosen for index access, but for your data, it would actually be
> better if one of the other two conditions were evaluated by the index.
> 
> Thanks for the sample documents,
> Christian
> 
> PS: 9.0.2 will be available by the end of May.
> 
> [1] http://files.basex.org/releases/latest/
> 
> 
> 
> On Tue, May 22, 2018 at 5:22 PM, Giuseppe Celano
> <cel...@informatik.uni-leipzig.de> wrote:
>> I think I have identified a problem with atomization of attribute content
>> (no database involved). I have a simple query:
>> 
>> for $s in doc("doc1")//s//t
>> for $d in doc("doc2")//case
>> where  $d/verb_lemma = $s/@l and $d//verb_form/@value = $s/@f and
>> $d/aspect-values/@sign = "yes"
>> return
>> $s
>> 
>> In order to get a result, I (necessarily) need to use the data() function in
>> data($d/aspect-values/@sign) = "yes", otherwise the query never returns a
>> result. Is this a bug?
>> I would expect that the value of @sign is automatically atomized and
>> compared to "yes", but this does not seem the case.
>> Thanks.
>> 
>> Ciao,
>> Giuseppe
>> 
>

[basex-talk] Atomization

2018-05-22 Thread Giuseppe Celano

I think I have identified a problem with atomization of attribute content (no 
database involved). I have a simple query:

for $s in doc("doc1")//s//t
for $d in doc("doc2")//case
where  $d/verb_lemma = $s/@l and $d//verb_form/@value = $s/@f and 
$d/aspect-values/@sign = "yes"
return
$s

In order to get a result, I (necessarily) need to use the data() function in 
data($d/aspect-values/@sign) = "yes", otherwise the query never returns a 
result. Is this a bug? 
I would expect that the value of @sign is automatically atomized and compared 
to "yes", but this does not seem the case.
Thanks.

Ciao,
Giuseppe

Re: [basex-talk] How to use BaseX on MacBook? (Urgent!)

2018-05-16 Thread Giuseppe Celano

Hi Ben,

If you already use BaseX on a Linux machine, you already know how to use it on 
a Mac :) Simply download and unzip the file

http://files.basex.org/releases/9.0.1/BaseX901.zip

and then double-click BaseX.jar to open the GUI quickly, or run one of the 
scripts in the bin folder for more startup options. More information is 
available here:

http://docs.basex.org/wiki/Startup

Have fun.

Best,
Giuseppe 

PS: One of the advantages of BaseX is how simple it is to use and install 
across different operating systems!

-- 
Dr. Giuseppe G. A. Celano
Universität Leipzig
Institute of Computer Science, NLP
Augustusplatz 10
04109 Leipzig (DE)
E-mail: giuseppegacel...@gmail.com 
giuseppe.cel...@tufts.edu 
cel...@informatik.uni-leipzig.de 

Web site: https://sites.google.com/site/giuseppegacelano/ 

> On May 16, 2018, at 10:12 AM, Ben Engbers  wrote:
> 
> Hi,
> 
> If we manage to install BaseX on a MacBook, chances are great that we
> will use BaseX dor our final project.
> 
> I know how to install BaseX on linux but I have no experience with
> Apple. My fellow-students know how to use applications but don't know
> how to deal with java-applications.
> 
> My question is if BaseX can be used on a MacBook. If so, where can I
> find instructions?
> 
> Cheers,
> Ben Engbers

Re: [basex-talk] Unexpected unary lookup result

2018-05-11 Thread Giuseppe Celano

The problem occurs in BaseX 9.0.1 but not in 8.6.7. It seems to be caused by 
the presence of a function call (lower-case() in the second case) within the 
inline function.

> On May 11, 2018, at 12:52 PM, Sebastian Zimmer 
> <sebastian.zim...@uni-koeln.de> wrote:
> 
> Sorry to bother you again, but I think there is still something wrong with my 
> code and I can't figure it out. This time I checked it consistently on BaseX 
> 9.0.1 (Windows and Linux, console and web server), results are always the 
> same:
> 
> xquery version "3.1";
> declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
> 
> let $array1 := []
> let $array2 := array:for-each([], function($i) { lower-case($i) })
> let $array3 := array:for-each([], function($i) { $i + 1 })
> 
> return (
>   empty($array1!?*),
>   empty($array2!?*),
>   empty($array3!?*),
>   empty(
>     for $i in 1 to array:size($array1)
>     return $array1($i)
>   ),
>   empty(
>     for $i in 1 to array:size($array2)
>     return $array2($i)
>   ),
>   empty(
>     for $i in 1 to array:size($array3)
>     return $array3($i)
>   )
> )
> The results I get are:
> true
> false
> true
> true
> true
> true
> 
> Why isn't the second result "true" too?
> 
> Best regards,
> Sebastian
> 
> Am 11.05.2018 um 11:12 schrieb Sebastian Zimmer:
>> Hi again,
>> 
>> the problem is gone now after a reboot. It seems that the web server was 
>> running on another version while the console was running with 9.0.1
>> 
>> Sorry for the inconvenience.
>> 
>> Best,
>> Sebastian
>> 
>> Am 11.05.2018 um 11:04 schrieb Sebastian Zimmer:
>>> Hi Giuseppe,
>>> 
>>> thanks for checking. I double-checked again. The problem is even weirder 
>>> now:
>>> 
>>> When using the console, I too get 2x true:
>>> 
>>> $ ./bin/basex "./webapp/array_test.xql"
>>> true
>>> true
>>> 
>>> When using the web server, I still get this:
>>> 
>>> $ curl localhost:8994/rest?run=array_test.xql
>>> false
>>> true
>>> At first I thought there was some cache at work, preventing the update, but 
>>> it doesn't seem to be the case. I can edit the XQL and both outputs change 
>>> accordingly, but the first boolean is still different.
>>> 
>>> Best regards,
>>> Sebastian
>>> 
>>> Am 11.05.2018 um 08:28 schrieb Giuseppe Celano:
>>>> Hi Sebastian,
>>>> 
>>>> With BaseX 9.0.1 and 8.6.7, I get "true" twice.
>>>> 
>>>> Best,
>>>> Giuseppe
>>>> 
>>>>> On May 11, 2018, at 1:50 AM, Sebastian Zimmer
>>>>> <sebastian.zim...@uni-koeln.de> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I have this script where I use the lookup operator to perform a unary 
>>>>> lookup:
>>>>> 
>>>>> xquery version "3.1";
>>>>> declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
>>>>> 
>>>>> let $array := []
>>>>> 
>>>>> return (
>>>>>   empty($array!?*),   (: returns false :)
>>>>>   empty(
>>>>>     for $i in 1 to array:size($array)
>>>>>     return $array($i)
>>>>>   )  (: returns true  :)
>>>>> )
>>>>> I'm

Re: [basex-talk] Unexpected unary lookup result

2018-05-11 Thread Giuseppe Celano

Hi Sebastian,

With BaseX 9.0.1 and 8.6.7, I get "true" twice.

Best,
Giuseppe

> On May 11, 2018, at 1:50 AM, Sebastian Zimmer  
> wrote:
> 
> Hi,
> 
> I have this script where I use the lookup operator to perform a unary lookup:
> 
> xquery version "3.1";
> declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
> 
> let $array := []
> 
> return (
>   empty($array!?*),   (: returns false :)
>   empty(
>     for $i in 1 to array:size($array)
>     return $array($i)
>   )  (: returns true  :)
> )
> I'm curious that the first expression returns false even though it should be 
> equivalent to the second expression, if I read the XQuery spec [1] right:
> 
> If the context item is an array:
> If the KeySpecifier is a wildcard ("*"), the UnaryLookup operator is
> equivalent to the following expression:
> for $k in 1 to array:size(.)
> return .($k)
> But maybe I'm missing something. I'd be glad if you could help.
> Best regards,
> Sebastian Zimmer
> [1] https://www.w3.org/TR/2017/REC-xquery-31-20170321/#id-unary-lookup 
> 
> -- 
> Sebastian Zimmer
> sebastian.zim...@uni-koeln.de
> Cologne Center for eHumanities
> DH Center at the University of Cologne
> @CCeHum
>

Re: [basex-talk] database creation baseX.8.6.7

2018-04-23 Thread Giuseppe Celano

Yes, I can confirm it takes a bit longer than running the command via the GUI.


> On Apr 23, 2018, at 5:03 PM, Christian Grün <christian.gr...@gmail.com> wrote:
> 
> Hi Fabrice,
> 
>> I am a curious man,
> 
> …a good thing ;)
> 
>> Is the Pending Update List bypassed in some way when using addcache ?
> 
> If addcache is set, the nodes to be added will be written to a
> temporary database on disk. This way you can circumvent the main
> memory limitations, but it takes a bit longer (this is why it’s an
> optional feature).
> 
> Cheers,
> Christian
> 
> 
> 
>> -----Original Message-----
>> From: basex-talk-boun...@mailman.uni-konstanz.de
>> [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Giuseppe
>> Celano
>> Sent: Monday, April 23, 2018 16:53
>> To: Christian Grün
>> Cc: BaseX
>> Subject: Re: [basex-talk] database creation baseX.8.6.7
>> 
>> Yes, with "addcache" it works! Thanks to both of you.
>> 
>> 
>>> On Apr 23, 2018, at 3:56 PM, Christian Grün <christian.gr...@gmail.com> 
>>> wrote:
>>> 
>>> Hi Giuseppe,
>>> 
>>> Apart from Fabrice’s helpful hints, you could try to enable the
>>> ADDCACHE option [1].
>>> 
>>> Cheers,
>>> Christian
>>> 
>>> [1] http://docs.basex.org/wiki/Database_Module#db:create
>>> 
>>> On Mon, Apr 23, 2018 at 3:03 PM, Giuseppe Celano
>>> <cel...@informatik.uni-leipzig.de> wrote:
>>>> Hi All,
>>>> 
>>>> I can create a database via the GUI, but if I use db:create [1] I get the 
>>>> message "out of main memory": why? Thanks!
>>>> 
>>>> db:create("myDB",
>>>> "sourceDirectory",
>>>> "destinationDirectory",
>>>> map{"ftindex": true(), "language": false()}
>>>>)
>>>> 
>>>> Best,
>>>> Giuseppe
>>>> 
>>> 
>> 
>

Re: [basex-talk] database creation baseX.8.6.7

2018-04-23 Thread Giuseppe Celano

Yes, with "addcache" it works! Thanks to both of you.
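For reference, a sketch of the call with the ADDCACHE option enabled, as Christian suggested; the database name and paths are the placeholders from the original message, and the other options are left out for brevity. According to Christian, enabling addcache writes the input to a temporary database on disk first, which avoids the main-memory limit but makes creation somewhat slower.

```xquery
(: Create "myDB" from the files in "sourceDirectory", adding them under the
   database path "destinationDirectory". With addcache enabled, the input is
   cached on disk before it is added, instead of being kept in main memory. :)
db:create(
  "myDB",
  "sourceDirectory",
  "destinationDirectory",
  map { "ftindex": true(), "addcache": true() }
)
```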


> On Apr 23, 2018, at 3:56 PM, Christian Grün <christian.gr...@gmail.com> wrote:
> 
> Hi Giuseppe,
> 
> Apart from Fabrice’s helpful hints, you could try to enable the
> ADDCACHE option [1].
> 
> Cheers,
> Christian
> 
> [1] http://docs.basex.org/wiki/Database_Module#db:create
> 
> On Mon, Apr 23, 2018 at 3:03 PM, Giuseppe Celano
> <cel...@informatik.uni-leipzig.de> wrote:
>> Hi All,
>> 
>> I can create a database via the GUI, but if I use db:create [1] I get the 
>> message "out of main memory": why? Thanks!
>> 
>> db:create("myDB",
>> "sourceDirectory",
>> "destinationDirectory",
>> map{"ftindex": true(), "language": false()}
>> )
>> 
>> Best,
>> Giuseppe
>> 
>

[basex-talk] database creation baseX.8.6.7

2018-04-23 Thread Giuseppe Celano

Hi All,

I can create a database via the GUI, but if I use db:create [1] I get the 
message "out of main memory": why? Thanks! 

db:create("myDB",
"sourceDirectory",
"destinationDirectory",
map{"ftindex": true(), "language": false()}
 )

Best,
Giuseppe

Re: [basex-talk] Unicode problem with database

2018-04-18 Thread Giuseppe Celano

Thanks for such a quick fix!

Best,
Giuseppe

Universität Leipzig
Institute of Computer Science, Digital Humanities
Augustusplatz 10
04109 Leipzig
Deutschland
E-mail: cel...@informatik.uni-leipzig.de
E-mail: giuseppegacel...@gmail.com
Web site 1: http://www.dh.uni-leipzig.de/wo/team/
Web site 2: https://sites.google.com/site/giuseppegacelano/

> On Apr 18, 2018, at 7:04 PM, Christian Grün <christian.gr...@gmail.com> wrote:
> 
> Hi Giuseppe,
> 
> That’s a critical bug indeed, which we stumbled upon by ourselves last
> weekend: Short strings with non-ASCII Unicode characters were wrongly
> compressed, due to a buggy integer shift of a signed byte value [1].
> 
> A new snapshot is available [2]. Affected databases will need to be
> recreated (sorry). BaseX 9.0.1 will be released next week.
> 
> Thanks for letting us know!
> Christian
> 
> [1] 
> https://github.com/BaseXdb/basex/commit/9882669ad7b65bd51bc1d720c44d7c97df4685ff
> [2] http://files.basex.org/releases/8.6.7/
> 
> 
> 
> On Wed, Apr 18, 2018 at 3:54 PM, Giuseppe Celano
> <cel...@informatik.uni-leipzig.de> wrote:
>> Hi,
>> 
>> I tried to create a database containing the attached file, and some Unicode
>> characters get converted into "carriage return" characters. Any hint?
>> Thanks!
>> 
>> Ciao,
>> Giuseppe
>> 
>> 
>> 
>> 
>> Universität Leipzig
>> Institute of Computer Science
>> Augustusplatz 10
>> 04109 Leipzig
>> Deutschland
>> E-mail: cel...@informatik.uni-leipzig.de
>> E-mail: giuseppegacel...@gmail.com
>> Web site 1: http://www.dh.uni-leipzig.de/wo/team/
>> Web site 2: https://sites.google.com/site/giuseppegacelano/
>> 
>> 
>

[basex-talk] Collection function

2018-04-17 Thread Giuseppe Celano

Hi,

It seems there is an error with the collection function. Something like this:

collection("directory")[5]

returns nothing in BaseX 9.0, although it does return a result in 8.6.7.

Best,
Giuseppe


Re: [basex-talk] GUI

2018-04-16 Thread Giuseppe Celano

Yes, I can now see it! Thanks!

> On Apr 16, 2018, at 11:05 AM, Andy Bunce <bunce.a...@gmail.com> wrote:
> 
> Hi Giuseppe,
> 
> It has been moved: it is now the first button on the toolbar, "New" (or Ctrl-T).
> 
> /Andy
> 
> On 16 April 2018 at 09:55, Giuseppe Celano <cel...@informatik.uni-leipzig.de> wrote:
> I see that in version 9.0 the "+" button to add a new tab is missing. I think
> it was very useful: could it be reintroduced in a future release?
> 
> Best,
> Giuseppe
> 
>

[basex-talk] GUI

2018-04-16 Thread Giuseppe Celano

I see that in version 9.0 the "+" button to add a new tab is missing. I think it 
was very useful: could it be reintroduced in a future release?

Best,
Giuseppe


Re: [basex-talk] child node problem

2018-04-04 Thread Giuseppe Celano

Thanks for this quick reply!

> On Apr 4, 2018, at 10:58 AM, Christian Grün <christian.gr...@gmail.com> wrote:
> 
> Hi Giuseppe,
> 
> A bug fix is available for the issue you’ve reported [1].
> 
> If you don’t want to switch over to the snapshot, you can rewrite your
> query as follows until 9.0.1 is released:
> 
>  for $ee in collection("my-path-to-files")
>  where $ee//case/aspect-values/@sign = "yes"
>  return $ee
> 
> Hope this helps, thanks for the kudos,
> Christian
> 
> [1] http://files.basex.org/releases/latest/
> 
> 
> 
> On Tue, Mar 27, 2018 at 6:49 PM, Giuseppe Celano
> <cel...@informatik.uni-leipzig.de> wrote:
>> Hi All,
>> 
>> Thanks for this new release, which looks great!
>> 
>> I have found a problem though (see error message below), when running a 
>> query like:
>> 
>> for $ee in collection("my-path-to-files")
>> where $ee//case[./aspect-values[@sign = "yes"]]
>> return
>> $ee
>> 
>> This works in version 8.6.7. The problem seems to be [./aspect-values[@sign = "yes"]]:
>> if I change it to [//aspect-values[@sign = "yes"]], it also works fine in 9.0.
>> Any idea? Thanks.
>> 
>> Best,
>> Giuseppe
>> 
>> ERROR MESSAGE
>> 
>> Error:
>> Improper use? Potential bug? Your feedback is welcome:
>> Contact: basex-talk@mailman.uni-konstanz.de
>> Version: BaseX 9.0
>> Java: Oracle Corporation, 1.8.0_151
>> OS: Mac OS X, x86_64
>> Stack Trace:
>> java.lang.ClassCastException: org.basex.query.value.item.Dummy cannot be cast to org.basex.query.value.node.DBNode
>>     at org.basex.query.expr.path.InvDocTest.get(InvDocTest.java:56)
>>     at org.basex.query.expr.path.Path.index(Path.java:692)
>>     at org.basex.query.expr.path.Path.optimize(Path.java:174)
>>     at org.basex.query.expr.path.Path.optimize(Path.java:180)
>>     at org.basex.query.expr.path.Path.compile(Path.java:157)
>>     at org.basex.query.expr.gflwor.Where.compile(Where.java:53)
>>     at org.basex.query.expr.gflwor.Where.compile(Where.java:1)
>>     at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:101)
>>     at org.basex.query.scope.MainModule.comp(MainModule.java:83)
>>     at org.basex.query.QueryCompiler.compile(QueryCompiler.java:114)
>>     at org.basex.query.QueryCompiler.compile(QueryCompiler.java:105)
>>     at org.basex.query.QueryContext.compile(QueryContext.java:299)
>>     at org.basex.query.QueryProcessor.compile(QueryProcessor.java:79)
>>     at org.basex.core.cmd.AQuery.query(AQuery.java:78)
>>     at org.basex.core.cmd.XQuery.run(XQuery.java:22)
>>     at org.basex.core.Command.run(Command.java:257)
>>     at org.basex.core.Command.execute(Command.java:93)
>>     at org.basex.gui.GUI.exec(GUI.java:427)
>>     at org.basex.gui.GUI.lambda$4(GUI.java:370)
>>     at java.lang.Thread.run(Thread.java:748)
>> Compiling:
>> - pre-evaluate fn:collection([uri]) to document-node() sequence: collection("/Users/mycomputer/Documents/... -> (db:open-pre("documents", 0), ...)
>> - rewrite cached path to = operator: *:aspect-values -> (*:aspect-values/@*:sign = "yes")
>> - rewrite cached path to = operator: *:aspect-values -> (*:aspect-values/@*:sign = "yes")
>> - rewrite descendant-or-self step(s)
>> - apply attribute index for "yes"
>> Optimized Query:
>> for $ee_0 in (db:open-pre("documents", 0), ...) where $ee_0/descendant-or-self::node()/descendant::*:case[(*:aspect-values/@*:sign = "yes")] return $ee_0
>> Query:
>> for $ee in collection("/Users/mycomputer/Documents/act/03_p-valuesAfterReductionToPerfImp/documents") where $ee//case[./aspect-values[@sign = "yes"]] return $ee
>> 
>> 
>> Universität Leipzig
>> Institute of Computer Science, Digital Humanities
>> Augustusplatz 10
>> 04109 Leipzig
>> Deutschland
>> E-mail: cel...@informatik.uni-leipzig.de
>> E-mail: giuseppegacel...@gmail.com
>> Web site 1: http://www.dh.uni-leipzig.de/wo/team/
>> Web site 2: https://sites.google.com/site/giuseppegacelano/
>> 
>

[basex-talk] child node problem

2018-03-27 Thread Giuseppe Celano

Hi All,

Thanks for this new release, which looks great!

I have found a problem though (see error message below), when running a query 
like:

for $ee in collection("my-path-to-files")
where $ee//case[./aspect-values[@sign = "yes"]]
return
$ee

This works in version 8.6.7. The problem seems to be [./aspect-values[@sign = 
"yes"]]: if I change it to [//aspect-values[@sign = "yes"]], it also works fine 
in 9.0. Any idea? Thanks.

Best,
Giuseppe

ERROR MESSAGE

Error:
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 9.0
Java: Oracle Corporation, 1.8.0_151
OS: Mac OS X, x86_64
Stack Trace: 
java.lang.ClassCastException: org.basex.query.value.item.Dummy cannot be cast to org.basex.query.value.node.DBNode
at org.basex.query.expr.path.InvDocTest.get(InvDocTest.java:56)
at org.basex.query.expr.path.Path.index(Path.java:692)
at org.basex.query.expr.path.Path.optimize(Path.java:174)
at org.basex.query.expr.path.Path.optimize(Path.java:180)
at org.basex.query.expr.path.Path.compile(Path.java:157)
at org.basex.query.expr.gflwor.Where.compile(Where.java:53)
at org.basex.query.expr.gflwor.Where.compile(Where.java:1)
at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:101)
at org.basex.query.scope.MainModule.comp(MainModule.java:83)
at org.basex.query.QueryCompiler.compile(QueryCompiler.java:114)
at org.basex.query.QueryCompiler.compile(QueryCompiler.java:105)
at org.basex.query.QueryContext.compile(QueryContext.java:299)
at org.basex.query.QueryProcessor.compile(QueryProcessor.java:79)
at org.basex.core.cmd.AQuery.query(AQuery.java:78)
at org.basex.core.cmd.XQuery.run(XQuery.java:22)
at org.basex.core.Command.run(Command.java:257)
at org.basex.core.Command.execute(Command.java:93)
at org.basex.gui.GUI.exec(GUI.java:427)
at org.basex.gui.GUI.lambda$4(GUI.java:370)
at java.lang.Thread.run(Thread.java:748)
Compiling:
- pre-evaluate fn:collection([uri]) to document-node() sequence: 
collection("/Users/mycomputer/Documents/... -> (db:open-pre("documents", 0), 
...)
- rewrite cached path to = operator: *:aspect-values -> 
(*:aspect-values/@*:sign = "yes")
- rewrite cached path to = operator: *:aspect-values -> 
(*:aspect-values/@*:sign = "yes")
- rewrite descendant-or-self step(s)
- apply attribute index for "yes"
Optimized Query:
for $ee_0 in (db:open-pre("documents", 0), ...) where 
$ee_0/descendant-or-self::node()/descendant::*:case[(*:aspect-values/@*:sign = 
"yes")] return $ee_0
Query:
for $ee in 
collection("/Users/mycomputer/Documents/act/03_p-valuesAfterReductionToPerfImp/documents")
 where $ee//case[./aspect-values[@sign = "yes"]] return $ee


Universität Leipzig
Institute of Computer Science, Digital Humanities
Augustusplatz 10
04109 Leipzig
Deutschland
E-mail: cel...@informatik.uni-leipzig.de
E-mail: giuseppegacel...@gmail.com
Web site 1: http://www.dh.uni-leipzig.de/wo/team/
Web site 2: https://sites.google.com/site/giuseppegacelano/

[basex-talk] update Java

2018-01-15 Thread Giuseppe Celano

Hi,

I am writing to ask whether it is now advisable to update to Java 9 while using 
BaseX 8.6.x. Thanks.

Best,
Giuseppe

[basex-talk] XPath generator

2017-11-17 Thread Giuseppe Celano

Hi All,

I would like to ask what the best way is in BaseX to create XPath expressions 
once I identify a certain span in an XML file. More concretely, I usually 
tokenize a text contained in an XML document, and I would like to specify for 
each token its position in the original document. Thanks.

Best,
Giuseppe
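
A possible starting point, not taken from the thread: the standard XQuery 3.1 function fn:path() returns an XPath-like locator for any node, and that locator could be stored next to each token. The small document below is a made-up stand-in for a real tokenized text. Character offsets inside a single text node would need extra work that this sketch does not attempt.

```xquery
(: Made-up stand-in for a tokenized document. :)
declare variable $doc := document {
  <text><s><t>ΠΕΡΙ</t><t>ΜΕΝ</t></s></text>
};

(: fn:path returns a locator such as /Q{}text[1]/Q{}s[1]/Q{}t[2],
   which is itself a valid XPath 3.0 expression for that node. :)
for $t in $doc//t
return fn:path($t) || " " || string($t)
```

With the sample data, the first result should be /Q{}text[1]/Q{}s[1]/Q{}t[1] ΠΕΡΙ. The Q{} prefix is the (empty) namespace URI of each element.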

Re: [basex-talk] db:text() vs XPath

2017-09-19 Thread Giuseppe Celano

Yes, this works!

Thanks,
Giuseppe

> On Sep 19, 2017, at 5:48 PM, Christian Grün  wrote:
> 
> for $u in db:open("db1")/text/s/t
>  group by $k := $u/f || "#" || $u/@o
>  where db:open("db2")/text/line[text() = string($k)]
>  let $n := count($u)
>  order by $n descending
>  return $k || " " || $n

Re: [basex-talk] db:text() vs XPath

2017-09-19 Thread Giuseppe Celano

Hi Christian,

It works only if I substitute your where clause with 

where db:text("db2", $k)

Ciao,
Giuseppe


> On Sep 19, 2017, at 4:15 PM, Christian Grün  wrote:
> 
> where db:open("db2")/text/line[text() = $k]

[basex-talk] db:text() vs XPath

2017-09-19 Thread Giuseppe Celano

I am using BaseX 8.6.4 and I am trying to do a group-by/order-by operation, and 
I see that two logically equivalent queries perform very differently: one never 
finishes in a reasonable time, while the other does (and fast). I can provide 
further details if necessary, but these are the queries (look at the last line 
of each, where the difference is):

(This does not work:)

declare variable $p := db:open("db2")/text/line/text(); (: returns a list of values like ΠΕΡΙ#n-s---mv- :)

for $u in db:open("db1")/text/s/t
let $a := $u/f/text()          (: returns one value like ΠΕΡΙ :)
let $b := $u/@o/data(.)        (: returns one value like n-s---mv- :)
group by $k := $a || "#" || $b (: builds a value like ΠΕΡΙ#n-s---mv- :)
let $n := count($u)
order by $n descending
return
if ($p = $k) then $k || " " || $n  else ()

(This works:)

for $u in db:open("db1")/text/s/t
let $a := $u/f/text()          (: returns one value like ΠΕΡΙ :)
let $b := $u/@o/data(.)        (: returns one value like n-s---mv- :)
group by $k := $a || "#" || $b (: builds a value like ΠΕΡΙ#n-s---mv- :)
let $n := count($u)
order by $n descending
return
if (db:text("db2", $k)) then $k || " " || $n  else ()

The difference lies in accessing the database directly with db:text() versus 
indirectly via XPath. I would prefer to use XPath and would expect it to be 
translated into db:text() internally, but this does not seem to happen: why?

Best,
Giuseppe
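
One plausible reason for the gap, offered as an assumption rather than a diagnosis: in the first query, $p = $k is a general comparison that may be evaluated against every value in $p once per group. A sketch that avoids both that scan and repeated index lookups builds a map from the db2 values once, the technique Christian recommends in the "Join operation and the database" thread. The inline documents are made-up stand-ins for db:open("db1") and db:open("db2"). The lookup table is a global variable on purpose: a let bound before group by would be rebound to a sequence of maps after grouping.

```xquery
(: Made-up stand-ins for db:open("db1") and db:open("db2");
   element names follow the message above. :)
declare variable $db1 := document {
  <text><s>
    <t o="n-s---mv-"><f>ΠΕΡΙ</f></t>
    <t o="n-s---mv-"><f>ΠΕΡΙ</f></t>
    <t o="v3saia---"><f>ΕΙΠΕ</f></t>
  </s></text>
};
declare variable $db2 := document { <text><line>ΠΕΡΙ#n-s---mv-</line></text> };

(: Build the set of reference values once, as map keys. :)
declare variable $known := map:merge(
  for $v in distinct-values($db2/text/line/string())
  return map:entry($v, true())
);

for $u in $db1/text/s/t
group by $k := $u/f || "#" || $u/@o
where map:contains($known, $k)   (: one hash lookup per group :)
let $n := count($u)
order by $n descending
return $k || " " || $n
```

With the sample data this should return the single string ΠΕΡΙ#n-s---mv- 2, since ΕΙΠΕ#v3saia--- is not among the db2 values.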

Re: [basex-talk] Reading JSON

2017-08-15 Thread Giuseppe Celano

Hi Christian,

I confirm that the command you used always yields very similar timings, even 
when I repeat it manually 3 or 4 times (i.e., without "-r"). However, if I run 
each of the queries manually a few times from the GUI, I get differences (total 
time from the Query Info panel, for a 333 KB file):

parse-json(file:read-text('text.txt')) = around 40 ms
json-doc('text.txt') = around 80 ms
parse-json(unparsed-text('text.txt')) = around 120 ms

Looking at the values reported in the Query Info Panel, the "compiling time" 
seems to be responsible for that.

Best,
Giuseppe

> On Aug 15, 2017, at 1:28 AM, Christian Grün <christian.gr...@gmail.com> wrote:
> 
> Hi Giuseppe,
> 
> I did some tests on command-line:
> 
>  basex -v -z -r100 "parse-json(unparsed-text('example.json'))"
>  basex -v -z -r100 "parse-json(file:read-text('example.json'))"
>  basex -v -z -r100 "json-doc('example.json')"
> 
> I tested the calls with a small and a large file (10 KB, 1.5 MB), and
> evaluation times were very similar, so I guess I need some more input
> to reproduce your results.
> 
> Best,
> Christian

Re: [basex-talk] HTTP module

2017-08-14 Thread Giuseppe Celano

Thanks, Andy. I have also tried to invoke curl via proc:execute():

proc:execute("curl", ("-F", "data=@example.txt", "-F", "tagger=", "-F", 
"parser=", "http://lindat.mff.cuni.cz/services/udpipe/api/process"))

The function works, but unfortunately the text inside the file is not 
recognized as UTF-8, so I get a lot of gibberish in the result. At first I 
thought it was due to my MacOS configuration, but after experimenting a lot, 
the problem seems to depend on BaseX.

I run the basexgui (and basex) scripts from the bin folder in my Terminal 
window, so they should inherit the environment variables (and indeed 
proc:execute("locale") also shows the right UTF-8 values).

I will open a Github issue, unless I am missing something here.

Re: [basex-talk] HTTP module

2017-08-14 Thread Giuseppe Celano

Thanks, Kendall, I tried but it does not work :( 

> On Aug 14, 2017, at 6:53 PM, Kendall Shaw  wrote:
> 
> http:send-request( href='http://lindat.mff.cuni.cz/services/udpipe/api/process'>
> 
> tokenizer&amp;tagger&amp;parser&amp;data=Děti pojedou k babičce. Už se těší.
> 
> )

Re: [basex-talk] Reading JSON

2017-08-14 Thread Giuseppe Celano

Hi Christian,

The latter option. I just opened a file and run the same query repeatedly. It 
is not an in-depth comparison at all, but the times shown in the Query Info 
were clearly different (even if just ms).

Best,
Giuseppe



> On Aug 14, 2017, at 5:55 PM, Christian Grün  wrote:
> 
> Hi Giuseppe,
> 
> The semantics of the used functions are slightly different, but I
> guess the differences in terms of performance should be rather
> marginal. Just in case: Have you done some in-depth comparisons
> that I could have a look at? Did you work with large input files, or
> did you run the queries repeatedly and look at the average times?
> 
> Cheers,
> Christian

Re: [basex-talk] HTTP module

2017-08-14 Thread Giuseppe Celano

Hi Andy,

Thanks for the help. If I pass an entire text via the URL, though, I get the 
error message "Request-URI Too Large".

Best,
Giuseppe

> On 14 Aug 2017, at 14:11, Giuseppe Celano <cel...@informatik.uni-leipzig.de> 
> wrote:
> 
> Hi,
> 
> I am accessing a RESTful API via the following command:
> 
> curl -F data=@example.txt -F tokenizer= -F tagger= -F parser=
> http://lindat.mff.cuni.cz/services/udpipe/api/process > example2.txt
> 
> I am wondering what the best way is to do that in BaseX. The service also has 
> a URL syntax, as shown in the following example:
> 
> http://lindat.mff.cuni.cz/services/udpipe/api/process?tokenizer=D%C4%9Bti%20pojedou%20k%20babi%C4%8Dce.%20U%C5%BE%20se%20t%C4%9B%C5%A1%C3%AD.
> 
> I have tried:
> 
> http:send-request( href='http://lindat.mff.cuni.cz/services/udpipe/api/process?tokenizer&amp;tagger&amp;parser&amp;data=D%C4%9Bti%20pojedou%20k%20babi%C4%8Dce.%20U%C5%BE%20se%20t%C4%9B%C5%A1%C3%AD'/>)
> 
> and it works perfectly, but when I try to put the body of the request into
> http:body, it does not work:
> 
> http:send-request( href='http://lindat.mff.cuni.cz/services/udpipe/api/process'>
> 
> tokenizer&amp;tagger&amp;parser&amp;data=Děti pojedou k babičce. Už se těší.
> 
> )
> 
> I could invoke the curl command in BaseX, but maybe there is a more elegant
> way to send the content of the file than adding it to the URL. Thanks.
> 
> Ciao,
> Giuseppe
> 

[basex-talk] Reading JSON

2017-08-14 Thread Giuseppe Celano

Hi,

I have noticed different speeds when running the following functions (from 
slowest to fastest):

parse-json(unparsed-text('example.txt'))
json-doc("example.txt")
parse-json(file:read-text('example.txt'))

similarly for documents on the web:

json-doc('http://example.com/text')
parse-json(fetch:text('http://example.com/text'))

Does this make sense to you? Is there any recommendation to follow? Speed 
differences aside, I admit I love writing a single function call (json-doc) to 
get the map conversion. Everything is so immediate :)

Best,
Giuseppe





[basex-talk] HTTP module

2017-08-14 Thread Giuseppe Celano

Hi,

I am accessing a RESTful API via the following command:

curl -F data=@example.txt  -F tokenizer= -F tagger= -F parser= 
http://lindat.mff.cuni.cz/services/udpipe/api/process > example2.txt

I am wondering what the best way is to do that in BaseX. The service also has a 
URL syntax, as shown in the following example:

http://lindat.mff.cuni.cz/services/udpipe/api/process?tokenizer=D%C4%9Bti%20pojedou%20k%20babi%C4%8Dce.%20U%C5%BE%20se%20t%C4%9B%C5%A1%C3%AD.
 

  

I have tried:

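http:send-request( href='http://lindat.mff.cuni.cz/services/udpipe/api/process?tokenizer&amp;tagger&amp;parser&amp;data=D%C4%9Bti%20pojedou%20k%20babi%C4%8Dce.%20U%C5%BE%20se%20t%C4%9B%C5%A1%C3%AD'/>)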

and it works perfectly, but when I try to put the body of the request into 
http:body, it does not work:

http:send-request( href='http://lindat.mff.cuni.cz/services/udpipe/api/process'>

tokenizer&amp;tagger&amp;parser&amp;data=Děti pojedou k babičce. Už se těší.

)

I could invoke the curl command in BaseX, but maybe there is a more elegant way 
to send the content of the file than adding it to the URL. Thanks.

Ciao,
Giuseppe
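
A sketch of the POST variant using the EXPath HTTP Client functions that BaseX provides, sending the text as a form-encoded body instead of in the URL (which would also sidestep the "Request-URI Too Large" error reported in the follow-up). It assumes that the UDPipe endpoint accepts application/x-www-form-urlencoded POST requests; local:form-body is a hypothetical helper, and example.txt is the file from the curl call. file:read-text decodes the file as UTF-8 unless another encoding is given.

```xquery
(: Hypothetical helper: builds the form body, mirroring the curl call
   (empty tokenizer/tagger/parser fields plus the data field). :)
declare function local:form-body($text as xs:string) as xs:string {
  "tokenizer=&amp;tagger=&amp;parser=&amp;data=" || encode-for-uri($text)
};

let $text := file:read-text("example.txt")
(: returns the http:response element, followed by the response body :)
return http:send-request(
  <http:request method="post"
                href="http://lindat.mff.cuni.cz/services/udpipe/api/process">
    <http:body media-type="application/x-www-form-urlencoded" method="text">{
      local:form-body($text)
    }</http:body>
  </http:request>
)
```

encode-for-uri produces the same percent-encoding as the URL in the message (for example Děti becomes D%C4%9Bti), so the service should receive the same parameters as in the working GET request.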


Re: [basex-talk] Differences in serialization of arrays with JSON vs. adaptive methods

2017-08-10 Thread Giuseppe Celano

Hi Joe,

I am happy to hear you are also spreading the word! XQuery has a very clean 
data model, and BaseX has implemented and extended the language so efficiently 
and elegantly.

Best,
Giuseppe 

> On 10 Aug 2017, at 16:35, Joe Wicentowski  wrote:
> 
> Hi all,
> 
> First, I'm just back from DH2017, where Clifford Anderson and I taught two 
> workshops on XQuery using BaseX, along with eXist and Saxon.  BaseX performed 
> like a champ.  We were able to configure the GUI window to show just the 
> query and results windows—perfect when you're projecting the screen in a 
> large room and want everyone to see.  Many thanks for such a great teaching 
> tool!  (Our materials are at
> https://github.com/CliffordAnderson/XQuery4Humanists.)
> 
> Back to the topic of this post, though, I noticed a slight difference between 
> BaseX's serialization of arrays when using JSON vs. adaptive methods: with 
> JSON, the array's items are separated by newlines, whereas with adaptive, the 
> items are separated by spaces.  This is interesting since the serialization 
> spec notes that the adaptive method delegates the handling of the "indent" 
> parameter to JSON.  Some code to reproduce this is below.
> 
> I'm curious to know - is there a particular reason for this difference?
> 
> Thanks,
> Joe
> 
> 
> serialization-test.xq
> ```xquery
> xquery version "3.1";
> 
> declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
> let $array := ["Cheapside","London","Dean Prior","Devon"]
> for $method in ("json", "adaptive")
> let $serialization-parameters :=
>   <output:serialization-parameters>
>     <output:method value="{$method}"/>
>     <output:indent value="yes"/>
>   </output:serialization-parameters>
> return
>   fn:serialize($array, $serialization-parameters)
> ```
> 
> serialization-test_results.txt
> ```txt
> [
>   "Cheapside",
>   "London",
>   "Dean Prior",
>   "Devon"
> ]
> ["Cheapside", "London", "Dean Prior", "Devon"]
> ```

Re: [basex-talk] Join operation and the database

2017-07-27 Thread Giuseppe Celano

Hi Christian,

Let's recapitulate:

If I compare values using just one indexed string (the one in @v), this is the 
fastest way (about one second on my machine).

If I compare against two distinct indexed values, their order matters, because 
(if I understand correctly) the database uses the index only (?) for the first 
comparison.

I see that [p = $t/@o and f = $t] is much slower than [f = $t and p = $t/@o]. I 
calculated that on average f contains about 8 characters, while p always 
contains 9. However, the (Ancient Greek) characters in f are heavier (2 or 3 
bytes each) than the (Latin) ones in p (1 byte each). Can this be the reason 
why [f = $t and p = $t/@o] is evaluated faster?

Best,
Giuseppe
 

> On 27 Jul 2017, at 14:31, Christian Grün <christian.gr...@gmail.com> wrote:
> 
> Hi Giuseppe,
> 
> Thanks for the new query.
> 
> If you have a look at the query info, you will see that your query
> will in fact be rewritten to take advantage of the index structures:
> 
>  for $t_2 in document-node {"tlg0001.tlg001.perseus-grc2.xml"}/*:text/*:s/*:t
>  return db:text("splitted-db",
> $t_2/@*:o)/parent::*:p/parent::*:d[(*:f = $t_2/text())]
> 
> As your input document contains 45,667 texts, however, 45,667 index
> lookups will need to be performed, and this can take a while if the
> index results have a low selectivity.
> 
> However, there’s a chance to speed up your query. You have two
> competing index candidates:
> 
>  let $match := $lemm//d[./p = $t/@o and ./f = $t/text()]
> 
> As it is not possible to statically assess which one will be faster,
> the first candidate will be rewritten to an index request. In your
> specific case, you will get much better performance by moving the second
> comparison to the first place:
> 
>  let $match := $lemm//d[./f = $t/text() and ./p = $t/@o]
> 
> Here is a short version of your query that takes around 10 seconds on
> my machine (it doesn’t really matter if you move the tests in separate
> predicates):
> 
>  declare variable $txts := doc("tlg0001.tlg001.perseus-grc2.xml");
>  declare variable $lemm := db:open("splitted-db");
>  for $t in $txts//t
>  return $lemm//d[f = $t][p = $t/@o]
> 
> One obvious alternative (that we already discussed offline) is to
> store repeatedly accessed values in a map. This way, you can get
> evaluation times less than a second.
> 
> Hope this helps,
> Christian
> 
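
The map-based alternative Christian mentions could look roughly like the sketch below; it is not his code. The inline documents are made-up stand-ins for doc("tlg0001.tlg001.perseus-grc2.xml") and db:open("splitted-db") (tags and lemmas are invented), and it assumes that each d entry has exactly one p (POS tag) and one f (word form), so that the key mirrors the "POS#form" layout of @v.

```xquery
(: Made-up stand-ins for the text file and the dictionary database. :)
declare variable $txts := document {
  <text><s><t o="c--------">καί</t><t o="x--------">οὐδέν</t></s></text>
};
declare variable $lemm := document {
  <dict><d><f>καί</f><p>c--------</p><l>καί</l></d></dict>
};

(: Build the lookup table once, keyed like @v ("POS#form");
   each value holds every d entry that shares the key. :)
declare variable $lookup := map:merge(
  for $d in $lemm//d
  group by $key := $d/p || "#" || $d/f
  return map:entry($key, $d)
);

(: One map access per token instead of one index lookup per token. :)
for $t in $txts//t
return $lookup($t/@o || "#" || $t)
```

Grouping before map:entry keeps all d entries with the same form and tag (for example, different lemmas) under one key, so a single lookup can return several entries; tokens without a match simply yield the empty sequence.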

Re: [basex-talk] Join operation and the database

2017-07-27 Thread Giuseppe Celano

Hi Christian,

These are the queries:

(: This works :)

declare variable $txts := doc("tlg0001.tlg001.perseus-grc2.xml");
declare variable $lemm := db:open("splitted-db"); (: see link sent earlier :)
for $t in $txts//t
let $match := $lemm//d[./@v = $t/@o || "#" || $t/text()]
return
$match


(: This does not work :)

declare variable $txts := doc("tlg0001.tlg001.perseus-grc2.xml");
declare variable $lemm := db:open("splitted-db");
for $t in $txts//t
let $match := $lemm//d[./p = $t/@o and ./f = $t/text()]
return
$match
   
   




[basex-talk] Join operation and the database

2017-07-27 Thread Giuseppe Celano

Hi,

I performed join operations between many files and a dictionary. The files 
contain tokenized texts, where one finds word forms + fine-grained POS tags. 
Look at the following file:

https://raw.githubusercontent.com/gcelano/POStaggedAncientGreekXML/master/texts/tlg0001.tlg001.perseus-grc2.xml
 


The dictionary, which contains word forms + fine-grained POS tags + lemmas, can 
be found here:

https://github.com/gcelano/LemmatizedAncientGreekXML/tree/master/uniqueTokens/values
 


I created a database for the dictionary and wrote a query (here simplified) 
like the following:

for $t in $s/t (: t are the tokens in the file containing the tokens :)
let $match := $lemm//d[./p = $t/@o and ./f = $t/text()] (: $lemm//d are the single entries in the dictionary :)
return
$match

I see that this query is slow, as if the processor could not use the database 
indexes for ./p and ./f. The situation does not seem to improve with 
./p/text() and ./f/text(), which I would assume to be equivalent to the former 
because of atomization. Conversely, if the information contained in ./p and ./f 
is merged and stored in an attribute (see @v in the dictionary files) and this 
is compared against the correspondingly concatenated values from the text, the 
join is super fast (i.e., BaseX uses the index for the attribute values).

Does anyone know why? I have been able to get my results via the above (slow) 
comparison, but I would like to know what the cause of the problem was, if 
possible. Thanks.

Best,
Giuseppe
