Re: [basex-talk] RESTXQ - java.lang.OutOfMemoryError: Java heap space

2020-05-29 Thread Sebastian Guerrero
Hi Christian,

Thank you for your reply, you always help me.

*- Can you share the query with us?* Yes, of course.

The query is pretty much the same one that I sent to Fabrice:

*(# db:copynode false #) {*
*  for $case in
doc('file_A.xml')/trademark-applications-daily/application-information/file-segments/action-keys/case-file[case-file-header/status-code=(410,413,616,620,624,625,630,631,638,640,641,642,643,644,645,646,647,648,649,650,651,652,653,654,655,656,657,658,659,660,661,663,664,665,666,667,668,672,680,681,682,686,688,689,690,692,693,694,700,701,702,703,704,705,706,707,708,717,718,719,720,721,722,724,725,730,731,732,733,734,739,740,744,745,746,748,752,753,756,757,760,762,763,764,765,766,771,772,773,774,775,777,778,779,780,790,794,800,801,802,803,804,806,807,808,809,810,811,812,813,814,815,816,817,818,819,820,821,822,823,824,825,969,973)]
 *
*  return insert node $case as last into doc("US")*
*}*


And you're totally right about the security concerns: I don't give anyone
else access to the RESTXQ endpoint; it's only for me, called from another
app ( in this case an Azure WebJob instance ).

It's easier for me to run some maintenance tasks from a C# app via
RESTXQ than directly in the BaseX GUI.

At the beginning I tried to do this:

*let $path:="A:\sources\xml\"*

*let $files:=*
*let $parts:=("US00","US01")*
*for $part in $parts*
*let $dir:= $path || $part*
*for $file in file:list($dir)*
*return $path || $part || "\" || $file*

*return*
*(# db:copynode false #) {*
*  for $file in $files  *
*  for $case in
doc($file)/trademark-applications-daily/application-information/file-segments/action-keys/case-file[case-file-header/status-code=(410,413,616,620,624,625,630,631,638,640,641,642,643,644,645,646,647,648,649,650,651,652,653,654,655,656,657,658,659,660,661,663,664,665,666,667,668,672,680,681,682,686,688,689,690,692,693,694,700,701,702,703,704,705,706,707,708,717,718,719,720,721,722,724,725,730,731,732,733,734,739,740,744,745,746,748,752,753,756,757,760,762,763,764,765,766,771,772,773,774,775,777,778,779,780,790,794,800,801,802,803,804,806,807,808,809,810,811,812,813,814,815,816,817,818,819,820,821,822,823,824,825,969,973)]
 *
*  return insert node $case as last into doc("US")*
*}*

And, of course, this ran out of memory in a matter of seconds ( there are
more than 100 files ).

So, I thought:

- *"Maybe, if I execute each of them using RESTXQ, the memory will be
released after each execution..."*

But it seems I'm still missing something, because the result is the same.
At least I can finish the process by restarting the HTTP server when it fails.

I've uploaded two of the XML files in case you want to test something with
them.[1]


Of course it helps!
Regards,
Sebastian.

[1] https://easyupload.io/sd2ilu

On Fri, May 29, 2020 at 8:38 AM Christian Grün 
wrote:

> Hi Sebastian,
>
> In general, artifacts of updating queries should be cleaned up after
> the execution.
>
> The stack trace indicates that a very large main-memory database
> instance was created by one of your queries that exceeded the memory
> limits. Can you share the query with us?
>
> As XQuery is more powerful than pure query languages, it may be risky
> to allow the execution of arbitrary client code. First of all, you
> should ensure that the code is run with limited user permissions;
> otherwise, your system can be wiped out by a single file:delete('/',
> true()) call. Next, you could try to limit memory usage and execution
> time via the 'memory' and 'timeout' parameters [1]. However, as it’s
> close to impossible to reliably control the memory consumption of
> single threads in Java, I would rather suggest providing predefined
> user queries.
>
> I hope this helps,
> Christian
>
> [1] https://docs.basex.org/wiki/XQuery_Module#xquery:eval
>
>
>
> On Thu, May 28, 2020 at 8:39 PM Sebastian Guerrero 
> wrote:
> >
> > Hi BaseX team!
> >
> > A quick question.
> >
> > Is there some known bug/common setting missing for RESTXQ and memory
> problems?
> >
> > I have this simple module ( into /webapp, a .xqm file ) :
> >
> >
> > module namespace exe = 'http://site.com/execute';
> > declare
> > %updating
> > %rest:path("update")
> > %rest:consumes("application/x-www-form-urlencoded")
> > %rest:POST function exe:update() {
> >
> > xquery:eval-update(request:parameter("query"))
> >
> > };
> >
> > I use it to execute some updates against some databases from different
> clients.
> >
> > Everything works fine by a while, but after some time I get this error
> [1]:
> >
> > java.lang.OutOfMemoryError: Java heap space
> >
> > I noticed that every call to update() the memory

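Christian's reply above mentions the 'memory' and 'timeout' parameters of xquery:eval. The following is a minimal sketch of how the module from this thread might pass them on, assuming xquery:eval-update accepts the same options map as xquery:eval ('permission', 'timeout' in seconds, 'memory' in megabytes). The concrete values are placeholders, and, as Christian notes, memory limits cannot be enforced reliably in Java, so this is a mitigation rather than a fix.

```xquery
module namespace exe = 'http://site.com/execute';

declare
  %updating
  %rest:path("update")
  %rest:consumes("application/x-www-form-urlencoded")
  %rest:POST
function exe:update() {
  xquery:eval-update(
    request:parameter("query"),
    map { },                    (: no external variable bindings :)
    map {
      'permission': 'write',    (: assumption: enough for inserts, but not admin :)
      'timeout': 60,            (: placeholder: interrupt after 60 seconds :)
      'memory': 1024            (: placeholder: interrupt after about 1024 MB :)
    }
  )
};
```

The endpoint still evaluates arbitrary client code, so the permission level is the part that addresses the security concern; the other two options only bound how long a runaway query can go on.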
[basex-talk] RESTXQ - java.lang.OutOfMemoryError: Java heap space

2020-05-28 Thread Sebastian Guerrero
Hi BaseX team!

A quick question.

Is there some known bug/common setting missing for RESTXQ and memory
problems?

I have this simple module ( into /webapp, a .xqm file ) :


*module namespace exe = 'http://site.com/execute';*
*declare *
*%updating*
* %rest:path("update")*
* %rest:consumes("application/x-www-form-urlencoded")*
* %rest:POST function exe:update() {*

* xquery:eval-update(request:parameter("query"))*

*};*

I use it to execute some updates against some databases from
different clients.

Everything works fine for a while, but after some time I get this error [1]:

*java.lang.OutOfMemoryError: Java heap space*

I noticed that with every call to *update()* the memory grows and grows until
it reaches the *OutOfMemoryError*. [2]

If I stop the HTTP server, the memory is released immediately. [3]

What am I doing wrong?
Is there some command to trigger the GC?
Is this a problem with *"xquery:eval-update()"*?
Am I using it in the wrong way?

Best regards,
Sebastian.
[1] https://imgur.com/DrcbwQg
[2] https://imgur.com/fonmrhm
[3] https://imgur.com/SYFBFK8
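
The reply to this message suggests providing predefined queries instead of evaluating arbitrary client code. One possible reading of that suggestion is sketched below: the client sends only a file path and the update itself is fixed on the server. The function name exe:import, the path "import", the form parameter "file" and the shortened status-code list are all hypothetical; the path expression and the insert are taken from the query posted in this thread. Whether this changes the memory behaviour is not established here.

```xquery
module namespace exe = 'http://site.com/execute';

declare
  %updating
  %rest:path("import")
  %rest:consumes("application/x-www-form-urlencoded")
  %rest:POST
  %rest:form-param("file", "{$file}")
function exe:import($file as xs:string) {
  (: shortened for illustration; the thread lists many more status codes :)
  let $codes := (410, 413, 616)
  return (# db:copynode false #) {
    for $case in doc($file)/trademark-applications-daily/application-information
      /file-segments/action-keys/case-file[case-file-header/status-code = $codes]
    return insert node $case as last into doc("US")
  }
};
```

With this shape the C# client would issue one POST per source file, and no query text crosses the wire.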


Re: [basex-talk] util:item - java.lang.NullPointerException

2020-05-21 Thread Sebastian Guerrero
Thank you very much, Christian.

I confirm that it's working now. [1]

Best regards,
Sebastian

[1] https://imgur.com/NjUqqvF

On Thu, May 21, 2020 at 5:06 AM Christian Grün 
wrote:

> …fixed [1]. Thanks for the easily reproducible test case and the kudos.
>
> [1] http://files.basex.org/releases/latest/
>
>
>
> On Thu, May 21, 2020 at 1:04 AM Sebastian Guerrero 
> wrote:
> >
> > Hi BaseX team!
> >
> > I have a database with a text-index for a node named "serial-number"
> with 2,164,980 occurrences of text '000'. [1]
> >
> > If I execute :
> >
> > util:item(db:text("US02", "000"), 1)
> >
> > I get '000', as expected.
> >
> > But, If I execute:
> >
> >   util:item(db:text("US02", "000"), 2)
> >
> > I get the error message in [2]
> >
> > The same happens with
> >
> > db:text("US02", "000")[2]
> >
> > subsequence(db:text("US02", "000"),2,1)
> >
> >
> > I recorded everything on this video [3]
> >
> > Is this a bug?, I optimized the database before executing the queries.
> >
> > Best regards,
> > Sebastian.
> >
> >
> > [1] https://imgur.com/UDgXYPY
> >
> > [2] Stack Trace:
> > java.lang.NullPointerException
> > at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:46)
> > at org.basex.query.scope.MainModule.iter(MainModule.java:97)
> > at org.basex.query.QueryContext.iter(QueryContext.java:333)
> > at org.basex.query.QueryContext.cache(QueryContext.java:630)
> > at org.basex.query.QueryProcessor.cache(QueryProcessor.java:113)
> > at org.basex.core.cmd.AQuery.query(AQuery.java:101)
> > at org.basex.core.cmd.XQuery.run(XQuery.java:22)
> > at org.basex.core.Command.run(Command.java:257)
> > at org.basex.core.Command.execute(Command.java:93)
> > at org.basex.gui.GUI.exec(GUI.java:416)
> > at org.basex.gui.GUI.lambda$4(GUI.java:359)
> > at java.lang.Thread.run(Unknown Source)
> >
> > [3] https://youtu.be/gXYpBNd7iJc
>


[basex-talk] util:item - java.lang.NullPointerException

2020-05-20 Thread Sebastian Guerrero
Hi BaseX team!

I have a database with a text index for a node named "*serial-number*"
with 2,164,980 occurrences of the text '*000*'. *[1]*

If I execute :

*util:item(db:text("US02", "000"), 1)*

I get '000', as expected.

But, If I execute:

*  util:item(db:text("US02", "000"), 2)  *

I get the error message in *[2]*

The same happens with

*db:text("US02", "000")[2]*

*subsequence(db:text("US02", "000"),2,1)*


I recorded everything on this video *[3]*

Is this a bug? I optimized the database before executing the queries.

Best regards,
Sebastian.


*[1] *https://imgur.com/UDgXYPY

*[2] *Stack Trace:
java.lang.NullPointerException
at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:46)
at org.basex.query.scope.MainModule.iter(MainModule.java:97)
at org.basex.query.QueryContext.iter(QueryContext.java:333)
at org.basex.query.QueryContext.cache(QueryContext.java:630)
at org.basex.query.QueryProcessor.cache(QueryProcessor.java:113)
at org.basex.core.cmd.AQuery.query(AQuery.java:101)
at org.basex.core.cmd.XQuery.run(XQuery.java:22)
at org.basex.core.Command.run(Command.java:257)
at org.basex.core.Command.execute(Command.java:93)
at org.basex.gui.GUI.exec(GUI.java:416)
at org.basex.gui.GUI.lambda$4(GUI.java:359)
at java.lang.Thread.run(Unknown Source)

*[3]* https://youtu.be/gXYpBNd7iJc


Re: [basex-talk] Full-text index: searches for common words in another node. Does it take a lot of time?

2020-05-18 Thread Sebastian Guerrero
Hi Fabrice!

Thanks a lot for your advice. Yes, it's a good idea. And yes, it works.

I created a separate index ( a new database ) for '*mark-identification*':

(: The element tags were stripped by the archive. 'text', 'nodes' and 'id'
   follow the search query below, 'value' is the indexed element, and the
   root name 'index' is a placeholder. :)
for $db in ('US00','US01','US02')
let $index := <index>{
  for $cases in db:open($db)/trademark-applications-daily/application-information/file-segments/action-keys/case-file
  group by $text := $cases/case-file-header/mark-identification
  return
    <text>
      <value>{ $text }</value>
      <nodes>{ for $node in $cases return <id>{ db:node-id($node) }</id> }</nodes>
    </text>
}</index>
return db:create($db || '-mark-text', $index, $db || '-mark-text.xml')


Of course with a full-text index for '*value*'.

So, to search I use this piece of code:

  let $text := 'corporation'
  for $db in ('US00','US01','US02')
  for $id in ft:search($db || '-mark-text', $text)/ancestor::text/nodes/id
  let $case-file := db:open-id($db, $id)
  return $case-file


And now it only takes 185ms in order to get the results and there is no
scan for the '*party-name*' values.

- *"I really appreciated working with basex that time, because others
were in a kind of java/relational mapping hell... Me, I just had to add xml
documents, reindex, and sometimes purge deleted items."*: Oh dear, I can't
explain to you how much I'm in love with BaseX right now. Yes, trying to
manage this volume of data by translating it to a SQL database is like a
Kafkaesque nightmare, not a healthy idea.

Thank you very much!
Cheers,
Sebastian.

On Mon, May 18, 2020 at 12:43 PM ETANCHAUD Fabrice <
fabrice.etanch...@maif.fr> wrote:

> Hi Sebastian,
> Yes I think your search on mark-identification suffers from the huge
> number of party-names.
> From what I remember, reverse index (from full text tokens to node ids) is
> shared across all element's names.
> so filtering on the element's name is done at last.
>
> When I was using basex to handle DOCDB patent db, I used to explode a
> document in sub-documents containing only keys and text to be indexed with
> respect to language and xml element, and then build seperate databases.
> That way I could create a dedicated full text index on a single (element
> names, language) combination.
>
> Did that help ?
>
> I really appreciated working with basex that time, because others were in
> a kind of java/relational mapping hell... Me, I just had to add xml
> documents, reindex, and sometimes purge deleted items.
>
> Best,
> Fabrice
>
> --
> *From:* BaseX-Talk on behalf
> of Sebastian Guerrero
> *Sent:* Monday, 18 May 2020 17:23
> *To:* BaseX
> *Subject:* [basex-talk] Full-text index: searches for common words in
> another node. Does it take a lot of time?
>
> Hi everybody.
>
> I'm here again with my doubts. Thank you for your patience. ^^
>
> I have a database of trademarks with a full-text index for two nodes:
> **:mark-identification,*:party-name*. [1]
>
> Where "*mark-identification*" contains the name of the trademark, and "
> *party-name*" contains the name of the owner of the trademark.
>
> I use the full-text index in order to search trademarks by its name,
> for example:
>
> *for $results in //case-file[case-file-header/mark-identification/text()
> contains text {'basex'}]*
> *return $results//mark-identification*
>
>
> returns all trademarks with "*basex*" on its name. It works like a
> thunderlight: 15ms to get 3 records among 2,134,434,598 nodes. Really a
> dream. [2]
>
> But, for example, if I change the searched text from "*basex*" by a
> common word in "*party-name*", for example, "*corporation*" ( has
> 1096187x occurrences on the full-text index as showed in [1], it's a very
> common word in owners of trademarks ):
>
> *for $results in //case-file[case-file-header/mark-identification/text()
> contains text {'corporation'}]*
> *return $results//mark-identification*
>
>
> It takes a long time to get 6,715 records: 62,000ms [3]
>
> If I search for "*live*" ( a common word for trademarks name, but not for
> owners names ) I get 5,875 records in 2,773 ms, which has not a
> relationship with the 62,000ms to get the 6k records for "*corporation*".
> [4]
>
> So...
>
>- Is this an expected behaviour?
>- Is there a way to specify which "section" of the full-text index
>should be used to perform the search? ( I don't know... maybe something
>similar to "*using stemming*" but "*using index 'mark-identification'*"
>)
>
> Please apologize me if I'm asking by something not-logical,
>
> Best regards,
> Sebastian
>
> [1] https://imgur.com/uLla1Xt
> [2] https://imgur.com/Fkcvv2O
> [3] https://imgur.com/Hk71CNe
> [4] https://imgur.com/P72k574
>
>


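Sebastian's reply above says the per-database lookup databases carry a full-text index for 'value'. A minimal sketch of how such an index could be requested when the database is created, assuming the `ftindex` and `ftinclude` options of db:create; the root name 'index', the sample value and the node id are placeholders and not data from the thread:

```xquery
(: Hypothetical one-entry lookup document; real entries would be generated
   from the case-file nodes as described in the reply above. :)
let $index :=
  <index>
    <text>
      <value>EXAMPLE MARK</value>
      <nodes><id>42</id></nodes>
    </text>
  </index>
return db:create(
  'US00-mark-text',
  $index,
  'US00-mark-text.xml',
  (: build a full-text index, restricted to the 'value' elements :)
  map { 'ftindex': true(), 'ftinclude': 'value' }
)
```

Restricting the index to one element name is what keeps the frequent 'party-name' tokens out of it, which matches Fabrice's advice to build a dedicated index per element.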
[basex-talk] Full-text index: searches for common words in another node. Does it take a lot of time?

2020-05-18 Thread Sebastian Guerrero
Hi everybody.

I'm here again with my doubts. Thank you for your patience. ^^

I have a database of trademarks with a full-text index for two nodes:
*:mark-identification,*:party-name. [1]

"*mark-identification*" contains the name of the trademark, and
"*party-name*" contains the name of the owner of the trademark.

I use the full-text index to search trademarks by name,
for example:

*for $results in //case-file[case-file-header/mark-identification/text()
contains text {'basex'}]*
*return $results//mark-identification*


returns all trademarks with "*basex*" in their name. It is lightning
fast: 15ms to get 3 records among 2,134,434,598 nodes. Really a
dream. [2]

But if I change the searched text from "*basex*" to a word that is common
in "*party-name*", for example "*corporation*" ( 1096187 occurrences
in the full-text index as shown in [1]; it's a very common
word in the names of trademark owners ):

*for $results in //case-file[case-file-header/mark-identification/text()
contains text {'corporation'}]*
*return $results//mark-identification*


It takes a long time to get 6,715 records: 62,000ms [3]

If I search for "*live*" ( a common word in trademark names, but not in
owner names ) I get 5,875 records in 2,773 ms, which is out of all
proportion to the 62,000ms needed to get the 6k records for "*corporation*".
[4]

So...

   - Is this an expected behaviour?
   - Is there a way to specify which "section" of the full-text index
   should be used to perform the search? ( I don't know... maybe something
   similar to "*using stemming*" but "*using index 'mark-identification'*" )

Please excuse me if I'm asking for something illogical.

Best regards,
Sebastian

[1] https://imgur.com/uLla1Xt
[2] https://imgur.com/Fkcvv2O
[3] https://imgur.com/Hk71CNe
[4] https://imgur.com/P72k574


Re: [basex-talk] Same query returns different amount of records

2020-05-18 Thread Sebastian Guerrero
Hi Christian,

Thank you very much for your detailed answer, your comments are very useful
for me.

*- Could you check once again if this is fixed with the new snapshot?*: I
confirm it. With your new snapshot, the problem is fixed. [1].

Thank you very much for your comments about duplicate paths, you're right:
it's more performant if we write it in the other way. I've changed it. [2]

About "*ft:search*" ( and the full-text index in general ): I've noticed a
"strange" behaviour when performing a full-text search.

I'll write about it in a separate thread to keep everything
consistent.

Best regards,
Sebastian

[1] https://imgur.com/XnRdxyD
[2] https://imgur.com/U1wo4y3





On Thu, May 14, 2020 at 11:25 AM Christian Grün 
wrote:

> Hi Sebastian,
>
> I couldn’t get this reproduced out of the box. A technical guess:
> Global full-text options may have been overwritten by
> database-specific properties in the second switch branch at compile,
> which yielded wrong/restricted results in the first branch at runtime.
>
> Could you check once again if this is fixed with the new snapshot [1]?
>
> Some more comments on your query: If you formulate duplicate paths
> only once, you might get even better performance:
>
> OLD:
>   for $a in A
>   where $a/B/C/D/E contains text { $text }
>   return $a/B/C/D/E
>
> NEW:
>   for $e in A/B/C/D/E
>   where $e contains text { $text }
>   return $e
>
> In a future version of BaseX, such patterns will automatically be
> rewritten. Currently, basic patterns are already simplified [2]:
>
>   for $e in A/B where $e/C/D return $e/C/D/E
>   → A/B[C/D]/C/D/E
>   → A/B/C/D/E
>
> The enforceindex is still in a somewhat experimental stage (hence,
> thanks for your feedback), and its behavior is sometimes surprising if
> there are various competing candidates for index rewrites in your
> expression. If you want to have more control on how your queries are
> executed, you can directly call ft:search:
>
> for $db in ('US00','US01','US02')
> return ft:search($db, $text)[parent::mark-identification]
>
> If all 'mark-identification' elements occur on the same level in your
> document, you can omit the remaining parent steps (this will further
> speed up query evaluation). A look at the optimized query in the
> InfoView panel will give you some more hints.
>
> Cheers,
> Christian
>
> [1] http://files.basex.org/releases/latest/
> [2] https://github.com/BaseXdb/basex/issues/1864
>
>
>
> On Wed, May 13, 2020 at 11:23 PM Sebastian Guerrero 
> wrote:
> >
> > Hi everyone! it's me again.
> >
> > Here is my doubt:
> >
> > If I execute this query:
> >
> >  (# db:enforceindex #) {
> >   for $db in ('US00','US01','US02')
> >   for $tmUS in
> db:open($db)/trademark-applications-daily/application-information/file-segments/action-keys/case-file
> >   where
> $tmUS/case-file-header/mark-identification/text() contains text { 'apple' }
> >   return
> $tmUS/case-file-header/mark-identification/text()
> > }
> >
> > I get 4k results in 139ms from three databases of 90GB and 13M of
> records. It works like a charm. [01]
> >
> > But, if I include that query into a for and then into a switch ( I tried
> with if-then-else too ), the same query returns only 11 results in 107ms
> [02]:
> >
> > declare namespace gb="http://www.ipo.gov.uk/schemas/tm";
> > let $text := "apple"
> > let $registries := ('US')
> >
> > for $registry in $registries
> > return
> >   switch ($registry)
> >
> >case "US"
> >return
> >(# db:enforceindex #) {
> >   for $db in ('US00','US01','US02')
> >   for $tmUS in
> db:open($db)/trademark-applications-daily/application-information/file-segments/action-keys/case-file
> >   where
> $tmUS/case-file-header/mark-identification/text() contains text { $text }
> >   return
> $tmUS/case-file-header/mark-identification/text()
> > }
> >
> > case "GB"
> >return
> >(# db:enforceindex #) {
> >for $tmGB in
> db:open('GB')/gb:MarkLicenceeExportList/gb:TradeMark
> >  where
> $tmGB/gb:WordMarkSpecification/gb:MarkVerbalElementText/text() contains
> text { $text }
> >  return
> $tmGB/gb:WordMarkSpecification/gb:MarkVerbalElementText/text()
> > }
> >
> > 

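Christian's OLD/NEW pattern above can be applied to the query from this thread. A sketch, assuming the same database names and path as in the original message and with $text declared up front for illustration: the text nodes are bound once and reused in both the where and the return clause.

```xquery
declare variable $text := 'apple';

(# db:enforceindex #) {
  for $db in ('US00','US01','US02')
  (: bind the mark-identification text nodes directly, so the long path
     is written only once :)
  for $mark in db:open($db)/trademark-applications-daily/application-information
    /file-segments/action-keys/case-file/case-file-header/mark-identification/text()
  where $mark contains text { $text }
  return $mark
}
```

The ft:search variant quoted in the same reply skips the path entirely and filters the index hits by their parent element; which of the two is faster on these databases would need to be measured.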
[basex-talk] Same query returns different amount of records

2020-05-13 Thread Sebastian Guerrero
Hi everyone! it's me again.

Here is my doubt:

If I execute this query:






(# db:enforceindex #) {
  for $db in ('US00','US01','US02')
  for $tmUS in db:open($db)/trademark-applications-daily/application-information/file-segments/action-keys/case-file
  where $tmUS/case-file-header/mark-identification/text() contains text { 'apple' }
  return $tmUS/case-file-header/mark-identification/text()
}

I get 4k results in 139ms from three databases of 90GB and 13M of records.
It works like a charm. [01]

But, if I include that query into a *for* and then into a *switch* ( I
tried with *if-then-else* too ), the same query returns only 11 results in
107ms [02]:

*declare namespace gb="http://www.ipo.gov.uk/schemas/tm";*
*let $text := "apple"*
*let $registries := ('US')*

*for $registry in $registries*
*return *
*   switch ($registry)*

*   case "US"*
*return *
*(# db:enforceindex #) {  *
*  for $db in ('US00','US01','US02')*
*  for $tmUS in
db:open($db)/trademark-applications-daily/application-information/file-segments/action-keys/case-file*
*  where $tmUS/case-file-header/mark-identification/text()
contains text { $text }*
*  return $tmUS/case-file-header/mark-identification/text()*
*}*

*case "GB"*
*return*
*   (# db:enforceindex #) {*
*   for $tmGB in
db:open('GB')/gb:MarkLicenceeExportList/gb:TradeMark*
* where
$tmGB/gb:WordMarkSpecification/gb:MarkVerbalElementText/text() contains
text { $text }*
* return
$tmGB/gb:WordMarkSpecification/gb:MarkVerbalElementText/text()*
*}*

*default return "Unknown registry code"*



I noticed that removing the case option "GB" ( even if it's not evaluated
), it works fine and returns the 4k records [03]:

*declare namespace gb="http://www.ipo.gov.uk/schemas/tm";*
*let $text := "apple"*
*let $registries := ('US')*

*for $registry in $registries*
*return *
*   switch ($registry)*

*   case "US"*
*return *
*(# db:enforceindex #) {  *
*  for $db in ('US00','US01','US02')*
*  for $tmUS in
db:open($db)/trademark-applications-daily/application-information/file-segments/action-keys/case-file*
*  where $tmUS/case-file-header/mark-identification/text()
contains text { $text }*
*  return $tmUS/case-file-header/mark-identification/text()*
*}*

*default return "Unknown registry code"*



What am I missing here? Is this the right behaviour?

Best regards,
Sebastian

[01] https://imgur.com/o4RUUyO
[02] https://imgur.com/533c0rI
[03] https://imgur.com/mCb3qEe


Re: [basex-talk] Query optimizer bug with nested for? - java.lang.NullPointerException at Path.index(Path.java:673)

2020-05-11 Thread Sebastian Guerrero
Yes, you were right!

I bought a new 2TB M.2 disk to play with BaseX and I got confused:
instead of installing the new snapshot on *A:*, I installed it on the old *C:*

Now it works like a charm !! [1]

Thank you very, very much for your time, Christian.

Best regards!
Sebastian.

[1] https://imgur.com/DQi5aNP

On Mon, May 11, 2020 at 6:48 PM Christian Grün 
wrote:

> > I'm asking because I downloaded it and the error is still there, but now
> on line 671 instead of the original line 673.
>
> Hm, maybe it was the previous 9.3.3 snapshot you launched?
>
> Feel free to provide me with a minimized query that allows me to
> reproduce the bug if it persists. On GitHub, I have referenced a query
> that I used for testing [1].
>
> Best,
> Christian
>
> [1] https://github.com/BaseXdb/basex/issues/1860
>
>
>
>
> > Unexpected error: Improper use? Potential bug? Your feedback is welcome:
> > Contact: basex-talk@mailman.uni-konstanz.de
> > Version: BaseX 9.3.3 beta
> > Java: Oracle Corporation, 1.8.0_251
> > OS: Windows 10, amd64
> > Stack Trace:
> > java.lang.NullPointerException
> > at org.basex.query.expr.path.Path.index(Path.java:671)
> > at org.basex.query.expr.path.Path.optimize(Path.java:157)
> > at org.basex.query.expr.gflwor.For.addPredicate(For.java:190)
> > at org.basex.query.expr.gflwor.For.toPredicate(For.java:218)
> > at org.basex.query.expr.gflwor.GFLWOR.optimizeWhere(GFLWOR.java:532)
> > at org.basex.query.expr.gflwor.GFLWOR.optimize(GFLWOR.java:109)
> > at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:100)
> > at org.basex.query.expr.Extension.compile(Extension.java:45)
> >
> >
> > Cheers,
> > Sebastian.
> >
> >
> > On Mon, May 11, 2020 at 6:15 PM Christian Grün <
> christian.gr...@gmail.com> wrote:
> >>
> >> Good news: The bug is fixed. A new snapshot is waiting for you
> [1].
> >>
> >> Cheers,
> >> Christian
> >>
> >> [1] http://files.basex.org/releases/latest/
> >>
> >>
> >>
> >> On Mon, May 11, 2020 at 9:47 PM Sebastian Guerrero 
> wrote:
> >> >
> >> > Hi Christian!, thanks for your reply :-)
> >> >
> >> > I've just downloaded the latest snapshot [1] and executed the query
> [2], and yes: the problem is still there.
> >> >
> >> > Cheers,
> >> > Sebastian.
> >> >
> >> > [1] https://imgur.com/ahgMg7p
> >> > [2] https://imgur.com/tDCAtCu
> >> >
> >> > On Mon, May 11, 2020 at 3:58 PM Christian Grün <
> christian.gr...@gmail.com> wrote:
> >> >>>
> >> >>> Maybe it's a newbie issue, but I would like your comments.
> >> >>
> >> >>
> >> >> …definitely something you shouldn't encounter as a newbie either ;)
> Thanks for reporting it.
> >> >>
> >> >> Does the exception also occur with the latest snapshot [1]?
> >> >>
> >> >> Cheers
> >> >> Christian
> >> >>
> >> >> [1] http://files.basex.org/releases/latest/
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>>
> >> >>> I'm writing a RESTXQ method to search among a couple of databases.
> ( >100 databases )
> >> >>>
> >> >>> Some databases are split into a couple of parts due to the number
> of nodes. For example US: it's separated into US00, US01 and US02.
> >> >>>
> >> >>> So, my problem is:
> >> >>>
> >> >>> if I replace "for $usPart in ('US00')" by "for $usPart in
> ('US00','US01','US02')" in QUERY [A], I get this error:
> >> >>>
> >> >>> Improper use? Potential bug? Your feedback is welcome:
> >> >>> Contact: basex-talk@mailman.uni-konstanz.de
> >> >>> Version: BaseX 9.3.2
> >> >>> Java: Oracle Corporation, 1.8.0_251
> >> >>> OS: Windows 10, amd64
> >> >>> Stack Trace:
> >> >>> java.lang.NullPointerException
> >> >>> at org.basex.query.expr.path.Path.index(Path.java:673)
> >> >>> at org.basex.query.expr.path.Path.optimize(Path.java:157)
> >> >>> at org.basex.query.expr.gflwor.For.toPredicate(For.java:220)
> >> >>> at org.basex.query.expr.gflwor.GFLWOR.optimizeWhere(GFLWOR.java:532)
> >> >>> at org.basex.query.expr.gflwor.GFLWOR.optimize(GFLWOR.java:109)

Re: [basex-talk] Query optimizer bug with nested for? - java.lang.NullPointerException at Path.index(Path.java:673)

2020-05-11 Thread Sebastian Guerrero
Hi Christian!
Wow, you are like a Terminator of coding! XD

Is *BaseX933-20200511.231351.exe *( snapshot of 23:14 ) the latest version?

I'm asking because I downloaded it and the error is still there, but now on
line 671 instead of the original line 673.

*Unexpected error: Improper use? Potential bug? Your feedback is welcome:*
*Contact: basex-talk@mailman.uni-konstanz.de
*
*Version: BaseX 9.3.3 beta*
*Java: Oracle Corporation, 1.8.0_251*
*OS: Windows 10, amd64*
*Stack Trace: *
*java.lang.NullPointerException*
* at org.basex.query.expr.path.Path.index(Path.java:671)*
* at org.basex.query.expr.path.Path.optimize(Path.java:157)*
* at org.basex.query.expr.gflwor.For.addPredicate(For.java:190)*
* at org.basex.query.expr.gflwor.For.toPredicate(For.java:218)*
* at org.basex.query.expr.gflwor.GFLWOR.optimizeWhere(GFLWOR.java:532)*
* at org.basex.query.expr.gflwor.GFLWOR.optimize(GFLWOR.java:109)*
* at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:100)*
* at org.basex.query.expr.Extension.compile(Extension.java:45)*


Cheers,
Sebastian.


On Mon, May 11, 2020 at 6:15 PM Christian Grün 
wrote:

> Good news: The bug is fixed. A new snapshot is waiting for you [1].
>
> Cheers,
> Christian
>
> [1] http://files.basex.org/releases/latest/
>
>
>
> On Mon, May 11, 2020 at 9:47 PM Sebastian Guerrero 
> wrote:
> >
> > Hi Christian!, thanks for your reply :-)
> >
> > I've just downloaded the latest snapshot [1] and executed the query [2],
> and yes: the problem is still there.
> >
> > Cheers,
> > Sebastian.
> >
> > [1] https://imgur.com/ahgMg7p
> > [2] https://imgur.com/tDCAtCu
> >
> > On Mon, May 11, 2020 at 3:58 PM Christian Grün <
> christian.gr...@gmail.com> wrote:
> >>>
> >>> Maybe it's a newbie issue, but I would like your comments.
> >>
> >>
> >> …definitely something you shouldn't encounter as a newbie either ;)
> Thanks for reporting it.
> >>
> >> Does the exception also occur with the latest snapshot [1]?
> >>
> >> Cheers
> >> Christian
> >>
> >> [1] http://files.basex.org/releases/latest/
> >>
> >>
> >>
> >>
> >>>
> >>> I'm writing a RESTXQ method to search among a couple of databases. (
> >100 databases )
> >>>
> >>> Some databases are split into a couple of parts due to the number of
> nodes. For example US: it's separated into US00, US01 and US02.
> >>>
> >>> So, my problem is:
> >>>
> >>> if I replace "for $usPart in ('US00')" by "for $usPart in
> ('US00','US01','US02')" in QUERY [A], I get this error:
> >>>
> >>> Improper use? Potential bug? Your feedback is welcome:
> >>> Contact: basex-talk@mailman.uni-konstanz.de
> >>> Version: BaseX 9.3.2
> >>> Java: Oracle Corporation, 1.8.0_251
> >>> OS: Windows 10, amd64
> >>> Stack Trace:
> >>> java.lang.NullPointerException
> >>> at org.basex.query.expr.path.Path.index(Path.java:673)
> >>> at org.basex.query.expr.path.Path.optimize(Path.java:157)
> >>> at org.basex.query.expr.gflwor.For.toPredicate(For.java:220)
> >>> at org.basex.query.expr.gflwor.GFLWOR.optimizeWhere(GFLWOR.java:532)
> >>> at org.basex.query.expr.gflwor.GFLWOR.optimize(GFLWOR.java:109)
> >>> at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:100)
> >>> at org.basex.query.expr.Extension.compile(Extension.java:45)
> >>> at org.basex.query.expr.SwitchGroup.compile(SwitchGroup.java:40)
> >>> at org.basex.query.expr.Switch.compile(Switch.java:60)
> >>> at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:96)
> >>> at org.basex.query.expr.gflwor.ForLet.compile(ForLet.java:43)
> >>> at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:90)
> >>> at org.basex.query.scope.MainModule.comp(MainModule.java:81)
> >>> at org.basex.query.QueryCompiler.compile(QueryCompiler.java:114)
> >>> at org.basex.query.QueryCompiler.compile(QueryCompiler.java:105)
> >>> at org.basex.query.QueryContext.compile(QueryContext.java:312)
> >>> at org.basex.query.QueryProcessor.compile(QueryProcessor.java:79)
> >>>
> >>>
> >>> Using "for $usPart in ('US00')" it works without any problem. With one
> element there is no problem, with two or more it fails.
> >>>
> >>> Any ideas about what I'm doing wrong?
> >>>
> >>> Here is the
> >>> QUERY [A]
> >>>
> -

Re: [basex-talk] Query optimizer bug with nested for? - java.lang.NullPointerException at Path.index(Path.java:673)

2020-05-11 Thread Sebastian Guerrero
Hi Christian!, thanks for your reply :-)

I've just downloaded the latest snapshot [1] and executed the query [2],
and yes: the problem is still there.

Cheers,
Sebastian.

[1] https://imgur.com/ahgMg7p
[2] https://imgur.com/tDCAtCu

On Mon, May 11, 2020 at 3:58 PM Christian Grün 
wrote:

> Maybe it's a newbie issue, but I would like your comments.
>>
>
> …definitely something you shouldn't encounter as a newbie either ;) Thanks
> for reporting it.
>
> Does the exception also occur with the latest snapshot [1]?
>
> Cheers
> Christian
>
> [1] http://files.basex.org/releases/latest/
>
>
>
>
>
>> I'm writing a RESTXQ method to search among a couple of databases. ( >100
>> databases )
>>
>> Some databases are split into a couple of parts due to the number of
>> nodes. For example US: it's separated into US00, US01 and US02.
>>
>> So, my problem is:
>>
>> if I replace "*for $usPart in ('US00')*" by "*for $usPart in
>> ('US00','US01','US02')*" in *QUERY [A]*, I get this error:
>>
>> *Improper use? Potential bug? Your feedback is welcome:*
>> *Contact: basex-talk@mailman.uni-konstanz.de*
>> *Version: BaseX 9.3.2*
>> *Java: Oracle Corporation, 1.8.0_251*
>> *OS: Windows 10, amd64*
>> *Stack Trace: *
>> *java.lang.NullPointerException*
>> * at org.basex.query.expr.path.Path.index(Path.java:673)*
>> * at org.basex.query.expr.path.Path.optimize(Path.java:157)*
>> * at org.basex.query.expr.gflwor.For.toPredicate(For.java:220)*
>> * at org.basex.query.expr.gflwor.GFLWOR.optimizeWhere(GFLWOR.java:532)*
>> * at org.basex.query.expr.gflwor.GFLWOR.optimize(GFLWOR.java:109)*
>> * at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:100)*
>> * at org.basex.query.expr.Extension.compile(Extension.java:45)*
>> * at org.basex.query.expr.SwitchGroup.compile(SwitchGroup.java:40)*
>> * at org.basex.query.expr.Switch.compile(Switch.java:60)*
>> * at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:96)*
>> * at org.basex.query.expr.gflwor.ForLet.compile(ForLet.java:43)*
>> * at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:90)*
>> * at org.basex.query.scope.MainModule.comp(MainModule.java:81)*
>> * at org.basex.query.QueryCompiler.compile(QueryCompiler.java:114)*
>> * at org.basex.query.QueryCompiler.compile(QueryCompiler.java:105)*
>> * at org.basex.query.QueryContext.compile(QueryContext.java:312)*
>> * at org.basex.query.QueryProcessor.compile(QueryProcessor.java:79)*
>>
>>
>> Using "*for $usPart in ('US00')*" it works without any problem. With one
>> element there is no problem, with two or more it fails.
>>
>> Any ideas about what I'm doing wrong?
>>
>> Here is the
>> *QUERY [A]*
>>
>> -
>> declare namespace gb="http://www.ipo.gov.uk/schemas/tm";
>>
>> let $text:="christian"
>> let $registries:=('GB','US')
>>
>> let $results :=
>>   for $registry in $registries
>>return
>>switch ($registry)
>>
>>case "US"
>>return
>>(# db:enforceindex #) {
>>   for $usPart in ('US00')
>>   for $tmUS in
>> db:open($usPart)/trademark-applications-daily/application-information/file-segments/action-keys/case-file
>>   where $tmUS/case-file-header/mark-identification/text()
>> contains text {$text} using stemming
>>   return
>> US{$tmUS/case-file-header/mark-identification/text()}
>> }
>>
>>case "GB"
>>return
>>(# db:enforceindex #) {
>>for $tmGB in
>> db:open('GB')/gb:MarkLicenceeExportList/gb:TradeMark
>>   where
>> $tmGB/gb:WordMarkSpecification/gb:MarkVerbalElementText/text() contains
>> text {$text}
>> return
>> GB{$tmGB/gb:WordMarkSpecification/gb:MarkVerbalElementText/text()}
>>
>> }
>>
>> default return ""
>>
>> return
>> 
>> {count($results)}
>> 
>> {
>> for $result in $results
>> return $result
>> }
>> 
>> 
>>
>> -
>>
>> Regards,
>> Sebastian
>>
>


[basex-talk] Query optimizer bug with nested for? - java.lang.NullPointerException at Path.index(Path.java:673)

2020-05-11 Thread Sebastian Guerrero
Hi everybody!

Maybe it's a newbie issue, but I would like your comments.

I'm writing a RESTXQ method to search across a large number of databases
(more than 100).

Some databases are split into several parts because of the number of nodes.
For example, US is split into US00, US01 and US02.

So, my problem is:

If I replace "*for $usPart in ('US00')*" with "*for $usPart in
('US00','US01','US02')*" in *QUERY [A]*, I get this error:

*Improper use? Potential bug? Your feedback is welcome:*
*Contact: basex-talk@mailman.uni-konstanz.de*
*Version: BaseX 9.3.2*
*Java: Oracle Corporation, 1.8.0_251*
*OS: Windows 10, amd64*
*Stack Trace: *
*java.lang.NullPointerException*
* at org.basex.query.expr.path.Path.index(Path.java:673)*
* at org.basex.query.expr.path.Path.optimize(Path.java:157)*
* at org.basex.query.expr.gflwor.For.toPredicate(For.java:220)*
* at org.basex.query.expr.gflwor.GFLWOR.optimizeWhere(GFLWOR.java:532)*
* at org.basex.query.expr.gflwor.GFLWOR.optimize(GFLWOR.java:109)*
* at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:100)*
* at org.basex.query.expr.Extension.compile(Extension.java:45)*
* at org.basex.query.expr.SwitchGroup.compile(SwitchGroup.java:40)*
* at org.basex.query.expr.Switch.compile(Switch.java:60)*
* at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:96)*
* at org.basex.query.expr.gflwor.ForLet.compile(ForLet.java:43)*
* at org.basex.query.expr.gflwor.GFLWOR.compile(GFLWOR.java:90)*
* at org.basex.query.scope.MainModule.comp(MainModule.java:81)*
* at org.basex.query.QueryCompiler.compile(QueryCompiler.java:114)*
* at org.basex.query.QueryCompiler.compile(QueryCompiler.java:105)*
* at org.basex.query.QueryContext.compile(QueryContext.java:312)*
* at org.basex.query.QueryProcessor.compile(QueryProcessor.java:79)*


With "*for $usPart in ('US00')*" it works without any problem: a single
element is fine, but two or more elements fail.

Any ideas about what I'm doing wrong?

Here is the
*QUERY [A]*
-
declare namespace gb="http://www.ipo.gov.uk/schemas/tm";

let $text := "christian"
let $registries := ('GB', 'US')

let $results :=
  for $registry in $registries
  return
    switch ($registry)

    case "US"
    return
      (# db:enforceindex #) {
        for $usPart in ('US00')
        for $tmUS in db:open($usPart)/trademark-applications-daily/application-information/file-segments/action-keys/case-file
        where $tmUS/case-file-header/mark-identification/text() contains text {$text} using stemming
        return
          US{$tmUS/case-file-header/mark-identification/text()}
      }

    case "GB"
    return
      (# db:enforceindex #) {
        for $tmGB in db:open('GB')/gb:MarkLicenceeExportList/gb:TradeMark
        where $tmGB/gb:WordMarkSpecification/gb:MarkVerbalElementText/text() contains text {$text}
        return
          GB{$tmGB/gb:WordMarkSpecification/gb:MarkVerbalElementText/text()}
      }

    default return ""

return

{count($results)}

{
for $result in $results
return $result
}



-
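
To make the failing case concrete, the change described above can be written as a reduced query. This is only a sketch: it keeps the US branch of QUERY [A] with the three-part sequence, drops the switch and the result wrapper, and assumes that the databases US00, US01 and US02 exist and have a full-text index. Whether this reduced form still triggers the NullPointerException has not been checked.

```xquery
(: Reduced variant of QUERY [A]: US branch only, multi-part sequence.
   Assumption: databases US00, US01 and US02 exist with a full-text index. :)
let $text := "christian"
return (# db:enforceindex #) {
  for $usPart in ('US00', 'US01', 'US02')
  for $tmUS in db:open($usPart)/trademark-applications-daily/application-information/file-segments/action-keys/case-file
  where $tmUS/case-file-header/mark-identification/text() contains text { $text } using stemming
  return $tmUS/case-file-header/mark-identification/text()
}
```

Replacing the sequence with ('US00') gives the variant that is reported to work.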

Regards,
Sebastian


Re: [basex-talk] Bug creating full-text index

2020-05-06 Thread Sebastian Guerrero
Hi Christian,

Thank you very much for your answer.

You were right: I had added a field named "GoodsServicesDescription" to the
full-text index, and that was the cause of the error.
I removed it and the full-text index is now created without any problems.
And yes: searching by that field is not a requirement for now.
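
For reference, restricting the full-text index to selected elements might look roughly like the sketch below. It is not the exact setup used here: the database name 'GB' and the choice of MarkVerbalElementText as the only indexed element are assumptions for illustration, and the call rebuilds all index structures of the database.

```xquery
(: Sketch: rebuild the indexes and limit the full-text index to one element.
   Assumptions: the database is called 'GB', and MarkVerbalElementText is
   the only element that needs full-text search. :)
db:optimize('GB', true(), map {
  'ftindex': true(),
  'ftinclude': '*:MarkVerbalElementText'
})
```

The *: prefix in the name test matches the element in any namespace, so the namespace URI does not have to be spelled out. Large fields such as GoodsServicesDescription are simply left out of the list.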

BTW, we are evaluating BaseX to work with the UKIPO trademarks database
(several million records), and so far it looks amazing.

I'll let you know how it goes.

Grüße aus dem Land der Gauchos nach Deutschland!
Sebastian.


On Wed, May 6, 2020 at 8:48 AM Christian Grün wrote:

> Hi Sebastian,
>
> I am afraid the input size seems to be too large for a full-text
> index. I will check if we can improve the error feedback (usually, you
> should get an info message that indicates that your data exceeds the
> maximum size for an index instance).
>
> If you don’t plan to query all texts of your document, you can reduce
> memory consumption by restricting your index to specific elements (see
> [1] for more information).
>
> Did you successfully manage to create your database without full-text
> index?
>
> Saludos (y buena salud) a Argentina,
> Christian
>
> [1] https://docs.basex.org/wiki/Indexes#Selective_Indexing
>
>
>
> On Wed, May 6, 2020 at 1:18 PM Sebastian Guerrero wrote:
> >
> > Hello from Argentina! Thanks for the excellent software. I'm so happy
> > to be learning a bit more about BaseX!
> >
> > Sadly I get an error trying to create a full-text index for a 51GB
> > database:
> >
> > Improper use? Potential bug? Your feedback is welcome:
> > Contact: basex-talk@mailman.uni-konstanz.de
> > Version: BaseX 9.3.2
> > Java: Oracle Corporation, 1.8.0_231
> > OS: Windows 10, amd64
> > Stack Trace:
> > java.lang.NegativeArraySizeException
> > at org.basex.index.ft.FTList.next(FTList.java:93)
> > at org.basex.index.ft.FTBuilder.merge(FTBuilder.java:239)
> > at org.basex.index.ft.FTBuilder.write(FTBuilder.java:147)
> > at org.basex.index.ft.FTBuilder.build(FTBuilder.java:86)
> > at org.basex.index.ft.FTBuilder.build(FTBuilder.java:23)
> > at org.basex.data.DiskData.createIndex(DiskData.java:198)
> > at org.basex.core.cmd.CreateIndex.create(CreateIndex.java:100)
> > at org.basex.core.cmd.Optimize.optimize(Optimize.java:181)
> > at org.basex.core.cmd.Optimize.optimize(Optimize.java:163)
> > at org.basex.core.cmd.Optimize.optimize(Optimize.java:92)
> > at org.basex.core.cmd.Optimize.finish(Optimize.java:82)
> > at org.basex.core.cmd.ACreate.update(ACreate.java:96)
> > at org.basex.core.cmd.CreateIndex.run(CreateIndex.java:64)
> > at org.basex.core.Command.run(Command.java:257)
> > at org.basex.core.Command.execute(Command.java:93)
> > at org.basex.core.Command.execute(Command.java:116)
> > at org.basex.gui.dialog.DialogProgress.lambda$execute$0(DialogProgress.java:182)
> > at java.lang.Thread.run(Unknown Source)
> >
> > I'm using BaseX GUI.
> >
> > Is it a known problem? any advice?
> >
> > Regards,
> > Sebastian.
>


[basex-talk] Bug creating full-text index

2020-05-06 Thread Sebastian Guerrero
Hello from Argentina! Thanks for the excellent software. I'm so happy
to be learning a bit more about BaseX!

Sadly I get an error trying to create a full-text index for a 51GB database:

Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 9.3.2
Java: Oracle Corporation, 1.8.0_231
OS: Windows 10, amd64
Stack Trace:
java.lang.NegativeArraySizeException
at org.basex.index.ft.FTList.next(FTList.java:93)
at org.basex.index.ft.FTBuilder.merge(FTBuilder.java:239)
at org.basex.index.ft.FTBuilder.write(FTBuilder.java:147)
at org.basex.index.ft.FTBuilder.build(FTBuilder.java:86)
at org.basex.index.ft.FTBuilder.build(FTBuilder.java:23)
at org.basex.data.DiskData.createIndex(DiskData.java:198)
at org.basex.core.cmd.CreateIndex.create(CreateIndex.java:100)
at org.basex.core.cmd.Optimize.optimize(Optimize.java:181)
at org.basex.core.cmd.Optimize.optimize(Optimize.java:163)
at org.basex.core.cmd.Optimize.optimize(Optimize.java:92)
at org.basex.core.cmd.Optimize.finish(Optimize.java:82)
at org.basex.core.cmd.ACreate.update(ACreate.java:96)
at org.basex.core.cmd.CreateIndex.run(CreateIndex.java:64)
at org.basex.core.Command.run(Command.java:257)
at org.basex.core.Command.execute(Command.java:93)
at org.basex.core.Command.execute(Command.java:116)
at org.basex.gui.dialog.DialogProgress.lambda$execute$0(DialogProgress.java:182)
at java.lang.Thread.run(Unknown Source)

I'm using BaseX GUI.

Is this a known problem? Any advice?

Regards,
Sebastian.