[basex-talk] Potential Bug - message from fulltext

2015-06-27 Thread Lars Johnsen
When trying to build a full-text index on a collection of texts, the process
runs for a couple of hours and then exits with the message below. I think it
is nearly complete at that point: in the GUI, I have seen the progress bar
reach around 80 %, so I think it is safe to assume that the error is connected
to the final stages.

The texts are unstructured and represented as one line per book. Here is
the result from the index process. Parameters set in the GUI are: Norwegian
Snowball, lemmatization, diacritics. 30 GB is set aside for the GUI.

Path summary:
doc(): 317259x, strings
  text: 317259x, leaf
text(): 317259x, strings, leaf

Here is the error message:

Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.2 beta 7d38949
Java: Oracle Corporation, 1.7.0_79
OS: Linux, amd64
Stack Trace:
java.lang.NegativeArraySizeException
at java.util.Arrays.copyOf(Arrays.java:2271)
at org.basex.util.TokenBuilder.add(TokenBuilder.java:303)
at org.basex.util.TokenBuilder.add(TokenBuilder.java:290)
at org.basex.index.ft.FTBuilder.merge(FTBuilder.java:248)
at org.basex.index.ft.FTBuilder.write(FTBuilder.java:155)
at org.basex.index.ft.FTBuilder.index(FTBuilder.java:94)
at org.basex.index.ft.FTBuilder.build(FTBuilder.java:102)
at org.basex.index.ft.FTBuilder.build(FTBuilder.java:1)
at org.basex.data.DiskData.createIndex(DiskData.java:195)
at org.basex.core.cmd.ACreate.create(ACreate.java:117)
at org.basex.core.cmd.CreateIndex.run(CreateIndex.java:62)
at org.basex.core.Command.run(Command.java:398)
at org.basex.core.Command.execute(Command.java:100)
at org.basex.core.Command.execute(Command.java:123)
at org.basex.gui.dialog.DialogProgress$1.run(DialogProgress.java:178)

Regards
Lars G Johnsen
National Library of Norway


Re: [basex-talk] Potential Bug - message from fulltext

2015-06-27 Thread Christian Grün
Hi Lars,

It looks as if the input data is indeed too large to be indexed (the
internal id lists seem to exceed the maximum array size in main
memory). The usual alternative to make it work is to distribute your
document(s) into multiple databases.
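
To illustrate how a NegativeArraySizeException can come out of a growing
buffer, here is a minimal sketch. It is not the actual TokenBuilder code; the
class and method names are hypothetical, and the doubling policy is an
assumption made for the example. The point is only that a capacity computed
in int arithmetic wraps around to a negative number once it passes 2^31 - 1,
and Arrays.copyOf then rejects it, regardless of how much heap is available.

```java
import java.util.Arrays;

// Hypothetical illustration only; not taken from the BaseX sources.
final class GrowthOverflow {
  /** Doubles a capacity in plain int arithmetic, which wraps silently on overflow. */
  static int doubledUnchecked(int capacity) {
    return capacity * 2;
  }

  /** Doubles in long arithmetic and clamps to a conservative maximum array length. */
  static int doubledChecked(int capacity) {
    long wanted = 2L * capacity;
    return (int) Math.min(wanted, Integer.MAX_VALUE - 8);
  }

  /** Returns true if copying into an array of the given length throws NegativeArraySizeException. */
  static boolean growthFails(int newLength) {
    try {
      Arrays.copyOf(new byte[0], newLength);
      return false;
    } catch (NegativeArraySizeException ex) {
      return true;
    }
  }

  public static void main(String[] args) {
    int capacity = 1_200_000_000;            // a buffer that is already very large
    int next = doubledUnchecked(capacity);   // wraps around to a negative value
    System.out.println("unchecked new capacity: " + next);
    System.out.println("copyOf fails: " + growthFails(next));
    System.out.println("checked new capacity: " + doubledChecked(capacity));
  }
}
```

Even with a checked computation, a single Java array cannot hold more than
roughly 2^31 entries, which is why splitting the input across several
databases keeps each index below that limit.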

If you want, you can also provide us with the input data, but I assume
it will take up quite a lot of space?

Best,
Christian

