Comments inline ...
On 2010-06-10 15:49, Natalia Shilenkova wrote:
Olle,
Thanks for including database.xml, I was about to ask for it to see
the configuration.
I don't think it is a problem that you are missing some of the system
collections for Xindice 1.1 that you have in 1.0. As far as I remember,
they are not used anymore (sorry, cannot check right now). The difference
in the collection size should not matter either: collections reserve
space based on configuration settings, but system collections normally
do not need so much space, so most of it will just be empty.
Even though there are several system/ .../*.tbl files created on disk,
only SysSymbols and SysConfig have modification dates that look more
recent. You are probably right ... the other system files are probably
leftovers from converting the database from Xindice 1.0 to 1.1.
What you listed under SysSymbols collections looks kind of normal,
except that it should have more documents. Based on your configuration
from database.xml, there should be at least two more documents in
/db/system/SysSymbols collection: w3c and w3c-local.
Yes. I expected to see
- w3c and the approximately 50+ collections underneath it
- w3c-local and similar collections here too
What is interesting is that *one* of the 50+ nested subcollections
appeared in the SysSymbols listing
- w3c-local/meta, appearing as "w3c-local_meta"
but its root collection "w3c-local" does not appear ;-(
Does /db/system/SysSymbols have any other documents in it?
Nope, the only ones (collections and documents) visible are the ones
listed earlier (under "what I see in the Ugly Browser" in the earlier message below).
I also get the same result when using xindiceadmin tool to list
collections and documents under db/ (to eliminate the risk of there just
being a bug in the Ugly Browser).
As for your question - Xindice configuration location is hard-coded,
it always has the same structure (system/SysConfig and
system/SysSymbols). To function properly, Xindice needs two pieces of
information for compressed collections - binary data stored in the
collection file itself (.tbl file) plus system data to know how to
interpret that binary data. You can see the listing of the documents
in collections, so I assume that collection file is OK.
As I can access the contents of three documents under system/, it seems
that they at least are in a healthy state.
So the observations are:
- application documents exist, as Xindice says that they exist. The
listing of document names in collections looks OK
- but accessing any application document fails
- fails even for the single document that is in the *only* application
collection presented in SysSymbols (w3c-local/meta)
It seems more likely that the SysSymbols data is broken, than that each
and every application document has been corrupted.
When doing a quick dump of SysSymbols.tbl, I can see that it is not
completely empty. It starts as
+---- note that the mailer will likely introduce some line breaks below ...
|
| ^...@^@^@
| ......... and a lot of nulls and some unprintable bytes, and then
.....
|
^...@2<FF><FF><FF><FF><FF><FF><FF><FF>^...@^b^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@^...@^@
|
^...@^@^...@^@^...@^psystem_sysconfig^@^nw3c-local_me...@^@^...@^@^...@^@^...@^a^@^...@^@^...@^@^...@^@^...@gps^@^qsystem_sysobjec...@^osystem_sysusers^@^Cw3c^@
w3c-loc...@^qw3c-local_content^@^Zw3c-local_content_articles^
| ... and more ...
+----
Interestingly, the names of the two collections that Xindice says exist
(based on SysSymbols) appear as the first two printable ASCII sequences
in the dump. Further on in the dump come the names of collections that
in a sense *do* exist -- they exist as storage entities, and one can
navigate to them, but their document contents are inaccessible.
Maybe there is still metadata information (about application documents)
in this file, but it has become inaccessible due to some storage index
getting weird.
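
A quick way to list every printable name still present in the file is to scan it for runs of ASCII characters, much like the Unix `strings` tool does. The following is only a minimal sketch: the function name `printable_runs`, the sample bytes, and the file path in the usage note are made up for illustration, and it says nothing about how Xindice actually lays out its pages.

```python
import re


def printable_runs(data, min_len=4):
    """Return all runs of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical sample: names separated by nulls and other unprintable bytes.
SAMPLE = b"\x00\x00\x10system_sysconfig\x00\xff\xff\x11w3c-local_content\x00\x03w3c\x00"

names = printable_runs(SAMPLE)
short_names_too = printable_runs(SAMPLE, min_len=3)
```

To run it against a real file, one would read a copy of the file in binary mode, for example `printable_runs(open("SysSymbols.tbl.copy", "rb").read())`. The result only shows which names survive as bytes, not whether the index pointing at them is intact.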
Theoretically speaking, it might be possible to patch this storage file
back into working order, but I have no idea how to approach this
challenge. Only someone with inside knowledge about how storage is used
can do something sensible.
/olle
Regards,
Natalia
On Thu, Jun 10, 2010 at 8:27 AM, Olle Olsson<ol...@sics.se> wrote:
Natalia,
Good suggestion, thanks. It helped me to see that I may face a more difficult
problem. Some "symbol docs" cannot be found!
The following hierarchy is what I see in the Ugly Browser (for 1.1) ...
| db
| system
| SysConfig
| database.xml (document)
| SysSymbols
| system_SysConfig (document)
| w3c-local_meta (document)
Now, the part under SysSymbols does look strange!
I revived the old Xindice 1.0 version I have kept around, to have something
to compare to.
And there I see that much data seems to be missing in my 1.1 system.
Tragically, core data has gone away:
- system_SysAccess
- system_SysGroups
- system_SysObjects
- system_SysUsers
But also the 100+ application data collections (only one such collection is
seen)
For completeness, further down I reproduce the contents of the three
documents that *can* be seen.
To get some idea about whether actual files have disappeared in the file
system, I looked at that,
but could not find anything unexpected (i.e., existence and size looked OK)
| ../../../opt/xindice-1.1/w3c/system/SysAccess:
| total 1
| -rwxr--r--+ 1 olleo sics 12288 2003-08-12 19:36 SysAccess.tbl
|
| ../../../opt/xindice-1.1/w3c/system/SysConfig:
| total 5
| -rw-rw-r--+ 1 olleo sics 45056 2010-06-09 10:56 SysConfig.tbl
|
| ../../../opt/xindice-1.1/w3c/system/SysGroups:
| total 1
| -rwxr--r--+ 1 olleo sics 12288 2003-08-12 19:36 SysGroups.tbl
|
| ../../../opt/xindice-1.1/w3c/system/SysObjects:
| total 1
| -rwxr--r--+ 1 olleo sics 12288 2003-08-12 19:36 SysObjects.tbl
|
| ../../../opt/xindice-1.1/w3c/system/SysSymbols:
| total 52
| -rw-rw-r--+ 1 olleo sics 4202496 2010-06-09 10:56 SysSymbols.tbl
|
| ../../../opt/xindice-1.1/w3c/system/SysUsers:
| total 1
| -rwxr--r--+ 1 olleo sics 12288 2003-08-12 19:36 SysUsers.tbl
For the old Xindice 1.0, exactly the same was seen, except that
SysSymbols.tbl had a size of 659456. (1.0 is run on a Windows/NT, 1.1 on
Linux)
So, the 64,000 dollar question is: how is Xindice bootstrapped to find the
system data? Right now it fails to find much of the data. Does this depend on
information *in* the storage hierarchy? Or does it depend on some other
configuration data that is no longer in a healthy state?
Really, I have no good clue how to progress on this.
/olle
==================================================================
The accessible system files
______________________
system_SysConfig
<?xml version="1.0" encoding="UTF-8"?>
<?xindice-class org.apache.xindice.xml.SymbolTable?>
<symbols>
<symbol name="filer" id="6"/>
<symbol name="compressed" id="5"/>
<symbol name="xmlobjects" id="2"/>
<symbol name="pagecount" id="8"/>
<symbol name="name" id="1"/>
<symbol name="indexes" id="9"/>
<symbol name="class" id="7"/>
<symbol name="collection" id="4"/>
<symbol name="collections" id="3"/>
<symbol name="database" id="0"/>
</symbols>
______________________
w3c-local_meta
<?xml version="1.0" encoding="UTF-8"?>
<?xindice-class org.apache.xindice.xml.SymbolTable?>
<symbols>
<symbol name="content" id="2"/>
<symbol name="static" id="1"/>
<symbol name="meta" id="0"/>
</symbols>
______________________
database.xml
<?xml version="1.0" encoding="UTF-8"?>
<database name="db">
<xmlobjects/>
<collections>
<collection compressed="true" name="w3c">
<filer class="org.apache.xindice.core.filer.BTreeFiler" pagecount="1"/>
<indexes/>
<xmlobjects/>
... and so on ...
</collection>
<collection compressed="true" name="w3c-local">
<filer class="org.apache.xindice.core.filer.BTreeFiler" pagecount="1"/>
<indexes/>
<xmlobjects/>
.... and so on ....
</collection>
</collections>
</database>
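
To see which symbol documents the configuration implies, one could walk the collection tree in database.xml and compare the result with what is actually listed under SysSymbols. The sketch below is an illustration only. It assumes that nested collections are written as a `<collections>` element inside their parent `<collection>`, which the elided parts of the real file may or may not confirm, and the sample configuration is a made-up miniature, not the real one.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of database.xml; the nesting layout is an assumption.
SAMPLE_CONFIG = b"""<?xml version="1.0" encoding="UTF-8"?>
<database name="db">
  <collections>
    <collection compressed="true" name="w3c"/>
    <collection compressed="true" name="w3c-local">
      <collections>
        <collection compressed="true" name="meta"/>
      </collections>
    </collection>
  </collections>
</database>
"""


def collection_paths(node, prefix=()):
    """Recursively collect collection paths as tuples of names."""
    paths = []
    for coll in node.findall("./collections/collection"):
        path = prefix + (coll.get("name"),)
        paths.append(path)
        paths.extend(collection_paths(coll, path))
    return paths


def expected_symbol_docs(config_bytes):
    """Symbol document names implied by the config (path joined with '_')."""
    root = ET.fromstring(config_bytes)
    return {"_".join(path) for path in collection_paths(root)}


# Documents actually visible under SysSymbols, as listed in this message.
found = {"system_SysConfig", "w3c-local_meta"}
missing = sorted(expected_symbol_docs(SAMPLE_CONFIG) - found)
```

With this miniature configuration, `missing` contains `w3c` and `w3c-local`, the two root collections whose symbol documents do not show up.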
On 2010-06-10 03:15, Natalia Shilenkova wrote:
Hi Olle,
Looking at the exception stack trace, I would say that you've got a problem
with the collection symbol table. That would explain your problems accessing all
the documents.
I was able to reproduce exactly the same stack trace by intentionally
modifying the symbol table.
When Xindice saves a document to the collection, the document is
compressed, which means that its element and attribute names are replaced
with generated identifiers. The information about these identifiers is saved in
the collection symbol table. When retrieving the document, Xindice looks up
the identifiers and replaces them with the actual names. If it encounters an
identifier that cannot be found in the table... well, that exception is what
happens.
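
The mechanism described above can be illustrated with a toy model. This is only a sketch of the idea, not Xindice's actual storage format: the functions `compress` and `decompress`, the list-of-ids representation, and the sample names are all invented for illustration.

```python
# Toy model of symbol-table compression; NOT Xindice's real on-disk format.

def compress(names, symbols):
    """Replace each name with its numeric id, assigning new ids as needed."""
    ids = []
    for name in names:
        if name not in symbols:
            symbols[name] = len(symbols)
        ids.append(symbols[name])
    return ids


def decompress(ids, symbols):
    """Map ids back to names; fail if an id is unknown to the table."""
    by_id = {sym_id: name for name, sym_id in symbols.items()}
    names = []
    for sym_id in ids:
        if sym_id not in by_id:
            raise LookupError(f"no symbol with id {sym_id} in symbol table")
        names.append(by_id[sym_id])
    return names


symbols = {}
stored = compress(["meta", "static", "content", "meta"], symbols)

# With the intact table, the names come back unchanged.
restored = decompress(stored, symbols)

# With a damaged table (one entry lost), the same stored data is unreadable.
damaged = {name: i for name, i in symbols.items() if name != "content"}
try:
    decompress(stored, damaged)
    failed = False
except LookupError:
    failed = True
```

The point of the model is that the stored data itself can be perfectly fine and still be unreadable once the table that gives the ids their meaning is lost or altered.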
I would suggest checking the symbol table first - it is just an XML document
that is located in the /db/system/SysSymbols collection, and the document name is the
same as the collection path, but with the fragments of the path separated by '_'
instead of '/'. So, for collection /db/a/b/c it will be a_b_c. Check if
anything looks out of place, maybe you can see right away that some element
or attribute names are missing...
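
The naming rule can be written down as a small helper. This is just a sketch of the rule as stated here; the function name `symbol_doc_name` and the `db_root` parameter are made up for illustration and are not part of Xindice.

```python
def symbol_doc_name(collection_path, db_root="/db"):
    """Derive the symbol table document name for a collection path.

    The database root is dropped and '/' is replaced with '_'.
    """
    prefix = db_root.rstrip("/") + "/"
    if not collection_path.startswith(prefix):
        raise ValueError(f"{collection_path!r} is not under {db_root!r}")
    relative = collection_path[len(prefix):].strip("/")
    return relative.replace("/", "_")


example = symbol_doc_name("/db/a/b/c")  # "a_b_c"
```

Note that a plain replacement like this cannot be reversed reliably if a collection name itself contains '_'.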
This is an example of the symbol table for the SysConfig collection:
<?xml version="1.0" encoding="UTF-8"?>
<?xindice-class org.apache.xindice.xml.SymbolTable?>
<symbols>
<symbol name="filer" id="6"/>
<symbol name="inline-metadata" id="5"/>
<symbol name="compressed" id="4"/>
<symbol name="name" id="1"/>
<symbol name="indexes" id="8"/>
<symbol name="class" id="7"/>
<symbol name="collection" id="2"/>
<symbol name="collections" id="3"/>
<symbol name="database" id="0"/>
</symbols>
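
A symbol table in this form is easy to load and sanity-check with a few lines of code. The sketch below uses only the Python standard library; the helper names `load_symbols` and `missing_ids` are made up, and the check for gaps assumes that ids are handed out densely starting at 0, which holds for the example above but is only a heuristic.

```python
import xml.etree.ElementTree as ET

SYMBOL_TABLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xindice-class org.apache.xindice.xml.SymbolTable?>
<symbols>
<symbol name="filer" id="6"/>
<symbol name="inline-metadata" id="5"/>
<symbol name="compressed" id="4"/>
<symbol name="name" id="1"/>
<symbol name="indexes" id="8"/>
<symbol name="class" id="7"/>
<symbol name="collection" id="2"/>
<symbol name="collections" id="3"/>
<symbol name="database" id="0"/>
</symbols>
"""


def load_symbols(xml_bytes):
    """Parse a symbol table document into an {id: name} dictionary."""
    root = ET.fromstring(xml_bytes)
    table = {}
    for sym in root.iter("symbol"):
        sym_id = int(sym.get("id"))
        if sym_id in table:
            raise ValueError(f"duplicate symbol id {sym_id}")
        table[sym_id] = sym.get("name")
    return table


def missing_ids(table):
    """Ids between 0 and the highest id that have no entry (heuristic)."""
    if not table:
        return []
    return [i for i in range(max(table) + 1) if i not in table]


table = load_symbols(SYMBOL_TABLE)
gaps = missing_ids(table)
```

For the table above, `gaps` is empty. A non-empty result would point at ids that stored documents may still refer to but that can no longer be resolved.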
Regards,
Natalia
--
------------------------------------------------------------------
Olle Olsson ol...@sics.se Tel: +46 8 633 15 19 Fax: +46 8 751 72 30
[Svenska W3C-kontoret: ol...@w3.org]
SICS [Swedish Institute of Computer Science]
Box 1263
SE - 164 29 Kista
Sweden
------------------------------------------------------------------