Thank you very much for your response...
I will answer all your questions, in order, below.
First let me say that I just did an Import on a clean db, from an XML
document, which brought in 170 documents. No errors anywhere. I did a
count(*) from DATASTORE and I can see 170 docs there... Then I caused my
GC job to run and now there are only 5 records in DATASTORE ... So
evidently it is the GC job doing it.
It is worth saying that I have 2 nodes on each of the clusters I'm testing.
Yesterday I did an import to one of the clusters (150 docs). To the
other cluster I manually uploaded 3 documents to my app which were added
to the DATASTORE as well... I let both clusters run overnight and in
both cases, this morning, most of the records in DATASTORE are gone...
In both cases there are a few documents (5) that remain there. I don't
know why.
It would seem that all new documents are deleted by GC, but that's not
true... I just uploaded one of the ones I did yesterday, and after
running GC, it remained. However, all the ones I imported are gone,
except for the mysterious 5.
Now I will answer your questions below:
Thomas Müller wrote:
What version of Jackrabbit do you use?
jackrabbit-api.jar - 1.4.0
jackrabbit-core.jar - 1.4.1
jackrabbit-jcr-commons.jar - 1.4.0
jackrabbit-spi.jar - 1.4.0
jackrabbit-spi-commons.jar - 1.4.0
jackrabbit-text-extractors.jar - 1.4.0
It seems wrong that I have core 1.4.1 and 1.4.0 for the rest of the
stuff... But that's how the framework I have came (Liferay 5.1.2).
How did you find out
you are missing data (could you post the exception stack trace)?
No errors... Basically a user reported that some docs he added the day
before were not found (using our app's UI), and I went straight to the DB
and noticed that most of the docs were gone... I've researched database
backups and realized that this has been going on for a while.
What
does your repository.xml look like, and did you change it recently?
I will paste the repository.xml file now... I changed it on May 16th,
which is when I moved documents from the FS to the DB, and the problem
has, it looks like, been there since then... It is a shame that I did not
realize this before, but that's the way it is... I have a backup of May
17th, and the backup is missing most of the docs I imported on the 16th...
Evidently the GC job ran before the backup.
<?xml version="1.0"?>
<Repository>
<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
<param name="driver" value="com.mysql.jdbc.Driver"/>
<param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
<param name="user" value="username" />
<param name="password" value="userPassword" />
<param name="schema" value="mysql"/>
<param name="databaseType" value="mysql"/>
<param name="minRecordLength" value="1024"/>
<param name="maxConnections" value="3"/>
<param name="copyWhenReading" value="true"/>
<!-- A table prefix can NOT be used here other than to specify a schema.
This seems to be due to an inconsistency in Jackrabbit: it uses the
prefix when creating the table, but not when reading from it.
So, in MySQL we use no prefix; the table name is then DATASTORE,
which is reserved by Jackrabbit.
-->
<param name="tablePrefix" value=""/>
</DataStore>
<!-- The FS should not be shared across nodes in the cluster,
so this should either be local, or prefixed for each node in the
db -->
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
<param name="driver" value="com.mysql.jdbc.Driver"/>
<param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
<param name="user" value="username" />
<param name="password" value="userPassword" />
<param name="schema" value="mysql"/>
<param name="schemaObjectPrefix" value="JCR_NODE1_FS_"/>
</FileSystem>
<Security appName="Jackrabbit">
<AccessManager
class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
<LoginModule
class="org.apache.jackrabbit.core.security.SimpleLoginModule">
<param name="anonymousId" value="anonymous" />
</LoginModule>
</Security>
<Workspaces rootPath="${rep.home}/workspaces"
defaultWorkspace="liferay" />
<Workspace name="${wsp.name}">
<!-- The FS should not be shared across nodes in the cluster,
so this should either be local, or prefixed for each node in
the db -->
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
<param name="driver" value="com.mysql.jdbc.Driver"/>
<param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
<param name="user" value="username" />
<param name="password" value="userPassword" />
<param name="schema" value="mysql"/>
<param name="schemaObjectPrefix" value="JCR_NODE1_${wsp.name}_FS_"/>
</FileSystem>
<!-- The PM needs to be shared across the cluster -->
<PersistenceManager
class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
<param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
<param name="user" value="username" />
<param name="password" value="userPassword" />
<param name="schemaObjectPrefix" value="JCR_${wsp.name}_PM_"/>
<param name="externalBLOBs" value="false"/>
</PersistenceManager>
</Workspace>
<Versioning rootPath="${rep.home}/version">
<!-- The FS should not be shared across nodes in the cluster,
so this should either be local, or prefixed for each node in
the db -->
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
<param name="driver" value="com.mysql.jdbc.Driver"/>
<param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
<param name="user" value="username" />
<param name="password" value="userPassword" />
<param name="schema" value="mysql"/>
<param name="schemaObjectPrefix" value="JCR_NODE1_V_FS_"/>
</FileSystem>
<!-- The PM needs to be shared across the cluster -->
<PersistenceManager
class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
<param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
<param name="user" value="username" />
<param name="password" value="userPassword" />
<param name="schemaObjectPrefix" value="JCR_V_PM_"/>
<param name="externalBLOBs" value="false"/>
</PersistenceManager>
</Versioning>
<!-- Each cluster node needs to have a unique node id -->
<Cluster id="NODE1" syncDelay="5">
<!-- The Journal needs to be shared across the cluster -->
<Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
<param name="revision" value="${rep.home}/revision"/>
<param name="driver" value="com.mysql.jdbc.Driver"/>
<param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
<param name="user" value="username" />
<param name="password" value="userPassword" />
<param name="schema" value="mysql"/>
<param name="schemaObjectPrefix" value="JCR_JOURNAL_"/>
</Journal>
</Cluster>
</Repository>
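As the comments in the config above note, the FileSystem entries should not be shared across cluster nodes. For reference, a node-local alternative would be Jackrabbit's LocalFileSystem instead of per-node-prefixed DB tables (a sketch; the path value is just an example, not what this repository uses):

```xml
<!-- node-local FileSystem: nothing to prefix, nothing shared in the DB -->
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
  <param name="path" value="${rep.home}/repository"/>
</FileSystem>
```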
Did you migrate data recently using XML import?
Not recently, it was just back then (May 16th).
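For context, the import was done with the standard JCR API, roughly like this (a sketch; the target path "/documents" and the file name are placeholder assumptions, not the actual values used):

```java
import java.io.FileInputStream;
import java.io.InputStream;

import javax.jcr.ImportUUIDBehavior;
import javax.jcr.Session;

public class ImportSketch {
    // Imports a system-view or document-view XML export under the
    // given parent path; new UUIDs are created for imported nodes.
    public static void importDocs(Session session) throws Exception {
        InputStream in = new FileInputStream("export.xml");
        try {
            session.importXML("/documents", in,
                    ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW);
            session.save();
        } finally {
            in.close();
        }
    }
}
```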
How exactly do you run
the data store garbage collection?
I have a quartz job that runs every night... I have changed the config to
run it every 5 minutes for testing... Here is the code:
SessionImpl si = (SessionImpl) JCRFactoryUtil.createSession();
try {
    GarbageCollector gc = si.createDataStoreGarbageCollector();
    // optional (if you want to report progress sometime):
    // gc.setScanEventListener(this);
    // scan must be called to find unused elements
    gc.scan();
    gc.stopScan();
    // delete old data
    gc.deleteUnused();
} finally {
    si.logout(); // release the session even if the scan fails
}
It seems that for now I could just disable GC... But I think it would be
better to fix it, so that I don't end up with a lot of space being taken
up by unused documents...
Regards,
Thomas
Again, I really appreciate your response and hope to hear your input on
my answers soon.
Best Regards!
Alex.
On Tue, Jun 23, 2009 at 9:03 PM, Alexander Wallace<[email protected]> wrote:
Hi all... I've no idea where the problem lies, and I am researching...
I am using the db for storage, and using DATASTORE as well...
The first time I rolled this out I migrated all documents to the db and ended
up with 300+ rows in DATASTORE...
I'm going through db backups to find out when it first happened, but right
now, when I count(*) from DATASTORE, I see only 10 rows... If I go back a
few days I see a few more...
Any idea of what could be happening?
I know I run GC every night...
Any clues?
Thanks!