Thank you very much for your response...

I will answer all your questions, in order, below.

First, let me say that I just did an import on a clean DB, from an XML document, which brought in 170 documents. No errors anywhere. I did a count(*) from DATASTORE and I can see 170 docs there... Then I caused my GC job to run, and now there are only 5 records in DATASTORE... So evidently it is the GC job doing it.

It is worth saying that I have 2 nodes on each of the clusters I'm testing.

Yesterday I did an import to one of the clusters (150 docs). On the other cluster, I manually uploaded 3 documents through my app, which were added to DATASTORE as well... I let both clusters run overnight, and in both cases, this morning, most of the records in DATASTORE are gone...

In both cases there are a few documents (5) that remain there. I don't know why.

It would seem that all new documents are deleted by GC, but that's not true... I just uploaded one of the ones I did yesterday, and after running GC, it remained. However, all the ones I imported are gone, except for the mysterious 5.

Now I will answer your questions below:


Thomas Müller wrote:
What version of Jackrabbit do you use?
jackrabbit-api.jar - 1.4.0
jackrabbit-core.jar - 1.4.1
jackrabbit-jcr-commons.jar - 1.4.0
jackrabbit-spi.jar - 1.4.0
jackrabbit-spi-commons.jar - 1.4.0
jackrabbit-text-extractors.jar - 1.4.0

It seems wrong that I have core 1.4.1 and 1.4.0 for the rest of the stuff... but that's how the framework I have came (Liferay 5.1.2).

How did you find out you are missing data (could you post the exception stack trace)?
No errors... Basically, a user reported that some docs he added a day before were not found (using our app's UI), and I went straight to the DB and noticed that most of the docs were gone... I've gone through database backups and realized that this has been going on for a while.

What does your repository.xml look like, and did you change it recently?
I will paste the repository.xml file now... I changed it on May 16th, which is when I moved documents from the FS to the DB, and it looks like the problem has been there since then... It is a shame that I did not realize this before, but that's the way it is... I have a backup from May 17th, and it is missing most of the docs I imported on the 16th... Evidently the GC job ran before the backup.

<?xml version="1.0"?>
<Repository>

 <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
   <param name="driver" value="com.mysql.jdbc.Driver"/>
   <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
   <param name="user" value="username" />
   <param name="password" value="userPassword" />
   <param name="schema" value="mysql"/>
   <param name="databaseType" value="mysql"/>
   <param name="minRecordLength" value="1024"/>
   <param name="maxConnections" value="3"/>
   <param name="copyWhenReading" value="true"/>
   <!-- The prefix can NOT be used for anything other than specifying a
        schema. This seems to be due to an inconsistency in Jackrabbit
        between creating the table and reading it: it uses the prefix
        when creating the table but not when using it. So, in MySQL we
        use no prefix; the table name is then DATASTORE, so that name
        is reserved by Jackrabbit.
     -->
   <param name="tablePrefix" value=""/>
 </DataStore>

 <!-- The FS should not be shared across nodes in the cluster,
      so this should either be local, or prefixed per node in the DB -->
 <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
   <param name="driver" value="com.mysql.jdbc.Driver"/>
   <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
   <param name="user" value="username" />
   <param name="password" value="userPassword" />
   <param name="schema" value="mysql"/>
   <param name="schemaObjectPrefix" value="JCR_NODE1_FS_"/>
 </FileSystem>
 <Security appName="Jackrabbit">
   <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
   <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
     <param name="anonymousId" value="anonymous" />
   </LoginModule>
 </Security>
 <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="liferay" />
 <Workspace name="${wsp.name}">
    <!-- The FS should not be shared across nodes in the cluster,
         so this should either be local, or prefixed per node in the DB -->
   <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
     <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
     <param name="user" value="username" />
     <param name="password" value="userPassword" />
     <param name="schema" value="mysql"/>
     <param name="schemaObjectPrefix" value="JCR_NODE1_${wsp.name}_FS_"/>
   </FileSystem>
    <!-- The PM needs to be shared across the cluster -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
     <param name="user" value="username" />
     <param name="password" value="userPassword" />
     <param name="schemaObjectPrefix" value="JCR_${wsp.name}_PM_"/>
     <param name="externalBLOBs" value="false"/>
   </PersistenceManager>
 </Workspace>
 <Versioning rootPath="${rep.home}/version">
    <!-- The FS should not be shared across nodes in the cluster,
         so this should either be local, or prefixed per node in the DB -->
   <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
     <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
     <param name="user" value="username" />
     <param name="password" value="userPassword" />
     <param name="schema" value="mysql"/>
     <param name="schemaObjectPrefix" value="JCR_NODE1_V_FS_"/>
   </FileSystem>
    <!-- The PM needs to be shared across the cluster -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
     <param name="user" value="username" />
     <param name="password" value="userPassword" />
     <param name="schemaObjectPrefix" value="JCR_V_PM_"/>
     <param name="externalBLOBs" value="false"/>
   </PersistenceManager>
 </Versioning>

 <!-- Each cluster node needs to have a unique node id -->
 <Cluster id="NODE1" syncDelay="5">
    <!-- The Journal needs to be shared across the cluster -->
   <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
     <param name="revision" value="${rep.home}/revision"/>
     <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
     <param name="user" value="username" />
     <param name="password" value="userPassword" />
     <param name="schema" value="mysql"/>
     <param name="schemaObjectPrefix" value="JCR_JOURNAL_"/>
   </Journal>
 </Cluster>
</Repository>
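One note on the DataStore config above, in case it matters: with minRecordLength set to 1024, my understanding is that binaries smaller than that are stored inline (encoded in the content identifier) rather than written as rows to the DATASTORE table, so a count(*) only reflects the larger binaries. A rough sketch of that threshold check (the class and method here are hypothetical and purely illustrative, not Jackrabbit API):

```java
// Hypothetical illustration only -- not Jackrabbit API.
// My understanding is that with DbDataStore, binaries shorter than
// minRecordLength are kept inline in the content identifier instead
// of getting their own row in the DATASTORE table.
public class MinRecordLengthSketch {

    static final int MIN_RECORD_LENGTH = 1024; // value from repository.xml above

    // true if a binary of this size would get its own DATASTORE row
    static boolean getsDataStoreRow(long sizeBytes) {
        return sizeBytes >= MIN_RECORD_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(getsDataStoreRow(500L));    // small binary: stored inline
        System.out.println(getsDataStoreRow(200000L)); // large binary: DATASTORE row
    }
}
```

If that's right, it's probably not the cause of the disappearing rows, but it's worth keeping in mind when comparing row counts against the number of imported documents.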


Did you migrate data recently using XML import?
Not recently, it was just back then (May 16th).
How exactly do you run the data store garbage collection?

I have a Quartz job that runs every night... I have changed the config to run it every 5 minutes for testing... Here is the code:

               GarbageCollector gc;
               SessionImpl si = (SessionImpl) JCRFactoryUtil.createSession();
               try {
                   gc = si.createDataStoreGarbageCollector();

                   // optional (if you want to report progress):
                   // gc.setScanEventListener(this);

                   // scan must be called to find unused elements
                   gc.scan();
                   gc.stopScan();

                   // delete old data
                   gc.deleteUnused();
               } finally {
                   // release the session so repeated runs don't leak
                   si.logout();
               }

It seems that for now I could just disable GC... but I think it would be better to fix it, so that I don't end up with a lot of space taken up by unused documents...
Regards,
Thomas

Again, I really appreciate your response and hope to hear your input on my answers soon.

Best Regards!
Alex.

On Tue, Jun 23, 2009 at 9:03 PM, Alexander Wallace<[email protected]> wrote:
Hi all... I have no idea where the problem is, and I am researching...

I am using the db for storage, and using DATASTORE as well...

The first time I rolled this out, I migrated all documents to the DB and ended
up with 300+ rows in DATASTORE...

I'm going through DB backups to find out when it first happened, but right
now, when I count(*) from DATASTORE, I see only 10 rows... If I go back a
few days, I see a few more...

Any idea of what could be happening?

I know I run GC every night...

Any clues?

Thanks!


