Thank you very much for your response...

I will answer all your questions, in order, below.

First, let me say that I just did an import on a clean DB, from an XML document, which brought in 170 documents. No errors anywhere. I did a count(*) from DATASTORE and I can see 170 docs there... Then I caused my GC job to run, and now there are only 5 records in DATASTORE... So evidently it is the GC job doing it.

It is worth saying that I have 2 nodes on each of the clusters I'm testing.

Yesterday I did an import to one of the clusters (150 docs). On the other cluster, I manually uploaded 3 documents through my app, which were added to DATASTORE as well... I let both clusters run overnight, and in both cases, this morning, most of the records in DATASTORE are gone...

In both cases there are a few documents (5) that remain there. I don't know why.

It would seem that all new documents are deleted by GC, but that's not true... I just uploaded one of the ones I did yesterday, and after running GC, it remained. However, all the ones I imported are gone, except for the mysterious 5.

Now I will answer your questions below:


Thomas Müller wrote:
What version of Jackrabbit do you use?
jackrabbit-api.jar - 1.4.0
jackrabbit-core.jar - 1.4.1
jackrabbit-jcr-commons.jar - 1.4.0
jackrabbit-spi.jar - 1.4.0
jackrabbit-spi-commons.jar - 1.4.0
jackrabbit-text-extractors.jar - 1.4.0

It seems wrong that I have core 1.4.1 and 1.4.0 for the rest of the stuff... but that's how the framework I have came (Liferay 5.1.2).

How did you find out you are missing data (could you post the exception stack trace)?
No errors... Basically, a user reported that some docs he added a day before were not found (using our app's UI), and I went straight to the DB and noticed that most of the docs were gone... I've gone through database backups and realized that this has been going on for a while.

What does your repository.xml look like, and did you change it recently?
I will paste the repository.xml file now... I changed it on May 16th, which is when I moved documents from the FS to the DB, and it looks like the problem has been there since then... It is a shame that I did not realize this before, but that's the way it is... I have a backup from May 17th, and it is missing most of the docs I imported on the 16th... Evidently the GC job ran before the backup.

<?xml version="1.0"?>
<Repository>

 <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
   <param name="driver" value="com.mysql.jdbc.Driver"/>
   <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
   <param name="user" value="username" />
   <param name="password" value="userPassword" />
   <param name="schema" value="mysql"/>
   <param name="databaseType" value="mysql"/>
   <param name="minRecordLength" value="1024"/>
   <param name="maxConnections" value="3"/>
   <param name="copyWhenReading" value="true"/>
   <!-- The prefix can NOT be used for anything other than specifying a
        schema. This seems to be due to an inconsistency in Jackrabbit
        between creating the table and reading it: it uses the prefix
        when creating the table but not when using it. So, in MySQL we
        use no prefix; the table name is then DATASTORE, so that name
        is reserved by Jackrabbit.
     -->
   <param name="tablePrefix" value=""/>
 </DataStore>

 <!-- The FS should not be shared across nodes in the cluster,
      so this should either be local, or prefixed per node in the DB -->
 <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
   <param name="driver" value="com.mysql.jdbc.Driver"/>
   <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
   <param name="user" value="username" />
   <param name="password" value="userPassword" />
   <param name="schema" value="mysql"/>
   <param name="schemaObjectPrefix" value="JCR_NODE1_FS_"/>
 </FileSystem>
 <Security appName="Jackrabbit">
   <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
   <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
     <param name="anonymousId" value="anonymous" />
   </LoginModule>
 </Security>
 <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="liferay" />
 <Workspace name="${wsp.name}">
    <!-- The FS should not be shared across nodes in the cluster,
         so this should either be local, or prefixed per node in the DB -->
   <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
     <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
     <param name="user" value="username" />
     <param name="password" value="userPassword" />
     <param name="schema" value="mysql"/>
     <param name="schemaObjectPrefix" value="JCR_NODE1_${wsp.name}_FS_"/>
   </FileSystem>
    <!-- The PM needs to be shared across the cluster -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
     <param name="user" value="username" />
     <param name="password" value="userPassword" />
     <param name="schemaObjectPrefix" value="JCR_${wsp.name}_PM_"/>
     <param name="externalBLOBs" value="false"/>
   </PersistenceManager>
 </Workspace>
 <Versioning rootPath="${rep.home}/version">
    <!-- The FS should not be shared across nodes in the cluster,
         so this should either be local, or prefixed per node in the DB -->
   <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
     <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
     <param name="user" value="username" />
     <param name="password" value="userPassword" />
     <param name="schema" value="mysql"/>
     <param name="schemaObjectPrefix" value="JCR_NODE1_V_FS_"/>
   </FileSystem>
    <!-- The PM needs to be shared across the cluster -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
     <param name="user" value="username" />
     <param name="password" value="userPassword" />
     <param name="schemaObjectPrefix" value="JCR_V_PM_"/>
     <param name="externalBLOBs" value="false"/>
   </PersistenceManager>
 </Versioning>

 <!-- Each cluster node needs to have a unique node id -->
 <Cluster id="NODE1" syncDelay="5">
    <!-- The Journal needs to be shared across the cluster -->
   <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
     <param name="revision" value="${rep.home}/revision"/>
     <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
     <param name="user" value="username" />
     <param name="password" value="userPassword" />
     <param name="schema" value="mysql"/>
     <param name="schemaObjectPrefix" value="JCR_JOURNAL_"/>
   </Journal>
 </Cluster>
</Repository>
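One note on the DataStore config above, in case it matters: with minRecordLength set to 1024, my understanding is that binaries smaller than that are stored inline (encoded in the content identifier) rather than written as rows to the DATASTORE table, so a count(*) only reflects the larger binaries. A rough sketch of that threshold check (the class and method here are hypothetical and purely illustrative, not Jackrabbit API):

```java
// Hypothetical illustration only -- not Jackrabbit API.
// My understanding is that with DbDataStore, binaries shorter than
// minRecordLength are kept inline in the content identifier instead
// of getting their own row in the DATASTORE table.
public class MinRecordLengthSketch {

    static final int MIN_RECORD_LENGTH = 1024; // value from repository.xml above

    // true if a binary of this size would get its own DATASTORE row
    static boolean getsDataStoreRow(long sizeBytes) {
        return sizeBytes >= MIN_RECORD_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(getsDataStoreRow(500L));    // small binary: stored inline
        System.out.println(getsDataStoreRow(200000L)); // large binary: DATASTORE row
    }
}
```

If that's right, it's probably not the cause of the disappearing rows, but it's worth keeping in mind when comparing row counts against the number of imported documents.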


Did you migrate data recently using XML import?
Not recently, it was just back then (May 16th).
How exactly do you run the data store garbage collection?

I have a Quartz job that runs every night... I have changed the config to run it every 5 minutes for testing... Here is the code:

               GarbageCollector gc;
               SessionImpl si = (SessionImpl) JCRFactoryUtil.createSession();
               try {
                   gc = si.createDataStoreGarbageCollector();

                   // optional (if you want to report progress):
                   // gc.setScanEventListener(this);

                   // scan must be called to find unused elements
                   gc.scan();
                   gc.stopScan();

                   // delete old data
                   gc.deleteUnused();
               } finally {
                   // release the session so repeated runs don't leak
                   si.logout();
               }

It seems that for now I could just disable GC... but I think it would be better to fix it, so that I don't end up with a lot of space taken up by unused documents...
Regards,
Thomas

Again, I really appreciate your response and hope to hear your input on my answers soon.

Best Regards!
Alex.

On Tue, Jun 23, 2009 at 9:03 PM, Alexander Wallace<[email protected]> wrote:
Hi all... I have no idea where the problem is, and I am researching...

I am using the db for storage, and using DATASTORE as well...

The first time I rolled this out, I migrated all documents to the DB and ended
up with 300+ rows in DATASTORE...

I'm going through DB backups to find out when it first happened, but right
now, when I count(*) from DATASTORE, I see only 10 rows... If I go back a
few days, I see a few more...

Any idea of what could be happening?

I know I run GC every night...

Any clues?

Thanks!


