Re: flush to database?

Woonsan Ko Tue, 05 Jun 2018 15:48:02 -0700

On Tue, Jun 5, 2018 at 3:15 PM, Eric Berryman <eric.berry...@gmail.com> wrote:
> Again, thank you so much for walking me through this.
>
> Considering your comment that the Journal updates the cache too.
>
> Looking at the Journal table in more detail, I see something that at least
> makes sense.
> The last entry of meta data in the Journal table is the same as the last
> image on node2.
> So, it looks as if node2 is doing what it is supposed to do, but node1 has
> stopped writing to the Journal.
> (So the meta data must be in the cache of node1?)
>
> At this point, I'm a little worried about restarting node1: either I will
> lose data, or it will fix everything.


What's the version of Jackrabbit exactly?
Just as a wild guess (I'm not so sure), if there's a bug (such as a
deadlock), you might lose some data when restarting.
If you're not on the latest 2.6.x release
(http://jackrabbit.apache.org/jcr/downloads.html#v2.6), there's a
chance you're hitting a known issue. But I'm not really sure whether or
not this is caused by a bug. You can check the JIRA board, e.g.,
https://issues.apache.org/jira/browse/JCR-3783

>
> Is this recoverable?  I'm guessing a restart of node1 will cause the
> Journal to start getting updated again, but how do I get the missing
> entries?

Hmm. There's a standalone tool for backup
(http://jackrabbit.apache.org/jcr/standalone-server.html#Backup_and_migration)
but it's not an option for you unfortunately.
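
While comparing the repository.xml files on the nodes, the Cluster element
is one more thing to look at: as far as I know each cluster node should have
its own id, and the file you pasted has id="node3". A rough sketch of what I'd
expect on node1, assuming the journal settings stay the same as in your file
(the id value is my assumption):

```xml
<!-- Sketch for node1's repository.xml; only the id should differ per node.
     The id value "node1" is assumed, the rest is copied from the posted file. -->
<Cluster id="node1" syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
        <param name="revision" value="${rep.home}/revision.log"/>
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="url" value="jdbc/jcr"/>
        <param name="databaseType" value="mysql"/>
    </Journal>
</Cluster>
```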

Woonsan

> And can I be sure that after a restart node1 will still have all the
> entries for the past two weeks (considering it wasn't making Journal
> entries)?
>
> I don't see anything in the log that suggests that something died.  I'm
> also wondering how to monitor for this.
>
> Here is my repository.xml:
> <Repository>
>     <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
>         <param name="path" value="/tmp/jackrabbit-olog/repository"/>
>     </FileSystem>
>     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>         <param name="driver" value="javax.naming.InitialContext"/>
>         <param name="url" value="jdbc/jcr"/>
>         <param name="databaseType" value="mysql"/>
>         <param name="schemaObjectPrefix" value="jcr_ds_" />
>     </DataStore>
>     <Security appName="Jackrabbit">
>         <SecurityManager class="org.apache.jackrabbit.core.security.simple.SimpleSecurityManager" workspaceName="security">
>         </SecurityManager>
>         <AccessManager class="org.apache.jackrabbit.core.security.simple.SimpleAccessManager">
>         </AccessManager>
>         <LoginModule class="org.apache.jackrabbit.core.security.simple.SimpleLoginModule">
>         </LoginModule>
>     </Security>
>     <Workspaces rootPath="/tmp/jackrabbit-olog/workspaces" defaultWorkspace="olog"/>
>     <Workspace name="${wsp.name}">
>         <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
>             <param name="path" value="${wsp.home}"/>
>         </FileSystem>
>         <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/jcr"/>
>             <param name="schema" value="mysql"/>
>             <param name="schemaObjectPrefix" value="jcr_${wsp.name}_pm_"/>
>             <param name="externalBLOBs" value="false"/>
>         </PersistenceManager>
>         <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>             <param name="path" value="${wsp.home}/index"/>
>             <param name="supportHighlighting" value="true"/>
>         </SearchIndex>
>     </Workspace>
>     <Versioning rootPath="/tmp/jackrabbit-olog/version">
>         <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
>             <param name="path" value="${rep.home}/version" />
>         </FileSystem>
>         <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager">
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/jcr"/>
>             <param name="schema" value="mysql"/>
>             <param name="schemaObjectPrefix" value="jcr_pmver_"/>
>             <param name="externalBLOBs" value="false"/>
>         </PersistenceManager>
>     </Versioning>
>     <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>         <param name="path" value="/tmp/jackrabbit-olog/repository/index"/>
>         <param name="supportHighlighting" value="true"/>
>     </SearchIndex>
>     <Cluster id="node3" syncDelay="2000">
>         <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
>             <param name="revision" value="${rep.home}/revision.log" />
>             <param name="driver" value="javax.naming.InitialContext"/>
>             <param name="url" value="jdbc/jcr"/>
>             <param name="databaseType" value="mysql"/>
>         </Journal>
>     </Cluster>
> </Repository>
>
> Thank you again!
> Eric
>
> On Mon, Jun 4, 2018 at 10:25 AM, Woonsan Ko <woon...@apache.org> wrote:
>
>> On Sat, Jun 2, 2018 at 8:39 AM, Eric Berryman <eric.berry...@gmail.com>
>> wrote:
>> > - Blob data in the table doesn't necessarily mean that the binary data
>> > is stored in a JCR node's binary property.
>> >
>> > - If you have a chance to use javax.jcr.Node#getNode(path) directly to
>> > retrieve the specific node containing binary property, then I don't
>> > think that will hit the Lucene index.
>> >
>> > Thank you, these are good bits of information.  I just checked, and I do
>> > have API endpoints that only use getNode.  Node1 returns binary
>> > properties, while node2 doesn't.  So, from your comment the index has
>> > nothing to do with my issue.  But, it looks like your first comment puts
>> > me on the right path.  The table has the blob, but the binary property is
>> > probably missing in the database.  Is it possible this isn't flushed to
>> > the database by
>>
>> I don't think so. Any changes must be persisted.
>>
>> > node1?  It seems to make sense that the large binary gets persisted,
>> > while the small property might still be in memory?
>>
>> "the small property" can be persisted differently from "the larger
>> binary", depending on "minRecordLength" parameter:
>> - https://wiki.apache.org/jackrabbit/DataStore
>>
>> If the binary property data is not larger than minRecordLength, it's
>> persisted to the PersistenceManager from the memory, not to the
>> DataStore.
>> By the way, do you know that the single DataStore is global across
>> workspaces? If node2 uses a different workspace from what node1 uses
>> and if the binary data was smaller than minRecordLength, then the
>> binary from node1 is stored in its own database or table, which might
>> not be seen by node2 (possibly due to a different workspace / DB
>> configuration).
>> Perhaps you can check the repository.xml file and workspace.xml
>> file(s) on each node if there's anything different.
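>>
>> To show where that parameter lives, here is a rough sketch of a DataStore
>> element in repository.xml. I'm assuming a DbDataStore here; the connection
>> values are placeholders and the threshold of 100 bytes is only an example:
>>
>> ```xml
>> <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>>     <!-- placeholder connection settings (assumed) -->
>>     <param name="driver" value="javax.naming.InitialContext"/>
>>     <param name="url" value="jdbc/jcr"/>
>>     <param name="databaseType" value="mysql"/>
>>     <!-- binaries not larger than this many bytes are kept with the node
>>          data in the PersistenceManager, not in the DataStore (example value) -->
>>     <param name="minRecordLength" value="100"/>
>> </DataStore>
>> ```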
>>
>> >
>> > Another question, is the journal only used for updating the index, or
>> > does it do more than that?
>>
>> I think it should take care of the caching node state manager as well.
>>
>> Woonsan
>>
>> >
>> > Thank you again for your help!
>> > Eric
>> >
>> > On Sat, Jun 2, 2018, 00:18 Woonsan Ko <woon...@apache.org> wrote:
>> >
>> >> On Fri, Jun 1, 2018 at 9:57 PM, Eric Berryman <eric.berry...@gmail.com>
>> >> wrote:
>> >> > Node1 looks completely fine, and the application that uses it is in
>> >> > production.  It's a simple Java EE application that uses the JCR to
>> >> > upload and list past images.
>> >>
>> >> Does the application use javax.jcr.query.Query first to retrieve the
>> >> nodes containing binary properties? If so, it uses Lucene index for
>> >> the query.
>> >> If you have a chance to use javax.jcr.Node#getNode(path) directly to
>> >> retrieve the specific node containing binary property, then I don't
>> >> think that will hit the Lucene index. It just converts the path to
>> >> node ids to retrieve node states from database. So, it is worth
>> >> validating one of the recently added nodes by #getNode(path) on both
>> >> Node1 and Node2, IMO. If it returns a node but fails to return it by
>> >> Query, then it is a Lucene index issue. If it returns nothing in both
>> >> ways on Node2 while it works fine on Node1, then perhaps Node2 is
>> >> looking at a different database or tables?
>> >>
>> >> Regards,
>> >>
>> >> Woonsan
>> >>
>> >> >
>> >> > I guess what I don't understand, is that they are looking at the exact
>> >> > same database.  It seems I should be able to have node2 see it the same
>> >> > way, and the only difference would be the index, which is in a local
>> >> > file directory.
>> >> >
>> >> > So strange.
>> >> >
>> >> > Thank you!
>> >> >
>> >> > On Jun 1, 2018 21:44, "Woonsan Ko" <woon...@apache.org> wrote:
>> >> >
>> >> > Hi Eric,
>> >> >
>> >> >
>> >> > On Fri, Jun 1, 2018 at 1:29 PM, Eric Berryman <eric.berry...@gmail.com>
>> >> > wrote:
>> >> >> Hello!
>> >> >>
>> >> >> I have an application that uses jackrabbit to save images, using the
>> >> >> database filestore.
>> >> >> I have jackrabbit clustered (node1, node2).
>> >> >> This was working for me fine, but I started seeing an oddity.
>> >> >> Node1 inserts an image, but node2 doesn't seem to see it when queried
>> >> >> anymore.
>> >> >> So, node2 is now missing about the last 2 weeks of images.
>> >> >> I can see the correct image as a blob in the jcr_ds_DATASTORE table.
>> >> >
>> >> > Are you sure you are able to query or find the images in node1?
>> >> > Blob data in the table doesn't necessarily mean that the binary data
>> >> > is stored in a JCR node's binary property. The blob data could be
>> >> > referred to by another node, a versioned frozen node, or a non-existing
>> >> > node, which can be caused by node deletion when the binary data wasn't
>> >> > garbage-collected.
>> >> > So, I'd traverse the nodes through the simple JCR API and validate that
>> >> > the nodes really exist even in node1. You might need to ask around about
>> >> > the paths of the recently added nodes containing the binary data to do
>> >> > that.
>> >> >
>> >> >
>> >> >>
>> >> >> And, node2 logged that the journal has been applied.
>> >> >> The LOCAL_REVISIONS table shows both nodes have a revision id of 605
>> >> >> (although I do have 1364 images).
>> >> >>
>> >> >> I've tried adding enableConsistencyCheck=true and
>> >> >> forceConsistencyCheck=true to the index part of the repository.xml
>> >> >> file.  But, I don't see any errors.  Just that the consistency check
>> >> >> happened.
>> >> >>
>> >> >> I've also tried clearing the index directory of node2.  Jackrabbit
>> >> >> recreates the index, applies the 605 journal entries, then ends up in
>> >> >> the same state without the last two weeks of images.
>> >> >>
>> >> >> Are there any ideas to fix what seems to be an index issue?
>> >> >
>> >> > I'm kind of suspicious that some of the new nodes in the last two weeks
>> >> > might have been removed for some reason. You can perhaps rule out
>> >> > this possibility by inspecting JCR nodes on node1 first.
>> >> >
>> >> > Regards,
>> >> >
>> >> > Woonsan
>> >> >
>> >> >
>> >> >>
>> >> >> Any help or ideas to troubleshoot are greatly appreciated.
>> >> >> (jackrabbit 2.6)
>> >> >>
>> >> >> Thank you!
>> >> >> Eric
>> >>
>>
