AW: Multiple instances of repository

Seidel. Robert Tue, 16 Nov 2010 02:34:39 -0800

Hi Nikhil,

You can also set up one server that accesses the Jackrabbit repository and have 
all instances use this server (for example, via web service calls). A single 
Jackrabbit instance can serve multiple sessions/connections (one session per 
client). In this case there is only one Jackrabbit instance, so no clustering 
is necessary.


Kind regards, Robert

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
Sent: Tuesday, 16 November 2010 11:17
To: [email protected]
Subject: RE: Multiple instances of repository

Thanks for your inputs, they are really helpful.

Well, does that mean my application is not a good candidate for Jackrabbit?

The other option I had was to use Jackrabbit in client-server mode. In this 
case I would access the repository over RMI. But the Jackrabbit 
documentation mentions that RMI is not optimized for performance and that I 
should use an embedded repository instance in my application code for better 
performance.

I can remove the search functionality from these cluster nodes, because their 
life span will be very short. The application takes 2-4 minutes to do 
its job, and I don't think we really need search for these cluster nodes. 

But my question is: should I really use the clustering feature? Cluster 
nodes should normally have a longer life span, but in this case the nodes 
will have a very short life span of 2-4 minutes.
I find it hard to treat these short-lived applications as cluster 
nodes.

Thanks,
Nikhil

-----Original Message-----
From: Seidel. Robert [mailto:[email protected]] 
Sent: Tuesday, November 16, 2010 3:33 PM
To: [email protected]
Subject: AW: Multiple instances of repository

Hi Nikhil,

I don't know if it will work (setProperty), but you have another problem. The 
Lucene search index is always stored in the file system, and afaik each 
repository home needs its own index directories (so you have a set of index 
files for each cluster node). If you add a new cluster node, you have to wait 
until its index is built, which can take a long time depending on the data in 
your repository (if you have tons of data, you may have to wait a week or 
longer).

The tables of the FS and PM will be shared between all cluster nodes - that 
works.

Kind regards, Robert

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
Sent: Tuesday, 16 November 2010 10:54
To: [email protected]
Subject: RE: Multiple instances of repository

Since there could be any number of instances, I can't decide the cluster id 
beforehand.
Hence I have the following code, which creates a cluster id at run time.

System.setProperty("org.apache.jackrabbit.core.cluster.node_id", 
"cluster_id"+System.nanoTime());

Similarly the repositoryHome path is generated at run time.
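
A minimal sketch of the run-time setup described above, under stated assumptions. The property name is the one used in the message; the class name `ClusterNodeSetup`, its method names, and the base directory `jackrabbit-homes` are hypothetical. Because `System.nanoTime()` values are not guaranteed to be unique across JVMs or machines, the sketch swaps in `UUID.randomUUID()`, which is a substitution and not part of the original code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

public class ClusterNodeSetup {
    // Property name taken from the message above.
    public static final String NODE_ID_PROPERTY =
            "org.apache.jackrabbit.core.cluster.node_id";

    /** Generates a cluster node id, stores it as a system property, and returns it. */
    public static String assignNodeId() {
        // Assumption: a random UUID replaces System.nanoTime() to reduce
        // the chance of two instances choosing the same id.
        String nodeId = "cluster_id" + UUID.randomUUID();
        System.setProperty(NODE_ID_PROPERTY, nodeId);
        return nodeId;
    }

    /** Creates a new directory named after the node id to serve as the repository home. */
    public static Path createRepositoryHome(Path baseDir, String nodeId) throws IOException {
        Files.createDirectories(baseDir);
        // createDirectory fails if the directory already exists,
        // so two instances cannot silently share one home.
        return Files.createDirectory(baseDir.resolve(nodeId));
    }

    public static void main(String[] args) throws IOException {
        String nodeId = assignNodeId();
        // Hypothetical base directory under the system temp directory.
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "jackrabbit-homes");
        Path home = createRepositoryHome(base, nodeId);
        System.out.println("cluster node id: " + nodeId);
        System.out.println("repository home: " + home);
    }
}
```

The property has to be set before the repository is started in the same JVM, and the returned path would then be passed as the repository home when creating the repository instance.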

But do I also need separate tables for the workspace file system? I have the 
following configuration for my workspace. Is it correct? Will the tables for 
the workspace FS and PersistenceManager be shared between all the nodes, or 
will these tables be different for each node?

<?xml version="1.0"?>
<!DOCTYPE Repository
          PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
          "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">

<Repository>

     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
                <param name="driver" value="javax.naming.InitialContext"/>
                <param name="url" value="jdbc/amiDBDataSource"/>
                <param name="databaseType" value="oracle"/>             
        <param name="copyWhenReading" value="true"/>
        <param name="tablePrefix" value=""/>
        <param name="schemaObjectPrefix" value="J_R_DS_"/>
        <param name="schemaCheckEnabled" value="false"/> 
    </DataStore>

        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
                <param name="driver" value="javax.naming.InitialContext"/>
                <param name="url" value="jdbc/amiDBDataSource"/>
                <!-- The following value must be "oracle" for an Oracle server; 
this is not the same as the database schema -->
                <param name="schema" value="oracle"/>
                <param name="schemaObjectPrefix" value="J_R_FS_"/>
                <param name="schemaCheckEnabled" value="false"/> 
        </FileSystem>

        <Security appName="Jackrabbit">
                <SecurityManager 
class="repository.jcr.jackrabbit.EipSecurityManager" />
                <AccessManager 
class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
                <LoginModule 
class="org.apache.jackrabbit.core.security.SimpleLoginModule">
                        <param name="principalProvider" 
value="repository.jcr.jackrabbit.EipPrincipalProvider" />
                </LoginModule>
        </Security>
        
        <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip" />
        
        <Workspace name="${wsp.name}">
        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
                        <param name="driver" 
value="javax.naming.InitialContext"/>
                        <param name="url" value="jdbc/amiDBDataSource"/>
                        <!-- The following value must be "oracle" for an Oracle 
server; this is not the same as the database schema -->
                        <param name="schema" value="oracle"/>
                        <param name="schemaObjectPrefix" 
value="J_FS_${wsp.name}_"/>
                        <param name="schemaCheckEnabled" value="false"/> 
                </FileSystem>
                <PersistenceManager 
class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
                        <param name="driver" 
value="javax.naming.InitialContext"/>
                        <param name="url" value="jdbc/amiDBDataSource"/>
                        <param name="tableSpace" value="" />
                        <!-- The following value must be "oracle" for an Oracle 
server; this is not the same as the database schema -->
                        <param name="schema" value="oracle" />
                        <param name="schemaObjectPrefix" 
value="J_PM_${wsp.name}_" />
                        <param name="externalBLOBs" value="false" />
                        <param name="schemaCheckEnabled" value="false"/> 
                </PersistenceManager>
                <SearchIndex 
class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="supportHighlighting" value="true"/>
        </SearchIndex>
        </Workspace>
        
        <Versioning rootPath="${rep.home}/version">
                
                <FileSystem 
class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
                        <param name="driver" 
value="javax.naming.InitialContext"/>
                        <param name="url" value="jdbc/amiDBDataSource"/>
                        <!-- The following value must be "oracle" for an Oracle 
server; this is not the same as the database schema -->
                        <param name="schema" value="oracle"/>
                        <param name="schemaObjectPrefix" value="J_V_FS_"/>
                        <param name="schemaCheckEnabled" value="false"/> 
                </FileSystem>
                <!-- Change to Oracle Class <PersistenceManager 
class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
                <PersistenceManager 
class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
                        <param name="driver" 
value="javax.naming.InitialContext"/>
                        <param name="url" value="jdbc/amiDBDataSource"/>
                        <param name="tableSpace" value="" />
                        <!-- The following value must be "oracle" for an Oracle 
server; this is not the same as the database schema -->
                        <param name="schema" value="oracle" />
                        <param name="schemaObjectPrefix" value="J_V_PM_" />
                        <param name="externalBLOBs" value="false" />
                        <param name="schemaCheckEnabled" value="false"/> 
                </PersistenceManager>

        </Versioning>
        
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${rep.home}/search/index"/>
        <param name="supportHighlighting" value="true"/>
    </SearchIndex>
    
        <Cluster syncDelay="2000">
                <Journal 
class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
                <param name="revision" value="${rep.home}/revision.log" />
                        <param name="driver" 
value="javax.naming.InitialContext"/>
                        <param name="url" value="jdbc/amiDBDataSource"/>
                        <param name="schemaObjectPrefix" value="J_R_" />
                        <param name="databaseType" value="oracle"/>
                </Journal>
        </Cluster> 
          
</Repository>

Thanks,
Nikhil
-----Original Message-----
From: Seidel. Robert [mailto:[email protected]] 
Sent: Tuesday, November 16, 2010 2:42 PM
To: [email protected]
Subject: AW: Multiple instances of repository

Hi Nikhil,

You need clustering, because all of your instances should access the same 
repository.

What you need is a separate repository home for each instance. In my use case I 
have an installation directory for each instance, so the repository home is 
located below this directory.

You have to make sure that each instance also has its own repository.xml, 
because you need to define different cluster IDs.

And you have to define a cluster section in the repository.xml that specifies 
where the journal is located, which is necessary for synchronization:

    <Cluster id="node1" syncDelay="5000">
      <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="url" value="jdbc/amiDBDataSource"/>
          ...  
      </Journal>
    </Cluster>

Kind regards, Robert

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
Sent: Tuesday, 16 November 2010 09:37
To: [email protected]
Subject: RE: Multiple instances of repository

Thanks for replying. I need a little more help to understand things 
completely.
Let me elaborate a bit more on my usage scenario. I am also attaching my 
repository.xml file to this mail. Please let me know if you want to know more 
about my environment.

In my case, I want to keep all the data in one database and use 
Jackrabbit as the JCR layer over this database.
I have Jackrabbit embedded in my application, so the repository starts up as 
part of the application.
The application reads some files from the repository and also inserts some 
data into the repository.
There could be two instances of the application: app1 running on machine1 and 
app2 running on machine2.
So my application instances are different, and I can create multiple repository 
homes to avoid the locking problem, but I still want to insert the data from 
these applications into the same database tables.
Suppose all the application instances use the same repository configuration 
file and each specifies its own repository home.
Will that work in my case? Will there be any consistency issues?

When you say separate data stores and separate persistence managers, do you 
mean a separate repository configuration file, or separate database tables for 
the data stores and persistence managers?

My instances and the repositories operate separately from each other, but they 
still need to share the data. The data inserted by one application instance 
should be visible to the other instances. So they should all be inserting the 
data into the same tables; that is my understanding.

Thanks,
Nikhil
 
-----Original Message-----
From: Seidel. Robert [mailto:[email protected]] 
Sent: Tuesday, November 16, 2010 1:22 PM
To: [email protected]
Subject: AW: Multiple instances of repository

Hi Nikhil,

If you want to use clustering, you have to define a repository home for each 
cluster node. 

Clustering is necessary if you want to have the same data/indexes at all 
cluster nodes - the key word is synchronization.

If your instances and the repositories operate separately from each other, you 
don't need clustering. Separate repository homes, data stores and persistence 
managers will do the job.

Kind regards, Robert

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
Sent: Tuesday, 16 November 2010 08:33
To: [email protected]
Subject: Multiple instances of repository

Hi,

I am using Jackrabbit as the JCR implementation in my project. I am running 
Jackrabbit within my application, in the same JVM.
The application reads content from the repository and also writes some content 
to the repository.
There could be multiple concurrent instances of my application running on the 
same or different machines.
I have a configuration file for Jackrabbit and a single repository home 
for Jackrabbit.
As soon as one instance of the application is up and running, I can't run 
another instance, because the first instance creates a lock file in the 
repository home.
After some searching I came to know about running Jackrabbit in 
clustered mode.
My question is: even in this case I will have to specify a different 
repository home for every run, right?
That means I should form the repository home path at run time, because at 
compile time I don't know how many instances will be run.
This is a standalone Java application, and theoretically any number of 
instances can be run.
If I have to specify a different repository path for every 
run, will Jackrabbit work even without clustering?
After all, the .lock file will be different for different runs, as the 
repository home is different.
I know I am missing something here; please help me.
I am attaching my conf file to this mail.
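
The lock-file behaviour described above can be illustrated with a small sketch that uses only `java.nio`. This is not Jackrabbit's actual locking code; the class name `RepositoryHomeLock` and the directory names are hypothetical, and the only detail taken from the message is that a lock file lives in the repository home. It shows that a second attempt on the same home fails, while an attempt on a different home succeeds.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RepositoryHomeLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private RepositoryHomeLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Tries to lock home/.lock exclusively; returns null if it is already locked. */
    public static RepositoryHomeLock tryAcquire(Path home) throws IOException {
        Files.createDirectories(home);
        FileChannel channel = FileChannel.open(home.resolve(".lock"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock;
        try {
            lock = channel.tryLock(); // null if another process holds the lock
        } catch (OverlappingFileLockException e) {
            lock = null;              // the lock is already held inside this JVM
        }
        if (lock == null) {
            // Caveat: on some platforms closing this channel can release locks
            // held through other channels of the same JVM on the same file.
            channel.close();
            return null;
        }
        return new RepositoryHomeLock(channel, lock);
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("repo-homes");
        try (RepositoryHomeLock first = tryAcquire(base.resolve("home1"));
             RepositoryHomeLock sameHome = tryAcquire(base.resolve("home1"));
             RepositoryHomeLock otherHome = tryAcquire(base.resolve("home2"))) {
            System.out.println("first attempt on home1 locked:  " + (first != null));
            System.out.println("second attempt on home1 locked: " + (sameHome != null));
            System.out.println("attempt on home2 locked:        " + (otherHome != null));
        }
    }
}
```

Separate homes avoid the lock conflict, which is why each instance needs its own home. Whether the instances then see each other's changes is a separate question that the lock itself does not address.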

Thanks,
Nikhil
