Re: Multiple instances of repository

Justin Edelson Tue, 16 Nov 2010 10:42:41 -0800

Nikhil-
I think you should rethink your architecture. It really doesn't make
sense to be bringing repository instances up only for a 2-4 minute
job. Instead, you should think about using the Command pattern and
package your "applications" as executable jobs which can be run inside
a long-running VM against a local repository instance (i.e. making
in-process calls instead of RMI or DavEx).
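
As a rough illustration of that Command-pattern approach, here is a minimal
Java sketch. All names in it (JobRunner, Job, register, submit) are
hypothetical and not part of Jackrabbit or Sling. The type parameter R only
stands in for whatever long-lived repository handle the VM holds (for example
a javax.jcr.Repository), and the pool size of 4 is an arbitrary assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: runs short jobs in-process against one shared repository handle of type R. */
class JobRunner<R> implements AutoCloseable {

    /** A command: one short unit of work executed against the shared repository. */
    interface Job<R> {
        void execute(R repository) throws Exception;
    }

    private final R repository;                                   // opened once, reused by every job
    private final Map<String, Job<R>> jobs = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newFixedThreadPool(4); // assumed pool size

    JobRunner(R repository) {
        this.repository = repository;
    }

    /** Adds (or replaces) a job at runtime. */
    void register(String name, Job<R> job) {
        jobs.put(name, job);
    }

    /** Removes a job at runtime. */
    void unregister(String name) {
        jobs.remove(name);
    }

    /** Schedules the named job; the returned Future completes when the job has finished. */
    Future<?> submit(String name) {
        Job<R> job = jobs.get(name);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job: " + name);
        }
        // A Callable (not a Runnable) so that checked exceptions from the job reach the Future.
        return executor.submit(() -> {
            job.execute(repository);
            return null;
        });
    }

    @Override
    public void close() {
        executor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        // A plain list stands in for the repository so the sketch runs without JCR on the classpath.
        List<String> fakeRepository = new CopyOnWriteArrayList<>();
        try (JobRunner<List<String>> runner = new JobRunner<>(fakeRepository)) {
            runner.register("import", repo -> repo.add("imported one document"));
            runner.submit("import").get(); // wait for the job to finish
        }
        System.out.println(fakeRepository);
    }
}
```

Each 2-4 minute "application" would then be one Job registered with the
long-running process, so the repository is started once instead of once per
run.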


This is where something like OSGi and Apache Sling can be *very*
helpful, but there are obviously other ways to add/remove jobs at
runtime. See, for example, Sling's Scheduler support:
http://sling.apache.org/site/scheduler-service-commons-scheduler.html

Justin

On Tue, Nov 16, 2010 at 5:16 AM,  <[email protected]> wrote:
> Thanks for your inputs, they are really helpful.
>
> So is my application not a good candidate for Jackrabbit?
>
> The other option I had was to use Jackrabbit in client-server mode. In this 
> case I would access the repository over RMI. But the Jackrabbit 
> documentation mentions that RMI is not optimized for performance and that 
> I should use an embedded repository instance in my application code for 
> better performance.
>
> I can remove the search functionality from these clusters, because the life 
> span of these will be very short. The application will take 2-4 minutes to do 
> its job and I don't think we really need search for these clusters.
>
> But my question is: should I really use the clustering feature? Cluster 
> nodes normally have a longer life span, but in this case the nodes will 
> live for only 2-4 minutes. I am finding it hard to treat these short-lived 
> applications as cluster nodes.
>
> Thanks,
> Nikhil
>
> -----Original Message-----
> From: Seidel. Robert [mailto:[email protected]]
> Sent: Tuesday, November 16, 2010 3:33 PM
> To: [email protected]
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> I don't know if it will work (setProperty), but you have another problem. The 
> Lucene search index is always saved in the file system. And afaik, each 
> repository home needs its own index directories (so you have the index files 
> for each cluster node). If you add a new cluster node, you have to wait a 
> long time until the index is built, depending on the data in your repository 
> (if you have tons of data, you may have to wait a week or longer).
>
> The tables of the FS and PM will be shared between all cluster nodes - that 
> works.
>
> Kindly regards, Robert
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Tuesday, November 16, 2010 10:54
> To: [email protected]
> Subject: RE: Multiple instances of repository
>
> Since there could be any number of instances, I can't decide the cluster id 
> beforehand.
> Hence I have the following code, which creates a cluster id at run time.
>
> System.setProperty("org.apache.jackrabbit.core.cluster.node_id", 
> "cluster_id"+System.nanoTime());
>
> Similarly the repositoryHome path is generated at run time.
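
The runtime setup described in the quoted text could be sketched roughly as
follows. The property name is the one from the snippet above; the class and
method names (ClusterNodeSetup, newNodeId, prepare) are hypothetical. The
sketch swaps System.nanoTime() for a random UUID, on the assumption that IDs
must not collide across machines: nanoTime() is only meaningful relative to
an arbitrary origin inside one JVM, so two JVMs can in principle produce the
same value.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper: picks a per-run cluster node id and a matching repository home. */
class ClusterNodeSetup {

    // System property name taken from the quoted snippet.
    static final String NODE_ID_PROPERTY = "org.apache.jackrabbit.core.cluster.node_id";

    /** Returns an id that is unique per call, independent of machine or JVM start time. */
    static String newNodeId() {
        return "cluster_id_" + UUID.randomUUID();
    }

    /**
     * Sets the node id property and creates a dedicated repository home
     * directory for this run below baseDir. Returns the created directory.
     */
    static Path prepare(Path baseDir) throws IOException {
        String nodeId = newNodeId();
        System.setProperty(NODE_ID_PROPERTY, nodeId);
        return Files.createDirectories(baseDir.resolve(nodeId));
    }

    public static void main(String[] args) throws IOException {
        Path home = prepare(Files.createTempDirectory("repo-homes"));
        System.out.println("node id:         " + System.getProperty(NODE_ID_PROPERTY));
        System.out.println("repository home: " + home);
    }
}
```

The property has to be set before the repository is started, and the returned
directory would be passed as the repository home for that run.
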
>
> But do I also need separate tables for the workspace file system? I have the 
> following configuration for my workspace. Is it correct? Will the tables for 
> the workspace FS and PersistenceManager be shared between all the nodes, or 
> will they be different?
>
> <?xml version="1.0"?>
> <!DOCTYPE Repository
>          PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
>          "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
>
> <Repository>
>
>     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>                <param name="driver" value="javax.naming.InitialContext"/>
>                <param name="url" value="jdbc/amiDBDataSource"/>
>                <param name="databaseType" value="oracle"/>
>        <param name="copyWhenReading" value="true"/>
>        <param name="tablePrefix" value=""/>
>        <param name="schemaObjectPrefix" value="J_R_DS_"/>
>        <param name="schemaCheckEnabled" value="false"/>
>    </DataStore>
>
>        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>                <param name="driver" value="javax.naming.InitialContext"/>
>                <param name="url" value="jdbc/amiDBDataSource"/>
>                <!-- The following value must be "oracle" for an Oracle server; 
> this is not the same as the database schema -->
>                <param name="schema" value="oracle"/>
>                <param name="schemaObjectPrefix" value="J_R_FS_"/>
>                <param name="schemaCheckEnabled" value="false"/>
>        </FileSystem>
>
>        <Security appName="Jackrabbit">
>                <SecurityManager 
> class="repository.jcr.jackrabbit.EipSecurityManager" />
>                <AccessManager 
> class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
>                <LoginModule 
> class="org.apache.jackrabbit.core.security.SimpleLoginModule">
>                        <param name="principalProvider" 
> value="repository.jcr.jackrabbit.EipPrincipalProvider" />
>                </LoginModule>
>        </Security>
>
>        <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip" />
>
>        <Workspace name="${wsp.name}">
>        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>                        <param name="driver" 
> value="javax.naming.InitialContext"/>
>                        <param name="url" value="jdbc/amiDBDataSource"/>
>                        <!-- The following value must be "oracle" for an Oracle 
> server; this is not the same as the database schema -->
>                        <param name="schema" value="oracle"/>
>                        <param name="schemaObjectPrefix" 
> value="J_FS_${wsp.name}_"/>
>                        <param name="schemaCheckEnabled" value="false"/>
>                </FileSystem>
>                <PersistenceManager 
> class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>                        <param name="driver" 
> value="javax.naming.InitialContext"/>
>                        <param name="url" value="jdbc/amiDBDataSource"/>
>                        <param name="tableSpace" value="" />
>                        <!-- The following value must be "oracle" for an Oracle 
> server; this is not the same as the database schema -->
>                        <param name="schema" value="oracle" />
>                        <param name="schemaObjectPrefix" 
> value="J_PM_${wsp.name}_" />
>                        <param name="externalBLOBs" value="false" />
>                        <param name="schemaCheckEnabled" value="false"/>
>                </PersistenceManager>
>                <SearchIndex 
> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>            <param name="path" value="${wsp.home}/index"/>
>            <param name="supportHighlighting" value="true"/>
>        </SearchIndex>
>        </Workspace>
>
>        <Versioning rootPath="${rep.home}/version">
>
>                <FileSystem 
> class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>                        <param name="driver" 
> value="javax.naming.InitialContext"/>
>                        <param name="url" value="jdbc/amiDBDataSource"/>
>                        <!-- The following value must be "oracle" for an Oracle 
> server; this is not the same as the database schema -->
>                        <param name="schema" value="oracle"/>
>                        <param name="schemaObjectPrefix" value="J_V_FS_"/>
>                        <param name="schemaCheckEnabled" value="false"/>
>                </FileSystem>
>                <!-- Change to Oracle Class <PersistenceManager 
> class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
>                <PersistenceManager 
> class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>                        <param name="driver" 
> value="javax.naming.InitialContext"/>
>                        <param name="url" value="jdbc/amiDBDataSource"/>
>                        <param name="tableSpace" value="" />
>                        <!-- The following value must be "oracle" for an Oracle 
> server; this is not the same as the database schema -->
>                        <param name="schema" value="oracle" />
>                        <param name="schemaObjectPrefix" value="J_V_PM_" />
>                        <param name="externalBLOBs" value="false" />
>                        <param name="schemaCheckEnabled" value="false"/>
>                </PersistenceManager>
>
>        </Versioning>
>
>    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>        <param name="path" value="${rep.home}/search/index"/>
>        <param name="supportHighlighting" value="true"/>
>    </SearchIndex>
>
>        <Cluster syncDelay="2000">
>                <Journal 
> class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>                <param name="revision" value="${rep.home}/revision.log" />
>                        <param name="driver" 
> value="javax.naming.InitialContext"/>
>                        <param name="url" value="jdbc/amiDBDataSource"/>
>                        <param name="schemaObjectPrefix" value="J_R_" />
>                        <param name="databaseType" value="oracle"/>
>                </Journal>
>        </Cluster>
>
> </Repository>
>
> Thanks,
> Nikhil
> -----Original Message-----
> From: Seidel. Robert [mailto:[email protected]]
> Sent: Tuesday, November 16, 2010 2:42 PM
> To: [email protected]
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> you need clustering, because all of your instances should access the same 
> repository.
>
> What you need is separate repository homes for each instance. In my use case 
> I have an installation directory for each instance, so the repository home is 
> located below this directory.
>
> You have to make sure that each instance also has its own repository.xml, 
> because you need to define different cluster IDs.
>
> And you have to define a cluster section in the repository.xml where the 
> journal is located, which is necessary for synchronization:
>
>    <Cluster id="node1" syncDelay="5000">
>      <Journal 
> class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>        <param name="driver" value="javax.naming.InitialContext"/>
>        <param name="url" value="jdbc/amiDBDataSource"/>
>          ...
>      </Journal>
>    </Cluster>
>
> Kindly regards, Robert
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Tuesday, November 16, 2010 09:37
> To: [email protected]
> Subject: RE: Multiple instances of repository
>
> Thanks for replying. I need a little more help to understand things 
> completely.
> I will just elaborate a bit more on my usage scenario. I am also attaching my 
> repository.xml file with this mail. Please let me know if you want to know 
> more about my environment.
>
> In my case, I want to keep all the data in one database and I want to use 
> Jackrabbit as the JCR layer over this database.
> I have Jackrabbit embedded in my application, so the repository starts up as 
> part of the application.
> This application reads some files from the repository and also inserts some 
> data into the repository.
> There could be two instances of the application: app1 running on machine1 and 
> app2 running on machine2.
> So my application instances are different, and I can create multiple 
> repository homes to avoid the locking problem, but I still want to insert the 
> data from these applications into the same database tables.
> Suppose all the application instances use the same repository configuration 
> file and each specifies its own repository home.
> Will that work in my case? Will there be any consistency issues?
>
> When you say separate data stores and separate persistence managers, do you 
> mean separate repository configuration files, or separate database tables for 
> the data stores and persistence managers?
>
> My instances and the repositories operate separately from each other but they 
> still want to share the data. The data inserted by one application instance 
> should be visible to other instance. So they all should be inserting the data 
> in same tables, that's what my understanding is.
>
> Thanks,
> Nikhil
>
> -----Original Message-----
> From: Seidel. Robert [mailto:[email protected]]
> Sent: Tuesday, November 16, 2010 1:22 PM
> To: [email protected]
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> if you want to use clustering, you have to define a repository home for each 
> cluster node.
>
> Clustering is necessary, if you want to have the same data/indexes at all 
> cluster nodes - the key word is synchronization.
>
> If your instances and the repositories operate separately from each other, 
> you don't need clustering. Separate repository homes, data stores and 
> persistence managers will do the job.
>
> Kindly regards, Robert
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Tuesday, November 16, 2010 08:33
> To: [email protected]
> Subject: Multiple instances of repository
>
> Hi,
>
> I am using Jackrabbit as the JCR implementation in my project. I am running 
> Jackrabbit within my application, in the same JVM.
> The application reads content from the repository and also writes some 
> content to the repository.
> There could be multiple concurrent instances of my application running on the 
> same or different machines.
> I have a configuration file for Jackrabbit and a single repository 
> home for Jackrabbit.
> Now as soon as one instance of the application is up and running, I can't run 
> another instance, as the first instance creates a lock file in the repository 
> home.
> After some searching I learned about running Jackrabbit in 
> clustered mode.
> Now my question is: even in this case I will have to specify a different 
> repository home for every run, right?
> That means I should form the repository home path at run time, because at 
> compile time I don't know how many instances will be run.
> This is a standalone Java application and theoretically any number of 
> instances can be run.
> My question is: when I have to specify a different repository path for every 
> run, will Jackrabbit work even without clustering?
> Because the .lock file will be different for different runs, as the 
> repository home is different.
> I know I am missing something here, please help me.
> I am attaching my conf file with this mail.
>
> Thanks,
> Nikhil
>
>
