Re: Multiple instances of repository

Justin Edelson Wed, 17 Nov 2010 12:51:01 -0800

On Wed, Nov 17, 2010 at 12:05 PM,  <[email protected]> wrote:
> So I will have to run a cluster configuration on this machine1, because I 
> will have two independent JVMs hitting on
> the same repository?
Yes.


> I really don't want to run cluster nodes on a single machine just so that 
> different JVMs can access the repository.
> That doesn't look correct. I am sure there will be better ways to solve this 
> issue as well.
Although I suspect this isn't typical, there's nothing wrong with
this. Multiple JVMs = cluster nodes; doesn't really matter if they're
on the same physical machine or multiple physical machines.

Justin

>
> Any ideas will be of great help.
>
> -Nikhil
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of 
> Justin Edelson
> Sent: Wednesday, November 17, 2010 12:12 AM
> To: [email protected]
> Subject: Re: Multiple instances of repository
>
> Nikhil-
> I think you should rethink your architecture. It really doesn't make
> sense to be bringing repository instances up only for a 2-4 minute
> job. Instead, you should think about using the Command pattern and
> package your "applications" as executable jobs which can be run inside
> a long-running VM against a local repository instance (i.e. making
> in-process calls instead of RMI or DavEx).
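
A minimal sketch of the Command-pattern idea described above, written as an assumption about how it could look rather than as Sling's or Jackrabbit's API. `RepositoryJob`, `JobRunner` and `JobRunnerDemo` are hypothetical names; the type parameter `R` stands in for whatever repository handle the host keeps open (in a real setup that would presumably be a JCR repository or session), and the demo uses a plain list as a fake repository so the code runs without any JCR dependency.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** A unit of work that runs against an already-open repository (the Command). */
interface RepositoryJob<R> {
    void execute(R repository) throws Exception;
}

/** Hypothetical long-running host: owns the repository handle and a thread pool. */
final class JobRunner<R> implements AutoCloseable {
    private final R repository;
    private final ExecutorService pool;

    JobRunner(R repository, int threads) {
        this.repository = repository;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Queues a job; the returned Future reports completion or failure. */
    Future<?> submit(RepositoryJob<R> job) {
        return pool.submit(() -> {
            job.execute(repository); // in-process call, no RMI or DavEx hop
            return null;
        });
    }

    @Override
    public void close() {
        pool.shutdown(); // already queued jobs still finish; new ones are refused
    }
}

final class JobRunnerDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for a real repository: a thread-safe list of "written" items.
        List<String> fakeRepository = Collections.synchronizedList(new ArrayList<>());
        try (JobRunner<List<String>> runner = new JobRunner<>(fakeRepository, 2)) {
            Future<?> a = runner.submit(repo -> repo.add("import-job"));
            Future<?> b = runner.submit(repo -> repo.add("export-job"));
            a.get(); // wait for each short job to finish
            b.get();
        }
        System.out.println(fakeRepository.size() + " jobs completed");
    }
}
```

The point of the design is that the repository is started once by the host, and each 2-4 minute application becomes a job submitted to it, so no repository instance is brought up or torn down per run.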
>
> This is where something like OSGi and Apache Sling can be *very*
> helpful, but there are obviously other ways to add/remove jobs at
> runtime. See, for example, Sling's Scheduler support:
> http://sling.apache.org/site/scheduler-service-commons-scheduler.html
>
> Justin
>
> On Tue, Nov 16, 2010 at 5:16 AM,  <[email protected]> wrote:
>> Thanks for your inputs, they are really helpful.
>>
>> Well, so is my application not a good candidate for Jackrabbit?
>>
>> The other option I had was to use Jackrabbit in client-server mode. In this 
>> case I would be accessing the repository over RMI. But the Jackrabbit 
>> documentation mentions that RMI is not optimized for performance 
>> and that I should use an embedded repository instance in my application code for 
>> better performance.
>>
>> I can remove the search functionality from these cluster nodes, because their life 
>> span will be very short. The application will take 2-4 minutes to 
>> do its job and I don't think we really need search for these cluster nodes.
>>
>> But my question is, should I really use the clustering feature? I mean, 
>> cluster nodes should normally have a longer life span. But in this case 
>> the nodes will have a very short life span of 2-4 minutes.
>> I am finding it hard to treat these short-lived applications as cluster 
>> nodes.
>>
>> Thanks,
>> Nikhil
>>
>> -----Original Message-----
>> From: Seidel. Robert [mailto:[email protected]]
>> Sent: Tuesday, November 16, 2010 3:33 PM
>> To: [email protected]
>> Subject: AW: Multiple instances of repository
>>
>> Hi Nikhil,
>>
>> I don't know if it will work (setProperty), but you have another problem. 
>> The Lucene search index is always saved in the file system. And afaik, each 
>> repository home needs its own index directories (so you have the index files 
>> for each cluster node). If you create a new cluster node, you have to wait for a long 
>> time until the index is built, depending on the data in your repository (if 
>> you have tons of data, you have to wait a week or longer).
>>
>> The tables of the FS and PM will be shared between all cluster nodes - that 
>> works.
>>
>> Kindly regards, Robert
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> Sent: Tuesday, November 16, 2010 10:54 AM
>> To: [email protected]
>> Subject: RE: Multiple instances of repository
>>
>> Since there could be any number of instances, I can't decide the cluster id 
>> beforehand.
>> Hence I have the following code that creates a cluster id at run time.
>>
>> System.setProperty("org.apache.jackrabbit.core.cluster.node_id", 
>> "cluster_id"+System.nanoTime());
>>
>> Similarly the repositoryHome path is generated at run time.
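
A small sketch of the runtime setup described above, under stated assumptions. The property name `org.apache.jackrabbit.core.cluster.node_id` is the one used in the snippet; everything else (`ClusterNodeBootstrap`, the `node_` prefix, the `jackrabbit-homes` base directory) is hypothetical. It swaps `System.nanoTime()` for a random UUID, since `nanoTime()` has an arbitrary origin per JVM and two processes could in principle produce the same value, and it derives the repository home from the same id so each run gets its own directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

/** Hypothetical helper, not part of Jackrabbit. */
final class ClusterNodeBootstrap {
    // Property name taken from the snippet in this message.
    static final String NODE_ID_PROPERTY = "org.apache.jackrabbit.core.cluster.node_id";

    /** Returns an id that is unique per call, e.g. "node_3f2a...". */
    static String newNodeId() {
        return "node_" + UUID.randomUUID().toString().replace("-", "");
    }

    /** Creates (if needed) and returns a per-node repository home under baseDir. */
    static Path createRepositoryHome(Path baseDir, String nodeId) {
        try {
            return Files.createDirectories(baseDir.resolve(nodeId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String nodeId = newNodeId();
        // Must be set before the repository is started so the cluster config can pick it up.
        System.setProperty(NODE_ID_PROPERTY, nodeId);

        // Assumed base directory; any writable location would do.
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "jackrabbit-homes");
        Path home = createRepositoryHome(base, nodeId);

        System.out.println("cluster node id: " + nodeId);
        System.out.println("repository home: " + home);
    }
}
```

Note that a fresh home per run also means a fresh search index per run, so old directories need to be cleaned up by whatever launches the application.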
>>
>> But do I also need separate tables for the workspace file system? I have the 
>> following configuration for my workspace. Is it correct? Will the tables for the 
>> workspace FS and PersistenceManager be shared between all the nodes, or 
>> will these tables be different?
>>
>> <?xml version="1.0"?>
>> <!DOCTYPE Repository
>>          PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
>>          "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
>>
>> <Repository>
>>
>>     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>>                <param name="driver" value="javax.naming.InitialContext"/>
>>                <param name="url" value="jdbc/amiDBDataSource"/>
>>                <param name="databaseType" value="oracle"/>
>>        <param name="copyWhenReading" value="true"/>
>>        <param name="tablePrefix" value=""/>
>>        <param name="schemaObjectPrefix" value="J_R_DS_"/>
>>        <param name="schemaCheckEnabled" value="false"/>
>>    </DataStore>
>>
>>        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>>                <param name="driver" value="javax.naming.InitialContext"/>
>>                <param name="url" value="jdbc/amiDBDataSource"/>
>>                <!-- The following value must be oracle for oracle server; this 
>> is not the same as the database schema -->
>>                <param name="schema" value="oracle"/>
>>                <param name="schemaObjectPrefix" value="J_R_FS_"/>
>>                <param name="schemaCheckEnabled" value="false"/>
>>        </FileSystem>
>>
>>        <Security appName="Jackrabbit">
>>                <SecurityManager 
>> class="repository.jcr.jackrabbit.EipSecurityManager" />
>>                <AccessManager 
>> class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
>>                <LoginModule 
>> class="org.apache.jackrabbit.core.security.SimpleLoginModule">
>>                        <param name="principalProvider" 
>> value="repository.jcr.jackrabbit.EipPrincipalProvider" />
>>                </LoginModule>
>>        </Security>
>>
>>        <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip" 
>> />
>>
>>        <Workspace name="${wsp.name}">
>>        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>>                        <param name="driver" 
>> value="javax.naming.InitialContext"/>
>>                        <param name="url" value="jdbc/amiDBDataSource"/>
>>                        <!-- The following value must be oracle for oracle 
>> server; this is not the same as the database schema -->
>>                        <param name="schema" value="oracle"/>
>>                        <param name="schemaObjectPrefix" 
>> value="J_FS_${wsp.name}_"/>
>>                        <param name="schemaCheckEnabled" value="false"/>
>>                </FileSystem>
>>                <PersistenceManager 
>> class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>>                        <param name="driver" 
>> value="javax.naming.InitialContext"/>
>>                        <param name="url" value="jdbc/amiDBDataSource"/>
>>                        <param name="tableSpace" value="" />
>>                        <!-- The following value must be oracle for oracle 
>> server; this is not the same as the database schema -->
>>                        <param name="schema" value="oracle" />
>>                        <param name="schemaObjectPrefix" 
>> value="J_PM_${wsp.name}_" />
>>                        <param name="externalBLOBs" value="false" />
>>                        <param name="schemaCheckEnabled" value="false"/>
>>                </PersistenceManager>
>>                <SearchIndex 
>> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>            <param name="path" value="${wsp.home}/index"/>
>>            <param name="supportHighlighting" value="true"/>
>>        </SearchIndex>
>>        </Workspace>
>>
>>        <Versioning rootPath="${rep.home}/version">
>>
>>                <FileSystem 
>> class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>>                        <param name="driver" 
>> value="javax.naming.InitialContext"/>
>>                        <param name="url" value="jdbc/amiDBDataSource"/>
>>                        <!-- The following value must be oracle for oracle 
>> server; this is not the same as the database schema -->
>>                        <param name="schema" value="oracle"/>
>>                        <param name="schemaObjectPrefix" value="J_V_FS_"/>
>>                        <param name="schemaCheckEnabled" value="false"/>
>>                </FileSystem>
>>                <!-- Change to Oracle Class <PersistenceManager 
>> class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
>>                <PersistenceManager 
>> class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>>                        <param name="driver" 
>> value="javax.naming.InitialContext"/>
>>                        <param name="url" value="jdbc/amiDBDataSource"/>
>>                        <param name="tableSpace" value="" />
>>                        <!-- The following value must be oracle for oracle 
>> server; this is not the same as the database schema -->
>>                        <param name="schema" value="oracle" />
>>                        <param name="schemaObjectPrefix" value="J_V_PM_" />
>>                        <param name="externalBLOBs" value="false" />
>>                        <param name="schemaCheckEnabled" value="false"/>
>>                </PersistenceManager>
>>
>>        </Versioning>
>>
>>    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>        <param name="path" value="${rep.home}/search/index"/>
>>        <param name="supportHighlighting" value="true"/>
>>    </SearchIndex>
>>
>>        <Cluster syncDelay="2000">
>>                <Journal 
>> class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>>                <param name="revision" value="${rep.home}/revision.log" />
>>                        <param name="driver" 
>> value="javax.naming.InitialContext"/>
>>                        <param name="url" value="jdbc/amiDBDataSource"/>
>>                        <param name="schemaObjectPrefix" value="J_R_" />
>>                        <param name="databaseType" value="oracle"/>
>>                </Journal>
>>        </Cluster>
>>
>> </Repository>
>>
>> Thanks,
>> Nikhil
>> -----Original Message-----
>> From: Seidel. Robert [mailto:[email protected]]
>> Sent: Tuesday, November 16, 2010 2:42 PM
>> To: [email protected]
>> Subject: AW: Multiple instances of repository
>>
>> Hi Nikhil,
>>
>> you need clustering, because all of your instances should access the same 
>> repository.
>>
>> What you need is separate repository homes for each instance. In my use case 
>> I have an installation directory for each instance, so the repository home 
>> is located below this directory.
>>
>> You have to make sure that each instance also has its own repository.xml, 
>> because you need to define different cluster IDs.
>>
>> And you have to define a cluster section in the repository.xml where the 
>> journal is located, which is necessary for synchronization:
>>
>>    <Cluster id="node1" syncDelay="5000">
>>      <Journal 
>> class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>>        <param name="driver" value="javax.naming.InitialContext"/>
>>        <param name="url" value="jdbc/amiDBDataSource"/>
>>          ...
>>      </Journal>
>>    </Cluster>
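
One way to give each instance its own repository.xml, sketched under assumptions: keep a shared template and substitute the cluster id when the instance's repository home is created. The `id` attribute on `<Cluster>` is from the example above; the `@CLUSTER_ID@` placeholder, the `RepositoryXmlWriter` class and the id character check are hypothetical choices for illustration, not Jackrabbit features.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that stamps a cluster id into a repository.xml template. */
final class RepositoryXmlWriter {
    // Assumed placeholder convention, e.g. <Cluster id="@CLUSTER_ID@" syncDelay="5000">
    static final String PLACEHOLDER = "@CLUSTER_ID@";

    /** Returns the template with every placeholder replaced by clusterId. */
    static String render(String template, String clusterId) {
        // Restrict ids to characters that are safe inside an XML attribute value.
        if (!clusterId.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unsupported cluster id: " + clusterId);
        }
        return template.replace(PLACEHOLDER, clusterId);
    }

    /** Writes the rendered config to repositoryHome/repository.xml and returns its path. */
    static Path write(Path repositoryHome, String template, String clusterId) {
        try {
            Files.createDirectories(repositoryHome);
            Path target = repositoryHome.resolve("repository.xml");
            Files.write(target, render(template, clusterId).getBytes(StandardCharsets.UTF_8));
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `RepositoryXmlWriter.render("<Cluster id=\"@CLUSTER_ID@\" syncDelay=\"5000\">", "node1")` yields the opening tag shown in the configuration above.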
>>
>> Kindly regards, Robert
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> Sent: Tuesday, November 16, 2010 9:37 AM
>> To: [email protected]
>> Subject: RE: Multiple instances of repository
>>
>> Thanks for replying. I will need a little more help to understand 
>> things completely.
>> I will just elaborate a bit more on my usage scenario. I am also attaching 
>> my repository.xml file with this mail. Please let me know if you want to 
>> know more about my environment.
>>
>> In my case, I want to keep all the data in one database and I want to use 
>> Jackrabbit as the JCR layer over this database.
>> I have Jackrabbit embedded in my application, so the repository starts up 
>> as part of the application.
>> This application reads some files from the repository and also inserts some 
>> data into the repository.
>> There could be two instances of the application: app1 running on machine1 and 
>> app2 running on machine2.
>> So my application instances are different and I can create multiple 
>> repository homes to avoid the locking problem, but I still want to insert 
>> the data from these applications into the same database tables.
>> So if all the application instances use the same repository configuration 
>> file and specify their own repository home, 
>> will that work in my case? Will there be any consistency issues?
>>
>> When you say separate data stores and separate persistence managers, do you mean 
>> separate repository configuration files or separate database tables for data 
>> stores and persistence managers?
>>
>> My instances and the repositories operate separately from each other but 
>> they still want to share the data. The data inserted by one application 
>> instance should be visible to other instance. So they all should be 
>> inserting the data in same tables, that's what my understanding is.
>>
>> Thanks,
>> Nikhil
>>
>> -----Original Message-----
>> From: Seidel. Robert [mailto:[email protected]]
>> Sent: Tuesday, November 16, 2010 1:22 PM
>> To: [email protected]
>> Subject: AW: Multiple instances of repository
>>
>> Hi Nikhil,
>>
>> if you want to use clustering, you have to define a repository home for each 
>> cluster node.
>>
>> Clustering is necessary, if you want to have the same data/indexes at all 
>> cluster nodes - the key word is synchronization.
>>
>> If your instances and the repositories operate separately from each other, 
>> you don't need clustering. Separate repository homes, data stores and 
>> persistence managers will do the job.
>>
>> Kindly regards, Robert
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> Sent: Tuesday, November 16, 2010 8:33 AM
>> To: [email protected]
>> Subject: Multiple instances of repository
>>
>> Hi,
>>
>> I am using Jackrabbit as the JCR implementation in my project. I am running 
>> Jackrabbit within my application in the same JVM.
>> The application reads content from the repository and also writes some 
>> content to the repository.
>> There could be multiple concurrent instances of my application running on 
>> the same or different machines.
>> I have a configuration file for Jackrabbit and I have a single repository 
>> home for Jackrabbit.
>> Now as soon as one instance of the application is up and running, I can't 
>> run another instance, as the first instance creates a lock file in the 
>> repository home.
>> After some searching I came to know about running Jackrabbit in 
>> clustered mode.
>> Now my question is: even in this case I will have to specify a different 
>> repository home for every run, right?
>> That means I should form the repository home path at run time, because at 
>> compile time I am not sure how many instances will be run.
>> This is a standalone Java application and theoretically any number of instances 
>> can be run.
>> My question is: if I have to specify a different repository path for every 
>> run, will Jackrabbit then work even without clustering?
>> After all, the .lock file will be different for different runs, as the repository 
>> home is different.
>> I know I am missing something here, please help me.
>> I am attaching my conf file with this mail.
>>
>> Thanks,
>> Nikhil
>>
>>
>
