Thanks Clay. I am not trying to load that many records at once. The application crawls a directory. It places the files from that directory into JackRabbit one at a time and puts a content id onto a queue, which is picked up by consumers on different servers. Those consumers then use the content id to retrieve the file from JackRabbit. Each piece of content is saved in a node under the root node. The slowdown comes from calling session.getRootNode(); from what I can gather from the docs, I need the root node in order to add a child node. Note that the slowdown is pretty significant, and I don't need anywhere close to 50k nodes to start seeing it (it shows up within a few minutes of running my app). I don't need orderable nodes. How do I disable that?
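
For reference, one common way to avoid a single flat list of children under the root is to spread content over intermediate folders derived from the content id. The sketch below only computes such a path. The class name BucketedPath, the two-level layout, and the use of SHA-256 are assumptions for illustration; none of them come from this thread or from the JackRabbit API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: not part of JackRabbit or of the application described above.
public class BucketedPath {

    /**
     * Maps a content id to a relative path of the form "xx/yy/<contentId>",
     * where xx and yy are the first two bytes of the id's SHA-256 digest in hex.
     * The same id always yields the same path, so a consumer can recompute it.
     */
    public static String pathFor(String contentId) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(contentId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        return String.format("%02x/%02x/%s",
                digest[0] & 0xff, digest[1] & 0xff, contentId);
    }

    public static void main(String[] args) {
        // Assumed example id; it must also be a valid JCR node name.
        System.out.println(pathFor("content-0001"));
    }
}
```

A consumer could rebuild the same path from the content id it reads off the queue and, assuming a JCR 2.0 session, fetch the node by absolute path with session.getNode("/" + path) rather than going through the root node. Whether that removes the slowdown described here is untested.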

On 11/13/2015 03:10 PM, Clay Ferguson wrote:
Please let us know more about your use case. Why are you even "trying" to
load that many records all at once? Or at least scan them one by one, I
mean. In most use cases you wouldn't need to do this kind of thing, unless
it's some kind of backup or replication. I say "most" cases... I'm not
saying you don't need to, just asking for a bit more background. BTW: If
you don't need 'orderable' nodes, try to avoid them. That type of node does
not work at 'scale'... and 50K is probably pushing it.

Best regards,
Clay Ferguson
[email protected]


On Fri, Nov 13, 2015 at 3:33 PM, <[email protected]> wrote:

Hi,
I am new to JackRabbit and using version 2.11.2.  I am using JackRabbit to
store documents in a multi-threaded environment.  I noticed that the time
it takes to retrieve the root node is inconsistent and slow (several
seconds or more) and degrades over time (after 50K+ child nodes, retrieval
takes ~15 seconds).

Originally, I was using code as follows to obtain a repository:

  public Repository getRepository() throws ClassNotFoundException, RepositoryException {
      ServiceLoader.load(Class.forName("org.apache.jackrabbit.jcr2dav.Jcr2davRepositoryFactory"));
      return JcrUtils.getRepository(jackabbitServerUrl);
  }

Then I came across the following thread:

http://jackrabbit.510166.n4.nabble.com/getRootNode-takes-27-seconds-td1571027.html#a1571302

This thread had some useful information (BatchReadConfig), but I am not
certain how to use the API to take advantage of it.  I have changed my code
to the following, but node retrieval performance does not appear to have
improved.  Is there something I am missing or doing wrong?

1) Repository Factory
  public Repository getRepository(@SuppressWarnings("rawtypes") Map parameters)
          throws RepositoryException {
      // Use the jcr2spi factory only when SPI-specific parameters are supplied.
      String repositoryFactoryName = parameters != null && (
              parameters.containsKey(PARAM_REPOSITORY_SERVICE_FACTORY) ||
                      parameters.containsKey(PARAM_REPOSITORY_CONFIG))
              ? "org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory"
              : "org.apache.jackrabbit.core.RepositoryFactoryImpl";

      Object repositoryFactory;
      try {
          Class<?> repositoryFactoryClass = Class.forName(repositoryFactoryName, true,
                  Thread.currentThread().getContextClassLoader());

          repositoryFactory = repositoryFactoryClass.newInstance();
      }
      catch (Exception e) {
          throw new RepositoryException(e);
      }

      if (repositoryFactory instanceof RepositoryFactory) {
          return ((RepositoryFactory) repositoryFactory).getRepository(parameters);
      }
      else {
          throw new RepositoryException(repositoryFactory + " is not a RepositoryFactory");
      }
  }

2) Use the factory to get a repo:
  public Repository getRepository() throws ClassNotFoundException, RepositoryException {
      Map<String, RepositoryConfig> parameters = Collections.singletonMap(
              "org.apache.jackrabbit.jcr2spi.RepositoryConfig",
              (RepositoryConfig) new RepositoryConfigImpl(jackabbitServerUrl));

      return getRepository(parameters);
  }

3) Repository Config:
  private static final class RepositoryConfigImpl implements RepositoryConfig {

      private String jackabbitServerUrl;

      private RepositoryConfigImpl(String jackabbitServerUrl) {
          super();
          this.jackabbitServerUrl = jackabbitServerUrl;
      }

      public CacheBehaviour getCacheBehaviour() {
          return CacheBehaviour.INVALIDATE;
      }

      public int getItemCacheSize() {
          return 100;
      }

      public int getPollTimeout() {
          return 5000;
      }

      public RepositoryService getRepositoryService() throws RepositoryException {
          // Batch-read one level below the requested path for every path.
          BatchReadConfig brc = new BatchReadConfig() {
              public int getDepth(Path path, PathResolver resolver)
                      throws NamespaceException {
                  return 1;
              }
          };
          return new RepositoryServiceImpl(jackabbitServerUrl, brc);
      }

  }

Thanks for your time.

David




