Hello All,

I read somewhere that most of the dev folk are enjoying their summer
vacations :)

Thanks for your detailed replies so far. My content model primarily
deals with binary data, with a lot of supporting nodes.

On 7/31/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
> Hello Bertrand,
> >
> > On 7/31/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
> > > Regarding your use case, having around 36.000.000 documents
> > > after one year in one single workspace with terabytes of
> > > data...so 100.000.000 docs within three years...Well, I
> > > think you at least have to tune some settings :-)...
> >
> > Just to make sure there's no misunderstanding, the original post says
> > "nodes", not "documents".
>
> Yes, you are right! I must have misunderstood: since he is talking about
> "pushing 300-500 nodes a minute", I understood he meant pushing docs into JR
> :-)
>

The reason I said nodes is that we have different kinds of nodes in
the system: while most of them are documents, there are supporting
nodes for things such as permissions and auditing. So for one
document added to JR, 3-4 nodes might be modified, and each document
has about 7-8 properties, a number that could grow as we gather
usage statistics.
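
To make that concrete, one ingest looks roughly like the sketch below
(plain JCR calls; the node and property names are made up for
illustration, not our actual content model):

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class DocumentIngest {

    // One document plus its supporting nodes; 3-4 nodes end up
    // written or modified for a single logical document.
    public void addDocument(Session session, String name, byte[] data)
            throws RepositoryException {
        Node docs = (Node) session.getItem("/content/documents");

        // The document itself: binary payload plus ~7-8 metadata properties.
        Node doc = docs.addNode(name, "nt:unstructured");
        doc.setProperty("data", new java.io.ByteArrayInputStream(data));
        doc.setProperty("title", name);
        doc.setProperty("createdBy", session.getUserID());
        doc.setProperty("createdOn", java.util.Calendar.getInstance());

        // Supporting nodes touched by the same save: permissions, audit.
        Node perms = doc.addNode("permissions");
        perms.setProperty("readers", new String[] { "everyone" });

        Node audit = (Node) session.getItem("/content/audit");
        audit.addNode(name + "-created");

        session.save();
    }
}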

We have tried to stay away from references, since we saw that they
could slow the system down tremendously and clog up the DB.
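
Concretely, instead of REFERENCE properties (where Jackrabbit has to
enforce referential integrity on every save and delete), we keep a
plain string path and resolve it ourselves. A minimal sketch, again
with invented names:

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class SoftLinks {

    // Store target.getPath() as a STRING instead of calling
    // doc.setProperty("target", target), which would create a
    // REFERENCE and make removing the target expensive.
    public static void link(Node doc, Node target) throws RepositoryException {
        doc.setProperty("targetPath", target.getPath());
    }

    // Resolution is an explicit lookup; a dangling path fails here
    // instead of blocking the removal of the target node.
    public static Node resolve(Session session, Node doc) throws RepositoryException {
        return (Node) session.getItem(doc.getProperty("targetPath").getString());
    }
}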

> >
> > So that's 36 million nodes a year, or 100 million after three years.
> > If it was documents, it might be many more nodes than that.
> >
> > Although I haven't run those tests myself, I've talked with people
> > doing tests with, IIRC, 150 million nodes, and such quantities are
> > also regularly mentioned in Lucene tests
>
> Yes, I agree, but in these cases you really have to understand how to tune
> and configure each separate component, because, for example, if you have a
> just-invalidated IndexReader and you are doing a search on a common word
> with a sort on title, or some range query, you might run into problems with
> 150 million nodes.
>
> >, so I don't think this is
> > necessarily a problem. But of course, it depends on how nodes are
> > structured and on what's indexed.
>
> Indexing seems to me pretty important when you have 150 million nodes.
> Actually, ATM I am sorting out the IndexingConfigurationImpl possibilities
> planned for the Jackrabbit 1.4 release, which look very promising to me
> (though OTOH, people must know how to configure the indexing properly, and
> this might be a bit hard in the beginning because you really have to know
> the content modelling structure AFAICS).
>
> But as I misunderstood the requirements regarding nodes, and you know people
> who have run successful tests with 150 million nodes...well, then I will
> stick to my remark that you need to know how to tune some configuration
> parameters :-)
>
> Regards Ard
>
> >
> > -Bertrand
> >
>
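
On the sort/range-query point above: the kind of query we would worry
about looks like the sketch below (JCR XPath; "my:document" and the
property names are invented). As I understand it, after writes
invalidate the index reader, the first sorted query has to repopulate
Lucene's field cache for the sort field, which could be painful at
these node counts.

import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

public class ExpensiveQuery {

    public static QueryResult run(Session session) throws RepositoryException {
        QueryManager qm = session.getWorkspace().getQueryManager();

        // A common term plus an order-by: cheap on a warm reader,
        // expensive right after the reader has been invalidated.
        Query q = qm.createQuery(
                "//element(*, my:document)[jcr:contains(., 'report')]"
                        + " order by @title",
                Query.XPATH);
        return q.execute();
    }
}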

Again, mine might be a common case or a unique case depending on how
you choose to look at it.

We have been using JR for a while and are wondering whether there is
a secret sauce somewhere; hence this email, trying to gauge the
community's experiences.

Thanks.

V.
