hi marcel,

On Sun, Sep 25, 2011 at 3:40 PM, Marcel Bruch <[email protected]> wrote:
> Hi,
>
> I'm looking for some advice on whether Jackrabbit might be a good choice for 
> my problem. Any comments on this are greatly appreciated.
>
>
> = Short description of the challenge =
>
> We've built an Eclipse-based tool that analyzes Java source files and stores 
> its analysis results in additional files. The workspace potentially has 
> hundreds of projects, and each project may have up to a few thousand 
> files. Say there are 200 projects and 1000 Java source files per project 
> in a single workspace. Then there will be 200*1000 = 200,000 files.
>
> On a full workspace build, all these 200k files have to be compiled (by the 
> IDE) and analyzed (by our tool) at once and the analysis results have to be 
> dumped to disk rather fast.
> But the most common use case is that a single file is changed several times 
> per minute and thus gets frequently analyzed.
>
> At the moment, the analysis results are dumped to disk as plain JSON files, 
> one JSON file for each Java class. Each JSON file is around 5 to 100 KB in 
> size; some files grow up to several megabytes (< 10 MB). These files have a 
> few hundred complex JSON nodes (which might map perfectly to nodes in JCR).
>
> = Question =
>
> We would like to replace the simple file system approach with a more 
> sophisticated one, and I wonder whether Jackrabbit may be a suitable 
> backend for this use case. Since we map all our data to JSON already, it 
> looks like Jackrabbit/JCR is a perfect fit for this, but I can't say for sure.
>
> What's your suggestion? Is Jackrabbit capable of quickly loading and storing 
> JSON-like data, even if 200k files (nodes + their sub-nodes) have to be 
> updated in a very short time?

absolutely. if the data is reasonably structured/organized, jackrabbit
should be a perfect fit.
i suggest using the java package hierarchy to organize the data
(e.g. org.apache.jackrabbit.core.TransientRepository ->
/org/apache/jackrabbit/core/TransientRepository).
for further data modeling recommendations see [0].
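
a minimal sketch of the name-to-path mapping described above, using plain string conversion only. the class name PackagePathMapper and the method toNodePath are made up for illustration and are not part of jackrabbit or the JCR API; how nested classes or names containing characters that are not legal in JCR names should be handled is left open here.

```java
// Hypothetical helper (not part of Jackrabbit): maps a fully qualified
// Java class name onto an absolute, slash-separated repository path.
final class PackagePathMapper {

    private PackagePathMapper() {
        // static utility, no instances
    }

    /**
     * Converts e.g. "org.apache.jackrabbit.core.TransientRepository" to
     * "/org/apache/jackrabbit/core/TransientRepository".
     */
    static String toNodePath(String fullyQualifiedName) {
        if (fullyQualifiedName == null || fullyQualifiedName.isEmpty()) {
            throw new IllegalArgumentException("class name must not be empty");
        }
        // Each package segment becomes one path segment (one node level).
        return "/" + fullyQualifiedName.replace('.', '/');
    }
}
```

the resulting path could then be used as the location of the node that holds the analysis results for that class, so that each package becomes an intermediate node and the results for one class can be looked up or replaced without touching its siblings.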

cheers
stefan

[0] http://wiki.apache.org/jackrabbit/DavidsModel

>
>
> Thanks for your suggestions. If you need more details on what operations 
> are performed or what the data looks like, I would be glad to take your questions.
>
> Marcel
>
> --
> Eclipse Code Recommenders:
>  w www.eclipse.org/recommenders
>  tw www.twitter.com/marcelbruch
>  g+ www.gplus.to/marcelbruch
>
>
