Sebastien,

Just as a reminder, we use Jackrabbit 1.4. I'm not explicitly configuring the
text extractors. Our repository.xml looks like this:

<Repository>
        <FileSystem class="org.apache.jackrabbit.core.fs.db.JNDIDatabaseFileSystem">
                <param name="dataSourceLocation" value="kycAppDataSource" />
                <param name="schema" value="mssql" />
                <param name="schemaObjectPrefix" value="J_R_FS_" />
                <param name="bundleCacheSize" value="8" />
                <param name="consistencyCheck" value="false" />
                <param name="minBlobSize" value="16384" />
        </FileSystem>

        <Security appName="Jackrabbit">
                <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager">
                </AccessManager>

                <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
                        <param name="anonymousId" value="anonymous" />
                </LoginModule>
        </Security>

        <Workspaces rootPath="${rep.home}/workspaces"
                defaultWorkspace="default" />

        <Workspace name="${wsp.name}">
                <FileSystem class="org.apache.jackrabbit.core.fs.db.JNDIDatabaseFileSystem">
                        <param name="dataSourceLocation" value="kycAppDataSource" />
                        <param name="schema" value="mssql" />
                        <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_" />
                        <param name="bundleCacheSize" value="8" />
                        <param name="consistencyCheck" value="false" />
                        <param name="minBlobSize" value="16384" />
                </FileSystem>
                <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.JNDIDatabasePersistenceManager">
                        <param name="dataSourceLocation" value="kycAppDataSource" />
                        <param name="schema" value="mssql" />
                        <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
                        <param name="bundleCacheSize" value="8" />
                        <param name="consistencyCheck" value="false" />
                        <param name="minBlobSize" value="16384" />
                </PersistenceManager>
                <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
                        <param name="path" value="${wsp.home}/index" />
                </SearchIndex>
        </Workspace>

        <Versioning rootPath="${rep.home}/version">
                <FileSystem class="org.apache.jackrabbit.core.fs.db.JNDIDatabaseFileSystem">
                        <param name="dataSourceLocation" value="kycAppDataSource" />
                        <param name="schema" value="mssql" />
                        <param name="schemaObjectPrefix" value="J_V_FS_" />
                        <param name="bundleCacheSize" value="8" />
                        <param name="consistencyCheck" value="false" />
                        <param name="minBlobSize" value="16384" />
                </FileSystem>
                <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.JNDIDatabasePersistenceManager">
                        <param name="dataSourceLocation" value="kycAppDataSource" />
                        <param name="schema" value="mssql" />
                        <param name="schemaObjectPrefix" value="J_V_PM_" />
                        <param name="bundleCacheSize" value="8" />
                        <param name="consistencyCheck" value="false" />
                        <param name="minBlobSize" value="16384" />
                </PersistenceManager>
        </Versioning>

        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
                <param name="path" value="${rep.home}/repository/index" />
        </SearchIndex>
</Repository>

For now, our customers are working around the memory problem by restarting the
servers daily.

The documents we store are numerous (thousands per day) and vary in size. They
are news articles (XML/HTML) and reports (RTF), all stored as binary content
(base64 encoded). We also store some string attributes for each article. We
delete thousands of news articles per day when reports are finalized. We do not
need to search the content of these articles, but I assume they are being
indexed because we have SearchIndex elements in our repository.xml.


Am I correct here?
Muguet


-----Original Message-----
From: Sébastien Launay [mailto:[email protected]] 
Sent: Tuesday, September 29, 2009 8:20 AM
To: [email protected]
Subject: Re: Memory issues with jackrabbit/lucene

On 29/09/2009 13:51, Muguet Bradbury wrote:
> Sebastien,
>
> Thanks for the reply. Yes, we do store large documents (RTF and large XML
> documents). When we store each document, we create a session, add the
> document, save the session, and close the session. The LuceneTermBuffers
> remain in memory. However, if the indexing occurs asynchronously, this may be
> what's filling up the memory. Eventually, the application fails with an
> OutOfMemoryError.
This is clearly caused by the asynchronous indexing of binary properties.
You can also disable indexing of this kind of document [1].
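For illustration, here is a minimal sketch of that configuration for the
workspace SearchIndex above. It assumes that in Jackrabbit 1.4 the SearchIndex
"textFilterClasses" parameter lists the text extractors applied to binary
properties, and that an empty value leaves none registered (please check [1]
before relying on this). Binary content would then be skipped by the full-text
index, while node names and string properties remain searchable:

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${wsp.home}/index" />
        <!-- Assumption: an empty list registers no text extractor, so the
             binary articles (RTF/XML/HTML) are not full-text indexed;
             node names and string properties are still indexed. -->
        <param name="textFilterClasses" value="" />
</SearchIndex>
```

The same parameter could be added to the repository-level SearchIndex, which
indexes the version storage.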

Can you provide more information about these documents (size, number, ...)?

> I will look into removing the SearchIndex elements from the repository.xml 
> and workspace.xml.  Do we also need to remove the index directories from the 
> wsp.home path?  Will removing the SearchIndex elements make retrieval of the 
> documents (with the node keys) slower?
>   
Removing the index directories is not mandatory, as they will no longer be
used. However, they consume disk space, so you may want to delete them.

Lucene indexes are only used for search features (XPath, SQL, AQM).
Node#getNodes(), Node#getProperties(), Session#getNodeByUUID(), ...
use an abstraction called PersistenceManager [2].
The default PersistenceManager implementations do not use a Lucene index.
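As a sketch of the other option, the Workspace section of the repository.xml
above could simply drop its SearchIndex element. The bundleCacheSize,
consistencyCheck and minBlobSize params are left out here only for brevity;
they would stay as they are. With this configuration, access by path or UUID
would still go through the PersistenceManager, but queries would presumably
fail because no search index is configured for the workspace. Existing
workspaces keep their own copy of this section in workspace.xml, so that file
would need the same change.

```xml
<Workspace name="${wsp.name}">
        <FileSystem class="org.apache.jackrabbit.core.fs.db.JNDIDatabaseFileSystem">
                <param name="dataSourceLocation" value="kycAppDataSource" />
                <param name="schema" value="mssql" />
                <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_" />
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.JNDIDatabasePersistenceManager">
                <param name="dataSourceLocation" value="kycAppDataSource" />
                <param name="schema" value="mssql" />
                <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
        </PersistenceManager>
        <!-- No SearchIndex element: Session#getNodeByUUID() and
             Node#getNodes() still go through the PersistenceManager,
             but XPath/SQL queries are no longer available. -->
</Workspace>
```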

[1] http://jackrabbit.apache.org/jackrabbit-text-extractors.html
[2] http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ

--
Sébastien Launay

