[jira] Commented: (JCR-550) ObservationManagerFactory) -

2006-11-03 Thread JIRA
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12446852 ]

Claus Köll commented on JCR-550:


Hi Jukka,

Issue JCR-574 is very different from this issue. The problem there was that
the LazyReader only caught Exceptions, not RuntimeExceptions.
The problem here is that I get an OutOfMemoryError while re-indexing a huge
repository.
This is a very big problem for me because I cannot work in a production
environment with Jackrabbit: we have about 4-5 million documents (doc, xls,
pdf). If I have to re-index the repository, I cannot do it.
I will try the VM argument that Marcel wrote.

Claus
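
For readers following the thread, here is a minimal Java sketch of why an OutOfMemoryError is a different kind of failure from an exception thrown by a filter. It only demonstrates the standard Throwable hierarchy; CatchScopeDemo and classify are illustrative names and are not Jackrabbit code, and the sketch makes no claim about how the JCR-574 fix was written.

```java
// Sketch only: shows which catch clause sees which kind of throwable.
final class CatchScopeDemo {

    /** Runs the task and reports which catch clause, if any, handled the failure. */
    static String classify(Runnable task) {
        try {
            try {
                task.run();
                return "no failure";
            } catch (Exception e) {
                // Handles checked exceptions and RuntimeExceptions, but not Errors.
                return "caught as Exception";
            }
        } catch (Throwable t) {
            // OutOfMemoryError extends Error, so it bypasses catch (Exception).
            return "escaped catch (Exception)";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> { throw new IllegalStateException("bad document"); }));
        System.out.println(classify(() -> { throw new OutOfMemoryError("simulated"); }));
    }
}
```

So even a handler that catches every Exception would not stop a genuine OutOfMemoryError from propagating out of the indexing code.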

 ObservationManagerFactory) -
OutOfMemoryError when re-indexing the repository
 --

 Key: JCR-550
 URL: http://issues.apache.org/jira/browse/JCR-550
 Project: Jackrabbit
  Issue Type: Bug
  Components: indexing
Affects Versions: 1.0.1
 Environment: tomcat 5.0 [256 up to 512 mb of ram]
 jackrabbit 1.0.1
 jdk 1.4.2_12
 Intel Xeon 3.2GHz with 2Gb of memory
 
 poi-3.0-alpha2-20060616.jar
 poi-contrib-3.0-alpha2-20060616.jar
 poi-scratchpad-3.0-alpha2-20060616.jar
 jackrabbit-core-1.0.1.jar
 jackrabbit-index-filters-1.0.1.jar
 jackrabbit-jcr-commons-1.0.1.jar
 jcr-1.0.jar
 tm-extractors-0.4.jar
 lucene-1.4.3.jar
Reporter: Christian Zanata
 Assigned To: Marcel Reutegger
 Attachments: log_files.zip


 [ERROR] 20060825 17:06:40
 (org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
 Synchronous EventConsumer threw exception. java.lang.OutOfMemoryError
 when we try to re-index a repository. The repository is quite big (more than
 4 GB of disk usage) and sometimes it stores documents of 40 MB in size.
 I have attached the latest logs we recorded, with the full stack traces.
 Related to this, we also have errors with Lucene:
 [DEBUG] 20060803 08:24:01 (org.apache.jackrabbit.core.query.LazyReader)
 - Dump:
 java.io.IOException: Invalid header signature; read 8656037701166316554,
 expected -2226271756974174256
 at org.apache.jackrabbit.core.query.MsWordTextFilter
 and then these:
 [DEBUG] 20060803 08:37:17 (org.apache.jackrabbit.core.ItemManager) -
 removing item 8637bf5f-4689-4e75-888f-b7b89bef40c8 from cache
 [ WARN] 20060803 08:40:13 (org.apache.jackrabbit.core.RepositoryImpl) -
 Existing lock file at C:\Wave\Repository\.lock deteteced. Repository was
 not shut down properly.
 [ERROR] 20060803 09:33:14
 (org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
 Synchronous EventConsumer threw exception.
 java.lang.NullPointerException: null values not allowed
 This is our repository.xml configuration for indexing:
 <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
   <param name="path" value="${wsp.home}/index"/>
   <param name="textFilterClasses"
          value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,
                 org.apache.jackrabbit.core.query.MsExcelTextFilter,
                 org.apache.jackrabbit.core.query.MsPowerPointTextFilter,
                 org.apache.jackrabbit.core.query.MsWordTextFilter,
                 org.apache.jackrabbit.core.query.PdfTextFilter,
                 org.apache.jackrabbit.core.query.HTMLTextFilter,
                 org.apache.jackrabbit.core.query.XMLTextFilter,
                 org.apache.jackrabbit.core.query.RTFTextFilter,
                 org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
   <param name="useCompoundFile" value="true"/>
   <param name="minMergeDocs" value="100"/>
   <param name="volatileIdleTime" value="3"/>
   <param name="maxMergeDocs" value="10"/>
   <param name="mergeFactor" value="10"/>
   <param name="bufferSize" value="10"/>
   <param name="cacheSize" value="1000"/>
   <param name="forceConsistencyCheck" value="false"/>
   <param name="autoRepair" value="true"/>
   <param name="respectDocumentOrder" value="false"/>
   <param name="analyzer"
          value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
 </SearchIndex>

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

   


[jira] Commented: (JCR-550) ObservationManagerFactory) -

2006-11-02 Thread Jukka Zitting (JIRA)
OutOfMemoryError when re-indexing the repository

[ 
http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12446831 ] 

Jukka Zitting commented on JCR-550:
---

Does this issue still occur now that RuntimeExceptions are being caught per 
the JCR-574 fix?





[jira] Commented: (JCR-550) ObservationManagerFactory) -

2006-11-02 Thread Marcel Reutegger (JIRA)
OutOfMemoryError when re-indexing the repository

[ 
http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12446843 ] 

Marcel Reutegger commented on JCR-550:
--

Claus wrote:
> is there another way to get a dump file ?

Actually there is. JDK 1.4.2_12 supports the option 
-XX:+HeapDumpOnOutOfMemoryError

With this option the JVM will create a heap dump when it runs out of memory.
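
To confirm that the option actually reached the JVM (for instance when it is passed through a servlet container's startup scripts), something like the following sketch could be run inside the same JVM. It is an illustration under assumptions: HeapDumpFlagCheck is a made-up name, and java.lang.management only exists from Java 5 on, so this would not compile on JDK 1.4.2 itself.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch only: reports whether the heap dump option was passed to this JVM.
final class HeapDumpFlagCheck {

    static final String FLAG = "-XX:+HeapDumpOnOutOfMemoryError";

    /** Returns true if the given JVM arguments contain the heap dump flag. */
    static boolean hasHeapDumpFlag(List<String> jvmArgs) {
        return jvmArgs.contains(FLAG);
    }

    public static void main(String[] args) {
        // The input arguments are the options given to the JVM, not to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(FLAG + " set: " + hasHeapDumpFlag(jvmArgs));
    }
}
```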





[jira] Commented: (JCR-550) ObservationManagerFactory) -

2006-09-07 Thread JIRA
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12433037 ]

Claus Köll commented on JCR-550:


Hi all,
In my case most of the files in my repository are Word documents.
If I remove the org.apache.jackrabbit.core.query.MsWordTextFilter class, the
re-index process works fine.
But if I enable the filter, the process ends with an OutOfMemoryError.
I think we must look for a memory leak ...
Claus



[jira] Commented: (JCR-550) ObservationManagerFactory) -

2006-09-06 Thread JIRA
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12432776 ]

Claus Köll commented on JCR-550:


Hi Marcel,
The VM argument -Xrunhprof:heap=sites,doe=n
does not work in my case. The re-index process stops after about 1-2 minutes
with an OutOfMemoryError.
Is there another way to get a dump file?
Claus



[jira] Commented: (JCR-550) ObservationManagerFactory) -

2006-09-04 Thread JIRA
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12432434 ]

Claus Köll commented on JCR-550:


I tried to re-index my repository without the text filters and it works fine.
So the bug is in one of the text filters ...
These are the text filters I used before:
org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter
org.apache.jackrabbit.core.query.MsExcelTextFilter
org.apache.jackrabbit.core.query.MsPowerPointTextFilter
org.apache.jackrabbit.core.query.MsWordTextFilter
org.apache.jackrabbit.core.query.PdfTextFilter
org.apache.jackrabbit.core.query.HTMLTextFilter
org.apache.jackrabbit.core.query.XMLTextFilter
org.apache.jackrabbit.core.query.RTFTextFilter
org.apache.jackrabbit.core.query.OpenOfficeTextFilter

So I can test re-indexing the repository without some of the filters ...
Please give me a hint which ones I should use.
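
One way to narrow this down, sketched here in Java as an illustration only, is to generate one textFilterClasses value per filter with that single filter left out, and re-index once with each value. FilterBisect and withoutFilter are hypothetical helper names; only the filter class names come from the list above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: builds leave-one-out values for the textFilterClasses parameter.
final class FilterBisect {

    static final List<String> FILTERS = Arrays.asList(
        "org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter",
        "org.apache.jackrabbit.core.query.MsExcelTextFilter",
        "org.apache.jackrabbit.core.query.MsPowerPointTextFilter",
        "org.apache.jackrabbit.core.query.MsWordTextFilter",
        "org.apache.jackrabbit.core.query.PdfTextFilter",
        "org.apache.jackrabbit.core.query.HTMLTextFilter",
        "org.apache.jackrabbit.core.query.XMLTextFilter",
        "org.apache.jackrabbit.core.query.RTFTextFilter",
        "org.apache.jackrabbit.core.query.OpenOfficeTextFilter");

    /** Returns a comma-separated filter list with the entry at index 'skip' left out. */
    static String withoutFilter(List<String> filters, int skip) {
        List<String> kept = new ArrayList<>(filters);
        kept.remove(skip); // int argument, so this removes by index
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        for (int i = 0; i < FILTERS.size(); i++) {
            System.out.println("without " + FILTERS.get(i) + ":");
            System.out.println("  " + withoutFilter(FILTERS, i));
        }
    }
}
```

If re-indexing succeeds only when one particular filter is missing, that filter (or the library behind it) is the likely place to look.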



[jira] Commented: (JCR-550) ObservationManagerFactory) -

2006-09-01 Thread Christian Zanata (JIRA)
OutOfMemoryError when re-indexing the repository

[ 
http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12432089 ] 

Christian Zanata commented on JCR-550:
--

Hi Marcel,
we think the problem is in PdfTextFilter or in the PDFBox libraries. We are not 
sure about that and we are still investigating in that direction. It seems that 
after an exception something doesn't free its resources.





[jira] Commented: (JCR-550) ObservationManagerFactory) -

2006-09-01 Thread Ian Boston (JIRA)
OutOfMemoryError when re-indexing the repository

[ 
http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12432119 ] 

Ian Boston commented on JCR-550:


Two things about PDFBox:

1. If you don't religiously close the streams, it causes OOM problems, and the 
GC won't get to the finalizers fast enough to avoid an OOM.

2. PDFBox has to build the entire document, including all the graphics images, 
before it can render the text. If you have a refactored PDF you can get 
thousands of graphics line segments, which causes PDFBox to use lots of CPU 
and heap converting to a text stream.

I am using PDFBox in a different search engine in the same way, and it randomly 
causes lots of problems with refactored PDF files.

HTH
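
The first point can be sketched in Java as follows. This is a generic illustration, not PDFBox or Jackrabbit code: TextExtractor is a hypothetical stand-in for whatever does the parsing, and try-with-resources needs Java 7 or later (on JDK 1.4 the same effect takes an explicit try/finally).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch only: the stream is closed even when text extraction fails.
final class CloseOnFailure {

    /** Hypothetical stand-in for a text filter; not a PDFBox or Jackrabbit API. */
    interface TextExtractor {
        String extract(InputStream in) throws IOException;
    }

    /** Extracts text and closes the stream whether or not extraction throws. */
    static String extractAndClose(InputStream in, TextExtractor extractor) throws IOException {
        try (InputStream stream = in) {
            return extractor.extract(stream);
        }
    }

    public static void main(String[] args) {
        InputStream corrupt = new ByteArrayInputStream(new byte[] {1, 2, 3});
        try {
            // Simulate a parser that rejects a corrupt document.
            extractAndClose(corrupt, s -> { throw new IOException("Invalid header signature"); });
        } catch (IOException e) {
            System.out.println("extraction failed, stream already closed: " + e.getMessage());
        }
    }
}
```

Closing in one place like this means a corrupt document cannot leave a stream, and whatever buffers hang off it, waiting for finalization.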





[jira] Commented: (JCR-550) ObservationManagerFactory) -

2006-08-31 Thread Marcel Reutegger (JIRA)
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12431955 ]

Marcel Reutegger commented on JCR-550:
--

To reproduce this issue I tried to re-index a repository with 100,000 nodes. I 
was able to re-index the repository with a heap size as small as 32 MB. My 
profiler did not show any exceptional memory usage in the search index; the 
memory usage was actually quite low.

Can you please try to re-index your repository without the text filters? Maybe 
there is a memory leak in one of the filters when an exception is thrown on an 
invalid or corrupt document.
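
To illustrate the kind of leak suspected above, here is a minimal sketch of a defensive wrapper around a text extractor. It closes the input stream in a finally block and catches runtime exceptions as well as checked ones, so a corrupt document cannot leave a stream open. The names Extractor, TrackingStream and SafeExtraction are hypothetical and invented for this example; they are not Jackrabbit's text filter API, and the real filters may behave differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class SafeExtraction {

    /** Hypothetical stand-in for a text filter; not a Jackrabbit interface. */
    interface Extractor {
        String extract(InputStream in) throws IOException;
    }

    /** In-memory stream that records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /**
     * Runs the extractor and always closes the stream.
     * Returns an empty string if the document cannot be parsed.
     */
    static String extractSafely(Extractor extractor, InputStream in) {
        try {
            return extractor.extract(in);
        } catch (Exception e) {
            // Exception covers both IOException and RuntimeException;
            // Errors such as OutOfMemoryError are deliberately not swallowed.
            return "";
        } finally {
            try {
                in.close();
            } catch (IOException ignored) {
                // nothing useful can be done if close() itself fails
            }
        }
    }

    public static void main(String[] args) {
        // Simulates a filter that fails on a corrupt document.
        Extractor failing = in -> {
            throw new IllegalStateException("corrupt document");
        };
        TrackingStream stream = new TrackingStream(new byte[] {1, 2, 3});
        String text = extractSafely(failing, stream);
        System.out.println("text=[" + text + "] closed=" + stream.closed);
    }
}
```

If a filter in the real configuration skips the equivalent of the finally block, every failed document could keep its buffers reachable, which would be consistent with memory growing during a long re-indexing run.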

Having a heap dump for analysis would also be helpful. Can you please run the 
re-indexing process with the following JVM option:

  -Xrunhprof:heap=sites,doe=n

This will allow you to create a heap dump on a Ctrl-Break (on Windows) or 
kill -QUIT (on Unix) on the JVM process.
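
As a lightweight complement to the heap dump (not a replacement for it), heap usage can also be logged from inside the JVM while re-indexing runs, to see whether memory grows steadily or jumps on particular documents. The sketch below uses only the standard java.lang.Runtime methods; the HeapUsage class and its method names are made up for this example and are not part of Jackrabbit.

```java
class HeapUsage {

    /** Returns the number of heap bytes currently in use. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Formats used and maximum heap as whole megabytes. */
    static String format(long usedBytes, long maxBytes) {
        long mb = 1024L * 1024L;
        return "heap used " + (usedBytes / mb) + " MB of " + (maxBytes / mb) + " MB max";
    }

    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx limit the JVM was started with.
        System.out.println(format(usedBytes(), Runtime.getRuntime().maxMemory()));
    }
}
```

Calling something like this every few thousand indexed nodes and writing the result to the log would show roughly where in the run the heap fills up.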

Thanks a lot.






[jira] Commented: (JCR-550) ObservationManagerFactory) -

2006-08-29 Thread Marcel Reutegger (JIRA)
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12431236 ]

Marcel Reutegger commented on JCR-550:
--

Your log files seem to indicate that some of your content is corrupt:

Caused by: java.lang.IllegalArgumentException: invalid QName literal
    at org.apache.jackrabbit.name.QName.valueOf(QName.java:618)
    at org.apache.jackrabbit.core.state.util.Serializer.deserialize(Serializer.java:124)
    at org.apache.jackrabbit.core.state.obj.ObjectPersistenceManager.load(ObjectPersistenceManager.java:206)
    ... 61 more


Please note that using the ObjectPersistenceManager on a production system is 
not recommended because it is not transactional. You should consider using 
DerbyPersistenceManager as your version storage.

