[jira] Created: (JCR-550) ObservationManagerFactory) -

2006-08-29 Thread Christian Zanata (JIRA)
OutOfMemoryError when re-indexing the repository

ObservationManagerFactory) -
OutOfMemoryError when re-indexing the repository
--

 Key: JCR-550
 URL: http://issues.apache.org/jira/browse/JCR-550
 Project: Jackrabbit
  Issue Type: Bug
  Components: indexing
Affects Versions: 1.0.1
 Environment: tomcat 5.0 [256 up to 512 mb of ram] 
jackrabbit 1.0.1 
jdk 1.4.2_12 
Intel Xeon 3.2GHz with 2Gb of memory

poi-3.0-alpha2-20060616.jar
poi-contrib-3.0-alpha2-20060616.jar
poi-scratchpad-3.0-alpha2-20060616.jar
jackrabbit-core-1.0.1.jar
jackrabbit-index-filters-1.0.1.jar
jackrabbit-jcr-commons-1.0.1.jar
jcr-1.0.jar
tm-extractors-0.4.jar
lucene-1.4.3.jar

Reporter: Christian Zanata
 Attachments: log_files.zip

[ERROR] 20060825 17:06:40
(org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
Synchronous EventConsumer threw exception. java.lang.OutOfMemoryError

This happens when we try to re-index a repository. The repository is quite big
(more than 4 GB of disk usage) and sometimes stores documents of 40 MB.

I have attached the most recent logs we recorded, with the full stack traces.

Related to this, we also have errors with Lucene:

[DEBUG] 20060803 08:24:01 (org.apache.jackrabbit.core.query.LazyReader)
- Dump: 
java.io.IOException: Invalid header signature; read 8656037701166316554,
expected -2226271756974174256
at org.apache.jackrabbit.core.query.MsWordTextFilter

and then these:

[DEBUG] 20060803 08:37:17 (org.apache.jackrabbit.core.ItemManager) -
removing item 8637bf5f-4689-4e75-888f-b7b89bef40c8 from cache
[ WARN] 20060803 08:40:13 (org.apache.jackrabbit.core.RepositoryImpl) -
Existing lock file at C:\Wave\Repository\.lock deteteced. Repository was
not shut down properly.
[ERROR] 20060803 09:33:14
(org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
Synchronous EventConsumer threw exception.
java.lang.NullPointerException: null values not allowed

This is our repository.xml configuration for indexing:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="textFilterClasses"
         value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,
                org.apache.jackrabbit.core.query.MsExcelTextFilter,
                org.apache.jackrabbit.core.query.MsPowerPointTextFilter,
                org.apache.jackrabbit.core.query.MsWordTextFilter,
                org.apache.jackrabbit.core.query.PdfTextFilter,
                org.apache.jackrabbit.core.query.HTMLTextFilter,
                org.apache.jackrabbit.core.query.XMLTextFilter,
                org.apache.jackrabbit.core.query.RTFTextFilter,
                org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
  <param name="useCompoundFile" value="true"/>
  <param name="minMergeDocs" value="100"/>
  <param name="volatileIdleTime" value="3"/>
  <param name="maxMergeDocs" value="10"/>
  <param name="mergeFactor" value="10"/>
  <param name="bufferSize" value="10"/>
  <param name="cacheSize" value="1000"/>
  <param name="forceConsistencyCheck" value="false"/>
  <param name="autoRepair" value="true"/>
  <param name="respectDocumentOrder" value="false"/>
  <param name="analyzer"
         value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</SearchIndex>

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: problem retrieving nodes from different workspaces

2006-08-29 Thread J Kuijpers

Were you able to reproduce our problem?


Jukka Zitting-3 wrote:
 
 Hi,
 
 On 8/28/06, J Kuijpers [EMAIL PROTECTED] wrote:
 Supplied repository.xml and runnable MultipleWorkspaceTest.java
 http://www.nabble.com/user-files/235783/repository.xml repository.xml
 http://www.nabble.com/user-files/235784/MultipleWorkspaceTest.java
 MultipleWorkspaceTest.java
 
 The MultipleWorkspaceTest.java file appears to be empty. Could you
 resend it, inline if necessary?
 
 BR,
 
 Jukka Zitting
 
 -- 
 Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
 Software craftsmanship, JCR consulting, and Java development
 
 

-- 
View this message in context: 
http://www.nabble.com/problem-retrieving-nodes-from-different-workspaces-tf2177041.html#a6037018
Sent from the Jackrabbit - Dev forum at Nabble.com.



Re: problem retrieving nodes from different workspaces

2006-08-29 Thread Marcel Reutegger

Your repository.xml file is broken.

You have:

<PersistenceManager class="org.apache.jackrabbit.core.state.db.DerbyPersistenceManager">
  <param name="url" value="jdbc:derby:${rep.home}/version/db;create=true"/>
  <param name="schemaObjectPrefix" value="version_"/>
</PersistenceManager>


A fixed value for the parameter 'schemaObjectPrefix' will cause 
Jackrabbit to write content of multiple workspaces into the same 
table, thus possibly overwriting content.


You must use a value that includes the workspace name as a variable.

E.g. the sample configuration uses this:

<param name="schemaObjectPrefix" value="${wsp.name}_"/>

See also:
https://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit/src/main/config/repository.xml

Using the sample repository.xml the test works fine even with a 
shutdown in between.
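
To illustrate why the fixed prefix is a problem, here is a minimal Java sketch. It is not Jackrabbit's actual code: the class SchemaPrefixDemo and the methods resolve and prefixFor are hypothetical, and the variable substitution is a simplified stand-in for what the configuration parser does. It only shows that a fixed prefix resolves to the same value for every workspace, while a prefix containing ${wsp.name} resolves to a distinct value per workspace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; not part of Jackrabbit.
final class SchemaPrefixDemo {

    // Simplified stand-in for configuration variable substitution:
    // replaces each ${name} occurrence with its value.
    public static String resolve(String template, Map<String, String> variables) {
        String result = template;
        for (Map.Entry<String, String> e : variables.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    // Resolves a schemaObjectPrefix template for one workspace name.
    public static String prefixFor(String template, String workspaceName) {
        Map<String, String> variables = new LinkedHashMap<>();
        variables.put("wsp.name", workspaceName);
        return resolve(template, variables);
    }

    public static void main(String[] args) {
        String[] templates = {"version_", "${wsp.name}_"};
        String[] workspaces = {"default", "test"};
        for (String template : templates) {
            for (String workspace : workspaces) {
                // A fixed template yields the same prefix for both workspaces,
                // so both would share the same tables.
                System.out.println(template + " in workspace '" + workspace
                        + "' resolves to " + prefixFor(template, workspace));
            }
        }
    }
}
```

With the fixed template both workspaces end up with the same prefix and therefore share tables; with the variable template each workspace gets its own.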


regards
 marcel



--
Marcel Reutegger
Day Management AG
Barfuesserplatz 6, 4001 Basel Switzerland

[EMAIL PROTECTED]
www.day.com

T 41 61 226 98 98
F 41 61 226 98 97



Re: ItemNotFoundException while switching between workspaces

2006-08-29 Thread Marcel Reutegger

quipere wrote:

See
http://www.nabble.com/problem-retrieving-nodes-from-different-workspaces-tf2177041.html
It is about the same problem; it doesn't throw ItemNotFoundException but returns
unexpected nodes. I assume this is because the example code lacks an orderable
node type definition.


Can you please check your repository.xml and see whether it has the same
configuration issue as the other 'workspace test'?


regards
 marcel


[jira] Assigned: (JCR-550) ObservationManagerFactory) -

2006-08-29 Thread Marcel Reutegger (JIRA)
OutOfMemoryError when re-indexing the repository

 [ http://issues.apache.org/jira/browse/JCR-550?page=all ]

Marcel Reutegger reassigned JCR-550:


Assignee: Marcel Reutegger





[jira] Commented: (JCR-550) ObservationManagerFactory) -

2006-08-29 Thread Marcel Reutegger (JIRA)
OutOfMemoryError when re-indexing the repository

[ 
http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12431236 ] 

Marcel Reutegger commented on JCR-550:
--

Your log files seem to indicate that some of your content is corrupt:

Caused by: java.lang.IllegalArgumentException: invalid QName literal
at org.apache.jackrabbit.name.QName.valueOf(QName.java:618)
at org.apache.jackrabbit.core.state.util.Serializer.deserialize(Serializer.java:124)
at org.apache.jackrabbit.core.state.obj.ObjectPersistenceManager.load(ObjectPersistenceManager.java:206)
... 61 more


Please note that using the ObjectPersistenceManager on a production system is 
not recommended because it is not transactional. You should consider using 
DerbyPersistenceManager as your version storage.





[jira] Commented: (JCR-482) DocViewSaxEventGenerator may generate non-NS-wellformed XML

2006-08-29 Thread Julian Reschke (JIRA)
[ 
http://issues.apache.org/jira/browse/JCR-482?page=comments#action_12431254 ] 

Julian Reschke commented on JCR-482:


Related to this, ExportDocViewTest.compareNamespaces() makes the assumption 
that *all* registered namespaces need to be serialized in the root element (and 
refers to 6.4.2.1 as justification). However, 6.4.2.1 only talks about the 
relevant declarations.

In any case, both the requirement in the spec and the test case should be 
relaxed to permit any serialization that produces a valid XML document: it 
should be left to the implementation when and where to include namespace 
declarations, as long as the resulting document is namespace-wellformed.


 DocViewSaxEventGenerator may generate non-NS-wellformed XML
 ---

 Key: JCR-482
 URL: http://issues.apache.org/jira/browse/JCR-482
 Project: Jackrabbit
  Issue Type: Bug
  Components: xml
Affects Versions: 0.9, 1.0, 1.0.1
 Environment: n/a
Reporter: Julian Reschke
 Assigned To: Jukka Zitting
Priority: Minor
 Fix For: 1.1

 Attachments: JIRA-482.diff.txt


 The XML serialization code relies on the fact that all required prefix-to-uri 
 mappings are known beforehand (actually, when serializing the root node). So 
 there's an assumption that the permanent namespace registry will never change 
 during serialization, which may be incorrect when another client adds 
 namespace registrations while the XML export is in progress.
 To fix this, addNamespacePrefixes should ensure that namespace declarations 
 have been written for all prefixes used on the current node (node name + 
 properties), potentially going back to the namespace resolver when needed.
 (Should there be consensus for that change I'm happy to give it a try)
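
 The proposed approach can be sketched in Java as follows. This is only an illustration under stated assumptions, not the DocViewSaxEventGenerator code or the attached patch: the class NamespaceScopeDemo and its methods are hypothetical, and a plain Map stands in for the namespace resolver. For each node it computes the declarations that still have to be written, consulting the (possibly changed) registry at that moment rather than only at the root.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of Jackrabbit.
final class NamespaceScopeDemo {

    // Declarations written by the currently open elements, innermost first.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    // Stand-in for a namespace resolver whose content may change during export.
    private final Map<String, String> registry;

    NamespaceScopeDemo(Map<String, String> registry) {
        this.registry = registry;
    }

    // Opens an element and returns the prefix-to-URI declarations that must be
    // written on it: every used prefix that no open ancestor has declared yet.
    Map<String, String> startElement(List<String> usedPrefixes) {
        Map<String, String> needed = new LinkedHashMap<>();
        for (String prefix : usedPrefixes) {
            if (!isDeclared(prefix) && !needed.containsKey(prefix)) {
                String uri = registry.get(prefix); // looked up when needed
                if (uri == null) {
                    throw new IllegalStateException("Unknown prefix: " + prefix);
                }
                needed.put(prefix, uri);
            }
        }
        scopes.push(needed);
        return needed;
    }

    // Closes the innermost element; its declarations go out of scope.
    void endElement() {
        scopes.pop();
    }

    private boolean isDeclared(String prefix) {
        for (Map<String, String> scope : scopes) {
            if (scope.containsKey(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> registry = new HashMap<>();
        registry.put("jcr", "http://www.jcp.org/jcr/1.0");
        NamespaceScopeDemo generator = new NamespaceScopeDemo(registry);

        System.out.println("root:    " + generator.startElement(Arrays.asList("jcr")));
        // Another client registers a namespace while the export is running.
        registry.put("ex", "http://example.com/ns");
        System.out.println("child:   " + generator.startElement(Arrays.asList("jcr", "ex")));
        generator.endElement();
        System.out.println("sibling: " + generator.startElement(Arrays.asList("ex")));
    }
}
```

 In this sketch a prefix registered after the root element was written is still declared on the first node that uses it, and declared again on a sibling once the first declaration has gone out of scope.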





Backup Tool Refactored

2006-08-29 Thread Nicolas

Hi,

The GSoC is over and the backup tool needs a little refactoring before
being committed (see past threads). Here are the changes I plan to
implement.

- Add methods to import/export the node version histories to VersionManager
and implement them in its implementing classes.
- Subclass PropInfo to avoid writing a custom method in the original class.
- Fix trailing spaces, comments, and Checkstyle issues in all new classes.
- Check everything and send the patches.

The changes in the core would be limited to a new class,
NodeVersionHistoriesUpdatableStateManager, in
org.apache.jackrabbit.core.state, and to VersionManager and its various
implementations.

Those changes would be filed in a new JIRA issue as discussed previously
(for administrative reasons with Google).

BR,
Nico
my blog! http://www.deviant-abstraction.net !!