Re: [VOTE] Release Apache Jackrabbit 2.0.3
+1
Re: [VOTE] Release Apache Jackrabbit 2.1.2
+1 Regards, Thomas
Re: [jr3] Clustering: Scalable Writes / Asynchronous Change Merging
Hi, "Network delay .. is faster than the delay of a disk" I wrote that the network is the new disk (in terms of bottleneck, in terms of performance problems). Network delay may be a bit shorter now than disk access. But it's *still* a huge bottleneck (compared to in-memory operations), especially if cluster nodes are far apart. If you have the complete repository in memory, but each operation needs network access, then the network is the bottleneck. And I don't want that to be the bottleneck. I know I repeat myself, but it looks like this was not clear. "importance of leveraging in-memory storage" In-memory storage *alone* is fast. But if it is used in combination with the current clustering architecture, then writes will not scale. They will just be a bit faster (until you reach the network delay wall). "What is then the reason for asynchronous change merging, if not for performance?" Where did I say it's not about performance? As I already wrote: it's about how to manage cluster nodes that are relatively far apart. "observation listeners would not always get notified in the same order" Regular observation listeners are not necessarily the problem; we could just delay them until the pre-defined sync delay (until things are in sync). The problem is *synchronous* event listeners (as I already wrote). The JCR API doesn't actually define them, as far as I know. Regards, Thomas
Re: [jr3] Clustering: Scalable Writes / Asynchronous Change Merging
Hi, "See section 7 Vector Time. Also see [1] from slide 14 onwards for a more approachable reference. [1] http://www.cambridge.org/resources/0521876346/6334_Chapter3.pdf" Thanks! From what I have read so far, it sounds like my idea is called Time Warp / Virtual Time. On pages 10 and 11 there is the notion of Total Ordering: "The main problem in totally ordering events is that two or more events at different processes may have identical timestamp. - A tie-breaking mechanism is needed to order such events. - Process identifiers are linearly ordered and tie among events with identical scalar timestamp is broken on the basis of their process identifiers." This is what I meant with "+ clusterNodeId". Vector time and Matrix time: I think they would need too much memory. If the dimension of vector clocks is the number of cluster nodes, then the number of dimensions would change whenever you add a cluster node. Time Warp matches my suggestion: page 33 says "Time Warp relies on the general lookahead-rollback mechanism where each process executes without regard to other processes having synchronization conflicts." It sounds like my proposal (I especially like the term Time Warp). "If a conflict is discovered, the offending processes are rolled back to the time just before the conflict and executed forward along the revised path. Virtual time is implemented [as] a collection of several loosely synchronized local virtual clocks." Regards, Thomas
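The tie-breaking rule quoted above (order by scalar timestamp, break ties with the process / cluster node id) can be sketched as a comparator. This is an illustrative sketch, not actual Jackrabbit code; the names are made up:

```java
import java.util.Comparator;

// Sketch: totally ordering change sets by scalar timestamp, breaking
// ties with the cluster node id ("timestamp + clusterNodeId").
final class ChangeSetId {
    final long timestamp;    // e.g. time since 1970
    final int clusterNodeId; // unique per cluster node

    ChangeSetId(long timestamp, int clusterNodeId) {
        this.timestamp = timestamp;
        this.clusterNodeId = clusterNodeId;
    }

    // Total order: timestamp first, cluster node id as tie-breaker.
    static final Comparator<ChangeSetId> TOTAL_ORDER =
        Comparator.comparingLong((ChangeSetId c) -> c.timestamp)
                  .thenComparingInt(c -> c.clusterNodeId);
}
```

With this order, two events with the same timestamp on different cluster nodes still compare deterministically, which is all the tie-break is needed for.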
Re: [jr3] Clustering: Scalable Writes / Asynchronous Change Merging
Hi, Let's discuss partitioning / sharding in another thread. Asynchronous change merging is not about how to manage huge repositories (for that you need partitioning / sharding); it's about how to manage cluster nodes that are relatively far apart. I'm not sure if this is the default use case for Jackrabbit. Traditionally, asynchronous change merging (synchronizing) is only used if the subsystems are offline for some time, or if there is a noticeable networking delay between them, for example if cluster nodes are in different countries. But I don't want the network to become the new disk (in terms of bottleneck, in terms of performance problems). Networking delay may be the bottleneck even if cluster nodes are in the same room, especially when you keep the whole repository in memory, or use SSDs. Also, computers get more and more cores, and at some point message passing is more efficient than locking. Asynchronous operation is bad for reservation systems, banking applications, or if you can't guarantee sticky sessions. There you need synchronous operations, or at least locking. If you want to support both cases in the same repository, you could use virtual repositories (which are also good for partitioning / sharding). My proposal is for Jackrabbit 3 only. In the extreme case, the asynchronous change merger might very well be a separate thread and use little more than the JCR API. Therefore asynchronous change merging should have very little or no effect on performance if it is not used. On the other hand, replication should likely be in the persistence layer. I think the persistence API should stay synchronous, as it is now. "We could also use normal UUIDs or SHA1 hashes of the serialized change sets" That's an option, but lookup by node id and time must be efficient. UUIDs / secure hashes are not that space efficient (though that might not be the problem).
We can see from Jackrabbit that indexing random data (UUIDs) is extremely bad for cache locality and index efficiency, but if indexing is done by time then that's also not a problem. The algorithm I propose is sensitive to configuration changes, but you only need to change the formula when going from max 256 cluster nodes to more than 256 cluster nodes (for example). And you need a unique cluster id. But I don't think that's the problem. "we could leverage a virtual time algorithm" I read the paper, but I don't actually understand how to implement it. "We'll probably need some mechanism for making the content of conflicting changes available for clients to review even if the merge algorithm chooses to discard them." If we leave it up to the client to decide what to do, then things might more easily run out of sync. But in any case there might be problems, for example synchronous event listeners might get a different order of events in different cluster nodes (possibly even different events). It would probably make sense to add some kind of offline comparison / sync feature, similar to rsync. Actually that could be useful even for Jackrabbit 2. Regards, Thomas
[jr3] Clustering: Scalable Writes / Asynchronous Change Merging
The current Jackrabbit clustering doesn't scale well for writes because all cluster nodes use the same persistent storage. Even if the persistent storage is clustered, the cluster journal relies on changes being immediately visible in all nodes. That means Jackrabbit clustering can scale well for reads, but it can't scale well for writes. This is a property Jackrabbit clustering shares with most clustering solutions for relational databases. Still, it would make sense to solve this problem for Jackrabbit 3.

== Current Jackrabbit Clustering ==

[Cluster Node 1] --+
                   +-- [Shared Storage]
[Cluster Node 2] --+

I propose a different architecture for Jackrabbit 3:

== Jackrabbit 3 Clustering ==

[Cluster Node 1] -- [Local Storage]
[Cluster Node 2] -- [Local Storage]

Please note that shared storage is still supported for things like the data store, but it is no longer required (or even supported) for the persistent storage (currently called PersistenceManager). Instead, the cluster nodes should merge each other's changes asynchronously (except for operations like JCR locking, plus potentially other operations that are not that common; maybe even node move). By asynchronously I mean usually within a second or so, but potentially minutes later, depending on configuration, latency between cluster nodes, and possibly load. Similar to NoSQL systems.

== Unique Change Set Ids ==

For my idea to work, we need globally unique change set ids. Each change set is stored in the event journal, and can be retrieved later and sent to other cluster nodes. I suggest that events are grouped into change sets so that all events within the same session.save() operation have the same change set id. We could also call it a transaction id (I don't mind). Change set ids need to be unique across all cluster nodes.
That means the change set id could be:

changeSetId = nanosecondsSince1970 * totalClusterNodes + clusterNodeId

Let's say you currently have 2 cluster nodes and expect to add a few more later (up to 10); then you could use the formula:

changeSetId = nanosecondsSince1970 * 10 + clusterNodeId

To support more than 10 cluster nodes the formula would need to be changed (that could be done at runtime). It doesn't necessarily need to be this formula, but the change set id should represent the time when the change occurred, and it should be unique.

== How to Merge Changes ==

Changes need to be merged so that all cluster nodes end up with the same data (you could call this eventually consistent). New changes are not problematic and can be applied directly. This includes local changes of course, because the change set id of local changes is always newer than the last change. Changes with change set ids in the future are delayed. Cluster nodes should have reasonably synchronized clocks (they don't need to be completely exact, but they should be reasonably accurate, so that such delayed events are not that common). So the only tricky case is changes that happened in the past, in another cluster node, if the same data was changed in this cluster node (or another cluster node) afterwards (afterwards means with a higher change set id). To find out that a change happened in the past, each node needs to at least know the change set id of the last change. There are multiple solutions:

== Solution A: Node Granularity, Ignore Old Changes ==

Here, each node only needs to know when it was changed the last time. If the change set id is older than that, changes to its properties and child node list are ignored. That means, if two cluster nodes concurrently change data in a node, the newer change wins, and the older change is lost. This is a bit problematic, for example when concurrently adding child nodes: only the added child node of the newer change survives, which is probably unexpected.
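The change set id formula from the "Unique Change Set Ids" section above (changeSetId = time * totalClusterNodes + clusterNodeId) can be sketched as follows. Names are illustrative, not actual jr3 code; note that nanoseconds since 1970 multiplied by even a small cluster size overflows a signed 64-bit long, so this sketch uses milliseconds:

```java
// Sketch of the proposed change set id scheme (illustrative only).
// changeSetId = timeSince1970 * maxClusterNodes + clusterNodeId:
// unique across cluster nodes (distinct remainders modulo
// maxClusterNodes) and roughly ordered by wall-clock time.
final class ChangeSetIdFactory {
    private final long maxClusterNodes; // e.g. 10
    private final long clusterNodeId;   // 0 .. maxClusterNodes - 1
    private long lastId;

    ChangeSetIdFactory(long maxClusterNodes, long clusterNodeId) {
        this.maxClusterNodes = maxClusterNodes;
        this.clusterNodeId = clusterNodeId;
    }

    synchronized long newChangeSetId() {
        long id = System.currentTimeMillis() * maxClusterNodes + clusterNodeId;
        if (id <= lastId) {
            // clock did not advance: keep ids strictly increasing locally,
            // preserving the remainder (the cluster node id)
            id = lastId + maxClusterNodes;
        }
        lastId = id;
        return id;
    }

    // the cluster node that generated a given change set id
    static long clusterNodeOf(long changeSetId, long maxClusterNodes) {
        return changeSetId % maxClusterNodes;
    }
}
```

Changing the formula to support more cluster nodes (as described above) would mean constructing a new factory with a larger maxClusterNodes.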
== Solution B: Merge Old Changes ==

Here, we need an efficient way to load the list of changes (events) to a node since a certain time. When merging a change, the old versions of the node need to be loaded or re-constructed, and then the old change needs to be applied as if it had already happened before the newer change. Let's say we know about the two versions:

v1: node a; child nodes b, c, d; properties x=1, y=2
event t9: add child node e, set property x=2, remove property y
v9: node a; child nodes b, c, d, e; properties x=2

The change to merge happened in the past:

event t3: add child node f, remove child node b, set property y=3, remove property x, set property z=1

Now the result would be:

v9(new): node a; child nodes c, d, e, f; properties x=2, z=1

There are other ways to merge the changes of course (for example, only merge added / removed child nodes). I think there are some tricky problems, but I think it's relatively easy to ensure the algorithm is correct using a few randomized test cases. No matter what the merge rules are, they would need to be constructed so that at the end of the day, each cluster node ends up with the exact same data.
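As a sanity check, the example above can be replayed mechanically: starting from v1 and applying the change sets in change set id order (t3 before t9) yields exactly v9(new). A minimal illustration of that replay, not actual jr3 code:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: replaying the change sets from the example in
// change set id order (t3 before t9), starting from v1, yields v9(new).
final class NodeState {
    final TreeSet<String> childNodes = new TreeSet<>();
    final Map<String, Integer> properties = new TreeMap<>();
}

final class MergeExample {
    static NodeState replay() {
        // v1: node a; child nodes b, c, d; properties x=1, y=2
        NodeState a = new NodeState();
        a.childNodes.add("b"); a.childNodes.add("c"); a.childNodes.add("d");
        a.properties.put("x", 1); a.properties.put("y", 2);

        // t3 (the older change, merged late): add child node f,
        // remove child node b, set y=3, remove x, set z=1
        a.childNodes.add("f"); a.childNodes.remove("b");
        a.properties.put("y", 3); a.properties.remove("x");
        a.properties.put("z", 1);

        // t9 (the newer change): add child node e, set x=2, remove y
        a.childNodes.add("e");
        a.properties.put("x", 2); a.properties.remove("y");

        // result, v9(new): child nodes c, d, e, f; properties x=2, z=1
        return a;
    }
}
```

The real implementation would of course apply the same idea generically (re-construct the state before the older change, then replay in id order) rather than hard-coding the events.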
Re: Concurrent Write issues with Jackrabbit
Hi, Are you sure the problem is concurrency and not performance? Are you sure that the persistence manager you use does support higher write throughput? What persistence manager do you use, and what is the write throughput you see, and what do you need? Regards, Thomas
Re: Concurrent Write issues with Jackrabbit
Hi, Do you use Day CRX / CQ? If yes, I suggest to use the Day support. Regards, Thomas
Re: Concurrent Write issues with Jackrabbit
Hi, "What I am getting here is that writes will be serialized due to a single write lock" For scalability, you also need scalable hardware. Just using multiple threads will not improve performance if all the data is then stored on the same disk. Regards, Thomas
Re: Diagram of Jackrabbit remoting options
Hi, It looks good to me too (but I'm not an expert in this area). "i don't fully understand the distinction between 'Component' and 'Shared Code', but apart from that, looks good to me." I also don't understand the difference. Regards, Thomas
Re: bit by JCR-1558 - what are valid workarounds?
Hi, I am not completely sure if it solves the problem, but could you try sharing the repository-level FileSystem between cluster nodes? That means, configure the first FileSystem entry in the repository.xml (the one directly within the Repository element, not the one within the Workspace element) in each cluster node so that it points to the same file system. I think the clustering documentation at http://wiki.apache.org/jackrabbit/Clustering is incorrect here. But I didn't try it myself, so I'm not completely sure it will work. If it doesn't work, please post the Jackrabbit version you use, the configuration, and the file listing (of both the shared and the local files). If it does work, please tell us as well (I will then update the documentation). Regards, Thomas
Re: [jr3] Security through obscurity
"well, i don't ;) i don't think that a proper oo design will necessarily be overly complex." Having everything convoluted just for the sake of avoiding public implementation methods is completely unrelated to proper OO design. It may be your understanding of proper OO design, but it's definitely not mine. Anyway, let's see how it goes. Could those who suggest getting rid of the public implementation methods please submit a patch for the Jackrabbit 3 prototype? We can discuss from there. Regards, Thomas
Re: [jr3] Security through obscurity
Hi, I'm sorry about the tone of my mails. I just want to avoid that we run into the trap of making Jackrabbit 3 much too complicated and complex for the sake of being modular. I agree there shouldn't be many public implementation methods, but what I don't want to do is add additional glue classes to avoid them. That would be adding complexity to conceal bad design, and then calling it good design. I would rather have some public methods, if the overall design is simpler, than that added indirection. This is not just about public methods. It's also about splitting Jackrabbit 3 into multiple projects. In my view, we should keep it one project, and one jar file, at least for now. I believe there are far too many projects and jar files in Jackrabbit 2. Regards, Thomas
[jr3] Security through obscurity
Do we want to have public methods in the Jackrabbit 3 implementation that can possibly be misused (if somebody casts to an implementation class)? See the discussion at https://issues.apache.org/jira/browse/JCR-2640 The advantage of not having such public methods: people can't cast to implementation classes and call them. Is this really a problem? People should use the JCR API; they are not supposed to cast to implementation classes. The disadvantages are: it massively complicates developing Jackrabbit 3. It complicates understanding the source code. It potentially reduces performance. It needs more memory (potentially a lot, for example for cached objects such as NodeImpl). It's probably not always possible to follow this rule. And it doesn't solve the problem (people can still modify the source code of Jackrabbit, or they can call setAccessible(true)). Wikipedia currently defines security through obscurity as follows: "a principle in security engineering, which attempts to use secrecy (of design, implementation, etc.) to provide security." In my view this is such a case. Examples of embedded repositories or databases a) that need more than one package and use this no-public-methods approach: I don't know any. Are there any such projects? b) that do have public implementation methods: all open source Java databases I know (Apache Derby, HSQLDB, H2); Hibernate (well, probably most projects); and I'm sure there are many cases in the Sun JDK and JRE, for example the xml packages, javac, javadoc, almost everywhere where interfaces and implementation are distinct and multiple packages are used.
Re: [jr3] Security through obscurity
"Not exposing implementation details through public API is a basic OO design principle. i think with a proper design and packaging, this will not be a problem." I don't think we are talking about the same thing here. Proper OO is using interfaces, and not casting to implementation classes. For example, constructors: those need to be public if you want to construct a new object from a different package. How do you create an org.apache.jackrabbit.j3.nodetype.NodeTypeManagerImpl from a different package, say, org.apache.jackrabbit.j3.SessionImpl, without a public constructor or public method? Maybe there is a way to do that. For you it may even be a proper design, modular or whatever. Like what Jukka just made (adding an indirection class). For me, that's plain confusing, overly complex, and bad (it's security through obscurity). The direct way (having a public constructor) is the best solution. That was just an example. There are many other cases, for example org.apache.jackrabbit.j3.NodeImpl.doAddLock(..). Regards, Thomas
Re: [jr3] Security through obscurity
Hi, If you think it's proper OO and such, could you please provide *one* example of a larger project that does *not* have public implementation methods? Regards, Thomas
Re: [jr3] Security through obscurity
Hi, I completely agree with Justin. "Package-protected" I think it does have its use, but for more complex products it's just not enough. Somewhat related: in Java 1.0.2 you could use private protected: http://www.jguru.com/faq/view.jsp?EID=15576 "Security" For real security, you either need remoting, or a SecurityManager. Regards, Thomas
Re: [jr3] Jackrabbit 3 in trunk
Hi, "objectives of the jr3 project is to deliver better performance than jr2 on scalability, concurrency, latency, etc., it would be helpful to have an automated stress test framework" That's true. There are already a few such test cases, but more are required. Patches are welcome of course :-) However, I fear most people will ignore this prototype unless it is actually usable. That's why I think adding features is important as well. This doesn't mean the prototype needs to pass the TCK, but at least the basic operations should work as expected. "It's easier to fix deep architectural issues before a bunch of code has been written around the architecture, so the priority should be to have code that breaks the architecture (highlighting the weak points) before having code that uses the architecture (highlighting the strong points). In other words, the architecture needs to be correct before adding features." That's true. Probably clustering should be added before versioning, because clustering has a higher impact on the architecture. Regards, Thomas
Re: [jr3] Jackrabbit 3 in trunk
Hi, My suggestion (admittedly as a bystander) would be that the sooner people can start breaking it, the sooner it can get fixed, so prioritize activities based on first getting it to the point of breakability (rather than usability), and then merge. Sorry, I don't understand what you mean exactly... Could you give an example? Regards, Thomas
Re: FYI: Moving session-related classes to o.a.j.core.session
Hi, I'm not sure if this will help more than it will complicate things. Disadvantages: - Isn't almost every class in o.a.j.core at least somewhat session related? - If you move classes to other packages, you will have to make many methods public. Instead of moving session-related classes to a separate package, what about moving unrelated classes to different packages? For example TestContentLoader (test), RepositoryCopier (utilities), SearchManager (search), NodeTypeInstanceHandler (nodetype), RepositoryChecker (persistence), UserPerWorkspaceSecurityManager (security), DefaultSecurityManager (security), ItemValidator (nodetype). Regards, Thomas On Mon, May 17, 2010 at 10:43 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, As a part of my work on JCR-890, I'm planning to move most of the session-related classes from o.a.j.core to a new o.a.j.core.session package. This will make it easier to review and control related dependencies and code paths, and to ultimately guard them against access from concurrent threads. As the first step I'm simply moving the relevant classes and making the minor dependency changes where needed, so the functional risk should be low. However, the moves will likely invalidate many other pending jackrabbit-core changes, so please let me know if you have pending changes that I should wait for before I move these classes. Unless there's a need to wait, I'm planning to commit the changes in the afternoon today. BR, Jukka Zitting
Re: FYI: Moving session-related classes to o.a.j.core.session
Hi, "These unrelated classes are mostly things like RepositoryImpl, TransientRepository, RepositoryCopier, etc. to which many external codebases are linking, so we can't move them." SessionImpl is used in my applications as well. "RepositoryImpl, TransientRepository" I don't think those should be, or need to be, moved. Regards, Thomas
Re: FYI: Moving session-related classes to o.a.j.core.session
Hi, As far as I understand, you want to move the classes so we can add checkstyle / PMD constraints, and more easily ensure every method call from an external class is synchronized. I think that's fine. Having the 'proxy' classes sounds like a solution for the backward compatibility concerns (not the perfect solution, but a good solution for Jackrabbit 2). For Jackrabbit 3 I hope people will not directly cast to implementation classes any longer. Regards, Thomas
Re: [jr3] Jackrabbit 3 in trunk
Hi, So far the prototype is not yet usable, meaning too many features are missing, tools are missing, documentation is missing. I guess this needs to be fixed first, so that it becomes somewhat usable (even with limited functionality). We also need to find out how / where exactly we want to add it in the trunk. Regards, Thomas
Re: Moving backwards compatibility tests to trunk
Hi, An alternative is: download the old Jackrabbit jar files when running the tests (download the jar files dynamically when required, for example to the target directory), and then load them using a custom class loader, or create the old repository in a separate process. While this is currently not required, it would be more flexible (it can support very large repositories, and comparing against many versions of Jackrabbit). The same approach can be used by migration tools (migrate a repository from any old version of Jackrabbit to a new version). It's just an idea (I currently don't have plans to implement this myself). But I do have some source code: http://code.google.com/p/h2database/source/browse/trunk/h2/src/tools/org/h2/dev/util/Migrate.java - this standalone class migrates a database from an old version to a new version. Regards, Thomas
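The custom class loader idea can be sketched with a plain URLClassLoader whose parent is null, so that the old Jackrabbit classes stay isolated from the new ones. The class and the jar location in the usage note are hypothetical:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch: load an old Jackrabbit version from downloaded jar files in
// an isolated class loader, so old and new classes do not clash.
final class OldVersionLoader {
    static ClassLoader forJars(URL... jars) {
        // parent null: only the bootstrap classes (java.*) are shared;
        // everything else must be resolved from the given jars
        return new URLClassLoader(jars, null);
    }
}
```

Usage would look something like forJars(new URL("file:target/jackrabbit-core-1.6.0.jar")) followed by loadClass and reflective calls into the old repository; the jar path is only an example.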
[jr3] Additional Jackrabbit interfaces in org.apache.jackrabbit.api
There are a few interfaces that might be interesting for all users of Jackrabbit. Those should be in the api package (not only for OSGi). The most important is probably: org.apache.jackrabbit.core.observation.SynchronousEventListener What about 'officially' supporting it, and moving it to org.apache.jackrabbit.api? For example to org.apache.jackrabbit.api.observation.SynchronousEventListener Another related interface that currently doesn't exist is: org.apache.jackrabbit.api.observation.ClusterEvent with a method isExternal() so that you can find out if an event originated from this or another cluster node (because in some cases you only want to handle an event in one cluster node, not in all of them). Maybe we would additionally need ClusterAwareEventListener and ClusterAwareEventJournal (to avoid having to cast). What do you think? Other interfaces? This is mainly for Jackrabbit 3, but we might start supporting it within Jackrabbit 2.x as well if needed. Regards, Thomas
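To make the proposal concrete, the two suggested interfaces could look roughly like this. The names follow the mail; the shape is hypothetical, and in Jackrabbit the event interface would extend javax.jcr.observation.Event (omitted here to keep the sketch self-contained):

```java
// Hypothetical sketch of the proposed org.apache.jackrabbit.api.observation
// interfaces; not actual Jackrabbit code.

interface ClusterEvent /* extends javax.jcr.observation.Event */ {
    // true if the event originated in another cluster node,
    // so that some events can be handled in only one node
    boolean isExternal();
}

interface ClusterAwareEventListener {
    // receives cluster events directly, avoiding the need to cast
    void onClusterEvent(ClusterEvent event);
}
```

A listener that should act only in the originating node would then simply skip events where isExternal() returns true.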
Re: [jr3] Additional Jackrabbit interfaces in org.apache.jackrabbit.api
Hi, org.apache.jackrabbit.api.observation.JackrabbitEvent ? You are right, I didn't see that... sorry... JackrabbitEvent already has isExternal, so forget about ClusterEvent. org.apache.jackrabbit.api.observation.ExtendedEvent I can't find this one. Regards, Thomas
Re: [jr3] Node Identifiers / Corresponding Nodes
Hi, I'm wondering if Jackrabbit 3 should support storage backends that use the path as the identifier. It's probably possible (with some limitations), but I'm not sure if it's necessary. I'm sure it's inefficient, but sometimes that's not a problem. What do others think? If we want to support it, we should decide that early on. Regards, Thomas
Re: [jr3] Node Identifiers / Corresponding Nodes
Hi, I agree, we should concentrate on a few backends. I think there are at least two: - database (what we have now, default) - in-memory (for testing only) Still, I will check what it takes to support path-based node ids. Currently I think it will only take one additional parameter in one method (StorageSession.newNodeId(..., Val relPath)), but I'm not sure. Let's see. Regards, Thomas
[jr3] Node Identifiers / Corresponding Nodes
== Node Identifier Format ==

Jackrabbit node ids are currently UUIDs. For Jackrabbit 3, I think that embedded storage mechanisms should use a long sequence instead. Advantages of sequences: faster to generate (nodeId = nextId++); faster index lookup (nodes generated at around the same time have similar ids, which improves index efficiency); needs less space (especially when using a variable size format; see [1]). Advantage of UUIDs: allows distributed creation of nodes. That's why the Jackrabbit 3 data format should still support UUIDs as node ids: for cloud storage mechanisms.

== JCR Node Identifier versus Internal Unique Node ID ==

The JCR API requires that corresponding nodes of different workspaces have the same JCR identifier. The current Jackrabbit stores each workspace separately, so that's not a problem. With Jackrabbit 3, I would like to combine the storage of all workspaces. The problem is that the JCR node identifier can then no longer be equal to the internal unique node id. For efficient storage, the internal node id should be the combination of the workspace id and the JCR node identifier. One solution is: long internalUniqueNodeId = (workspaceId << 40) + jcrNodeIdentifier. The problem is: node ids in workspaces other than workspace #0 need quite a lot of space when using a variable size format. My proposal is: store the workspace id at the end of the JCR node identifier, using a variable size format (see [1]). I think in most cases there is only 1 workspace (workspace #0). The second important case is fewer than 16 workspaces.
I suggest to support the following 4 cases:

* workspace #0: the node ids end with bit 0: internalUniqueNodeId = (jcrNodeIdentifier << 1)
* workspaces #1-#15: node ids end with the bits 01: (jcrNodeIdentifier << 6) + (workspaceId << 2) + 1
* workspaces #16-#2047: node ids end with the bits 011: (jcrNodeIdentifier << 14) + (workspaceId << 3) + 3
* workspaces #2048-#268'435'455: node ids end with the bits 0111: (jcrNodeIdentifier << 32) + (workspaceId << 4) + 7
* workspaces beyond #268'435'455 are not supported.

What do you think, do those constants make sense?

[1] The variable size int / long formats are used in various open source projects such as Apache Lucene, SQLite, the H2 Database Engine, and Google Protocol Buffers. They are somewhat similar to UTF-8 encoding. See also: http://code.google.com/p/h2database/source/browse/trunk/h2/src/main/org/h2/store/Data.java#989 http://en.wikipedia.org/wiki/Golomb_coding

== Node Without ID ==

The Jackrabbit 3 data format should support storing nodes embedded within the parent node. The advantage: such embedded nodes would be stored next to each other, possibly improving read performance, and maybe reducing storage space (both need to be tested). The identifier of such embedded nodes would be unique, but not stable. Regards, Thomas
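The four cases from the list above can be sketched as encode / decode functions. This is illustrative only; the real format would additionally apply the variable size long encoding from [1], while plain longs are used here:

```java
// Sketch of the proposed internal node id layout: the workspace id and a
// short tag are stored in the low bits, the JCR node identifier above them.
final class InternalNodeId {

    static long encode(long jcrNodeId, long workspaceId) {
        if (workspaceId == 0) {
            return jcrNodeId << 1;                              // ends with 0
        } else if (workspaceId <= 15) {
            return (jcrNodeId << 6) + (workspaceId << 2) + 1;   // ends with 01
        } else if (workspaceId <= 2047) {
            return (jcrNodeId << 14) + (workspaceId << 3) + 3;  // ends with 011
        } else if (workspaceId <= 268_435_455) {
            return (jcrNodeId << 32) + (workspaceId << 4) + 7;  // ends with 0111
        }
        throw new IllegalArgumentException("workspaceId too large: " + workspaceId);
    }

    static long workspaceId(long id) {
        if ((id & 1) == 0) return 0;                  // tag 0
        if ((id & 3) == 1) return (id >>> 2) & 0xf;   // tag 01, 4-bit workspace id
        if ((id & 7) == 3) return (id >>> 3) & 0x7ff; // tag 011, 11-bit workspace id
        return (id >>> 4) & 0xfffffff;                // tag 0111, 28-bit workspace id
    }

    static long jcrNodeId(long id) {
        if ((id & 1) == 0) return id >>> 1;
        if ((id & 3) == 1) return id >>> 6;
        if ((id & 7) == 3) return id >>> 14;
        return id >>> 32;
    }
}
```

Since most repositories have only workspace #0, most ids pay only one extra bit, which is the point of the scheme.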
Re: Jackrabbit 1.6.0 Write Performance
Hi, "- The jackrabbit repository is accessed from our app using RMI." Can you use the repository in embedded mode? That would help a lot. "embedded Derby database" "We've tested using postgres" I would test the H2 database if you have time. Regards, Thomas
Re: Jackrabbit 1.6.0 Write Performance
Hi, With regard to concurrency, are there any plans for jackrabbit to support concurrency out of the box? If you use one session for each thread then it should already work. It's a bug if it doesn't. In any case I would use one session per thread, no matter if a future version of Jackrabbit supports it or not. Regards, Thomas
Re: clustered environment, 2 different jvms, TransientFileFactory, storing file blobs in db
Hi, Stefan is right, File.createTempFile() doesn't generate colliding files. However, there is a potential problem with the TransientFileFactory. Consider the following case:

- The file bin-1.tmp is created (BLOBInTempFile line 51).
- The TransientFileFactory adds a PhantomReference A to its queue.
- BLOBInTempFile.delete() or dispose() is called, and the file bin-1.tmp is deleted.
- A new file, also called bin-1.tmp, is created (BLOBInTempFile line 51). That's possible because File.createTempFile can re-use file names.
- The TransientFileFactory adds a second PhantomReference B to its queue, pointing to a different file with the same name.
- The first (only the first) BLOBInTempFile is no longer referenced.
- The TransientFileFactory.ReaperThread gets PhantomReference A and deletes the file. But the file is still used and referenced (B).

I'm not sure if this is what is happening in your case, but it is a potential problem. Could you log a bug? There are multiple ways to solve the problem. I think the best solution is to not use File.createTempFile() and instead use our own file name factory (with a random part and a counter part). Regards, Thomas
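The suggested file name factory (a random part plus a counter part) could look roughly like this. Illustrative only, not actual Jackrabbit code; because the counter never repeats a name within a factory, a deleted name is never handed out again, which avoids the phantom reference mix-up described above:

```java
import java.io.File;
import java.io.IOException;
import java.security.SecureRandom;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: generate temp file names ourselves instead of using
// File.createTempFile(), so file names are never re-used.
final class UniqueTempFileFactory {
    // random part: avoids collisions with files left by other processes
    private final String randomPart =
            Long.toHexString(new SecureRandom().nextLong());
    // counter part: guarantees uniqueness within this factory
    private final AtomicLong counter = new AtomicLong();
    private final File directory;

    UniqueTempFileFactory(File directory) {
        this.directory = directory;
    }

    File newTempFile() throws IOException {
        File f = new File(directory,
                "bin-" + randomPart + "-" + counter.incrementAndGet() + ".tmp");
        if (!f.createNewFile()) {
            throw new IOException("File already exists: " + f);
        }
        return f;
    }
}
```

With such a factory, the reaper thread can safely delete the file behind a dequeued phantom reference, because no live reference can point to a newer file with the same name.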
Re: [jr3] MicroKernel prototype
Hi, "it's too early IMO to judge whether a caching hierarchy manager is needed or not... IMO the only statement that can be made based on your comparison is that if the prototype with very limited functionality were slower than jackrabbit with a fully implemented feature set, the prototype's architecture would probably need to be reconsidered ;)" I agree. "- security - locking - scalability (number of concurrent sessions and repository size) - transactions" OK, I will then try to implement (prototype) those features now. "very flat hierarchies" Yes. We do want to solve that, it will affect the architecture, and we don't have much experience yet with how to best solve it. So I guess it's also one of the features that should be implemented early. Regards, Thomas
Re: [jr3] MicroKernel prototype
Hi, "i doubt that the results of this comparison is any way significant." It was not supposed to be a fair comparison :-) Of course the prototype doesn't implement all features. For example, paths are parsed in a very simplistic way. I don't think the end result will be as fast as the prototype. Still, I do hope that the missing features will not slow down the code significantly if they are not used. And if they are used, the penalty shouldn't be too high. What is significant: the prototype is not slower than the full Jackrabbit, even without the CachingHierarchyManager. For me that's relatively important, because it would simplify the architecture. More tests are required to check if the current architecture works well even if there are millions of nodes and many concurrent sessions. And it's important to add more features of course. I'm wondering which are the *most* problematic features to verify the architecture:

- security
- orderable child nodes
- same name siblings
- locking
- transactions
- clustering
- observation
- workspaces
- node types
- large number of child nodes
- search
- correct path parsing and lookup
- multiple sessions

"cut some features to gain performance improvement." I'm not sure. What features could be cut? Regards, Thomas
Re: [jr3] MicroKernel prototype
Hi, I have some early performance test results: There is a test with 3 levels of child nodes (each node 20 children) (TestSimple.createReadNodes). With the JDBC storage and the H2 database, this is about 14 times faster than the Jackrabbit 2.0 trunk (0.2 seconds versus 2.9 seconds for Jackrabbit 2.0). This is after 3 test runs. The storage space usage is about 1/3 (2.8 MB for the prototype versus 9.5 MB for Jackrabbit 2.0). Regards, Thomas
[jr3] Store journal as nodes
Currently the journal (cluster journal and event journal) is stored using a separate storage mechanism. I think it should be stored using the 'normal' storage mechanism. Advantages: - Simplifies the architecture (especially for clustering) - Events and node data are in the same transaction, which improves reliability and performance Regards, Thomas
Re: [jr3] Store journal as nodes
Hi, (except logging Yes, I think SLF4J is fine and configuration, probably Some information needs to be available when the repository is constructed, or at the latest when logging in: what storage backend to use, and how to connect to the storage backend. The rest of the configuration (fulltext index configuration for example, workspace names, security, data store configuration, cluster configuration, node type registry) should be in the repository (as system nodes) in the normal case. This is to simplify the system and to make configuration changes transactional. There may be a way to override that (for example when constructing the repository object), but that should be the exception. I think it doesn't make sense to keep the XML configuration files. What do you mean by 'normal' storage mechanism? I mean the data should be stored in the same place as the node data. Unless we find it is a performance problem, I would try to store the events as node bundles of some kind (possibly multiple events plus regular nodes in the same bundle). For the micro kernel it could look exactly like a normal node. Is it nodes and properties? In that case I fear further performance issues in this area. If it does turn out to be a performance problem, we will change it of course. Regards, Thomas
Re: [jr3] Store journal as nodes
Hi, In case of the cluster db journal, the hostname of the db connection. The hostname of the database (if a database is used) and the database name need to be known when creating the repository object. Storing it in a 'repository.xml' file is possible, but it's just an unnecessary indirection. If you keep this information in the repository.xml file, where do you store the path of the repository.xml file? If the user name and password need to be protected (not stored as plain text), how do you do that? Using yet another indirection (JNDI)? I suggest to pass the database URL (or whatever storage you use) when creating the repository object. Example (using a helper method; just an example): RepositoryFactoryImpl.openRepository("jdbc:postgresql:repo", "user", "password"); If you want to use a repository.xml file (that only contains the database connection information) you can of course. But do you really need an XML file for the database URL, the user name, and the password? Especially if the user name and password are things that normally should not be stored in a file? Speaking about databases: do you know of a database where you need to store the location of the database files in an XML file? I guess there are some databases where you *can* do that, but I don't know any where you *have to*. Configuration should be editable without booting the repository. Why? Again, for a db store, if the db host changes after repository shutdown, we should be able to configure the repository to use a different db host. Like we can change in the current repository.xml. The current repository.xml file contains much more than just the database connection settings. It contains the search index configuration (or at least part of it), file system configuration, cluster configuration, data store configuration, security configuration, workspace configuration (and for some, the version store), etc. All that, except for the database connection settings, can be stored in the repository itself. Because it simplifies things.
It's a feature of some application servers to manage cluster configurations. I don't see a problem here. They can. I would prefer to leave the complexity out of the default standalone deployment. I like to keep things as simple as possible. The repository.xml and workspace.xml files are not required; they actually make things more complicated than necessary (especially, but not only, when clustering is used). Regards, Thomas
Re: [jr3] Synchronized sessions
Hi, consistency. I don't know of a relational database that allows you to violate referential integrity, unique key constraints, or check constraints - simply by using the same connection in multiple threads. A JCR repository should have some point to do the constraint checks as well. It should fail the operation if a conflict is found. The easiest way to achieve internal integrity is to synchronize on an object. Synchronizing on the session object is much easier, and costs much less, than 1) allowing internal state to get corrupted, 2) but then somehow detecting the corruption, 3) and then trying to fix such problems later on. Performance is not the major concern, it's the design. For me, performance _is_ a major concern. But reliability is more important. Synchronisation should be limited and should be applied at a low level where necessary. There is an overhead for each synchronize. If you synchronize on a very low level, the cost is potentially higher than if you synchronize on a higher level, because you have to synchronize more. Please note scalability doesn't apply in this context: if you want to do stuff concurrently, then use multiple sessions. instead of blindly on the session for everything. I don't suggest to synchronize blindly. I suggest to synchronize with open eyes :-) Syncing on the session level could increase deadlocks as well. No, the opposite: if every method is synchronized on the same object, it will decrease the probability of deadlocks. Regards, Thomas
Re: [jr3] support transaction log
Hi, It may slow down writes around 50%. I think it should be an optional feature (some storage backends may not support it at all, and there should be a way to disable / enable it for those backends that can support it). I think we should support writing our own transaction log even when using relational databases, but I guess it should be possible to switch that off. Regards, Thomas
Re: [jr3] Micro-kernel vs. new persistence
Hi, I think the persistence / storage API should be generic enough to support at least 3 different implementations efficiently: - an implementation based on a relational database - a file based implementation - in-memory I think the storage API should support some kind of storage session (normally one storage session for each JCR session). For a relational database, such a session could map to a database connection. In my view the persistence should be based on bundles (node with all property values and with the list of child nodes) as it is now; maybe there should be a way to combine multiple bundles into one. Probably there should be a way to persist the transient space (only when it doesn't fit in memory). Otherwise we would need to implement a separate mechanism for that (using temporary files). I think the data store API can be used as it is (maybe we can simplify it a bit). Regards, Thomas
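A rough sketch of what such a storage API could look like, with one storage session per JCR session. All names here (Storage, StorageSession, MemoryStorage) are hypothetical, not existing Jackrabbit interfaces; the in-memory backend is one of the three implementations mentioned above, and a database backend would map openSession() to a database connection:

```java
import java.util.HashMap;
import java.util.Map;

// One storage session per JCR session.
interface StorageSession {
    byte[] readBundle(String nodeId);             // returns null if missing
    void writeBundle(String nodeId, byte[] data); // store a node bundle
    void close();
}

interface Storage {
    StorageSession openSession();                 // e.g. maps to a JDBC connection
}

// In-memory implementation, the simplest of the three suggested backends.
class MemoryStorage implements Storage {
    private final Map<String, byte[]> bundles = new HashMap<>();

    public StorageSession openSession() {
        return new StorageSession() {
            public byte[] readBundle(String nodeId) {
                synchronized (bundles) { return bundles.get(nodeId); }
            }
            public void writeBundle(String nodeId, byte[] data) {
                synchronized (bundles) { bundles.put(nodeId, data); }
            }
            public void close() { }
        };
    }
}
```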
Re: [jr3] Delayed Repository Initialization
Hi, I am not clear what credentials you are referring to I refer to the database user name and password that are currently stored in the repository.xml (except when using JNDI): http://jackrabbit.apache.org/api/1.5/org/apache/jackrabbit/core/persistence/bundle/BundleDbPersistenceManager.html <param name="user" value=""/> <param name="password" value=""/> and how current jackrabbit works with backend login Currently, Jackrabbit requires to be able to create a database connection when initializing. If it's related to the storage backend, it needs to always be stored on the repository level. It depends on what you mean with repository level. It doesn't make sense to store the user name and password of the database inside the database (I hope you agree :-) I would like to make repository.xml optional. To do that, the user name and password for the database need to be stored somewhere else. One solution is to provide them when creating the repository object. Example: String factoryClass = ...; String url = "...?user=sa&password=xyz"; RepositoryFactory factory = (RepositoryFactory) Class.forName(factoryClass).newInstance(); Map<String, String> parameters = new HashMap<String, String>(); parameters.put("url", url); Repository rep = factory.getRepository(parameters); In this case the user name and password are included in the repository URL. This solution is almost what we have now (except there is no repository.xml). What I propose is: Jackrabbit should support the following use case as well: String factoryClass = ...; String url = ...; RepositoryFactory factory = (RepositoryFactory) Class.forName(factoryClass).newInstance(); Map<String, String> parameters = new HashMap<String, String>(); parameters.put("url", url); Repository rep = factory.getRepository(parameters); Session session = rep.login(new SimpleCredentials("sa", "xyz".toCharArray())); Here, the user name and password of the storage backend (for example a relational database) are not included in the repository URL.
Instead, they are supplied in the first session that logs into the repository. Currently this use case is not supported. I suggest that Jackrabbit 3 support this as a possible use case (not necessarily as the default use case). Unless we designed to map jcr session user to jdbc user. Not necessarily. The Delayed Repository Initialization is not related to how Jackrabbit works internally. Jackrabbit might still use only one JDBC connection for the whole repository. Or it might use a JDBC connection pool. Or it might use one JDBC connection per session. Regards, Thomas
Re: [jr3] Delayed Repository Initialization
Hi, Currently Jackrabbit doesn't support delayed initialization. Unless I misunderstood Felix, he would also like to get rid of this restriction. Just to clarify: my suggestion is *not* about requiring that the repository is initialized when the first session is opened. It's also *not* about requiring that the JCR credentials are used to log in to the backend storage (in most cases that's not a good idea). This idea is about *allowing* delayed repository initialization. The examples I gave are just for illustration and show *one* possible use case. Couldn't this be done by a special wrapping Repository implementation? That's problematic. Such a wrapper would have quite some overhead. The JCR API is not easily wrappable if you want to do it correctly: you would have to wrap almost every JCR interface and method, including Node and Property. That would be a relatively large memory overhead. You could use the Java proxy mechanism, but that is relatively slow (uses reflection). Regards, Thomas
[jr3] Delayed Repository Initialization
Currently Jackrabbit initializes the repository storage (persistence manager) when creating the repository object. If the repository data is stored in a relational database, then the database connection is opened at that time. I suggest to allow delayed initialization (allow, not require). For some storage backends, the repository could initialize when opening the first session. Example: 1) String url = "jdbc:..."; RepositoryFactory factory = (RepositoryFactory) Class.forName(factoryClass).newInstance(); Map<String, String> parameters = new HashMap<String, String>(); parameters.put("url", url); Repository rep = factory.getRepository(parameters); 2) String user = ..., password = ...; Session session = rep.login(new SimpleCredentials(user, password.toCharArray())); This example uses a relational database as the storage. When creating the repository object, user name and password are unknown, so the repository cannot initialize at that time. Only when the first user logs in are the user name and password known. In this case, the user name and password of the session would match the user name and password of the storage backend, but that's actually not a requirement (it's just an example). The current Jackrabbit architecture doesn't support this 'delayed initialization' use case yet. I suggest that Jackrabbit 3 should support such delayed initialization. Whether or not we will implement storage backends that actually do use this mechanism is another question. Regards, Thomas
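The delayed-initialization idea above can be sketched as follows. DelayedRepository, Backend, and Connector are made-up names for illustration only; the real code would implement javax.jcr.RepositoryFactory / Repository instead:

```java
// Made-up names (DelayedRepository, Backend, Connector) - a sketch of
// "allow, not require" delayed initialization, not real Jackrabbit API.
class DelayedRepository {
    interface Backend { /* an initialized storage connection */ }
    interface Connector { Backend connect(String user, String password); }

    private final Connector connector;
    private Backend backend;           // null until the first login

    DelayedRepository(Connector connector) {
        this.connector = connector;    // no backend access happens here
    }

    // The storage backend is initialized lazily, on the first session,
    // using the credentials supplied by that session.
    synchronized String login(String user, String password) {
        if (backend == null) {
            backend = connector.connect(user, password);
        }
        return "session for " + user;
    }

    synchronized boolean isInitialized() {
        return backend != null;
    }
}
```

A backend that needs its credentials at construction time would simply ignore this path and connect eagerly, which is the "allow, not require" part of the proposal.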
[jr3] Exception Handling
For Jackrabbit 3, I would like to improve exception handling. Some ideas: == Use Error Codes == Currently exception messages are hardcoded (in English). When using error codes, exception messages could be translated. I'm not saying we should translate them ourselves, but if somebody wants to, he could. Disadvantage: it's more work to maintain, especially if Jackrabbit is split into multiple projects. Every project could have its own message list, or the list could be centralized. I'm not sure if it's worth it. What do you think? == Include the Jackrabbit Version in Exceptions == This is mainly to simplify support: it's very easy to say what version was used when somebody posts an exception message. Example: Repository is closed [1000-3.0.1003] - this would mean error code 1000, Jackrabbit version 3.0, build 1003. The build number alone would be enough, but for the user it may be better to also include the version. Also, it will allow looking at the source code without having to download the source code of the correct version, even without having to install an IDE. I wrote a simple JavaScript application: http://www.h2database.com/html/sourceError.html - if you paste an exception in the 'Input' text area, it will link to the source code and display additional information. The source code is in an IFrame that links to the right tag in the source repository. For example, if you paste the following exception: Syntax error in SQL statement SELECT * FORM[*] TEST [42000-130] at org.h2.message.DbException.getJdbcSQLException(DbException.java:317) at org.h2.message.DbException.get(DbException.java:168) at org.h2.message.DbException.get(DbException.java:145) at org.h2.message.DbException.getSyntaxError(DbException.java:180) at org.h2.command.Parser.getSyntaxError(Parser.java:475) You will be able to browse the source code in the Source Code frame. If Jackrabbit is split into multiple projects, there would be multiple versions.
There are solutions for this, but as a start, it's easier to just use this mechanism in one project only (Jackrabbit Core). Regards, Thomas
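A minimal sketch of how such messages could be built and parsed; the ErrorCode class name and method signatures are assumptions, only the [code-version.build] format comes from the example above:

```java
// Sketch: append "[code-version.build]" to every exception message,
// as in "Repository is closed [1000-3.0.1003]".
class ErrorCode {
    // Build the full message from its parts.
    static String format(String message, int code, String version, int build) {
        return message + " [" + code + "-" + version + "." + build + "]";
    }

    // Extract the numeric error code back out of such a message.
    static int parseCode(String message) {
        int open = message.lastIndexOf('[');
        int dash = message.indexOf('-', open);
        return Integer.parseInt(message.substring(open + 1, dash));
    }
}
```

A support tool (like the JavaScript page mentioned above) only needs parseCode plus the version/build suffix to link a pasted stack trace to the right source tag.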
Re: [jr3] Delayed Repository Initialization
Hi, I would prefer to initialise the repository in the first place and make sure everything is correct for the repository I wrote: *allow* delayed initialization (allow, not require). If users want to delay the initialisation, they may create the repository reference only when first accessed. If the credentials are included in the repository configuration (currently they always are; they have to be) then it's of course possible to initialize when the repository object is created. The question is: should Jackrabbit 3 *require* (like now) that the credentials for the storage are included in the repository configuration? I think for some storage backends it should not require that. Instead (only in those cases), it should initialize the repository when the first session is opened. Regards, Thomas
Re: [jr3] Synchronized sessions
Hi, A JDBC connection is not thread safe. A JCR session works in a similar way and I prefer to follow the same pattern. Me too. But there is a difference between thread safety and consistency. I don't know of a relational database that allows you to violate referential integrity, unique key constraints, or check constraints - simply by using the same connection in multiple threads. See also http://en.wikipedia.org/wiki/ACID Jackrabbit did and does have such problems (nodes that point to non-existing parent nodes; nodes that point to non-existing child nodes). *Those* are the problems I want to solve. Jackrabbit shouldn't try to protect an application from storing the wrong data. It can't. Application developers are responsible for ensuring application level consistency (this sentence stolen from Wikipedia). To what avail? It should never be necessary to run a consistency check or consistency fix. It should never be necessary to delete nodes because they are corrupt. Nodes should never get corrupt. programmers ... If they do not, it is their fault and they have to live with the consequences of their doing the wrong thing. Unfortunately, it's not that simple to find out whose program caused the problem. Usually, the people who have to fix the problem are not the ones who caused it. But not with synchronizing all methods. As I already wrote, if this does turn out to be a performance problem, we can remove synchronization where required. Regards, Thomas
Re: [jr3] EventJournal / who merges changes
Hi, Multiple threads adding child nodes to the same parent node Yes, that's an important use case, and should not be a problem for my proposed solution. For instance, more than 1 thread calling UserManager.createUser(userId, shardPath(userId)) where shardPath(userId) results in a subtree generated from the userId to reduce contention If we support large child node lists (automatically split using hidden inner nodes) then your application would get simpler. child nodes are essentially multivalue properties You are right, internally child nodes are currently stored in a similar way. Regards, Thomas
Re: [jr3] EventJournal / who merges changes
Hi Ian, Could you describe your use case? probability of conflict when updating a multivalued property is reduced What methods do you call, and how should the conflict be resolved? Example: if you currently use the following code: 1) session1.getNode("test").setProperty("multi", new String[]{"a", "b"}, ...); 2) session2.getNode("test").setProperty("multi", new String[]{"d", "e"}, ...); 3) session1.save(); 4) session2.save(); Then that would be a conflict. How would you resolve it? One option is to silently overwrite in line 4. Regards, Thomas
Re: [jr3] EventJournal / who merges changes
There are low level merges and high level merges. A low level merge is problematic: it can result in unexpected behavior. I would even say the way Jackrabbit merges changes currently (by looking at the data itself, not at the operations) is problematic. Example: Currently, orderBefore can not be done at the same time as addNode or another orderBefore. I'm not saying this is important, but it's one case that is complex. Another example: Let's say the low level representation would split nodes if there are more than 1000 child nodes (add one layer of hidden internal nodes). That means adding a node to a list of 1000 nodes could cause a (b-tree-) split. If two sessions do that concurrently it will get messy. Session 1 will create new internal nodes, session 2 will create new internal nodes as well (but different ones), and merging the result will (probably) duplicate all 1000 nodes. Or worse. The idea is to _not_ try to merge by looking at the data, but merge by re-applying the operation. If saving the new data fails (by looking at the timestamp/version numbers), then refresh the data, and re-apply the operation (orderBefore, addNode, ...). This is relatively easy to implement, and works in more cases than what Jackrabbit can do now. Jackrabbit needs to keep the EventJournal anyway, so this will not use more memory. This is not a new idea, it is how MVCC works (at least how I understand it). From http://en.wikipedia.org/wiki/Multiversion_concurrency_control - if a transaction [fails], the transaction ... is aborted and restarted. Regards, Thomas
Re: [jr3] Synchronized sessions
Hi, this creates a big potential for deadlocks Could you provide an example of how such a deadlock could occur? just synchronizing all methods So you also synchronize all Node/Item/Property methods Some methods don't need to be synchronized, for example some getter methods such as Session.getRepository(), RangeIterator.getPosition() and getSize(). I'm not sure if Node.getProperty needs to be synchronized. The Value class is (almost) immutable, so synchronization is not required here. But very likely Session.getNode(..) and Node.getNode() need to be synchronized because those potentially modify the cache. ensure that for a given Item x, the same Item instance is always returned from all getXXX methods I'm not sure what you are referring to. Jackrabbit already does ensure the same node object is returned as far as I know, but for other reasons than synchronization. if people do the wrong things, well, fine, let them do ... It's usually not those people that have to fix broken repositories. my veto Let's see. Most JCR apps I've seen often use a single session from several threads to read from this session. (I think I also read it somewhere that this is safe with Jackrabbit, but I might be mistaken). I'm not sure if this is really safe. Maybe it is problematic if one thread uses the same session for updates. Simply syncing everything on the session would decrease performance in these cases dramatically. Actually, I don't think that's the case. Regards, Thomas
Re: [jr3] Synchronized sessions
Hi, Consider two or more threads reading different items at the same time: they all are chained one after the other. Only if those threads use the same session. this is unsupported, yet you want to add synchronization to secure this unsupported case ... When we are done it becomes a supported use case :-) I don't have an example off hand Please let us know if you have one. Regards, Thomas
Re: [jr3] Synchronized sessions
Hi, http://issues.apache.org/jira/browse/JCR-2443. Unfortunately this bug doesn't have a test case. Also, I didn't find a thread dump that shows what the problem was exactly, so I can't say what the problem was there. Observation is definitely an area where synchronization can potentially lead to deadlocks. Maybe observation needs to use its own session(s) so that it can't block. This is not a new issue however: most writes are already synchronized (not all writes however). I'm hesitant to change synchronization with the current implementation: doing that would very likely lead to Java level deadlocks. We need to make sure synchronization is always done on the same level, and in the same order. With the current implementation, that's challenging. Of course performance and concurrency are very important. But the current approach (mutable data structures, some writes synchronized) is quite dangerous. Instead, immutable data structures should be used, at least for values and objects in the shared cache. Everything else should be properly synchronized if mutable, or - if that's too slow - the proper data structures should be used, for example ConcurrentHashMap, CopyOnWriteArrayList, CopyOnWriteArraySet. Regards, Thomas
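As a small illustration of the last point, a listener registry based on CopyOnWriteArrayList: firing events iterates over an immutable snapshot, so delivery never has to synchronize on the session and cannot deadlock against concurrent registration (class and method names are invented for this sketch):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Sketch: observation listener registry on a concurrent data structure
// instead of synchronized mutable state.
class ListenerRegistry {
    // Reads are lock-free; each add() copies the underlying array.
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    void add(Consumer<String> listener) {
        listeners.add(listener);
    }

    void fire(String event) {
        // Iterates over a snapshot: a concurrent add() can neither throw
        // ConcurrentModificationException nor block this loop.
        for (Consumer<String> l : listeners) {
            l.accept(event);
        }
    }
}
```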
[jr3] EventJournal / who merges changes
== Current Behavior == Currently Jackrabbit tries to merge changes when two sessions add/change/remove different properties concurrently on the same node. As far as I understand, Jackrabbit merges changes by looking at the data (baseline, currently stored, and new). The same for child nodes: when two sessions add different child nodes concurrently, both child nodes are added. There are some problems, for example (when using b-tree mechanisms for child nodes) when a session added child nodes that caused the child node list to split, and a second session adds a different child node (possibly causing a different split). For the second session it looks like some child nodes have been removed, and it would add the child node on the wrong (b-tree) level (in an inner node instead of in the leaf node). I think merging changes is problematic. Trying to derive the logical operation from diffing the old and new versions is sometimes very hard. I suggest to merge changes in a different way. == Proposed Solution == When adding/changing/removing a property or node, the logical operation should be recorded on a high level (this node was added, this node was moved from here to there, this property was added), first in memory, but when there are many changes, it needs to be persisted (possibly only temporarily). When committing a transaction (usually Session.save()), the micro-kernel tries to apply the changes. If there was a conflict, the micro-kernel rejects the changes (it doesn't try to merge). The higher level then has to deal with that. One way to deal with conflict resolution is: 1) Reload the current persistent state (undo all changes, load the new data). 2) Replay the logical operations from the (in-memory or persisted) journal. 3) If that fails again, depending on a timeout, go to 1) or fail.
What I describe here is how I understand MVCC http://en.wikipedia.org/wiki/Multiversion_concurrency_control - every object would also have a read timestamp, and if a transaction Ti wanted to write to object P, and the timestamp of that transaction is earlier than the object's read timestamp (TS(Ti) < RTS(P)), the transaction Ti is aborted and restarted. So Jackrabbit would record the 'transaction Ti' on a higher level. If applying the changes fails (in the micro-kernel), Jackrabbit would automatically restart this transaction (up to a timeout). This should also work well in a distributed environment. This case is similar to synchronizing databases. == API == Instead of the current API that requires the change log to be in memory, I suggest to use iterators: void store(Iterator<Bundle> newBundles, Iterator<Event> events) throws ConcurrentUpdateException The ChangeLog consists of the new node bundles (plus, for each node bundle, the read timestamp). The event list consists of the EventJournal entries. For smaller operations, a session can keep the event journal in memory. For larger operations, the session can use a temporary file, or possibly store the data in a temporary area within the persistence layer (maybe using a different API). If the operation fails, the session would reload all bundles, and re-apply the events stored in its own local event log. Regards, Thomas
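The reload-and-replay loop described in steps 1-3 can be sketched like this. MicroKernel, ReplaySession, and the single revision-number conflict check are simplified stand-ins for the real per-bundle timestamp/version mechanism:

```java
import java.util.List;

// Stand-in kernel: a commit succeeds only if the caller's base revision
// is still the head revision (optimistic version check, no merging).
class MicroKernel {
    private long head;
    synchronized long headRevision() { return head; }
    synchronized boolean commit(long baseRevision) {
        if (baseRevision != head) {
            return false;              // conflict: reject the changes
        }
        head++;
        return true;
    }
}

class ReplaySession {
    private final MicroKernel kernel;
    private long base;
    int replays;                       // number of conflict-triggered replays

    ReplaySession(MicroKernel kernel) {
        this.kernel = kernel;
        this.base = kernel.headRevision();
    }

    // save(): apply the journal, try to commit; on conflict reload the
    // persistent state (step 1) and replay the journal (step 2), up to
    // a retry limit instead of a timeout (step 3).
    void save(List<Runnable> journal, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            for (Runnable op : journal) {
                op.run();              // (re-)apply the logical operations
            }
            if (kernel.commit(base)) {
                base = kernel.headRevision();
                return;
            }
            replays++;
            base = kernel.headRevision();   // reload, then try again
        }
        throw new IllegalStateException("too many conflicts");
    }
}
```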
[jr3] Synchronized sessions
Currently, Jackrabbit sessions are somewhat synchronized, but not completely (for example, it's still possible to concurrently read and update data). There were some problems because of that, and probably there still are. I believe it's better to synchronize all methods in the session (on the session object). This includes methods on nodes and properties and so on. If this does turn out to be a performance problem, we can remove synchronization where required (and where it can safely be removed) or change the implementation (use immutable objects or safe data structures). This is more conservative, but I think the impact on performance will be minimal. Of course performance is important, however I think data consistency is more important than the possible gain of a few percent of (read-) performance. Regards, Thomas
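A tiny sketch of the proposal, assuming a hypothetical SyncSession class: every method, including those reached through node and property objects, synchronizes on the one session instance, so there is a single lock and no lock-ordering problem:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical session: ALL methods synchronize on the session object
// itself - even reads, because a read may fill the internal cache.
class SyncSession {
    private final Map<String, String> cache = new HashMap<>();

    String getNode(String path) {
        synchronized (this) {
            return cache.get(path);
        }
    }

    void setNode(String path, String value) {
        synchronized (this) {
            cache.put(path, value);
        }
    }
}
```

Node and property objects belonging to this session would take the same monitor, which is why deadlocks become less likely rather than more.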
Re: [jr3] Synchronized sessions
Hi, deadlocks I think it's relatively simple to synchronize all methods on the session. If we want to make sessions thread-safe, we should use proper implementations. Yes, that's what I want to write: a proper implementation. any concurrent use of the same session is unsupported. The disadvantage of this is that there is no way to enforce correct usage. In some cases, incorrect usage leads to data corruption. I believe data corruption is not acceptable, even if the user made a mistake. Regards, Thomas
[jr3] Bundle format
I would like to define a new storage format for nodes and properties. A few ideas: == Name and Namespace Index == Currently each new property and node name is stored in the name index. Each namespace is stored in the namespace index. Those indexes are used to compress the data. There are several (smaller) problems with this: - The indexes are stored in a properties file (non-transactional). - In the past, there were a few problems when migrating data (copying workspaces). - Jackrabbit indexes *each* name and namespace. This can run out of memory if there are many names (dynamically created names). - This is a problem for clustering (especially when using the eventually consistent model). I would like to keep a name index mechanism for commonly used names and namespaces, but would also support a non-indexed name / namespace format. I think we should start with a fixed list. We could add a mechanism to create new index entries later on. == Node Id == Currently Jackrabbit uses UUIDs to identify nodes. Even nodes that are not referenceable have UUIDs. This allows to create nodes concurrently, which is good. It is not optimal for storage however (index cache efficiency is very bad because the numbers are random; size overhead). Also, it's quite inflexible (it's hard to refer to external nodes). For node id storage, I suggest to support multiple data types: UUID (which is basically a fixed-length value or a string), long, and string. The Jackrabbit implementation may not need to support all formats (at least at first), but the (bundle) storage format should. == List of Parent Node Ids == I would store that as a (hidden) multi-value property. == Commonly Used Strings == If we want to store node types as regular properties, we should avoid storing the node type strings. Instead, we should store the node type index only. This is similar to the name index and namespace index. I suggest the storage format supports a set of indexed values (initially a fixed list). Regards, Thomas
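The hybrid scheme (fixed index for common names, literal fallback for dynamically created names) could look roughly like this. The list of common names and the '#'/'!' prefix encoding are invented for illustration; a real format would use a binary tag byte:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: common names compress to a small fixed index; everything else
// is stored literally, so the index can never grow without bound.
class NameCodec {
    // A made-up fixed list; the real list would be chosen up front.
    private static final List<String> COMMON =
        Arrays.asList("jcr:primaryType", "jcr:created", "jcr:uuid");

    static String encode(String name) {
        int i = COMMON.indexOf(name);
        return i >= 0 ? "#" + i : "!" + name;   // "#<index>" or "!<literal>"
    }

    static String decode(String encoded) {
        return encoded.charAt(0) == '#'
            ? COMMON.get(Integer.parseInt(encoded.substring(1)))
            : encoded.substring(1);
    }
}
```

Because the fixed list is the same on every cluster node, this encoding needs no coordination, which addresses the eventually-consistent clustering concern above.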
Re: [jr3] Repository microkernel
Hi, I agree with Marcel, the current Jackrabbit SPI is too high level. it must be impossible to create inconsistent data using the micro-kernel API - tree and referential integrity awareness +1. No more consistency check / fix. No more inconsistent index if the property-value index is also part of the micro-kernel. - long running transaction support I think 'large' transaction support is important, but it doesn't need to be very fast. I would avoid creating temporary files (persisting the cache). Instead, changes could be written to storage, but not committed. A few ideas (but I'm not convinced all of them are good ideas): - In memory, each node points to its (main) parent. If a node is in memory, then its parent is also. - Nodes are immutable (in memory, and on disk). Each change (or at least commit) will replace all parent nodes including the root node. - No change merging. Instead, use MVCC (when reading) and 'node level write locking'. As far as I know this is how most MVCC databases work. Actually we could use 'property level write locking'. - Support multiple persistence backends (database, file system, ...). - Support two phase commit and distributed transactions at a very low level, so that it's very easy to distribute data to many storage backends. - Move operations are copy+delete internally (maybe reordering a node in the list of child nodes is also a move). - Subtrees that didn't change for a longer time are eventually persisted as one blob in the data store, in a form that is compact and fast to read. Regards, Thomas
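The second bullet ('nodes are immutable, each commit replaces all parent nodes including the root') is essentially path copying in a persistent tree; a minimal sketch, with a made-up Node class:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Immutable node: a change produces a new copy of every node on the
// path from the changed node up to the root; old roots stay valid
// snapshots for concurrent readers (the MVCC part).
final class Node {
    final String name;
    final Map<String, Node> children;

    Node(String name, Map<String, Node> children) {
        this.name = name;
        this.children = Collections.unmodifiableMap(new HashMap<>(children));
    }

    // Returns a NEW root with 'child' added under path[depth..];
    // the receiver and everything below it are left untouched.
    Node withChild(String[] path, int depth, Node child) {
        Map<String, Node> copy = new HashMap<>(children);
        if (depth == path.length) {
            copy.put(child.name, child);                  // reached the parent
        } else {
            Node next = children.get(path[depth]);        // copy along the path
            copy.put(path[depth], next.withChild(path, depth + 1, child));
        }
        return new Node(name, copy);
    }
}
```

Only the nodes on the changed path are copied; unchanged subtrees are shared between the old and the new root, which keeps commits cheap.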
Re: [jr3] Node types
Hi, Which makes observation listeners an integral part of the microkernel, btw. The microkernel would only need to support one callback object (listener is probably the wrong word, because it is also called for read operations). This one would then call (and allow to register) regular JCR observation listeners. It would also deal with / delegate security, constraint checking like node type, and so on. I'm not sure who should be able to write to the node type system. It would be great if any session (with sufficient access rights) can, because that would simplify clustering. The 'node type system' would then just listen for changes on those nodes (and possibly revert those changes if they don't make sense - rolling back that transaction). Regards, Thomas
Re: [jr3] Flat hierarchy
About micro-kernel: I think the micro-kernel shouldn't have to support splitting large child lists into smaller lists. That could be done on a higher level. What it should support is all the features required by this mechanism: - Support 'hidden' nodes (those are marked using a hidden property). That means the path doesn't always map directly to the stored structure. Therefore the micro-kernel should not be directly responsible for building or interpreting JCR paths (micro-kernel paths are similar but they don't always match the JCR path). - The entry in the child node list may contain multiple properties (in most cases the name and the child node id; but sometimes also the reference of the next child node). The number of properties for each entry is the same however. For sorting, only the first element is relevant. - The child node list can always be stored in sorted order. But this sorting doesn't always map to the JCR child node list. Regards, Thomas
Re: [jr3] Use JCache JSR-107 for (all) caches
Hi, About clustering: there are two main use cases: A) to improve read throughput and to achieve high availability. In this case writes can be serialized. B) to improve write throughput. In this case writes should not be serialized; instead, writes should be merged later on (eventually consistent). I guess at some point we need to support both, but personally I think A is as important as (if not more important than) B. Regards, Thomas
Re: [jr3] Node types
Hi, This would be after the fact and wouldn't work to validate that changes are correct (to verify added / changed nodes don't violate node type constraints). Also it wouldn't work for security. Regards, Thomas
Re: [jr3] Node types
Hi, I don't see the point of doing such steps after the transaction has already been committed. Well, because you don't have a callback mechanism that gets called _before_ committing (or reading, in the case of security). I'd make node type constraints and security checks the responsibility of the client who commits the transaction. That's a solution :-) I'm not sure it's the _right_ solution, but we can start like that. Regards, Thomas
Re: [jr3] Plugin architecture
Hi, The configuration should be persisted in the repository itself, not in external configuration files. * dynamic configuration First of all, I would define an API for configuration changes. This API could be the regular JCR API, and the configuration could be stored in special system nodes. On top of that API, those who want to use OSGi can do that. Observation listeners (called triggers in relational databases) are currently not part of the configuration; you always have to add them after starting the repository. I think there should be a way to add a persistent observation listener that is automatically started whenever the repository is started. Repository and Session lifecycle listeners or transaction boundary checkers Same as for triggers. Regards, Thomas
Re: [jr3] Use JCache JSR-107 for (all) caches
Hi, Is Jackrabbit too slow for you? Or do you have out-of-memory problems? Or why do you want to use your own cache? features like overflow to disk I would try to avoid that. It's not really a 'cache' if it has to be stored to disk when the original data is also on disk. I would try to solve the root cause of the problem (problems supporting large transactions, improving performance) instead of trying to work around the issues on some higher level. Regards, Thomas
[jr3] Node types
Currently node types are an integral part of the repository. There is a special storage mechanism (the file custom_nodetypes.xml), which is non-transactional and problematic for clustering. To simplify the architecture ('microkernel'), could the node type functionality be implemented outside the kernel, as some kind of observation listener? The node type configuration could be stored as regular nodes in a special tree. When registering or modifying a node type, existing nodes may have to be updated, of course. The node type information itself could be stored in the nodes themselves as a hidden property.
Re: [jr3] Restructure Lucene indexing make use of Lucene 2.9 features
Hi, Thanks for the explanation! index every unique jcr fieldname in a unique lucene field, and do not prefix values as currently is being done. This sounds very reasonable. Regards, Thomas
Re: [jr3] Search index in content
I'd use Lucene to manage it. There are several problems. One is transactions; another is updating the index synchronously. A further problem is the dependence on Lucene itself, which is a problem for persistence and clustering. I would very much like to avoid inventing our own search index. I would definitely not use a completely new mechanism. I would re-use the repository to store the index data. Regards, Thomas
Re: [jr3] Flat hierarchy
Hi, I would also use a b-tree structure. If a node has too many child nodes, two new invisible internal nodes are created, and the list of child nodes is split up. Those internal nodes wouldn't have any properties. For efficient path lookup, the child node list should be sorted by name. This is a bit tricky. Currently, when adding a node, it is added as the last child. I suggest changing that behavior, and adding the node at the right place by default (so that the sort order is preserved). Like this, a path lookup is very fast even if there are many child nodes (binary search / b-tree). Is that an acceptable change (usability and spec conformance)? If the user changes the child node order (manually re-orders the nodes), then the sort order is broken, and the path lookup has to scan through all nodes. While that's much slower, I think it's acceptable. One alternative is to use a linked list (each child node points to the next child node), which is very problematic for shareable nodes. So there would be a hidden flag 'child nodes are sorted by name'. Regards, Thomas
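The sorted child list idea above, as a minimal sketch (an illustration only, not Jackrabbit code): with the list kept sorted by name, finding a child during path resolution is a binary search instead of a linear scan, and the proposed default insert preserves the order.

```java
import java.util.Arrays;

// Sketch: child lookup and order-preserving insert on a sorted child list.
final class ChildNodeList {
    // returns the index of 'name' in the sorted child list, or -1 if absent
    static int indexOf(String[] sortedChildNames, String name) {
        int i = Arrays.binarySearch(sortedChildNames, name);
        return i >= 0 ? i : -1;
    }

    // insert keeping sort order (the proposed default instead of append-at-end)
    static String[] insertSorted(String[] sortedChildNames, String name) {
        int i = Arrays.binarySearch(sortedChildNames, name);
        int pos = i >= 0 ? i : -i - 1; // binarySearch encodes the insertion point
        String[] result = new String[sortedChildNames.length + 1];
        System.arraycopy(sortedChildNames, 0, result, 0, pos);
        result[pos] = name;
        System.arraycopy(sortedChildNames, pos, result, pos + 1,
                sortedChildNames.length - pos);
        return result;
    }
}
```

Once the hidden 'sorted' flag is cleared by an orderBefore call, lookups would fall back to a linear scan over the same list.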
Re: [jr3] Flat hierarchy
Hi, I would also use a b-tree structure. If a node has too many child nodes, two new invisible internal nodes are created, and the list of child nodes is split up. Those internal nodes wouldn't have any properties. You mean a b-tree for each node? I think this could be a separate index, but one for the whole tree. The repository is one large b-tree, and each JCR node is a b-tree node (except for the JCR nodes that don't have any child nodes: those are b-tree leaves). If a JCR node has many child nodes, then there is at least one more level of b-tree nodes between the node and its child nodes. I think supporting fast path lookups for orderable child nodes is a bit more important than flat hierarchies Path lookups would still be fast (the same speed as now), except for large child node lists that were re-ordered. The difference only exists for large child node lists. There is a difference between 'orderable' nodes (which have the ability to reorder the child node list) and actually 're-ordered' child node lists. Is it acceptable if new nodes appear in lexicographic order in the child node list? Regards, Thomas
Re: [jr3] Search index in content
Hi, Property/value indexes: We will have to implement some kind of database persistence anyway. Databases support transactional indexes. We could use those instead of using Lucene. Or we could store the index in JCR nodes (which are part of the large repository b-tree). Indexes in databases are stored in exactly the same way. In any case, keeping the index and the persistence in the same storage simplifies transactional persistence a lot. A microkernel that relies on Apache Lucene even for simple property/value indexes is not an option in my view. Regards, Thomas
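A property/value index kept in the same transactional storage as the nodes could be modeled like this (a toy sketch, not a proposed API; a sorted map per property name stands in for the index nodes):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch: per-property-name index mapping value -> ids of nodes with that value,
// updated synchronously as part of the same commit that sets the property.
final class PropertyValueIndex {
    private final Map<String, TreeMap<String, Set<Long>>> byProperty = new HashMap<>();

    // called within the transaction that sets the property
    void onPropertySet(long nodeId, String property, String value) {
        byProperty.computeIfAbsent(property, p -> new TreeMap<>())
                  .computeIfAbsent(value, v -> new TreeSet<>())
                  .add(nodeId);
    }

    Set<Long> lookup(String property, String value) {
        TreeMap<String, Set<Long>> idx = byProperty.get(property);
        Set<Long> ids = idx == null ? null : idx.get(value);
        return ids == null ? Collections.emptySet() : ids;
    }
}
```

Because the index lives in the same storage as the data, a rollback of the transaction automatically rolls back the index entries too.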
Re: [jr3] Flat hierarchy
Hi, A Jackrabbit repository is some kind of b-tree - just that the pages are never split or balanced automatically. Maybe using 'b-tree' is confusing? Let's call it a manual b-tree then. i agree that flat hierarchies is an important feature, however we shouldn't compromise the performance for the 'standard' case with less than 1k child nodes I agree. Using the b-tree style wouldn't slow down the standard case. In the standard case things would stay exactly like they are now. add a next pointer This makes the data structure more complex but allows us to maintain support for orderable nodes. That's definitely an option. I just wonder if it's really required. I guess we will find out. Regards, Thomas
Re: [jr3] Plugin architecture
not sure that the JCR EventListener interface could be used for persistent observation listeners You are right. It would probably be a different API (to be defined). This mechanism could be used for (just an idea): - JCR observation - security (filtering nodes and properties; allowing / disallowing certain operations) - indexing - (maybe) register a remote repository Regards, Thomas
Re: [jr3] Flat hierarchy
Hi, I think Jukka is correct that the correct use of B-trees is to use one for each list of child nodes, not as a way to model the entire hierarchy. If you are more comfortable with this view, that's OK. I probably should have said: the whole repository is a tree data structure. And there are modifications that can easily be applied to B-trees that deal with arbitrary (not based on a key) ordering of the nodes Sure. Jackrabbit needs a way to quickly navigate to a node by path. For that, you have to traverse the nodes, and for each node you have to find the correct child. To do that, it's better if the child node list is ordered by name. Otherwise you have to iterate over all child nodes until you find the right one, or you need a secondary index (Lucene?). And that's true whether it uses a b-tree internally or not. The part that's not clear to me is how this can be efficiently combined with an append-only storage format that's being discussed on the [jr3] Unified persistence thread. It wouldn't be good if every time a list of children is modified the persistence layer has to make a complete copy of the modified B-tree You only have to update the b-tree node that is modified. That may be a hidden node (an internal, hidden b-tree node) or a real node. Regards, Thomas
Re: [jr3] Flat hierarchy
Hi, JCR requires lookup of children by name and/or position (for orderable children), so the implementation needs to support all these cases efficiently. The trickiest one to handle is probably Node.getNodes(String namePattern) because it requires using both name and position together. While it's true that all of that needs to be supported, I doubt that we should try to optimize for all cases. Otherwise the normal case will be slower. Usually, there are not that many child nodes. In that case lookup is not a problem: both an array and a hash map can be used (in memory). If there are many child nodes, then we should try to optimize for the most important case. I think it doesn't make sense to optimize for the case where a long list (many thousands) of children is manually re-ordered (using orderBefore). Regards, Thomas
Re: [jr3] Flat hierarchy
Hi, Even without using orderBefore, the specification still requires a stable ordering of children for nodes that support orderable child nodes (see 23.1 of the JCR 2.0 spec). Thanks for the link! I see now my idea violates the specification, which says (23.3): When a child node is added to a node that has orderable child nodes it is added to the end of the list. My idea was to add the child node according to its name (until the order is changed using orderBefore). One possibility is to limit the use of the B-tree to nodes that do not support orderable children You are right. Unfortunately, orderable child nodes are the default. Another solution is to keep a linked list from child node entry to child node entry (only in this case). Let's see how complicated that is. By the way, same-name siblings would work fine (no linked list required). Regards, Thomas
Re: [jr3] Search index in content
+1 For simple searches a built-in index would help a lot, for example for node names and (some) property values. Each property name could have its own index. Advantages: - transactional index updates - reduced complexity - reduced number of open files - makes it possible to implement Jackrabbit in C I would not try to use that to index binaries (fulltext index), or to re-implement advanced features (ranking, phrase queries, stemming, ...). Regards, Thomas
Re: [jr3] Restructure Lucene indexing make use of Lucene 2.9 features
Hi each property indexed in its own Lucene field Could you explain in more detail? What is a 1:1 mapping? Do you mean each property type should have its own index, or each property name should have its own index? Wouldn't this increase the number of Lucene index files a lot? Regards, Thomas
Re: [jr3] Unified persistence
Hi, I would implement the storage layer ourselves. It could look like this: - FileDataStore: keep as is (maybe reduce the directory level by one). - Each node has a number (I would use a long), used for indexing. - MainStorage: the node data is kept in an append-only main persistence storage. When switching to a new file, the node lookup table (node index) is appended. An optimization step would separate rarely updated nodes (old generation) from frequently updated nodes. A node and its child nodes are grouped together. - Namespace index, name index, node type registry: start with a fixed (hardcoded) list, and store additional entries as system nodes. Regards, Thomas
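The MainStorage idea might be sketched like this (all names and the record layout are assumptions; an in-memory list stands in for the file): node records are appended, and when the file is switched, the node lookup table itself is appended as the last record so it can be located when the file is reopened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of an append-only storage file with a trailing node index.
final class AppendOnlyFile {
    private final List<byte[]> records = new ArrayList<>();   // stands in for a file
    private final Map<Long, Integer> nodeIndex = new TreeMap<>(); // node id -> record #

    void appendNode(long nodeId, byte[] data) {
        nodeIndex.put(nodeId, records.size()); // remember where the record landed
        records.add(data);
    }

    byte[] readNode(long nodeId) {
        Integer pos = nodeIndex.get(nodeId);
        return pos == null ? null : records.get(pos);
    }

    // on file switch: the index itself is appended as the last record
    int close() {
        records.add(nodeIndex.toString().getBytes());
        return records.size(); // total record count, including the index record
    }
}
```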
Re: [jr3] One workspace to rule them all
Hi, The most obvious trouble with this approach is that the node UUIDs would no longer be unique within such a super-workspace. I'm not sure how to best solve that problem, apart from switching to some alternative internal node identifiers. Any ideas? Use a number (variable size when stored to disk; a long in memory) as the unique node identifier. We need a way to identify all nodes for indexing (property/value index) anyway. This number would not necessarily be accessible from the public API, however. I would still keep the UUID for backward compatibility, but only for referenceable nodes. It would be stored as a (hidden) property. Regards, Thomas
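The 'variable size on disk, long in memory' identifier could be encoded, for example, as a little-endian base-128 varint (an assumption; this is one common encoding, not a decided format), so small node ids take one byte:

```java
import java.io.ByteArrayOutputStream;

// Sketch: base-128 varint encoding for node ids. Each byte carries 7 bits of
// the value; the high bit signals that more bytes follow.
final class VarLong {
    static byte[] write(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7fL) != 0) {
            out.write((int) ((value & 0x7f) | 0x80)); // continuation bit set
            value >>>= 7;
        }
        out.write((int) value); // final byte: continuation bit clear
        return out.toByteArray();
    }

    static long read(byte[] buf) {
        long value = 0;
        int shift = 0;
        for (byte b : buf) {
            value |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) break; // last byte reached
            shift += 7;
        }
        return value;
    }
}
```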
Re: [jr3] Search index in content
Hi, For me, there are two kinds of indexes: the property/value indexes, and the fulltext index. The property/value indexes are for property values, node names, paths, node references, and so on. Such indexes are relatively small and fast. In relational databases, those are the secondary indexes (non-primary-key indexes). Those index updates should be done synchronously, as part of the transaction (maybe even in the transient space). Currently, we use Apache Lucene for this, but I wouldn't. I would keep those indexes within the repository. The fulltext index is (potentially) slow, especially fulltext extraction. Therefore, fulltext indexing should be done asynchronously if it takes too long. Also, in a clustered environment, at least text extraction should be done in only one cluster node. I would still use Apache Tika and Apache Lucene for this. Regards, Thomas
Re: [jr3] MVCC
Hi, I would do MVCC in a similar way as it is done in relational databases such as PostgreSQL. See also www.postgresql.org/files/developer/transactions.pdf Concurrent writes and MVCC: usually MVCC means readers are never blocked by other readers or writers, and writers are not blocked by readers. However, writers can block other writers when trying to update the same node (row in databases). Concurrent writes to disk: I think this only makes sense if the hardware supports it. With a single disk it doesn't make sense: concurrent writes to two different positions or files are actually slower than serialized writes to one position in one file. Relational databases don't usually persist all (intermediate) versions, just the committed version. I don't think that copy-on-read is a good idea. If we use append-only storage, in theory all old versions are available, but indexing those is problematic. Regards, Thomas
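A minimal sketch of the PostgreSQL-style MVCC described above (illustrative only; real row/node headers are more involved): each committed write stores a new version tagged with a revision, and a reader opened at revision r sees the newest version with revision <= r, so readers never block writers.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: per-node version chain keyed by commit revision.
final class MvccNode {
    private final TreeMap<Long, String> versions = new TreeMap<>(); // revision -> value
    private static final AtomicLong revisionCounter = new AtomicLong();

    static long nextRevision() { return revisionCounter.incrementAndGet(); }

    // writers block only other writers of the same node (node-level write lock)
    synchronized void commit(long revision, String value) {
        versions.put(revision, value);
    }

    // a reader sees the newest version committed at or before its snapshot
    synchronized String read(long snapshotRevision) {
        Map.Entry<Long, String> e = versions.floorEntry(snapshotRevision);
        return e == null ? null : e.getValue();
    }
}
```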
Re: [jr3] Unified persistence
Hi, About 'append only' and 'immutable' storage. Here is an interesting link: http://eclipsesource.com/blogs/2009/12/13/persistent-trees-in-git-clojure-and-couchdb-data-structure-convergence/ Regards, Thomas
Re: Jackrabbit 3: extracting same name sibling support from the core
Hi, A very simple implementation of my idea: http://h2database.com/p.html#e5e5d0fa3aabc42932e6065a37b1f6a8 The method hasSameNameSibling() is called for each remove(). If it turns out to be a performance problem, we could add a hidden property in the first SNS node itself (it's only required there). Does anybody see any other obvious problems? Regards, Thomas
Re: Upgrade from 1.5.5 to 2.0.0 was unsuccessful in clustered environment: Cause: java.sql.SQLException: Lock wait timeout exceeded; Any ideas?
Hi, Could you point me in the right direction for a production-ready model 3 deployment model (where we can access the repository remotely)? There is some documentation available here: http://wiki.apache.org/jackrabbit/RemoteAccess Regards, Thomas
Jackrabbit 3: extracting same name sibling support from the core
Hi, About SNS (same name siblings): what about moving that part away from the core? Currently, the Jackrabbit architecture is (simplified): 1) API layer (JCR API, SPI API) 2) Jackrabbit core, which knows about SNS After moving the SNS support, it would be something like this: 1) API layer (JCR API, SPI API) 2) SNS support code, which knows about SNS and maps SNS node names to/from internal node names 3) Jackrabbit core, which doesn't know anything about SNS (node names must be unique; [ and ] are supported within node names) My hope is that this would simplify the core, because it doesn't have to deal with SNS at all. Disadvantage: there is a risk that certain things would get a bit slower if SNS are actually used (especially if there are lots of SNS). Regards, Thomas
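The proposed mapping layer (step 2) might look roughly like this (a sketch only; the concrete internal naming scheme is an assumption, it simply embeds the SNS index in the internal name, which is legal because '[' and ']' stay allowed internally):

```java
// Sketch: map JCR same-name-sibling path segments to unique internal names,
// so the core below this layer only ever sees unique child names.
final class SnsMapper {
    static String toInternal(String jcrSegment) {
        // "child" and "child[1]" both map to the same internal name "child";
        // "child[2]", "child[3]", ... are already unique as-is
        if (jcrSegment.endsWith("[1]")) {
            return jcrSegment.substring(0, jcrSegment.length() - 3);
        }
        return jcrSegment;
    }

    static String toJcr(String internalName, boolean hasSiblings) {
        // the first sibling is presented as "child[1]" only when siblings exist
        if (hasSiblings && !internalName.contains("[")) {
            return internalName + "[1]";
        }
        return internalName;
    }
}
```

A real implementation would also have to renumber internal names on remove, which is where the hasSameNameSibling() check mentioned in the follow-up thread comes in.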
Re: Jackrabbit 3: extracting same name sibling support from the core
Hi, Could this be an optional feature in 3.x? As JCR 2.x is out, it could raise compatibility problems, right? This change wouldn't affect the public API. SNS would still be supported as they are now. Maybe with a few changes, but all within the JCR 2.0 specification. About compatibility: existing repositories would need to be converted in some way, that's true. One way to convert is the repository copier tool: http://wiki.apache.org/jackrabbit/BackupAndMigration Regards, Thomas
Re: Upgrade from 1.5.5 to 2.0.0 was unsuccessful in clustered environment: Cause: java.sql.SQLException: Lock wait timeout exceeded; Any ideas?
Hi, Please use the 'user' list for questions. the lock timeouts are occurring only with non-jcr tables during routine actions in other areas of our site, even though they have nothing to do with Jackrabbit. It sounds like the problem is not related to Jackrabbit then. disabling jackrabbit solved the problem It could be due to lower database activity. What happens if you use two independent databases? Regards, Thomas
Re: change proposal DataStore
Hi, Currently there is only one data store per repository. If you need a data store per workspace, then you need one repository per workspace. - Assign a datastore per workspace (customer) so it's possible to measure (and limit) storage usage for a given customer This sounds more like an accounting problem than a technical problem. Could you add some accounting code to the application? For example, use an ObservationListener to calculate the disk space used by a user (workspace). Or use a wrapper around the input stream and measure / limit storage like this. - Dynamic allocation, so newer or more accessed nodes will be stored on faster disks and old nodes will be moved to slower SATA disks Such a 'caching data store' would be nice. It's a bit tricky to implement, I think. Currently, there is no such implementation, however patches are always welcome. Regards, Thomas
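The accounting suggestion could be sketched like this (an illustration only; the wiring to a JCR ObservationListener or input stream wrapper is omitted, and the quota behavior is an assumption): track per-workspace binary usage in the application and enforce a limit there.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: application-level storage accounting per workspace (customer),
// instead of one data store per workspace.
final class StorageAccounting {
    private final Map<String, Long> bytesPerWorkspace = new ConcurrentHashMap<>();
    private final long quotaBytes;

    StorageAccounting(long quotaBytes) { this.quotaBytes = quotaBytes; }

    // called for each binary added in the given workspace, e.g. from an
    // observation listener or a measuring stream wrapper
    void onBinaryAdded(String workspace, long size) {
        long used = bytesPerWorkspace.merge(workspace, size, Long::sum);
        if (used > quotaBytes) {
            throw new IllegalStateException("quota exceeded for " + workspace);
        }
    }

    long used(String workspace) {
        return bytesPerWorkspace.getOrDefault(workspace, 0L);
    }
}
```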
Re: change proposal DataStore
Hi, extend the datastore interface with workspace name, node name, property name ... I'm not sure the workspace / node name / node identifier / property name is always available. One advantage of this addition would be that it could speed up garbage collection: if a binary object knows the node identifier(s), garbage collection could check the large objects first (because you could keep links from the binary object to the place where it is / was used). I'm not completely against such a change, however for the given problem (accounting) it does sound like the wrong solution. Regards, Thomas
Re: [VOTE] Release Apache Jackrabbit 2.0.0
+1 Release this package as Apache Jackrabbit 2.0.0 - checksums OK - licences OK - notice.txt, readme.txt and release-notes.txt files OK - mvn clean install OK with Sun Java 1.5.0_22 / Mac OS X Regards, Thomas
Re: javax.naming.NamingException: The repository home D:\repository appears to be in use since the file named .lock is locked by another process.
Hi, About the repository lock see http://wiki.apache.org/jackrabbit/RepositoryLock P.S. Please use the user list for usage questions Regards, Thomas On Thu, Jan 21, 2010 at 2:04 PM, abhishek reddy abhishek.c1...@gmail.com wrote: hi, The first time, I am able to access the repository successfully. From the second time onwards it gives the following exception: javax.naming.NamingException: The repository home D:\repository appears to be in use since the file named .lock is locked by another process. I have created the Repository and kept it in the application scope, and every time I am accessing the repository in the following manner: Repository repository = (Repository) sc.getAttribute("repository"); session = repository.login(); //code session.close(); How do I overcome this problem? Do I need to remove this lock file manually every time? -- Abhishek
Re: Hudson build is still unstable: Jackrabbit-trunk » Jackrabbit Core #961
Hi, now org.apache.jackrabbit.core.util.CooperativeFileLockTest.testFileLock failed thomas, I think it was you who added this test recently, right? Yes... does Hudson run on Windows? It looks like a timing problem (the thread doesn't stop quickly enough). Regards, Thomas
Re: Sling's use of Jackrabbit
Hi, We can't change that API part in 2.x. I understand we should not _change_ (or remove) a public API within 2.x. That's actually the main reason why I wouldn't export the PersistenceManager API now, because it would force us to keep it like it is for the whole 2.x. But we can still export _additional_ packages within 2.x (for example, export the PersistenceManager API in 2.1 or 2.2 if really needed). Regards, Thomas
Re: Sling's use of Jackrabbit
Hi, The problem of Jackrabbit Core is, that apart from implementing the Jackrabbit API (which is imported in the bundle), it has its internal API (for example the PersistenceManager interface or others). This internal API is not properly separated (in terms of Java packages) from implementation classes, which should not leak into the exported package space. If this really is an issue, we should try to solve it. Is it really an issue? A solution might be to move the PersistenceManager and other interfaces to jackrabbit-api (would we need to change the package name?). Regards, Thomas
Re: Sling's use of Jackrabbit
Hi, It's worth moving some of the internal API to jackrabbit-api so that other bundles can provide different implementations. It could be well documented, and better for third parties to extend Jackrabbit. I would do that only if there is an actual need for it. Do you have another implementation? Persistence manager, data store, or journal? If yes, would it be enough to just move the persistence manager interface, or do we need to do something else (for example, does your implementation extend the abstract bundle persistence manager)? Regards, Thomas
Re: Sling's use of Jackrabbit
Hi, I don't have another implementation at the moment for any of them. OK, good to know. I can imagine it might be possible to add a key/value store as a bundle persistence store in the future. I would wait until it's a real problem. Trying to solve _potential_ problems in advance is usually the wrong path. Regards, Thomas
Re: Sling's use of Jackrabbit
Hi, I would not move the API to the Jackrabbit API. Just moving the interfaces into separate packages, e.g. below o.a.j.core.api, would suffice: export this space and leave the implementation private. +1 Moving the persistence manager interface and co. (basically everything that can be swapped by custom implementations in repository.xml) into a separate API package is a good idea. But it makes sense to separate this from the client-side API in jackrabbit-api and only have this as an exported package in jackrabbit-core. -1 As I already wrote, it doesn't make sense to do that now. We can still do that later on, when there is actually somebody who needs it. Regards, Thomas
Re: [VOTE] Release Apache Jackrabbit 2.0 beta3
+1 Release this package as Apache Jackrabbit 2.0-beta3 - checksums OK - licences OK - notice.txt, readme.txt and release-notes.txt files OK - mvn clean install OK with Sun Java 1.6.0_15 / Mac OS X Regards, Thomas On Mon, Nov 23, 2009 at 10:21 AM, Sébastien Launay sebastienlau...@gmail.com wrote: Hi, [X] +1 Release this package as Apache Jackrabbit 2.0-beta3 - checksums [OK] - signature [OK] - license, notice, header and readme files [OK] - maven build [OK with one failed test] with Ubuntu Jaunty / Sun Java 1.6.0_14-b08 A test case in jackrabbit-jcr-client prevent the build from being successful but i think this test case is not critical for releasing a beta version: --- Test set: org.apache.jackrabbit.client.RepositoryFactoryImplTest --- Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.253 sec FAILURE! testGetSpi2davexRepository(org.apache.jackrabbit.client.RepositoryFactoryImplTest) Time elapsed: 0.078 sec ERROR! java.lang.UnsupportedOperationException: Missing implementation at org.apache.jackrabbit.spi2dav.ExceptionConverter.generate(ExceptionConverter.java:109) at org.apache.jackrabbit.spi2dav.ExceptionConverter.generate(ExceptionConverter.java:49) at org.apache.jackrabbit.spi2dav.RepositoryServiceImpl.getRepositoryDescriptors(RepositoryServiceImpl.java:537) at org.apache.jackrabbit.jcr2spi.RepositoryImpl.init(RepositoryImpl.java:82) at org.apache.jackrabbit.jcr2spi.RepositoryImpl.create(RepositoryImpl.java:95) at org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory.getRepository(Jcr2spiRepositoryFactory.java:166) at org.apache.jackrabbit.client.RepositoryFactoryImpl.getRepository(RepositoryFactoryImpl.java:75) at org.apache.jackrabbit.client.RepositoryFactoryImplTest.testGetSpi2davexRepository(RepositoryFactoryImplTest.java:169) Caused by: org.apache.jackrabbit.webdav.DavException: Method REPORT is not defined in RFC 2068 and is not supported by the Servlet API at 
org.apache.jackrabbit.webdav.client.methods.DavMethodBase.getResponseException(DavMethodBase.java:172) at org.apache.jackrabbit.webdav.client.methods.DavMethodBase.checkSuccess(DavMethodBase.java:181) at org.apache.jackrabbit.spi2dav.RepositoryServiceImpl.getRepositoryDescriptors(RepositoryServiceImpl.java:507) ... 31 more I do not test the maven artefacts just the source package and the war. -- Sébastien Launay
Re: How to reclaim disk space?
Hi, How big is this directory? By default, Jackrabbit uses Apache Derby to persist data. This directory belongs to the embedded Apache Derby databases. There is a way to compact Derby databases, however you would need to implement this yourself. I found the link to the Apache Derby documentation: http://db.apache.org/derby/docs/10.5/ref/ref-single.html#rrefaltertablecompress Regards, Thomas On Wed, Nov 11, 2009 at 4:21 PM, Xudaquan xudaquan2...@yahoo.cn wrote: Hello, I have the problem described below when I use Jackrabbit: when I add nodes to the repository, the size of some files in the directory 'jackrabbit\workspaces\default\db\seg0' increases incessantly. But after I stop adding nodes and remove all the nodes I added, the sizes of the files in the directory 'jackrabbit\workspaces\default\db\seg0' remain the same, so the disk space Jackrabbit uses can't be released. How can I solve this problem? Thanks!