[VOTE] Release Apache Jackrabbit Oak 1.0.5

2014-08-25 Thread Thomas Mueller
A candidate for the Jackrabbit Oak 1.0.5 release is available at:

https://dist.apache.org/repos/dist/dev/jackrabbit/oak/1.0.5/

The release candidate is a zip archive of the sources in:


https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.0.5/

The SHA1 checksum of the archive is
2cd71913fe66ba9491ee7edb4e82469e228412c9.

A staged Maven repository is available for review at:

https://repository.apache.org/

The command for running automated checks against this release candidate is:

$ sh check-release.sh oak 1.0.5 2cd71913fe66ba9491ee7edb4e82469e228412c9

Please vote on releasing this package as Apache Jackrabbit Oak 1.0.5.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Jackrabbit PMC votes are cast.

[ ] +1 Release this package as Apache Jackrabbit Oak 1.0.5
[ ] -1 Do not release this package because...

My vote is +1

Regards
Thomas




Re: MissingLastRevSeeker

2014-08-25 Thread Julian Reschke

On 2014-08-26 08:03, Amit Jain wrote:

Hi Julian,

The LastRevRecoveryAgent is executed in two places:
1. On DocumentNodeStore startup, where the MissingLastRevSeeker is used to
get potential candidates for recovery.
2. At regular intervals defined by the property
'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService (default
60 seconds). The short description is that MissingLastRevSeeker will rarely
be called in this case.
The long description is that a less expensive query is executed to find
all the stale clusterNodes for which recovery is to be performed. If there
are clusterNodes that have unexpectedly shut down and their 'leaseEndTime'
has not expired, then MissingLastRevSeeker will check all potential
candidates.


Proposal: if this code *is* used regularly, we'll need an API so that
DocumentStore implementations other than Mongo can optimize the query.

+1, since it will be executed on every startup. RDBDocumentStore already
maintains an index on the _modified property, so optimized querying is
possible.

Thanks
Amit


OK, so can we put what's needed into the DocumentStore API, or
alternatively define an extension interface that both MongoDocumentStore
and RDBDocumentStore could implement?
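
To make that concrete, something along these lines might work (a rough
sketch only; all names and signatures below are invented for illustration
and are not existing Oak API):

import org.apache.jackrabbit.oak.plugins.document.NodeDocument;

// Hypothetical extension interface that MongoDocumentStore and
// RDBDocumentStore could both implement; MissingLastRevSeeker would check
// for it and fall back to a full scan for stores that don't provide it.
public interface LastRevRecoveryCandidateProvider {

    /**
     * Returns the documents modified since the given timestamp, so that
     * the seeker does not have to scan the whole NODES collection on
     * large repositories.
     */
    Iterable<NodeDocument> getCandidates(long modifiedSinceMillis);
}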


Best regards, Julian


Re: MissingLastRevSeeker

2014-08-25 Thread Amit Jain
Hi Julian,

The LastRevRecoveryAgent is executed in two places:
1. On DocumentNodeStore startup, where the MissingLastRevSeeker is used to
get potential candidates for recovery.
2. At regular intervals defined by the property
'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService (default
60 seconds). The short description is that MissingLastRevSeeker will rarely
be called in this case.
The long description is that a less expensive query is executed to find
all the stale clusterNodes for which recovery is to be performed. If there
are clusterNodes that have unexpectedly shut down and their 'leaseEndTime'
has not expired, then MissingLastRevSeeker will check all potential
candidates.

>> Proposal: if this code *is* used regularly, we'll need an API so that
>> DocumentStore implementations other than Mongo can optimize the query.
+1, since it will be executed on every startup. RDBDocumentStore already
maintains an index on the _modified property, so optimized querying is
possible.

Thanks
Amit


On Mon, Aug 25, 2014 at 7:36 PM, Julian Reschke wrote:

> Hi there,
>
> it appears that the MissingLastRevSeeker (oak-core), when run, will be
> very slow on large repos, unless they use a MongoDocumentStore (which has a
> special-cased query).
>
> Question: when will this code execute? I've seen it occasionally during
> benchmarking, but it doesn't seem to happen always.
>
> Proposal: if this code *is* used regularly, we'll need an API so that
> DocumentStore implementations other than Mongo can optimize the query.
>
> Best regards, Julian
>


Re: JCR API implementation transparency

2014-08-25 Thread Tobias Bocanegra
fyi, I created https://issues.apache.org/jira/browse/OAK-2052

On Mon, Aug 25, 2014 at 10:32 PM, Chetan Mehrotra wrote:
> On Tue, Aug 26, 2014 at 10:44 AM, Tobias Bocanegra  wrote:
>> IMO, this should work, even if the value is not a ValueImpl. In this
>> case, it should fall back to the API methods to read the binary.
>
> +1
>
> Chetan Mehrotra


Re: JCR API implementation transparency

2014-08-25 Thread Chetan Mehrotra
On Tue, Aug 26, 2014 at 10:44 AM, Tobias Bocanegra  wrote:
> IMO, this should work, even if the value is not a ValueImpl. In this
> case, it should fall back to the API methods to read the binary.

+1

Chetan Mehrotra


JCR API implementation transparency

2014-08-25 Thread Tobias Bocanegra
Hi,

I'm looking at an issue [0] where "copying" of a JCR value fails because
the source and destination repository implementations are different.

so basically:

s1 = repository1.login(); // remote repository via davex
s2 = repository2.login(); // local oak repository

p1 = s1.getProperty();
n2 = s2.getNode();

n2.setProperty(p1.getName(), p1.getValue());

AFAICT, this usually works, but not for binary values. It eventually fails in:

org.apache.jackrabbit.oak.plugins.value.ValueImpl#getBlob(javax.jcr.Value)

public static Blob getBlob(Value value) {
    checkState(value instanceof ValueImpl);
    return ((ValueImpl) value).getBlob();
}

...because the value is not a ValueImpl but a QValue.

IMO, this should work, even if the value is not a ValueImpl. In this
case, it should fall back to the API methods to read the binary.
WDYT?
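
As a data point, a purely client-side copy can already be made to work
with plain JCR API calls by re-creating the binary through the destination
session's ValueFactory. A minimal sketch under that assumption (this is a
workaround on the calling side, not the proposed fix inside ValueImpl):

import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Property;
import javax.jcr.PropertyType;
import javax.jcr.RepositoryException;
import javax.jcr.Value;

// Copy a single-valued property from a foreign repository; binaries are
// re-created via the destination session's ValueFactory, so no
// implementation-specific Value class (ValueImpl, QValue, ...) is needed.
void copyProperty(Property p1, Node n2) throws RepositoryException {
    Value v = p1.getValue();
    if (v.getType() == PropertyType.BINARY) {
        Binary source = v.getBinary();          // read via the JCR API
        try {
            Binary copy = n2.getSession().getValueFactory()
                    .createBinary(source.getStream());
            n2.setProperty(p1.getName(), copy); // new Binary owned by s2
        } finally {
            source.dispose();
        }
    } else {
        n2.setProperty(p1.getName(), v);
    }
}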

Regards, Toby


[0] https://issues.apache.org/jira/browse/JCRVLT-58


Re: [DISCUSS] supporting faceting in Oak query engine

2014-08-25 Thread Lukas Smith
Aloha,

You should definitely talk to the HippoCMS developers. They forked Jackrabbit
2.x to add faceting as virtual nodes. They ran into some performance issues,
but I am sure they still have valuable feedback on this.

regards,
Lukas Kahwe Smith

> On 25 Aug 2014, at 18:43, Laurie Byrum  wrote:
> 
> Hi Tommaso,
> I am happy to see this thread!
> 
> Questions: 
> Do you expect to want to support hierarchical or pivoted facets soonish?
> If so, does that influence this decision?
> Do you know how ACLs will come into play with your facet implementation?
> If so, does that influence this decision? :-)
> 
> Thanks!
> Laurie
> 
> 
> 
>> On 8/25/14 7:08 AM, "Tommaso Teofili"  wrote:
>> 
>> Hi all,
>> 
>> since this has been asked every now and then [1], and since I think it's a
>> pretty useful and common feature for search engines nowadays, I'd like to
>> discuss introducing facets [2] for the Oak query engine.
>> 
>> Pros: having facets in search results usually helps with filtering (drilling
>> down into) the results before browsing all of them, so the main usage would
>> be for client code.
>> 
>> Impact: probably a change / addition in both the JCR and Oak APIs to support
>> returning something other than "just nodes" (a NodeIterator and a Cursor,
>> respectively).
>> 
>> Right now a couple of ideas on how we could do that come to mind, both
>> based on the approach of having an Oak index for them:
>> 1. a (multivalued) property index for facets, meaning we would store the
>> facets in the repository, so that we would run a query against it to get
>> the facets of an originating query.
>> 2. a dedicated QueryIndex implementation, possibly leveraging Lucene's
>> faceting capabilities, which could "use" the Lucene index we already have,
>> together with a "sidecar" index [3].
>> 
>> What do you think?
>> Regards,
>> Tommaso
>> 
>> [1] :
>> http://markmail.org/search/?q=oak%20faceting#query:oak%20faceting%20list%3
>> Aorg.apache.jackrabbit.oak-dev+page:1+state:facets
>> [2] : http://en.wikipedia.org/wiki/Faceted_search
>> [3] :
>> http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-file
>> s/userguide.html
> 


Re: [DISCUSS] supporting faceting in Oak query engine

2014-08-25 Thread Laurie Byrum
Hi Tommaso,
I am happy to see this thread!

Questions: 
Do you expect to want to support hierarchical or pivoted facets soonish?
If so, does that influence this decision?
Do you know how ACLs will come into play with your facet implementation?
If so, does that influence this decision? :-)

Thanks!
Laurie



On 8/25/14 7:08 AM, "Tommaso Teofili"  wrote:

>Hi all,
>
>since this has been asked every now and then [1], and since I think it's a
>pretty useful and common feature for search engines nowadays, I'd like to
>discuss introducing facets [2] for the Oak query engine.
>
>Pros: having facets in search results usually helps with filtering (drilling
>down into) the results before browsing all of them, so the main usage would
>be for client code.
>
>Impact: probably a change / addition in both the JCR and Oak APIs to support
>returning something other than "just nodes" (a NodeIterator and a Cursor,
>respectively).
>
>Right now a couple of ideas on how we could do that come to mind, both
>based on the approach of having an Oak index for them:
>1. a (multivalued) property index for facets, meaning we would store the
>facets in the repository, so that we would run a query against it to get
>the facets of an originating query.
>2. a dedicated QueryIndex implementation, possibly leveraging Lucene's
>faceting capabilities, which could "use" the Lucene index we already have,
>together with a "sidecar" index [3].
>
>What do you think?
>Regards,
>Tommaso
>
>[1] :
>http://markmail.org/search/?q=oak%20faceting#query:oak%20faceting%20list%3
>Aorg.apache.jackrabbit.oak-dev+page:1+state:facets
>[2] : http://en.wikipedia.org/wiki/Faceted_search
>[3] :
>http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-file
>s/userguide.html



MissingLastRevSeeker

2014-08-25 Thread Julian Reschke

Hi there,

it appears that the MissingLastRevSeeker (oak-core), when run, will be 
very slow on large repos, unless they use a MongoDocumentStore (which 
has a special-cased query).


Question: when will this code execute? I've seen it occasionally during 
benchmarking, but it doesn't seem to happen every time.


Proposal: if this code *is* used regularly, we'll need an API so that 
DocumentStore implementations other than Mongo can optimize the query.


Best regards, Julian


[DISCUSS] supporting faceting in Oak query engine

2014-08-25 Thread Tommaso Teofili
Hi all,

since this has been asked every now and then [1], and since I think it's a
pretty useful and common feature for search engines nowadays, I'd like to
discuss introducing facets [2] for the Oak query engine.

Pros: having facets in search results usually helps with filtering (drilling
down into) the results before browsing all of them, so the main usage would
be for client code.

Impact: probably a change / addition in both the JCR and Oak APIs to support
returning something other than "just nodes" (a NodeIterator and a Cursor,
respectively).

Right now a couple of ideas on how we could do that come to mind, both
based on the approach of having an Oak index for them:
1. a (multivalued) property index for facets, meaning we would store the
facets in the repository, so that we would run a query against it to get
the facets of an originating query.
2. a dedicated QueryIndex implementation, possibly leveraging Lucene's
faceting capabilities, which could "use" the Lucene index we already have,
together with a "sidecar" index [3].

What do you think?
Regards,
Tommaso

[1] :
http://markmail.org/search/?q=oak%20faceting#query:oak%20faceting%20list%3Aorg.apache.jackrabbit.oak-dev+page:1+state:facets
[2] : http://en.wikipedia.org/wiki/Faceted_search
[3] :
http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html


oak-run benchmarks

2014-08-25 Thread Julian Reschke

Hi there,

I'm currently looking at the benchmark behavior for the RDB persistence, 
and I believe I'm seeing degrading performance with each additional run 
of the benchmark.


To make cases like these easier to find, would it make sense to also
report whether there is a trend in benchmark times (for example, the
average ratio of runtimes between subsequent runs)?
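
One simple way to compute such a trend indicator per benchmark could be
(just a sketch, not existing oak-run code):

// Given the measured times of the N runs of a single benchmark, return
// the geometric mean of the ratios between subsequent runs; a value
// clearly above 1.0 would hint at degrading performance across runs.
static double averageRunToRunRatio(double[] runTimes) {
    if (runTimes.length < 2) {
        return 1.0;
    }
    double logSum = 0;
    for (int i = 1; i < runTimes.length; i++) {
        logSum += Math.log(runTimes[i] / runTimes[i - 1]);
    }
    return Math.exp(logSum / (runTimes.length - 1));
}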


Best regards, Julian


buildbot failure in ASF Buildbot on oak-trunk-win7

2014-08-25 Thread buildbot
The Buildbot has detected a new failure on builder oak-trunk-win7 while 
building ASF Buildbot.
Full details are available at:
 http://ci.apache.org/builders/oak-trunk-win7/builds/497

Buildbot URL: http://ci.apache.org/

Buildslave for this Build: bb-win7

Build Reason: scheduler
Build Source Stamp: [branch jackrabbit/oak/trunk] 1620305
Blamelist: mduerig

BUILD FAILED: failed compile

sincerely,
 -The Buildbot





Oak 1.0.5 release plan

2014-08-25 Thread Thomas Mueller
Sorry, wrong subject. It's the Oak 1.0.5 release of course.

On 25/08/14 14:06, "Thomas Mueller"  wrote:

>Hi,
>
>Now that 1.0.4 is out, it's time to plan the next minor release.
>
>I'm planning to cut the 1.0.5 release today in about 30 minutes.
>
>Regards
>Thomas
>



Re: Oak 1.0.4 release plan

2014-08-25 Thread Thomas Mueller
Hi,

Now that 1.0.4 is out, it's time to plan the next minor release.

I'm planning to cut the 1.0.5 release today in about 30 minutes.

Regards
Thomas



buildbot success in ASF Buildbot on oak-trunk-win7

2014-08-25 Thread buildbot
The Buildbot has detected a restored build on builder oak-trunk-win7 while 
building ASF Buildbot.
Full details are available at:
 http://ci.apache.org/builders/oak-trunk-win7/builds/496

Buildbot URL: http://ci.apache.org/

Buildslave for this Build: bb-win7

Build Reason: scheduler
Build Source Stamp: [branch jackrabbit/oak/trunk] 1620287
Blamelist: alexparvulescu

Build succeeded!

sincerely,
 -The Buildbot





Re: NodeStore#checkpoint api reevaluation

2014-08-25 Thread Marcel Reutegger
Hi,

On 22/08/14 16:31, "Alex Parvulescu"  wrote:
>Following OAK-2039 there was a discussion around the current design of the
>#checkpoint apis. [0]
>
>It looks a bit confusing that you can call the apis to create a checkpoint
>and get back a reference but when retrieving it, it might not exist, even
>if the calls are back to back.
>With OAK-2039 I've added some warning logs when a checkpoint cannot be
>created but a ref is still returned, to understand if this is a system
>load
>problem, or something more profound.

what is the reason the SegmentNodeStore does a commitSemaphore.tryAcquire()
instead of a commitSemaphore.acquire() like in SegmentNodeStore.merge()?

>I believe that nobody has any issues with the #retrieve method, all the
>confusion is really about the #checkpoint parts, currently marked as
>'@Nonnull'.
>
>Alternatives mentioned are
> - return null if the checkpoint was not created
> - throw an exception
>
>I vote -0 for the change, I believe that making this more complicated than
>it needs to be (more null checks, or a try/catch) doesn't really benefit
>anybody.

I think we should improve it somehow because I find the current behaviour
quite confusing. The current implementation of SegmentNodeStore.checkpoint()
IMO violates the contract. It may return a string reference to a checkpoint
which was never created and obviously won't be valid for the requested
lifetime.

In my view, a client should be able to detect this in a simple way. Right
now you would have to call retrieve() to find out if checkpoint() actually
worked.

Returning a null value works better if we specify under what conditions
no checkpoint can be created. After all, a client would have to implement
some code in response to a null value. E.g. should it retry later, because
the checkpoint cannot be created when the system is under load? This would
be a good fit if we keep the current implementation in SegmentNodeStore.
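
Client code under that contract would end up looking roughly like this (a
sketch of the calling side only, assuming checkpoint() may return null when
the failure is caused by transient load):

import java.util.concurrent.TimeUnit;
import org.apache.jackrabbit.oak.spi.state.NodeStore;

// Hypothetical caller: retry a few times because the failure may only be
// temporary, then give up explicitly instead of carrying an invalid ref.
String checkpointWithRetry(NodeStore store) throws InterruptedException {
    for (int retries = 0; retries < 3; retries++) {
        String checkpoint = store.checkpoint(TimeUnit.HOURS.toMillis(1));
        if (checkpoint != null) {
            return checkpoint;
        }
        Thread.sleep(1000); // back off before trying again
    }
    throw new IllegalStateException("could not create a checkpoint");
}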

An exception works better if we say an implementation should always be
able to create a checkpoint and only fail if it cannot perform the
operation because of e.g. an underlying IOException.

Regards
 Marcel



Re: NodeStore#checkpoint api reevaluation

2014-08-25 Thread Michael Dürig



On 22.8.14 4:31, Alex Parvulescu wrote:

Hi,

Following OAK-2039 there was a discussion around the current design of the
#checkpoint apis. [0]

It looks a bit confusing that you can call the apis to create a checkpoint
and get back a reference but when retrieving it, it might not exist, even
if the calls are back to back.


Reading the Javadoc carefully, this is to be expected. However, I think
this could be improved, either by making the Javadoc for #checkpoint more
explicit about it or by reflecting it in the return value.


For the latter option, instead of returning null we could also return a
constant value representing a not-available checkpoint. With that, client
code wouldn't need to change but could check the returned value if desired.
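
A minimal sketch of that idea (the constant name is invented for
illustration only):

// Hypothetical sentinel reference, e.g. defined on NodeStore; retrieve()
// would simply return null for it, so callers that ignore the constant
// keep their current behaviour.
static final String CHECKPOINT_NOT_AVAILABLE = "checkpoint-not-available";

// Callers that do care can check the returned value explicitly:
static boolean isValidCheckpoint(String reference) {
    return reference != null && !CHECKPOINT_NOT_AVAILABLE.equals(reference);
}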


Michael


With OAK-2039 I've added some warning logs when a checkpoint cannot be
created but a ref is still returned, to understand if this is a system load
problem, or something more profound.

I believe that nobody has any issues with the #retrieve method, all the
confusion is really about the #checkpoint parts, currently marked as
'@Nonnull'.

Alternatives mentioned are
  - return null if the checkpoint was not created
  - throw an exception

I vote -0 for the change, I believe that making this more complicated than
it needs to be (more null checks, or a try/catch) doesn't really benefit
anybody.

If there are thoughts around how this should change, please feel free to
join in.

best,
alex


[0]
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/state/NodeStore.java#L124