[jira] [Commented] (OAK-7245) Build Jackrabbit Oak #1228 failed

2018-02-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354626#comment-16354626
 ] 

Hudson commented on OAK-7245:
-

Previously failing build now is OK.
 Passed run: [Jackrabbit Oak 
#1231|https://builds.apache.org/job/Jackrabbit%20Oak/1231/] [console 
log|https://builds.apache.org/job/Jackrabbit%20Oak/1231/console]

> Build Jackrabbit Oak #1228 failed
> -
>
> Key: OAK-7245
> URL: https://issues.apache.org/jira/browse/OAK-7245
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: continuous integration
>Reporter: Hudson
>Priority: Major
>
> No description is provided
> The build Jackrabbit Oak #1228 has failed.
> First failed run: [Jackrabbit Oak 
> #1228|https://builds.apache.org/job/Jackrabbit%20Oak/1228/] [console 
> log|https://builds.apache.org/job/Jackrabbit%20Oak/1228/console]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (OAK-7058) oak-run compact reports success even when it was cancelled

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7058:
---
Flagged:   (was: Impediment)

> oak-run compact reports success even when it was cancelled
> --
>
> Key: OAK-7058
> URL: https://issues.apache.org/jira/browse/OAK-7058
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: run, segment-tar
>Reporter: Michael Dürig
>Priority: Major
>  Labels: production, tooling
> Fix For: 1.10
>
>
> When {{oak-run compact}} is cancelled because it runs out of disk space, it 
> sends a corresponding warning to the logs and bails out. However, on the 
> console it still reports success. 
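One way the console output could be tied to the actual outcome is sketched below. This is only an illustration, not oak-run's code: the `CompactReport` class, its `Outcome` enum and the message texts are hypothetical names made up for the example.

```java
// Hypothetical sketch; none of these names exist in oak-run.
class CompactReport {
    enum Outcome { COMPLETED, CANCELLED }

    // Non-zero exit code whenever compaction did not run to completion.
    static int exitCode(Outcome outcome) {
        return outcome == Outcome.COMPLETED ? 0 : 1;
    }

    // Console message derived from the same outcome that is written to the logs.
    static String consoleMessage(Outcome outcome, String reason) {
        if (outcome == Outcome.COMPLETED) {
            return "Compaction succeeded.";
        }
        return "Compaction cancelled: " + reason;
    }
}
```

Deriving both the message and the exit code from a single outcome value keeps the console and the logs from disagreeing.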





[jira] [Updated] (OAK-7058) oak-run compact reports success even when it was cancelled

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7058:
---
Flagged: Impediment

> oak-run compact reports success even when it was cancelled
> --
>
> Key: OAK-7058
> URL: https://issues.apache.org/jira/browse/OAK-7058
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: run, segment-tar
>Reporter: Michael Dürig
>Priority: Major
>  Labels: production, tooling
> Fix For: 1.10
>
>
> When {{oak-run compact}} is cancelled because it runs out of disk space, it 
> sends a corresponding warning to the logs and bails out. However, on the 
> console it still reports success. 





[jira] [Updated] (OAK-5469) TarMK: scaling the content

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5469:
---
Fix Version/s: (was: 1.10)
   (was: 1.9.0)

> TarMK: scaling the content
> --
>
> Key: OAK-5469
> URL: https://issues.apache.org/jira/browse/OAK-5469
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>Priority: Major
>  Labels: scalability
> Attachments: segment-per-path.png
>
>
> Production experience has shown that big repositories are prone to thrashing:
> {quote}
> Monitoring showed a massive level of major page faults, load averages 
> several times the number of cores, system cpu levels well above 50% and 
> extreme levels of IO. As more IOPS was provisioned the instance consumed all 
> available IOPS. The TechOps team reported many TB of read IO per hour and 
> hardly any write IO.
> Investigation revealed that the repository size was just larger than the 
> available RAM on the machine. The instance was running in MMAPED mode and the 
> IO was due to major page faults mapping in and out pages of memory. This was 
> made worse by transparent huge page settings causing huge pages to be mapped 
> proactively on major page faults. Compaction reduced the repository size to 
> less than RAM. The TechOps team now monitor the total tar file size and don't 
> let it exceed the RAM on the machine, scheduling compactions to keep within 
> limits. Since the default to TarMK was to run memory mapped rather than on 
> heap, the JVM had no visibility of the mayhem being caused at OS level.
> {quote}
> This epic is all about improving scalability of the TarMK wrt. the content. 
> Below are some initial points to consider. Let's create issues and link them 
> to this epic as we go.
> * What kind of internal / external monitoring do we need to understand and 
> optimally predict thrashing? Can we monitor the working set (active pages)? 
> The number of segments in the segment cache might be a good starting point.
> * (How) can we reproduce the thrashing (easily enough)? Can we scale it down 
> (i.e. to an instance with less RAM)?
> * What is the impact of transparent huge pages (and switching it off)? How 
> much do we suffer from read amplification? What would be the impact of not 
> memory mapping but instead increasing the size of the segment buffer 
> accordingly? Both approaches aim at having finer grained control over the 
> data actually being loaded into RAM.
> * What other OS level tweaks should / can we look at? 
> * Can we reduce the working set by keeping it more compact? E.g. running 
> GC/compaction, reducing read amplification (see above), improving 
> de-duplication of values, storing values more efficiently (e.g. dates and 
> booleans). Can we compress buffers (e.g. segments) on the fly?
> * How do we test with big repositories?
>   * What is a big repository? (Potential target: 100 GB segment store - 500M 
> nodes, TBC)
>   * What to measure (indicators of size): size on disk (after compaction), 
> number of JCR nodes, number of node records (reachable vs. waste)
>   * How to measure?
> * {{oak-run debug}} (needs improvements for better scalability)
> * one-line tool to provide all the info?
>   * How to obtain big repositories (generate or re-use existing)?
>   * What to analyze / monitor / debug?
> * Possible limits: number of nodes (relative to RAM) for which thrashing 
> starts to occur, max. number of direct children, max. concurrent requests 
> during online garbage collection.
> * Platform monitoring: 
>   * basic: disc size, IO, CPU, memory
>   * Assess impact of hardware upgrades on performance. E.g. what impact 
> does doubling RAM/IO/CPU have on our test scenarios.
>   * in depth: page faults, writes / reads per process, working set of 
> nodes, commit statistics, incoming requests vs Oak operations, other hiccups
>   * Tools: [Ganglia|http://ganglia.info/], 
> [jHiccup|https://github.com/giltene/jHiccup], 
> [AppDynamics|https://www.appdynamics.com/]





[jira] [Updated] (OAK-5468) Ease TarMK Operations

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5468:
---
Fix Version/s: (was: 1.10)
   (was: 1.9.0)

> Ease TarMK Operations
> -
>
> Key: OAK-5468
> URL: https://issues.apache.org/jira/browse/OAK-5468
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>Priority: Major
>  Labels: management, monitoring, operations, tooling
>
> h2. Ease of TarMK Operations
> This epic is all about simplifying the operational aspects of the TarMK. 
> Broadly this can be broken down into the following three topics.
> h3. Monitoring
> * We need to improve monitoring for system load and health. It should be easy 
> for operators to figure out which parts of the TarMK are within safe bounds 
> and which are not.
> * It should be easy to diagnose failures and pinpoint their root cause. It 
> should be evident if and how a failure can be fixed by the operator. 
> h3. Management
> * Management tasks should be easy to use, clear and safe. It should be 
> evident how to achieve a certain task, what it means to execute it and what 
> its parameters mean (discoverability). Executing a task should not cause harm 
> to the system because the system is not in the right state (e.g. running 
> restore concurrently to backup should be safe). 
> h3. Tooling
> * We need better tooling for diagnosing systems, e.g. analysis of file stores 
> (what content, how much content, distribution over space and time, 
> reachability, retention time, garbage, etc.), both online and offline (i.e. 
> post mortem).
> h2. Individual improvements
> Below is a list of items to address in no specific order. Let's start 
> extracting them into individual issues linked to this epic as we start 
> tackling this. 
> h3. Monitoring
> * Throughput (e.g. time to commit, time to save, etc.)
> * Thrashing (onset thereof)
> * SNFE (transient vs. catastrophic)
> * DSGC
> * FileStore (e.g. size on disk, #tar files, #segments, #nodes, #properties, 
> etc.)
> * Cold standby (progress, liveliness, latency, etc.)
> * ...
> h3. Management
> * Revisit backup/restore (OAK-5103, OAK-4866)
> * Coordination of management operations (ability to run conditionally, 
> prevent them from running concurrently, etc.)
> h3. Tooling
> * Progress monitor {{oak-run compact}}
> * Crash recovery for {{oak-run compact}} (e.g. run cleanup only to remove 
> garbage left by prior crash)
> * Bring {{oak-run check}} up to date. Address scalability and performance 
> issues. Include more useful statistics (e.g. node types, child node lists, 
> content distribution, etc.)
> * Changes over time
> * Consolidation of various (unversioned) scripts into oak-run like 'node 
> count script', 'node remove script'.
> * Allow connecting tools to a running instance.
> * Snapshotting support: restartable stats collection (snapshot at certain 
> revision, diff to collect extras)
> * "Friendly" output formats that can be easily used by other tools (e.g. Unix 
> tools, Kibana, etc.)
> * Proper usage of stdin and stdout
> * Proper exit codes
> * The current gap in tooling is around the idea of healing a repository plagued 
> with SNFEs: bridge the gap between {{oak-run check}} and 'oak console node 
> count script', and provide options to plug the holes to restore the repository 
> to a consistent state. One idea would be to complement rolling back the 
> segment store to the last good revision with rolling it forward to a new and 
> fixed good revision. The simplest way of fixing is to just replace 
> unreadable items with empty ones (i.e. "plugging the holes"). From there one 
> could diff this new fixed revision against the last good revision to assess 
> the damage and see what else needs fixing (e.g. to regain consistency wrt. 
> JCR). 
> * Classification of tools between development / research / experimental and 
> production (customer facing). The latter need a different level of support, 
> maintenance, QE, documentation etc. Possibly mark via documentation which is 
> which. 
> * Group commands from oak-run in namespaces. Assign a different namespace to 
> each persistence implementation in Oak. Let every implementation parse its 
> own commands. Move commands closer to their implementation and relieve 
> oak-run from code bloat. See OAK-5437 for further details.





[jira] [Updated] (OAK-7207) Define porcelain and plumbing tools for the Segment Store

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7207:
---
Labels: production tooling  (was: tooling)

> Define porcelain and plumbing tools for the Segment Store
> -
>
> Key: OAK-7207
> URL: https://issues.apache.org/jira/browse/OAK-7207
> Project: Jackrabbit Oak
>  Issue Type: Wish
>  Components: segment-tar
>Reporter: Francesco Mari
>Priority: Major
>  Labels: production, tooling
> Fix For: 1.10
>
>
> In a spirit similar to 
> [Git|https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain]'s, 
> it would be beneficial to create porcelain and plumbing tooling for the 
> Segment Store.
> Plumbing tools expose lower level operations on the Segment Store. Knowledge 
> about the internals of the Segment Store is necessary to understand how 
> plumbing tools work. Plumbing tools communicate via a command line interface. 
> It must be easy to invoke plumbing tools from other tools (possibly by 
> shelling out). The output of plumbing tools must be easy to consume 
> programmatically.
> Porcelain tools are written for human consumption. Their interface must be 
> user-friendly and should be backwards compatible as far as possible. 
> Porcelain tools use plumbing ones to implement their features. It should be 
> possible to use the same porcelain tools with different versions of the 
> plumbing tools, as long as the plumbing tools "speak" through an interface 
> that remains sufficiently compatible.
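The split described above can be illustrated with a small sketch. It assumes a made-up plumbing output format of `key<TAB>value` lines; the class and method names are hypothetical and the real tools would run as separate commands rather than as two methods in one class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the plumbing/porcelain split; not part of Oak.
class PlumbingPorcelain {

    // Plumbing: emits stable, machine-readable "key<TAB>value" lines.
    static String plumbingStats(long segments, long bytes) {
        return "segments\t" + segments + "\n" + "bytes\t" + bytes + "\n";
    }

    // Porcelain: parses the plumbing output and renders it for humans.
    // Unknown keys are ignored, so newer plumbing versions that add
    // fields do not break an older porcelain tool.
    static String porcelainSummary(String plumbingOutput) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : plumbingOutput.split("\n")) {
            String[] keyValue = line.split("\t", 2);
            if (keyValue.length == 2) {
                fields.put(keyValue[0], keyValue[1]);
            }
        }
        return fields.getOrDefault("segments", "?") + " segments, "
                + fields.getOrDefault("bytes", "?") + " bytes";
    }
}
```

Ignoring unknown keys is one possible way to keep the interface between the two layers "sufficiently compatible" across versions.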





[jira] [Updated] (OAK-7112) Update documentation for cold standby

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7112:
---
Labels: documentation  (was: )

> Update documentation for cold standby
> -
>
> Key: OAK-7112
> URL: https://issues.apache.org/jira/browse/OAK-7112
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: doc, segment-tar, tarmk-standby
>Reporter: Andrei Dulceanu
>Assignee: Andrei Dulceanu
>Priority: Major
>  Labels: documentation
> Fix For: 1.9.0, 1.10, 1.8.3
>
>
> Improve monitoring section of cold standby in {{oak-doc}} to include missing 
> MBean screenshots.
> [~mduerig], [~frm]: How about adding a *Benchmarking* section to the cold 
> standby page, briefly covering ways to use the new {{Oak-Segment-Tar-Cold}} 
> fixture and how to run {{ScalabilityStandbySuite}} on top of it?





[jira] [Updated] (OAK-7207) Define porcelain and plumbing tools for the Segment Store

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7207:
---
Labels: tooling  (was: )

> Define porcelain and plumbing tools for the Segment Store
> -
>
> Key: OAK-7207
> URL: https://issues.apache.org/jira/browse/OAK-7207
> Project: Jackrabbit Oak
>  Issue Type: Wish
>  Components: segment-tar
>Reporter: Francesco Mari
>Priority: Major
>  Labels: tooling
> Fix For: 1.10
>
>
> In a spirit similar to 
> [Git|https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain]'s, 
> it would be beneficial to create porcelain and plumbing tooling for the 
> Segment Store.
> Plumbing tools expose lower level operations on the Segment Store. Knowledge 
> about the internals of the Segment Store is necessary to understand how 
> plumbing tools work. Plumbing tools communicate via a command line interface. 
> It must be easy to invoke plumbing tools from other tools (possibly by 
> shelling out). The output of plumbing tools must be easy to consume 
> programmatically.
> Porcelain tools are written for human consumption. Their interface must be 
> user-friendly and should be backwards compatible as far as possible. 
> Porcelain tools use plumbing ones to implement their features. It should be 
> possible to use the same porcelain tools with different versions of the 
> plumbing tools, as long as the plumbing tools "speak" through an interface 
> that remains sufficiently compatible.





[jira] [Updated] (OAK-6312) Unify NodeStore/DataStore configurations

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-6312:
---
Labels: configuration production  (was: )

> Unify NodeStore/DataStore configurations
> 
>
> Key: OAK-6312
> URL: https://issues.apache.org/jira/browse/OAK-6312
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob, blob-plugins, composite, documentmk, rdbmk, 
> segment-tar
>Reporter: Arek Kita
>Priority: Major
>  Labels: configuration, production
>
> I've noticed recently that with many different NodeStore
> implementations (Segment, Document, Composite), DataStore
> implementations (File, S3, Azure) and some composite ones
> (Hierarchical, Federated), it
> becomes more and more difficult to set up everything correctly and to
> know the current persistence state of a repository (especially
> with pretty aged repos). The factory code/required options are more complex, 
> not only from a user's perspective but also from a maintenance point of view.
> We should have the same means of *describing* the layout of an Oak repository, no 
> matter whether it is a simple or a more layered/composite instance.
> Some work has already been done in scope of OAK-6210 so I guess we have good 
> foundations to continue working in that direction.
> /cc [~mattvryan], [~chetanm]





[jira] [Updated] (OAK-6707) TarWriter.close() must not throw an exception on subsequent invocations

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-6707:
---
Labels: technical_debt  (was: )

> TarWriter.close() must not throw an exception on subsequent invocations
> ---
>
> Key: OAK-6707
> URL: https://issues.apache.org/jira/browse/OAK-6707
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Michael Dürig
>Priority: Minor
>  Labels: technical_debt
> Fix For: 1.10
>
>
> Invoking TarWriter.close() on an already closed writer throws an {{ISE}}. 
> According to the general contract this is not allowed:
> {code}
> * Closes this stream and releases any system resources associated
> * with it. If the stream is already closed then invoking this
> * method has no effect.
> {code}
> We should adjust the behaviour of that method accordingly. 
> Failing to comply with that general contract causes {{TarWriter}} instances 
> to fail in try-with-resources statements when multiple wrapped streams are involved.
> Consider 
> {code}
> // Resources are closed in reverse order: out.close() also closes writer,
> // so writer.close() ends up being invoked a second time.
> try (
>     StringWriter string = new StringWriter();
>     PrintWriter writer = new PrintWriter(string);
>     WriterOutputStream out = new WriterOutputStream(writer, Charsets.UTF_8))
> {
>     dumpHeader(out);
>     writer.println("");
>     dumpHex(out);
>     writer.println("");
>     return string.toString();
> }
> {code}
> This code would cause exceptions to be thrown if e.g. the 
> {{PrintWriter.close}} method were not idempotent. 
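A minimal sketch of an idempotent `close()` follows. It is not the actual `TarWriter` code: `IdempotentWriter` and its `releaseCount` field are hypothetical stand-ins, and the counter merely marks the place where a real writer would flush and release its file handles.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a writer such as TarWriter; not the Oak class.
class IdempotentWriter implements Closeable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private int releaseCount = 0;

    @Override
    public void close() {
        // Only the first call performs the cleanup; later calls have no effect,
        // as required by the java.io.Closeable contract.
        if (!closed.compareAndSet(false, true)) {
            return;
        }
        releaseCount++; // placeholder for flushing and releasing resources
    }

    boolean isClosed() {
        return closed.get();
    }

    int releaseCount() {
        return releaseCount;
    }
}
```

With a guard like this, a second `close()` triggered by try-with-resources returns silently instead of throwing.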





[jira] [Updated] (OAK-7043) Collect SegmentStore stats as part of status zip

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7043:
---
Labels: monitoring production  (was: )

> Collect SegmentStore stats as part of status zip
> 
>
> Key: OAK-7043
> URL: https://issues.apache.org/jira/browse/OAK-7043
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segment-tar
>Reporter: Chetan Mehrotra
>Priority: Major
>  Labels: monitoring, production
>
> Often, while investigating an issue, we ask the customer to provide the size 
> of the segment store and at times a listing of the segment store directory. It would be 
> useful to have an InventoryPrinter for the SegmentStore which could include
> * Size of segment store 
> * Listing of segment store directory
> * Possibly tail of journal.log
> * Possibly some stats/info from index files stored in tar files
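The first two items in the list above could be gathered roughly as follows. This is a sketch using only `java.nio.file`; the `SegmentStoreReport` class is a hypothetical name, and wiring it into an actual InventoryPrinter is left out because that depends on the hosting framework.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; prints a directory listing and the total size.
class SegmentStoreReport {

    // Writes one line per regular file plus a total, and returns the total in bytes.
    static long print(Path segmentStoreDir, PrintWriter out) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(segmentStoreDir)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    files.add(entry);
                }
            }
        }
        Collections.sort(files); // stable, name-ordered listing

        long total = 0;
        for (Path file : files) {
            long size = Files.size(file);
            total += size;
            out.printf("%12d  %s%n", size, file.getFileName());
        }
        out.printf("Total size: %d bytes%n", total);
        out.flush();
        return total;
    }
}
```

The tail of `journal.log` and statistics from the tar index files would need additional code that is not sketched here.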





[jira] [Updated] (OAK-7188) guava: ListenableFuture.transform() changes to transformAsync in version 20

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7188:
---
Labels: technical_debt  (was: )

> guava: ListenableFuture.transform() changes to transformAsync in version 20
> ---
>
> Key: OAK-7188
> URL: https://issues.apache.org/jira/browse/OAK-7188
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: segment-tar
>Reporter: Julian Reschke
>Priority: Major
>  Labels: technical_debt
> Attachments: OAK-7188.diff, OAK-7188.diff
>
>
> See 
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/util/concurrent/Futures.html#transform(com.google.common.util.concurrent.ListenableFuture,%20com.google.common.util.concurrent.AsyncFunction)





[jira] [Updated] (OAK-1905) SegmentMK: Arch segment(s)

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-1905:
---
Fix Version/s: 1.10

> SegmentMK: Arch segment(s)
> --
>
> Key: OAK-1905
> URL: https://issues.apache.org/jira/browse/OAK-1905
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Jukka Zitting
>Priority: Minor
>  Labels: perfomance, scalability
> Fix For: 1.10
>
>
> There are a lot of constants and other commonly occurring names, values and 
> other data in a typical repository. To optimize storage space and access 
> speed, it would be useful to place such data in one or more constant "arch 
> segments" that are always cached in memory.





[jira] [Updated] (OAK-4103) Replace journal.log with an in place journal

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-4103:
---
Fix Version/s: 1.10

> Replace journal.log with an in place journal
> 
>
> Key: OAK-4103
> URL: https://issues.apache.org/jira/browse/OAK-4103
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Andrei Dulceanu
>Priority: Minor
>  Labels: resilience
> Fix For: 1.10
>
>
> Instead of writing the current head revision to the {{journal.log}} file we 
> could make it an integral part of the node states: as OAK-3804 demonstrates 
> we already have very good heuristics to reconstruct a lost journal. If we add 
> the right annotations to the root node states this could replace the current 
> approach. The latter is problematic as it relies on the flush thread properly 
> and timely updating {{journal.log}}. See e.g. OAK-3303. 





[jira] [Updated] (OAK-6836) OnRC report

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-6836:
---
Fix Version/s: 1.10

> OnRC report
> ---
>
> Key: OAK-6836
> URL: https://issues.apache.org/jira/browse/OAK-6836
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Valentin Olteanu
>Priority: Minor
>  Labels: production
> Fix For: 1.10
>
> Attachments: gcreport.png
>
>
> Currently, the information regarding an online revision cleanup execution is 
> scattered across multiple log lines and partially available in the attributes 
> of the {{SegmentRevisionGarbageCollection}} MBean. 
> While useful for debugging, this is hard to grasp for users, who need to 
> understand the full process to be able to read it.
> The idea would be to create a "report" with all the details of an execution 
> and output it at the end - write to log, but also store it in the MBean, from 
> where it can be consumed by monitoring and health checks. 
> In the MBean, this would replace the _Last*_ attributes.
> In the logs, this could replace all the intermediary logs (switch them to 
> DEBUG).





[jira] [Updated] (OAK-7234) Check for outdated journal at startup

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7234:
---
Fix Version/s: 1.10

> Check for outdated journal at startup
> -
>
> Key: OAK-7234
> URL: https://issues.apache.org/jira/browse/OAK-7234
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Michael Dürig
>Priority: Minor
>  Labels: resilience, tooling
> Fix For: 1.10
>
>
> To prevent accidentally branching the repository when the {{journal.log}} 
> became outdated (e.g. OAK-3702) we could add an additional safety feature 
> which would prevent the repository from starting in such cases. There are a 
> couple of concerns to address:
>  * What kind of tooling / guidance do we need to provide to recover should 
> such a situation be detected?
>  * How do we detect the {{journal.log}} being outdated?
>  * How do we prevent false positives?
>  * How do we deal with situations where the {{journal.log}} modifications are 
> intended (e.g. by tools, or manual interventions)?





[jira] [Updated] (OAK-6707) TarWriter.close() must not throw an exception on subsequent invocations

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-6707:
---
Fix Version/s: 1.10

> TarWriter.close() must not throw an exception on subsequent invocations
> ---
>
> Key: OAK-6707
> URL: https://issues.apache.org/jira/browse/OAK-6707
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Michael Dürig
>Priority: Minor
> Fix For: 1.10
>
>
> Invoking TarWriter.close() on an already closed writer throws an {{ISE}}. 
> According to the general contract this is not allowed:
> {code}
> * Closes this stream and releases any system resources associated
> * with it. If the stream is already closed then invoking this
> * method has no effect.
> {code}
> We should adjust the behaviour of that method accordingly. 
> Failing to comply with that general contract causes {{TarWriter}} instances 
> to fail in try-with-resources statements when multiple wrapped streams are involved.
> Consider 
> {code}
> // Resources are closed in reverse order: out.close() also closes writer,
> // so writer.close() ends up being invoked a second time.
> try (
>     StringWriter string = new StringWriter();
>     PrintWriter writer = new PrintWriter(string);
>     WriterOutputStream out = new WriterOutputStream(writer, Charsets.UTF_8))
> {
>     dumpHeader(out);
>     writer.println("");
>     dumpHex(out);
>     writer.println("");
>     return string.toString();
> }
> {code}
> This code would cause exceptions to be thrown if e.g. the 
> {{PrintWriter.close}} method were not idempotent. 





[jira] [Updated] (OAK-7057) Segment.toString: Record table should include an index into the hexdump

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7057:
---
Fix Version/s: 1.10

> Segment.toString: Record table should include an index into the hexdump
> ---
>
> Key: OAK-7057
> URL: https://issues.apache.org/jira/browse/OAK-7057
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Michael Dürig
>Priority: Minor
>  Labels: tooling
> Fix For: 1.10
>
>
> Currently the Segment dump created in {{Segment.toString}} includes a list of 
> records with their offsets. However these offsets do not match the ones in the 
> subsequent raw byte dump of the segment. We should add raw offsets to the 
> list of records so finding the actual data that belongs to a record doesn't 
> involve manually fiddling with logical / physical offset translation. 





[jira] [Updated] (OAK-5655) TarMK: Analyse locality of reference

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5655:
---
Fix Version/s: 1.10

> TarMK: Analyse locality of reference 
> -
>
> Key: OAK-5655
> URL: https://issues.apache.org/jira/browse/OAK-5655
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: segment-tar
>Reporter: Michael Dürig
>Priority: Major
>  Labels: scalability
> Fix For: 1.10
>
> Attachments: compaction-time-vs-reposize.m, 
> compaction-time-vs.reposize.png, data00053a.tar-reads.png, offrc.jfr, 
> segment-per-path-compacted-nocache.png, 
> segment-per-path-compacted-nostringcache.png, segment-per-path-compacted.png, 
> segment-per-path.png
>
>
> We need to better understand the locality aspects of content stored in TarMK: 
> * How is related content spread over segments?
> * What content do we consider related? 
> * How does locality of related content develop over time when changes are 
> applied?
> * What changes do we consider typical?
> * What is the impact of compaction on locality? 
> * What is the impact of the deduplication caches on locality (during normal 
> operation and during compaction)?
> * How well are checkpoints deduplicated? Can we monitor this online?
> * ...





[jira] [Updated] (OAK-6891) Executions of background threads might pile up

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-6891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-6891:
---
Fix Version/s: 1.10

> Executions of background threads might pile up
> --
>
> Key: OAK-6891
> URL: https://issues.apache.org/jira/browse/OAK-6891
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Francesco Mari
>Priority: Major
>  Labels: production
> Fix For: 1.10
>
> Attachments: example.txt
>
>
> The background threads used in {{FileStore}} are implemented by wrapping 
> {{Runnable}} instances in {{SafeRunnable}}, and by handing the 
> {{SafeRunnable}} instances over to a {{ScheduledExecutorService}}. 
> The documentation of {{ScheduledExecutorService#scheduleAtFixedRate}} states 
> that "if any execution of a task takes longer than its period, then 
> subsequent executions may start late, but will not concurrently execute". 
> This means that if an execution is delayed, the piled up executions might 
> fire in rapid succession.
> This way of running the periodic background threads might not be ideal. For 
> example, it doesn't make much sense to flush the File Store five times in a 
> row. On the other hand, if the background tasks are coded with this caveat in 
> mind, this issue might not be a problem at all. For example, flushing the 
> File Store five times in a row might not be a problem if many of those 
> executions don't do much and return quickly.
> Tasks piling up might be a problem when it comes to releasing the resources 
> associated with the {{FileStore}} in a responsive way. Since the 
> {{ScheduledExecutorService}} is gracefully shut down, it might take some time 
> before all the scheduled background tasks are processed and the 
> {{ScheduledExecutorService}} is ready to be terminated.
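One possible way to make a background task tolerant of piled-up executions is to let it skip runs that start too soon after the previous one finished. The sketch below is an assumption-laden illustration, not the `FileStore` or `SafeRunnable` code: `CoalescingRunnable`, its minimum interval and the injectable clock are hypothetical.

```java
import java.util.function.LongSupplier;

// Hypothetical wrapper that drops executions fired in rapid succession.
class CoalescingRunnable implements Runnable {
    private final Runnable delegate;
    private final long minIntervalNanos;
    private final LongSupplier clock; // e.g. System::nanoTime in production
    private boolean hasRun = false;
    private long lastFinished;

    CoalescingRunnable(Runnable delegate, long minIntervalNanos, LongSupplier clock) {
        this.delegate = delegate;
        this.minIntervalNanos = minIntervalNanos;
        this.clock = clock;
    }

    @Override
    public synchronized void run() {
        long now = clock.getAsLong();
        if (hasRun && now - lastFinished < minIntervalNanos) {
            return; // a run just finished; treat this one as a piled-up duplicate
        }
        try {
            delegate.run();
        } finally {
            hasRun = true;
            lastFinished = clock.getAsLong();
        }
    }
}
```

A simpler alternative, where it fits the task, is `ScheduledExecutorService#scheduleWithFixedDelay`, which measures the delay from the end of one execution to the start of the next, so executions cannot accumulate in the first place.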





[jira] [Updated] (OAK-7058) oak-run compact reports success even when it was cancelled

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7058:
---
Fix Version/s: 1.10

> oak-run compact reports success even when it was cancelled
> --
>
> Key: OAK-7058
> URL: https://issues.apache.org/jira/browse/OAK-7058
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: run, segment-tar
>Reporter: Michael Dürig
>Priority: Major
>  Labels: production, tooling
> Fix For: 1.10
>
>
> When {{oak-run compact}} gets cancelled because it is running out of disk 
> space, it logs a corresponding warning and bails out. However, on the 
> console it still reports success. 





[jira] [Updated] (OAK-7207) Define porcelain and plumbing tools for the Segment Store

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7207:
---
Fix Version/s: 1.10

> Define porcelain and plumbing tools for the Segment Store
> -
>
> Key: OAK-7207
> URL: https://issues.apache.org/jira/browse/OAK-7207
> Project: Jackrabbit Oak
>  Issue Type: Wish
>  Components: segment-tar
>Reporter: Francesco Mari
>Priority: Major
> Fix For: 1.10
>
>
> In a spirit similar to 
> [Git|https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain]'s, 
> it would be beneficial to create porcelain and plumbing tooling for the 
> Segment Store.
> Plumbing tools expose lower level operations on the Segment Store. Knowledge 
> about the internals of the Segment Store is necessary to understand how 
> plumbing tools work. Plumbing tools communicate via a command line interface. 
> It must be easy to invoke plumbing tools from other tools (possibly by 
> shelling out). The output of plumbing tools must be easy to consume 
> programmatically.
> Porcelain tools are written for human consumption. Their interface must be 
> user-friendly and should be as backwards compatible as possible. 
> Porcelain tools use plumbing ones to implement their features. It should be 
> possible to use the same porcelain tools with different versions of the 
> plumbing tools, as long as the plumbing tools "speak" through an interface 
> that remains sufficiently compatible.





[jira] [Updated] (OAK-6919) SegmentCache might introduce unwanted memory references to SegmentId instances

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-6919:
---
Fix Version/s: 1.10

> SegmentCache might introduce unwanted memory references to SegmentId instances
> --
>
> Key: OAK-6919
> URL: https://issues.apache.org/jira/browse/OAK-6919
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
>Priority: Major
> Fix For: 1.10
>
>
> {{SegmentCache}} contains, through the underlying Guava cache, hard 
> references to both {{SegmentId}} and {{Segment}} instances. Thus, 
> {{SegmentCache}} contributes to the computation of in-memory references that, 
> in turn, constitute the root references of the garbage collection algorithm.
> Further investigations are needed to assess this statement but, if 
> {{SegmentCache}} is proved to be problematic, there are some possible 
> solutions.
> For example, {{SegmentCache}} might be reworked to store references to 
> MSB/LSB pairs as keys, instead of to {{SegmentId}} instances. Moreover, 
> instead of referencing {{Segment}} instances as values, {{SegmentCache}} 
> might hold references to their underlying {{ByteBuffer}}. With these changes 
> in place, {{SegmentCache}} would not interfere with {{SegmentTracker}} and 
> the garbage collection algorithm.
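
The MSB/LSB idea above can be sketched as a small value-based key class. This is only an illustration under assumptions: {{SegmentKey}} is a hypothetical name, not an existing Oak class, and how {{SegmentCache}} would actually use such a key is not specified in the issue.

```java
import java.util.UUID;

/** Hypothetical cache key: identifies a segment by value, without referencing a SegmentId. */
final class SegmentKey {
    private final long msb;
    private final long lsb;

    SegmentKey(long msb, long lsb) {
        this.msb = msb;
        this.lsb = lsb;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SegmentKey)) {
            return false;
        }
        SegmentKey that = (SegmentKey) other;
        return msb == that.msb && lsb == that.lsb;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(msb) + Long.hashCode(lsb);
    }

    @Override
    public String toString() {
        // Same textual form as the UUID built from the two halves.
        return new UUID(msb, lsb).toString();
    }
}
```

Because equality is based on the two long values only, a cache keyed this way holds no reference to any {{SegmentId}} instance.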





[jira] [Updated] (OAK-5360) Cancellation of gc should be reflected by RevisionGC.getRevisionGCStatus()

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5360:
---
Fix Version/s: 1.10

> Cancellation of gc should be reflected by RevisionGC.getRevisionGCStatus()
> --
>
> Key: OAK-5360
> URL: https://issues.apache.org/jira/browse/OAK-5360
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>Priority: Major
>  Labels: gc, management, monitoring, production
> Fix For: 1.10
>
>
> Currently, when a garbage collection cycle is cancelled from "within" (i.e. 
> through {{CancelCompactionSupplier}}), this is not reflected through 
> {{RevisionGC.getRevisionGCStatus()}} but rather reported as a successful run. 
> We should change this and return a failure result indicating the cancellation, 
> so downstream consumers get a proper indication of whether and which gc runs 
> actually succeeded. 
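
The requested behaviour can be sketched as a result type that keeps cancellation apart from success. This is a rough illustration, not the Oak API: {{GcRun}}, its {{Status}} enum and the {{BooleanSupplier}} standing in for the cancellation check are all hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: report a cancelled gc run as its own outcome, not as success. */
final class GcRun {
    enum Status { SUCCESS, CANCELLED, FAILURE }

    /**
     * Runs the given compaction and derives the status from what happened.
     * The 'cancelled' supplier stands in for whatever signals cancellation from within.
     */
    static Status run(Runnable compaction, BooleanSupplier cancelled) {
        try {
            compaction.run();
        } catch (RuntimeException e) {
            return Status.FAILURE;
        }
        // A run that returned normally but was cancelled must not count as successful.
        return cancelled.getAsBoolean() ? Status.CANCELLED : Status.SUCCESS;
    }
}
```

Downstream consumers can then distinguish the three outcomes instead of treating every normal return as a successful gc run.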





[jira] [Updated] (OAK-6584) Add tooling API

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-6584:
---
Fix Version/s: 1.10

> Add tooling API
> ---
>
> Key: OAK-6584
> URL: https://issues.apache.org/jira/browse/OAK-6584
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>Priority: Major
>  Labels: tooling
> Fix For: 1.10
>
>
> h3. Current situation
> Current segment store related tools are implemented ad-hoc by potentially 
> relying on internal implementation details of Oak Segment Tar. This makes 
> those tools less useful, portable, stable and potentially applicable than 
> they should be.
> h3. Goal
> Provide a common and sufficiently stable Oak Tooling API for implementing 
> segment store related tools. The API should be independent of Oak and not 
> available for normal production use of Oak. Specifically it should not be 
> possible to use it to implement production features and production features must 
> not rely on it. It must be possible to implement the Oak Tooling API in Oak 
> 1.8 and it should be possible for Oak 1.6.
> h3. Typical use cases
> * Query the number of nodes / properties / values in a given path satisfying 
> some criteria
> * Aggregate a certain value on queries like the above
> * Calculate size of the content / size on disk
> * Analyse changes. E.g. how many binaries bigger than a certain threshold 
> were added / removed between two given revisions. What is the sum of their 
> sizes?
> * Analyse locality: measure of locality of node states. Incident plots (See 
> https://issues.apache.org/jira/browse/OAK-5655?focusedCommentId=15865973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15865973).
> * Analyse level of deduplication (e.g. of checkpoint) 
> h3. Validation
> Reimplement [Script Oak|https://github.com/mduerig/script-oak] on top of the 
> tooling API. 
> h3. API draft
> * Whiteboard shot of the [API 
> entities|https://wiki.apache.org/jackrabbit/Oakathon%20August%202017?action=AttachFile&do=view&target=IMG_20170822_163256.jpg]
>  identified initially.
> * Further [drafting of the API|https://github.com/mduerig/oak-tooling-api] 
> takes place on GitHub for now. We'll move to the Apache SVN as soon as it is 
> considered mature enough and we have a consensus on where best to move it. 





[jira] [Updated] (OAK-4994) Implement additional record types

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-4994:
---
Fix Version/s: 1.10

> Implement additional record types
> -
>
> Key: OAK-4994
> URL: https://issues.apache.org/jira/browse/OAK-4994
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Priority: Major
>  Labels: tooling
> Fix For: 1.10
>
>
> The records written in the segment store should be augmented with additional 
> types. In OAK-2498 the following additional types were identified:
> - List of property names. A list of strings, where every string is a property 
> name, is referenced by the template record.
> - List of lists of values. This list is pointed to by the node record and 
> contains the values for single\- and multi\- value properties of that node. 
> The double indirection is needed to support multi-value properties.
> - Map from string to node. This map is referenced by the template and 
> represents the child relationship between nodes.
> - Super root. This is a marker type identifying top-level records for the 
> repository super-roots.
> Just adding these types doesn't improve the situation for the segment store, 
> though. Bucket and block records are not easily parseable because they have a 
> variable length and their size is not specified in the record value itself. 
> For record types to be used effectively, the way we serialize certain kinds of 
> data has to be reviewed for further improvements.





[jira] [Updated] (OAK-5635) Revisit FileStoreStats mbean stats format

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5635:
---
Fix Version/s: 1.10

> Revisit FileStoreStats mbean stats format
> -
>
> Key: OAK-5635
> URL: https://issues.apache.org/jira/browse/OAK-5635
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Alex Deparvu
>Assignee: Michael Dürig
>Priority: Major
>  Labels: monitoring
> Fix For: 1.10
>
>
> This is a bigger refactoring item to revisit the format of the exposed data, 
> moving towards a more machine-consumable format.





[jira] [Updated] (OAK-6397) Move record implementations to their own package

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-6397:
---
Labels: technical_debt  (was: )

> Move record implementations to their own package
> 
>
> Key: OAK-6397
> URL: https://issues.apache.org/jira/browse/OAK-6397
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
>Priority: Major
>  Labels: technical_debt
>
> Given the work done for OAK-6378, it is now possible to move the record 
> implementations to their own package. These implementations can be built 
> on top of the {{SegmentReader}}, {{SegmentWriter}} and {{BlobStore}} 
> interfaces, and detached from other implementation classes in the 
> {{o.a.j.o.segment}} package. I have already started working in [this 
> branch|https://github.com/francescomari/jackrabbit-oak/tree/record-package] 
> in GitHub.





[jira] [Updated] (OAK-5860) Compressed segments

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5860:
---
Fix Version/s: 1.10

> Compressed segments
> ---
>
> Key: OAK-5860
> URL: https://issues.apache.org/jira/browse/OAK-5860
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Andrei Dulceanu
>Priority: Major
>  Labels: scalability
> Fix For: 1.10
>
>
> It would be interesting to see the effect of compressing the segments within 
> the tar files with a sufficiently effective and performant compression 
> algorithm:
> * Can we increase overall throughput by trading CPU for IO?
> * Can we scale to bigger repositories (in number of nodes) by squeezing in 
> more segments per MB and thus pushing out onset of thrashing?
> * What would be a good compression algorithm/library?
> * Can/should we make this optional? 
> * Migration and compatibility issues?
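
To get a first feeling for the CPU versus IO trade-off, a compression round trip can be measured in isolation. The sketch below uses the JDK's built-in DEFLATE implementation purely as a stand-in; the issue does not settle on an algorithm or library, and {{SegmentCompression}} is a hypothetical name, not part of Oak.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: compress and decompress a segment's bytes with java.util.zip. */
final class SegmentCompression {

    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED); // favour throughput over ratio
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Timing {{compress}} and {{decompress}} on real segment data, and comparing the sizes before and after, would give rough numbers for the first two questions above. It says nothing about migration or compatibility.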





[jira] [Updated] (OAK-6405) Cleanup the o.a.j.o.segment.file.tar package

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-6405:
---
Labels: technical_debt  (was: )

> Cleanup the o.a.j.o.segment.file.tar package
> 
>
> Key: OAK-6405
> URL: https://issues.apache.org/jira/browse/OAK-6405
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
>Priority: Major
>  Labels: technical_debt
>
> This issue tracks the cleanup and rearrangement of the internals of the 
> {{o.a.j.o.segment.file.tar}} package.





[jira] [Resolved] (OAK-2833) Refactor TarMK

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig resolved OAK-2833.

Resolution: Done

Tracking further refactorings as top level tasks

> Refactor TarMK
> --
>
> Key: OAK-2833
> URL: https://issues.apache.org/jira/browse/OAK-2833
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>Priority: Major
>  Labels: technical_debt
>
> Container issue for refactoring the TarMK to make it more testable, 
> maintainable, extensible and less entangled. 
> For example the segment format should be readable, writeable through 
> standalone means so tests, tools and production code can share this code. 
> Currently there is a lot of code duplication involved here. 





[jira] [Updated] (OAK-4582) Split Segment in a read-only and a read-write implementations

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-4582:
---
Issue Type: Task  (was: Technical task)
Parent: (was: OAK-2833)

> Split Segment in a read-only and a read-write implementations
> -
>
> Key: OAK-4582
> URL: https://issues.apache.org/jira/browse/OAK-4582
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
>Priority: Major
>  Labels: technical_debt
> Attachments: benchmark-01.png, benchmark-01.txt
>
>
> {{Segment}} is central to the working of the Segment Store, but it currently 
> serves two purposes:
> # It is a temporary storage location for the currently written segment, 
> waiting to be full and flushed to disk.
> # It is a way to parse serialized segments read from disk.
> To distinguish these two use cases, I suggest promoting {{Segment}} to the 
> status of interface, and creating two different implementations for 
> read-only and read-write segments.





[jira] [Updated] (OAK-4104) Refactor reading records from segments

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-4104:
---
Issue Type: Task  (was: Technical task)
Parent: (was: OAK-2833)

> Refactor reading records from segments
> --
>
> Key: OAK-4104
> URL: https://issues.apache.org/jira/browse/OAK-4104
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: segment-tar
>Reporter: Michael Dürig
>Priority: Major
>  Labels: technical_debt
>
> We should refactor how records (e.g. node states) are read from segments. 
> Currently this is scattered and replicated across various places, all of 
> which hard code certain indexes into a byte buffer (see calls to 
> {{Record.getOffset}} for how bad this is). 
> The current implementation makes it very hard to maintain the code and evolve 
> the segment format. We should optimally have one place per segment version 
> defining the format as a single source of truth which is then reused by the 
> various parts of the SegmentMK, tooling and tests. 
> We should also evaluate 3rd party data serialisation libraries, which could 
> make our lives easier. Focus should be on ease of use, separation of concerns 
> (schema vs. implementation), compactness of format, efficient en/decoding, 
> support for schema evolution. Possible candidates include [protocol 
> buffers|https://developers.google.com/protocol-buffers/] and [Apache 
> Avro|http://avro.apache.org/]. 





[jira] [Resolved] (OAK-5859) Analyse and reduce IO amplification by OS

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig resolved OAK-5859.

Resolution: Won't Fix

> Analyse and reduce IO amplification by OS
> -
>
> Key: OAK-5859
> URL: https://issues.apache.org/jira/browse/OAK-5859
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: segment-tar
>Reporter: Michael Dürig
>Priority: Major
>  Labels: performance, scalability
>
> Certain operating system settings might result in too much data actually 
> being read from disk, causing an early onset of thrashing. E.g. transparent 
> huge pages or too big read-aheads might be counterproductive in combination 
> with the TarMK's memory mapping model. 
> * Determine the ratio of data being read by the TarMK and actual data being 
> read from disk. Determine the impact of relevant OS parameters (e.g. 
> transparent huge pages) on this ratio.
> * Compare memory mapped mode with file IO mode with an accordingly increased 
> segment cache. This would move prediction of what is likely to be read next 
> from the OS layer into our segment cache eviction strategy. 





[jira] [Resolved] (OAK-4146) Improve tarmkrecovery docs

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig resolved OAK-4146.

Resolution: Won't Fix

The tarmkrecovery command was removed along with OAK-5834. Will address the 
documentation in the scope of OAK-5885.

> Improve tarmkrecovery docs
> --
>
> Key: OAK-4146
> URL: https://issues.apache.org/jira/browse/OAK-4146
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: run, segment-tar, segmentmk
>Reporter: Alex Deparvu
>Assignee: Alex Deparvu
>Priority: Minor
>  Labels: documentation
>
> Add some helper steps on output and what you can actually do with it:
> {quote}
> 1. Run tarmkrecovery command
> {code:none}
> nohup java -Xmx2048m -jar oak-run-*.jar tarmkrecovery repository/segmentstore 
> &> tarmkrecovery.log &
> {code}
> 2. From the output of tarmkrecovery, take the top 10 items (excluding the 
> "Current head revision" line), reverse their order, format them in the 
> journal.log file format (revision:offset root) and put them in a fresh 
> journal.log.
> For example:
> {code:none}
> 6ee64a26-491e-4630-ac2e-bdad1f27e73a:257016 root
> 5ee64a26-491e-4630-ac2e-bdad1f27e73b:257111 root
> {code}
> 3. After setting up the new journal.log, run this command on the 
> segmentstore
> {code:none}
> nohup java -Xmx2048m -jar oak-run-*.jar check -p repository/segmentstore -d 
> &> check.log &
> {code}
> 4. The output of that command tells you which of those 10 items in the 
> journal.log are good. Now remove all lines from the journal that come after 
> the last known good revision.
> {quote}
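
Step 2 above (take the top items, reverse them, format them as journal lines) can be sketched as follows. The sketch assumes the candidates have already been extracted from the tool's output as {{revision:offset}} strings in the order printed; the actual output format of tarmkrecovery is not shown here, and {{JournalLines}} is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: turns recovery candidates into journal.log lines. */
final class JournalLines {

    /**
     * @param candidates "revision:offset" strings, in the order the tool printed them
     * @param limit      how many of the top candidates to keep (10 in the steps above)
     * @return lines in journal.log format, in reversed order
     */
    static List<String> toJournal(List<String> candidates, int limit) {
        int count = Math.min(limit, candidates.size());
        List<String> top = new ArrayList<>(candidates.subList(0, count));
        Collections.reverse(top);
        List<String> lines = new ArrayList<>();
        for (String candidate : top) {
            lines.add(candidate + " root"); // journal.log format: revision:offset root
        }
        return lines;
    }
}
```

The resulting lines would then be written to a fresh journal.log before running the check command from step 3.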





[jira] [Resolved] (OAK-5464) Improve the transaction rate of the TarMK

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig resolved OAK-5464.

   Resolution: Fixed
Fix Version/s: (was: 1.10)
   (was: 1.9.0)
   1.8.0

> Improve the transaction rate of the TarMK
> -
>
> Key: OAK-5464
> URL: https://issues.apache.org/jira/browse/OAK-5464
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>Priority: Major
>  Labels: scalability
> Fix For: 1.8.0
>
>
> The TarMK's write throughput is limited by the way concurrent commits are 
> processed: rebasing and running the commit hooks happen within a lock without 
> any explicit scheduling. This epic covers improving the overall transaction 
> rate. The proposed approach would roughly be to first make scheduling of 
> transactions explicit, then add monitoring on transactions to gather a better 
> understanding and then experiment and implement explicit scheduling 
> strategies to optimise particular aspects. 
> h2. Summary of ideas mentioned in offline sessions
> h3. Advantages of explicit scheduling:
> * Control over (order) of commits
> * Sophisticated monitoring (commit statistics, e.g. commit rate, time in 
> queue, etc.) 
> * Favour certain commits (e.g. checkpoints)
> * Reorder commits to simplify rebasing
> * Suspend the compactor on concurrent commits and have it resume where it 
> left off afterwards
> * Parallelise certain commits (e.g. by piggy backing)
> * Implement a concurrent commit editor. We'd need to take care of proper 
> access to the shared state; [~frm] maybe introduce the idea of a common 
> context to enforce concurrent access semantics.
> h3. Scheduler Implementation
> * Expedite
> * Prioritise
> * Defer
> * Collapse
> * Coalesce
> * Parallelise
> * Piggy back: can we piggy back commits on top of each other? The idea would 
> be while processing the changes of one commit to also check them for 
> conflicts with the changes of other commits waiting to commit. If a conflict 
> is detected there, that other commit can immediately be failed (given the 
> current commit doesn't fail).
> * Merging non-conflicting commits: given multiple transactions ready to 
> commit at the same time, can we process them as one (given they don't 
> conflict) instead of one after the other, which requires rebasing the later 
> transaction on the former?
> * Shield the file store from {{InterruptedException}} because of thread 
> boundaries introduced
> * Implement tests, benchmarks and fixtures for verification
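
As an illustration of what explicit scheduling could look like, the sketch below favours checkpoint commits over ordinary ones while keeping submission order within each group. It is only one possible strategy under assumptions: {{CommitQueue}} and its methods are hypothetical and not taken from the Oak code base.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical sketch: a commit queue that favours checkpoints, FIFO otherwise. */
final class CommitQueue {

    private static final class Commit {
        final String name;
        final boolean checkpoint;
        final long sequence; // submission order, used to keep FIFO within a priority

        Commit(String name, boolean checkpoint, long sequence) {
            this.name = name;
            this.checkpoint = checkpoint;
            this.sequence = sequence;
        }
    }

    // false sorts before true, so checkpoints (key false) come first.
    private final PriorityQueue<Commit> queue = new PriorityQueue<>(
            Comparator.comparing((Commit c) -> !c.checkpoint)
                    .thenComparingLong(c -> c.sequence));
    private long nextSequence = 0;

    synchronized void submit(String name, boolean checkpoint) {
        queue.add(new Commit(name, checkpoint, nextSequence++));
    }

    /** Returns the name of the next commit to process, or null if none is waiting. */
    synchronized String next() {
        Commit commit = queue.poll();
        return commit == null ? null : commit.name;
    }
}
```

Making the queue explicit like this is also what would give a place to hang monitoring (time in queue, commit rate) and other strategies from the list above, such as deferring or collapsing commits.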





[jira] [Updated] (OAK-5464) Improve the transaction rate of the TarMK

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5464:
---
Epic Status: Done  (was: To Do)

> Improve the transaction rate of the TarMK
> -
>
> Key: OAK-5464
> URL: https://issues.apache.org/jira/browse/OAK-5464
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>Priority: Major
>  Labels: scalability
> Fix For: 1.8.0
>
>
> The TarMK's write throughput is limited by the way concurrent commits are 
> processed: rebasing and running the commit hooks happen within a lock without 
> any explicit scheduling. This epic covers improving the overall transaction 
> rate. The proposed approach would roughly be to first make scheduling of 
> transactions explicit, then add monitoring on transactions to gather a better 
> understanding and then experiment and implement explicit scheduling 
> strategies to optimise particular aspects. 
> h2. Summary of ideas mentioned in offline sessions
> h3. Advantages of explicit scheduling:
> * Control over (order) of commits
> * Sophisticated monitoring (commit statistics, e.g. commit rate, time in 
> queue, etc.) 
> * Favour certain commits (e.g. checkpoints)
> * Reorder commits to simplify rebasing
> * Suspend the compactor on concurrent commits and have it resume where it 
> left off afterwards
> * Parallelise certain commits (e.g. by piggy backing)
> * Implement a concurrent commit editor. We'd need to take care of proper 
> access to the shared state; [~frm] maybe introduce the idea of a common 
> context to enforce concurrent access semantics.
> h3. Scheduler Implementation
> * Expedite
> * Prioritise
> * Defer
> * Collapse
> * Coalesce
> * Parallelise
> * Piggy back: can we piggy back commits on top of each other? The idea would 
> be while processing the changes of one commit to also check them for 
> conflicts with the changes of other commits waiting to commit. If a conflict 
> is detected there, that other commit can immediately be failed (given the 
> current commit doesn't fail).
> * Merging non-conflicting commits: given multiple transactions ready to 
> commit at the same time, can we process them as one (given they don't 
> conflict) instead of one after the other, which requires rebasing the later 
> transaction on the former?
> * Shield the file store from {{InterruptedException}} because of thread 
> boundaries introduced
> * Implement tests, benchmarks and fixtures for verification





[jira] [Updated] (OAK-5056) Improve GC scalability on TarMK

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5056:
---
Epic Status: Done  (was: To Do)

> Improve GC scalability on TarMK
> ---
>
> Key: OAK-5056
> URL: https://issues.apache.org/jira/browse/OAK-5056
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>Priority: Major
>  Labels: gc, scalability
> Fix For: 1.8.0
>
>
> This issue is about making TarMK gc more scalable: 
> * how to deal with huge repositories.
> * how to deal with massive concurrent writes.
> * how can we improve monitoring to determine gc health. 
> ** Monitor deduplication caches (e.g. deduplication of checkpoints)
> Possible avenues to explore:
> * Can we partition gc? (e.g. along sub-trees, along volatile vs. static 
> content)
> * Can we pause and resume gc? (e.g. to give precedence to concurrent writes) 
> * Can we make gc a real background process not contending with foreground 
> operations? 
> This issue is a follow up to OAK-2849, which was about efficacy of gc.





[jira] [Resolved] (OAK-5056) Improve GC scalability on TarMK

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig resolved OAK-5056.

   Resolution: Fixed
Fix Version/s: (was: 1.10)
   (was: 1.9.0)
   1.8.0

> Improve GC scalability on TarMK
> ---
>
> Key: OAK-5056
> URL: https://issues.apache.org/jira/browse/OAK-5056
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>Priority: Major
>  Labels: gc, scalability
> Fix For: 1.8.0
>
>
> This issue is about making TarMK gc more scalable: 
> * how to deal with huge repositories.
> * how to deal with massive concurrent writes.
> * how can we improve monitoring to determine gc health. 
> ** Monitor deduplication caches (e.g. deduplication of checkpoints)
> Possible avenues to explore:
> * Can we partition gc? (e.g. along sub-trees, along volatile vs. static 
> content)
> * Can we pause and resume gc? (e.g. to give precedence to concurrent writes) 
> * Can we make gc a real background process not contending with foreground 
> operations? 
> This issue is a follow up to OAK-2849, which was about efficacy of gc.





[jira] [Resolved] (OAK-3350) Incremental compaction

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-3350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig resolved OAK-3350.

Resolution: Later

> Incremental compaction
> --
>
> Key: OAK-3350
> URL: https://issues.apache.org/jira/browse/OAK-3350
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segment-tar
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>Priority: Minor
>  Labels: compaction, gc, scalability
>
> This is OAK-3349 taken to the extreme: given a segment that is almost not 
> referenced any more we could just rewrite the still referenced content. That 
> is, say a segment contains two properties reachable from the current root 
> node state and all its remaining content is not reachable from the root node 
> state. In that case we could rewrite these two properties and create a new 
> root node state referencing the rewritten properties. This would effectively 
> make the segment eligible for being gc-ed. 
> Such an approach would start from segments that are sparse and compact these 
> instead of compacting everything as we currently do, which might cause a lot 
> of copying around stuff that already is compact. The challenging part here is 
> probably finding the segments that are sparse as this involves inverting the 
> reference graph. 
> Todo: Assess feasibility and impact, implement prototype.





[jira] [Resolved] (OAK-1264) Avoid duplicating shared content in backup

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig resolved OAK-1264.

Resolution: Later

> Avoid duplicating shared content in backup
> --
>
> Key: OAK-1264
> URL: https://issues.apache.org/jira/browse/OAK-1264
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Jukka Zitting
>Priority: Major
>  Labels: backup, operations
>
> The backup feature from OAK-1159 is currently unable to detect cases where 
> the same binary or some other piece of shared content is referenced from two 
> or more places in the content tree, and ends up duplicating such content in 
> the backup.
> It would be nice if the backup could automatically detect such cases and 
> avoid the extra duplicates. See the {{testSharedContent}} test case in 
> {{FileStoreBackupTest}}.





[jira] [Updated] (OAK-7249) Create charset encoding utility that detects malformed input

2018-02-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-7249:
---
Fix Version/s: 1.10

> Create charset encoding utility that detects malformed input
> 
>
> Key: OAK-7249
> URL: https://issues.apache.org/jira/browse/OAK-7249
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: segment-tar
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Fix For: 1.10
>
>
> For now in segment-tar; might be moved later on. 
> Include tests for:
> - well-formed input
> - malformed input
> - multi-threaded encoding





[jira] [Commented] (OAK-7182) Make it possible to update Guava

2018-02-06 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354150#comment-16354150
 ] 

Shawn Heisey commented on OAK-7182:
---

Solr (the server) is designed to run isolated from all other software, 
listening for HTTP requests.  Solr is a webapp, but other applications should 
not be running in the servlet container with Solr.  SolrJ (the java client) 
does not depend on Guava.

I have also updated SOLR-10308.


> Make it possible to update Guava
> 
>
> Key: OAK-7182
> URL: https://issues.apache.org/jira/browse/OAK-7182
> Project: Jackrabbit Oak
>  Issue Type: Wish
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Attachments: OAK-7182-guava-21.diff, guava.diff
>
>
> We currently rely on Guava 15, and this affects all users of Oak because they 
> essentially need to use the same version.
> This is an overall issue to investigate what would need to be done in Oak in 
> order to make updates possible.





[jira] [Commented] (OAK-5506) reject item names with unpaired surrogates early

2018-02-06 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354109#comment-16354109
 ] 

Julian Reschke commented on OAK-5506:
-

Created https://issues.apache.org/jira/browse/OAK-7249 to track addition of 
utility class and related tests.

> reject item names with unpaired surrogates early
> 
>
> Key: OAK-5506
> URL: https://issues.apache.org/jira/browse/OAK-5506
> Project: Jackrabbit Oak
>  Issue Type: Wish
>  Components: core, jcr, segment-tar
>Affects Versions: 1.5.18
>Reporter: Julian Reschke
>Priority: Minor
> Fix For: 1.10
>
> Attachments: OAK-5506-01.patch, OAK-5506-02.patch, OAK-5506-4.diff, 
> OAK-5506-bench.diff, OAK-5506-name-conversion.diff, OAK-5506-segment.diff, 
> OAK-5506-segment2.diff, OAK-5506-segment3.diff, OAK-5506.diff, 
> ValidNamesTest.java
>
>
> Apparently, the following node name is accepted:
>{{"foo\ud800"}}
> but a subsequent {{getPath()}} call fails:
> {noformat}
> javax.jcr.InvalidItemStateException: This item [/test_node/foo?] does not 
> exist anymore
> at 
> org.apache.jackrabbit.oak.jcr.delegate.ItemDelegate.checkAlive(ItemDelegate.java:86)
> at 
> org.apache.jackrabbit.oak.jcr.session.operation.ItemOperation.checkPreconditions(ItemOperation.java:34)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.prePerform(SessionDelegate.java:615)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:205)
> at 
> org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:112)
> at 
> org.apache.jackrabbit.oak.jcr.session.ItemImpl.getPath(ItemImpl.java:140)
> at 
> org.apache.jackrabbit.oak.jcr.session.NodeImpl.getPath(NodeImpl.java:106)
> at 
> org.apache.jackrabbit.oak.jcr.ValidNamesTest.nameTest(ValidNamesTest.java:271)
> at 
> org.apache.jackrabbit.oak.jcr.ValidNamesTest.testUnpairedSurrogate(ValidNamesTest.java:259)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source){noformat}
> (test case follows)
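
For illustration, rejecting such a name early comes down to a scan for unpaired surrogates. The following is a minimal sketch only: the class and method names are hypothetical and are not taken from any of the attached patches.

```java
final class SurrogateCheck {

    // Returns true if the string contains a high surrogate that is not
    // followed by a low surrogate, or a low surrogate that is not
    // preceded by a high surrogate.
    static boolean hasUnpairedSurrogate(String name) {
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < name.length() && Character.isLowSurrogate(name.charAt(i + 1))) {
                    i++; // well-formed pair, skip the low surrogate
                } else {
                    return true; // lone high surrogate, e.g. "foo\ud800"
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // lone low surrogate
            }
        }
        return false;
    }
}
```

With a check like this, a name such as {{"foo\ud800"}} could be refused when the node is created instead of failing later in {{getPath()}}.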





[jira] [Created] (OAK-7249) Create charset encoding utility that detects malformed input

2018-02-06 Thread Julian Reschke (JIRA)
Julian Reschke created OAK-7249:
---

 Summary: Create charset encoding utility that detects malformed 
input
 Key: OAK-7249
 URL: https://issues.apache.org/jira/browse/OAK-7249
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: segment-tar
Reporter: Julian Reschke
Assignee: Julian Reschke


For now in segment-tar; might be moved later on. 

Include test for:

- wellformed input
- malformed input
- multi-threaded encoding
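
A minimal sketch of what such a utility could look like, using only the JDK's {{java.nio.charset}} API. The class name {{StrictUtf8}} and the choice of UTF-8 are assumptions for illustration and are not taken from the patch. A fresh {{CharsetEncoder}} is created per call because encoders are not safe for concurrent use, which matters for the multi-threaded case listed above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {

    // Encodes the string as UTF-8 and throws CharacterCodingException on
    // malformed input (for example an unpaired surrogate) instead of
    // silently substituting a replacement byte such as '?'.
    static byte[] encode(String s) throws CharacterCodingException {
        ByteBuffer buffer = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }
}
```

By contrast, {{String.getBytes(StandardCharsets.UTF_8)}} replaces malformed input rather than reporting it, which is why an explicit encoder with {{CodingErrorAction.REPORT}} is used in this sketch.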





[jira] [Commented] (OAK-7182) Make it possible to update Guava

2018-02-06 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354061#comment-16354061
 ] 

Julian Reschke commented on OAK-7182:
-

https://issues.apache.org/jira/secure/attachment/12909461/OAK-7182-guava-21.diff
 contains the changes that would be needed to run with Guava 21 (but is not yet 
compatible with older versions). The trouble is that Solr doesn't work with 
Guava versions > 20, see https://issues.apache.org/jira/browse/SOLR-10308

> Make it possible to update Guava
> 
>
> Key: OAK-7182
> URL: https://issues.apache.org/jira/browse/OAK-7182
> Project: Jackrabbit Oak
>  Issue Type: Wish
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Attachments: OAK-7182-guava-21.diff, guava.diff
>
>
> We currently rely on Guava 15, and this affects all users of Oak because they 
> essentially need to use the same version.
> This is an overall issue to investigate what would need to be done in Oak in 
> order to make updates possible.





[jira] [Updated] (OAK-7182) Make it possible to update Guava

2018-02-06 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-7182:

Attachment: OAK-7182-guava-21.diff

> Make it possible to update Guava
> 
>
> Key: OAK-7182
> URL: https://issues.apache.org/jira/browse/OAK-7182
> Project: Jackrabbit Oak
>  Issue Type: Wish
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Attachments: OAK-7182-guava-21.diff, guava.diff
>
>
> We currently rely on Guava 15, and this affects all users of Oak because they 
> essentially need to use the same version.
> This is an overall issue to investigate what would need to be done in Oak in 
> order to make updates possible.





[jira] [Commented] (OAK-7173) Update documentation for oak-run check

2018-02-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354033#comment-16354033
 ] 

Michael Dürig commented on OAK-7173:


+1

> Update documentation for oak-run check
> --
>
> Key: OAK-7173
> URL: https://issues.apache.org/jira/browse/OAK-7173
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: doc, oak-run, segment-tar
>Reporter: Michael Dürig
>Assignee: Andrei Dulceanu
>Priority: Major
>  Labels: documentation
> Fix For: 1.9.0, 1.10, 1.8.3
>
> Attachments: OAK-7173.patch
>
>
> We should review and update the documentation of [{{oak-run 
> check}}|http://jackrabbit.apache.org/oak/docs/nodestore/segment/overview.html#check].
>  E.g. to include the new options from OAK-6373.





[jira] [Commented] (OAK-7173) Update documentation for oak-run check

2018-02-06 Thread Andrei Dulceanu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354021#comment-16354021
 ] 

Andrei Dulceanu commented on OAK-7173:
--

[~mduerig], [~frm], what do you think about the explanations added to the 
documentation? Do they make sense?

> Update documentation for oak-run check
> --
>
> Key: OAK-7173
> URL: https://issues.apache.org/jira/browse/OAK-7173
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: doc, oak-run, segment-tar
>Reporter: Michael Dürig
>Assignee: Andrei Dulceanu
>Priority: Major
>  Labels: documentation
> Fix For: 1.9.0, 1.10, 1.8.3
>
> Attachments: OAK-7173.patch
>
>
> We should review and update the documentation of [{{oak-run 
> check}}|http://jackrabbit.apache.org/oak/docs/nodestore/segment/overview.html#check].
>  E.g. to include the new options from OAK-6373.





[jira] [Updated] (OAK-7173) Update documentation for oak-run check

2018-02-06 Thread Andrei Dulceanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Dulceanu updated OAK-7173:
-
Attachment: OAK-7173.patch

> Update documentation for oak-run check
> --
>
> Key: OAK-7173
> URL: https://issues.apache.org/jira/browse/OAK-7173
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: doc, oak-run, segment-tar
>Reporter: Michael Dürig
>Assignee: Andrei Dulceanu
>Priority: Major
>  Labels: documentation
> Fix For: 1.9.0, 1.10, 1.8.3
>
> Attachments: OAK-7173.patch
>
>
> We should review and update the documentation of [{{oak-run 
> check}}|http://jackrabbit.apache.org/oak/docs/nodestore/segment/overview.html#check].
>  E.g. to include the new options from OAK-6373.





[jira] [Created] (OAK-7248) Remove deprecated deep option from check command

2018-02-06 Thread Andrei Dulceanu (JIRA)
Andrei Dulceanu created OAK-7248:


 Summary: Remove deprecated deep option from check command
 Key: OAK-7248
 URL: https://issues.apache.org/jira/browse/OAK-7248
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: run, segment-tar
Reporter: Andrei Dulceanu
Assignee: Andrei Dulceanu
 Fix For: 1.9.0, 1.10


With OAK-5595 we enabled deep traversals by default when using the check 
command. At the same time we deprecated the {{--deep}} option.

Since all of this happened in {{1.8}}, the next logical step for {{1.10}} is to 
remove this option altogether.





[jira] [Commented] (OAK-5506) reject item names with unpaired surrogates early

2018-02-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353806#comment-16353806
 ] 

Michael Dürig commented on OAK-5506:


Re. b), let's leave it as it is for now and figure it out while writing the 
migration-related test.

> reject item names with unpaired surrogates early
> 
>
> Key: OAK-5506
> URL: https://issues.apache.org/jira/browse/OAK-5506
> Project: Jackrabbit Oak
>  Issue Type: Wish
>  Components: core, jcr, segment-tar
>Affects Versions: 1.5.18
>Reporter: Julian Reschke
>Priority: Minor
> Fix For: 1.10
>
> Attachments: OAK-5506-01.patch, OAK-5506-02.patch, OAK-5506-4.diff, 
> OAK-5506-bench.diff, OAK-5506-name-conversion.diff, OAK-5506-segment.diff, 
> OAK-5506-segment2.diff, OAK-5506-segment3.diff, OAK-5506.diff, 
> ValidNamesTest.java
>
>
> Apparently, the following node name is accepted:
>{{"foo\ud800"}}
> but a subsequent {{getPath()}} call fails:
> {noformat}
> javax.jcr.InvalidItemStateException: This item [/test_node/foo?] does not 
> exist anymore
> at 
> org.apache.jackrabbit.oak.jcr.delegate.ItemDelegate.checkAlive(ItemDelegate.java:86)
> at 
> org.apache.jackrabbit.oak.jcr.session.operation.ItemOperation.checkPreconditions(ItemOperation.java:34)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.prePerform(SessionDelegate.java:615)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:205)
> at 
> org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:112)
> at 
> org.apache.jackrabbit.oak.jcr.session.ItemImpl.getPath(ItemImpl.java:140)
> at 
> org.apache.jackrabbit.oak.jcr.session.NodeImpl.getPath(NodeImpl.java:106)
> at 
> org.apache.jackrabbit.oak.jcr.ValidNamesTest.nameTest(ValidNamesTest.java:271)
> at 
> org.apache.jackrabbit.oak.jcr.ValidNamesTest.testUnpairedSurrogate(ValidNamesTest.java:259)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source){noformat}
> (test case follows)





[jira] [Commented] (OAK-7242) OAK API overview documentation link for NodeState doesn't exist

2018-02-06 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353777#comment-16353777
 ] 

Vikas Saurabh commented on OAK-7242:


Thanks [~anchela], [~mduerig] for the clarification. I guess then {{NodeState}} 
shouldn't be part of "key API entry points" in the doc [~indra2gurjar]'s patch 
does.

> OAK API overview documentation link for NodeState doesn't exist
> ---
>
> Key: OAK-7242
> URL: https://issues.apache.org/jira/browse/OAK-7242
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: api
>Affects Versions: 1.8.0
>Reporter: indra kumar gurjar
>Assignee: angela
>Priority: Minor
>
> On the OAK API overview documentation page [0], when a user clicks on the 
> NodeState documentation link, it shows a 404.
> The NodeState API is not listed under oak-api, so either it should be removed 
> from the overview documentation or the link should be updated.
> New link for API - 
> [https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/NodeState.html]
> [0] - [https://jackrabbit.apache.org/oak/docs/oak_api/overview.html]





[jira] [Commented] (OAK-5506) reject item names with unpaired surrogates early

2018-02-06 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353763#comment-16353763
 ] 

Julian Reschke commented on OAK-5506:
-

a) I can take care of the unit test.

b) Not sure whether the utils should be configurable - caller will need to be 
aware of it anyway, right? Feedback appreciated.

> reject item names with unpaired surrogates early
> 
>
> Key: OAK-5506
> URL: https://issues.apache.org/jira/browse/OAK-5506
> Project: Jackrabbit Oak
>  Issue Type: Wish
>  Components: core, jcr, segment-tar
>Affects Versions: 1.5.18
>Reporter: Julian Reschke
>Priority: Minor
> Fix For: 1.10
>
> Attachments: OAK-5506-01.patch, OAK-5506-02.patch, OAK-5506-4.diff, 
> OAK-5506-bench.diff, OAK-5506-name-conversion.diff, OAK-5506-segment.diff, 
> OAK-5506-segment2.diff, OAK-5506-segment3.diff, OAK-5506.diff, 
> ValidNamesTest.java
>
>
> Apparently, the following node name is accepted:
>{{"foo\ud800"}}
> but a subsequent {{getPath()}} call fails:
> {noformat}
> javax.jcr.InvalidItemStateException: This item [/test_node/foo?] does not 
> exist anymore
> at 
> org.apache.jackrabbit.oak.jcr.delegate.ItemDelegate.checkAlive(ItemDelegate.java:86)
> at 
> org.apache.jackrabbit.oak.jcr.session.operation.ItemOperation.checkPreconditions(ItemOperation.java:34)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.prePerform(SessionDelegate.java:615)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:205)
> at 
> org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:112)
> at 
> org.apache.jackrabbit.oak.jcr.session.ItemImpl.getPath(ItemImpl.java:140)
> at 
> org.apache.jackrabbit.oak.jcr.session.NodeImpl.getPath(NodeImpl.java:106)
> at 
> org.apache.jackrabbit.oak.jcr.ValidNamesTest.nameTest(ValidNamesTest.java:271)
> at 
> org.apache.jackrabbit.oak.jcr.ValidNamesTest.testUnpairedSurrogate(ValidNamesTest.java:259)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source){noformat}
> (test case follows)





[jira] [Commented] (OAK-5506) reject item names with unpaired surrogates early

2018-02-06 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353759#comment-16353759
 ] 

Francesco Mari commented on OAK-5506:
-

[~mduerig], it sounds reasonable to me.

> reject item names with unpaired surrogates early
> 
>
> Key: OAK-5506
> URL: https://issues.apache.org/jira/browse/OAK-5506
> Project: Jackrabbit Oak
>  Issue Type: Wish
>  Components: core, jcr, segment-tar
>Affects Versions: 1.5.18
>Reporter: Julian Reschke
>Priority: Minor
> Fix For: 1.10
>
> Attachments: OAK-5506-01.patch, OAK-5506-02.patch, OAK-5506-4.diff, 
> OAK-5506-bench.diff, OAK-5506-name-conversion.diff, OAK-5506-segment.diff, 
> OAK-5506-segment2.diff, OAK-5506-segment3.diff, OAK-5506.diff, 
> ValidNamesTest.java
>
>
> Apparently, the following node name is accepted:
>{{"foo\ud800"}}
> but a subsequent {{getPath()}} call fails:
> {noformat}
> javax.jcr.InvalidItemStateException: This item [/test_node/foo?] does not 
> exist anymore
> at 
> org.apache.jackrabbit.oak.jcr.delegate.ItemDelegate.checkAlive(ItemDelegate.java:86)
> at 
> org.apache.jackrabbit.oak.jcr.session.operation.ItemOperation.checkPreconditions(ItemOperation.java:34)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.prePerform(SessionDelegate.java:615)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:205)
> at 
> org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:112)
> at 
> org.apache.jackrabbit.oak.jcr.session.ItemImpl.getPath(ItemImpl.java:140)
> at 
> org.apache.jackrabbit.oak.jcr.session.NodeImpl.getPath(NodeImpl.java:106)
> at 
> org.apache.jackrabbit.oak.jcr.ValidNamesTest.nameTest(ValidNamesTest.java:271)
> at 
> org.apache.jackrabbit.oak.jcr.ValidNamesTest.testUnpairedSurrogate(ValidNamesTest.java:259)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source){noformat}
> (test case follows)





[jira] [Commented] (OAK-5506) reject item names with unpaired surrogates early

2018-02-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353754#comment-16353754
 ] 

Michael Dürig commented on OAK-5506:


+1 for the patch from my side provided we follow up with
 * A unit test for {{CharsetEncodingUtils}} ([~reschke], can you take care of 
this?)
 * A way to fall back to the previous behaviour for tests and via a feature 
flag ([~reschke], can {{CharsetEncodingUtils}} be parametrized in that respect 
or would it be better to leave this to the caller side?)
 * A migration test ensuring repositories written before this change can still 
be safely read with this patch (I can take this up).

[~frm] WDYT?

> reject item names with unpaired surrogates early
> 
>
> Key: OAK-5506
> URL: https://issues.apache.org/jira/browse/OAK-5506
> Project: Jackrabbit Oak
>  Issue Type: Wish
>  Components: core, jcr, segment-tar
>Affects Versions: 1.5.18
>Reporter: Julian Reschke
>Priority: Minor
> Fix For: 1.10
>
> Attachments: OAK-5506-01.patch, OAK-5506-02.patch, OAK-5506-4.diff, 
> OAK-5506-bench.diff, OAK-5506-name-conversion.diff, OAK-5506-segment.diff, 
> OAK-5506-segment2.diff, OAK-5506-segment3.diff, OAK-5506.diff, 
> ValidNamesTest.java
>
>
> Apparently, the following node name is accepted:
>{{"foo\ud800"}}
> but a subsequent {{getPath()}} call fails:
> {noformat}
> javax.jcr.InvalidItemStateException: This item [/test_node/foo?] does not 
> exist anymore
> at 
> org.apache.jackrabbit.oak.jcr.delegate.ItemDelegate.checkAlive(ItemDelegate.java:86)
> at 
> org.apache.jackrabbit.oak.jcr.session.operation.ItemOperation.checkPreconditions(ItemOperation.java:34)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.prePerform(SessionDelegate.java:615)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:205)
> at 
> org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:112)
> at 
> org.apache.jackrabbit.oak.jcr.session.ItemImpl.getPath(ItemImpl.java:140)
> at 
> org.apache.jackrabbit.oak.jcr.session.NodeImpl.getPath(NodeImpl.java:106)
> at 
> org.apache.jackrabbit.oak.jcr.ValidNamesTest.nameTest(ValidNamesTest.java:271)
> at 
> org.apache.jackrabbit.oak.jcr.ValidNamesTest.testUnpairedSurrogate(ValidNamesTest.java:259)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source){noformat}
> (test case follows)





[jira] [Commented] (OAK-7242) OAK API overview documentation link for NodeState doesn't exist

2018-02-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353734#comment-16353734
 ] 

Michael Dürig commented on OAK-7242:


{{PropertyState}} and {{NodeState}} instances are immutable value objects (or 
ADTs really), so like e.g. {{String}} it makes sense to share them where 
necessary. The {{Tree}} class in the Oak API introduces higher level 
functionality based on these primitives (e.g. mutability, access control, 
transactions, etc.). As this additional functionality is mainly focused on the 
hierarchy (i.e. the nodes), it turned out that there is no need to wrap 
properties again, as this would not provide any additional value while 
increasing the complexity.

> OAK API overview documentation link for NodeState doesn't exist
> ---
>
> Key: OAK-7242
> URL: https://issues.apache.org/jira/browse/OAK-7242
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: api
>Affects Versions: 1.8.0
>Reporter: indra kumar gurjar
>Assignee: angela
>Priority: Minor
>
> On the OAK API overview documentation page [0], when a user clicks on the 
> NodeState documentation link, it shows a 404.
> The NodeState API is not listed under oak-api, so either it should be removed 
> from the overview documentation or the link should be updated.
> New link for API - 
> [https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/NodeState.html]
> [0] - [https://jackrabbit.apache.org/oak/docs/oak_api/overview.html]





[jira] [Commented] (OAK-7223) Files could be kept partially in case of disconnection from backends

2018-02-06 Thread Amit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353726#comment-16353726
 ] 

Amit Jain commented on OAK-7223:


Backported to 1.6 branch with http://svn.apache.org/viewvc?rev=1823298&view=rev

> Files could be kept partially in case of disconnection from backends
> 
>
> Key: OAK-7223
> URL: https://issues.apache.org/jira/browse/OAK-7223
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: blob-plugins
>Affects Versions: 1.6.8, 1.8.1
>Reporter: Amit Jain
>Assignee: Amit Jain
>Priority: Blocker
> Fix For: 1.9.0, 1.10, 1.6.9, 1.8.2
>
> Attachments: OAK-7223-2.patch, OAK-7223-3.patch, OAK-7223.patch
>
>
> The FileCache#load method needs to clean up files which may have been only 
> partially downloaded in case of errors from the backend. Such a partially 
> downloaded file should be removed before returning from the method.
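
The cleanup described here can be sketched as follows. This is an illustration under assumptions, not the actual {{FileCache#load}} code: the class name, the method signature and the use of an {{InputStream}} to stand in for the backend are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class PartialDownloadCleanup {

    // Copies the backend stream into the target file. If the copy fails
    // part-way, the incomplete file is deleted before the error is
    // propagated, so no truncated file is left behind in the cache.
    static Path load(InputStream backend, Path target) throws IOException {
        try {
            Files.copy(backend, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(target);
            throw e;
        }
    }
}
```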





[jira] [Comment Edited] (OAK-7242) OAK API overview documentation link for NodeState doesn't exist

2018-02-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353715#comment-16353715
 ] 

Michael Dürig edited comment on OAK-7242 at 2/6/18 11:04 AM:
-

[~catholicon], it's correct :) the oak-api contains {{ContentRepository}}, 
{{Root}}, {{Tree}} and {{PropertyState}}, while the {{NodeStore}} is on a 
different layer in the architecture and operates on {{NodeStates}}. I guess 
it's just for convenience that both layers use {{PropertyState}} but you would 
have to ask [~jukkaz] or [~mduerig] for the exact reasoning for sharing 
{{PropertyState}} in both layers.


was (Author: anchela):
[~catholicon], it's correct :-) the oak-api contains {{ContentRepository}}, 
{{Root}}, {{Tree}} and {{PropertyState}}, while the {{NodeStore}} is on a 
different layer in the architecture and operates on {{NodeStates}}. I guess 
it's just for convenience that both layers use {{PropertyState}} but you would 
have to ask Jukka or [~mduerig] for the exact reasoning for sharing 
{{PropertyState}} in both layers.

> OAK API overview documentation link for NodeState doesn't exist
> ---
>
> Key: OAK-7242
> URL: https://issues.apache.org/jira/browse/OAK-7242
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: api
>Affects Versions: 1.8.0
>Reporter: indra kumar gurjar
>Assignee: angela
>Priority: Minor
>
> On the OAK API overview documentation page [0], when a user clicks on the 
> NodeState documentation link, it shows a 404.
> The NodeState API is not listed under oak-api, so either it should be removed 
> from the overview documentation or the link should be updated.
> New link for API - 
> [https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/NodeState.html]
> [0] - [https://jackrabbit.apache.org/oak/docs/oak_api/overview.html]





[jira] [Assigned] (OAK-7242) OAK API overview documentation link for NodeState doesn't exist

2018-02-06 Thread angela (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angela reassigned OAK-7242:
---

Assignee: angela

> OAK API overview documentation link for NodeState doesn't exist
> ---
>
> Key: OAK-7242
> URL: https://issues.apache.org/jira/browse/OAK-7242
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: api
>Affects Versions: 1.8.0
>Reporter: indra kumar gurjar
>Assignee: angela
>Priority: Minor
>
> On the OAK API overview documentation page [0], when a user clicks on the 
> NodeState documentation link, it shows a 404.
> The NodeState API is not listed under oak-api, so either it should be removed 
> from the overview documentation or the link should be updated.
> New link for API - 
> [https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/NodeState.html]
> [0] - [https://jackrabbit.apache.org/oak/docs/oak_api/overview.html]





[jira] [Commented] (OAK-7242) OAK API overview documentation link for NodeState doesn't exist

2018-02-06 Thread angela (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353715#comment-16353715
 ] 

angela commented on OAK-7242:
-

[~catholicon], it's correct :-) the oak-api contains {{ContentRepository}}, 
{{Root}}, {{Tree}} and {{PropertyState}}, while the {{NodeStore}} is on a 
different layer in the architecture and operates on {{NodeStates}}. I guess 
it's just for convenience that both layers use {{PropertyState}} but you would 
have to ask Jukka or [~mduerig] for the exact reasoning for sharing 
{{PropertyState}} in both layers.

> OAK API overview documentation link for NodeState doesn't exist
> ---
>
> Key: OAK-7242
> URL: https://issues.apache.org/jira/browse/OAK-7242
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: api
>Affects Versions: 1.8.0
>Reporter: indra kumar gurjar
>Priority: Critical
>
> On the OAK API overview documentation page [0], when a user clicks on the 
> NodeState documentation link, it shows a 404.
> The NodeState API is not listed under oak-api, so either it should be removed 
> from the overview documentation or the link should be updated.
> New link for API - 
> [https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/NodeState.html]
> [0] - [https://jackrabbit.apache.org/oak/docs/oak_api/overview.html]





[jira] [Updated] (OAK-7242) OAK API overview documentation link for NodeState doesn't exist

2018-02-06 Thread angela (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angela updated OAK-7242:

Priority: Minor  (was: Critical)

> OAK API overview documentation link for NodeState doesn't exist
> ---
>
> Key: OAK-7242
> URL: https://issues.apache.org/jira/browse/OAK-7242
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: api
>Affects Versions: 1.8.0
>Reporter: indra kumar gurjar
>Priority: Minor
>
> On the OAK API overview documentation page [0], when a user clicks on the 
> NodeState documentation link, it shows a 404.
> The NodeState API is not listed under oak-api, so either it should be removed 
> from the overview documentation or the link should be updated.
> New link for API - 
> [https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/NodeState.html]
> [0] - [https://jackrabbit.apache.org/oak/docs/oak_api/overview.html]





[jira] [Created] (OAK-7247) Update Oak 1.8 to Jackrabbit 2.16.1

2018-02-06 Thread Julian Reschke (JIRA)
Julian Reschke created OAK-7247:
---

 Summary: Update Oak 1.8 to Jackrabbit 2.16.1
 Key: OAK-7247
 URL: https://issues.apache.org/jira/browse/OAK-7247
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: parent
Reporter: Julian Reschke
Assignee: Julian Reschke








[jira] [Updated] (OAK-7246) Improve cleanup of locally copied index files

2018-02-06 Thread Amit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Jain updated OAK-7246:
---
Summary: Improve cleanup of locally copied index files  (was: Improve 
ceanup of locally copied index files)

> Improve cleanup of locally copied index files
> -
>
> Key: OAK-7246
> URL: https://issues.apache.org/jira/browse/OAK-7246
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Vikas Saurabh
>Priority: Major
>
> This task is to re-think how we should clean up locally copied index files 
> which are no longer in use.
> Current approach:
> # index writers, while creating index files, keep a list of 
> currently-being-written files
> ## this list is cleared when a new index writer comes into play
> # the index tracker opens a new index (at a new revision) via observation
> ## while it is being opened, we also track the current dir listing of the 
> local index files
> # while opening the new index, the tracker closes the old revision of the 
> index reader
> ## during this close, the local files noted above during open are purged if 
> (they don't show up in the remote view of the index && they aren't part of 
> the index writer's currently-being-written list)
> This approach, at least in the following timeline, would incur extra copying 
> (and as a side effect also open some index files directly off the remote 
> input stream during CoWs):
> # CoW1 creates [a, b]
> # CoW2 starts and creates [c, d], removes [a, b] from remote
> # CoR1 opens an index due to CoW1
> ## local-list-CoR1 = [a, b, c, d], remote-index-list=[a, b]
> # CoW2 finishes
> # CoW3 creates [e, f], removes [a,b] from remote
> ## CoW-currently-being-written-list=[e,f]
> # CoR2 opens due to CoW2
> ## local-list-CoR2=[a,b,c,d,e,f], remote-index-list=[c,d]
> # CoR1 closes
> ## deletes [c,d] as they aren't in its list of index files ([a,b]) AND aren't 
> part of shared list ([e,f])
> Disclaimer: the timeline might be off a bit (I haven't written a test yet), 
> but the basic point is that a CoR could be working with an index file set 
> while new files might have come in twice after that CoR opened - thus the 
> shared list doesn't have complete information about the newly written files.
> [~chetanm], can you please check the timeline above? I'll try to work on a 
> test case in the meantime.





[jira] [Created] (OAK-7246) Improve ceanup of locally copied index files

2018-02-06 Thread Vikas Saurabh (JIRA)
Vikas Saurabh created OAK-7246:
--

 Summary: Improve ceanup of locally copied index files
 Key: OAK-7246
 URL: https://issues.apache.org/jira/browse/OAK-7246
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: lucene
Reporter: Vikas Saurabh
Assignee: Vikas Saurabh


This task is to re-think how should we do clean up of locally copied index 
files which are no longer in use.

Current approach:
# index writers, while creating index files, keep a list of 
currently-being-written files
## this list is cleared when a new index writer comes into play
# the index tracker opens a new index (at a new revision) via observation
## while it is being opened, we also record the current directory listing of 
the local index files
# while opening the new index, the tracker closes the old revision of the 
index reader
## during this close, the local files recorded at open time are purged if 
(they don't show up in the remote view of the index && they aren't part of the 
index writer's currently-being-written list)

This approach, at least in the following timeline, would incur extra copying 
(and, as a side effect, also open some index files directly off the remote 
input stream during CoWs):
# CoW1 creates [a, b]
# CoW2 starts and creates [c, d], removes [a, b] from remote
# CoR1 opens an index due to CoW1
## local-list-CoR1 = [a, b, c, d], remote-index-list=[a, b]
# CoW2 finishes
# CoW3 creates [e, f], removes [a,b] from remote
## CoW-currently-being-written-list=[e,f]
# CoR2 opens due to CoW2
## local-list-CoR2=[a,b,c,d,e,f], remote-index-list=[c,d]
# CoR1 closes
## deletes [c,d] as they aren't in its list of index files ([a,b]) AND aren't 
part of shared list ([e,f])

Disclaimer: the timeline might be off a bit (I haven't written a test yet), but 
the basic point is that a CoR could be working with an index file set while new 
files have come in twice since that CoR opened, so the shared list doesn't have 
complete information about the newly written files.
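
To make the purge rule concrete, here is a minimal sketch of the set arithmetic behind the "CoR1 closes" step of the timeline. The names {{PurgeSketch}} and {{filesToPurge}} are hypothetical and are not Oak APIs. The sketch only assumes the rule as described in this issue: a closing reader purges local files that are neither in its own remote view nor in the shared currently-being-written list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class PurgeSketch {

    // Hypothetical helper (not an Oak API): returns the local files a closing
    // reader would purge under the rule described in this issue.
    static Set<String> filesToPurge(Set<String> localAtOpen,
                                    Set<String> remoteView,
                                    Set<String> beingWritten) {
        // Start from the local directory listing recorded when the reader opened.
        Set<String> purge = new LinkedHashSet<>(localAtOpen);
        // Keep files that are part of this reader's remote view of the index.
        purge.removeAll(remoteView);
        // Keep files the current index writer reports as being written.
        purge.removeAll(beingWritten);
        return purge;
    }

    public static void main(String[] args) {
        // Values taken from the timeline: CoR1 saw [a, b, c, d] locally and
        // [a, b] remotely; by the time it closes, the shared list is [e, f].
        Set<String> localCoR1 = new LinkedHashSet<>(Arrays.asList("a", "b", "c", "d"));
        Set<String> remoteCoR1 = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> sharedList = new HashSet<>(Arrays.asList("e", "f"));

        System.out.println(filesToPurge(localCoR1, remoteCoR1, sharedList)); // prints [c, d]
    }
}
```

With the values from the timeline the result is [c, d], which are exactly the files CoR2 still needs, so they would have to be copied again.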

[~chetanm], can you please check the timeline above? I'll try to work on a test 
case in the meantime.





[jira] [Updated] (OAK-6941) Compatibility matrix for oak-run compact

2018-02-06 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-6941:
--
Fix Version/s: (was: 1.8.2)

> Compatibility matrix for oak-run compact
> 
>
> Key: OAK-6941
> URL: https://issues.apache.org/jira/browse/OAK-6941
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: doc, run, segment-tar
>Reporter: Valentin Olteanu
>Priority: Major
>  Labels: documentation, tooling
> Fix For: 1.9.0, 1.10, 1.8.3
>
>
> h4. Problem statement
> For compacting the segmentstore using {{oak-run}}, the safest option is to 
> use the same version of {{oak-run}} as the Oak version used to generate the 
> repository. Yet, sometimes, a newer {{oak-run}} version is recommended to 
> benefit from bug fixes and improvements, but not every combination of source 
> repo and {{oak-run}} is safe to use, and the user needs a way to check 
> compatibility. Thus, users need a tool that guides the decision of which 
> version to use.
> h4. Requirements
> * Easy to decide what {{oak-run}} version should be used for a certain Oak 
> version
> * Up to date with the latest releases
> * Machine readable for scripting
> * Include details on the benefits of using a certain version (release notes)
> * Blacklist of versions that should not be used (with alternatives)
> h4. Solution
> TBD





[jira] [Updated] (OAK-7173) Update documentation for oak-run check

2018-02-06 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-7173:
--
Fix Version/s: (was: 1.8.2)

> Update documentation for oak-run check
> --
>
> Key: OAK-7173
> URL: https://issues.apache.org/jira/browse/OAK-7173
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: doc, oak-run, segment-tar
>Reporter: Michael Dürig
>Assignee: Andrei Dulceanu
>Priority: Major
>  Labels: documentation
> Fix For: 1.9.0, 1.10, 1.8.3
>
>
> We should review and update the documentation of [{{oak-run 
> check}}|http://jackrabbit.apache.org/oak/docs/nodestore/segment/overview.html#check].
> For example, it should cover the new options from OAK-6373.





[jira] [Updated] (OAK-7112) Update documentation for cold standby

2018-02-06 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-7112:
--
Fix Version/s: (was: 1.8.2)

> Update documentation for cold standby
> -
>
> Key: OAK-7112
> URL: https://issues.apache.org/jira/browse/OAK-7112
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: doc, segment-tar, tarmk-standby
>Reporter: Andrei Dulceanu
>Assignee: Andrei Dulceanu
>Priority: Major
> Fix For: 1.9.0, 1.10, 1.8.3
>
>
> Improve the monitoring section of cold standby in {{oak-doc}} to include the 
> missing MBean screenshots.
> [~mduerig], [~frm]: How about adding a *Benchmarking* section to the cold 
> standby page that briefly covers ways to use the new {{Oak-Segment-Tar-Cold}} 
> fixture and also running {{ScalabilityStandbySuite}} on top of it?





[jira] [Updated] (OAK-6031) Add TarFiles to the architecture diagram

2018-02-06 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-6031:
--
Fix Version/s: (was: 1.8.2)

> Add TarFiles to the architecture diagram
> 
>
> Key: OAK-6031
> URL: https://issues.apache.org/jira/browse/OAK-6031
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: doc, segment-tar
>Affects Versions: 1.8.0
>Reporter: Francesco Mari
>Assignee: Francesco Mari
>Priority: Major
>  Labels: documentation
> Fix For: 1.9.0, 1.10, 1.8.3
>
>
> The newly created {{TarFiles}} should be added to the architecture diagram 
> for oak-segment-tar.





[jira] [Updated] (OAK-7173) Update documentation for oak-run check

2018-02-06 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-7173:
--
Fix Version/s: 1.8.3

> Update documentation for oak-run check
> --
>
> Key: OAK-7173
> URL: https://issues.apache.org/jira/browse/OAK-7173
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: doc, oak-run, segment-tar
>Reporter: Michael Dürig
>Assignee: Andrei Dulceanu
>Priority: Major
>  Labels: documentation
> Fix For: 1.9.0, 1.10, 1.8.2, 1.8.3
>
>
> We should review and update the documentation of [{{oak-run 
> check}}|http://jackrabbit.apache.org/oak/docs/nodestore/segment/overview.html#check].
> For example, it should cover the new options from OAK-6373.





[jira] [Updated] (OAK-7112) Update documentation for cold standby

2018-02-06 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-7112:
--
Fix Version/s: 1.8.3

> Update documentation for cold standby
> -
>
> Key: OAK-7112
> URL: https://issues.apache.org/jira/browse/OAK-7112
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: doc, segment-tar, tarmk-standby
>Reporter: Andrei Dulceanu
>Assignee: Andrei Dulceanu
>Priority: Major
> Fix For: 1.9.0, 1.10, 1.8.2, 1.8.3
>
>
> Improve the monitoring section of cold standby in {{oak-doc}} to include the 
> missing MBean screenshots.
> [~mduerig], [~frm]: How about adding a *Benchmarking* section to the cold 
> standby page that briefly covers ways to use the new {{Oak-Segment-Tar-Cold}} 
> fixture and also running {{ScalabilityStandbySuite}} on top of it?





[jira] [Updated] (OAK-6031) Add TarFiles to the architecture diagram

2018-02-06 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-6031:
--
Fix Version/s: 1.8.3

> Add TarFiles to the architecture diagram
> 
>
> Key: OAK-6031
> URL: https://issues.apache.org/jira/browse/OAK-6031
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: doc, segment-tar
>Affects Versions: 1.8.0
>Reporter: Francesco Mari
>Assignee: Francesco Mari
>Priority: Major
>  Labels: documentation
> Fix For: 1.9.0, 1.10, 1.8.2, 1.8.3
>
>
> The newly created {{TarFiles}} should be added to the architecture diagram 
> for oak-segment-tar.





[jira] [Updated] (OAK-6941) Compatibility matrix for oak-run compact

2018-02-06 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-6941:
--
Fix Version/s: 1.8.3

> Compatibility matrix for oak-run compact
> 
>
> Key: OAK-6941
> URL: https://issues.apache.org/jira/browse/OAK-6941
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: doc, run, segment-tar
>Reporter: Valentin Olteanu
>Priority: Major
>  Labels: documentation, tooling
> Fix For: 1.9.0, 1.10, 1.8.2, 1.8.3
>
>
> h4. Problem statement
> For compacting the segmentstore using {{oak-run}}, the safest option is to 
> use the same version of {{oak-run}} as the Oak version used to generate the 
> repository. Yet, sometimes, a newer {{oak-run}} version is recommended to 
> benefit from bug fixes and improvements, but not every combination of source 
> repo and {{oak-run}} is safe to use, and the user needs a way to check 
> compatibility. Thus, users need a tool that guides the decision of which 
> version to use.
> h4. Requirements
> * Easy to decide what {{oak-run}} version should be used for a certain Oak 
> version
> * Up to date with the latest releases
> * Machine readable for scripting
> * Include details on the benefits of using a certain version (release notes)
> * Blacklist of versions that should not be used (with alternatives)
> h4. Solution
> TBD





[jira] [Updated] (OAK-7212) Document the document order traversal option

2018-02-06 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-7212:
--
Fix Version/s: (was: 1.8.2)
   1.8.3

> Document the document order traversal option
> 
>
> Key: OAK-7212
> URL: https://issues.apache.org/jira/browse/OAK-7212
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: doc, run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Major
> Fix For: 1.8.3
>
>
> Document the doc-order-traversal option introduced with OAK-6353





[jira] [Resolved] (OAK-7223) Files could be kept partially in case of disconnection from backends

2018-02-06 Thread Amit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Jain resolved OAK-7223.

Resolution: Fixed

Backported to 1.8 branch - http://svn.apache.org/viewvc?rev=1823284&view=rev

> Files could be kept partially in case of disconnection from backends
> 
>
> Key: OAK-7223
> URL: https://issues.apache.org/jira/browse/OAK-7223
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: blob-plugins
>Affects Versions: 1.6.8, 1.8.1
>Reporter: Amit Jain
>Assignee: Amit Jain
>Priority: Blocker
> Fix For: 1.9.0, 1.10, 1.6.9, 1.8.2
>
> Attachments: OAK-7223-2.patch, OAK-7223-3.patch, OAK-7223.patch
>
>
> The FileCache#load method needs to clean up files that may have been only 
> partially downloaded when the backend returns an error. Any such partially 
> downloaded file should be removed before returning from the method.
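
The cleanup described above can be sketched as follows. This is only an illustration of the general pattern (delete the target file whenever the copy does not complete), not the actual patch. The {{Backend}} interface, the {{load}} signature, and the class name are hypothetical and do not mirror Oak's {{FileCache}}.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

class PartialDownloadCleanupSketch {

    /** Hypothetical stand-in for a blob backend; not an Oak interface. */
    interface Backend {
        InputStream read(String id) throws IOException;
    }

    // Copies the backend stream into 'target'. If anything fails midway, the
    // partially written file is removed before the exception propagates.
    static File load(Backend backend, String id, File target) throws IOException {
        boolean complete = false;
        try (InputStream in = backend.read(id)) {
            Files.copy(in, target.toPath(), StandardCopyOption.REPLACE_EXISTING);
            complete = true;
            return target;
        } finally {
            // The stream is already closed here (try-with-resources).
            // Note: a failure to delete would mask the original exception.
            if (!complete) {
                Files.deleteIfExists(target.toPath());
            }
        }
    }

    public static void main(String[] args) {
        File target = new File(System.getProperty("java.io.tmpdir"),
                "partial-download-sketch.bin");
        // Simulated backend that serves a few bytes and then disconnects.
        Backend failing = id -> new InputStream() {
            private int served = 0;

            @Override
            public int read() throws IOException {
                if (served++ < 4) {
                    return 'x';
                }
                throw new IOException("simulated backend disconnect");
            }
        };
        try {
            load(failing, "some-blob-id", target);
        } catch (IOException e) {
            System.out.println("load failed: " + e.getMessage());
        }
        System.out.println("partial file left behind: " + target.exists());
    }
}
```

Running the sketch with the simulated disconnect should report that no partial file is left behind.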


