[jira] [Updated] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
[ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-9052: Attachment: fileSizeOverTime.png > Reindexing using --doc-traversal-mode may need a lot of memory > -- > > Key: OAK-9052 > URL: https://issues.apache.org/jira/browse/OAK-9052 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing, mongomk >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: fileSizeOverTime.png > > > Indexing using oak-run and --doc-traversal-mode uses the FlatFileStore. For > aggregation, there is a limit on memory usage, by default around 100 MB. > Depending on the content structure, this limit can be exceeded. > It would be good to find a way to avoid a memory limit, for example using a > temporary storage (a file, or a persistent key/value store). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
[ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100829#comment-17100829 ]

Thomas Mueller commented on OAK-9052:

https://github.com/oak-indexing/jackrabbit-oak/pull/154

With the memory setting "0" (the default value), a temporary file is created for the linked list, so that heap memory usage is constant (around 30 MB, I guess). Internally, a persistent key-value store, the H2 MVStore, is used (the same one the MongoMK uses for the persistent cache). Every minute, the file is compacted (configurable using the "oak.indexer.linkedList.compactMillis" system property).

It's possible to use the old behavior by setting the system property "oak.indexer.memLimitInMB" to 100.

> Reindexing using --doc-traversal-mode may need a lot of memory
> --
>
> Key: OAK-9052
> URL: https://issues.apache.org/jira/browse/OAK-9052
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: indexing, mongomk
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
>
> Indexing using oak-run and --doc-traversal-mode uses the FlatFileStore. For
> aggregation, there is a limit on memory usage, by default around 100 MB.
> Depending on the content structure, this limit can be exceeded.
> It would be good to find a way to avoid a memory limit, for example using a
> temporary storage (a file, or a persistent key/value store).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
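The two system properties named in the comment above could be read along the following lines. This is only a sketch: the class LinkedListSettingsSketch and its fields are hypothetical, the 60000 ms default is inferred from "every minute", and treating any non-zero limit as the in-memory mode is an assumption. Only the two property names and the default of 0 come from the comment.

```java
class LinkedListSettingsSketch {
    // Property names as quoted in the comment; everything else is hypothetical.
    static final String MEM_LIMIT_PROP = "oak.indexer.memLimitInMB";
    static final String COMPACT_PROP = "oak.indexer.linkedList.compactMillis";

    final int memLimitInMB;
    final long compactMillis;

    LinkedListSettingsSketch() {
        // 0 is the default according to the comment (file-backed linked list).
        this.memLimitInMB = Integer.getInteger(MEM_LIMIT_PROP, 0);
        // "Every minute" per the comment; 60_000 ms is an assumed default.
        this.compactMillis = Long.getLong(COMPACT_PROP, 60_000L);
    }

    // Assumption: only the value 0 selects the temporary file.
    boolean usesTemporaryFile() {
        return memLimitInMB == 0;
    }

    public static void main(String[] args) {
        LinkedListSettingsSketch s = new LinkedListSettingsSketch();
        System.out.println("memLimitInMB=" + s.memLimitInMB
                + ", compactMillis=" + s.compactMillis
                + ", temporaryFile=" + s.usesTemporaryFile());
    }
}
```

Because these are ordinary JVM system properties, the old behavior would be selected by passing -Doak.indexer.memLimitInMB=100 on the Java command line.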
[jira] [Assigned] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
[ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller reassigned OAK-9052: --- Assignee: Thomas Mueller > Reindexing using --doc-traversal-mode may need a lot of memory > -- > > Key: OAK-9052 > URL: https://issues.apache.org/jira/browse/OAK-9052 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing, mongomk >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > > Indexing using oak-run and --doc-traversal-mode uses the FlatFileStore. For > aggregation, there is a limit on memory usage, by default around 100 MB. > Depending on the content structure, this limit can be exceeded. > It would be good to find a way to avoid a memory limit, for example using a > temporary storage (a file, or a persistent key/value store). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
[ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099871#comment-17099871 ]

Thomas Mueller commented on OAK-9052:

Data structure:
* FlatFileBufferLinkedList is used in the second phase and contains a list of NodeStateEntry objects.
* NodeStateEntry.nodeState is a LazyChildrenNodeState for entries in memory, but can be a DocumentNodeState when reading from MongoDB (in the first phase).
* NodeStateEntry objects can be (de-)serialized using the NodeStateEntryWriter / NodeStateEntryReader. That is usually only used in the first phase.
* The temp file is stored in temp/flat-file-store/sort-work-dir/sortInBatch...flatfile (by default using compression).

> Reindexing using --doc-traversal-mode may need a lot of memory
> --
>
> Key: OAK-9052
> URL: https://issues.apache.org/jira/browse/OAK-9052
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: indexing, mongomk
> Reporter: Thomas Mueller
> Priority: Major
>
> Indexing using oak-run and --doc-traversal-mode uses the FlatFileStore. For
> aggregation, there is a limit on memory usage, by default around 100 MB.
> Depending on the content structure, this limit can be exceeded.
> It would be good to find a way to avoid a memory limit, for example using a
> temporary storage (a file, or a persistent key/value store).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
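To illustrate why a memory-limited in-memory list can fail "depending on the content structure", here is a minimal sketch of a buffer that tracks an estimated size and refuses entries beyond a limit. It is not Oak's FlatFileBufferLinkedList: the class name, the String entries (standing in for NodeStateEntry objects), and the crude size estimate are all assumptions made for the example.

```java
import java.util.ArrayDeque;

class BoundedBufferSketch {
    private final ArrayDeque<String> entries = new ArrayDeque<>();
    private final long limitInBytes;
    private long usedBytes;

    BoundedBufferSketch(long limitInBytes) {
        this.limitInBytes = limitInBytes;
    }

    // Crude estimate (assumption): two bytes per character.
    static long estimate(String entry) {
        return 2L * entry.length();
    }

    // Appends an entry, or fails if the estimated usage would pass the limit.
    void add(String entry) {
        long cost = estimate(entry);
        if (usedBytes + cost > limitInBytes) {
            throw new IllegalStateException("memory limit exceeded: "
                    + (usedBytes + cost) + " > " + limitInBytes);
        }
        entries.addLast(entry);
        usedBytes += cost;
    }

    // Removes the oldest entry and releases its estimated memory.
    String removeFirst() {
        String entry = entries.removeFirst();
        usedBytes -= estimate(entry);
        return entry;
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

If many entries have to stay buffered at once (for example a node with a very large number of descendants involved in aggregation), such a buffer hits its limit no matter how the limit is chosen, which is the motivation for moving the list to temporary storage.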
[jira] [Created] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
Thomas Mueller created OAK-9052: --- Summary: Reindexing using --doc-traversal-mode may need a lot of memory Key: OAK-9052 URL: https://issues.apache.org/jira/browse/OAK-9052 Project: Jackrabbit Oak Issue Type: Improvement Components: indexing, mongomk Reporter: Thomas Mueller Indexing using oak-run and --doc-traversal-mode uses the FlatFileStore. For aggregation, there is a limit on memory usage, by default around 100 MB. Depending on the content structure, this limit can be exceeded. It would be good to find a way to avoid a memory limit, for example using a temporary storage (a file, or a persistent key/value store). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (OAK-9046) Index function string-length should index size for binary properties
[ https://issues.apache.org/jira/browse/OAK-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved OAK-9046. - Resolution: Fixed > Index function string-length should index size for binary properties > > > Key: OAK-9046 > URL: https://issues.apache.org/jira/browse/OAK-9046 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.28.0 > > > For index definition > {noformat} > jcr:primaryType="nt:unstructured" > ordered="{Boolean}true" > propertyIndex="{Boolean}true" > type="Long" > function="fn:string-length(jcr:content/@jcr:data)"/> > {noformat} > Expected result: Index the size of @jcr:data > Current result: > {noformat} > ERROR o.a.j.o.p.i.l.LuceneDocumentMaker - Failed to calculate function value > for [function, length, @jcr:content/jcr:data] ... > java.lang.IllegalStateException: String is too long: 2325601444581057974 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-9046) Index function string-length should index size for binary properties
[ https://issues.apache.org/jira/browse/OAK-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099667#comment-17099667 ] Thomas Mueller commented on OAK-9046: - http://svn.apache.org/r1877376 > Index function string-length should index size for binary properties > > > Key: OAK-9046 > URL: https://issues.apache.org/jira/browse/OAK-9046 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.28.0 > > > For index definition > {noformat} > jcr:primaryType="nt:unstructured" > ordered="{Boolean}true" > propertyIndex="{Boolean}true" > type="Long" > function="fn:string-length(jcr:content/@jcr:data)"/> > {noformat} > Expected result: Index the size of @jcr:data > Current result: > {noformat} > ERROR o.a.j.o.p.i.l.LuceneDocumentMaker - Failed to calculate function value > for [function, length, @jcr:content/jcr:data] ... > java.lang.IllegalStateException: String is too long: 2325601444581057974 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-9046) Index function string-length should index size for binary properties
[ https://issues.apache.org/jira/browse/OAK-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-9046: Fix Version/s: 1.28.0 > Index function string-length should index size for binary properties > > > Key: OAK-9046 > URL: https://issues.apache.org/jira/browse/OAK-9046 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.28.0 > > > For index definition > {noformat} > jcr:primaryType="nt:unstructured" > ordered="{Boolean}true" > propertyIndex="{Boolean}true" > type="Long" > function="fn:string-length(jcr:content/@jcr:data)"/> > {noformat} > Expected result: Index the size of @jcr:data > Current result: > {noformat} > ERROR o.a.j.o.p.i.l.LuceneDocumentMaker - Failed to calculate function value > for [function, length, @jcr:content/jcr:data] ... > java.lang.IllegalStateException: String is too long: 2325601444581057974 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (OAK-9046) Index function string-length should index size for binary properties
[ https://issues.apache.org/jira/browse/OAK-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096501#comment-17096501 ] Thomas Mueller edited comment on OAK-9046 at 4/30/20, 1:07 PM: --- PR: https://github.com/oak-indexing/jackrabbit-oak/pull/151 was (Author: tmueller): PR: https://github.com/apache/jackrabbit-oak/pull/205 > Index function string-length should index size for binary properties > > > Key: OAK-9046 > URL: https://issues.apache.org/jira/browse/OAK-9046 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > > For index definition > {noformat} > jcr:primaryType="nt:unstructured" > ordered="{Boolean}true" > propertyIndex="{Boolean}true" > type="Long" > function="fn:string-length(jcr:content/@jcr:data)"/> > {noformat} > Expected result: Index the size of @jcr:data > Current result: > {noformat} > ERROR o.a.j.o.p.i.l.LuceneDocumentMaker - Failed to calculate function value > for [function, length, @jcr:content/jcr:data] ... > java.lang.IllegalStateException: String is too long: 2325601444581057974 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-9046) Index function string-length should index size for binary properties
[ https://issues.apache.org/jira/browse/OAK-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096501#comment-17096501 ] Thomas Mueller commented on OAK-9046: - PR: https://github.com/apache/jackrabbit-oak/pull/205 > Index function string-length should index size for binary properties > > > Key: OAK-9046 > URL: https://issues.apache.org/jira/browse/OAK-9046 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > > For index definition > {noformat} > jcr:primaryType="nt:unstructured" > ordered="{Boolean}true" > propertyIndex="{Boolean}true" > type="Long" > function="fn:string-length(jcr:content/@jcr:data)"/> > {noformat} > Expected result: Index the size of @jcr:data > Current result: > {noformat} > ERROR o.a.j.o.p.i.l.LuceneDocumentMaker - Failed to calculate function value > for [function, length, @jcr:content/jcr:data] ... > java.lang.IllegalStateException: String is too long: 2325601444581057974 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-9046) Index function string-length should index size for binary properties
[ https://issues.apache.org/jira/browse/OAK-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096485#comment-17096485 ]

Thomas Mueller commented on OAK-9046:

For SQL-2, the function is called length(). For a relational database, it would be clear that this is the blob length (for a blob). In XPath, there is no blob AFAIK, so the function is called string-length.

* http://jackrabbit.apache.org/oak/docs/query/grammar-sql2.html#dynamic_operand
* http://jackrabbit.apache.org/oak/docs/query/grammar-xpath.html#dynamic_operand

What we do currently is try to convert the binary to a string, then check the length... this fails for the segment store, and would fail with an out-of-memory error if the binary is large (e.g. 2 GB).

I think it's fine to use the length in number of bytes for binaries. Actually, we do that when calculating the length() function for conditions in a query as well.

> Index function string-length should index size for binary properties
>
>
> Key: OAK-9046
> URL: https://issues.apache.org/jira/browse/OAK-9046
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: indexing
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
>
> For index definition
> {noformat}
> jcr:primaryType="nt:unstructured"
> ordered="{Boolean}true"
> propertyIndex="{Boolean}true"
> type="Long"
> function="fn:string-length(jcr:content/@jcr:data)"/>
> {noformat}
> Expected result: Index the size of @jcr:data
> Current result:
> {noformat}
> ERROR o.a.j.o.p.i.l.LuceneDocumentMaker - Failed to calculate function value
> for [function, length, @jcr:content/jcr:data] ...
> java.lang.IllegalStateException: String is too long: 2325601444581057974
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
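The rule proposed in the comment above (byte count for binaries, character count otherwise) can be sketched as follows. The types here are hypothetical stand-ins and not Oak's property or blob API: BinaryValue only models a binary whose size is known without loading its content.

```java
class StringLengthSketch {
    // Hypothetical stand-in for a binary property: the size is known up front,
    // so the content never has to be converted to a String.
    static final class BinaryValue {
        private final long sizeInBytes;

        BinaryValue(long sizeInBytes) {
            this.sizeInBytes = sizeInBytes;
        }

        long sizeInBytes() {
            return sizeInBytes;
        }
    }

    // Length in bytes for binaries, length in characters for other values.
    static long length(Object value) {
        if (value instanceof BinaryValue) {
            return ((BinaryValue) value).sizeInBytes();
        }
        return value.toString().length();
    }

    public static void main(String[] args) {
        System.out.println(length("hello"));
        // A 2 GB binary is handled without allocating 2 GB of heap.
        System.out.println(length(new BinaryValue(2L * 1024 * 1024 * 1024)));
    }
}
```

The point of the design is that the binary branch never materializes the content, so neither the "String is too long" failure nor an out-of-memory condition can occur on that path.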
[jira] [Created] (OAK-9046) Index function string-length should index size for binary properties
Thomas Mueller created OAK-9046: --- Summary: Index function string-length should index size for binary properties Key: OAK-9046 URL: https://issues.apache.org/jira/browse/OAK-9046 Project: Jackrabbit Oak Issue Type: Improvement Components: indexing Reporter: Thomas Mueller Assignee: Thomas Mueller For index definition {noformat} {noformat} Expected result: Index the size of @jcr:data Current result: {noformat} ERROR o.a.j.o.p.i.l.LuceneDocumentMaker - Failed to calculate function value for [function, length, @jcr:content/jcr:data] ... java.lang.IllegalStateException: String is too long: 2325601444581057974 {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (OAK-8971) Indexing: dynamic boost, as an alternative to IndexFieldProvider
[ https://issues.apache.org/jira/browse/OAK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved OAK-8971. - Resolution: Fixed > Indexing: dynamic boost, as an alternative to IndexFieldProvider > > > Key: OAK-8971 > URL: https://issues.apache.org/jira/browse/OAK-8971 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.28.0 > > Attachments: OAK-8971.patch > > > The org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider is > a callback that allows to change the behavior of indexing. There are multiple > problems: > * (1) Not available using oak-run > * (2) Only available for Lucene indexes > Instead of a callback, a configuration option in the index property should be > added, such that the most commonly used features are available in oak-run, > and can be implemented in other indexes (e.g. Elastisearch, Solr). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8971) Indexing: dynamic boost, as an alternative to IndexFieldProvider
[ https://issues.apache.org/jira/browse/OAK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089756#comment-17089756 ] Thomas Mueller commented on OAK-8971: - http://svn.apache.org/r1876831 (documentation) > Indexing: dynamic boost, as an alternative to IndexFieldProvider > > > Key: OAK-8971 > URL: https://issues.apache.org/jira/browse/OAK-8971 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.28.0 > > Attachments: OAK-8971.patch > > > The org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider is > a callback that allows to change the behavior of indexing. There are multiple > problems: > * (1) Not available using oak-run > * (2) Only available for Lucene indexes > Instead of a callback, a configuration option in the index property should be > added, such that the most commonly used features are available in oak-run, > and can be implemented in other indexes (e.g. Elastisearch, Solr). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8971) Indexing: dynamic boost, as an alternative to IndexFieldProvider
[ https://issues.apache.org/jira/browse/OAK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089745#comment-17089745 ] Thomas Mueller commented on OAK-8971: - http://svn.apache.org/r1876830 > Indexing: dynamic boost, as an alternative to IndexFieldProvider > > > Key: OAK-8971 > URL: https://issues.apache.org/jira/browse/OAK-8971 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8971.patch > > > The org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider is > a callback that allows to change the behavior of indexing. There are multiple > problems: > * (1) Not available using oak-run > * (2) Only available for Lucene indexes > Instead of a callback, a configuration option in the index property should be > added, such that the most commonly used features are available in oak-run, > and can be implemented in other indexes (e.g. Elastisearch, Solr). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8971) Indexing: dynamic boost, as an alternative to IndexFieldProvider
[ https://issues.apache.org/jira/browse/OAK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8971: Fix Version/s: 1.28.0 > Indexing: dynamic boost, as an alternative to IndexFieldProvider > > > Key: OAK-8971 > URL: https://issues.apache.org/jira/browse/OAK-8971 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.28.0 > > Attachments: OAK-8971.patch > > > The org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider is > a callback that allows to change the behavior of indexing. There are multiple > problems: > * (1) Not available using oak-run > * (2) Only available for Lucene indexes > Instead of a callback, a configuration option in the index property should be > added, such that the most commonly used features are available in oak-run, > and can be implemented in other indexes (e.g. Elastisearch, Solr). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-9024) oak-solr-osgi imports org.slf4j.impl
[ https://issues.apache.org/jira/browse/OAK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089403#comment-17089403 ] Thomas Mueller commented on OAK-9024: - I checked if oak-core or jackrabbit or oak-blob-plugins imports slf4j.impl, they don't. But I don't know what else to investigate, so I give up. More insights and patches are welcome! > oak-solr-osgi imports org.slf4j.impl > > > Key: OAK-9024 > URL: https://issues.apache.org/jira/browse/OAK-9024 > Project: Jackrabbit Oak > Issue Type: Bug > Components: solr >Reporter: Julian Reschke >Priority: Minor > Fix For: 1.28.0 > > > From the manifest: > {{org.slf4j.impl;version="[1.6,2)"}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-9024) oak-solr-osgi imports org.slf4j.impl
[ https://issues.apache.org/jira/browse/OAK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089374#comment-17089374 ]

Thomas Mueller commented on OAK-9024:

I have to admit I don't have much insight in how this is supposed to work. Looking at the pom file oak-solr-osgi/pom.xml (and that project only has a pom file), I don't see slf4j.impl. I see:

{noformat}
org.apache.solr solr-core ${solr.version} org.slf4j slf4j-jdk14 org.apache.hadoop hadoop-annotations commons-fileupload commons-fileupload runtime
{noformat}

That's it. The dependencies are:

{noformat}
mvn dependency:tree
[INFO] --- maven-dependency-plugin:2.10:tree (default-cli) @ oak-solr-osgi ---
[INFO] org.apache.jackrabbit:oak-solr-osgi:bundle:1.27-SNAPSHOT
[INFO] +- org.osgi:org.osgi.core:jar:4.2.0:provided
[INFO] +- org.osgi:org.osgi.compendium:jar:4.2.0:provided
[INFO] +- org.apache.jackrabbit:oak-core:jar:1.27-SNAPSHOT:provided
[INFO] |  +- org.osgi:org.osgi.service.component.annotations:jar:1.3.0:provided
[INFO] |  +- org.osgi:org.osgi.service.metatype.annotations:jar:1.3.0:provided
[INFO] |  +- org.apache.jackrabbit:oak-jackrabbit-api:jar:1.27-SNAPSHOT:runtime
[INFO] |  +- org.apache.jackrabbit:oak-api:jar:1.27-SNAPSHOT:runtime
[INFO] |  +- org.apache.jackrabbit:oak-core-spi:jar:1.27-SNAPSHOT:runtime
[INFO] |  +- org.apache.jackrabbit:oak-query-spi:jar:1.27-SNAPSHOT:runtime
[INFO] |  +- org.apache.jackrabbit:oak-security-spi:jar:1.27-SNAPSHOT:provided
[INFO] |  +- org.apache.jackrabbit:oak-commons:jar:1.27-SNAPSHOT:runtime
[INFO] |  +- org.apache.jackrabbit:oak-blob-plugins:jar:1.27-SNAPSHOT:provided
[INFO] |  |  +- org.apache.jackrabbit:jackrabbit-data:jar:2.20.0:provided
[INFO] |  |  +- org.apache.jackrabbit:oak-blob:jar:1.27-SNAPSHOT:provided
[INFO] |  |  \- org.osgi:org.osgi.annotation:jar:6.0.0:provided
[INFO] |  +- org.apache.jackrabbit:oak-store-spi:jar:1.27-SNAPSHOT:runtime
[INFO] |  +- com.google.guava:guava:jar:15.0:runtime
[INFO] |  +- commons-io:commons-io:jar:2.6:runtime
[INFO] |  +- javax.jcr:jcr:jar:2.0:runtime
[INFO] |  +- org.apache.jackrabbit:jackrabbit-jcr-commons:jar:2.20.0:runtime
[INFO] |  \- org.slf4j:slf4j-api:jar:1.7.30:runtime
[INFO] +- org.apache.jackrabbit:oak-solr-core:jar:1.27-SNAPSHOT:runtime
[INFO] +- org.apache.jackrabbit:oak-search:jar:1.27-SNAPSHOT:runtime
[INFO] |  \- org.apache.tika:tika-core:jar:1.24:runtime
[INFO] +- org.apache.solr:solr-core:jar:6.6.6:runtime
[INFO] |  +- org.apache.lucene:lucene-analyzers-kuromoji:jar:6.6.6:runtime
[INFO] |  +- org.apache.lucene:lucene-analyzers-phonetic:jar:6.6.6:runtime
[INFO] |  +- org.apache.lucene:lucene-backward-codecs:jar:6.6.6:runtime
[INFO] |  +- org.apache.lucene:lucene-classification:jar:6.6.6:runtime
[INFO] |  +- org.apache.lucene:lucene-codecs:jar:6.6.6:runtime
[INFO] |  +- org.apache.lucene:lucene-expressions:jar:6.6.6:runtime
[INFO] |  +- org.apache.lucene:lucene-memory:jar:6.6.6:runtime
[INFO] |  +- org.apache.lucene:lucene-spatial-extras:jar:6.6.6:runtime
[INFO] |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.10.3:runtime
[INFO] |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.10.3:runtime
[INFO] |  +- com.github.ben-manes.caffeine:caffeine:jar:2.4.0:runtime
[INFO] |  +- com.google.protobuf:protobuf-java:jar:3.1.0:runtime
[INFO] |  +- com.tdunning:t-digest:jar:3.1:runtime
[INFO] |  +- commons-codec:commons-codec:jar:1.14:runtime
[INFO] |  +- commons-collections:commons-collections:jar:3.2.2:runtime
[INFO] |  +- commons-configuration:commons-configuration:jar:1.6:runtime
[INFO] |  +- dom4j:dom4j:jar:1.6.1:runtime
[INFO] |  +- info.ganglia.gmetric4j:gmetric4j:jar:1.0.7:runtime
[INFO] |  +- io.dropwizard.metrics:metrics-core:jar:3.2.3:runtime
[INFO] |  +- io.dropwizard.metrics:metrics-ganglia:jar:3.2.2:runtime
[INFO] |  +- io.dropwizard.metrics:metrics-graphite:jar:3.2.2:runtime
[INFO] |  +- io.dropwizard.metrics:metrics-jetty9:jar:3.2.2:runtime
[INFO] |  +- io.dropwizard.metrics:metrics-jvm:jar:3.2.2:runtime
[INFO] |  +- javax.servlet:javax.servlet-api:jar:3.1.0:runtime
[INFO] |  +- joda-time:joda-time:jar:2.2:runtime
[INFO] |  +- log4j:log4j:jar:1.2.17:runtime
[INFO] |  +- net.hydromatic:eigenbase-properties:jar:1.1.5:runtime
[INFO] |  +- org.antlr:antlr4-runtime:jar:4.5.1-1:runtime
[INFO] |  +- org.apache.calcite:calcite-core:jar:1.11.0:runtime
[INFO] |  +- org.apache.calcite:calcite-linq4j:jar:1.11.0:runtime
[INFO] |  +- org.apache.calcite.avatica:avatica-core:jar:1.9.0:runtime
[INFO] |  +- org.apache.curator:curator-client:jar:2.8.0:runtime
[INFO] |  +- org.apache.curator:curator-framework:jar:2.8.0:runtime
[INFO] |  +-
[jira] [Commented] (OAK-8934) Indexing: filter entries with a regular expression
[ https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088731#comment-17088731 ] Thomas Mueller commented on OAK-8934: - [~amrverma] the problem is that Jackrabbit is using SVN, not git: https://stackoverflow.com/questions/708202/git-format-patch-to-be-svn-compatible > Indexing: filter entries with a regular expression > -- > > Key: OAK-8934 > URL: https://issues.apache.org/jira/browse/OAK-8934 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Labels: amrit > Fix For: 1.26.0 > > Attachments: OAK-8934-1_8.patch, OAK-8934-1_8_svn.patch > > > We should provide a way to filter the index using a regular expression. For > example, only index nodes that contain a reference to another node. (Not a > JCR reference, but a reference within the value itself). For example, index a > node if one of the properties contains: > * /content/abc > * > * and so on > This will allow to run a query to find if /content/abc is referenced. The > index and the query will probably need to use a tag, and the cost of the > index needs to be high. Otherwise the query engine can't know when this index > should be used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8934) Indexing: filter entries with a regular expression
[ https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8934: Attachment: OAK-8934-1_8_svn.patch > Indexing: filter entries with a regular expression > -- > > Key: OAK-8934 > URL: https://issues.apache.org/jira/browse/OAK-8934 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Labels: amrit > Fix For: 1.26.0 > > Attachments: OAK-8934-1_8.patch, OAK-8934-1_8_svn.patch > > > We should provide a way to filter the index using a regular expression. For > example, only index nodes that contain a reference to another node. (Not a > JCR reference, but a reference within the value itself). For example, index a > node if one of the properties contains: > * /content/abc > * > * and so on > This will allow to run a query to find if /content/abc is referenced. The > index and the query will probably need to use a tag, and the cost of the > index needs to be high. Otherwise the query engine can't know when this index > should be used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8934) Indexing: filter entries with a regular expression
[ https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088697#comment-17088697 ] Thomas Mueller commented on OAK-8934: - But I applied the rejected chunks manually. Running tests now. > Indexing: filter entries with a regular expression > -- > > Key: OAK-8934 > URL: https://issues.apache.org/jira/browse/OAK-8934 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Labels: amrit > Fix For: 1.26.0 > > Attachments: OAK-8934-1_8.patch, OAK-8934-1_8_svn.patch > > > We should provide a way to filter the index using a regular expression. For > example, only index nodes that contain a reference to another node. (Not a > JCR reference, but a reference within the value itself). For example, index a > node if one of the properties contains: > * /content/abc > * > * and so on > This will allow to run a query to find if /content/abc is referenced. The > index and the query will probably need to use a tag, and the cost of the > index needs to be high. Otherwise the query engine can't know when this index > should be used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8934) Indexing: filter entries with a regular expression
[ https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088696#comment-17088696 ]

Thomas Mueller commented on OAK-8934:

[~amrverma] the patch didn't apply cleanly when using SVN. I think you used the wrong command-line options for git... you need to use (I think) something like this:

{noformat}
git diff --no-prefix trunk..issues/OAK-8950 > OAK-8950.patch
{noformat}

> Indexing: filter entries with a regular expression
> --
>
> Key: OAK-8934
> URL: https://issues.apache.org/jira/browse/OAK-8934
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: indexing
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
> Labels: amrit
> Fix For: 1.26.0
>
> Attachments: OAK-8934-1_8.patch
>
>
> We should provide a way to filter the index using a regular expression. For
> example, only index nodes that contain a reference to another node. (Not a
> JCR reference, but a reference within the value itself). For example, index a
> node if one of the properties contains:
> * /content/abc
> *
> * and so on
> This will allow to run a query to find if /content/abc is referenced. The
> index and the query will probably need to use a tag, and the cost of the
> index needs to be high. Otherwise the query engine can't know when this index
> should be used.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (OAK-8934) Indexing: filter entries with a regular expression
[ https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088684#comment-17088684 ] Thomas Mueller commented on OAK-8934: - [~amrverma] +1... I will commit the patch now > Indexing: filter entries with a regular expression > -- > > Key: OAK-8934 > URL: https://issues.apache.org/jira/browse/OAK-8934 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Labels: amrit > Fix For: 1.26.0 > > Attachments: OAK-8934-1_8.patch > > > We should provide a way to filter the index using a regular expression. For > example, only index nodes that contain a reference to another node. (Not a > JCR reference, but a reference within the value itself). For example, index a > node if one of the properties contains: > * /content/abc > * > * and so on > This will allow to run a query to find if /content/abc is referenced. The > index and the query will probably need to use a tag, and the cost of the > index needs to be high. Otherwise the query engine can't know when this index > should be used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
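The filtering idea in the quoted issue description (index a node only if one of its property values contains something like /content/abc) might look roughly like this. It is a sketch under assumptions, not the committed patch: the class name, the method, and the example pattern are hypothetical, and property values are modelled as plain strings.

```java
import java.util.List;
import java.util.regex.Pattern;

class RegexIndexFilterSketch {
    private final Pattern pattern;

    RegexIndexFilterSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // True if at least one property value contains a match for the pattern.
    boolean shouldIndex(List<String> propertyValues) {
        for (String value : propertyValues) {
            if (pattern.matcher(value).find()) {
                return true;
            }
        }
        return false;
    }
}
```

With a pattern such as "/content/\\S+" (an assumed example), a node holding the value "see /content/abc for details" would be indexed, while a node with only plain text would be skipped.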
[jira] [Comment Edited] (OAK-8971) Indexing: dynamic boost, as an alternative to IndexFieldProvider
[ https://issues.apache.org/jira/browse/OAK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085505#comment-17085505 ] Thomas Mueller edited comment on OAK-8971 at 4/17/20, 7:26 AM: --- git pull request https://github.com/oak-indexing/jackrabbit-oak/pull/131 was (Author: tmueller): git pull request https://github.com/apache/jackrabbit-oak/pull/202 > Indexing: dynamic boost, as an alternative to IndexFieldProvider > > > Key: OAK-8971 > URL: https://issues.apache.org/jira/browse/OAK-8971 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8971.patch > > > The org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider is > a callback that allows to change the behavior of indexing. There are multiple > problems: > * (1) Not available using oak-run > * (2) Only available for Lucene indexes > Instead of a callback, a configuration option in the index property should be > added, such that the most commonly used features are available in oak-run, > and can be implemented in other indexes (e.g. Elastisearch, Solr). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8971) Indexing: dynamic boost, as an alternative to IndexFieldProvider
[ https://issues.apache.org/jira/browse/OAK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085505#comment-17085505 ] Thomas Mueller commented on OAK-8971: - git pull request https://github.com/apache/jackrabbit-oak/pull/202 > Indexing: dynamic boost, as an alternative to IndexFieldProvider > > > Key: OAK-8971 > URL: https://issues.apache.org/jira/browse/OAK-8971 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8971.patch > > > The org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider is > a callback that allows to change the behavior of indexing. There are multiple > problems: > * (1) Not available using oak-run > * (2) Only available for Lucene indexes > Instead of a callback, a configuration option in the index property should be > added, such that the most commonly used features are available in oak-run, > and can be implemented in other indexes (e.g. Elasticsearch, Solr). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8971) Indexing: dynamic boost, as an alternative to IndexFieldProvider
[ https://issues.apache.org/jira/browse/OAK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085057#comment-17085057 ] Thomas Mueller commented on OAK-8971: - Patch for review. This is only implemented for the Lucene index currently. To enable the feature, add a property to be indexed, e.g.: {noformat} dynamicBoost = true (Boolean) propertyIndex = true name = jcr:content/metadata/predictedTags/.* (String) isRegexp = true (Boolean) {noformat} That way, if a node jcr:content/metadata/predictedTags is added (for the indexed node type), then dynamic boost is used. It will read the child nodes of that node (jcr:content/metadata/predictedTags) and for each node it will read: * name (String) * confidence (Double) It will then add a field, for each token of the "name" property, with boost = confidence. I added a test case to show how this can be implemented using the IndexFieldProvider. Unlike with the IndexFieldProvider, it is now possible to change the confidence or name of the predicted tag, if the node predictedTags is changed as well (for example, using a counter property there). This will cause the document to be updated in the index. With the IndexFieldProvider, this was not possible. (See also the test case that illustrates this). > Indexing: dynamic boost, as an alternative to IndexFieldProvider > > > Key: OAK-8971 > URL: https://issues.apache.org/jira/browse/OAK-8971 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8971.patch > > > The org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider is > a callback that allows to change the behavior of indexing. 
There are multiple > problems: > * (1) Not available using oak-run > * (2) Only available for Lucene indexes > Instead of a callback, a configuration option in the index property should be > added, such that the most commonly used features are available in oak-run, > and can be implemented in other indexes (e.g. Elasticsearch, Solr). -- This message was sent by Atlassian Jira (v8.3.4#803005)
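The dynamic boost behavior described in the comment above (read each child node's "name" and "confidence", then add one field per token of the name with boost = confidence) can be illustrated roughly as follows. This is a sketch under assumptions, not the patch itself: the classes DynamicBoostSketch, Tag and BoostedToken are hypothetical, and tokenizing by whitespace with lowercasing merely stands in for whatever analyzer the Lucene index really uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not Oak code: turn predicted tags into boosted tokens.
class DynamicBoostSketch {

    // One child node of jcr:content/metadata/predictedTags (simplified model)
    static final class Tag {
        final String name;
        final double confidence;

        Tag(String name, double confidence) {
            this.name = name;
            this.confidence = confidence;
        }
    }

    // One field to add to the index document: a token and its boost
    static final class BoostedToken {
        final String token;
        final double boost;

        BoostedToken(String token, double boost) {
            this.token = token;
            this.boost = boost;
        }
    }

    static List<BoostedToken> boostedTokens(List<Tag> tags) {
        List<BoostedToken> result = new ArrayList<>();
        for (Tag tag : tags) {
            // Assumption: whitespace tokenization plus lowercasing
            String normalized = tag.name.trim().toLowerCase(Locale.ROOT);
            for (String token : normalized.split("\\s+")) {
                if (!token.isEmpty()) {
                    // Every token of the tag name gets the tag's confidence as boost
                    result.add(new BoostedToken(token, tag.confidence));
                }
            }
        }
        return result;
    }
}
```

For a tag named "Red Car" with confidence 0.9, this yields the tokens "red" and "car", each with boost 0.9.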
[jira] [Updated] (OAK-8971) Indexing: dynamic boost, as an alternative to IndexFieldProvider
[ https://issues.apache.org/jira/browse/OAK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8971: Attachment: OAK-8971.patch > Indexing: dynamic boost, as an alternative to IndexFieldProvider > > > Key: OAK-8971 > URL: https://issues.apache.org/jira/browse/OAK-8971 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8971.patch > > > The org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider is > a callback that allows to change the behavior of indexing. There are multiple > problems: > * (1) Not available using oak-run > * (2) Only available for Lucene indexes > Instead of a callback, a configuration option in the index property should be > added, such that the most commonly used features are available in oak-run, > and can be implemented in other indexes (e.g. Elasticsearch, Solr). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8971) Indexing: dynamic boost, as an alternative to IndexFieldProvider
[ https://issues.apache.org/jira/browse/OAK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8971: Summary: Indexing: dynamic boost, as an alternative to IndexFieldProvider (was: Indexing: option augment an indexed property as an alternative to IndexFieldProvider) > Indexing: dynamic boost, as an alternative to IndexFieldProvider > > > Key: OAK-8971 > URL: https://issues.apache.org/jira/browse/OAK-8971 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > > The org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider is > a callback that allows to change the behavior of indexing. There are multiple > problems: > * (1) Not available using oak-run > * (2) Only available for Lucene indexes > Instead of a callback, a configuration option in the index property should be > added, such that the most commonly used features are available in oak-run, > and can be implemented in other indexes (e.g. Elasticsearch, Solr). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (OAK-5187) Support regex for relative property path while indexing properties
[ https://issues.apache.org/jira/browse/OAK-5187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller reassigned OAK-5187: --- Assignee: (was: Chetan Mehrotra) > Support regex for relative property path while indexing properties > -- > > Key: OAK-5187 > URL: https://issues.apache.org/jira/browse/OAK-5187 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: lucene >Reporter: Chetan Mehrotra >Priority: Major > > The Lucene index definition currently supports indexing properties based on a regex. > The regex support is currently restricted to the property name only, and does not cover > relative property paths. > However, at times we have queries like the one below: > {noformat} > /jcr:root/content/docs//element(*,app:Asset)[(jcr:content/targeting_123/section_456/@foo=110)] > {noformat} > Here the relative property path can vary. To index such properties we need to support > relative property paths based on regexes or globs, e.g. 'jcr:content/\*/\*/foo' -- This message was sent by Atlassian Jira (v8.3.4#803005)
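To make the glob idea in the description above more concrete, here is a rough sketch of how a glob such as jcr:content/*/*/foo could be translated into a regular expression in which each * stands for exactly one path segment. The issue does not specify the glob semantics, so the "one segment per *" rule, the class name RelativePathGlob and its methods are assumptions for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Oak code: match relative property paths against a glob
// in which '*' stands for exactly one non-empty path segment.
class RelativePathGlob {

    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        // limit -1 keeps trailing empty strings, so a glob ending in '*' also works
        String[] literals = glob.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                // each '*' becomes "one or more characters that are not a slash"
                regex.append("[^/]+");
            }
            // quote the literal parts so characters like '.' are not treated as regex syntax
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String glob, String relativePath) {
        // matches() requires the whole path to match, not just a part of it
        return compile(glob).matcher(relativePath).matches();
    }
}
```

Under these assumptions, the path jcr:content/targeting_123/section_456/foo from the query above matches the glob, while paths with fewer or more intermediate segments do not.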
[jira] [Resolved] (OAK-8997) Index importer: ClusterNodeStoreLock needs a retry logic
[ https://issues.apache.org/jira/browse/OAK-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved OAK-8997. - Resolution: Fixed > Index importer: ClusterNodeStoreLock needs a retry logic > > > Key: OAK-8997 > URL: https://issues.apache.org/jira/browse/OAK-8997 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing, oak-run >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.28.0 > > Attachments: OAK-8997.patch > > > The ClusterNodeStoreLock class has a TODO for retry logic. This needs to be > implemented, because it can happen in practice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8997) Index importer: ClusterNodeStoreLock needs a retry logic
[ https://issues.apache.org/jira/browse/OAK-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8997: Fix Version/s: 1.28.0 > Index importer: ClusterNodeStoreLock needs a retry logic > > > Key: OAK-8997 > URL: https://issues.apache.org/jira/browse/OAK-8997 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing, oak-run >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.28.0 > > Attachments: OAK-8997.patch > > > The ClusterNodeStoreLock class has a TODO for retry logic. This needs to be > implemented, because it can happen in practice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8997) Index importer: ClusterNodeStoreLock needs a retry logic
[ https://issues.apache.org/jira/browse/OAK-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077914#comment-17077914 ] Thomas Mueller commented on OAK-8997: - http://svn.apache.org/r1876276 (trunk) > Index importer: ClusterNodeStoreLock needs a retry logic > > > Key: OAK-8997 > URL: https://issues.apache.org/jira/browse/OAK-8997 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing, oak-run >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8997.patch > > > The ClusterNodeStoreLock class has a TODO for retry logic. This needs to be > implemented, because it can happen in practice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8997) Index importer: ClusterNodeStoreLock needs a retry logic
[ https://issues.apache.org/jira/browse/OAK-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8997: Attachment: OAK-8997.patch > Index importer: ClusterNodeStoreLock needs a retry logic > > > Key: OAK-8997 > URL: https://issues.apache.org/jira/browse/OAK-8997 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing, oak-run >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8997.patch > > > The ClusterNodeStoreLock class has a TODO for retry logic. This needs to be > implemented, because it can happen in practice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8997) Index importer: ClusterNodeStoreLock needs a retry logic
[ https://issues.apache.org/jira/browse/OAK-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077252#comment-17077252 ] Thomas Mueller commented on OAK-8997: - Patch for review > Index importer: ClusterNodeStoreLock needs a retry logic > > > Key: OAK-8997 > URL: https://issues.apache.org/jira/browse/OAK-8997 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing, oak-run >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8997.patch > > > The ClusterNodeStoreLock class has a TODO for retry logic. This needs to be > implemented, because it can happen in practice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8997) Index importer: ClusterNodeStoreLock needs a retry logic
[ https://issues.apache.org/jira/browse/OAK-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8997: Component/s: oak-run indexing > Index importer: ClusterNodeStoreLock needs a retry logic > > > Key: OAK-8997 > URL: https://issues.apache.org/jira/browse/OAK-8997 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing, oak-run >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > > The ClusterNodeStoreLock class has a TODO for retry logic. This needs to be > implemented, because it can happen in practice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (OAK-8997) Index importer: ClusterNodeStoreLock needs a retry logic
Thomas Mueller created OAK-8997: --- Summary: Index importer: ClusterNodeStoreLock needs a retry logic Key: OAK-8997 URL: https://issues.apache.org/jira/browse/OAK-8997 Project: Jackrabbit Oak Issue Type: Improvement Reporter: Thomas Mueller Assignee: Thomas Mueller The ClusterNodeStoreLock class has a TODO for retry logic. This needs to be implemented, because it can happen in practice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
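The issue does not say what the retry logic should look like, so the following is only a generic sketch of a bounded retry with a fixed delay, as one might wrap around a lock attempt. The class RetrySketch, the method retry and its parameters are hypothetical, and the attached patch for ClusterNodeStoreLock may well work differently.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not the OAK-8997 patch: retry an action a bounded number of times.
class RetrySketch {

    static <T> T retry(Supplier<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // remember the failure; it is rethrown if all attempts fail
                last = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // restore the interrupt flag and stop retrying
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", e);
                }
            }
        }
        throw last;
    }
}
```

A caller would pass the lock attempt as the action, for example an action that throws while another cluster node holds the lock. A fixed delay is the simplest choice here; a backoff that grows with each attempt would be a natural variation.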
[jira] [Created] (OAK-8991) MarkSweepGarbageCollector: repeated warnings for files that don't exist
Thomas Mueller created OAK-8991: --- Summary: MarkSweepGarbageCollector: repeated warnings for files that don't exist Key: OAK-8991 URL: https://issues.apache.org/jira/browse/OAK-8991 Project: Jackrabbit Oak Issue Type: Improvement Components: blob Reporter: Thomas Mueller When using the MarkSweepGarbageCollector (using for example a file data store), if the blob id file (from the BlobIdTracker) contains records that don't exist in the datastore, then a warning is logged when trying to remove the (unreferenced) file: {noformat} *WARN* org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Error occurred while deleting blob with id [...] org.apache.jackrabbit.core.data.DataStoreException: Record ... does not exist at org.apache.jackrabbit.core.data.AbstractDataStore.getRecord(AbstractDataStore.java:59) [org.apache.jackrabbit.jackrabbit-data:2.16.3] at org.apache.jackrabbit.oak.plugins.blob.datastore.OakFileDataStore.getRecordForId(OakFileDataStore.java:259) [org.apache.jackrabbit.oak-blob-plugins:1.8.9] at org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore.getRecordForId(DataStoreBlobStore.java:520) [org.apache.jackrabbit.oak-blob-plugins:1.8.9] at org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore.countDeleteChunks(DataStoreBlobStore.java:426) [org.apache.jackrabbit.oak-blob-plugins:1.8.9] at org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector$BlobCollectionType.sweepInternal(MarkSweepGarbageCollector.java:859) [org.apache.jackrabbit.oak-blob-plugins:1.8.9] at org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.sweep(MarkSweepGarbageCollector.java:423) [org.apache.jackrabbit.oak-blob-plugins:1.8.9] at org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.markAndSweep(MarkSweepGarbageCollector.java:287) [org.apache.jackrabbit.oak-blob-plugins:1.8.9] at org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector.collectGarbage(MarkSweepGarbageCollector.java:194) 
[org.apache.jackrabbit.oak-blob-plugins:1.8.9] {noformat} That means it tried to remove a file that doesn't exist. This indicates a problem in the process; for example, the blob id tracker file(s) may have been restored from an older backup. (Possibly there are other ways this could occur.) The next time garbage collection runs, it will try to remove the same files again, and that fails again. It would be better if entries for files that don't exist were removed from the blob id tracker file, so that their removal is not attempted again and again. If the blob id tracker file(s) are incorrect, I think it would be better to delete and rebuild them; otherwise some of the unreferenced binaries will never be removed. Possibly a warning should be logged, with instructions on how to rebuild these files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
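The first proposal above (drop ids of records that no longer exist, so that later garbage collection runs do not try to delete them again) could look roughly like the following. This is a simplified sketch, not Oak code: the class BlobIdCleanupSketch, the method dropMissing and the existence check passed in as a predicate are all assumptions, and the real BlobIdTracker works on files rather than in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not Oak code: keep only tracked blob ids that still exist.
class BlobIdCleanupSketch {

    // trackedIds: ids read from the blob id tracker file (assumed already loaded)
    // existsInDataStore: assumed existence check against the data store
    static List<String> dropMissing(List<String> trackedIds, Predicate<String> existsInDataStore) {
        List<String> kept = new ArrayList<>();
        for (String id : trackedIds) {
            if (existsInDataStore.test(id)) {
                kept.add(id);
            }
            // ids that do not exist are left out, so they are not retried on the next run
        }
        return kept;
    }
}
```

The returned list would then be written back as the new tracker content. This does not address the second proposal (delete and rebuild the tracker files when they are known to be incorrect).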
[jira] [Commented] (OAK-8978) Cache facet results
[ https://issues.apache.org/jira/browse/OAK-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071097#comment-17071097 ] Thomas Mueller commented on OAK-8978: - [~catholicon] thanks! Opening the index for each row was done due to OAK-8898 (to fix an "AlreadyClosedException", which can lead to a JVM crash). > Cache facet results > --- > > Key: OAK-8978 > URL: https://issues.apache.org/jira/browse/OAK-8978 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8978-2.patch > > > In OAK-8898, the "AlreadyClosedException" was fixed when reading facet > results. > If the facets are read repeatedly (for each row), then the reader is now > opened and closed each time. This can slow down the application. > It should be possible to cache the facet result in DelayedLuceneFacetProvider -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8978) Cache facet results
[ https://issues.apache.org/jira/browse/OAK-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8978: Attachment: (was: OAK-8978.patch) > Cache facet results > --- > > Key: OAK-8978 > URL: https://issues.apache.org/jira/browse/OAK-8978 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8978-2.patch > > > In OAK-8898, the "AlreadyClosedException" was fixed when reading facet > results. > If the facets are read repeatedly (for each row), then the reader is now > opened and closed each time. This can slow down the application. > It should be possible to cache the facet result in DelayedLuceneFacetProvider -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8978) Cache facet results
[ https://issues.apache.org/jira/browse/OAK-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8978: Attachment: OAK-8978-2.patch > Cache facet results > --- > > Key: OAK-8978 > URL: https://issues.apache.org/jira/browse/OAK-8978 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8978-2.patch > > > In OAK-8898, the "AlreadyClosedException" was fixed when reading facet > results. > If the facets are read repeatedly (for each row), then the reader is now > opened and closed each time. This can slow down the application. > It should be possible to cache the facet result in DelayedLuceneFacetProvider -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8978) Cache facet results
[ https://issues.apache.org/jira/browse/OAK-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8978: Attachment: OAK-8978.patch > Cache facet results > --- > > Key: OAK-8978 > URL: https://issues.apache.org/jira/browse/OAK-8978 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8978.patch > > > In OAK-8898, the "AlreadyClosedException" was fixed when reading facet > results. > If the facets are read repeatedly (for each row), then the reader is now > opened and closed each time. This can slow down the application. > It should be possible to cache the facet result in DelayedLuceneFacetProvider -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (OAK-8978) Cache facet results
[ https://issues.apache.org/jira/browse/OAK-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller reassigned OAK-8978: --- Assignee: Thomas Mueller > Cache facet results > --- > > Key: OAK-8978 > URL: https://issues.apache.org/jira/browse/OAK-8978 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > > In OAK-8898, the "AlreadyClosedException" was fixed when reading facet > results. > If the facets are read repeatedly (for each row), then the reader is now > opened and closed each time. This can slow down the application. > It should be possible to cache the facet result in DelayedLuceneFacetProvider -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (OAK-8978) Cache facet results
Thomas Mueller created OAK-8978: --- Summary: Cache facet results Key: OAK-8978 URL: https://issues.apache.org/jira/browse/OAK-8978 Project: Jackrabbit Oak Issue Type: Improvement Components: indexing Reporter: Thomas Mueller In OAK-8898, the "AlreadyClosedException" was fixed when reading facet results. If the facets are read repeatedly (for each row), then the reader is now opened and closed each time. This can slow down the application. It should be possible to cache the facet result in DelayedLuceneFacetProvider -- This message was sent by Atlassian Jira (v8.3.4#803005)
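The caching idea above can be sketched as a small memoizing wrapper: the expensive step (opening the reader and computing the facets) runs once per facet column, and later rows reuse the stored result. This is an illustration only; the class CachingFacetProviderSketch and its loader function are hypothetical and do not reflect the actual DelayedLuceneFacetProvider code or the attached patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Oak code: compute facets once per column and reuse the result.
class CachingFacetProviderSketch<V> {
    private final Function<String, V> loader;
    private final Map<String, V> cache = new HashMap<>();

    // loader: the expensive operation, e.g. open the reader, read the facets, close the reader
    CachingFacetProviderSketch(Function<String, V> loader) {
        this.loader = loader;
    }

    synchronized V getFacets(String columnName) {
        // computeIfAbsent calls the loader only if no value is stored for this column yet;
        // a null result from the loader is not stored, so it would be computed again
        return cache.computeIfAbsent(columnName, loader);
    }
}
```

Such a cache would need to live only as long as one query result, so that it never outlives the index state it was computed from.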
[jira] [Updated] (OAK-8898) On querying, IndexReader failed with AlreadyClosedException
[ https://issues.apache.org/jira/browse/OAK-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8898: Attachment: OAK-8898-1.10.8.patch > On querying, IndexReader failed with AlreadyClosedException > --- > > Key: OAK-8898 > URL: https://issues.apache.org/jira/browse/OAK-8898 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Mohit Kataria >Assignee: Mohit Kataria >Priority: Major > Fix For: 1.26.0, 1.22.3 > > Attachments: OAK-8898-1.10.8.patch > > > This is an intermittent issue, where on querying the code throws > AlreadyClosedException. > > {code:java} > Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader > is closed > at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:262) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:108) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at org.apache.lucene.index.IndexReader.document(IndexReader.java:446) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getAccessibleSampleCount(StatisticalSortedSetDocValuesFacetCounts.java:169) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren0(StatisticalSortedSetDocValuesFacetCounts.java:104) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren(StatisticalSortedSetDocValuesFacetCounts.java:70) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LuceneFacetProvider.getFacets(LucenePropertyIndex.java:1547) > 
[org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextResultRow.getFacets(FulltextIndex.java:353) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextPathCursor$2.getValue(FulltextIndex.java:472) > [org.apache.jackrabbit.oak-lucene:1.10.2] > ... 237 common frames omitted > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (OAK-8898) On querying, IndexReader failed with AlreadyClosedException
[ https://issues.apache.org/jira/browse/OAK-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069349#comment-17069349 ] Thomas Mueller edited comment on OAK-8898 at 3/28/20, 9:43 AM: --- {noformat} OAK-8898-1.10.8.patch {noformat} is a patch for Oak 1.10.8 oak-lucene/src. This is _not_ the 1.10 branch, which contains further changes; it is for [https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.10.8] I created a diagnostic build using: {noformat} cd oak-parent mvn versions:set -DnewVersion=1.10.8-OAK-8898 cd .. mvn -DskipTests clean install {noformat} See also [http://jackrabbit.apache.org/oak/docs/diagnostic-builds.html] To verify the patch is installed, see the log file and search for "org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex oak.lucene.oldFacetProvider = false". To switch to the old (buggy) behavior, set the system property "oak.lucene.oldFacetProvider" to "true", e.g. {noformat} java -Doak.lucene.oldFacetProvider=true ... {noformat} was (Author: tmueller): {noformat} OAK-8898-1.10.8.patch {noformat} is a patch for Oak 1.10.8 oak-lucene/src. This is _not_ the 1.10 branch, which contains further changes; it is for https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.10.8 I created a diagnostic build using: {noformat} cd oak-parent mvn versions:set -DnewVersion=1.10.8-OAK-8898 cd .. mvn -DskipTests clean install {noformat} To verify the patch is installed, see the log file and search for "org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex oak.lucene.oldFacetProvider = false". To switch to the old (buggy) behavior, set the system property "oak.lucene.oldFacetProvider" to "true", e.g. {noformat} java -Doak.lucene.oldFacetProvider=true ... 
{noformat} > On querying, IndexReader failed with AlreadyClosedException > --- > > Key: OAK-8898 > URL: https://issues.apache.org/jira/browse/OAK-8898 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Mohit Kataria >Assignee: Mohit Kataria >Priority: Major > Fix For: 1.26.0, 1.22.3 > > Attachments: OAK-8898-1.10.8.patch > > > This is an intermittent issue, where on querying the code throws > AlreadyClosedException. > > {code:java} > Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader > is closed > at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:262) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:108) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at org.apache.lucene.index.IndexReader.document(IndexReader.java:446) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getAccessibleSampleCount(StatisticalSortedSetDocValuesFacetCounts.java:169) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren0(StatisticalSortedSetDocValuesFacetCounts.java:104) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren(StatisticalSortedSetDocValuesFacetCounts.java:70) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LuceneFacetProvider.getFacets(LucenePropertyIndex.java:1547) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextResultRow.getFacets(FulltextIndex.java:353) > 
[org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextPathCursor$2.getValue(FulltextIndex.java:472) > [org.apache.jackrabbit.oak-lucene:1.10.2] > ... 237 common frames omitted > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
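The comment above switches behavior with -Doak.lucene.oldFacetProvider=true. How the patch reads that property is not shown in this thread; a common way to read such a boolean switch in Java is Boolean.getBoolean, sketched below. The class FacetProviderSwitchSketch and its methods are hypothetical, and the describe() output only imitates the "oak.lucene.oldFacetProvider = false" text quoted from the log.

```java
// Hypothetical sketch, not Oak code: read a boolean feature switch from a system property.
class FacetProviderSwitchSketch {
    static final String PROPERTY = "oak.lucene.oldFacetProvider";

    static boolean useOldFacetProvider() {
        // Boolean.getBoolean returns true only if the system property exists
        // and equals "true" (ignoring case); otherwise it returns false
        return Boolean.getBoolean(PROPERTY);
    }

    static String describe() {
        // imitates the message mentioned in the comment, for log verification
        return PROPERTY + " = " + useOldFacetProvider();
    }
}
```

Without the -D option the switch is false, which corresponds to the new (fixed) behavior described in the comment.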
[jira] [Comment Edited] (OAK-8898) On querying, IndexReader failed with AlreadyClosedException
[ https://issues.apache.org/jira/browse/OAK-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069349#comment-17069349 ] Thomas Mueller edited comment on OAK-8898 at 3/28/20, 9:42 AM: --- {noformat} OAK-8898-1.10.8.patch {noformat} is a patch for Oak 1.10.8 oak-lucene/src. This is _not_ the 1.10 branch, which contains further changes; it is for https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.10.8 I created a diagnostic build using: {noformat} cd oak-parent mvn versions:set -DnewVersion=1.10.8-OAK-8898 cd .. mvn -DskipTests clean install {noformat} To verify the patch is installed, see the log file and search for "org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex oak.lucene.oldFacetProvider = false". To switch to the old (buggy) behavior, set the system property "oak.lucene.oldFacetProvider" to "true", e.g. {noformat} java -Doak.lucene.oldFacetProvider=true ... {noformat} was (Author: tmueller): OAK-8898-1.10.8.patch is a patch for Oak 1.10.8 oak-lucene/src. This is _not_ the 1.10 branch, which contains further changes; it is for https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.10.8 I created a diagnostic build using: {noformat} cd oak-parent mvn versions:set -DnewVersion=1.10.8-OAK-8898 cd .. mvn -DskipTests clean install {noformat} To verify the patch is installed, see the log file and search for "org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex oak.lucene.oldFacetProvider = false". To switch to the old (buggy) behavior, set the system property "oak.lucene.oldFacetProvider" to "true", e.g. {noformat} java -Doak.lucene.oldFacetProvider=true ... 
{noformat} > On querying, IndexReader failed with AlreadyClosedException > --- > > Key: OAK-8898 > URL: https://issues.apache.org/jira/browse/OAK-8898 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Mohit Kataria >Assignee: Mohit Kataria >Priority: Major > Fix For: 1.26.0, 1.22.3 > > > This is an intermittent issue, where on querying the code throws > AlreadyClosedException. > > {code:java} > Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader > is closed > at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:262) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:108) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at org.apache.lucene.index.IndexReader.document(IndexReader.java:446) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getAccessibleSampleCount(StatisticalSortedSetDocValuesFacetCounts.java:169) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren0(StatisticalSortedSetDocValuesFacetCounts.java:104) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren(StatisticalSortedSetDocValuesFacetCounts.java:70) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LuceneFacetProvider.getFacets(LucenePropertyIndex.java:1547) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextResultRow.getFacets(FulltextIndex.java:353) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > 
org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextPathCursor$2.getValue(FulltextIndex.java:472) > [org.apache.jackrabbit.oak-lucene:1.10.2] > ... 237 common frames omitted > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8898) On querying, IndexReader failed with AlreadyClosedException
[ https://issues.apache.org/jira/browse/OAK-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069349#comment-17069349 ] Thomas Mueller commented on OAK-8898: - OAK-8898-1.10.8.patch is a patch for Oak 1.10.8 oak-lucene/src. This is _not_ the 1.10 branch, which contains further changes; it is for https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.10.8 I created a diagnostic build using: {noformat} cd oak-parent mvn versions:set -DnewVersion=1.10.8-OAK-8898 cd .. mvn -DskipTests clean install {noformat} To verify the patch is installed, see the log file and search for "org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex oak.lucene.oldFacetProvider = false". To switch to the old (buggy) behavior, set the system property "oak.lucene.oldFacetProvider" to "true", e.g. {noformat} java -Doak.lucene.oldFacetProvider=true ... {noformat} > On querying, IndexReader failed with AlreadyClosedException > --- > > Key: OAK-8898 > URL: https://issues.apache.org/jira/browse/OAK-8898 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Mohit Kataria >Assignee: Mohit Kataria >Priority: Major > Fix For: 1.26.0, 1.22.3 > > > This is an intermittent issue, where on querying the code throws > AlreadyClosedException. 
> > {code:java} > Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader > is closed > at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:262) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:108) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at org.apache.lucene.index.IndexReader.document(IndexReader.java:446) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getAccessibleSampleCount(StatisticalSortedSetDocValuesFacetCounts.java:169) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren0(StatisticalSortedSetDocValuesFacetCounts.java:104) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren(StatisticalSortedSetDocValuesFacetCounts.java:70) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LuceneFacetProvider.getFacets(LucenePropertyIndex.java:1547) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextResultRow.getFacets(FulltextIndex.java:353) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextPathCursor$2.getValue(FulltextIndex.java:472) > [org.apache.jackrabbit.oak-lucene:1.10.2] > ... 237 common frames omitted > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
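The switch mentioned in the comment above is an ordinary JVM system property. The patch itself is not quoted in this thread, so the following is only a minimal sketch of how such a flag is commonly read in Java. The class name `FacetProviderFlag` and its method are hypothetical and not taken from Oak; only the property name comes from the comment.

```java
// Hypothetical sketch; not taken from the OAK-8898 patch.
class FacetProviderFlag {

    // Property name as given in the comment above.
    static final String PROPERTY = "oak.lucene.oldFacetProvider";

    // Boolean.getBoolean returns true only if the system property exists
    // and equals "true" (ignoring case); in all other cases it returns false.
    static boolean useOldFacetProvider() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        // Run with: java -Doak.lucene.oldFacetProvider=true FacetProviderFlag
        System.out.println(PROPERTY + " = " + useOldFacetProvider());
    }
}
```

With this approach an unset property behaves like "false", which would match the log line quoted in the comment for a default installation.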
[jira] [Assigned] (OAK-8898) On querying, IndexReader failed with AlreadyClosedException
[ https://issues.apache.org/jira/browse/OAK-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller reassigned OAK-8898: --- Assignee: Mohit Kataria > On querying, IndexReader failed with AlreadyClosedException > --- > > Key: OAK-8898 > URL: https://issues.apache.org/jira/browse/OAK-8898 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Mohit Kataria >Assignee: Mohit Kataria >Priority: Major > Fix For: 1.26.0, 1.22.3 > > > This is an intermittent issue, where on querying the code throws > AlreadyClosedException. > > {code:java} > Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader > is closed > at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:262) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:108) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at org.apache.lucene.index.IndexReader.document(IndexReader.java:446) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getAccessibleSampleCount(StatisticalSortedSetDocValuesFacetCounts.java:169) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren0(StatisticalSortedSetDocValuesFacetCounts.java:104) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren(StatisticalSortedSetDocValuesFacetCounts.java:70) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LuceneFacetProvider.getFacets(LucenePropertyIndex.java:1547) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > 
org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextResultRow.getFacets(FulltextIndex.java:353) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextPathCursor$2.getValue(FulltextIndex.java:472) > [org.apache.jackrabbit.oak-lucene:1.10.2] > ... 237 common frames omitted > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (OAK-8971) Indexing: option augment an indexed property as an alternative to IndexFieldProvider
Thomas Mueller created OAK-8971: --- Summary: Indexing: option augment an indexed property as an alternative to IndexFieldProvider Key: OAK-8971 URL: https://issues.apache.org/jira/browse/OAK-8971 Project: Jackrabbit Oak Issue Type: Improvement Components: indexing Reporter: Thomas Mueller Assignee: Thomas Mueller The org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider is a callback that allows changing the behavior of indexing. It has two problems: * (1) It is not available using oak-run * (2) It is only available for Lucene indexes Instead of a callback, a configuration option in the index property should be added, such that the most commonly used features are available in oak-run, and can be implemented in other indexes (e.g. Elasticsearch, Solr). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-6303) Cache in CachingBlobStore might grow beyond configured limit
[ https://issues.apache.org/jira/browse/OAK-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060977#comment-17060977 ] Thomas Mueller commented on OAK-6303: - [~reschke] I just saw that this old issue is the same as OAK-8950, but for a different class: CachingBlobStore. I don't know if this issue is still needed... I unassigned it from me (I should have done that a long time ago, sorry). > Cache in CachingBlobStore might grow beyond configured limit > > > Key: OAK-6303 > URL: https://issues.apache.org/jira/browse/OAK-6303 > Project: Jackrabbit Oak > Issue Type: Bug > Components: blob, core >Reporter: Julian Reschke >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.28.0 > > Attachments: OAK-6303-test.diff, OAK-6303.diff > > > It appears that depending on actual cache entry sizes, the {{CacheLIRS}} > might grow beyond the configured limit. > For {{RDBBlobStore}}, the limit is currently configured to 16MB, yet storing > random 2M entries appears to fill the cache with 64MB of data (according to > it's own stats). > The attached test case reproduces this. > (it seems this is caused by the fact that each of the 16 segments of the > cache can hold 2 entries, no matter how big they are...) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (OAK-6303) Cache in CachingBlobStore might grow beyond configured limit
[ https://issues.apache.org/jira/browse/OAK-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller reassigned OAK-6303: --- Assignee: (was: Thomas Mueller) > Cache in CachingBlobStore might grow beyond configured limit > > > Key: OAK-6303 > URL: https://issues.apache.org/jira/browse/OAK-6303 > Project: Jackrabbit Oak > Issue Type: Bug > Components: blob, core >Reporter: Julian Reschke >Priority: Major > Fix For: 1.28.0 > > Attachments: OAK-6303-test.diff, OAK-6303.diff > > > It appears that depending on actual cache entry sizes, the {{CacheLIRS}} > might grow beyond the configured limit. > For {{RDBBlobStore}}, the limit is currently configured to 16MB, yet storing > random 2M entries appears to fill the cache with 64MB of data (according to > it's own stats). > The attached test case reproduces this. > (it seems this is caused by the fact that each of the 16 segments of the > cache can hold 2 entries, no matter how big they are...) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (OAK-8950) DataStore: FileCache should use one cache segment
[ https://issues.apache.org/jira/browse/OAK-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved OAK-8950. - Resolution: Fixed > DataStore: FileCache should use one cache segment > - > > Key: OAK-8950 > URL: https://issues.apache.org/jira/browse/OAK-8950 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > > The FileCache in the caching data store (Azure, S3) uses the default segment > count of 16. The effect of that is: > * if the maximum cache size is e.g. 16 GB > * and there are e.g. 15 files 1 GB each (total 15 GB), > * it can happen that some files are evicted, > * because internally the cache is using 16 segments of 1 GB each, > * and by chance 2 files could be in the same segment, > * so that one of those files is evicted > The workaround is to use a really large cache size (e.g. 100 GB if you only > want 15 GB of cache size), but the drawback is that, if most files are very > small, that the cache size could become actually 100 GB. > The best solution is probably to use only 1 segment. There is tiny a > concurrency issue: right now, deleting files is synchronized on the segment. > But I think that's not a big problem (to be tested). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8950) DataStore: FileCache should use one cache segment
[ https://issues.apache.org/jira/browse/OAK-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058695#comment-17058695 ] Thomas Mueller commented on OAK-8950: - http://svn.apache.org/r1875151 (trunk) > DataStore: FileCache should use one cache segment > - > > Key: OAK-8950 > URL: https://issues.apache.org/jira/browse/OAK-8950 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > > The FileCache in the caching data store (Azure, S3) uses the default segment > count of 16. The effect of that is: > * if the maximum cache size is e.g. 16 GB > * and there are e.g. 15 files 1 GB each (total 15 GB), > * it can happen that some files are evicted, > * because internally the cache is using 16 segments of 1 GB each, > * and by chance 2 files could be in the same segment, > * so that one of those files is evicted > The workaround is to use a really large cache size (e.g. 100 GB if you only > want 15 GB of cache size), but the drawback is that, if most files are very > small, that the cache size could become actually 100 GB. > The best solution is probably to use only 1 segment. There is tiny a > concurrency issue: right now, deleting files is synchronized on the segment. > But I think that's not a big problem (to be tested). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8950) DataStore: FileCache should use one cache segment
[ https://issues.apache.org/jira/browse/OAK-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8950: Fix Version/s: 1.26.0 > DataStore: FileCache should use one cache segment > - > > Key: OAK-8950 > URL: https://issues.apache.org/jira/browse/OAK-8950 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.26.0 > > > The FileCache in the caching data store (Azure, S3) uses the default segment > count of 16. The effect of that is: > * if the maximum cache size is e.g. 16 GB > * and there are e.g. 15 files 1 GB each (total 15 GB), > * it can happen that some files are evicted, > * because internally the cache is using 16 segments of 1 GB each, > * and by chance 2 files could be in the same segment, > * so that one of those files is evicted > The workaround is to use a really large cache size (e.g. 100 GB if you only > want 15 GB of cache size), but the drawback is that, if most files are very > small, that the cache size could become actually 100 GB. > The best solution is probably to use only 1 segment. There is tiny a > concurrency issue: right now, deleting files is synchronized on the segment. > But I think that's not a big problem (to be tested). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8950) DataStore: FileCache should use one cache segment
[ https://issues.apache.org/jira/browse/OAK-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057929#comment-17057929 ] Thomas Mueller commented on OAK-8950: - Patch for review: [https://github.com/oak-indexing/jackrabbit-oak/pull/63] > DataStore: FileCache should use one cache segment > - > > Key: OAK-8950 > URL: https://issues.apache.org/jira/browse/OAK-8950 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > > The FileCache in the caching data store (Azure, S3) uses the default segment > count of 16. The effect of that is: > * if the maximum cache size is e.g. 16 GB > * and there are e.g. 15 files 1 GB each (total 15 GB), > * it can happen that some files are evicted, > * because internally the cache is using 16 segments of 1 GB each, > * and by chance 2 files could be in the same segment, > * so that one of those files is evicted > The workaround is to use a really large cache size (e.g. 100 GB if you only > want 15 GB of cache size), but the drawback is that, if most files are very > small, that the cache size could become actually 100 GB. > The best solution is probably to use only 1 segment. There is tiny a > concurrency issue: right now, deleting files is synchronized on the segment. > But I think that's not a big problem (to be tested). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (OAK-8950) DataStore: FileCache should use one cache segment
Thomas Mueller created OAK-8950: --- Summary: DataStore: FileCache should use one cache segment Key: OAK-8950 URL: https://issues.apache.org/jira/browse/OAK-8950 Project: Jackrabbit Oak Issue Type: Improvement Components: blob Reporter: Thomas Mueller Assignee: Thomas Mueller The FileCache in the caching data store (Azure, S3) uses the default segment count of 16. The effect of that is: * if the maximum cache size is e.g. 16 GB * and there are e.g. 15 files 1 GB each (total 15 GB), * it can happen that some files are evicted, * because internally the cache is using 16 segments of 1 GB each, * and by chance 2 files could be in the same segment, * so that one of those files is evicted The workaround is to use a really large cache size (e.g. 100 GB if you only want 15 GB of cache size), but the drawback is that, if most files are very small, the cache could actually grow to 100 GB. The best solution is probably to use only 1 segment. There is a tiny concurrency issue: right now, deleting files is synchronized on the segment. But I think that's not a big problem (to be tested). -- This message was sent by Atlassian Jira (v8.3.4#803005)
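The arithmetic in the description above can be checked with a small simulation. This is a deliberately simplified model (each segment gets maxSize / segmentCount and evicts its oldest entries when over budget); it is not the CacheLIRS code used by Oak, and the class name, method name and the explicit segment assignment are assumptions made so that the example is deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified model of a size-bounded cache split into segments.
// NOT the actual Oak cache; it only illustrates the arithmetic above.
class SegmentedCacheSketch {

    /**
     * Adds the files in order and returns how many entries were evicted.
     * segmentIndex[i] is the segment that file i hashes to (hypothetical
     * input, passed in explicitly to keep the example deterministic).
     */
    static int countEvictions(int segmentCount, long maxSize,
            long[] fileSizes, int[] segmentIndex) {
        long perSegment = maxSize / segmentCount;
        List<Deque<Long>> segments = new ArrayList<>();
        for (int i = 0; i < segmentCount; i++) {
            segments.add(new ArrayDeque<>());
        }
        long[] used = new long[segmentCount];
        int evictions = 0;
        for (int i = 0; i < fileSizes.length; i++) {
            int s = segmentIndex[i] % segmentCount;
            Deque<Long> queue = segments.get(s);
            queue.addLast(fileSizes[i]);
            used[s] += fileSizes[i];
            // Evict the oldest entries of this segment until it fits again.
            while (used[s] > perSegment && queue.size() > 1) {
                used[s] -= queue.removeFirst();
                evictions++;
            }
        }
        return evictions;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        long[] sizes = new long[15];
        int[] segment = new int[15];
        for (int i = 0; i < 15; i++) {
            sizes[i] = gb;
            segment[i] = i;
        }
        segment[14] = 0; // collision: the last file shares a segment with the first
        System.out.println(countEvictions(16, 16 * gb, sizes, segment)); // prints 1
        System.out.println(countEvictions(1, 16 * gb, sizes, segment));  // prints 0
    }
}
```

With 16 segments a single collision already evicts a file although only 15 GB of the 16 GB are in use; with one segment the same 15 files fit without eviction.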
[jira] [Commented] (OAK-8898) On querying, IndexReader failed with AlreadyClosedException
[ https://issues.apache.org/jira/browse/OAK-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056052#comment-17056052 ] Thomas Mueller commented on OAK-8898: - [~mkataria] I created a branch here: [https://github.com/oak-indexing/jackrabbit-oak/tree/OAK-8898] This allows reproducing the issue (it is based on your test case). I also found the root cause, and a possible solution (see LucenePropertyIndex.OLD_FACET_PROVIDER). The problem seems to be that the reader is used after it is closed, because the reference to the searcher is leaked to the LuceneFacetProvider in loadDocs(). I created a DelayedLuceneFacetProvider that acquires and releases the searcher when needed (acquireIndexNode, release in finally). It would be good if the test could reproduce the issue even without the delays; we can discuss this. > On querying, IndexReader failed with AlreadyClosedException > --- > > Key: OAK-8898 > URL: https://issues.apache.org/jira/browse/OAK-8898 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Mohit Kataria >Priority: Major > > This is an intermittent issue, where on querying the code throws > AlreadyClosedException. 
> > {code:java} > Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader > is closed > at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:262) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:108) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at org.apache.lucene.index.IndexReader.document(IndexReader.java:446) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getAccessibleSampleCount(StatisticalSortedSetDocValuesFacetCounts.java:169) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren0(StatisticalSortedSetDocValuesFacetCounts.java:104) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren(StatisticalSortedSetDocValuesFacetCounts.java:70) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LuceneFacetProvider.getFacets(LucenePropertyIndex.java:1547) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextResultRow.getFacets(FulltextIndex.java:353) > [org.apache.jackrabbit.oak-lucene:1.10.2] > at > org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextPathCursor$2.getValue(FulltextIndex.java:472) > [org.apache.jackrabbit.oak-lucene:1.10.2] > ... 237 common frames omitted > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
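The root cause described in the comment above (a searcher reference that outlives its release) and the proposed remedy (acquire when needed, release in finally) can be illustrated with a small stand-alone sketch. All types below are hypothetical stand-ins and not the real Oak or Lucene classes; the actual DelayedLuceneFacetProvider is only available in the linked branch.

```java
// Hypothetical stand-ins; none of these are the real Oak or Lucene classes.
class AcquireReleaseSketch {

    /** A resource that fails when used after close, like a closed IndexReader. */
    static class Searcher {
        private boolean closed;

        String facets() {
            if (closed) {
                throw new IllegalStateException("searcher is closed");
            }
            return "facets";
        }

        void close() {
            closed = true;
        }
    }

    /** Hands out searchers and closes them on release. */
    static class IndexNode {
        Searcher acquire() {
            return new Searcher();
        }

        void release(Searcher searcher) {
            searcher.close();
        }
    }

    /** Leaky variant: keeps a reference that someone else may release. */
    static class EagerFacetProvider {
        private final Searcher searcher;

        EagerFacetProvider(Searcher searcher) {
            this.searcher = searcher;
        }

        String getFacets() {
            return searcher.facets();
        }
    }

    /** Delayed variant: acquires only when needed and releases in finally. */
    static class DelayedFacetProvider {
        private final IndexNode node;

        DelayedFacetProvider(IndexNode node) {
            this.node = node;
        }

        String getFacets() {
            Searcher searcher = node.acquire();
            try {
                return searcher.facets();
            } finally {
                node.release(searcher);
            }
        }
    }

    public static void main(String[] args) {
        IndexNode node = new IndexNode();
        Searcher leaked = node.acquire();
        EagerFacetProvider eager = new EagerFacetProvider(leaked);
        node.release(leaked); // the query finished, but the provider is still around
        try {
            eager.getFacets();
        } catch (IllegalStateException e) {
            System.out.println("eager: " + e.getMessage());
        }
        System.out.println("delayed: " + new DelayedFacetProvider(node).getFacets());
    }
}
```

The delayed variant never holds a searcher between calls, so there is no window in which another thread can close it underneath the provider.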
[jira] [Comment Edited] (OAK-8934) Indexing: filter entries with a regular expression
[ https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051214#comment-17051214 ] Thomas Mueller edited comment on OAK-8934 at 3/4/20, 1:11 PM: -- [http://svn.apache.org/r1874786|http://svn.apache.org/r1874786] was (Author: tmueller): [http://svn.apache.org/r1874786|http://svn/] > Indexing: filter entries with a regular expression > -- > > Key: OAK-8934 > URL: https://issues.apache.org/jira/browse/OAK-8934 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Labels: amrit > > We should provide a way to filter the index using a regular expression. For > example, only index nodes that contain a reference to another node. (Not a > JCR reference, but a reference within the value itself). For example, index a > node if one of the properties contains: > * /content/abc > * > * and so on > This will allow to run a query to find if /content/abc is referenced. The > index and the query will probably need to use a tag, and the cost of the > index needs to be high. Otherwise the query engine can't know when this index > should be used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (OAK-8934) Indexing: filter entries with a regular expression
[ https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved OAK-8934. - Resolution: Fixed > Indexing: filter entries with a regular expression > -- > > Key: OAK-8934 > URL: https://issues.apache.org/jira/browse/OAK-8934 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Labels: amrit > Fix For: 1.26.0 > > > We should provide a way to filter the index using a regular expression. For > example, only index nodes that contain a reference to another node. (Not a > JCR reference, but a reference within the value itself). For example, index a > node if one of the properties contains: > * /content/abc > * > * and so on > This will allow to run a query to find if /content/abc is referenced. The > index and the query will probably need to use a tag, and the cost of the > index needs to be high. Otherwise the query engine can't know when this index > should be used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8934) Indexing: filter entries with a regular expression
[ https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8934: Fix Version/s: 1.26.0 > Indexing: filter entries with a regular expression > -- > > Key: OAK-8934 > URL: https://issues.apache.org/jira/browse/OAK-8934 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Labels: amrit > Fix For: 1.26.0 > > > We should provide a way to filter the index using a regular expression. For > example, only index nodes that contain a reference to another node. (Not a > JCR reference, but a reference within the value itself). For example, index a > node if one of the properties contains: > * /content/abc > * > * and so on > This will allow to run a query to find if /content/abc is referenced. The > index and the query will probably need to use a tag, and the cost of the > index needs to be high. Otherwise the query engine can't know when this index > should be used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8934) Indexing: filter entries with a regular expression
[ https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051214#comment-17051214 ] Thomas Mueller commented on OAK-8934: - [http://svn.apache.org/r1874786|http://svn/] > Indexing: filter entries with a regular expression > -- > > Key: OAK-8934 > URL: https://issues.apache.org/jira/browse/OAK-8934 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Labels: amrit > > We should provide a way to filter the index using a regular expression. For > example, only index nodes that contain a reference to another node. (Not a > JCR reference, but a reference within the value itself). For example, index a > node if one of the properties contains: > * /content/abc > * > * and so on > This will allow to run a query to find if /content/abc is referenced. The > index and the query will probably need to use a tag, and the cost of the > index needs to be high. Otherwise the query engine can't know when this index > should be used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (OAK-8934) Indexing: filter entries with a regular expression
Thomas Mueller created OAK-8934: --- Summary: Indexing: filter entries with a regular expression Key: OAK-8934 URL: https://issues.apache.org/jira/browse/OAK-8934 Project: Jackrabbit Oak Issue Type: Improvement Components: indexing Reporter: Thomas Mueller Assignee: Thomas Mueller We should provide a way to filter the index using a regular expression. For example, only index nodes that contain a reference to another node. (Not a JCR reference, but a reference within the value itself). For example, index a node if one of the properties contains: * /content/abc * * and so on This will allow to run a query to find if /content/abc is referenced. The index and the query will probably need to use a tag, and the cost of the index needs to be high. Otherwise the query engine can't know when this index should be used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
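As an illustration of the filtering idea above, the following sketch decides whether a node would be indexed by testing its property values against a regular expression. The class name, the method and the concrete pattern are hypothetical; the issue text does not say how the expression is configured in the index definition, so this is an assumption-laden example using only java.util.regex.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of "index a node only if a value matches a regex".
class RegexIndexFilterSketch {

    // Assumed pattern: a path reference such as /content/abc inside a value.
    static final Pattern REFERENCE = Pattern.compile("/content/[A-Za-z0-9_/-]+");

    /** Returns true if at least one property value contains a match. */
    static boolean shouldIndex(List<String> propertyValues) {
        for (String value : propertyValues) {
            if (REFERENCE.matcher(value).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldIndex(Arrays.asList("title", "see /content/abc for details")));
        System.out.println(shouldIndex(Arrays.asList("title", "no reference here")));
    }
}
```

Because only a small subset of nodes ends up in such an index, it cannot answer general queries, which is in line with the remark above that the index and the query would need a tag and a high index cost.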
[jira] [Updated] (OAK-8934) Indexing: filter entries with a regular expression
[ https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8934: Labels: amrit (was: ) > Indexing: filter entries with a regular expression > -- > > Key: OAK-8934 > URL: https://issues.apache.org/jira/browse/OAK-8934 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Labels: amrit > > We should provide a way to filter the index using a regular expression. For > example, only index nodes that contain a reference to another node. (Not a > JCR reference, but a reference within the value itself). For example, index a > node if one of the properties contains: > * /content/abc > * > * and so on > This will allow to run a query to find if /content/abc is referenced. The > index and the query will probably need to use a tag, and the cost of the > index needs to be high. Otherwise the query engine can't know when this index > should be used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (OAK-8910) Improve OAK Lucene Index Documentation
[ https://issues.apache.org/jira/browse/OAK-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved OAK-8910. - Fix Version/s: 1.26.0 Resolution: Fixed > Improve OAK Lucene Index Documentation > -- > > Key: OAK-8910 > URL: https://issues.apache.org/jira/browse/OAK-8910 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Assignee: Thomas Mueller >Priority: Minor > Labels: amrit > Fix For: 1.26.0 > > Attachments: OAK-8910.patch > > > Improve [http://jackrabbit.apache.org/oak/docs/query/lucene.html] with the > following: > * Extend the *analyzers* section including a reference on how to support > *stemming* ([http://jackrabbit.apache.org/oak/docs/query/lucene.html]) > * *supersedes* - does not seem to be documented** > * *functionName (string)* & *useIfExists (string)* are not listed in the > canonical *Index Definition* structure. > * *function (string)* is not listed in the canonical *Property Definitions* > structure > * *weight* - in the canonical structure the default value is -1, but the > actual default is 5 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8910) Improve OAK Lucene Index Documentation
[ https://issues.apache.org/jira/browse/OAK-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17046769#comment-17046769 ] Thomas Mueller commented on OAK-8910: - http://svn.apache.org/r1874582 (trunk) > Improve OAK Lucene Index Documentation > -- > > Key: OAK-8910 > URL: https://issues.apache.org/jira/browse/OAK-8910 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Assignee: Thomas Mueller >Priority: Minor > Labels: amrit > Attachments: OAK-8910.patch > > > Improve [http://jackrabbit.apache.org/oak/docs/query/lucene.html] with the > following: > * Extend the *analyzers* section including a reference on how to support > *stemming* ([http://jackrabbit.apache.org/oak/docs/query/lucene.html]) > * *supersedes* - does not seem to be documented** > * *functionName (string)* & *useIfExists (string)* are not listed in the > canonical *Index Definition* structure. > * *function (string)* is not listed in the canonical *Property Definitions* > structure > * *weight* - in the canonical structure the default value is -1, but the > actual default is 5 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (OAK-8910) Improve OAK Lucene Index Documentation
[ https://issues.apache.org/jira/browse/OAK-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller reassigned OAK-8910: --- Assignee: Thomas Mueller > Improve OAK Lucene Index Documentation > -- > > Key: OAK-8910 > URL: https://issues.apache.org/jira/browse/OAK-8910 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Assignee: Thomas Mueller >Priority: Minor > Labels: amrit > Attachments: OAK-8910.patch > > > Improve [http://jackrabbit.apache.org/oak/docs/query/lucene.html] with the > following: > * Extend the *analyzers* section including a reference on how to support > *stemming* ([http://jackrabbit.apache.org/oak/docs/query/lucene.html]) > * *supersedes* - does not seem to be documented** > * *functionName (string)* & *useIfExists (string)* are not listed in the > canonical *Index Definition* structure. > * *function (string)* is not listed in the canonical *Property Definitions* > structure > * *weight* - in the canonical structure the default value is -1, but the > actual default is 5 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-7671) [oak-run] Deprecate the datastorecheck command in favor of datastore
[ https://issues.apache.org/jira/browse/OAK-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17046750#comment-17046750 ]

Thomas Mueller commented on OAK-7671:
-

GitHub currently has some issues, according to https://www.githubstatus.com/

The patch looks good to me.

For the method "encodeId", it would be good to add some comments on what it is doing, and some example input and output. It's very hard to understand right now. But this was the case before, and is not related to the patch. If you already know some details (maybe by debugging), it would be good to add the info. It doesn't need to be a Javadoc:

{noformat}
/**
 * Encode the ... and extract the ...
 * Example:
 * => ...
 * => ...
 */
static String encodeId(String line, BlobStoreOptions.Type dsType) {
    // 0102030405... => 01/02/03/0102030405...
    blobId = (blobId.substring(0, 2) + FILE_SEPARATOR.value()
            + blobId.substring(2, 4) + FILE_SEPARATOR.value()
            + blobId.substring(4, 6) + FILE_SEPARATOR.value() + blobId);
    // 0102030405... => 0102-030405...
    blobId = (blobId.substring(0, 4) + DASH + blobId.substring(4));
    if (list.size() > 1) {
        // ( this part I don't understand... why list.get(1)? what does it do?)
        return delimJoiner.join(blobId, EscapeUtils.unescapeLineBreaks(list.get(1)));
{noformat}

> [oak-run] Deprecate the datastorecheck command in favor of datastore > > > Key: OAK-7671 > URL: https://issues.apache.org/jira/browse/OAK-7671 > Project: Jackrabbit Oak > Issue Type: Task > Components: run >Reporter: Amit Jain >Assignee: Nitin Gupta >Priority: Major > Fix For: 1.26.0 > > > With the introduction of \{{datastore}} command which supports both garbage > collection as well as consistency check the \{{datastorecheck}} command > should be deprecated and delegated internally to use that implementation. > Besides some options which are currently not supported by the new command > should also be implemented e.g. --ids, --refs -- This message was sent by Atlassian Jira (v8.3.4#803005)
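The two string transformations annotated in the review comment above can be tried out in isolation. The sketch below only reproduces those two expressions with example input and output; the class and method names are hypothetical, "/" and "-" are assumed values for FILE_SEPARATOR.value() and DASH, and which transformation applies to which data store type is not shown in the comment and is therefore left out.

```java
// Hypothetical helper names; only the two substring expressions come from the comment.
class BlobIdFormatSketch {

    // Assumptions: "/" stands in for FILE_SEPARATOR.value(), "-" for DASH.
    static final String SEPARATOR = "/";
    static final String DASH = "-";

    // 0102030405... => 01/02/03/0102030405...
    static String toDirectoryLayout(String blobId) {
        return blobId.substring(0, 2) + SEPARATOR
                + blobId.substring(2, 4) + SEPARATOR
                + blobId.substring(4, 6) + SEPARATOR
                + blobId;
    }

    // 0102030405... => 0102-030405...
    static String toDashed(String blobId) {
        return blobId.substring(0, 4) + DASH + blobId.substring(4);
    }

    public static void main(String[] args) {
        System.out.println(toDirectoryLayout("0102030405")); // prints 01/02/03/0102030405
        System.out.println(toDashed("0102030405"));          // prints 0102-030405
    }
}
```

Both helpers assume an id of at least six characters; shorter input would throw a StringIndexOutOfBoundsException from substring.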
[jira] [Commented] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040129#comment-17040129 ] Thomas Mueller commented on OAK-8783: - http://svn.apache.org/r1874198 (trunk) > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch, > OAK-8783-v2.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8892) Add javadoc to package-info files
[ https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8892: Labels: amrit (was: ) > Add javadoc to package-info files > - > > Key: OAK-8892 > URL: https://issues.apache.org/jira/browse/OAK-8892 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Assignee: Thomas Mueller >Priority: Minor > Labels: amrit > Fix For: 1.26.0 > > Attachments: OAK-8892.patch > > > Add javadoc to package-info files in all packages of {{oak-lucene}} , > {{oak-query-spi}} and {{oak-search}} . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (OAK-8892) Add javadoc to package-info files
[ https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller reassigned OAK-8892: --- Assignee: Thomas Mueller > Add javadoc to package-info files > - > > Key: OAK-8892 > URL: https://issues.apache.org/jira/browse/OAK-8892 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Assignee: Thomas Mueller >Priority: Minor > Fix For: 1.26.0 > > Attachments: OAK-8892.patch > > > Add javadoc to package-info files in all packages of {{oak-lucene}} , > {{oak-query-spi}} and {{oak-search}} . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (OAK-8892) Add javadoc to package-info files
[ https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved OAK-8892. - Resolution: Fixed > Add javadoc to package-info files > - > > Key: OAK-8892 > URL: https://issues.apache.org/jira/browse/OAK-8892 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Assignee: Thomas Mueller >Priority: Minor > Labels: amrit > Fix For: 1.26.0 > > Attachments: OAK-8892.patch > > > Add javadoc to package-info files in all packages of {{oak-lucene}} , > {{oak-query-spi}} and {{oak-search}} . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8892) Add javadoc to package-info files
[ https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040100#comment-17040100 ] Thomas Mueller commented on OAK-8892: - Thanks [~reschke]! "svn patch" didn't work as expected... Now hopefully it's better: http://svn.apache.org/r1874197 (trunk) > Add javadoc to package-info files > - > > Key: OAK-8892 > URL: https://issues.apache.org/jira/browse/OAK-8892 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Priority: Minor > Fix For: 1.26.0 > > Attachments: OAK-8892.patch > > > Add javadoc to package-info files in all packages of {{oak-lucene}} , > {{oak-query-spi}} and {{oak-search}} . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8902) Add support in oak-run to list down blob ids for lucene indexes
[ https://issues.apache.org/jira/browse/OAK-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039035#comment-17039035 ] Thomas Mueller commented on OAK-8902: - See my comments. > Add support in oak-run to list down blob ids for lucene indexes > --- > > Key: OAK-8902 > URL: https://issues.apache.org/jira/browse/OAK-8902 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Nitin Gupta >Assignee: Nitin Gupta >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038820#comment-17038820 ] Thomas Mueller commented on OAK-8783: - Thanks [~amitjain]! I didn't think about this... > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch, > OAK-8783-v2.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8892) Add javadoc to package-info files
[ https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038182#comment-17038182 ] Thomas Mueller commented on OAK-8892: - http://svn.apache.org/r1874108 > Add javadoc to package-info files > - > > Key: OAK-8892 > URL: https://issues.apache.org/jira/browse/OAK-8892 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Priority: Minor > Labels: amrit > Attachments: OAK-8892.patch > > > Add javadoc to package-info files in all packages of {{oak-lucene}} , > {{oak-query-spi}} and {{oak-search}} . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (OAK-8892) Add javadoc to package-info files
[ https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved OAK-8892. - Resolution: Fixed > Add javadoc to package-info files > - > > Key: OAK-8892 > URL: https://issues.apache.org/jira/browse/OAK-8892 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Priority: Minor > Labels: amrit > Attachments: OAK-8892.patch > > > Add javadoc to package-info files in all packages of {{oak-lucene}} , > {{oak-query-spi}} and {{oak-search}} . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038181#comment-17038181 ] Thomas Mueller commented on OAK-8783: - http://svn.apache.org/r1874107 (trunk). Review is still welcome. > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch, > OAK-8783-v2.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8892) Add javadoc to package-info files
[ https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038128#comment-17038128 ] Thomas Mueller commented on OAK-8892: - [~reschke] no that was a mistake, I'm sorry... I will remove the export versions and try again. /cc [~amrverma] > Add javadoc to package-info files > - > > Key: OAK-8892 > URL: https://issues.apache.org/jira/browse/OAK-8892 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Priority: Minor > Labels: amrit > Attachments: OAK-8892.patch > > > Add javadoc to package-info files in all packages of {{oak-lucene}} , > {{oak-query-spi}} and {{oak-search}} . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17036858#comment-17036858 ] Thomas Mueller commented on OAK-8783: - [~ngupta] [~tihom88] [~fabrizio.fort...@gmail.com] could you please review OAK-8783-v2.patch ? > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch, > OAK-8783-v2.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8783: Attachment: OAK-8783-v2.patch > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch, > OAK-8783-v2.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (OAK-8892) Add javadoc to package-info files
[ https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved OAK-8892. - Resolution: Fixed > Add javadoc to package-info files > - > > Key: OAK-8892 > URL: https://issues.apache.org/jira/browse/OAK-8892 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Priority: Minor > Labels: amrit > Attachments: OAK-8892.patch > > > Add javadoc to package-info files in all packages of {{oak-lucene}} , > {{oak-query-spi}} and {{oak-search}} . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8892) Add javadoc to package-info files
[ https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17036188#comment-17036188 ] Thomas Mueller commented on OAK-8892: - Thanks! http://svn.apache.org/r1873977 > Add javadoc to package-info files > - > > Key: OAK-8892 > URL: https://issues.apache.org/jira/browse/OAK-8892 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Priority: Minor > Labels: amrit > Attachments: OAK-8892.patch > > > Add javadoc to package-info files in all packages of {{oak-lucene}} , > {{oak-query-spi}} and {{oak-search}} . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8711) Queries with facets should not use traversal
[ https://issues.apache.org/jira/browse/OAK-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030682#comment-17030682 ] Thomas Mueller commented on OAK-8711: - The attached patch looks good to me. One nitpick: in the test case, you could in theory check if the right index is used, by executing "explain select ..." and then check the query plan. But I think it's not strictly needed to have such a test case, I'm fine with what you have right now. > Queries with facets should not use traversal > > > Key: OAK-8711 > URL: https://issues.apache.org/jira/browse/OAK-8711 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Nitin Gupta >Assignee: Nitin Gupta >Priority: Major > Labels: amrit > Attachments: OAK-8711.patch > > > Consider a scenario where a query is there with facets and the traversal cost > is less than the index cost that serves the facet query . This would be > problematic. > > In this case we should maybe set the traversal cost to infinity so that > traversal is not an option for queries with facets. > > In case there is no index available to serve this faceted query we can > probably throw an exception with a meaningful message . -- This message was sent by Atlassian Jira (v8.3.4#803005)
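The optional check mentioned in the comment (run "explain select ..." and inspect the plan) could look roughly like the sketch below. It only covers the string check; the plan text would come from executing the explain query in the test. The helper name, the sample plan strings and the assumption that a traversal plan contains the text "/* traverse" are illustrative and should be verified against the plans the test actually produces.

```java
final class QueryPlanCheckSketch {

    /**
     * Returns true if the plan mentions the expected index path and does not
     * fall back to traversal. The "/* traverse" marker is an assumption about
     * the plan format.
     */
    static boolean usesIndex(String plan, String indexPath) {
        return plan.contains(indexPath) && !plan.contains("/* traverse");
    }

    public static void main(String[] args) {
        // Hypothetical plan strings, for illustration only.
        String indexed = "[nt:base] as [nt:base] /* lucene:test1(/oak:index/test1) foo:bar */";
        String traversal = "[nt:base] as [nt:base] /* traverse \"*\" where ... */";

        System.out.println(usesIndex(indexed, "/oak:index/test1"));   // true
        System.out.println(usesIndex(traversal, "/oak:index/test1")); // false
    }
}
```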
[jira] [Commented] (OAK-8892) Add javadoc to package-info files
[ https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029829#comment-17029829 ] Thomas Mueller commented on OAK-8892: - See pull request https://github.com/apache/jackrabbit-oak/pull/175 > Add javadoc to package-info files > - > > Key: OAK-8892 > URL: https://issues.apache.org/jira/browse/OAK-8892 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Amrit Verma >Priority: Minor > Labels: amrit > > Add javadoc to package-info files in all packages of {{oak-lucene}} , > {{oak-query-spi}} and {{oak-search}} . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8711) Queries with facets should not use traversal
[ https://issues.apache.org/jira/browse/OAK-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029830#comment-17029830 ] Thomas Mueller commented on OAK-8711: - See pull request https://github.com/apache/jackrabbit-oak/pull/174 > Queries with facets should not use traversal > > > Key: OAK-8711 > URL: https://issues.apache.org/jira/browse/OAK-8711 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Nitin Gupta >Assignee: Nitin Gupta >Priority: Major > Labels: amrit > > Consider a scenario where a query is there with facets and the traversal cost > is less than the index cost that serves the facet query . This would be > problematic. > > In this case we should maybe set the traversal cost to infinity so that > traversal is not an option for queries with facets. > > In case there is no index available to serve this faceted query we can > probably throw an exception with a meaningful message . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8711) Queries with facets should not use traversal
[ https://issues.apache.org/jira/browse/OAK-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8711: Labels: amrit (was: ) > Queries with facets should not use traversal > > > Key: OAK-8711 > URL: https://issues.apache.org/jira/browse/OAK-8711 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Nitin Gupta >Assignee: Nitin Gupta >Priority: Major > Labels: amrit > > Consider a scenario where a query is there with facets and the traversal cost > is less than the index cost that serves the facet query . This would be > problematic. > > In this case we should maybe set the traversal cost to infinity so that > traversal is not an option for queries with facets. > > In case there is no index available to serve this faceted query we can > probably throw an exception with a meaningful message . -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (OAK-8854) Improved log message when failed to index an node due to IOException
[ https://issues.apache.org/jira/browse/OAK-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved OAK-8854. - Resolution: Fixed > Improved log message when failed to index an node due to IOException > > > Key: OAK-8854 > URL: https://issues.apache.org/jira/browse/OAK-8854 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.22.0 > > > When there is an IOException trying to index the node, there are cases where > the root cause (IOException message) is not logged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
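One way to make sure the root cause ends up in the log message is to walk the cause chain before formatting. The sketch below shows the idea only; it is not the change committed for this issue, and the class name, method name and message wording are made up for illustration.

```java
final class RootCauseLoggingSketch {

    /** Builds a log message that always includes the innermost exception. */
    static String describe(String path, Throwable t) {
        Throwable root = t;
        // Follow the cause chain; the self-reference check guards against loops.
        while (root.getCause() != null && root.getCause() != root) {
            root = root.getCause();
        }
        return "Failed to index the node " + path + ": "
                + root.getClass().getSimpleName() + ": " + root.getMessage();
    }

    public static void main(String[] args) {
        Exception wrapped = new RuntimeException("indexing failed",
                new java.io.IOException("disk full"));
        // Failed to index the node /content/a: IOException: disk full
        System.out.println(describe("/content/a", wrapped));
    }
}
```

With a logging framework, passing the exception itself as the last argument in addition to such a message keeps the full stack trace available.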
[jira] [Updated] (OAK-8854) Improved log message when failed to index an node due to IOException
[ https://issues.apache.org/jira/browse/OAK-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8854: Fix Version/s: 1.22.0 > Improved log message when failed to index an node due to IOException > > > Key: OAK-8854 > URL: https://issues.apache.org/jira/browse/OAK-8854 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.22.0 > > > When there is an IOException trying to index the node, there are cases where > the root cause (IOException message) is not logged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8854) Improved log message when failed to index an node due to IOException
[ https://issues.apache.org/jira/browse/OAK-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012951#comment-17012951 ] Thomas Mueller commented on OAK-8854: - http://svn.apache.org/r1872603 http://svn.apache.org/r1872604 > Improved log message when failed to index an node due to IOException > > > Key: OAK-8854 > URL: https://issues.apache.org/jira/browse/OAK-8854 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > > When there is an IOException trying to index the node, there are cases where > the root cause (IOException message) is not logged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (OAK-8854) Improved log message when failed to index an node due to IOException
Thomas Mueller created OAK-8854: --- Summary: Improved log message when failed to index an node due to IOException Key: OAK-8854 URL: https://issues.apache.org/jira/browse/OAK-8854 Project: Jackrabbit Oak Issue Type: Improvement Components: indexing Reporter: Thomas Mueller Assignee: Thomas Mueller When there is an IOException trying to index the node, there are cases where the root cause (IOException message) is not logged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-6254) DataStore: API to retrieve approximate storage size
[ https://issues.apache.org/jira/browse/OAK-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-6254: Priority: Minor (was: Major) > DataStore: API to retrieve approximate storage size > --- > > Key: OAK-6254 > URL: https://issues.apache.org/jira/browse/OAK-6254 > Project: Jackrabbit Oak > Issue Type: Bug > Components: blob >Reporter: Thomas Mueller >Priority: Minor > > The estimated size of the datastore (on disk) is needed to: > * monitor growth over time, or growth of certain operations > * monitor if garbage collection is effective > * avoid out of disk space > * estimate backup size > * statistical purposes (for example, if there are many repositories, to group > them by size) > Datastore size: we could use the following heuristic: We could read the file > sizes in ./datastore/00/00 (if it exists) and multiply by 65536; or > ./datastore/00 and multiply by 256. That would give a rough estimation > (within about 20% for repositories with datastore size > 50 GB). > I think this is mainly important for the FileDataStore. The S3 datastore, if > there is a simple and fast S3 API to read the size, then that would be good > as well, but if there is none, then returning "unknown" is fine for me. > As for the API, I would use something like this: {{long > getEstimatedStorageSize(int accuracyLevel)}} with accuracyLevel 1 for > inaccurate (fastest), 2 more accurate (slower),..., 9 precise (possibly very > slow). Similar to > [java.util.zip.Deflater.setLevel|https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#setLevel(int)]. > I would expect it takes up to 1 second for accuracyLevel 0, up to 5 seconds > for accuracyLevel 1, and possibly hours for level 9. With level 1, I would > read files in 00/00, with level 2 - 8 I would read files in 00, and with > level 9 I would read all the files. 
For level 1, I wouldn't stop; for level > 2, if it takes more than 5 seconds, I would stop and return the current best > estimate. -- This message was sent by Atlassian Jira (v8.3.4#803005)
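The directory heuristic from the description (read the file sizes in ./datastore/00/00 and multiply by 65536, or ./datastore/00 and multiply by 256) can be sketched as follows. This is not an Oak API: the class and method names are hypothetical, the accuracyLevel parameter and the time limits are left out, and -1 stands in for "unknown". The sketch assumes that files are spread evenly over the two-hex-digit directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class DataStoreSizeEstimateSketch {

    /** Sum of the sizes of all regular files below dir. */
    static long sizeOfTree(Path dir) {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile).mapToLong(p -> {
                try {
                    return Files.size(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }).sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rough estimate in bytes, or -1 if neither sample directory exists. */
    static long estimate(Path dataStoreRoot) {
        Path twoLevels = dataStoreRoot.resolve("00").resolve("00");
        if (Files.isDirectory(twoLevels)) {
            return sizeOfTree(twoLevels) * 65536L; // 256 * 256 directories
        }
        Path oneLevel = dataStoreRoot.resolve("00");
        if (Files.isDirectory(oneLevel)) {
            return sizeOfTree(oneLevel) * 256L;
        }
        return -1;
    }

    /** Small self-contained demo: one 10-byte file in 00/00. */
    static long demo() {
        try {
            Path root = Files.createTempDirectory("datastore");
            Path dir = Files.createDirectories(root.resolve("00").resolve("00"));
            Files.write(dir.resolve("0000abcd"), new byte[10]);
            return estimate(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 655360
    }
}
```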
[jira] [Updated] (OAK-6254) DataStore: API to retrieve approximate storage size
[ https://issues.apache.org/jira/browse/OAK-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-6254: Fix Version/s: (was: 1.22.0) > DataStore: API to retrieve approximate storage size > --- > > Key: OAK-6254 > URL: https://issues.apache.org/jira/browse/OAK-6254 > Project: Jackrabbit Oak > Issue Type: Bug > Components: blob >Reporter: Thomas Mueller >Priority: Major > > The estimated size of the datastore (on disk) is needed to: > * monitor growth over time, or growth of certain operations > * monitor if garbage collection is effective > * avoid out of disk space > * estimate backup size > * statistical purposes (for example, if there are many repositories, to group > them by size) > Datastore size: we could use the following heuristic: We could read the file > sizes in ./datastore/00/00 (if it exists) and multiply by 65536; or > ./datastore/00 and multiply by 256. That would give a rough estimation > (within about 20% for repositories with datastore size > 50 GB). > I think this is mainly important for the FileDataStore. The S3 datastore, if > there is a simple and fast S3 API to read the size, then that would be good > as well, but if there is none, then returning "unknown" is fine for me. > As for the API, I would use something like this: {{long > getEstimatedStorageSize(int accuracyLevel)}} with accuracyLevel 1 for > inaccurate (fastest), 2 more accurate (slower),..., 9 precise (possibly very > slow). Similar to > [java.util.zip.Deflater.setLevel|https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#setLevel(int)]. > I would expect it takes up to 1 second for accuracyLevel 0, up to 5 seconds > for accuracyLevel 1, and possibly hours for level 9. With level 1, I would > read files in 00/00, with level 2 - 8 I would read files in 00, and with > level 9 I would read all the files. 
For level 1, I wouldn't stop; for level > 2, if it takes more than 5 seconds, I would stop and return the current best > estimate. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-5787) BlobStore should be AutoCloseable
[ https://issues.apache.org/jira/browse/OAK-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994394#comment-16994394 ] Thomas Mueller commented on OAK-5787: - +1 > BlobStore should be AutoCloseable > - > > Key: OAK-5787 > URL: https://issues.apache.org/jira/browse/OAK-5787 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Julian Reschke >Assignee: Julian Reschke >Priority: Minor > Fix For: 1.22.0 > > Attachments: OAK-5787.diff > > > {{DocumentNodeStore}} currently calls {{close()}} if the blob store instance > implements {{Closeable}}. > This has led to problems where wrapper implementations did not implement it, > and thus the actual blob store instance wasn't properly shut down. > Proposal: make {{BlobStore}} extend {{Closeable}} and get rid of all > {{instanceof}} checks. > [~thomasm] [~amitjain] - feedback appreciated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
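The problem described in the issue, and the effect of the proposal, can be reproduced with a few lines. All types below are simplified stand-ins invented for this sketch, not the Oak interfaces: the "legacy" half shows how an instanceof check silently skips a wrapper that does not implement Closeable, and the second half shows that once the store interface extends Closeable the wrapper has to implement close() and can delegate.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

final class CloseableBlobStoreSketch {

    // Before: the store interface says nothing about closing.
    interface LegacyBlobStore { }

    static final class LegacyFileStore implements LegacyBlobStore, Closeable {
        boolean closed;
        @Override public void close() { closed = true; }
    }

    // A wrapper that did not implement Closeable.
    static final class LegacyWrapper implements LegacyBlobStore {
        final LegacyBlobStore delegate;
        LegacyWrapper(LegacyBlobStore delegate) { this.delegate = delegate; }
    }

    /** The instanceof-based shutdown; returns whether close() was called. */
    static boolean closeIfCloseable(LegacyBlobStore store) {
        if (store instanceof Closeable) {
            try {
                ((Closeable) store).close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return true;
        }
        return false; // the wrapped store stays open
    }

    // After: every store, including wrappers, must provide close().
    interface BlobStore extends Closeable { }

    static final class FileStore implements BlobStore {
        boolean closed;
        @Override public void close() { closed = true; }
    }

    static final class Wrapper implements BlobStore {
        private final BlobStore delegate;
        Wrapper(BlobStore delegate) { this.delegate = delegate; }
        @Override public void close() throws IOException { delegate.close(); }
    }

    /** Shutdown without any instanceof check. */
    static void close(BlobStore store) {
        try {
            store.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LegacyFileStore legacy = new LegacyFileStore();
        closeIfCloseable(new LegacyWrapper(legacy));
        System.out.println(legacy.closed); // false

        FileStore store = new FileStore();
        close(new Wrapper(store));
        System.out.println(store.closed);  // true
    }
}
```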
[jira] [Commented] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984861#comment-16984861 ] Thomas Mueller commented on OAK-8783: - http://svn.apache.org/r1870584 (trunk) - reviews are still welcome. I also had to change the version (from 1.0.1 to 1.1.0). > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8783: Component/s: indexing > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8783: Fix Version/s: (was: 1.22.0) > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8783: Fix Version/s: 1.22.0 > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: indexing >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Fix For: 1.22.0 > > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984859#comment-16984859 ] Thomas Mueller commented on OAK-8783: - Good point! I will change the newObjectNotRespectingOrder test, so that it doesn't expect any specific order. > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984815#comment-16984815 ] Thomas Mueller commented on OAK-8783: - [~ngupta] [~tihom88] [~fabrizio.fort...@gmail.com] could you please review OAK-8783-json-1.patch ? > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OAK-8783) Merge index definitions
[ https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-8783: Attachment: OAK-8783-json-1.patch > Merge index definitions > --- > > Key: OAK-8783 > URL: https://issues.apache.org/jira/browse/OAK-8783 > Project: Jackrabbit Oak > Issue Type: Improvement >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Major > Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch > > > If there are multiple versions of an index, e.g. asset-2-custom-2 and > asset-3, then oak-run should be able to merge them to asset-3-custom-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)