Re: Access azure segments metadata in a case-insensitive way

2020-01-23 Thread Tomek Rękawek
Hi Aravindo,

I’m in favour of merging the patch. I think being strict in what we write and 
tolerant in what we read is a good thing. Please create an OAK issue and ping 
me and Andrei Dulceanu, so we can merge it.

Regards,
Tomek

-- 
Tomek Rękawek | ASF committer | www.apache.org
tom...@apache.org

> On 23 Jan 2020, at 11:08, Aravindo Wingeier  wrote:
> 
> Hi devs,
> 
> We use azcopy to copy segments from one Azure blob container to another for 
> testing. There is a bug in the current version of azcopy (10.3.3) that 
> makes all metadata keys start with a capital letter - "type" becomes "Type". 
> As a consequence, the current implementation cannot find the segments in the 
> Azure blob storage. 
> 
> The azcopy issue was already reported [1] in 2018; I am contacting MS 
> directly to follow up on it. As an alternative, we currently use azcopy 
> version 7, which is much slower and has reliability issues. 
> 
> I have little hope that azcopy will be fixed soon, so I suggest a patch to 
> oak-segment-azure that is backward compatible and ignores the case of the 
> keys when reading metadata. See the patch draft at [2]. 
> 
> What do you think is the best way to go forward? 
> 
> Best regards,
> 
> Aravindo Wingeier
> 
> [1]: https://github.com/Azure/azure-storage-azcopy/issues/113
> [2]: https://github.com/apache/jackrabbit-oak/pull/173
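
For illustration, a minimal sketch of the case-insensitive lookup idea (the
class and method names below are assumptions, not the actual
oak-segment-azure patch code):

    import java.util.Map;
    import java.util.TreeMap;

    // Hypothetical helper: re-keys blob metadata into a map whose lookups
    // ignore case, so that both "type" and "Type" resolve to the same entry.
    public final class CaseInsensitiveMetadata {

        private CaseInsensitiveMetadata() {}

        public static Map<String, String> wrap(Map<String, String> metadata) {
            Map<String, String> result = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
            result.putAll(metadata);
            return result;
        }
    }

Reading through such a wrapper stays backward compatible: correctly-cased
metadata written by Oak resolves exactly as before, while azcopy-mangled keys
become readable too.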



Intent to backport OAK-8124

2019-03-15 Thread Tomek Rękawek
Hello,

I’d like to backport OAK-8124 to 1.10 and 1.8. This patch adds the 
security-related commit hooks, which were missing for the partial oak->oak 
migration in oak-upgrade.

Regards,
Tomek

-- 
Tomek Rękawek | ASF committer | www.apache.org
tom...@apache.org



Intent to backport: OAK-7540 to 1.8.x

2018-08-22 Thread Tomek Rękawek
Hello,

The unique indices may sometimes break on the Composite Node Store. Vikas fixed 
this in OAK-7540. I'd like to backport the fix to the 1.8 branch, following a 
user request.

Regards,
Tomek

-- 
Tomek Rękawek | ASF committer | www.apache.org
tom...@apache.org


Intent to backport: OAK-7686 and OAK-7687

2018-08-09 Thread Tomek Rękawek
Hello,

The issues in the subject fix incorrect behaviour of oak-upgrade when a 
partial migration is done (e.g. only /content/site is being migrated). In this 
case, a full reindexing is triggered after starting the target repository. I 
plan to backport the issues to the 1.8 and 1.6 branches, as we have an Oak 1.6 
user who was hit by the described problem.

Regards,
Tomek

-- 
Tomek Rękawek | ASF committer | www.apache.org
tom...@apache.org



Re: Decide if a composite node store setup expose multiple checkpoint mbeans

2018-07-09 Thread Tomek Rękawek
Hello Vikas,

I think there was a similar case, described in OAK-5309 (multiple instances of 
the RevisionGCMBean). We introduced an extra property there - “role” - which 
can be used to differentiate the mbeans. It’s similar to option 2 in your 
email. An empty role means that the mbean is related to the “main” node store, 
while a non-empty one is only used for the partial node stores, gathered 
together by the CNS. Maybe we can use a similar approach here?
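
For illustration, a sketch of how such a role-qualified registration could
look (the helper below is an assumption; only CheckpointMBean and the “role”
service property come from the thread):

    import java.util.Dictionary;
    import java.util.Hashtable;
    import org.apache.jackrabbit.oak.api.jmx.CheckpointMBean;
    import org.osgi.framework.BundleContext;
    import org.osgi.framework.ServiceRegistration;

    // Hypothetical helper: the "main" node store registers its mbean without
    // a role, while mounted stores set a non-empty "role" property, so
    // clients can tell the registrations apart.
    class CheckpointMBeanRegistrar {

        static ServiceRegistration<?> register(BundleContext context,
                CheckpointMBean mbean, String role) {
            Dictionary<String, Object> props = new Hashtable<>();
            if (role != null && !role.isEmpty()) {
                props.put("role", role);
            }
            return context.registerService(CheckpointMBean.class.getName(), mbean, props);
        }
    }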

Regards,
Tomek

-- 
Tomek Rękawek | ASF committer | www.apache.org
tom...@apache.org

> On 5 Jul 2018, at 23:59, Vikas Saurabh  wrote:
> 
> Hi,
> 
> We recently discovered OAK-7610 [0], where
> ActiveDeletedBlobCollectorMBeanImpl got confused due to multiple
> implementations of CheckpointMBean being exposed in composite node
> store setups (since OAK-6315 [1], which implemented the checkpoint bean
> for the composite node store).
> 
> For the time being, we are going to avoid that confusion by changing
> ActiveDeletedBlobCollectorMBeanImpl to keep returning the oldest
> checkpoint timestamp if all CheckpointMBean implementations report the
> same one. That "work-around" works currently because the composite node
> store uses the global node store to list checkpoints and get the oldest
> timestamp... but the approach is incorrect in general, as there's no
> such guarantee.
> 
> So, here's the question for the discussion: how should the situation
> be handled correctly? Afaict, there are a few options (in decreasing
> order of my preference):
> 1. only a single checkpoint mbean is exposed (which implies that
> mounted node store services need to "know" that they are mounted
> stores and hence shouldn't expose their own bean)
> 2. the composite node store's checkpointMBean implementation can expose
> some metadata (say, implement a marker interface) - discovering such an
> implementation can mean "use this implementation for repository-level
> functionality"
> 3. keep the work-around to be implemented in OAK-7610 [0] but document
> (ensure??) the assumption that "all implementations would have the
> same oldest checkpoint timestamp"
> 
> Would love to get some feedback.
> 
> [0]: https://issues.apache.org/jira/browse/OAK-7610
> [1]: https://issues.apache.org/jira/browse/OAK-6315
> 
> 
> Thanks,
> Vikas



Intent to backport OAK-7335

2018-03-13 Thread Tomek Rękawek
Hi,

I’m planning to backport OAK-7335 to 1.6.x. It’ll make oak-upgrade more 
permissive when migrating nodes with long names (details in the issue).

Regards,
Tomek

-- 
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com



Re: Azure Segment Store

2018-03-06 Thread Tomek Rękawek
Hi Ian,

> On 5 Mar 2018, at 17:47, Ian Boston <i...@tfd.co.uk> wrote:
> 
> I assume that the patch deals with the 50K limit[1] to the number of blocks
> per Azure Blob store ?

As far as I understand, it’s the limit that applies to the number of blocks in 
a single blob. A block is a single write. Since the segments are immutable 
(written at once), we don’t need to worry about this limit for the segments. 
It’s a different case for the journal file - a single commit leads to a single 
append, which adds a block. However, the patch takes care of this by creating 
journal.log.001, .002, etc. when we’re close to the limit [1].

Regards,
Tomek

[1] 
https://github.com/trekawek/jackrabbit-oak/blob/OAK-6922/oak-segment-azure/src/main/java/org/apache/jackrabbit/oak/segment/azure/AzureJournalFile.java#L37
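
A minimal model of that roll-over behaviour (illustrative only - the actual
logic lives in the AzureJournalFile class linked above; 50,000 is Azure’s
per-blob block limit):

    import java.util.ArrayList;
    import java.util.List;

    // Each journal append commits one block, and an append blob holds at
    // most 50,000 blocks, so writes roll over to journal.log.001, .002, ...
    public class JournalRolloverModel {

        static final int MAX_BLOCKS_PER_BLOB = 50_000;

        private final List<String> blobNames = new ArrayList<>();
        private int blocksInCurrentBlob;

        public JournalRolloverModel() {
            blobNames.add("journal.log");
        }

        public void append(String entry) {
            // contents omitted - this model only tracks block counts
            if (blocksInCurrentBlob >= MAX_BLOCKS_PER_BLOB) {
                blobNames.add(String.format("journal.log.%03d", blobNames.size()));
                blocksInCurrentBlob = 0;
            }
            blocksInCurrentBlob++; // one append == one committed block
        }

        public String currentBlobName() {
            return blobNames.get(blobNames.size() - 1);
        }
    }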

--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com





Azure Segment Store

2018-03-01 Thread Tomek Rękawek
Hello,

I prepared a prototype for the Azure-based Segment Store, which makes it 
possible to persist all the SegmentMK-related resources (segments, journal, 
manifest, etc.) on a remote service, namely the Azure Blob Storage [1]. The 
whole description of the approach, data structures, etc., as well as the 
patch, can be found in OAK-6922. It uses the extension points introduced in 
OAK-6921.

While it’s still experimental code, I’d like to commit it to trunk rather 
sooner than later. The patch is already pretty big and I’d like to avoid 
developing it “privately” on my own branch. It’s a new, optional Maven module 
which doesn’t change any existing behaviour of Oak or SegmentMK. The only 
external change it makes is adding a few exports to oak-segment-tar, so it can 
use the SPI introduced in OAK-6921. We may narrow these exports to a single 
package if you think it’d be good for encapsulation.

There’s a related issue, OAK-7297, which introduces a new fixture for 
benchmarks and ITs. After merging it, all the Oak integration tests pass on 
the Azure Segment Store.

Looking forward to your feedback.

Regards,
Tomek

[1] https://azure.microsoft.com/en-us/services/storage/blobs/

--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com





Intent to backport: OAK-6878

2017-10-27 Thread Tomek Rękawek
Hello,

I plan to backport OAK-6878 to the 1.6 branch today, so it’ll be included in 
the Monday release. It allows setting the S3DataStore configuration fields 
using a properties file. It was requested by a customer - without this patch 
it’s impossible to set the cacheSize for the S3 migration.
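
For illustration, such a properties file could look like the sketch below
(the key names other than cacheSize are assumptions, not the verified
S3DataStore configuration keys):

    # s3.properties (illustrative)
    accessKey=<access-key>
    secretKey=<secret-key>
    s3Bucket=oak-migration-bucket
    # cache size in bytes, e.g. 16 GB
    cacheSize=17179869184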

Regards,
Tomek

--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com





Re: [CompositeDataStore] How to properly create delegate data stores?

2017-10-26 Thread Tomek Rękawek
Hi Matt,

> On 24 Oct 2017, at 21:54, Matt Ryan <o...@mvryan.org> wrote:
> It is still unclear to me how this works in terms of configuration files,
> and how this would work for the CompositeDataStore.  This is how I believe
> it would work for two FileDataStores in the composite:
> 
> FDS config 1:
> 
> path=datastore/ds1
> role=local1
> 
> FDS config 2:
> 
> path=datastore/ds2
> role=local2
> 
> CompositeDataStore config:
> 
> local1:readOnly=false
> local2:readOnly=true
> 
> Something like that anyway.

Yes, I’d see something like this too.

> My questions then are:  How do we store both FileDataStore configuration
> files when both have the same PID?  What is the file name for each one?
> And how to do they associate with the FileDataStoreFactory?

For the factory services we use suffixes for the config files:

org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStoreFactory-local1.cfg
org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStoreFactory-local2.cfg
org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStoreFactory-other.cfg

OSGi knows that the […].FileDataStoreFactory is a factory and creates as many 
instances as needed, binding the provided configurations.
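
For illustration, the contents of one of those files could then mirror Matt’s
example above (the exact keys are an assumption):

    # org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStoreFactory-local1.cfg
    path=datastore/ds1
    role=local1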

Regards,
Tomek

--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com




Intent to backport OAK-6604 and OAK-6611 to 1.6

2017-10-10 Thread Tomek Rękawek
Hi,

I plan to backport these two issues. They improve S3 resilience in 
oak-upgrade by using a newer version of the S3DataStore and waiting until all 
the uploads are finished.

Regards,
Tomek

--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com





Re: [CompositeDataStore] How to properly create delegate data stores?

2017-10-04 Thread Tomek Rękawek
Hello Matt,

Please find my replies inlined.

> On 4 Oct 2017, at 00:13, Matt Ryan <o...@mvryan.org> wrote:
> 
>> 1. Create new BlobStoreProvider interface, with just one method:
>> getBlobStore().
>> 2. Modify all the existing blob store services, adding an optional
>> “role” property (any string).

> One concern I have with this approach is that if we want a data store to be
> usable as a CompositeDataStore delegate, that data store has to make
> specific provisions to do this.  My thinking was that it would be
> preferable to have the CompositeDataStore contain as much of the logic as
> possible.  Ideally a data store should work as a delegate without having to
> make any changes to the data store itself.  (Not sure if we can achieve
> this, but…)

Could you elaborate on what kind of provisioning is required for the delegatees?

From what I understand, you didn’t plan to rely on OSGi to get the delegate 
data stores, but to initialise all of them in the CompositeDataStore (“contain 
as much of the logic as possible”). I’m not sure if this is the right 
approach. It means that the composite data store has to depend on every 
existing blob store and know its internals. If something changes in any blob 
store, the composite data store has to be updated as well. For data stores 
with a rich configuration (S3DataStore) this may get quite complex.

On the other hand, the OSGi-based approach makes the whole thing simpler, 
less coupled, extensible and easier to maintain. The CompositeDataStore 
doesn’t need to know any concrete implementation - it relies on the BlobStore 
interface, and OSGi takes care of providing the already-configured delegatees.

>> 3. If the data store service is configured with this role, it should
>> register the BlobStoreProvider service rather than a normal BlobStore.
>> 4. The CompositeDataStoreService should be configured with a list of blob
>> store roles it should wait for.
>> 5. The CompositeDataStoreService has a MANDATORY_MULTIPLE @Reference of
>> type BlobStoreProvider.
>> 6. Once (a) the CompositeDataStoreService is activated and (b) all the blob
>> store providers are there, it’ll register a BlobStore service, which will
>> be picked up by the node store.


> I have concerns about this part also.  Which blob store providers should
> the CompositeDataStoreService wait for?
> 
> For example, should it wait for S3DataStore?  If yes, and if the
> installation doesn’t use the S3 connector, that provider will never show
> up, and therefore the CompositeDataStoreService would never get
> registered.  If it doesn’t wait for S3DataStore but the installation does
> use S3DataStore, what happens if that bundle is unloaded?

As above, the CompositeDataStore won’t wait for any particular 
implementations, but for the BlobStoreProviders configured with the 
appropriate roles. It knows the role list, so it can tell when all the roles 
are in place.

For instance, we can configure the CompositeDataStore with the following role 
list: local1, local2, shared.

Now, in OSGi, we configure two FileDataStores, named “local1” and “local2”, 
and also an S3DataStore named “shared”.

The CompositeDataStore will be notified about all the data store 
registrations and, as soon as all three data stores are in place, it can 
carry on with its initialisation.

> Wouldn’t this approach require that every possible data store that can be a
> blob store provider for the composite be included in each installation that
> wants to use the CompositeDataStore?

No. The CompositeDataStore will only reference the BlobStoreProvider 
interface, not the actual implementations. It’ll even be possible for a 
customer to implement a completely new blob store and use it as a delegatee 
(as long as they implement the BlobStoreProvider). Not that we expect 
customers to do that, but this kind of decoupling makes it easier to work on 
the Oak codebase.
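
For illustration, the proposed interface could be as small as this sketch
(the import is an assumption; only the single getBlobStore() method comes
from the proposal, with the role carried as an OSGi service property rather
than a method):

    import org.apache.jackrabbit.oak.spi.blob.BlobStore;

    // Sketch of the proposed SPI: one method, as described earlier in the
    // thread. The "role" is attached as an OSGi service property at
    // registration time, not exposed on the interface itself.
    public interface BlobStoreProvider {

        BlobStore getBlobStore();
    }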

Regards,
Tomek

--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com





Re: oak-upgrade blocking the 1.7.1 release

2017-06-06 Thread Tomek Rękawek
Hi,

In OAK-6306 Davide provided a workaround for the broken javadocs (using a 
different Lucene version for the javadoc generation only). We can’t upgrade 
the Lucene version used by oak-upgrade, because it’ll break the Jackrabbit 2 
support - and we need to have that working in this module.

Regards,
Tomek

-- 
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com

> On 6 Jun 2017, at 09:23, Julian Reschke <julian.resc...@gmx.de> wrote:
> 
> On 2017-06-06 09:09, Alex Parvulescu wrote:
>> I'm not convinced. If you look at the javadoc error it is exactly like the
>> one from OAK-6150 (blocking 1.7.0 release at that time) that seemed to
>> magically go away.
> 
> Actually, I meant to make that a question
> 
> Davide, you did the 1.7.0 release, right? Do you recall how you got past the 
> error?
> 
> We may want to do this for 1.7.1 again, but in the mid term, we need to fix 
> this somehow...
> 
> Best regards, Julian





backporting OAK-6294

2017-06-02 Thread Tomek Rękawek
Hi,

I’d like to backport OAK-6294 to 1.6 and 1.4 before Monday, so it’ll be 
included in Oak 1.4.16. It fixes an NPE reported by a customer.

Regards,
Tomek

-- 
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com





Re: [VOTE] Release Apache Jackrabbit 2.14.1

2017-05-30 Thread Tomek Rękawek
Hi Julian,

> On 29 May 2017, at 11:37, Julian Reschke <resc...@apache.org> wrote:
>[ ] +1 Release this package as Apache Jackrabbit 2.14.1


+1

Regards,
Tomek

-- 
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com


