[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers

2023-01-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679526#comment-17679526
 ] 

ASF subversion and git services commented on JCLOUDS-1488:
--

Commit e478dd5452d70a5ea2082337b05ad91f331f0eb6 in jclouds's branch 
refs/heads/master from Andrew Gaul
[ https://gitbox.apache.org/repos/asf?p=jclouds.git;h=e478dd5452 ]

JCLOUDS-1371: JCLOUDS-1488: optimize fs prefix

This reduces the number of stat calls required when prefix is deep in the
filesystem hierarchy.  Further optimizations to delimiter are possible.
References gaul/s3proxy#473.
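
As an illustration of the idea in this commit message (not the actual jclouds change), the sketch below starts the directory walk at the deepest directory named by the prefix instead of at the container root, so files outside that subtree are never visited. The class `PrefixWalkSketch` and the method `listKeys` are hypothetical names, and the sketch deliberately omits any check against path traversal.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of jclouds. */
final class PrefixWalkSketch {
    /** Lists keys (relative paths using '/') under containerRoot that start with prefix. */
    static List<String> listKeys(Path containerRoot, String prefix) throws IOException {
        // Directory part of the prefix: everything before the last '/'.
        int lastSlash = prefix.lastIndexOf('/');
        String dirPart = lastSlash < 0 ? "" : prefix.substring(0, lastSlash);

        // Begin the walk in that directory rather than at the container root.
        Path start = containerRoot.resolve(dirPart);
        if (!Files.isDirectory(start)) {
            return Collections.emptyList();
        }
        try (Stream<Path> paths = Files.walk(start)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> containerRoot.relativize(p).toString()
                            .replace(File.separatorChar, '/'))
                    // The prefix may end mid-name, so a final string filter is still needed.
                    .filter(key -> key.startsWith(prefix))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

With this shape, the cost of a call such as `listKeys(root, "test-container-subdirectory/")` depends on the size of that subdirectory rather than on the size of the whole container.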


> Filesystem list call with prefix is slow in large containers
> 
>
> Key: JCLOUDS-1488
> URL: https://issues.apache.org/jira/browse/JCLOUDS-1488
> Project: jclouds
> Issue Type: Bug
> Components: jclouds-blobstore
> Affects Versions: 2.1.1
> Environment: Java version: java version "1.8.0_131"
>   Operating system: Fedora 27 x86_64
> Reporter: Lari Sinisalo
> Assignee: Andrew Gaul
> Priority: Major
> Labels: filesystem
> Fix For: 2.2.0, 2.1.2
>
> Attachments: JCLOUDS1488.java
>
>
> When the filesystem blobstore is used, running the following code takes a very
> long time if there are a lot of files in the container:
> {code:java}
>     ListContainerOptions options = new ListContainerOptions();
>     options.prefix("test-container-subdirectory/");
>     Set results = blobStore.list("test-container", options);
> {code}
> See the attached Java source file [^JCLOUDS1488.java] for the full code.
> On my system, running the attached Java code takes over 10 seconds to list a 
> single file if there are 500,000 files in the container outside that prefix.
> Output from the attached code:
> {code:java}
> Number of blobs listed: 1
> First listed blob: test-container-subdirectory/file-to-list
> Time it took to list the blobs: 13256 ms
> {code}
> A more general version of this problem was reported previously in 
> JCLOUDS-1371.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers

2019-01-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755717#comment-16755717
 ] 

ASF subversion and git services commented on JCLOUDS-1488:
--

Commit 30b2ee9016a9f296a7e7ff5e219972f32db385dd in jclouds-labs's branch 
refs/heads/2.1.x from Andrew Gaul
[ https://gitbox.apache.org/repos/asf?p=jclouds-labs.git;h=30b2ee9 ]

JCLOUDS-1371: JCLOUDS-1488: list optimize prefix

Previously getBlobKeysInsideContainer returned all keys and filtered
in LocalBlobStore.  Now getBlobKeysInsideContainer filters via prefix
which can dramatically decrease the number of keys returned,
especially for the filesystem provider.  Further optimizations are
possible for delimiter.
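
The difference between the two shapes described above can be sketched with an in-memory stand-in. This is an assumption-laden illustration, not the jclouds implementation: `PrefixFilterSketch`, `allKeys`, and `keysWithPrefix` are hypothetical names, and a sorted `TreeSet` takes the place of the real storage strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical in-memory key store for illustration; not part of jclouds. */
final class PrefixFilterSketch {
    private final NavigableSet<String> keys = new TreeSet<>();

    void put(String key) {
        keys.add(key);
    }

    /** Old shape: hand back every key and leave prefix filtering to the caller. */
    List<String> allKeys() {
        return new ArrayList<>(keys);
    }

    /** New shape: the storage layer applies the (non-null) prefix itself. */
    List<String> keysWithPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        // In sorted order, keys sharing a prefix are contiguous and begin at
        // the first key >= prefix, so iteration can stop at the first mismatch.
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            result.add(key);
        }
        return result;
    }
}
```

In the old shape the caller pays for every key in the container even when only one matches; in the new shape the returned list is already limited to the matching keys.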




[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers

2019-01-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755715#comment-16755715
 ] 

ASF subversion and git services commented on JCLOUDS-1488:
--

Commit aad98e6f9660fd7a4dd30fa72a2c16a41e1d8584 in jclouds-labs's branch 
refs/heads/master from Andrew Gaul
[ https://gitbox.apache.org/repos/asf?p=jclouds-labs.git;h=aad98e6 ]

JCLOUDS-1371: JCLOUDS-1488: list optimize prefix

Previously getBlobKeysInsideContainer returned all keys and filtered
in LocalBlobStore.  Now getBlobKeysInsideContainer filters via prefix
which can dramatically decrease the number of keys returned,
especially for the filesystem provider.  Further optimizations are
possible for delimiter.




[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers

2019-01-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755566#comment-16755566
 ] 

ASF subversion and git services commented on JCLOUDS-1488:
--

Commit 7bf9c474c656926203a9ac34a1ed27db35c8515d in jclouds's branch 
refs/heads/2.1.x from Andrew Gaul
[ https://gitbox.apache.org/repos/asf?p=jclouds.git;h=7bf9c47 ]

JCLOUDS-1371: JCLOUDS-1488: list optimize prefix

Previously getBlobKeysInsideContainer returned all keys and filtered
in LocalBlobStore.  Now getBlobKeysInsideContainer filters via prefix
which can dramatically decrease the number of keys returned,
especially for the filesystem provider.  Further optimizations are
possible for delimiter.




[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers

2019-01-18 Thread Andrew Gaul (JIRA)


[ 
https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746537#comment-16746537
 ] 

Andrew Gaul commented on JCLOUDS-1488:
--

I agree with your diagnosis and this is a long-standing shortcoming of the 
filesystem provider.  Could you submit a pull request with your proposed 
solution?  See also 
[JCLOUDS-1371|https://issues.apache.org/jira/browse/JCLOUDS-1371].



[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers

2019-01-18 Thread Lari Sinisalo (JIRA)


[ 
https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746126#comment-16746126
 ] 

Lari Sinisalo commented on JCLOUDS-1488:


In org.jclouds.blobstore.config.LocalBlobStore.list(String, 
ListContainerOptions), there is the following code:

{code}
      // Loading blobs from container
      Iterable<String> blobBelongingToContainer = null;
      try {
         blobBelongingToContainer = storageStrategy.getBlobKeysInsideContainer(containerName);
      } catch (IOException e) {
         logger.error(e, "An error occurred loading blobs contained into container %s", containerName);
         propagate(e);
      }
{code}

This getBlobKeysInsideContainer lists the keys of all blobs inside the 
container. It takes only the container name as a parameter, so it will always 
ignore the prefix in the ListContainerOptions.

The getBlobKeysInsideContainer implementation in FilesystemStorageStrategyImpl 
is as follows:

{code}
   /**
    * Returns all the blobs key inside a container
    *
    * @param container
    * @return
    * @throws IOException
    */
   @Override
   public Iterable<String> getBlobKeysInsideContainer(String container) throws IOException {
      filesystemContainerNameValidator.validate(container);
      // check if container exists
      // TODO maybe an error is more appropriate
      Set<String> blobNames = Sets.newHashSet();
      if (!containerExists(container)) {
         return blobNames;
      }

      File containerFile = openFolder(container);
      final int containerPathLength = containerFile.getAbsolutePath().length() + 1;
      populateBlobKeysInContainer(containerFile, blobNames, new Function<String, String>() {
         @Override
         public String apply(String string) {
            return denormalize(string.substring(containerPathLength));
         }
      });
      return blobNames;
   }
{code}

The openFolder call here opens the container root directory. If this call
received a subdirectory path derived from the prefix instead, the list call
would be much more efficient.

I am not sure what the appropriate way to extract the subdirectory path from
the prefix would be. It would need to be done in a way that does not allow
path traversal outside the container root directory. Passing the necessary
information to getBlobKeysInsideContainer would also require interface changes.
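
One possible way to derive the subdirectory while guarding against traversal is sketched below. It is only an illustration under stated assumptions: `SafePrefixSketch` and `resolveInside` are hypothetical names, not jclouds API, and the check is purely lexical, so it does not account for symbolic links inside the container.

```java
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of jclouds. */
final class SafePrefixSketch {
    /**
     * Returns the directory named by the directory part of prefix, resolved
     * against containerRoot, or throws if it would fall outside the container.
     */
    static Path resolveInside(Path containerRoot, String prefix) {
        // Directory part of the prefix: everything before the last '/'.
        int lastSlash = prefix.lastIndexOf('/');
        String dirPart = lastSlash < 0 ? "" : prefix.substring(0, lastSlash);

        // Normalize both paths so that ".." segments are collapsed before comparing.
        Path root = containerRoot.toAbsolutePath().normalize();
        Path candidate = root.resolve(dirPart).normalize();

        // Path.startsWith compares whole name elements, not raw characters.
        if (!candidate.startsWith(root)) {
            throw new IllegalArgumentException("prefix escapes container: " + prefix);
        }
        return candidate;
    }
}
```

A prefix such as `a/../b/` resolves to the `b` directory inside the container, while `../other/` is rejected with an `IllegalArgumentException`.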
