[jira] [Commented] (JENA-804) Jena is not reusing already allocated space on the file system which results in large amounts of disk space reserved by Jena files
[ https://issues.apache.org/jira/browse/JENA-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628525#comment-14628525 ]

Trevor Donaldson commented on JENA-804:
---------------------------------------

I agree with Keith. We are unable to use Jena because of this issue. We often delete a graph and replace the data inside of a graph. Doing this a few times caused us to reach critical mass in our storage very quickly.

Jena is not reusing already allocated space on the file system which results in large amounts of disk space reserved by Jena files
----------------------------------------------------------------------------------------------------------------------------------

                Key: JENA-804
                URL: https://issues.apache.org/jira/browse/JENA-804
            Project: Apache Jena
         Issue Type: Bug
         Components: Jena
  Affects Versions: Jena 2.11.2, TDB 1.0.2
        Environment: Windows 7, IBM JRE 1.7, Tomcat 7.0.54
           Reporter: Keith Wells
        Attachments: TdbGrowthTests.java, out.txt, test-tdb-size.sh

We have a product based on Jena TDB in which we both insert and delete quads. We understand the architectural decision to favor performance over space by not cleaning up deleted node ids from the indexes. However, the disk usage suggests that Jena TDB is not reusing space that it had previously allocated. Given the following comment, something appears to be wrong with file space utilization (http://mail-archives.apache.org/mod_mbox/jena-users/201310.mbox/%3cce7d7929.2a707%25rve...@dotnetrdf.org%3E):

    The indexes won't shrink - TDB never gives disk space back to the OS - but disk space is reused when reallocated within the same JVM.

In this scenario, on the same JVM with NO server stops or starts, we add 27765 graphs to IndexTdb and immediately remove them, repeating this process several times.
{noformat}
             MB   Bytes      Diff (Bytes)
Start        193  203239424
Reindex 5    249  262066176  58826752
Reindex 6    249  262086656  20480
Reindex 10   298  312500224  50413568
Reindex 11   298  312520704  20480
Reindex 12   298  312541184  20480
Reindex 13   298  312586240  45056
Reindex 14   306  320995328  8409088
Reindex 15   330  346181632  25186304
Reindex 16   330  346198538  16906
Reindex 17   346  362999808  16801270
Reindex 18   346  363020288  20480
Reindex 19   346  363040768  20480
Reindex 20   346  363061248  20480
Reindex 21   346  363081728  20480
Reindex 22   354  371490816  8409088
Reindex 23   378  396677120  25186304
End          193  203239424
{noformat}

The system starts with 193 MB of data allocated by indexTdb. A reindex consists of a remove followed by an add of these graphs. As the data shows, the size of indexTdb on disk grows dramatically after repeatedly removing and adding graphs: after Reindex 23, 378 MB of disk space is used. If Jena TDB reused allocated space, there would be no need to allocate more space beyond what is used by deleted node ids (unless nodeid storage is eating all of this space?). Jena does not appear to be reusing the allocated disk space.

At the very end of this scenario, we exported the nquads and reloaded them, which brought disk usage back to the original 193 MB.

We believe Jena TDB is not reusing the space allocated by the TDB file system within the same JVM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
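Size figures like those in the JENA-804 table can be collected with a few lines of standard Java. The following is only a minimal sketch, not the attached TdbGrowthTests.java: it assumes the TDB database lives in one directory and simply sums the sizes of the regular files under it. The names `DirSize`, `sizeOf` and `toMb` are hypothetical. The MB column in the table is consistent with bytes divided by 1024*1024 and rounded down, which is what `toMb` does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper: reports the on-disk size of a database directory.
class DirSize {

    /** Sum of the sizes, in bytes, of all regular files under root. */
    static long sizeOf(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    /** Bytes to whole megabytes (1024 * 1024 bytes), rounded down. */
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the database directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long before = sizeOf(dir);
        // ... run one remove-then-add cycle against the database here ...
        long after = sizeOf(dir);
        System.out.printf("%d MB  %d bytes  diff %d%n",
                toMb(after), after, after - before);
    }
}
```

Calling `sizeOf` before and after each reindex cycle and printing the difference yields one table row per cycle.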
[jira] [Assigned] (JENA-979) add a fuseki admin service to list all existing backups
[ https://issues.apache.org/jira/browse/JENA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne reassigned JENA-979:
----------------------------------

    Assignee: Andy Seaborne

add a fuseki admin service to list all existing backups
-------------------------------------------------------

                Key: JENA-979
                URL: https://issues.apache.org/jira/browse/JENA-979
            Project: Apache Jena
         Issue Type: New Feature
         Components: Fuseki
  Affects Versions: Fuseki 2.0.0, Fuseki 2.0.1, Fuseki 2.3.0
           Reporter: Yang Yuanzhe
           Assignee: Andy Seaborne
           Priority: Minor

Add a fuseki admin service to list all existing backups.

Service URL: $/backupList

Response example:

{
  "backups" : [
    "ds_2015-06-08_04-23-02.nq.gz" ,
    "ds_2015-06-08_07-57-10.nq.gz" ,
    "ds_2015-06-08_07-57-48.nq.gz" ,
    "ds_2015-06-09_03-46-37.nq.gz" ,
    "ds_2015-06-17_12-20-31.nq" ,
    "ds_2015-06-17_12-20-31.nq.gz" ,
    "ds_2015-06-23_10-53-47.nq.gz"
  ]
}
[GitHub] jena pull request: Deprecating and removing some iterators from je...
Github user afs commented on the pull request: https://github.com/apache/jena/pull/80#issuecomment-121589199 Hi - could you try closing this pull request from your end? It seems to have got stuck in some way, whereby the commit controls don't have any effect. It's not the only one; the same happened to one of my PRs and the way I sorted it was from the requester end. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Deprecating and removing some iterators from je...
Github user ajs6f closed the pull request at:

    https://github.com/apache/jena/pull/80
[jira] [Comment Edited] (JENA-804) Jena is not reusing already allocated space on the file system which results in large amounts of disk space reserved by Jena files
[ https://issues.apache.org/jira/browse/JENA-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628525#comment-14628525 ]

Trevor Donaldson edited comment on JENA-804 at 7/15/15 7:37 PM:
----------------------------------------------------------------

I agree with Keith. My team and my customer are unable to use Jena because of this issue. We often delete a graph and replace the data inside of a graph. Doing this a few times caused us to reach critical mass in our storage very quickly.

was (Author: tmdonalds):
I agree with Keith. We are unable to use Jena because of this issue. We often delete a graph and replace the data inside of a graph. Doing this a few times caused us to reach critical mass in our storage very quickly.
[jira] [Commented] (JENA-979) add a fuseki admin service to list all existing backups
[ https://issues.apache.org/jira/browse/JENA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628685#comment-14628685 ]

ASF GitHub Bot commented on JENA-979:
-------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/82
[GitHub] jena pull request: JENA-979: add a fuseki admin service to list al...
Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/82
[jira] [Resolved] (JENA-979) add a fuseki admin service to list all existing backups
[ https://issues.apache.org/jira/browse/JENA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne resolved JENA-979.
--------------------------------

       Resolution: Fixed
    Fix Version/s: Fuseki 2.3.0
[jira] [Commented] (JENA-966) LazyIterator
[ https://issues.apache.org/jira/browse/JENA-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627974#comment-14627974 ]

ASF GitHub Bot commented on JENA-966:
-------------------------------------

Github user ajs6f closed the pull request at:

    https://github.com/apache/jena/pull/80

LazyIterator
------------

                Key: JENA-966
                URL: https://issues.apache.org/jira/browse/JENA-966
            Project: Apache Jena
         Issue Type: Bug
         Components: Core
  Affects Versions: Jena 3.0.0
           Reporter: Claude Warren
           Assignee: Claude Warren
            Fix For: Jena 3.0.0

LazyIterator is an abstract class. The documentation indicates that the create() method needs to be overridden to create an instance. From this I would expect that

    new LazyIterator(){ @Override public ExtendedIterator<Model> create() { ... }};

would work. However, LazyIterator does not override removeNext(), andThen(), toList(), and toSet(). I believe these should be implemented in the class.
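The pattern described in JENA-966 (an abstract iterator whose only required override is create()) can be illustrated without Jena. The sketch below is an assumption-laden stand-in, not Jena's LazyIterator: `LazyIteratorSketch` is a hypothetical class built on plain `java.util.Iterator`, and it implements only `toList()` as a convenience method to show how such methods can delegate to the lazily created iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the pattern; not the Jena class.
abstract class LazyIteratorSketch<T> implements Iterator<T> {

    // Stays null until the iterator is first used.
    private Iterator<T> delegate;

    /** Subclasses supply the real iterator; called at most once. */
    protected abstract Iterator<T> create();

    private Iterator<T> delegate() {
        if (delegate == null) {
            delegate = create();
        }
        return delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate().hasNext();
    }

    @Override
    public T next() {
        return delegate().next();
    }

    @Override
    public void remove() {
        delegate().remove();
    }

    /** Drains the remaining elements into a list, forcing creation. */
    public List<T> toList() {
        List<T> out = new ArrayList<>();
        while (hasNext()) {
            out.add(next());
        }
        return out;
    }
}
```

With every other method routed through `delegate()`, an anonymous subclass that overrides only `create()` is complete, which is the behaviour the report expects.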
[GitHub] jena pull request: Deprecating and removing some iterators from je...
Github user afs commented on the pull request:

    https://github.com/apache/jena/pull/80#issuecomment-121606159

That unstuck the PR - thanks.
[jira] [Commented] (JENA-979) add a fuseki admin service to list all existing backups
[ https://issues.apache.org/jira/browse/JENA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628204#comment-14628204 ]

ASF GitHub Bot commented on JENA-979:
-------------------------------------

Github user afs commented on the pull request:

    https://github.com/apache/jena/pull/82#issuecomment-121646813

WIP: I'm incorporating this by integrating into the admin action framework (i.e. `extends ActionCtl`).

Currently:

* The URL is `/$/backup-list`. Camel-case is less common for URL API naming (YMMV).
* Returns the file names, not the full path names. The server does not reveal its absolute path setup.
* Returns non-hidden files: filters on `!file.isHidden() && file.isFile()`.
* Response is `Content-Type: application/json`.
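The behaviour listed in the comment above (file names only, non-hidden regular files, JSON response) can be sketched in plain Java. This is an illustration under stated assumptions, not the Fuseki implementation: `BackupList`, `listBackupNames` and `toJson` are hypothetical names, the sorting is an added assumption for stable output, and the JSON is built by hand with minimal escaping rather than with whatever JSON library the server uses.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the listing logic; not Fuseki code.
class BackupList {

    /** Names (not paths) of the non-hidden regular files in backupDir. */
    static List<String> listBackupNames(File backupDir) {
        // Same filter as quoted in the comment.
        FileFilter filter = file -> !file.isHidden() && file.isFile();
        File[] files = backupDir.listFiles(filter);
        List<String> names = new ArrayList<>();
        if (files != null) {            // null if backupDir is not a directory
            for (File f : files) {
                names.add(f.getName()); // never expose the absolute path
            }
        }
        Collections.sort(names);        // assumption: sorted for stable output
        return names;
    }

    /** Hand-rolled JSON; escapes only backslashes and double quotes. */
    static String toJson(List<String> names) {
        List<String> quoted = new ArrayList<>();
        for (String n : names) {
            String escaped = n.replace("\\", "\\\\").replace("\"", "\\\"");
            quoted.add("\"" + escaped + "\"");
        }
        return "{ \"backups\" : [ " + String.join(" , ", quoted) + " ] }";
    }
}
```

Returning only `getName()` is what keeps the server's directory layout out of the response, which is the design point made in the second bullet.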