[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-09-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Jonathan Druart  changed:

   What|Removed |Added

 Blocks||29135
   See Also|https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=29135 |


Referenced Bugs:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=29135
[Bug 29135] OAI should not include biblionumbers from deleteditems when
determining deletedbiblios
-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-09-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Nick Clemens  changed:

   What|Removed |Added

   See Also||https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=29135


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-07-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Nick Clemens  changed:

   What|Removed |Added

 Blocks||28741


Referenced Bugs:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=28741
[Bug 28741] OAI ListSets does not correctly build resumption token

[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-05-11 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Fridolin Somers  changed:

   What|Removed |Added

 CC||fridolin.som...@biblibre.com

--- Comment #27 from Fridolin Somers  ---
Enhancement not pushed to 20.11.x


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-05-07 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #26 from Jonathan Druart  ---
Pushed to master for 21.05, thanks to everybody involved!


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-05-07 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Jonathan Druart  changed:

   What|Removed |Added

 Status|Passed QA   |Pushed to master
 Version(s) released in||21.05.00


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-04-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #25 from Ere Maijala  ---
Nick, thanks for QA and benchmark results. It's good to see that it makes a
difference with smaller data sets as well. :)


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-04-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Nick Clemens  changed:

   What|Removed |Added

 CC||n...@bywatersolutions.com

--- Comment #24 from Nick Clemens  ---
Created attachment 120317
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=120317=edit
Simple script for testing

Simple script for benchmarking and/or checking results

With 30k records, none deleted, I got
~3min before the patch / ~2:30 with the patch

With 15k deleted, 15k active:
~2min before / ~1:40 after

Checked that the lists of biblionumbers were identical before and after the patches
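For a rough sense of scale, the approximate timings above can be converted into a relative saving. A tiny Python sketch; the only inputs are the "~" figures converted to seconds (e.g. ~3min = 180 s), so the result is no more precise than they are:

```python
def pct_reduction(before_seconds: float, after_seconds: float) -> float:
    """Percentage of wall-clock time saved going from `before` to `after`."""
    return 100.0 * (before_seconds - after_seconds) / before_seconds


if __name__ == "__main__":
    # Approximate figures from the comment: ~3:00 -> ~2:30 and ~2:00 -> ~1:40.
    runs = {
        "30k active, 0 deleted": (180, 150),
        "15k active, 15k deleted": (120, 100),
    }
    for label, (before, after) in runs.items():
        print(f"{label}: {pct_reduction(before, after):.1f}% less time")
```

Both runs work out to roughly a 17% reduction in harvesting time.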


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-04-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Nick Clemens  changed:

   What|Removed |Added

 Attachment #116345 is obsolete|0   |1

--- Comment #23 from Nick Clemens  ---
Created attachment 120316
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=120316=edit
Bug 27584: Refactor OAI-PMH paging to improve performance

Includes the following optimizations:
- Use next biblionumber instead of large offset in the queries.
- Use unions instead of subqueries
- Avoid fetching item timestamps when items are not included.

Test plan:

1. Without the patch, try harvesting a Koha database with (and without for good
measure) `include_items: 1` in the OAI-PMH configuration file pointed to by
preference OAI-PMH:ConfFile and take note of performance. For useful metrics
the database must be large enough to not fit in InnoDB buffers or OS file
cache.
2. Apply the patch.
3. Run tests: prove -v t/db_dependent/OAI
4. Repeat the harvesting from step 1 and compare performance with step 1.

Signed-off-by: David Cook 

Signed-off-by: Nick Clemens 
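The first optimization (start each page at the next biblionumber instead of skipping a large OFFSET) is a form of keyset pagination. Below is a minimal sketch in Python with an in-memory SQLite table, not the actual Koha::OAI::Server::ListBase code; the one-column table and all function names are hypothetical. The keyset version fetches one row more than the page size, so the extra row's biblionumber can be carried forward (for example in a resumption token) as the start of the next page.

```python
import sqlite3


def make_db(biblionumbers):
    """Hypothetical minimal biblio table; real Koha tables have many more columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE biblio (biblionumber INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO biblio (biblionumber) VALUES (?)",
                     [(n,) for n in biblionumbers])
    return conn


def page_by_offset(conn, offset, max_count):
    # Offset style: the database walks past `offset` rows on every request.
    rows = conn.execute(
        "SELECT biblionumber FROM biblio ORDER BY biblionumber LIMIT ? OFFSET ?",
        (max_count, offset)).fetchall()
    return [r[0] for r in rows]


def page_by_next_id(conn, next_id, max_count):
    # Keyset style: seek straight to `next_id` through the primary key index.
    rows = conn.execute(
        "SELECT biblionumber FROM biblio WHERE biblionumber >= ? "
        "ORDER BY biblionumber LIMIT ?",
        (next_id, max_count + 1)).fetchall()
    ids = [r[0] for r in rows]
    if len(ids) > max_count:
        # The extra row tells us where the next page starts.
        return ids[:max_count], ids[max_count]
    return ids, None


def harvest_by_offset(conn, max_count):
    out, offset = [], 0
    while True:
        page = page_by_offset(conn, offset, max_count)
        out.extend(page)
        if len(page) < max_count:
            return out
        offset += max_count


def harvest_by_next_id(conn, max_count):
    out, next_id = [], 0
    while next_id is not None:
        page, next_id = page_by_next_id(conn, next_id, max_count)
        out.extend(page)
    return out


if __name__ == "__main__":
    # Gaps in the numbering mimic deleted biblios.
    db = make_db(n for n in range(1, 101) if n % 7)
    assert harvest_by_offset(db, 10) == harvest_by_next_id(db, 10)
    print(len(harvest_by_next_id(db, 10)), "records harvested")
```

Both loops return the same biblionumbers; the difference is that the keyset query's cost does not grow with how far into the harvest the request is.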


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-04-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Nick Clemens  changed:

   What|Removed |Added

 Status|Signed Off  |Passed QA


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-04-27 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Tomás Cohen Arazi  changed:

   What|Removed |Added

 CC||tomasco...@gmail.com


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-04-27 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Kyle M Hall  changed:

   What|Removed |Added

 CC||k...@bywatersolutions.com
 QA Contact|testo...@bugs.koha-community.org |k...@bywatersolutions.com


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-07 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #22 from David Cook  ---
(In reply to Ere Maijala from comment #20)

> 
> I think there's an advantage with tracking the different timestamps even if
> it's more complicated. When item data is not included, it wouldn't be useful
> to harvest biblios as updated when an item changes, since the biblio record
> would be identical. 

Agreed

> If you meant that we could have another timestamp that
> would indicate the latest change for the logical record that OAI-PMH would
> provide, yeah, that'd work, but trying to track latest item changes in
> biblios would complicate other functionality and could also have unintended
> consequences such as increased overhead for batch operations. 
> 

I meant we'd keep the existing timestamp for the bibliographic data and then
add a timestamp to indicate when the bibliographic data + items/holdings data
was changed. Then depending on the OAI setting, it would decide which timestamp
was relevant.
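A minimal Python sketch of the two-timestamp idea. `record_timestamp` is a hypothetical column (Koha has no such field), and the in-memory rows stand in for database queries; the point is only that the `include_items` setting decides which timestamp a `from` request is compared against.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BiblioRow:
    biblionumber: int
    # Existing: last change to the bibliographic data itself.
    biblio_timestamp: datetime
    # Hypothetical: last change to the biblio *or* any of its items/holdings.
    record_timestamp: datetime


def oai_datestamp(row: BiblioRow, include_items: bool) -> datetime:
    """Pick the timestamp that matches what the OAI-PMH record actually contains."""
    return row.record_timestamp if include_items else row.biblio_timestamp


def changed_since(rows, since: datetime, include_items: bool):
    """Biblionumbers a harvester should receive for a from=`since` request."""
    return [r.biblionumber for r in rows
            if oai_datestamp(r, include_items) >= since]


if __name__ == "__main__":
    rows = [
        BiblioRow(1, datetime(2021, 1, 1), datetime(2021, 1, 1)),
        # Item edited on 2021-02-01; bibliographic data untouched since 2021-01-01.
        BiblioRow(2, datetime(2021, 1, 1), datetime(2021, 2, 1)),
    ]
    since = datetime(2021, 1, 15)
    print(changed_since(rows, since, include_items=True))   # [2]
    print(changed_since(rows, since, include_items=False))  # []
```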

> Also, I'm
> still optimistic that we can get bug 20447 merged somewhere in the future,
> and that would add to the complexity.
> 

That'll be interesting when that time comes.

> As I see it, the "proper" solution would be to have a publishing process
> that would run in background to create sets of records for harvesting. With
> published sets we could handle inclusion of items, deletions etc. in the
> publishing process, and the OAI-PMH provider would only need to serve the
> results. However, this would be a whole lot more complicated than what we
> currently do, and there'd be a fair chance that the publishing process would
> do a lot of work to create result sets that nobody ever harvests.
> Additionally, it would make quick (semi-realtime) incremental harvesting
> impossible.

I don't think that would be feasible. I reckon all we need are some good
indexes. 

But I think this change makes sense for now.


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-05 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #21 from Frédéric Demians  ---
It's not without wonder, but as it is, the Koha OAI server has now become a dark
mystery to me, a little like Jean Sibelius's Violin Concerto in D minor, Op. 47.


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-05 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Jonathan Druart  changed:

   What|Removed |Added

 CC||frede...@tamil.fr, jonathan.dru...@bugs.koha-community.org, julian.maur...@biblibre.com


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #20 from Ere Maijala  ---
Thanks for the comments and review!

Indeed, things will be easier when deleted records are included in the normal
tables, but for now this is quite alright, and there's no need to union them
all together.

I think there's an advantage with tracking the different timestamps even if
it's more complicated. When item data is not included, it wouldn't be useful to
harvest biblios as updated when an item changes, since the biblio record would
be identical. If you meant that we could have another timestamp that would
indicate the latest change for the logical record that OAI-PMH would provide,
yeah, that'd work, but trying to track latest item changes in biblios would
complicate other functionality and could also have unintended consequences such
as increased overhead for batch operations. Also, I'm still optimistic that we
can get bug 20447 merged somewhere in the future, and that would add to the
complexity.

As I see it, the "proper" solution would be to have a publishing process that
would run in background to create sets of records for harvesting. With
published sets we could handle inclusion of items, deletions etc. in the
publishing process, and the OAI-PMH provider would only need to serve the
results. However, this would be a whole lot more complicated than what we
currently do, and there'd be a fair chance that the publishing process would do
a lot of work to create result sets that nobody ever harvests. Additionally, it
would make quick (semi-realtime) incremental harvesting impossible.


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #19 from David Cook  ---
As a tester, I'm not really commenting on performance. I'm just confirming that
the code doesn't break anything and works as a user would expect.

For what it's worth, on a small database, it's quite quick. I don't have a big
enough test database on hand at this moment to test the code on. But at a
glance it looks like it should be fine.


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

David Cook  changed:

   What|Removed |Added

 Attachment #116310 is obsolete|0   |1

--- Comment #18 from David Cook  ---
Created attachment 116345
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=116345=edit
Bug 27584: Refactor OAI-PMH paging to improve performance

Includes the following optimizations:
- Use next biblionumber instead of large offset in the queries.
- Use unions instead of subqueries
- Avoid fetching item timestamps when items are not included.

Test plan:

1. Without the patch, try harvesting a Koha database with (and without for good
measure) `include_items: 1` in the OAI-PMH configuration file pointed to by
preference OAI-PMH:ConfFile and take note of performance. For useful metrics
the database must be large enough to not fit in InnoDB buffers or OS file
cache.
2. Apply the patch.
3. Run tests: prove -v t/db_dependent/OAI
4. Repeat the harvesting from step 1 and compare performance with step 1.

Signed-off-by: David Cook 


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

David Cook  changed:

   What|Removed |Added

 Status|Needs Signoff   |Signed Off


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #17 from David Cook  ---
My test plan:

0) Use koha-testing-docker
1) Enable "OAI-PMH"
2) Go to
http://localhost:8080/cgi-bin/koha/oai.pl?verb=ListRecords&metadataPrefix=oai_dc
3) Apply patch
4) koha-plack --restart kohadev
5) Go to
http://localhost:8080/cgi-bin/koha/oai.pl?verb=ListRecords&metadataPrefix=oai_dc
6) Note that the result lists are the same

7) Set OAI-PMH:ConfFile to "/kohadevbox/koha/oai-conf.yml"
8) Create oai-conf.yml* 
9) Go to
http://localhost:8080/cgi-bin/koha/oai.pl?verb=ListRecords&metadataPrefix=marcxml
10) Note that identifiers are same as previous list
11) Note that items (952 fields) are included in metadata


*
---
format:
  marcxml:
    metadataPrefix: marcxml
    metadataNamespace: http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim
    schema: http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd
    include_items: 1
  oai_dc:
    metadataPrefix: oai_dc
    metadataNamespace: http://www.openarchives.org/OAI/2.0/oai_dc/
    schema: http://www.openarchives.org/OAI/2.0/oai_dc.xsd
    xsl_file: /usr/share/koha/intranet/htdocs/intranet-tmpl/prog/en/xslt/MARC21slim2OAIDC.xsl
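Steps 2, 5 and 9 only show the first page of results. To compare complete result lists before and after the patch, a harvester has to follow resumption tokens until the list is complete. A minimal Python sketch, assuming the standard OAI-PMH 2.0 namespace and response layout; `fetch`, `list_records_url` and `harvest_identifiers` are hypothetical names, and `fetch` is supplied by the caller so the logic can be tried without a live Koha instance.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    # In OAI-PMH 2.0, resumptionToken is an exclusive argument: once we have
    # one, it replaces metadataPrefix (and any from/until/set arguments).
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def harvest_identifiers(fetch, base_url, metadata_prefix):
    """Collect every record identifier, page by page.

    `fetch` is any callable taking a URL and returning the response body as
    str or bytes (e.g. a thin wrapper around urllib.request.urlopen).
    """
    identifiers = []
    url = list_records_url(base_url, metadata_prefix=metadata_prefix)
    while True:
        root = ET.fromstring(fetch(url))
        # Deleted records still carry a header with an identifier.
        for ident in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS):
            identifiers.append(ident.text)
        token_el = root.find(".//oai:resumptionToken", OAI_NS)
        token = (token_el.text or "").strip() if token_el is not None else ""
        # An absent or empty resumptionToken marks the last page.
        if not token:
            return identifiers
        url = list_records_url(base_url, resumption_token=token)
```

Running it against the same instance before and after applying the patch, and comparing the two lists, is a mechanical version of steps 6 and 10.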


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #16 from David Cook  ---
At a glance, I think that your code is probably an improvement.

However, looking at Koha::OAI::Server::ListBase makes me wonder if we should
denormalize a bit. If the biblio table had a timestamp that also reflected the
last item activity, that would remove the need for a lot of these complex SQL
queries.

If we fetched all the biblio metadata in our first query, we'd also save
overhead. 

But... both of those would involve more work.


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #15 from David Cook  ---
After reviewing the code again, I'm only now realizing that we're not even
trying to do a UNION of biblios and deleted biblios, so the results aren't in
date order...

Although after reviewing the OAI-PMH spec, it actually explicitly says that
date ordering should not be assumed:

"The protocol does not define the semantics of incompleteness. Therefore, a
harvester should not assume that the members in an incomplete list conform to
some selection criteria (e.g., date ordering)."

You learn something new every day...
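One practical consequence for harvesters: the next incremental `from` date should be the maximum datestamp seen during a harvest, not the datestamp of the last record returned. A small Python sketch of that rule; the identifiers and dates in the example are made up, and datestamps are assumed to be OAI-PMH UTC strings of a single granularity, which compare correctly as plain strings.

```python
def next_from_date(records):
    """Return the latest datestamp seen, to use as the next incremental `from`.

    `records` is an iterable of (identifier, datestamp) pairs in whatever
    order the provider returned them.
    """
    latest = None
    for _identifier, datestamp in records:
        if latest is None or datestamp > latest:
            latest = datestamp
    return latest


if __name__ == "__main__":
    # Live records first, then a deleted one: not in date order.
    harvested = [
        ("oai:example:1", "2021-02-03T09:00:00Z"),
        ("oai:example:2", "2021-02-04T12:30:00Z"),
        ("oai:example:7", "2021-02-01T08:00:00Z"),  # deleted record, older change
    ]
    print(next_from_date(harvested))  # 2021-02-04T12:30:00Z
```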


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #14 from Ere Maijala  ---
Adding to the previous results, harvesting of the aforementioned records
completed in less than 5 hours, and the harvesting speed was pretty much
constant.


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #13 from Ere Maijala  ---
Some benchmarking results:

My test system is intentionally memory-constrained, items are included and
OAI-PMH:MaxCount = 1000. All values are measured from 100 requests starting at
offset 190341 when doing a full harvesting without date limits. The database
contains 0.97 million biblios and 2.4 million items. Reported times are
averages with standard deviation in parentheses.

Full duration of oai.pl: 12.98s (0.94s)
Biblionumber query: 0.0024s (0.0024s)
Fetch+create record: 6.84s (0.51s)
Creating response: 4.97s (0.50s)

As far as I can see, this represents the full harvesting run pretty well, so it
doesn't slow down anymore when getting to higher biblionumbers. As the results
indicate, query duration for a set of results is now pretty much meaningless.
We spend most of the time collecting the record metadata and creating a DOM for
it, and then writing the actual response, which is part of the HTTP::OAI
module.

So to sum it up: with these changes the biblionumber query is no longer a
bottleneck. Previously, it easily took 15 to 20 seconds on my test system.
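The averages above can be turned into a rough time budget per request. This is plain arithmetic on the reported means (it ignores the standard deviations), sketched in Python:

```python
def time_shares(phases, total):
    """Percentage of the total request time spent in each measured phase,
    plus whatever the listed phases do not account for."""
    shares = {name: 100.0 * secs / total for name, secs in phases.items()}
    shares["(not covered by the phases above)"] = (
        100.0 * (total - sum(phases.values())) / total)
    return shares


if __name__ == "__main__":
    # Mean values from the benchmark (seconds per oai.pl request, MaxCount = 1000).
    phases = {
        "Biblionumber query": 0.0024,
        "Fetch+create record": 6.84,
        "Creating response": 4.97,
    }
    for name, pct in time_shares(phases, total=12.98).items():
        print(f"{name}: {pct:.2f}%")
```

On these numbers, fetching/creating records and writing the response take roughly 53% and 38% of a request, the biblionumber query about 0.02%, and roughly 9% falls outside the three measured phases.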


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Ere Maijala  changed:

   What|Removed |Added

 Blocks||27463


Referenced Bugs:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27463
[Bug 27463] OAI-PMH date handling in ListBase.pm

[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-03 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #12 from Ere Maijala  ---
This should work pretty well. I just haven't had time yet to test with a large
number of items, but I'll try to accomplish that as well (and hope it
works...).


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-03 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Ere Maijala  changed:

   What|Removed |Added

 Attachment #116309 is obsolete|0   |1

--- Comment #11 from Ere Maijala  ---
Created attachment 116310
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=116310=edit
Bug 27584: Refactor OAI-PMH paging to improve performance

Includes the following optimizations:
- Use next biblionumber instead of large offset in the queries.
- Use unions instead of subqueries
- Avoid fetching item timestamps when items are not included.

Test plan:

1. Without the patch, try harvesting a Koha database with (and without for good
measure) `include_items: 1` in the OAI-PMH configuration file pointed to by
preference OAI-PMH:ConfFile and take note of performance. For useful metrics
the database must be large enough to not fit in InnoDB buffers or OS file
cache.
2. Apply the patch.
3. Run tests: prove -v t/db_dependent/OAI
4. Repeat the harvesting from step 1 and compare performance with step 1.


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-03 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Ere Maijala  changed:

   What|Removed |Added

 Status|ASSIGNED|Needs Signoff


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-03 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #10 from Ere Maijala  ---
Created attachment 116309
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=116309=edit
Bug 27584: Refactor OAI-PMH paging to improve performance

Includes the following optimizations:
- Use next biblionumber instead of large offset in the queries.
- Use unions instead of subqueries
- Avoid fetching item timestamps when items are not included.

Test plan:

1. Without the patch, try harvesting a Koha database with (and without for good
measure) `include_items: 1` in the OAI-PMH configuration file pointed to by
preference OAI-PMH:ConfFile and take note of performance. For useful metrics
the database must be large enough to not fit in InnoDB buffers or OS file
cache.
2. Apply the patch.
3. Run tests: prove -v t/db_dependent/OAI
4. Repeat the harvesting from step 1 and compare performance with step 1.


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-03 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #9 from David Cook  ---
(In reply to Ere Maijala from comment #7)
> Hold on, a new version coming up shortly. Much improved, I believe! :)

End of the work day for me, mate. Actually, that was an hour ago, but
performance always thrills me.

Looking forward to the new version!


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-03 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #8 from David Cook  ---
I think there's an open bug somewhere to harmonize the biblio and deletedbiblio
tables as well. That's probably the best plan...


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-03 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #7 from Ere Maijala  ---
Hold on, a new version coming up shortly. Much improved, I believe! :)


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-03 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #6 from David Cook  ---
Other ideas... would be moving the index to a different table or a different
system. 

In theory, there's no reason why we couldn't use Zebra or Elasticsearch for
doing OAI. If I recall correctly, I think that DSpace uses Solr for its OAI. 

But the further away from the database, the less likely it's going to be
correct/up-to-date.


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-03 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

David Cook  changed:

   What|Removed |Added

 CC||dc...@prosentient.com.au

--- Comment #5 from David Cook  ---
Curious to see what you come up with here.

--

Here's a little look at doing a UNION on nearly 600,000 biblio and
deletedbiblio entries. 

EXPLAIN select * from (select biblionumber from deletedbiblio UNION select biblionumber from biblio) u limit 1;
+------+--------------+---------------+-------+---------------+----------+---------+------+--------+-------------+
| id   | select_type  | table         | type  | possible_keys | key      | key_len | ref  | rows   | Extra       |
+------+--------------+---------------+-------+---------------+----------+---------+------+--------+-------------+
|    1 | PRIMARY      |               | ALL   | NULL          | NULL     | NULL    | NULL | 578038 |             |
|    2 | DERIVED      | deletedbiblio | index | NULL          | blbnoidx | 4       | NULL |   4361 | Using index |
|    3 | UNION        | biblio        | index | NULL          | blbnoidx | 4       | NULL | 573677 | Using index |
| NULL | UNION RESULT |               | ALL   | NULL          | NULL     | NULL    | NULL |   NULL |             |
+------+--------------+---------------+-------+---------------+----------+---------+------+--------+-------------+
4 rows in set (0.00 sec)

select * from (select biblionumber from deletedbiblio UNION select biblionumber from biblio) u limit 1;
+--------------+
| biblionumber |
+--------------+
|            3 |
+--------------+
1 row in set (13.39 sec)

OR

select * from (select biblionumber from deletedbiblio UNION select biblionumber
from biblio) u limit 0,50;
50 rows in set (12.60 sec)

select biblionumber, (select metadata from biblio_metadata where biblionumber =
u.biblionumber) from (select biblionumber from deletedbiblio UNION select
biblionumber from biblio) u limit 0,50;
50 rows in set (13.58 sec)
--

Of course, there's no index on `timestamp`, so we can't sort that list
efficiently. However, perhaps we could add a composite `(timestamp, biblionumber)`
index to the biblio and deletedbiblio tables...

Still... 13 seconds isn't brilliant.
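The composite-index idea above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for MySQL/MariaDB, not Koha code. The two-column table layouts and the index names `biblio_ts_bn` / `deletedbiblio_ts_bn` are hypothetical. Each branch of the union reads at most one page past a `(timestamp, biblionumber)` cursor, so the outer sort only merges two pages' worth of rows rather than materialising the whole ~578,000-row derived table.

```python
import sqlite3

# Simplified stand-ins for Koha's biblio and deletedbiblio tables. Only the
# columns needed for paging are modelled; the index names are hypothetical.
SCHEMA = """
CREATE TABLE biblio (biblionumber INTEGER PRIMARY KEY, timestamp TEXT NOT NULL);
CREATE TABLE deletedbiblio (biblionumber INTEGER PRIMARY KEY, timestamp TEXT NOT NULL);
CREATE INDEX biblio_ts_bn ON biblio (timestamp, biblionumber);
CREATE INDEX deletedbiblio_ts_bn ON deletedbiblio (timestamp, biblionumber);
"""

# Each branch returns at most :n rows after the cursor, in composite-index
# order, so the outer ORDER BY only merges up to 2 * :n rows instead of the
# whole union. Assumes (timestamp, biblionumber) is unique across both tables.
PAGE_SQL = """
SELECT biblionumber, timestamp, deleted FROM (
    SELECT * FROM (
        SELECT biblionumber, timestamp, 0 AS deleted FROM biblio
        WHERE timestamp > :ts OR (timestamp = :ts AND biblionumber > :bn)
        ORDER BY timestamp, biblionumber LIMIT :n
    ) AS b
    UNION ALL
    SELECT * FROM (
        SELECT biblionumber, timestamp, 1 AS deleted FROM deletedbiblio
        WHERE timestamp > :ts OR (timestamp = :ts AND biblionumber > :bn)
        ORDER BY timestamp, biblionumber LIMIT :n
    ) AS d
) AS u
ORDER BY timestamp, biblionumber
LIMIT :n
"""


def make_demo_db(live, deleted):
    """Build an in-memory database from (biblionumber, timestamp) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO biblio VALUES (?, ?)", live)
    conn.executemany("INSERT INTO deletedbiblio VALUES (?, ?)", deleted)
    return conn


def iterate_records(conn, n=50):
    """Yield (biblionumber, timestamp, deleted) rows, fetching one page at a time."""
    ts, bn = "", 0  # initial cursor sorts before every real row
    while True:
        page = conn.execute(PAGE_SQL, {"ts": ts, "bn": bn, "n": n}).fetchall()
        if not page:
            return
        yield from page
        bn, ts, _ = page[-1]  # resume strictly after the last row returned


if __name__ == "__main__":
    conn = make_demo_db(
        live=[(1, "2021-01-01 10:00:00"), (3, "2021-01-02 10:00:00"), (4, "2021-01-01 10:00:00")],
        deleted=[(2, "2021-01-01 10:00:00"), (5, "2021-01-03 09:00:00")],
    )
    print([row[0] for row in iterate_records(conn, n=2)])  # [1, 2, 4, 3, 5]
```

Whether MySQL's optimizer actually uses such an index for the `OR` form of the cursor condition would need checking with EXPLAIN on a real dataset. This sketch only shows the query shape.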


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-02 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Ere Maijala  changed:

   What|Removed |Added

 Attachment #116184|0   |1
is obsolete||

--- Comment #4 from Ere Maijala  ---
Comment on attachment 116184
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=116184
Bug 27584: Refactor OAI-PMH paging to improve performance

I believe there's an even faster way; a patch is coming once I've finished
benchmarking and testing.


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-02 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Ere Maijala  changed:

   What|Removed |Added

 Attachment #116184|1   |0
is obsolete||

--- Comment #3 from Ere Maijala  ---
Comment on attachment 116184
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=116184
Bug 27584: Refactor OAI-PMH paging to improve performance

Oops, the patch is fine after all (the problem was my own mistake with Plack).


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-02 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Ere Maijala  changed:

   What|Removed |Added

 Attachment #116184|0   |1
is obsolete||

--- Comment #2 from Ere Maijala  ---
Comment on attachment 116184
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=116184
Bug 27584: Refactor OAI-PMH paging to improve performance

The initial patch was bad and needs some fixing.


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-01 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

--- Comment #1 from Ere Maijala  ---
Created attachment 116184
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=116184=edit
Bug 27584: Refactor OAI-PMH paging to improve performance

Uses next biblionumber instead of large offset in the queries.
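To illustrate the difference, here is a minimal sketch using Python's sqlite3 module. It is not the patch's Perl code or its real queries, and the single-column `biblio` table is a simplified assumption. With `LIMIT n OFFSET k` the database still walks past k rows on every page, so later pages get slower as k grows. With `WHERE biblionumber > last_seen` it can seek straight to the next key through the primary-key index.

```python
import sqlite3


def offset_pages(conn, n):
    """Page with LIMIT/OFFSET: every query skips over all earlier rows again."""
    offset = 0
    while True:
        rows = [r[0] for r in conn.execute(
            "SELECT biblionumber FROM biblio ORDER BY biblionumber LIMIT ? OFFSET ?",
            (n, offset))]
        if not rows:
            return
        yield rows
        offset += n


def keyset_pages(conn, n):
    """Page by remembering the last biblionumber seen (keyset/'seek' paging)."""
    last = 0  # assumption: biblionumbers are positive integers
    while True:
        rows = [r[0] for r in conn.execute(
            "SELECT biblionumber FROM biblio WHERE biblionumber > ? "
            "ORDER BY biblionumber LIMIT ?",
            (last, n))]
        if not rows:
            return
        yield rows
        last = rows[-1]  # the next page starts strictly after this key


def make_db(biblionumbers):
    """In-memory table with an integer primary key, like biblio.biblionumber."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE biblio (biblionumber INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO biblio VALUES (?)", [(b,) for b in biblionumbers])
    return conn


if __name__ == "__main__":
    conn = make_db([1, 2, 5, 8, 9, 13, 21])  # gaps, as after deletions
    print(list(keyset_pages(conn, 3)))  # [[1, 2, 5], [8, 9, 13], [21]]
```

Both functions return the same pages on a static table. A side benefit of the keyset form is that rows inserted or deleted between requests cannot shift later pages, whereas with an offset they can cause records to be skipped or repeated.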

Test plan:

1. Without the patch, try harvesting a Koha database with `include_items: 1` in
the OAI-PMH configuration file pointed to by the OAI-PMH:ConfFile system
preference, and note the performance. For useful metrics, the database must be
too large to fit in the InnoDB buffer pool or the OS file cache.
2. Apply the patch.
3. Run tests: prove -v t/db_dependent/OAI
4. Repeat the harvest from step 1 and compare the performance with step 1.
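For steps 1 and 4 of the test plan above, a small timing harness can make the comparison repeatable. This is a minimal sketch using only Python's standard library. It follows the standard OAI-PMH `ListRecords`/`resumptionToken` flow, but the base URL and the `marc21` metadataPrefix are assumptions to adjust for your installation. The `fetch` parameter is a hypothetical hook so the loop can be exercised without a live server.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Fetch one OAI-PMH response body over HTTP."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest(base_url, metadata_prefix="marc21", fetch=http_fetch):
    """Follow ListRecords resumption tokens; return (record_count, per-page seconds)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    count, timings = 0, []
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        start = time.perf_counter()
        root = ET.fromstring(fetch(url))  # time covers request plus parsing
        timings.append(time.perf_counter() - start)
        list_records = root.find(OAI_NS + "ListRecords")
        if list_records is None:  # e.g. an <error code="noRecordsMatch"/> reply
            break
        count += len(list_records.findall(OAI_NS + "record"))
        token = list_records.find(OAI_NS + "resumptionToken")
        # An empty or absent resumptionToken marks the final page.
        if token is None or not (token.text or "").strip():
            break
        # Per the protocol, follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
    return count, timings


if __name__ == "__main__":
    # Hypothetical endpoint; point this at the OPAC's OAI-PMH URL.
    n, secs = harvest("http://opac.example.org/cgi-bin/koha/oai.pl")
    print(n, "records,", len(secs), "pages, slowest page", max(secs), "s")
```

Running it once before and once after applying the patch, against the same database, gives per-page timings. With offset-based paging one would expect later pages to slow down, which is the effect the patch aims to remove.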


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-01 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Ere Maijala  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED


[Koha-bugs] [Bug 27584] Improve OAI-PMH provider performance

2021-02-01 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27584

Ere Maijala  changed:

   What|Removed |Added

   Assignee|koha-b...@lists.koha-commun |ere.maij...@helsinki.fi
   |ity.org |
