[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2020-09-14 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Joonas Kylmälä  changed:

   What|Removed |Added

   See Also||https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26448

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-05-17 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Martin Renvoize  changed:

   What|Removed |Added

 Status|Pushed to Master|RESOLVED
 Resolution|--- |FIXED
 Version(s) released in||19.05.00

--- Comment #73 from Martin Renvoize  ---
Enhancement will not be backported to 18.11.x series.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-05-12 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Jonathan Druart  changed:

   What|Removed |Added

 Blocks||22892


Referenced Bugs:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=22892
[Bug 22892] Warning when reindexing without parameters


[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-05-12 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Jonathan Druart  changed:

   What|Removed |Added

 CC||jonathan.dru...@bugs.koha-community.org

--- Comment #72 from Jonathan Druart ---
The rename of the script caused a new issue on misc4dev
https://gitlab.com/koha-community/koha-misc4dev/issues/31



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-05-10 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Nick Clemens  changed:

   What|Removed |Added

 Status|Passed QA   |Pushed to Master
 CC||n...@bywatersolutions.com

--- Comment #71 from Nick Clemens  ---
Awesome work all!

Pushed to master for 19.05



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-05-01 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #70 from David Cook  ---
(In reply to Ere Maijala from comment #62)
> On second thought, I'd leave the lock file out. Unlike rebuild_zebra,
> rebuild_elasticsearch normally only needs to be run manually. If you need to
> e.g. cron it for some reason, an external locking mechanism can be used.
> Also, you may want to rebuild authorities and biblios side by side, and a lock
> file would just complicate that.

Mmm that's a good point. Yeah, I'll withdraw my concern about it as well in
that case.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Martin Renvoize  changed:

   What|Removed |Added

  Attachment #86057 is obsolete|0   |1

--- Comment #69 from Martin Renvoize ---
Created attachment 89113
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=89113&action=edit
Bug 21872: Fix name of rebuild_elasticsearch.pl

Signed-off-by: Josef Moravec 
Signed-off-by: Martin Renvoize 



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Martin Renvoize  changed:

   What|Removed |Added

  Attachment #86053 is obsolete|0   |1

--- Comment #65 from Martin Renvoize ---
Created attachment 89109
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=89109&action=edit
Bug 21872: Add multiprocess support to Elasticsearch indexing utility

Test plan:
1. Time execution without -p parameter
2. Time execution with -p 2, -p 3 or -p 4, depending on CPU core count

Signed-off-by: Josef Moravec 
Signed-off-by: Martin Renvoize 



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Martin Renvoize  changed:

   What|Removed |Added

  Attachment #86055 is obsolete|0   |1

--- Comment #67 from Martin Renvoize ---
Created attachment 89111
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=89111&action=edit
Bug 21872: Remove duplicate modulo condition in authorities iterator

Signed-off-by: Josef Moravec 
Signed-off-by: Martin Renvoize 



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Martin Renvoize  changed:

   What|Removed |Added

  Attachment #86054 is obsolete|0   |1

--- Comment #66 from Martin Renvoize ---
Created attachment 89110
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=89110&action=edit
Bug 21872: Simplify conditions and exit on invalid combination of arguments

Change to zero based indexing for slice index to simplify some
conditions. Exit with error message if trying to combine processes
and biblio numbers arguments.

Signed-off-by: Josef Moravec 
Signed-off-by: Martin Renvoize 



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Martin Renvoize  changed:

   What|Removed |Added

  Attachment #86056 is obsolete|0   |1

--- Comment #68 from Martin Renvoize ---
Created attachment 89112
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=89112&action=edit
Bug 21872: Add support for -p parameter to koha-elasticsearch

Signed-off-by: Josef Moravec 
Signed-off-by: Martin Renvoize 



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Martin Renvoize  changed:

   What|Removed |Added

 Status|Signed Off  |Passed QA

--- Comment #64 from Martin Renvoize  ---
Hi Ere, 

Yeah, I've been digging further into this code and I'd entirely
forgotten/overlooked that this script really is intended as a human interface
and that the regular indexing is actually handled live rather than by this
script running as a daemon or under cron. Don't worry about a lock file at all,
apologies for my not realising that earlier (seems I still have more to learn
about ES than I thought).

Given the feedback I've had above I'm now confident that the issues have been
thought through and appear to have been handled appropriately.

I'm going to go ahead and PQA, thanks for all the efforts everyone and for the
responses to queries.

Great to see this one going through.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #63 from Ere Maijala  ---
...and if you really feel that a lock file should be added, let's make that a
separate bug. It's not as simple as I first thought, at least if you use the
same mechanism as rebuild_zebra.pl.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

   What|Removed |Added

 Status|Failed QA   |Signed Off

--- Comment #62 from Ere Maijala  ---
On second thought, I'd leave the lock file out. Unlike rebuild_zebra,
rebuild_elasticsearch normally only needs to be run manually. If you need to
e.g. cron it for some reason, an external locking mechanism can be used. Also,
you may want to rebuild authorities and biblios side by side, and a lock file
would just complicate that.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #61 from Ere Maijala  ---
With Elasticsearch the only race condition I can think of would be running
indexing with -d while another indexing run is in progress. Otherwise it's just
a waste of resources. That said, adding a lock file makes sense. I'll do that.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #60 from David Cook  ---
(In reply to Martin Renvoize from comment #57)
> Finally, I'll be looking into the possibility of race conditions being
> introduced with this.  We had to introduce lock files for the zebra indexer
> as overlapping runs of the script could cause problems, especially with the
> query that got the list of bib/auths to index during each run.  I'm vaguely
> feeling that might also be a problem here, but I'm not entirely sure yet as
> I'm still looking at how the iterator is being built.
> 

I am also concerned about there not being a lock file. I suppose I'm less
concerned about race conditions than about accidentally running multiple
indexing runs before the first has even completed.

I was thinking about the scenario you mentioned where the parent process dies
and there are multiple child processes. I would be concerned that the lock would
be lost when the parent dies, although
https://perldoc.perl.org/functions/flock.html says that locks are inherited
across fork calls. In hindsight, I was thinking about fork and exec
(http://www.wumpus-cave.net/2014/04/21/underappreciated-perl-passing-file-descriptors/),
but that shouldn't be an issue here.

So yeah... I think adding a lock file would be trivial but very worthwhile.
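
For illustration only, here is a minimal sketch of the kind of non-blocking lock file being discussed. It is written in Python rather than Perl, it is not taken from any of the attached patches, and the helper name `try_lock` and the lock path are hypothetical. It assumes a POSIX system where `flock` is available.

```python
import fcntl


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns the open file object holding the lock, or None if another
    open file description already holds it. `path` is a hypothetical
    lock file location chosen by the caller.
    """
    handle = open(path, "a")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle
```

A second indexing run would call `try_lock` on the same path, get `None` back, and exit early. The lock is released when the returned file object is closed or the process exits, so a crashed run does not leave a stale lock behind.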



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #59 from David Cook  ---
(In reply to Martin Renvoize from comment #57)
> 1) The 'die' on fork failure isn't cleaning up after itself. Imagine a case
> where we want to spawn 5 subprocesses, and we get to process 4 and then run
> out of memory, for example. The parent script will die and leave behind
> zombie child processes.

I don't think that this is an actual problem. I think that this happens all the
time. The parent dies, the init process (i.e. PID 1) becomes the new parent, and
it reaps the children when they complete. I don't even think they actually
become zombie child processes in this case. This is also the same mechanism
used to daemonize a process.

Where you run into a problem with zombie child processes is when the parent
lives, the child exits, and the parent doesn't reap the child, which means that
you have zombie child processes filling up your process table. That's a real
problem.

> 2) It doesn't look like there's any form of signal handling here and as such
> a CTRL+C for example could end up leaving zombie processes too.
> 

You don't need any signal handling. If you do a CTRL+C on the parent process,
the terminal sends SIGINT to the whole foreground process group, so the child
processes receive it too, because they share the same process group ID.

So a CTRL+C won't leave zombie child processes. Even if the CTRL+C just killed
the parent and not the children (e.g. the children had set their own process
group ID after forking), then they'd just be inherited by init and cleaned up
anyway. 

> I'm also wrapping my head around the use of wait vs waitpid here. I
> remember tripping myself up using them before, but can't remember the
> details well enough right now to be confident I've not missed something.
> 

I think wait() and waitpid(-1, 0) are roughly equivalent?

The script could probably be more rigorous in checking that the PID returned by
wait() actually matches the child PIDs, but it's not the end of the world.

Even if the parent forgot to wait and exited early, the child processes would
be cleaned up once they completed.
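
As a rough sketch of the fork-and-reap pattern under discussion (in Python, not the Perl from the patch, and with a hypothetical helper name `run_workers`), the parent below waits on each child by PID, so every status is matched to a known child. If a fork fails it stops spawning but still reaps the children already started.

```python
import os


def run_workers(count):
    """Fork up to `count` children and reap each one by PID.

    Returns a dict mapping child PID to exit code.
    """
    pids = []
    for index in range(count):
        try:
            pid = os.fork()
        except OSError:
            # Fork failed (e.g. out of memory): stop spawning, but still
            # fall through and wait for the children already started.
            break
        if pid == 0:
            # Child process. A real worker would index its slice here.
            # os._exit leaves immediately without running the parent's
            # cleanup handlers in the child.
            os._exit(index)
        pids.append(pid)

    exit_codes = {}
    for pid in pids:
        # waitpid on a specific PID ties each status to a known child.
        _, status = os.waitpid(pid, 0)
        exit_codes[pid] = os.WEXITSTATUS(status)
    return exit_codes
```

Because the parent reaps every child it started, none of them lingers as a zombie while the parent is alive. If the parent died instead, the children would be reparented and reaped when they exit, as described above.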



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #58 from David Gustafsson  ---
Good points. With regard to the race condition I'm pretty sure there is none
since the data is partitioned per process. No process should ever process the
data of any of the others. Or are you talking about running multiple instances
of the script at the same time?
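
To make the partitioning argument concrete, here is a small sketch in Python. It assumes records are assigned to a zero-based slice by taking the record number modulo the process count. The patch titles mention a modulo condition and a zero-based slice index, but the exact expression used in the patch is not shown in this thread, so `slice_ids` is a hypothetical helper.

```python
def slice_ids(record_ids, slice_index, slice_count):
    """Return the record IDs owned by one worker (zero-based slice_index)."""
    return [rid for rid in record_ids if rid % slice_count == slice_index]


ids = list(range(1, 11))
slices = [slice_ids(ids, i, 3) for i in range(3)]
# slices == [[3, 6, 9], [1, 4, 7, 10], [2, 5, 8]]
```

Every ID lands in exactly one slice, so no two workers in the same run touch the same record. Two separate runs of the script would still overlap, which is the case the lock file discussion is about.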



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Martin Renvoize  changed:

   What|Removed |Added

 Status|Signed Off  |Failed QA

--- Comment #57 from Martin Renvoize  ---
QA looking here.

I've got a couple of points to make before continuing.

1) The 'die' on fork failure isn't cleaning up after itself. Imagine a case
where we want to spawn 5 subprocesses, and we get to process 4 and then run out
of memory, for example. The parent script will die and leave behind zombie child
processes.
2) It doesn't look like there's any form of signal handling here and as such a
CTRL+C for example could end up leaving zombie processes too.

I'm also wrapping my head around the use of wait vs waitpid here. I remember
tripping myself up using them before, but can't remember the details well
enough right now to be confident I've not missed something.

Finally, I'll be looking into the possibility of race conditions being
introduced with this.  We had to introduce lock files for the zebra indexer as
overlapping runs of the script could cause problems, especially with the query
that got the list of bib/auths to index during each run.  I'm vaguely feeling
that might also be a problem here, but I'm not entirely sure yet as I'm still
looking at how the iterator is being built.

It's great to see this work, however. I'd love to see it make it into the 19.05
release.

Failing for the first issue raised above for now.

(I found https://www.perl.com/article/fork-yeah-/ pretty helpful whilst QAing
this. It gave me the insight to spot the above issues where I may have missed
them otherwise.)



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-04-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Martin Renvoize  changed:

   What|Removed |Added

 CC||martin.renvoize@ptfs-europe.com
 QA Contact||martin.renvoize@ptfs-europe.com



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-05 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Josef Moravec  changed:

   What|Removed |Added

 Status|Needs Signoff   |Signed Off



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-05 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Josef Moravec  changed:

   What|Removed |Added

  Attachment #85979 is obsolete|0   |1

--- Comment #56 from Josef Moravec ---
Created attachment 86057
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=86057&action=edit
Bug 21872: Fix name of rebuild_elasticsearch.pl

Signed-off-by: Josef Moravec 



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-05 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Josef Moravec  changed:

   What|Removed |Added

  Attachment #85978 is obsolete|0   |1

--- Comment #55 from Josef Moravec ---
Created attachment 86056
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=86056&action=edit
Bug 21872: Add support for -p parameter to koha-elasticsearch

Signed-off-by: Josef Moravec 



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-05 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Josef Moravec  changed:

   What|Removed |Added

  Attachment #85976 is obsolete|0   |1

--- Comment #53 from Josef Moravec ---
Created attachment 86054
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=86054&action=edit
Bug 21872: Simplify conditions and exit on invalid combination of arguments

Change to zero based indexing for slice index to simplify some
conditions. Exit with error message if trying to combine processes
and biblio numbers arguments.

Signed-off-by: Josef Moravec 



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-05 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Josef Moravec  changed:

   What|Removed |Added

  Attachment #85975 is obsolete|0   |1

--- Comment #52 from Josef Moravec ---
Created attachment 86053
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=86053&action=edit
Bug 21872: Add multiprocess support to Elasticsearch indexing utility

Test plan:
1. Time execution without -p parameter
2. Time execution with -p 2, -p 3 or -p 4, depending on CPU core count

Signed-off-by: Josef Moravec 



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-05 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Josef Moravec  changed:

   What|Removed |Added

  Attachment #85977 is obsolete|0   |1

--- Comment #54 from Josef Moravec ---
Created attachment 86055
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=86055&action=edit
Bug 21872: Remove duplicate modulo condition in authorities iterator

Signed-off-by: Josef Moravec 



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

   What|Removed |Added

   Priority|P5 - low|P3

--- Comment #51 from Ere Maijala  ---
Increasing importance since this can make a huge difference in bigger
libraries.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #50 from Ere Maijala  ---
Created attachment 85979
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=85979&action=edit
Bug 21872: Fix name of rebuild_elasticsearch.pl



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

   What|Removed |Added

  Attachment #82780 is obsolete|0   |1

--- Comment #46 from Ere Maijala ---
Created attachment 85975
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=85975&action=edit
Bug 21872: Add multiprocess support to Elasticsearch indexing utility

Test plan:
1. Time execution without -p parameter
2. Time execution with -p 2, -p 3 or -p 4, depending on CPU core count



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #49 from Ere Maijala  ---
Created attachment 85978
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=85978&action=edit
Bug 21872: Add support for -p parameter to koha-elasticsearch



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

   What|Removed |Added

 Status|ASSIGNED|Needs Signoff



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

   What|Removed |Added

  Attachment #85121 is obsolete|0   |1

--- Comment #48 from Ere Maijala ---
Created attachment 85977
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=85977&action=edit
Bug 21872: Remove duplicate modulo condition in authorities iterator



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

   What|Removed |Added

  Attachment #85119 is obsolete|0   |1

--- Comment #47 from Ere Maijala ---
Created attachment 85976
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=85976&action=edit
Bug 21872: Simplify conditions and exit on invalid combination of arguments

Change to zero based indexing for slice index to simplify some
conditions. Exit with error message if trying to combine processes
and biblio numbers arguments.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #45 from Josef Moravec  ---
(In reply to Ere Maijala from comment #44)
> Right, I'll add it.

Thanks



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

   What|Removed |Added

 Status|Needs Signoff   |ASSIGNED

--- Comment #44 from Ere Maijala  ---
Right, I'll add it.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Josef Moravec  changed:

   What|Removed |Added

 CC||josef.mora...@gmail.com

--- Comment #43 from Josef Moravec  ---
Hi Ere,
I think that the koha-elasticsearch Debian script should be able to pass the -p
parameter to rebuild_elastic_search.pl



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-03-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

   What|Removed |Added

   Assignee|koha-b...@lists.koha-commun |ere.maij...@helsinki.fi
   |ity.org |
 Status|In Discussion   |Needs Signoff



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-02-14 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #42 from David Gustafsson  ---
I have tried out the patch locally with a small number of biblios, and it seems
to work just fine! It will be interesting to try it out in our staging
environment with a much larger number of records.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-02-14 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #41 from David Gustafsson  ---
Found what I think was a duplicate condition in the authorities iterator, and
removed it.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-02-14 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #40 from David Gustafsson  ---
Created attachment 85121
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=85121&action=edit
Bug 21872: Remove duplicate modulo condition in authorities iterator



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-02-14 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #39 from Ere Maijala  ---
Thanks, that makes sense!



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-02-14 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #38 from David Gustafsson  ---
Created attachment 85119
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=85119&action=edit
Bug 21872: Simplify conditions and exit on invalid combination of arguments

Change to zero based indexing for slice index to simplify some
conditions. Exit with error message if trying to combine processes
and biblio numbers arguments.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-02-14 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

David Gustafsson  changed:

   What|Removed |Added

  Attachment #82586|0   |1
is obsolete||



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-02-14 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

David Gustafsson  changed:

   What|Removed |Added

  Attachment #82585|0   |1
is obsolete||



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-02-14 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

David Gustafsson  changed:

   What|Removed |Added

  Attachment #82582|0   |1
is obsolete||



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-02-13 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #37 from David Gustafsson  ---
(In reply to Ere Maijala from comment #33)
> David Gustafsson, what do you think about the latest one?

Hello! Sorry about the late reply. I have been a little bit buried in non-Koha
related work the last few months. I think it looks good. At first I found the
low-level fork implementation a little bit hardcore, but since the workers are
several long-lived processes equal in number to the concurrency level, there is
no need to spawn and wait for workers more than once, and the code is simple
enough to understand.

If I may make a suggestion, I think starting the slice index at 0 instead of 1
and assigning the index using "$slice_index = $proc - 1" in the process
dispatch loop would get rid of the
"$slice_modulo = 0 if ($slice_modulo == $slice_count);" condition in the
iterator.

I think I will be able to test the patch tomorrow and can provide a patch for
this change.
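
The suggestion above can be illustrated with a minimal sketch. It is written in Python rather than the Perl of the actual script, and the function names and sample record ids are made up for illustration; it is not the code from the patch. It only shows why a zero-based slice index removes the wrap-around condition while still assigning every record to exactly one slice.

```python
def in_slice_one_based(record_id, slice_index, slice_count):
    """One-based variant: slice_index runs from 1 to slice_count."""
    slice_modulo = slice_index
    # The last slice (index == count) has to wrap around to remainder 0.
    if slice_modulo == slice_count:
        slice_modulo = 0
    return record_id % slice_count == slice_modulo


def in_slice_zero_based(record_id, slice_index, slice_count):
    """Zero-based variant: slice_index runs from 0 to slice_count - 1."""
    return record_id % slice_count == slice_index


record_ids = list(range(1, 11))  # hypothetical record ids
slice_count = 3

one_based = [
    [r for r in record_ids if in_slice_one_based(r, i, slice_count)]
    for i in range(1, slice_count + 1)
]
# The dispatch loop counts processes from 1, so the slice index is proc - 1.
zero_based = [
    [r for r in record_ids if in_slice_zero_based(r, proc - 1, slice_count)]
    for proc in range(1, slice_count + 1)
]
```

Both variants produce the same three groups of records; only the order in which the groups are handed to the processes differs.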



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-01-25 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #36 from Joonas Kylmälä  ---
(In reply to david holoshka from comment #35)
> We were forced to rewrite rebuild_elastic_search.pl as it just died after a
> couple of days, never finishing indexing our 2.4 million bibliographic records.

Why did it die?


[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2019-01-25 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

david holoshka  changed:

   What|Removed |Added

 CC||david.holos...@ub.lu.se

--- Comment #35 from david holoshka  ---
We were forced to rewrite rebuild_elastic_search.pl as it just died after a
couple of days, never finishing indexing our 2.4 million bibliographic records.
Our version forks a copy of the process for each machine core, using
biblio_metadata-based limits precalculated by the parent process. (This has
been upgraded since I sent you a copy of the code, David, to make sure each
core gets the same number of records to index.) My old algorithm didn't
distribute the load well because gaps in the metadata ids had been created by
biblio updates over time. With 8 cores the indexing completes in 50 minutes
with Elasticsearch running on the same virtual machine. We sped up the process
a great deal by accessing the metadata table directly instead of through the
iterator. The only drawback is memory usage, due to needing to put the 952
item data (coincidentally also 2.4 million items) in hashes.
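
As a rough illustration of precalculated limits that give each core the same number of records even when the ids have gaps, here is a minimal sketch. It is in Python, not the Perl of the rewritten script, and the function name and sample ids are hypothetical. It assumes the parent process has the sorted list of existing ids available.

```python
def split_into_ranges(sorted_ids, workers):
    """Return (first_id, last_id) limits so each worker gets about the same
    number of existing records, regardless of gaps between the ids."""
    base, extra = divmod(len(sorted_ids), workers)
    ranges = []
    start = 0
    for w in range(workers):
        # The first `extra` workers take one more record to absorb the remainder.
        size = base + (1 if w < extra else 0)
        if size == 0:
            continue
        chunk = sorted_ids[start:start + size]
        ranges.append((chunk[0], chunk[-1]))
        start += size
    return ranges


# Hypothetical ids with large gaps, as left behind by updates over time.
ids = [1, 2, 3, 50, 51, 52, 53, 900, 901, 902, 903, 904]
limits = split_into_ranges(ids, 4)
counts = [sum(1 for i in ids if lo <= i <= hi) for lo, hi in limits]
```

Splitting the id span 1..904 into four equal-width ranges would instead put seven of the twelve records into the first range, which is the kind of imbalance described above.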



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-12-05 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #34 from Ere Maijala  ---
No problem, David Cook, the forking one turned out to be quite nice, if I may
say so.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-12-05 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #33 from Ere Maijala  ---
David Gustafsson, what do you think about the latest one?



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-12-02 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #32 from David Cook  ---
(In reply to Ere Maijala from comment #31)
> Ok, so what do you think about the latest one? Should be pretty
> straight-forward to use and no new dependencies required.

Apologies for my earlier comments. Please don't feel obligated to use forking
just because of my suggestions!

Actually, I just noticed that misc/search_tools/rebuild_elastic_search.pl
doesn't have a lock file, which seems problematic but predates this patch. 

Logging should be fine since the child processes should inherit the STDOUT file
handle...

I don't have an Elasticsearch instance on hand for testing, but it looks
workable at a glance.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

   What|Removed |Added

 Status|Needs Signoff   |In Discussion

--- Comment #31 from Ere Maijala  ---
Ok, so what do you think about the latest one? Should be pretty
straight-forward to use and no new dependencies required.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

   What|Removed |Added

 Status|In Discussion   |Needs Signoff



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

   What|Removed |Added

  Attachment #82599|0   |1
is obsolete||

--- Comment #30 from Ere Maijala  ---
Created attachment 82780
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=82780&action=edit
Bug 21872: Add multiprocess support to Elasticsearch indexing utility

Test plan:
1. Time execution without -p parameter
2. Time execution with -p 2, -p 3 or -p 4 depending on CPU core count



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-30 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #29 from Ere Maijala  ---
I'm going to whip up something hoping I can do easy forking without extra reqs.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #28 from David Cook  ---
(In reply to Ere Maijala from comment #26)
> Yes. Actually, there's no incremental indexing but changes are sent to ES
> when a record is saved. It's pretty fast since ES can take the update and
> make it visible later. Rebuild is typically needed only if you change the
> indexing rules or import a lot of records somehow without indexing.

Apologies for the imprecision in my language. When I said incremental, I meant
small or individual, so I was referring to what you're describing. Glad to be
on the same page! That's great. 

That context for a rebuild makes sense too. Not something that the average user
will be doing.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-29 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #27 from David Cook  ---
(In reply to Ere Maijala from comment #25)
> For good indexing performance you need to send records to Elasticsearch in
> batches. The current default is to collect 5000 records and then commit the
> batch to ES. If we have a lot of workers that only process one record at a
> time, we also need IPC to collect the records in the main process to be able
> to update in batches.
> 

It's fairly trivial to have workers process batches rather than single records,
and IPC really isn't that hard either. 

> All that's of course possible, but I'm not sure there's any real benefit
> from the way more complex mechanism compared to the slice version.

I'm just providing an alternative suggestion. You're the one doing the real
work, so if you want to go with the slice version, then that sounds good to me.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-28 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #26 from Ere Maijala  ---
(In reply to David Cook from comment #24)
> (In reply to Ere Maijala from comment #17)
> > David, see my attached patch. The mechanism would work regardless of whether
> > it's an incremental indexing process, though there are currently no
> > parameters available to support incremental indexing since it shouldn't be
> > needed.
> > 
> 
> Admittedly I don't use Elasticsearch, but are you saying that the
> incremental indexing uses a different mechanism than this one? So
> misc/search_tools/rebuild_elastic_search.pl is only used for a total
> reindexing of the database? When is that typically required?

Yes. Actually, there's no incremental indexing but changes are sent to ES when
a record is saved. It's pretty fast since ES can take the update and make it
visible later. Rebuild is typically needed only if you change the indexing
rules or import a lot of records somehow without indexing.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-28 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #25 from Ere Maijala  ---
(In reply to David Cook from comment #23)
> (In reply to Ere Maijala from comment #20)
> > That means
> > I'd rather change the script so that the main process would only feed
> > children with record ID's and the children would do all the rest.
> 
> That's what I'd think.
> 
> (In reply to Ere Maijala from comment #21)
> > Oh, but then the batching and committing of changes would become difficult.
> > On second thought I'm not sure ForkManager is quite as suitable for the
> > task as it might seem.
> 
> Why would batching and committing changes be difficult? (That's a genuine
> question. I haven't done much hands-on with Elasticsearch and Solr indexing
> APIs myself, so happy to admit my ignorance there.)

For good indexing performance you need to send records to Elasticsearch in
batches. The current default is to collect 5000 records and then commit the
batch to ES. If we have a lot of workers that only process one record at a
time, we also need IPC to collect the records in the main process to be able to
update in batches.

All that's of course possible, but I'm not sure there's any real benefit from
the way more complex mechanism compared to the slice version.
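
The batching described above can be sketched roughly as follows. This is a Python illustration, not the code of the rebuild script; `index_in_batches` and the `commit` callback are hypothetical names, and the only detail taken from the comment is the default batch size of 5000.

```python
def index_in_batches(records, commit, batch_size=5000):
    """Collect records and hand them to commit() one full batch at a time."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            commit(batch)
            batch = []
    # Flush whatever is left once the input is exhausted.
    if batch:
        commit(batch)


# Stand-in for sending a batch to Elasticsearch: just remember each batch.
committed = []
index_in_batches(range(12), lambda b: committed.append(list(b)), batch_size=5)
```

With one such loop per slice, every worker process can batch and commit on its own, so nothing has to be collected back in the main process.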



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-28 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #24 from David Cook  ---
(In reply to Ere Maijala from comment #17)
> David, see my attached patch. The mechanism would work regardless of whether
> it's an incremental indexing process, though there are currently no
> parameters available to support incremental indexing since it shouldn't be
> needed.
> 

Admittedly I don't use Elasticsearch, but are you saying that the incremental
indexing uses a different mechanism than this one? So
misc/search_tools/rebuild_elastic_search.pl is only used for a total reindexing
of the database? When is that typically required?



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-28 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #23 from David Cook  ---
(In reply to Ere Maijala from comment #20)
> That means
> I'd rather change the script so that the main process would only feed
> children with record ID's and the children would do all the rest.

That's what I'd think.

(In reply to Ere Maijala from comment #21)
> Oh, but then the batching and committing of changes would become difficult.
> On second thought I'm not sure ForkManager is quite as suitable for the
> task as it might seem.

Why would batching and committing changes be difficult? (That's a genuine
question. I haven't done much hands-on with Elasticsearch and Solr indexing
APIs myself, so happy to admit my ignorance there.)



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-28 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #22 from David Cook  ---
(In reply to David Gustafsson from comment #19)
> Isn't POE event-loop based and thus runs in a single thread? If so it would
> not help at all in speeding up the indexing process (except for perhaps
> committing to Elasticsearch in parallel since that does not run in perl).

POE does work off an event loop, so it does run in a single process/single
thread, but that's where POE::Wheel::Run becomes relevant. That module forks
child processes and uses pipes for bilateral communication between the parent
and children. The children do the parallel processing and the parent manages
the job/task queue for distributing work to the children. 

It could be used for Elasticsearch or Zebra really. The current rebuild scripts
are written in Perl but the Zebra one is just a wrapper around command line
tools.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-28 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #21 from Ere Maijala  ---
Oh, but then the batching and committing of changes would become difficult. On
second thought I'm not sure ForkManager is quite as suitable for the task as
it might seem.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-28 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #20 from Ere Maijala  ---
Parallel::ForkManager is fine, but I don't think we can make it a required
module just for this, so it needs to be optional. And that would make the
script a bit more complex since it would need to accommodate both situations.
I'm not sure if it makes sense to have a lot of slice sources since it may
cause concurrency or congestion issues on the MySQL side, and there's perhaps
also the possibility of getting connection timeouts since a slice wouldn't be
processed until there are children available. That means I'd rather change the
script so that the main process would only feed children with record ID's and
the children would do all the rest.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-28 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #19 from David Gustafsson  ---
(In reply to David Cook from comment #15)
> (In reply to Joonas Kylmälä from comment #2)
> > (In reply to David Cook from comment #1)
> > > I'll just split hairs and mention that multithreading in Perl is not
> > > recommended and never really done, but you could achieve the thing by
> > > forking workers. 
> > 
> > Thanks for making the distinction.
> > 
> > > 
> > > In #10662, I use the following modules to perform rapid event-driven
> > > processing of job queues:
> > > 
> > > https://metacpan.org/pod/POE::Component::JobQueue
> > > https://metacpan.org/pod/POE::Wheel::Run
> > 
> > Parallel::ForkManager is also already used in Koha, so it would be worth
> > taking a look at whether it could be used with the indexing code, as it
> > looks super simple!
> 
> Parallel::ForkManager is only used in the tests at the moment and it's
> marked as a non-required dependency, but... it is marked as a dependency in
> Koha and I do see it in the debian/control file as well, so I suppose a
> person could use it. 
> 
> The nice thing about POE::Wheel::Run is that it uses bilateral communication
> channels between the parent and children, so you can fork off X number of
> workers and then continue to send data to the workers. Plus the event-driven
> nature of POE means that things happen really quickly. You can have the
> parent manage the queue, and have it fire off data to the children workers. 
> 
> There's even a POE::Component::* module for non-blocking HTTP requests,
> although I haven't played with it myself yet, but that could also speed
> things up with indexing ElasticSearch, but that would probably require not
> using Catmandu (which I think is Ere's plan in the long-run anyway?).

Isn't POE event-loop based and thus runs in a single thread? If so it would not
help at all in speeding up the indexing process (except for perhaps committing
to Elasticsearch in parallel since that does not run in perl).


[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-28 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #18 from David Gustafsson  ---
(In reply to Ere Maijala from comment #13)
> David, is there a compelling reason to do it with a predefined record range?
> I find it a bit complicated, and it doesn't currently work the same way for
> authorities. 
> 
> I've just attached an implementation along the lines I described earlier. It
> can be used e.g. like this:
> 
> echo -n "1,2,3" | xargs -d "," -I{} -P 3 perl
> misc/search_tools/rebuild_elastic_search.pl -v -b --slice={},3
> 
> This allows one to index the records in parallel without prior knowledge of
> the available record id's and is fairly simple in implementation.

The main reason would be that, instead of for example one long-lived worker
process per CPU (or 3 as above), you split the work into many more batches that
can be balanced across CPUs at a given concurrency level until none are left.
This could distribute load more evenly if, for example, one or more of the
long-lived workers finish early. But in practice they would probably finish at
almost the same time, so it does not really matter which of the two models is
used.
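
A toy simulation may make the trade-off clearer. The sketch below is Python
rather than Perl, uses made-up per-record costs, and its helpers (makespan,
chunk_costs) are hypothetical names for illustration only. It hands each batch
to whichever worker becomes free first and compares three large slices with
twelve small batches.

```python
import heapq

def makespan(batch_costs, workers):
    """Return the finish time when batches are handed out, in order, to
    whichever worker becomes free first. (Toy model for illustration.)"""
    free_at = [0] * workers          # time at which each worker is next idle
    heapq.heapify(free_at)
    for cost in batch_costs:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)

def chunk_costs(costs, size):
    """Sum per-record costs into consecutive batches of `size` records."""
    return [sum(costs[i:i + size]) for i in range(0, len(costs), size)]

# Made-up costs: the first 4 of 12 records are three times as expensive.
costs = [3] * 4 + [1] * 8

print(makespan(chunk_costs(costs, 4), workers=3))   # three big slices
print(makespan(chunk_costs(costs, 1), workers=3))   # twelve small batches
```

With these invented costs it prints 12 and then 7. With uniform costs both
calls give the same result, which fits the expectation that in practice the
choice matters little.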

GNU parallel also prints each worker's output in sequence, which could be nice,
but is not all that important either.

I mainly made the patch because I knew it would be a quick and dirty way to get
parallel indexing working.

Parallel::ForkManager looks great to me; I would probably have used it instead
of parallel if I had been aware of it. It would probably be quite easy to
implement as part of the rebuild script (with the slice approach) instead of
having to use xargs. Then you could also use a larger number for slice to
produce more workers, since ForkManager has a $MAX_PROCESSES argument.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-27 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #17 from Ere Maijala  ---
David, see my attached patch. The mechanism would work regardless of whether
it's an incremental indexing process, though there are currently no parameters
available to support incremental indexing since it shouldn't be needed.

I'd rather keep this simple. I don't see the need for e.g. IPC mechanisms that
tend to complicate things for little gain. Also keep in mind that rebuilding
the index is not a daily process or such.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-27 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #16 from David Cook  ---
(In reply to Ere Maijala from comment #3)
> What I was referring to would be to just add a couple of parameters to the
> indexing that would control which records a single script would process.
> Then you'd be able to run multiple processes in parallel like this:
> 
> [...] --offset=0 --skip=3
> [...] --offset=1 --skip=3
> [...] --offset=2 --skip=3
> 
> The first one would process records 1, 4, 7...
> The second one would process records 2, 5, 8...
> The third one would process records 3, 6, 9...

That would be easier than building a new higher performance indexer...

How would you know the offsets in an automated way, or are you thinking about
this more for just manual use? 

Are you talking about a total rebuild or incremental indexing?



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-27 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #15 from David Cook  ---
(In reply to Joonas Kylmälä from comment #2)
> (In reply to David Cook from comment #1)
> > I'll just split hairs and mention that multithreading in Perl is not
> > recommended and never really done, but you could achieve the same thing by
> > forking workers. 
> 
> Thanks for making the distinction.
> 
> > 
> > In #10662, I use the following modules to perform rapid event-driven
> > processing of job queues:
> > 
> > https://metacpan.org/pod/POE::Component::JobQueue
> > https://metacpan.org/pod/POE::Wheel::Run
> 
> Parallel::ForkManager is also already used in Koha, so it would be worth
> taking a look at whether it could be used with the indexing code, as it looks
> super simple!

Parallel::ForkManager is only used in the tests at the moment and it's marked
as a non-required dependency, but... it is marked as a dependency in Koha and I
do see it in the debian/control file as well, so I suppose a person could use
it. 

The nice thing about POE::Wheel::Run is that it uses bidirectional communication
channels between the parent and children, so you can fork off X number of
workers and then continue to send data to the workers. Plus the event-driven
nature of POE means that things happen really quickly. You can have the parent
manage the queue and have it fire off data to the child workers.

There's even a POE::Component::* module for non-blocking HTTP requests. I
haven't played with it myself yet, but it could also speed up indexing into
Elasticsearch, although that would probably require not using Catmandu (which
I think is Ere's plan in the long run anyway?).


[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-23 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #82598|0                           |1
        is obsolete|                            |

--- Comment #14 from Ere Maijala  ---
Created attachment 82599
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=82599&action=edit
Bug 21872: Add slice parameter to rebuild_elastic_search.pl

The slice parameter allows one to define a slice of the records to index for
parallel processing.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-23 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

Ere Maijala  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |In Discussion

--- Comment #13 from Ere Maijala  ---
David, is there a compelling reason to do it with a predefined record range? I
find it a bit complicated, and it doesn't currently work the same way for
authorities. 

I've just attached an implementation along the lines I described earlier. It
can be used e.g. like this:

echo -n "1,2,3" | xargs -d "," -I{} -P 3 perl \
  misc/search_tools/rebuild_elastic_search.pl -v -b --slice={},3

This allows one to index the records in parallel without prior knowledge of the
available record IDs and is fairly simple in implementation.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-23 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #12 from Ere Maijala  ---
Created attachment 82598
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=82598&action=edit
Bug 21872: Add slice parameter to rebuild_elastic_search.pl

The slice parameter allows one to define a slice of the records to index for
parallel processing.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #11 from David Gustafsson  ---
I might as well finish this so there is a working (hopefully) proof of concept.
I'm sure there are lots of minor things that need fixing before this would be
ready for sign-off, but I think the current code should work (at least for me
it does).



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #10 from David Gustafsson  ---
Created attachment 82586
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=82586&action=edit
Bug 21872: Add parallel_rebuild_biblios.pl script

Add parallel_rebuild_biblios.pl script for rebuilding the biblios index in
parallel. Adjust some option names and script POD.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

David Gustafsson  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #82584|0                           |1
        is obsolete|                            |

--- Comment #9 from David Gustafsson  ---
Created attachment 82585
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=82585&action=edit
Bug 21872: Fix some issues with record_batches.pl

Fix behavior for case where no biblios or just one biblio exists.
Improve performance by only selecting items greater than last end
of range instead of increasing offset by batch size. Also don't use
underscore for option names.
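
The performance change described here (seeking past the last end of range
rather than growing an offset) can be sketched roughly as follows. This is a
Python illustration against an in-memory SQLite table, not the
record_batches.pl code; the table layout, the batch_ranges helper and the
sample ids are assumptions made for the example.

```python
import sqlite3

def batch_ranges(conn, batch_size):
    """Yield (start, end) biblionumber ranges of at most batch_size records.

    Each query seeks past the previous range end instead of using a growing
    OFFSET, so the database does not have to step over skipped rows again.
    (Illustrative sketch; table and column names are assumptions.)
    """
    last_end = None
    while True:
        if last_end is None:
            rows = conn.execute(
                "SELECT biblionumber FROM biblio "
                "ORDER BY biblionumber LIMIT ?",
                (batch_size,),
            ).fetchall()
        else:
            rows = conn.execute(
                "SELECT biblionumber FROM biblio WHERE biblionumber > ? "
                "ORDER BY biblionumber LIMIT ?",
                (last_end, batch_size),
            ).fetchall()
        if not rows:
            return  # also covers the case where no biblios exist
        start, last_end = rows[0][0], rows[-1][0]
        yield (start, last_end)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biblio (biblionumber INTEGER PRIMARY KEY)")
# Ids with gaps, as happens after deletions.
conn.executemany(
    "INSERT INTO biblio VALUES (?)", [(n,) for n in (1, 2, 5, 8, 9, 13, 21)]
)
print(list(batch_ranges(conn, 3)))
```

For the sample ids this prints [(1, 5), (8, 13), (21, 21)]; an empty table
yields no ranges, and a single record yields one range whose start and end are
equal.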



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

David Gustafsson  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #82583|0                           |1
        is obsolete|                            |

--- Comment #8 from David Gustafsson  ---
Created attachment 82584
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=82584&action=edit
Bug 21872: Fix some issues with record_batches.pl

Fix behavior for case where no biblios or just one biblio exists.
Improve performance by only selecting items greater than last end
of range instead of increasing offset by batch size. Also don't use
underscore for option names.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #7 from David Gustafsson  ---
Created attachment 82583
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=82583&action=edit
Bug 21872: Fix some issues with record_batches.pl

Fix behavior for case where no biblios or just one biblio exists.
Improve performance by only selecting items greater than last end
of range instead of increasing offset by batch size. Also don't use
underscore for option names.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #6 from David Gustafsson  ---
Something like this. Now all that is needed is to create the wrapper script.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #5 from David Gustafsson  ---
Created attachment 82582
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=82582&action=edit
Bug 21872: Elasticsearch indexing faster by making it multi-threaded

Add record_batches script for generating biblionumber batches and
add --start-bnumber and --end-bnumber to rebuild_elastic_search.pl
script.



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #4 from David Gustafsson  ---
A really simple method of achieving this could be to use GNU parallel to run
multiple instances of the rebuild_elastic_search.pl script if it were to accept
--start-biblionum and --end-biblionum options.

I have already made a script to generate batches utilized in a parallel export
that we use:
https://github.com/ub-digit/Koha/blob/gub-dev-record-batches-script/misc/record_batches.pl

I might rewrite this script a bit since I think there are better ways to
produce batches, but it works.

It could then be used in a wrapper script for parallel running of
rebuild_elastic_search.pl like:

$KOHA_ROOT/misc/record_batches.pl | parallel --colsep ' ' -j$CONCURRENCY_LEVEL \
  $KOHA_ROOT/misc/search_tools/rebuild_elastic_search.pl --start-biblionum={1} \
  --end-biblionum={2}

The above is just pseudo-code and would have to be worked out to forward
options to rebuild_elastic_search.pl etc, but I think this would be a pretty
easy and efficient way to implement parallel indexing.
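
To make the placeholder mechanics concrete, here is a small Python sketch of
what the pipeline above is meant to do: each line of batch output is split on
the column separator and its fields are substituted for {1} and {2}. The
helper names (build_commands, substitute) and the sample ranges are made up
for illustration; the option names are the ones proposed above, and nothing is
actually executed.

```python
def substitute(arg, columns):
    """Replace each {n} placeholder in arg with the n-th column (1-based)."""
    for index, value in enumerate(columns, start=1):
        arg = arg.replace("{%d}" % index, value)
    return arg

def build_commands(batch_lines, template, colsep=" "):
    """Expand one command (as an argument list) per non-empty input line.

    This only mimics positional substitution; it does not run anything.
    (Hypothetical helper for illustration only.)
    """
    commands = []
    for line in batch_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the batch output
        columns = line.split(colsep)
        commands.append([substitute(arg, columns) for arg in template])
    return commands

template = [
    "rebuild_elastic_search.pl",
    "--start-biblionum={1}",
    "--end-biblionum={2}",
]
# Made-up "start end" lines, as a batch-generating script might print them.
for command in build_commands(["1 1000", "1001 2000"], template):
    print(" ".join(command))
```

Each expanded command covers one biblionumber range, so the concurrency level
only decides how many of them run at once, not how the records are divided.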



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #3 from Ere Maijala  ---
What I was referring to would be to just add a couple of parameters to the
indexing that would control which records a single script would process. Then
you'd be able to run multiple processes in parallel like this:

[...] --offset=0 --skip=3
[...] --offset=1 --skip=3
[...] --offset=2 --skip=3

The first one would process records 1, 4, 7...
The second one would process records 2, 5, 8...
The third one would process records 3, 6, 9...
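
As a rough illustration of the offset/skip idea (a sketch only, in Python
rather than Koha's Perl; select_records is a hypothetical helper, not part of
rebuild_elastic_search.pl), each process could keep every skip-th record
starting at its offset, counting by position in the result set rather than by
record id:

```python
from itertools import islice

def select_records(records, offset, skip):
    """Return every `skip`-th record, starting at zero-based position `offset`.

    Selection is by position in the stream, not by record id, so gaps in the
    ids do not matter. (Hypothetical helper for illustration only.)
    """
    return list(islice(records, offset, None, skip))

# Positions are zero-based, so offset 0 yields the 1st, 4th, 7th ... record.
records = list(range(1, 10))   # stand-in for records 1..9
for offset in range(3):
    print(offset, select_records(records, offset, 3))
```

Because every position falls into exactly one offset class, the processes
cover all records between them without overlap and without needing to know
the record ids in advance.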



[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #2 from Joonas Kylmälä  ---
(In reply to David Cook from comment #1)
> I'll just split hairs and mention that multithreading in Perl is not
> recommended and never really done, but you could achieve the same thing by
> forking workers. 

Thanks for making the distinction.

> 
> In #10662, I use the following modules to perform rapid event-driven
> processing of job queues:
> 
> https://metacpan.org/pod/POE::Component::JobQueue
> https://metacpan.org/pod/POE::Wheel::Run

Parallel::ForkManager is also already used in Koha, so it would be worth taking
a look at whether it could be used with the indexing code, as it looks super
simple!


[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

2018-11-21 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

David Cook  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dc...@prosentient.com.au

--- Comment #1 from David Cook  ---
(In reply to Joonas Kylmälä from comment #0)
> The record processing in misc/search_tools/rebuild_elastic_search.pl
> currently happens in a single thread; it could be made multithreaded to take
> full advantage of multicore systems.
> 
>  says on IRC following about this: "[..] a simple way would be to add
> start offset and skip count to the indexing script so you could run multiple
> in parallel"
> 
> So in the line "while ( my $record = $next->() ) {" the $next->() function
> gets called, and that should be possible to multithread.

I'll just split hairs and mention that multithreading in Perl is not
recommended and never really done, but you could achieve the same thing by
forking workers. 

In #10662, I use the following modules to perform rapid event-driven processing
of job queues:

https://metacpan.org/pod/POE::Component::JobQueue
https://metacpan.org/pod/POE::Wheel::Run

Another option would be to use a message queue and separate workers for doing
the indexing. 

Just a thought.
