Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

2024-02-15 Thread Petter Reinholdtsen


I just came across the article "Whispering in Norwegian: Navigating
Orthographic and Dialectic Challenges" by Per E Kummervold, Javier de la
Rosa, Freddy Wetjen, Rolv-Arild Braaten and Per Erik Solberg,
https://arxiv.org/pdf/2402.01917.pdf.

I found this quote particularly interesting:

  Although the original PyTorch training code was not released by
  OpenAI, a collaborative effort with HuggingFace led to an alternative
  implementation in the Transformers library.  This has also been
  adapted for Jax. The project participated in developing and
  open-sourcing training scripts for TPU-v4-pods, enabling dynamic
  changes to the training data during runtime (The National Library of
  Norway, 2024).

The reference points to https://www.github.com/NbAiLab/nostram.
I have not investigated further.  Perhaps the alternative implementation
can be used to make a model from scratch and provide source for the
files requested by the ftpmasters?

Unrelated to this, there is an alternative implementation using the
whisper models called whisper.cpp, available from
https://github.com/ggerganov/whisper.cpp.git.  It might be
easier to package than the openai whisper implementation.

-- 
Happy hacking
Petter Reinholdtsen



Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

2023-07-30 Thread Jonas Smedegaard
>> can you please explain how I can recreate the files *.tiktoken?
>> There seem to be some sources missing ...
>
> The two files in question are 50k lines of ASCII text that seem to be
> some kind of index / vocabulary, and I have no idea how they were
> created.

Perhaps there are some clues to be had in the reimplementation at
https://github.com/ggerganov/whisper.cpp/ - or perhaps their authors
know?

...and perhaps you might find interest in packaging that C++
reimplementation too/instead? ;-)


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/
 * Sponsorship: https://ko-fi.com/drjones

 [x] quote me freely  [ ] ask before reusing  [ ] keep private



Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

2023-06-21 Thread Petter Reinholdtsen


The upload to contrib / experimental was rejected by the ftpmasters with
the following comment:

> can you please explain how I can recreate the files *.tiktoken?  There
> seem to be some sources missing ...

The two files in question are 50k lines of ASCII text that seem to be
some kind of index / vocabulary, and I have no idea how they were
created.  I suspect they might be an artifact of the model training, but
do not know.  Anyone got a clue to spare on how these were created and
how to rebuild them?  If we lack the source to rebuild them, I currently
believe the whisper package will have to go to non-free, not contrib.
Any help to figure this out would be most appreciated.
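
A minimal sketch for inspecting such a file, assuming each line holds a
base64-encoded token followed by an integer rank, which is how tiktoken
vocabulary files are commonly laid out.  The function name and the
sample lines are made up for illustration.  Decoding the entries shows
what the files contain; it does not answer how they were generated.

```python
import base64


def parse_tiktoken_lines(lines):
    """Parse lines of the assumed form '<base64 token> <rank>' into {bytes: int}."""
    ranks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks


# Hand-made sample in the assumed layout, not copied from the real files.
sample = ["IQ== 0", "Ig== 1", "aGVsbG8= 2"]
ranks = parse_tiktoken_lines(sample)

# Print the entries in rank order to see the decoded byte sequences.
for token, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, token)
```

To look at a real file, pass an open file object instead of the sample
list, for example parse_tiktoken_lines(open(path)).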

-- 
Happy hacking
Petter Reinholdtsen



Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

2023-04-17 Thread Petter Reinholdtsen


Control: retitle -1 ITP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

I have decided to upload this package to experimental under the umbrella
of the Deep Learning Team.  I suspect it should go into contrib because
of the state of its neural network models.

Not quite sure how to handle the models.  Perhaps create a non-free
package with one model, or simply ask people to download the model
individually?

-- 
Happy hacking
Petter Reinholdtsen



Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

2023-04-16 Thread Petter Reinholdtsen


Draft packaging for OpenAI Whisper is now available from
https://salsa.debian.org/deeplearning-team/openai-whisper.

I dropped the dependency for ffmpeg-python, due to an inactive
ffmpeg-python upstream and no real need for this dependency.
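
To illustrate why the wrapper is not strictly needed: audio can be
decoded by calling the ffmpeg command line tool directly.  The sketch
below is one possible approach and not necessarily what the package
does.  The function names, the 16 kHz default and the choice of raw
16-bit mono output are assumptions made for this example.

```python
import array
import subprocess
import sys


def build_ffmpeg_command(path, sample_rate=16000):
    """Build an ffmpeg command that writes mono 16-bit little-endian PCM to stdout."""
    return [
        "ffmpeg", "-nostdin",
        "-i", path,
        "-f", "s16le",           # raw samples, no container
        "-ac", "1",              # downmix to one channel
        "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),
        "-",                     # write to stdout
    ]


def pcm16_to_floats(raw):
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the data is little-endian regardless of host
    return [s / 32768.0 for s in samples]


def load_audio(path, sample_rate=16000):
    """Decode an audio file by running ffmpeg as a subprocess."""
    result = subprocess.run(
        build_ffmpeg_command(path, sample_rate),
        capture_output=True,
        check=True,
    )
    return pcm16_to_floats(result.stdout)
```

Only load_audio() needs the ffmpeg binary installed; the two helper
functions can be checked without it.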

The package builds and works, but will download the requested model from
the Internet on first invocation and store it in ~/.cache/whisper/.
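
A small sketch for checking which models are already present in that
cache directory.  It assumes the models are stored as files ending in
".pt" and that XDG_CACHE_HOME, if set, replaces ~/.cache; both are
assumptions of this example, as are the function names.

```python
import os
from pathlib import Path


def default_cache_dir():
    """Return the assumed cache location, ~/.cache/whisper unless XDG_CACHE_HOME is set."""
    base = os.environ.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )
    return Path(base) / "whisper"


def cached_models(cache_dir=None):
    """List model names found in the cache, assuming one '<name>.pt' file per model."""
    cache_dir = Path(cache_dir) if cache_dir is not None else default_cache_dir()
    if not cache_dir.is_dir():
        return []  # nothing downloaded yet
    return sorted(p.stem for p in cache_dir.glob("*.pt"))


if __name__ == "__main__":
    print(cached_models())
```

An empty list means the next invocation would have to fetch a model
from the Internet.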

-- 
Happy hacking
Petter Reinholdtsen



Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

2023-04-12 Thread Petter Reinholdtsen
[Petter Reinholdtsen]
> I created a draft build setup for tiktoken in
> https://salsa.debian.org/pere/tiktoken.  It currently builds but
> I am not convinced it is working.

The repository has been moved to
https://salsa.debian.org/deeplearning-team/tiktoken.
I have also started on packaging for triton, which is available from
https://salsa.debian.org/deeplearning-team/triton.

-- 
Happy hacking
Petter Reinholdtsen



Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

2023-04-10 Thread Petter Reinholdtsen


I have also created a draft build setup for ffmpeg-python in
https://salsa.debian.org/pere/ffmpeg-python.  It currently builds
but I am not convinced it is working.  I've asked upstream for a new
release, https://github.com/kkroening/ffmpeg-python/issues/760,
as the last release was in 2019.

I've also discovered that Whisper depends on triton,
https://github.com/openai/triton.

Since I started looking at this, I have found the Unofficial Policy for
Debian & Machine Learning,
https://salsa.debian.org/deeplearning-team/ml-policy,
which seems relevant for how to handle Whisper in Debian.

-- 
Happy hacking
Petter Reinholdtsen



Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

2023-04-08 Thread Petter Reinholdtsen


I created a draft build setup for tiktoken in
https://salsa.debian.org/pere/tiktoken.  It currently builds but
I am not convinced it is working.

-- 
Happy hacking
Petter Reinholdtsen



Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

2023-04-08 Thread Petter Reinholdtsen


Package: wnpp
Severity: wishlist

  Package name: whisper
  Version : v20230314
  Upstream Author : OpenAI
  URL : https://github.com/openai/whisper
  License : MIT
  Programming Lang: Python
  Description : Robust Speech Recognition via Large-Scale Weak Supervision

Whisper provides speech-to-text conversion using a neural network model
created by OpenAI.  The required packages are today available using pip,
and as far as I can see from the dependencies, tiktoken[1] and
ffmpeg-python[2] are currently missing from Debian.

 [1] https://pypi.org/project/tiktoken/ and
 https://github.com/openai/tiktoken
 [2] https://pypi.org/project/ffmpeg-python/ and
 https://github.com/kkroening/ffmpeg-python

-- 
Happy hacking
Petter Reinholdtsen