Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision
I just came across the article "Whispering in Norwegian: Navigating Orthographic and Dialectic Challenges" by Per E Kummervold, Javier de la Rosa, Freddy Wetjen, Rolv-Arild Braaten and Per Erik Solberg, <https://arxiv.org/pdf/2402.01917.pdf>. I found this quote particularly interesting:

  Although the original PyTorch training code was not released by OpenAI, a collaborative effort with HuggingFace led to an alternative implementation in the Transformers library. This has also been adapted for Jax. The project participated in developing and open-sourcing training scripts for TPU-v4-pods, enabling dynamic changes to the training data during runtime (The National Library of Norway, 2024).

The reference points to <https://www.github.com/NbAiLab/nostram>. I have not investigated further. Perhaps the alternative implementation can be used to train a model from scratch and provide source for the files requested by the ftpmasters?

Unrelated to this, there is an alternative implementation using the whisper models called whisper.cpp, available from <https://github.com/ggerganov/whisper.cpp.git>. It might be easier to package than the OpenAI whisper implementation.

--
Happy hacking
Petter Reinholdtsen
Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision
>> can you please explain how I can recreate the files *.tiktoken?
>> There seem to be some sources missing ...
>
> The two files in question are 50k lines of ASCII text that seem to be
> some kind of index / vocabulary, and I have no idea how they were
> created.

Perhaps there are some clues to be had in the reimplementation at https://github.com/ggerganov/whisper.cpp/ - or perhaps its authors know?

...and perhaps you might find interest in packaging that C++ reimplementation too/instead? ;-)

 - Jonas

--
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/
 * Sponsorship: https://ko-fi.com/drjones

 [x] quote me freely  [ ] ask before reusing  [ ] keep private
Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision
The upload to contrib / experimental was rejected by the ftpmasters with the following comment:

> can you please explain how I can recreate the files *.tiktoken? There
> seem to be some sources missing ...

The two files in question are 50k lines of ASCII text that seem to be some kind of index / vocabulary, and I have no idea how they were created. I suspect they might be an artifact of the model training, but do not know.

Anyone got a clue to spare on how these were created and how to rebuild them? If we lack the source to rebuild them, I currently believe the whisper package will have to go to non-free, not contrib.

Any help to figure this out would be most appreciated.

--
Happy hacking
Petter Reinholdtsen
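For anyone wanting to look inside the two files, here is a minimal sketch in Python. It assumes the *.tiktoken files use the plain-text layout that the tiktoken loader reads: one base64-encoded token followed by its integer rank on each line. The function name and the three sample lines are made up for illustration. This only shows how to decode such a file; it says nothing about how the vocabulary was originally produced, which is the open question.

```python
import base64


def parse_tiktoken_lines(lines):
    """Parse lines of the form '<base64 token> <rank>' into {bytes: int}.

    Assumption: this mirrors the layout of the *.tiktoken vocabulary
    files; it does not explain how those files were generated.
    """
    ranks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks


# Hypothetical sample in the assumed format, not copied from a real file.
sample = ["IQ== 0", "Ig== 1", "aGVsbG8= 2"]
vocab = parse_tiktoken_lines(sample)
for token, rank in sorted(vocab.items(), key=lambda item: item[1]):
    print(rank, token)
```

To inspect a real file, pass an open file object instead of the sample list, for instance `parse_tiktoken_lines(open(path))`, where `path` points at one of the two files in the source tree.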
Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision
Control: retitle -1 ITP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

I have decided to upload this package to experimental under the umbrella of the Deep Learning Team. I suspect it should go into contrib because of the state of its neural network models.

I am not quite sure how to handle the models. Perhaps create a non-free package with one model, or simply ask people to download the model individually?

--
Happy hacking
Petter Reinholdtsen
Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision
Draft packaging for OpenAI Whisper is now available from <https://salsa.debian.org/deeplearning-team/openai-whisper>. I dropped the dependency on ffmpeg-python, because its upstream is inactive and there is no real need for it.

The package builds and works, but will download the requested model from the Internet on first invocation and store it in ~/.cache/whisper/.

--
Happy hacking
Petter Reinholdtsen
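To see which models have already been fetched, something like the following sketch could be used. The ~/.cache/whisper/ location is the one mentioned above; honouring XDG_CACHE_HOME and the ".pt" file suffix are assumptions on my part, and both helper names are hypothetical.

```python
import os
from pathlib import Path


def whisper_cache_dir(env=None):
    """Return the directory where downloaded models are expected.

    Assumption: XDG_CACHE_HOME is honoured if set, otherwise the
    location is ~/.cache/whisper/ as described above.
    """
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )
    return Path(base) / "whisper"


def cached_models(cache_dir):
    """List model files in cache_dir, assuming they end in '.pt'."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []  # nothing downloaded yet
    return sorted(p.name for p in cache_dir.iterdir() if p.suffix == ".pt")


if __name__ == "__main__":
    directory = whisper_cache_dir()
    print(directory, cached_models(directory))
```

An empty list simply means no model has been downloaded to that directory yet.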
Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision
[Petter Reinholdtsen]
> I created a draft build setup for tiktoken in
> <https://salsa.debian.org/pere/tiktoken>. It currently builds but
> I am not convinced it is working.

The repository has been moved to <https://salsa.debian.org/deeplearning-team/tiktoken>.

I have also started on packaging for triton, which is available from <https://salsa.debian.org/deeplearning-team/triton>.

--
Happy hacking
Petter Reinholdtsen
Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision
I have also created a draft build setup for ffmpeg-python in <https://salsa.debian.org/pere/ffmpeg-python>. It currently builds but I am not convinced it is working. I've asked upstream for a new release, <https://github.com/kkroening/ffmpeg-python/issues/760>, as the last release was in 2019.

I've also discovered that Whisper depends on triton, <https://github.com/openai/triton>.

Since I started looking at this, I have found the Unofficial Policy for Debian & Machine Learning, <https://salsa.debian.org/deeplearning-team/ml-policy>, which seems relevant for how to handle Whisper in Debian.

--
Happy hacking
Petter Reinholdtsen
Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision
I created a draft build setup for tiktoken in <https://salsa.debian.org/pere/tiktoken>. It currently builds but I am not convinced it is working.

--
Happy hacking
Petter Reinholdtsen
Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision
Package: wnpp
Severity: wishlist

  Package name    : whisper
  Version         : v20230314
  Upstream Author : OpenAI
  URL             : https://github.com/openai/whisper
  License         : MIT
  Programming Lang: Python
  Description     : Robust Speech Recognition via Large-Scale Weak Supervision

Whisper provides speech-to-text conversion using a neural network model created by OpenAI.

The required packages are currently available via pip, and as far as I can see from the dependencies, tiktoken[1] and ffmpeg-python[2] are missing from Debian.

[1] <https://pypi.org/project/tiktoken/> and <https://github.com/openai/tiktoken>
[2] <https://pypi.org/project/ffmpeg-python/> and <https://github.com/kkroening/ffmpeg-python>

--
Happy hacking
Petter Reinholdtsen