Re: Creating a Ladino spell-checker and including it in OS projects

2022-06-17 Thread Dan Kenigsberg
On Wed, Jun 15, 2022 at 10:23:47AM +0300, Gabor Szabo wrote:
> On Mon, Jun 13, 2022 at 9:00 AM Dan Kenigsberg 
> wrote:
> 
> > On Sun, Jun 12, 2022 at 04:24:27PM +0300, Gabor Szabo wrote:
> > > Hi Dan,
> > >
> > >
> > > On Tue, Jun 7, 2022 at 11:18 PM Dan Kenigsberg
> > > wrote:
> > >
> > > > On Wed, Jun 01, 2022 at 06:47:41AM +0300, Gabor Szabo wrote:
> > > > > Hi,
> > > > >
> > > > > I've been working on an online Ladino (Judeo-Espanyol) dictionary
> > > > > https://diksionaryo.szabgab.com/ The code is open source the
> > content is
> > > > > CC BY-SA 4.0  https://creativecommons.org/licenses/by-sa/4.0/
> > > > > All linked from the About page.
> > > > > Along with the creation of the translation I also have a (growing)
> > list
> > > > of
> > > > > ladino words.
> > > > >
> > > > > I would like to make this available as a spell checker in various
> > Open
> > > > > Source tools.
> > > > > E.g. Firefox, Chromium, LibreOffice etc.
> > > > > I wrote about it a few weeks ago
> > > > > https://szabgab.com/add-spellchecker-to-various-applications.html
> > but I
> > > > am
> > > > > still unclear what and how to do.
> > > > >
> > > > > I started to generate a pair of files that resemble the format of
> > hspell,
> > > > > but I don't know how to really test them and in any case they don't
> > seem
> > > > to
> > > > > work well.
> > > > > I also don't know how to distribute what I already have and how to
> > make
> > > > it
> > > > > included in those projects.
> > > > >
> > > > > Anyone here has experience with spell-checkers?
> > > > > Could anyone help me in the project or at least point me in the right
> > > > > direction?
> > > >
> > > > Well, if I were you, I'd start by creating a github repository with
> > your
> > > > code and a tagged version of your artifacts, these .aff and .dic files
> > > > used by hunspell.

I think that if you do so ^^^ you'd have a stable URL to share.

> > > >
> > >
> > > It is being generated now on every push:
> > > https://github.com/szabgab/ladino-diksionaryo-generated/
> >
> > Thanks for the URL. But where are the artifacts? They probably hide in
> > plain sight... Can you provide a URL to the .aff/.dic files?
> >
> 
> Oh well, GitHub can be tricky sometimes :)
> (I think this is the direct link to download the zip of the two files:
> https://github.com/szabgab/ladino-diksionaryo-generated/suites/6938585306/artifacts/270273527
> )
> 
> Manually:
> Visit the project repo:
> https://github.com/szabgab/ladino-diksionaryo-generated/
> click on "Actions"
> then on the job that created the artifact (in this case it is called CI)
> There you'll have the artifacts of the project
> e.g. This is the direct link to a recent build
> https://github.com/szabgab/ladino-diksionaryo-generated/actions/runs/2500028854
> 
> If you now click on "hunspell" it will download the two files in a zip.
> 
> AFAIK the artifacts are removed after a few weeks so these links will be
> gone, but the description above should still work.
> 
> 
> 
> >
> > >
> > > I can put on some tags if you think they are important for some reason,
> > but
> > > I don't have specific release points.
> > > Every change in the dictionary triggers the re-build of the whole web
> > site
> > > and the two files as well.
> > >
> > >
> > >
> > > > This would give anyone with high enough motivation the ability to
> > > > test it on their own machine (I may volunteer).
> > > >
> > >
> > > I'd really like to know how you (or someone else) test it.
> >
> > `hunspell -D` shows where you can drop the files; then `hunspell -d
> > language` would let me spell-check a text, say a random page from
> > https://lad.wikipedia.org.
> >
> >
> Thanks. And yeah, the Ladino version of Wikipedia is quite bad, as I am
> told, because it is written mostly by Spanish speakers
> who include a lot of words from modern Spanish instead of using Ladino.
> Fixing that is another project to work on :)

I can say that it works well on my box. Nice!

I suppose that the coverage still has room to improve. For example, I think
you can add Astronomiya to the lexicon. But I cannot judge Ladino
correctness in any way...

In hspell it took us several years of painstakingly spellchecking
Wikipedia pages to reach good coverage. But please do not let this
discourage you. Helping Ladino survive is a noble cause to follow.
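
One rough way to track the coverage mentioned above is to count how many words of a sample text appear in the word list, and to list the most frequent unknown words as candidates for review. The following is only a sketch: the function name, the tokenization rule and the sample sentence are assumptions made for illustration, not part of hspell or of the Ladino project.

```python
import re
from collections import Counter

# Runs of letters only (digits and underscores excluded); a simplification.
WORD_RE = re.compile(r"[^\W\d_]+")


def coverage(text, lexicon):
    """Return (share of tokens found in lexicon, Counter of unknown tokens)."""
    known = {word.lower() for word in lexicon}
    tokens = [token.lower() for token in WORD_RE.findall(text)]
    if not tokens:
        return 1.0, Counter()
    unknown = Counter(token for token in tokens if token not in known)
    ratio = 1 - sum(unknown.values()) / len(tokens)
    return ratio, unknown


if __name__ == "__main__":
    # Made-up sample sentence and word list, for illustration only.
    sample = "La kaza i el livro de astronomiya"
    words = ["la", "kaza", "i", "el", "livro", "de"]
    ratio, unknown = coverage(sample, words)
    print(f"coverage: {ratio:.0%}")
    print("most common unknown words:", unknown.most_common(5))
```

The unknown-word list still needs a human who knows Ladino to decide which entries are real words and which are typos or modern Spanish.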


___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Creating a Ladino spell-checker and including it in OS projects

2022-06-16 Thread Gabor Szabo
On Thu, Jun 16, 2022 at 10:44 AM Steve Litt 
wrote:

>
> I'm building a standalone spellchecker for well-formed XML HTML5
> right now. Maybe we can correspond. I don't know how to make a
> soundsalike algorithm.


Hi Steve,

I am not sure how your project is related to the Ladino spell-checker,
but I wonder: isn't what you are doing just extracting the text from the
HTML and then running that text through an already existing spell-checker?
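
The "extract the text, then spell-check it" approach could look roughly like the sketch below, which uses only Python's standard html.parser module. The class and function names are made up for this example, and skipping script and style content is an assumption about what should not be spell-checked.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the content of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = "<html><body><p>Buenos <b>diyas</b></p><script>var x;</script></body></html>"
    print(html_to_text(page))
```

The resulting plain text can then be handed to any existing spell-checker.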


Gabor


Re: Creating a Ladino spell-checker and including it in OS projects

2022-06-16 Thread Steve Litt
Gabor Szabo said on Wed, 1 Jun 2022 06:47:41 +0300

>Hi,
>
>I've been working on an online Ladino (Judeo-Espanyol) dictionary
>https://diksionaryo.szabgab.com/ The code is open source the content is
>CC BY-SA 4.0  https://creativecommons.org/licenses/by-sa/4.0/
>All linked from the About page.
>Along with the creation of the translation I also have a (growing)
>list of ladino words.
>
>I would like to make this available as a spell checker in various Open
>Source tools.
>E.g. Firefox, Chromium, LibreOffice etc.
>I wrote about it a few weeks ago
>https://szabgab.com/add-spellchecker-to-various-applications.html but
>I am still unclear what and how to do.
>
>I started to generate a pair of files that resemble the format of
>hspell, but I don't know how to really test them and in any case they
>don't seem to work well.
>I also don't know how to distribute what I already have and how to
>make it included in those projects.
>
>Anyone here has experience with spell-checkers?
>Could anyone help me in the project or at least point me in the right
>direction?

I'm building a standalone spellchecker for well-formed XML HTML5
right now. Maybe we can correspond. I don't know how to make a
soundsalike algorithm.

Thanks,

SteveT

Steve Litt 
March 2022 featured book: Making Mental Models: Advanced Edition
http://www.troubleshooters.com/mmm



Re: Creating a Ladino spell-checker and including it in OS projects

2022-06-15 Thread Gabor Szabo
On Mon, Jun 13, 2022 at 9:00 AM Dan Kenigsberg 
wrote:

> On Sun, Jun 12, 2022 at 04:24:27PM +0300, Gabor Szabo wrote:
> > Hi Dan,
> >
> >
> > On Tue, Jun 7, 2022 at 11:18 PM Dan Kenigsberg
> > wrote:
> >
> > > On Wed, Jun 01, 2022 at 06:47:41AM +0300, Gabor Szabo wrote:
> > > > Hi,
> > > >
> > > > I've been working on an online Ladino (Judeo-Espanyol) dictionary
> > > > https://diksionaryo.szabgab.com/ The code is open source the
> content is
> > > > CC BY-SA 4.0  https://creativecommons.org/licenses/by-sa/4.0/
> > > > All linked from the About page.
> > > > Along with the creation of the translation I also have a (growing)
> list
> > > of
> > > > ladino words.
> > > >
> > > > I would like to make this available as a spell checker in various
> Open
> > > > Source tools.
> > > > E.g. Firefox, Chromium, LibreOffice etc.
> > > > I wrote about it a few weeks ago
> > > > https://szabgab.com/add-spellchecker-to-various-applications.html
> but I
> > > am
> > > > still unclear what and how to do.
> > > >
> > > > I started to generate a pair of files that resemble the format of
> hspell,
> > > > but I don't know how to really test them and in any case they don't
> seem
> > > to
> > > > work well.
> > > > I also don't know how to distribute what I already have and how to
> make
> > > it
> > > > included in those projects.
> > > >
> > > > Anyone here has experience with spell-checkers?
> > > > Could anyone help me in the project or at least point me in the right
> > > > direction?
> > >
> > > Well, if I were you, I'd start by creating a github repository with
> your
> > > code and a tagged version of your artifacts, these .aff and .dic files
> > > used by hunspell.
> > >
> >
> > It is being generated now on every push:
> > https://github.com/szabgab/ladino-diksionaryo-generated/
>
> Thanks for the URL. But where are the artifacts? They probably hide in
> plain sight... Can you provide a URL to the .aff/.dic files?
>

Oh well, GitHub can be tricky sometimes :)
(I think this is the direct link to download the zip of the two files:
https://github.com/szabgab/ladino-diksionaryo-generated/suites/6938585306/artifacts/270273527
)

Manually:
Visit the project repo:
https://github.com/szabgab/ladino-diksionaryo-generated/
click on "Actions"
then on the job that created the artifact (in this case it is called CI)
There you'll have the artifacts of the project
e.g. This is the direct link to a recent build
https://github.com/szabgab/ladino-diksionaryo-generated/actions/runs/2500028854

If you now click on "hunspell" it will download the two files in a zip.

AFAIK the artifacts are removed after a few weeks so these links will be
gone, but the description above should still work.
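
Since the links expire, the current artifacts can also be looked up through the GitHub REST API instead of clicking through the web pages. The sketch below lists the artifacts of a repository and keeps the ones that have not expired. The helper names are made up for this example. Listing should work without a token for a public repository, but, as far as I understand, downloading the zip from archive_download_url needs an authenticated request, so that step is left out.

```python
import json
import urllib.request

API = "https://api.github.com/repos/{owner}/{repo}/actions/artifacts"


def artifacts_url(owner, repo):
    return API.format(owner=owner, repo=repo)


def live_artifacts(payload, name=None):
    """Return (name, archive_download_url) pairs for non-expired artifacts."""
    return [
        (item["name"], item["archive_download_url"])
        for item in payload.get("artifacts", [])
        if not item.get("expired") and (name is None or item["name"] == name)
    ]


def fetch_artifacts(owner, repo):
    # Network call; not executed on import.
    request = urllib.request.Request(
        artifacts_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = fetch_artifacts("szabgab", "ladino-diksionaryo-generated")
    for name, url in live_artifacts(payload, name="hunspell"):
        print(name, url)
```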



>
> >
> > I can put on some tags if you think they are important for some reason,
> but
> > I don't have specific release points.
> > Every change in the dictionary triggers the re-build of the whole web
> site
> > and the two files as well.
> >
> >
> >
> > > This would give anyone with high enough motivation the ability to
> > > test it on their own machine (I may volunteer).
> > >
> >
> > I'd really like to know how you (or someone else) test it.
>
> `hunspell -D` shows where you can drop the files; then `hunspell -d
> language` would let me spell-check a text, say a random page from
> https://lad.wikipedia.org.
>
>
Thanks. And yeah, the Ladino version of Wikipedia is quite bad, as I am
told, because it is written mostly by Spanish speakers
who include a lot of words from modern Spanish instead of using Ladino.
Fixing that is another project to work on :)

Gabor


Re: Creating a Ladino spell-checker and including it in OS projects

2022-06-13 Thread Dan Kenigsberg
On Sun, Jun 12, 2022 at 04:24:27PM +0300, Gabor Szabo wrote:
> Hi Dan,
> 
> 
> On Tue, Jun 7, 2022 at 11:18 PM Dan Kenigsberg 
> wrote:
> 
> > On Wed, Jun 01, 2022 at 06:47:41AM +0300, Gabor Szabo wrote:
> > > Hi,
> > >
> > > I've been working on an online Ladino (Judeo-Espanyol) dictionary
> > > https://diksionaryo.szabgab.com/ The code is open source the content is
> > > CC BY-SA 4.0  https://creativecommons.org/licenses/by-sa/4.0/
> > > All linked from the About page.
> > > Along with the creation of the translation I also have a (growing) list
> > of
> > > ladino words.
> > >
> > > I would like to make this available as a spell checker in various Open
> > > Source tools.
> > > E.g. Firefox, Chromium, LibreOffice etc.
> > > I wrote about it a few weeks ago
> > > https://szabgab.com/add-spellchecker-to-various-applications.html but I
> > am
> > > still unclear what and how to do.
> > >
> > > I started to generate a pair of files that resemble the format of hspell,
> > > but I don't know how to really test them and in any case they don't seem
> > to
> > > work well.
> > > I also don't know how to distribute what I already have and how to make
> > it
> > > included in those projects.
> > >
> > > Anyone here has experience with spell-checkers?
> > > Could anyone help me in the project or at least point me in the right
> > > direction?
> >
> > Well, if I were you, I'd start by creating a github repository with your
> > code and a tagged version of your artifacts, these .aff and .dic files
> > used by hunspell.
> >
> 
> It is being generated now on every push:
> https://github.com/szabgab/ladino-diksionaryo-generated/

Thanks for the URL. But where are the artifacts? They probably hide in
plain sight... Can you provide a URL to the .aff/.dic files?

> 
> I can put on some tags if you think they are important for some reason, but
> I don't have specific release points.
> Every change in the dictionary triggers the re-build of the whole web site
> and the two files as well.
> 
> 
> 
> > This would give anyone with high enough motivation the ability to test it
> > on their own machine (I may volunteer).
> >
> 
> I'd really like to know how you (or someone else) test it.

`hunspell -D` shows where you can drop the files; then `hunspell -d
language` would let me spell-check a text, say a random page from
https://lad.wikipedia.org.
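
For repeated testing, the second command can be wrapped in a small script. The sketch below pipes text to `hunspell -d <dictionary> -l`, where `-l` makes hunspell print only the misspelled words. The function names and the dictionary name "lad" are assumptions for this example; the script expects hunspell to be installed and the .aff/.dic pair to sit in one of the directories listed by `hunspell -D`.

```python
import shutil
import subprocess


def hunspell_list_command(dictionary):
    # -d selects the dictionary by base name (no .aff/.dic extension);
    # -l prints only the misspelled words.
    return ["hunspell", "-d", dictionary, "-l"]


def misspelled_words(text, dictionary):
    """Return the words in `text` that hunspell does not recognize."""
    if shutil.which("hunspell") is None:
        raise RuntimeError("hunspell is not installed or not on PATH")
    result = subprocess.run(
        hunspell_list_command(dictionary),
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # "lad" is a hypothetical dictionary name (lad.aff / lad.dic).
    print(misspelled_words("La kaza i el livro", "lad"))
```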


Regards,
Dan.




Re: Creating a Ladino spell-checker and including it in OS projects

2022-06-12 Thread Gabor Szabo
Hi Dan,


On Tue, Jun 7, 2022 at 11:18 PM Dan Kenigsberg 
wrote:

> On Wed, Jun 01, 2022 at 06:47:41AM +0300, Gabor Szabo wrote:
> > Hi,
> >
> > I've been working on an online Ladino (Judeo-Espanyol) dictionary
> > https://diksionaryo.szabgab.com/ The code is open source the content is
> > CC BY-SA 4.0  https://creativecommons.org/licenses/by-sa/4.0/
> > All linked from the About page.
> > Along with the creation of the translation I also have a (growing) list
> of
> > ladino words.
> >
> > I would like to make this available as a spell checker in various Open
> > Source tools.
> > E.g. Firefox, Chromium, LibreOffice etc.
> > I wrote about it a few weeks ago
> > https://szabgab.com/add-spellchecker-to-various-applications.html but I
> am
> > still unclear what and how to do.
> >
> > I started to generate a pair of files that resemble the format of hspell,
> > but I don't know how to really test them and in any case they don't seem
> to
> > work well.
> > I also don't know how to distribute what I already have and how to make
> it
> > included in those projects.
> >
> > Anyone here has experience with spell-checkers?
> > Could anyone help me in the project or at least point me in the right
> > direction?
>
> Well, if I were you, I'd start by creating a github repository with your
> code and a tagged version of your artifacts, these .aff and .dic files
> used by hunspell.
>

It is being generated now on every push:
https://github.com/szabgab/ladino-diksionaryo-generated/

I can put on some tags if you think they are important for some reason, but
I don't have specific release points.
Every change in the dictionary triggers the re-build of the whole web site
and the two files as well.



> This would give anyone with high enough motivation the ability to test it
> on their own machine (I may volunteer).
>

I'd really like to know how you (or someone else) test it.


>
> Next, I'm afraid, comes specialized packaging for different environments.
> In hspell I added Makefile targets for hunspell-he.rpm and for Mozilla
> xpi. The former is good enough for most applications on
> Fedora/CentOS/RHEL. Then you could propose the rpm to Fedora;
> or the xpi to Mozilla; or hope that someone else does it for you for
> OSes and applications you are not familiar with.
>

Thanks, I'll look into that.


>
> It's a long process, but when it works it's quite satisfying.
>
>
I am sure. The whole dictionary creation project will take a long time, but
if I can show more value coming out
of the project (e.g. a spell checker in FF or Chrome), that might
motivate more people to help out.

BTW, do you or anyone else on the list have suggestions for other OS
projects that might be worth targeting
for including the Ladino spell-checker or dictionary?

regards
   Gabor


Re: Creating a Ladino spell-checker and including it in OS projects

2022-06-07 Thread Dan Kenigsberg
On Wed, Jun 01, 2022 at 06:47:41AM +0300, Gabor Szabo wrote:
> Hi,
> 
> I've been working on an online Ladino (Judeo-Espanyol) dictionary
> https://diksionaryo.szabgab.com/ The code is open source the content is
> CC BY-SA 4.0  https://creativecommons.org/licenses/by-sa/4.0/
> All linked from the About page.
> Along with the creation of the translation I also have a (growing) list of
> ladino words.
> 
> I would like to make this available as a spell checker in various Open
> Source tools.
> E.g. Firefox, Chromium, LibreOffice etc.
> I wrote about it a few weeks ago
> https://szabgab.com/add-spellchecker-to-various-applications.html but I am
> still unclear what and how to do.
> 
> I started to generate a pair of files that resemble the format of hspell,
> but I don't know how to really test them and in any case they don't seem to
> work well.
> I also don't know how to distribute what I already have and how to make it
> included in those projects.
> 
> Anyone here has experience with spell-checkers?
> Could anyone help me in the project or at least point me in the right
> direction?

Well, if I were you, I'd start by creating a github repository with your
code and a tagged version of your artifacts, these .aff and .dic files
used by hunspell.
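
For reference, the smallest useful pair of files is easy to produce from a plain word list: the .dic file starts with a line holding the (approximate) number of entries followed by one word per line, and the .aff file can be as short as an encoding declaration. The sketch below writes such a pair. The function name, the file base name "lad" and the sample words are assumptions for illustration, and a real dictionary would normally add affix rules to the .aff file.

```python
from pathlib import Path


def write_hunspell_files(words, out_dir, name="lad"):
    """Write a minimal <name>.dic / <name>.aff pair and return both paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    unique = sorted({word.strip() for word in words if word.strip()})

    # .dic: entry count on the first line, then one word per line.
    dic_path = out_dir / f"{name}.dic"
    dic_path.write_text("\n".join([str(len(unique)), *unique]) + "\n", encoding="utf-8")

    # .aff: only declares the encoding; no affix rules in this sketch.
    aff_path = out_dir / f"{name}.aff"
    aff_path.write_text("SET UTF-8\n", encoding="utf-8")
    return dic_path, aff_path


if __name__ == "__main__":
    # Sample words for illustration only.
    dic, aff = write_hunspell_files(["kaza", "livro", "kaza"], "build")
    print(dic.read_text(encoding="utf-8"))
```

Committing the two generated files under a tag gives testers a fixed version to refer to.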

This would give anyone with high enough motivation the ability to test it
on their own machine (I may volunteer).

Next, I'm afraid, comes specialized packaging for different environments.
In hspell I added Makefile targets for hunspell-he.rpm and for Mozilla
xpi. The former is good enough for most applications on
Fedora/CentOS/RHEL. Then you could propose the rpm to Fedora;
or the xpi to Mozilla; or hope that someone else does it for you for
OSes and applications you are not familiar with.

It's a long process, but when it works it's quite satisfying.

Good luck,
Dan.

