Re: FOSS tool to do general stats from text indata

2023-06-30 Thread Emanuel Berg
>> Well if you were prepared to type a search for
>> computational linguistics software into google, you would
>> find several free tools available for linux listed on pages
>> such as
>>
>> https://martinweisser.org/corpora_site/comp_ling_resources.html
>
> Indeed, that page has 4 hits for Unix and 3 for Linux.
>
>> https://www.sil.org/linguistics/linguistics-software
>
> Ditto 1 hit for Unix and 19 (!) for Linux.
>
> So, a total of 5 Unix hits and 22 Linux, all in all 27 hits,
> possible duplicates not subtracted.

Ah, if you are not into this field, making sense of those
lists of software is like searching for a microchip in
a supercomputer ...

But I did find a Debian package (metapackage) named
science-linguistics:

$ aptitude show science-linguistics
Package: science-linguistics
Version: 1.14.2
State: not installed
Priority: optional
Section: metapackages
Maintainer: Debian Science Team 

Architecture: all
Uncompressed Size: 43.0 k
Depends: science-config (= 1.14.2), science-tasks (= 1.14.2)
Recommends: apertium, apertium-lex-tools, artha, cg3, collatinus, dimbl, frog,
hfst, hfst-ospell, irstlm, libcld2-dev, link-grammar, lttoolbox,
mbt, mbtserver, python3-pynlpl, r-cran-lexrankr, r-cran-snowballc,
timbl, timblserver, ucto, uctodata, wordnet
Suggests: apertium-af-nl, apertium-apy, apertium-arg, apertium-arg-cat,
  apertium-bel, apertium-bel-rus, apertium-br-fr, apertium-ca-it,
  apertium-cat, apertium-cat-srd, apertium-crh, apertium-crh-tur,
  apertium-cy-en, apertium-dan, apertium-dan-nor, apertium-en-ca,
  apertium-en-es, apertium-en-gl, apertium-eo-ca, apertium-eo-en,
  apertium-eo-es, apertium-eo-fr, apertium-es-ast, apertium-es-ca,
  apertium-es-gl, apertium-es-it, apertium-es-pt, apertium-es-ro,
  apertium-eu-en, apertium-eu-es, apertium-fr-ca, apertium-fr-es,
  apertium-fra, apertium-fra-cat, apertium-hbs, apertium-hbs-eng,
  apertium-hbs-mkd, apertium-hbs-slv, apertium-hin, apertium-id-ms,
  apertium-is-sv, apertium-isl, apertium-isl-eng, apertium-ita,
  apertium-kaz, apertium-kaz-tat, apertium-mk-bg, apertium-mk-en,
  apertium-mlt-ara, apertium-nno, apertium-nno-nob, apertium-nob,
  apertium-oc-ca, apertium-oc-es, apertium-oci, apertium-pol,
  apertium-pt-ca, apertium-pt-gl, apertium-rus, apertium-separable,
  apertium-sme-nob, apertium-spa, apertium-spa-arg, apertium-srd,
  apertium-srd-ita, apertium-swe, apertium-swe-dan, apertium-swe-nor,
  apertium-szl, apertium-tat, apertium-tur, apertium-ukr, apertium-urd,
  apertium-urd-hin, frogdata, giella-sme, libcg3-dev, libfolia-dev,
  libmbt0-dev, libticcutils2-dev, libtimbl3-dev, libtimbl4-dev,
  libtimblserver2-dev, libucto1-dev, python3-nltk,
  python3-snowballstemmer, python3-streamparser, python3-thinc,
  python3-timbl, r-cran-nlp, r-cran-tm, sequitur-g2p, spacy, travatar,
  wnsqlbuilder
Description: Debian Science Linguistics packages
 This metapackage is part of the Debian Pure Blend "Debian Science" and
 installs packages related to Linguistics.
Homepage: https://wiki.debian.org/DebianScience/
Tags: field::linguistics, role::metapackage, suite::debian

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-30 Thread Emanuel Berg
debian-user wrote:

> Well if you were prepared to type a search for computational
> linguistics software into google, you would find several
> free tools available for linux listed on pages such as
>
> https://martinweisser.org/corpora_site/comp_ling_resources.html

Indeed, that page has 4 hits for Unix and 3 for Linux.

> https://www.sil.org/linguistics/linguistics-software

Ditto 1 hit for Unix and 19 (!) for Linux.

So, a total of 5 Unix hits and 22 Linux, all in all 27 hits,
possible duplicates not subtracted.

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-30 Thread debian-user
Emanuel Berg  wrote:
> Nicholas Geovanis wrote:
> 
> > Those books teach and discuss some of the software that's
> > used. I doubt you will find them in debian's repositories.
> > Of course you can do plenty of computational linguistics
> > with perl or python which you already have.
> >
> > What is a "regular expression" which is at the heart of perl
> > and python? An expression which conforms to a certain type
> > of grammar. Perl and python are used directly for analyzing
> > text (any old language). You are learning basic
> > computational linguistics.  
> 
> Okay, but if there isn't a tool readily available I think this
> is a window for a bunch of young programmers that feel the
> need to show their skills. It could be a degree project in
> Computer Science even, unless the Computational Linguistics
> guys have their own degree projects. If so, they can borrow
> FOSS and CLI from us and we'd get the tool as well when they
> are done, that would be a fair trade IMO :)

Well if you were prepared to type a search for computational
linguistics software into google, you would find several free tools
available for linux listed on pages such as

https://martinweisser.org/corpora_site/comp_ling_resources.html
https://www.sil.org/linguistics/linguistics-software

and other pages containing reviews of such software, so perhaps you
could start there rather than writing your own?



Re: FOSS tool to do general stats from text indata

2023-06-30 Thread Emanuel Berg
>> A basic search finds this web tool: 
>>
>> https://www.usingenglish.com/resources/text-statistics/
>
> I didn't get it to work in Emacs-w3m, be it lack of JavaScript
> support or something else. Anyway, the page claims the tool
> does this:
>
>   Total Word Count
>   Total Word Count (Excluding Common Words)
>   Number of Different Words
>   Different Words (Excluding Common Words)
>   Number of Paragraphs
>   Number of Sentences
>   Words per Sentence
>   Number of Characters (all)
>   Number of Characters (a-z)
>   Characters per Word
>   Syllables
>   Syllables per Word
>
> Sure, if one had a CLI tool doing that, I would say it's
> certainly a good start!

I have now tried it from a smartphone and it works great. It
does what I quoted above and actually much more: more
interesting things are analyzed and output as well,
including diagrams.

Alas, some output is not available unless one pays for the
enhanced version - I suppose that makes it shareware, as we
said in the 90s - but it still does a lot in its
current state.

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-30 Thread Emanuel Berg
Nicholas Geovanis wrote:

> Those books teach and discuss some of the software that's
> used. I doubt you will find them in debian's repositories.
> Of course you can do plenty of computational linguistics
> with perl or python which you already have.
>
> What is a "regular expression" which is at the heart of perl
> and python? An expression which conforms to a certain type
> of grammar. Perl and python are used directly for analyzing
> text (any old language). You are learning basic
> computational linguistics.

Okay, but if there isn't a tool readily available I think this
is a window for a bunch of young programmers that feel the
need to show their skills. It could be a degree project in
Computer Science even, unless the Computational Linguistics
guys have their own degree projects. If so, they can borrow
FOSS and CLI from us and we'd get the tool as well when they
are done, that would be a fair trade IMO :)

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-30 Thread Nicholas Geovanis
On Fri, Jun 30, 2023, 10:32 AM Emanuel Berg  wrote:

> Nicholas Geovanis wrote:
>
> > If you have python programming skills, you might
> > consider NLTK
> 
> >>>> Unbelievable if there are no such tools anywhere already,
> >>>> but I don't have one either so maybe there aren't then?
> >>>
> >>> There's a big subject called computational linguistics.
> >>> They have some specialized tools for what they call corpus
> >>> analysis. Because you mentioned statistics you threw
> >>> everyone off :-) And I really like R.
> >>
> >> Okay, so now we are getting somewhere. The technical term
> >> and scientific field of this activity is known as
> >> computational linguistics, and the guys that do that do
> >> corpus analysis. Sweet!
> >
> > Two standard text books are Foundations of Computational
> > Linguistics by R Hausser, and Computational Linguistics: An
> > Introduction by R Grishman.
> >
> > Syntactical analysis of human and artificial (programming)
> > languages is well known. But how do you attach meaning to
> > the symbols? Semantics. How do you identify style and
> > emphasis? These are the kind of starting points for
> > computational linguistics.
>
> Okay, but do we have software in the Debian repositories, or
> anywhere else in the Unix and FOSS world for that matter, so
> we can try it out in practice?
>

Those books teach and discuss some of the software that's used. I doubt you
will find them in debian's repositories. Of course you can do plenty of
computational linguistics with perl or python which you already have.

What is a "regular expression" which is at the heart of perl and python? An
expression which conforms to a certain type of grammar. Perl and python are
used directly for analyzing text (any old language). You are learning basic
computational linguistics.
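
A minimal sketch of that idea in Python, using only the standard
library. The pattern WORD and both function names are made up for
illustration, and the notion of a "word" here (ASCII letters with an
optional apostrophe part) is an assumption, not a linguistic standard:

```python
import re
from collections import Counter

# Hypothetical word pattern: ASCII letters, optionally followed by an
# apostrophe and more letters (so "it's" counts as one word).
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def most_common_words(text, n=3):
    # Case-insensitive word frequencies, highest counts first.
    return Counter(w.lower() for w in WORD.findall(text)).most_common(n)

def average_word_length(text):
    # Mean number of characters per matched word; 0.0 for empty input.
    found = WORD.findall(text)
    return sum(len(w) for w in found) / len(found) if found else 0.0
```

Calling most_common_words(open("mail.txt").read()) on a saved mail
(the file name is only an example) would list its three most
frequent words with their counts.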



Re: FOSS tool to do general stats from text indata

2023-06-30 Thread Emanuel Berg
Joel Roth wrote:

> A basic search finds this web tool: 
>
> https://www.usingenglish.com/resources/text-statistics/

I didn't get it to work in Emacs-w3m, be it lack of JavaScript
support or something else. Anyway, the page claims the tool
does this:

  Total Word Count
  Total Word Count (Excluding Common Words)
  Number of Different Words
  Different Words (Excluding Common Words)
  Number of Paragraphs
  Number of Sentences
  Words per Sentence
  Number of Characters (all)
  Number of Characters (a-z)
  Characters per Word
  Syllables
  Syllables per Word

Sure, if one had a CLI tool doing that, I would say it's
certainly a good start!
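
As a rough sketch of what such a CLI tool could compute, here is
a small Python function covering most of the list above. The function
name text_stats and the dictionary keys are hypothetical, the sentence
and paragraph splitting is deliberately naive, and the "common words"
and syllable figures are left out because they would need a word list
and a pronunciation heuristic:

```python
import re

def text_stats(text):
    # Words: runs of ASCII letters and apostrophes (a crude assumption).
    words = re.findall(r"[A-Za-z']+", text)
    # Paragraphs: blocks separated by at least one blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences: naive split on runs of ., ! and ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Letters a-z in either case, ignoring digits and punctuation.
    letters = sum(c.isalpha() and c.isascii() for c in text)
    return {
        "total_words": len(words),
        "different_words": len({w.lower() for w in words}),
        "paragraphs": len(paragraphs),
        "sentences": len(sentences),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "characters_all": len(text),
        "characters_a_z": letters,
        "characters_per_word": letters / len(words) if words else 0.0,
    }
```

Abbreviations such as "e.g." will be miscounted as sentence ends, so
the sentence figures are only approximate.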

> Otherwise, I think you'll have to write your own -- or hire
> someone (like me :^) to write one for you.

I have to squeeze the money out of my political organizations
first ...

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-30 Thread Emanuel Berg
Nicholas Geovanis wrote:

> If you have python programming skills, you might
> consider NLTK

>>>> Unbelievable if there are no such tools anywhere already,
>>>> but I don't have one either so maybe there aren't then?
>>>
>>> There's a big subject called computational linguistics.
>>> They have some specialized tools for what they call corpus
>>> analysis. Because you mentioned statistics you threw
>>> everyone off :-) And I really like R.
>>
>> Okay, so now we are getting somewhere. The technical term
>> and scientific field of this activity is known as
>> computational linguistics, and the guys that do that do
>> corpus analysis. Sweet!
>
> Two standard text books are Foundations of Computational
> Linguistics by R Hausser, and Computational Linguistics: An
> Introduction by R Grishman.
>
> Syntactical analysis of human and artificial (programming)
> languages is well known. But how do you attach meaning to
> the symbols? Semantics. How do you identify style and
> emphasis? These are the kind of starting points for
> computational linguistics.

Okay, but do we have software in the Debian repositories, or
anywhere else in the Unix and FOSS world for that matter, so
we can try it out in practice?

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-30 Thread Nicholas Geovanis
On Fri, Jun 30, 2023, 8:32 AM Emanuel Berg  wrote:

> Nicholas Geovanis wrote:
>
> >>> If you have python programming skills, you might consider
> >>> NLTK
> >>
> >> Unbelievable if there are no such tools anywhere already,
> >> but I don't have one either so maybe there aren't then?
> >>
> >
> > There's a big subject called computational linguistics.
> > They have some specialized tools for what they call corpus
> > analysis. Because you mentioned statistics you threw
> > everyone off :-) And I really like R.
>
> Okay, so now we are getting somewhere. The technical term and
> scientific field of this activity is known as computational
> linguistics, and the guys that do that do corpus
> analysis. Sweet!
>

Two standard text books are Foundations of Computational Linguistics by R
Hausser, and Computational Linguistics: An Introduction by R Grishman.

Syntactical analysis of human and artificial (programming) languages is
well known. But how do you attach meaning to the symbols? Semantics. How do
you identify style and emphasis? These are the kind of starting points for
computational linguistics.



Re: FOSS tool to do general stats from text indata

2023-06-30 Thread Emanuel Berg
Nicholas Geovanis wrote:

>>> If you have python programming skills, you might consider
>>> NLTK
>>
>> Unbelievable if there are no such tools anywhere already,
>> but I don't have one either so maybe there aren't then?
>>
>
> There's a big subject called computational linguistics.
> They have some specialized tools for what they call corpus
> analysis. Because you mentioned statistics you threw
> everyone off :-) And I really like R.

Okay, so now we are getting somewhere. The technical term and
scientific field of this activity is known as computational
linguistics, and the guys that do that do corpus
analysis. Sweet!

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-28 Thread Nicholas Geovanis
On Sat, Jun 24, 2023, 3:04 PM Emanuel Berg  wrote:

> Cousin Stanley wrote:
>
> > If you have python programming skills, you might consider
> > NLTK
>
> Unbelievable if there are no such tools anywhere already, but
> I don't have one either so maybe there aren't then?
>

There's a big subject called computational linguistics. They have some
specialized tools for what they call corpus analysis. Because you mentioned
statistics you threw everyone off :-)
And I really like R.



Re: FOSS tool to do general stats from text indata

2023-06-28 Thread Emanuel Berg
dvalin wrote:

> As "stats" is a grab bag larger inside than the Tardis,
> I suspect that only on that other ship with the infinite
> improbability drive is a stats babelfish interpreter to be
> found. For the last 30+ years, I've just thrown together
> a few lines of Awk to generate the initially required stats,
> then tweaked the C-like code and regexes to add the
> inevitable nice-to-haves. Some result is immediate, and
> dissatisfaction with completeness motivates the
> tweaking/temporary_satisfaction cycle. Options are
> limitless, as is needed for an undefined task [...]

Haha, show us some stats then!

*handclaps in anticipation*

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-25 Thread tomas
On Sun, Jun 25, 2023 at 08:28:05AM +0200, Emanuel Berg wrote:
> tomas wrote:
> 
> >>>> I mean a general tool, but with options to tweak the
> >>>> report included, of course.
> >>>
> >>> If you can bear some tweaking, R is it.
> >> 
> >> Sure! Let's run R on this e-mail. Does it work and if so, what
> >> does it say?
> >
> > To a generic question -- a generic answer
> 
> R is a programming language, I'm looking for a tool that
> produces stats from text [...]

I give up.

Cheers
-- 
t




Re: FOSS tool to do general stats from text indata

2023-06-25 Thread Emanuel Berg
tomas wrote:

>>>> I mean a general tool, but with options to tweak the
>>>> report included, of course.
>>>
>>> If you can bear some tweaking, R is it.
>> 
>> Sure! Let's run R on this e-mail. Does it work and if so, what
>> does it say?
>
> To a generic question -- a generic answer

R is a programming language, I'm looking for a tool that
produces stats from text. If such a tool uses R, or any other
programming language or stats engine to produce the outcome,
for me as a potential user that is entirely up to them who
write it.

> I don't even know what you mean by "general stats"

Some examples from doing stats on text are: average word
length, most commonly used words, the longest paragraphs ...

Those are simple examples. At the next step it gets more
interesting, as the tool could show what is statistically
unusual: fun/exotic stats that a human user would probably
not spot.

E.g., parsing this mail, it could say "Emanuel Berg is almost
always calm and collected, entirely professional in his
approach, but here in the 4th paragraph of his mail he gets
VISIBLY UPSET using CAPS ONLY, possibly expressing FRUSTRATION
about NOT BEING UNDERSTOOD."
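
One way to sketch that kind of "statistically unusual" check in
Python: compute the share of all-caps words per paragraph and flag
paragraphs that lie well above the mail's own average. The function
names and the z_limit default of 1.5 are arbitrary choices made for
this example, not an established method:

```python
import re
from statistics import mean, pstdev

def caps_ratio(paragraph):
    # Share of words (two or more letters) written entirely in capitals.
    words = re.findall(r"[A-Za-z]{2,}", paragraph)
    if not words:
        return 0.0
    return sum(w.isupper() for w in words) / len(words)

def unusual_paragraphs(text, z_limit=1.5):
    # Paragraphs are blocks separated by blank lines.
    paras = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    ratios = [caps_ratio(p) for p in paras]
    if len(ratios) < 2:
        return []
    mu, sigma = mean(ratios), pstdev(ratios)
    if sigma == 0:
        return []  # every paragraph looks the same, nothing stands out
    # Return (1-based paragraph number, ratio) for the outliers.
    return [(i + 1, r) for i, r in enumerate(ratios)
            if (r - mu) / sigma > z_limit]
```

The same pattern (measure something per paragraph, then look for
outliers) would work for sentence length, punctuation or any other
"step 1" number.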

> the sports example you put in the other mail suggests that
> you want statistics gathered about a subject from written
> text

In the sports world they input the stats manually and that
data is then crunched by computers to produce lists and neat
graphics for their broadcasts. This is the first step
described above. This isn't unlike for example Emacs
`count-words-region' in combination with gnuplot - indeed, it
is exactly the same, almost, as these chars I type now are
produced manually, then Emacs could count and gnuplot
could show.
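
A small Python sketch of the counting half of that pipeline, assuming
one wants a word-length histogram. The function name
word_length_table and the two-column "length count" layout are
choices made for this example:

```python
import re
from collections import Counter

def word_length_table(text):
    # Count how many words there are of each length.
    lengths = Counter(len(w) for w in re.findall(r"[A-Za-z']+", text))
    # One "length count" row per line, sorted by word length.
    return "\n".join(f"{n} {lengths[n]}" for n in sorted(lengths)) + "\n"
```

Written to a file, say lengths.dat, the table could then be shown
with something like: plot "lengths.dat" using 1:2 with boxes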

This first step would be neat depending on how much stuff is
quantified, the more the better obviously.

The second step however, that would be those "fun facts" the
commentators say, these are more advanced, like, and now
I just make something up, "Here is an amazing figure.
Player X has the worst stats on face-offs in his team, except
when the team plays on its home field and is down by two or
more goals, then he is 2nd best".

Having that second step for text would of course be even
more exciting.

I don't know if those crazy stats are discovered by a bunch of
fanatic hockey nerds just using the "step 1 stats" in creative
combinations - maybe using some sort of relational algebra
approach? - _or_ if they have some stats engine that crunches
the stats further to the meta-stats level, if you will, to
have the weird facts pop up automatically?

But yeah, if we don't even have a proper "step 1 stats" tool
for text - which is impossible to believe BTW - well,
obviously one can only dream of a "step 2 stats", a stats tool
on the meta level ...

> involves "understanding texts written in human languages",
> another big can of worms (which has become somewhat
> fashionable as of late).

It is not about understanding, it is about finding patterns
and meta-patterns, finding statistics that are themselves
statistically uncommon, which is why they are interesting.
Think exceptions and unexpected interrelations. Again, the
best example is probably a combination of the different stats
available at the "step 1 stats" level.

> If it's text statistics, good statistics packages have lots
> of resources. R is a good statistics package

Yeah, maybe I should ask them, but as Debian is such a huge
system one would think someone here could show us how it, or
similar software, can be used on a bunch of text, for
example on a mail like this.

It is already a bunch of data, surely you are not saying there
isn't a tool to tell us something of that data?

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-24 Thread tomas
On Sat, Jun 24, 2023 at 10:00:05PM +0200, Emanuel Berg wrote:
> tomas wrote:
> 
> >> Is there a CLI and FOSS tool that creates stats from text
> >> indata - e.g.,
> >> 
> >>   $ txt2stats path/to/indata/*.txt
> >> 
> >> I mean a general tool, but with options to tweak the report
> >> included, of course.
> >
> > If you can bear some tweaking, R is it.
> 
> Sure! Let's run R on this e-mail. Does it work and if so, what
> does it say?

To a generic question -- a generic answer. I don't even know what
you mean by "general stats" -- the sports example you put in the
other mail suggests that you want statistics gathered about a
subject from written text: this is far more than "just" stats
and involves "understanding texts written in human languages",
another big can of worms (which has become somewhat fashionable
as of late).

If it's text statistics, good statistics packages have lots of
resources. R is a good statistics package with a big community,
so it has:

  https://towardsdatascience.com/a-light-introduction-to-text-analysis-in-r-ea291a9865a8?gi=001414a39e96
  https://www.r-bloggers.com/2021/02/text-analysis-with-r/
  https://bookdown.org/jdholster1/idsr/text-analysis.html
  https://m-clark.github.io/text-analysis-with-R/intro.html
  https://towardsdatascience.com/r-packages-for-text-analysis-ad8d86684adb?gi=4a426e671fe6
  https://www.springboard.com/blog/data-science/text-mining-in-r/
  https://m-clark.github.io/text-analysis-with-R/string-theory.html

That said, there are others. In the Python galaxy, there is
the Natural Language Toolkit

  https://www.nltk.org/

But your question was posed in a way that I don't even know
whether I'm wasting both our time with this answer.

Cheers
-- 
t




Re: FOSS tool to do general stats from text indata

2023-06-24 Thread John Hasler
 Emanuel Berg writes:
> Sure! Let's run R on this e-mail. Does it work and if so, what
> does it say?

Run 'apt-cache show r-base'.  You will want to look at all the 'r-cran'
packages for one that does what you need.
-- 
John Hasler 
j...@sugarbit.com
Elmwood, WI USA



Re: FOSS tool to do general stats from text indata

2023-06-24 Thread Emanuel Berg
tomas wrote:

>> Is there a CLI and FOSS tool that creates stats from text
>> indata - e.g.,
>> 
>>   $ txt2stats path/to/indata/*.txt
>> 
>> I mean a general tool, but with options to tweak the report
>> included, of course.
>
> If you can bear some tweaking, R is it.

Sure! Let's run R on this e-mail. Does it work and if so, what
does it say?

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-24 Thread Emanuel Berg
Cousin Stanley wrote:

> If you have python programming skills, you might consider
> NLTK

Unbelievable if there are no such tools anywhere already, but
I don't have one either so maybe there aren't then?

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-24 Thread Emanuel Berg
Joel Roth wrote:

> A basic search finds this web tool:
>
> https://www.usingenglish.com/resources/text-statistics/

Cool, I'll get back to you when I have tried it, God willing ...

> Otherwise, I think you'll have to write your own -- or hire
> someone (like me :^) to write one for you.

Surely there must be some awesome stats-from-text CLI tool in
the FOSS world?

What about the commercial/proprietary world?

How do they do it in professional sport (e.g. NHL) where the
commentators sometimes say, "Here are some amazing stats.
Some dude has now scored the most goals EVER in the last
period of games that were at the time etc etc". Do they find
out such things manually?

What about the financial world?

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-24 Thread Emanuel Berg
paulf wrote:

>>> I don't know about all of your wishlist, but gnuplot is
>>> the proper tool for taking data from, say, a CSV file, and
>>> putting it into graphs of various types.
>> 
>> Well, gnuplot is great obviously but is more a tool to
>> visualize data, organized data, here we need a tool to
>> analyze and find patterns in data that is in its original,
>> raw form.
>
> What you desire sounds like a job for AI. And that's beyond
> my ken.

It depends how you define AI, just stats from data sounds like
a pretty mechanical job to me but on the other hand no one
said AI can't be mechanical as in Asimov's robots for example.

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-24 Thread Cousin Stanley

On 2023-06-23 13:30, Emanuel Berg wrote:

> Is there a CLI and FOSS tool that creates stats from text
> indata - e.g.,
>
>   $ txt2stats path/to/indata/*.txt
>
> I mean a general tool, but with options to tweak the report
> included, of course.
>
> To produce neat stats, maybe even figures, and generate fun
> facts of the kind

  If you have python programming skills,
  you might consider 

NLTK :  Natural Language Toolkit

  google-ize  NLTK  for information


  I don't know if NLTK will lead you directly
  to the statistical numbers you are seeking
  but it will parse text, count words,
  provide word frequencies, enumerate sentences,
  etc 

  NLTK results could be fed to other python programs
  for graphical analysis.

  My personal experience with NLTK is limited
  and I haven't used it for anything other than
  simple tests just to get a feel for it 



--
Stanley C. Kitching
Human Being
Phoenix, Arizona



Re: FOSS tool to do general stats from text indata

2023-06-23 Thread tomas
On Fri, Jun 23, 2023 at 10:20:50PM +0200, Emanuel Berg wrote:
> Is there a CLI and FOSS tool that creates stats from text
> indata - e.g.,
> 
>   $ txt2stats path/to/indata/*.txt
> 
> I mean a general tool, but with options to tweak the report
> included, of course.

If you can bear some tweaking, R is it.

Cheers
-- 
t




Re: FOSS tool to do general stats from text indata

2023-06-23 Thread Joel Roth
On Fri, Jun 23, 2023 at 10:20:50PM +0200, Emanuel Berg wrote:
> Is there a CLI and FOSS tool that creates stats from text
> indata - e.g.,
> 
>   $ txt2stats path/to/indata/*.txt
> 
> I mean a general tool, but with options to tweak the report
> included, of course.
> 
> To produce neat stats, maybe even figures, and generate fun
> facts of the kind
> 
>    The longest word that occurs more frequently than 0.01 ...
> 
>    The most common words to start a sentence ...
> 
>    Average paragraph length ...
> 
>    And even more crazy facts and stuff that you never think
>    about until the stats tell you!
> 
> What do we have on that area?

A basic search finds this web tool: 

https://www.usingenglish.com/resources/text-statistics/

Otherwise, I think you'll have to write your own -- or
hire someone (like me :^) to write one for you. 


-- 
Joel Roth



Re: FOSS tool to do general stats from text indata

2023-06-23 Thread paulf
On Fri, 23 Jun 2023 23:05:10 +0200
Emanuel Berg  wrote:

> paulf wrote:
> 
> > I don't know about all of your wishlist, but gnuplot is the
> > proper tool for taking data from, say, a CSV file, and
> > putting it into graphs of various types.
> 
> Well, gnuplot is great obviously but is more a tool to
> visualize data, organized data, here we need a tool to analyze
> and find patterns in data that is in its original, raw form.

What you desire sounds like a job for AI. And that's beyond my ken.

Paul

-- 
Paul M. Foster
Personal Blog: http://noferblatz.com
Company Site: http://quillandmouse.com
Software Projects: https://gitlab.com/paulmfoster



Re: FOSS tool to do general stats from text indata

2023-06-23 Thread Emanuel Berg
paulf wrote:

> I don't know about all of your wishlist, but gnuplot is the
> proper tool for taking data from, say, a CSV file, and
> putting it into graphs of various types.

Well, gnuplot is great obviously but is more a tool to
visualize data, organized data, here we need a tool to analyze
and find patterns in data that is in its original, raw form.

But just to promote gnuplot further, as I got happy just by
you mentioning it, here is a cool diagram I once did:

  https://dataswamp.org/~incal/pimgs/comp/hits.png

And some others:

  https://dataswamp.org/~incal/figures/gnuplot/

Not a gnuplot veteran! But absolutely cool software.

-- 
underground experts united
https://dataswamp.org/~incal



Re: FOSS tool to do general stats from text indata

2023-06-23 Thread paulf
On Fri, 23 Jun 2023 22:20:50 +0200
Emanuel Berg  wrote:

> Is there a CLI and FOSS tool that creates stats from text
> indata - e.g.,
> 
>   $ txt2stats path/to/indata/*.txt
> 
> I mean a general tool, but with options to tweak the report
> included, of course.
> 
> To produce neat stats, maybe even figures, and generate fun
> facts of the kind
> 
>    The longest word that occurs more frequently than 0.01 ...
> 
>    The most common words to start a sentence ...
> 
>    Average paragraph length ...
> 
>    And even more crazy facts and stuff that you never think
>    about until the stats tell you!
> 
> What do we have on that area?

I don't know about all of your wishlist, but gnuplot is the proper tool
for taking data from, say, a CSV file, and putting it into graphs of
various types.

Paul

-- 
Paul M. Foster
Personal Blog: http://noferblatz.com
Company Site: http://quillandmouse.com
Software Projects: https://gitlab.com/paulmfoster



FOSS tool to do general stats from text indata

2023-06-23 Thread Emanuel Berg
Is there a CLI and FOSS tool that creates stats from text
indata - e.g.,

  $ txt2stats path/to/indata/*.txt

I mean a general tool, but with options to tweak the report
included, of course.

To produce neat stats, maybe even figures, and generate fun
facts of the kind

   The longest word that occurs more frequently than 0.01 ...

   The most common words to start a sentence ...

   Average paragraph length ...

   And even more crazy facts and stuff that you never think
   about until the stats tell you!

What do we have on that area?
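
To make the wish list concrete, here is a minimal Python sketch of
the three named facts, standard library only. txt2stats itself is
hypothetical, as are all function names below. "More frequently than
0.01" is read here as "more than 1% of all words", which is an
assumption, and the sentence splitting is naive:

```python
import re
from collections import Counter

def words(text):
    # Lowercased runs of letters and apostrophes; a crude tokenizer.
    return re.findall(r"[a-z']+", text.lower())

def paragraphs(text):
    # Paragraphs are blocks separated by one or more blank lines.
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]

def sentence_starters(text):
    # First word of each sentence, splitting naively after . ! ?
    starts = []
    for sentence in re.split(r"[.!?]+\s+", text):
        w = words(sentence)
        if w:
            starts.append(w[0])
    return Counter(starts)

def longest_frequent_word(text, threshold=0.01):
    # Longest word whose relative frequency exceeds the threshold.
    ws = words(text)
    counts = Counter(ws)
    frequent = [w for w, n in counts.items() if n / len(ws) > threshold]
    return max(frequent, key=len) if frequent else None

def average_paragraph_length(text):
    # Mean number of words per paragraph; 0.0 for empty input.
    ps = paragraphs(text)
    return sum(len(words(p)) for p in ps) / len(ps) if ps else 0.0
```

A command-line wrapper would only need to read each file named on
the command line and print these values.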

-- 
underground experts united
https://dataswamp.org/~incal