Re: FOSS tool to do general stats from text indata
>> Well if you were prepared to type a search for
>> computational linguistics software into google, you would
>> find several free tools available for linux listed on pages
>> such as
>>
>> https://martinweisser.org/corpora_site/comp_ling_resources.html
>
> Indeed, that page has 4 hits for Unix and 3 for Linux.
>
>> https://www.sil.org/linguistics/linguistics-software
>
> Ditto 1 hit for Unix and 19 (!) for Linux.
>
> So, a total of 5 Unix hits and 22 Linux, all in all 27 hits,
> possible duplicates not subtracted.

Ah, if you are not into this field, making sense of those lists of software is like searching for a microchip in a supercomputer ...

But I did find a Debian package (metapackage) named science-linguistics:

$ aptitude show science-linguistics
Package: science-linguistics
Version: 1.14.2
State: not installed
Priority: optional
Section: metapackages
Maintainer: Debian Science Team
Architecture: all
Uncompressed Size: 43.0 k
Depends: science-config (= 1.14.2), science-tasks (= 1.14.2)
Recommends: apertium, apertium-lex-tools, artha, cg3, collatinus, dimbl, frog, hfst, hfst-ospell, irstlm, libcld2-dev, link-grammar, lttoolbox, mbt, mbtserver, python3-pynlpl, r-cran-lexrankr, r-cran-snowballc, timbl, timblserver, ucto, uctodata, wordnet
Suggests: apertium-af-nl, apertium-apy, apertium-arg, apertium-arg-cat, apertium-bel, apertium-bel-rus, apertium-br-fr, apertium-ca-it, apertium-cat, apertium-cat-srd, apertium-crh, apertium-crh-tur, apertium-cy-en, apertium-dan, apertium-dan-nor, apertium-en-ca, apertium-en-es, apertium-en-gl, apertium-eo-ca, apertium-eo-en, apertium-eo-es, apertium-eo-fr, apertium-es-ast, apertium-es-ca, apertium-es-gl, apertium-es-it, apertium-es-pt, apertium-es-ro, apertium-eu-en, apertium-eu-es, apertium-fr-ca, apertium-fr-es, apertium-fra, apertium-fra-cat, apertium-hbs, apertium-hbs-eng, apertium-hbs-mkd, apertium-hbs-slv, apertium-hin, apertium-id-ms, apertium-is-sv, apertium-isl, apertium-isl-eng, apertium-ita, apertium-kaz, apertium-kaz-tat, apertium-mk-bg, apertium-mk-en, apertium-mlt-ara, apertium-nno, apertium-nno-nob, apertium-nob, apertium-oc-ca, apertium-oc-es, apertium-oci, apertium-pol, apertium-pt-ca, apertium-pt-gl, apertium-rus, apertium-separable, apertium-sme-nob, apertium-spa, apertium-spa-arg, apertium-srd, apertium-srd-ita, apertium-swe, apertium-swe-dan, apertium-swe-nor, apertium-szl, apertium-tat, apertium-tur, apertium-ukr, apertium-urd, apertium-urd-hin, frogdata, giella-sme, libcg3-dev, libfolia-dev, libmbt0-dev, libticcutils2-dev, libtimbl3-dev, libtimbl4-dev, libtimblserver2-dev, libucto1-dev, python3-nltk, python3-snowballstemmer, python3-streamparser, python3-thinc, python3-timbl, r-cran-nlp, r-cran-tm, sequitur-g2p, spacy, travatar, wnsqlbuilder
Description: Debian Science Linguistics packages
 This metapackage is part of the Debian Pure Blend "Debian Science" and installs packages related to Linguistics.
Homepage: https://wiki.debian.org/DebianScience/
Tags: field::linguistics, role::metapackage, suite::debian

-- 
underground experts united
https://dataswamp.org/~incal
Re: FOSS tool to do general stats from text indata
debian-user wrote: > Well if you were prepared to type a search for computational > linguistics software into google, you would find several > free tools available for linux listed on pages such as > > https://martinweisser.org/corpora_site/comp_ling_resources.html Indeed, that page has 4 hits for Unix and 3 for Linux. > https://www.sil.org/linguistics/linguistics-software Ditto 1 hit for Unix and 19 (!) for Linux. So, a total of 5 Unix hits and 22 Linux, all in all 27 hits, possible duplicates not subtracted. -- underground experts united https://dataswamp.org/~incal
Re: FOSS tool to do general stats from text indata
Emanuel Berg wrote: > Nicholas Geovanis wrote: > > > Those books teach and discuss some of the software that's > > used. I doubt you will find them in debian's repositories. > > Of course you can do plenty of computational linguistics > > with perl or python which you already have. > > > > What is a "regular expression" which is at the heart of perl > > and python? An expression which conforms to a certain type > > of grammar. Perl and python are used directly for analyzing > > text (any old language). You are learning basic > > computational linguistics. > > Okay, but if there isn't a tool readily available I think this > is a window for a bunch of young programmers that feel the > need to show their skills. It could be a degree project in > Computer Science even, unless the Computational Linguistics > guys have their own degree projects. If so, they can borrow > FOSS and CLI from us and we'd get the tool as well when they > are done, that would be a fair trade IMO :) Well if you were prepared to type a search for computational linguistics software into google, you would find several free tools available for linux listed on pages such as https://martinweisser.org/corpora_site/comp_ling_resources.html https://www.sil.org/linguistics/linguistics-software and other pages containing reviews of such software, so perhaps you could start there rather than writing your own?
Re: FOSS tool to do general stats from text indata
>> A basic search finds this web tool: >> >> https://www.usingenglish.com/resources/text-statistics/ > > I didn't get it to work in Emacs-w3m, be it lack of JavaScript > support or something else. Anyway the page and tool claims to > do this: > > Total Word Count > Total Word Count (Excluding Common Words) > Number of Different Words > Different Words (Excluding Common Words) > Number of Paragraphs > Number of Sentences > Words per Sentence > Number of Characters (all) > Number of Characters (a-z) > Characters per Word > Syllables > Syllables per Word > > Sure, if one had a CLI tool doing that, I would say it's > certainly a good start! I have now tried it from a smartphone and it works great. It does what I quote above, and actually much more: more interesting things are analyzed and output as well, including diagrams. Alas, some output is not available unless one pays for the enhanced version - I suppose that makes it shareware, as we said in the 90s - but it still does a lot in its current state. -- underground experts united https://dataswamp.org/~incal
Re: FOSS tool to do general stats from text indata
Nicholas Geovanis wrote: > Those books teach and discuss some of the software that's > used. I doubt you will find them in debian's repositories. > Of course you can do plenty of computational linguistics > with perl or python which you already have. > > What is a "regular expression" which is at the heart of perl > and python? An expression which conforms to a certain type > of grammar. Perl and python are used directly for analyzing > text (any old language). You are learning basic > computational linguistics. Okay, but if there isn't a tool readily available I think this is a window for a bunch of young programmers that feel the need to show their skills. It could be a degree project in Computer Science even, unless the Computational Linguistics guys have their own degree projects. If so, they can borrow FOSS and CLI from us and we'd get the tool as well when they are done, that would be a fair trade IMO :) -- underground experts united https://dataswamp.org/~incal
Re: FOSS tool to do general stats from text indata
On Fri, Jun 30, 2023, 10:32 AM Emanuel Berg wrote: > Nicholas Geovanis wrote: > > > If you have python programming skills, you might > > consider NLTK > > Unbelievable if there are no such tools anywhere already, > but I don't have one either so maybe there aren't then? > >>> > >>> There's a big subject called computational linguistics. > >>> They have some specialized tools for what they call corpus > >>> analysis. Because you mentioned statistics you threw > >>> everyone off :-) And I really like R. > >> > >> Okay, so now we are getting somewhere. The technical term > >> and scientific field of this activity is known as > >> computational linguistics, and the guys that do that do > >> corpus analysis. Sweet! > > > > Two standard text books are Foundations of Computational > > Linguistics by R Hausser, and Computational Linguistics: An > > Introduction by R Grishman. > > > > Syntactical analysis of human and artificial (programming) > > languages is well known. But how do you attach meaning to > > the symbols? Semantics. How do you identify style and > > emphasis? These are the kind of starting points for > > computational linguistics. > > Okay, but do we have software in the Debian repositories, or > anywhere else in the Unix and FOSS world for that matter, so > we can try it out in practice? > Those books teach and discuss some of the software that's used. I doubt you will find them in debian's repositories. Of course you can do plenty of computational linguistics with perl or python which you already have. What is a "regular expression" which is at the heart of perl and python? An expression which conforms to a certain type of grammar. Perl and python are used directly for analyzing text (any old language). You are learning basic computational linguistics. -- > underground experts united > https://dataswamp.org/~incal > >
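This makes the regular-expression point concrete: a single Python regex built from alternatives can already classify the tokens of a mail. It is a minimal sketch and does not come from the books mentioned. The token classes (url, number, word, punct) and the names TOKEN_RE, tokenize and token_kinds are invented for illustration.

```python
import re
from collections import Counter

# One regular expression made of alternatives; the first alternative that
# matches at a position wins, and its group name becomes the token class.
TOKEN_RE = re.compile(
    r"(?P<url>https?://\S+)"                 # URLs like the ones in this thread
    r"|(?P<number>\d+(?:\.\d+)?)"            # integers and decimals
    r"|(?P<word>[A-Za-z]+(?:'[A-Za-z]+)?)"   # words, allowing one apostrophe part
    r"|(?P<punct>[^\w\s])"                   # any other single non-space symbol
)

def tokenize(text):
    """Return (class, token) pairs; characters matching no alternative are skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

def token_kinds(text):
    """Count how many tokens of each class occur in the text."""
    return Counter(kind for kind, _ in tokenize(text))
```

For example, tokenize("R is 1 tool; see https://www.nltk.org/ too.") tags "1" as a number and the NLTK address as a url.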
Re: FOSS tool to do general stats from text indata
Joel Roth wrote: > A basic search finds this web tool: > > https://www.usingenglish.com/resources/text-statistics/ I didn't get it to work in Emacs-w3m, be it lack of JavaScript support or something else. Anyway the page and tool claims to do this: Total Word Count Total Word Count (Excluding Common Words) Number of Different Words Different Words (Excluding Common Words) Number of Paragraphs Number of Sentences Words per Sentence Number of Characters (all) Number of Characters (a-z) Characters per Word Syllables Syllables per Word Sure, if one had a CLI tool doing that, I would say it's certainly a good start! > Otherwise, I think you'll have to write your own -- or hire > someone (like me :^) to write one for you. I have to squeeze the money out of my political organizations first ... -- underground experts united https://dataswamp.org/~incal
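Here is a rough Python approximation of the usingenglish.com list above. It makes several assumptions: paragraphs are separated by blank lines, sentences end in ., ! or ?, "common words" come from a tiny hypothetical stop-word list, and syllables are estimated by counting vowel groups. Because of these choices, the numbers will not necessarily match the web tool.

```python
import re

# Tiny, hypothetical stop-word list standing in for "common words".
COMMON = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def count_syllables(word):
    # Heuristic only: each run of vowels (y included) counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def text_statistics(text):
    words = re.findall(r"[A-Za-z']+", text)
    lower = [w.lower() for w in words]
    uncommon = [w for w in lower if w not in COMMON]
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    letters = re.findall(r"[a-z]", text.lower())  # letters a-z, case-insensitive
    syllables = sum(count_syllables(w) for w in words)
    n = len(words)
    return {
        "Total Word Count": n,
        "Total Word Count (Excluding Common Words)": len(uncommon),
        "Number of Different Words": len(set(lower)),
        "Different Words (Excluding Common Words)": len(set(uncommon)),
        "Number of Paragraphs": len(paragraphs),
        "Number of Sentences": len(sentences),
        "Words per Sentence": n / len(sentences) if sentences else 0.0,
        "Number of Characters (all)": len(text),
        "Number of Characters (a-z)": len(letters),
        "Characters per Word": len(letters) / n if n else 0.0,
        "Syllables": syllables,
        "Syllables per Word": syllables / n if n else 0.0,
    }
```

The vowel-group rule is the weakest part. Words such as "make" come out as two syllables, so a serious tool would rely on a pronunciation dictionary instead.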
Re: FOSS tool to do general stats from text indata
Nicholas Geovanis wrote: > If you have python programming skills, you might > consider NLTK Unbelievable if there are no such tools anywhere already, but I don't have one either so maybe there aren't then? >>> >>> There's a big subject called computational linguistics. >>> They have some specialized tools for what they call corpus >>> analysis. Because you mentioned statistics you threw >>> everyone off :-) And I really like R. >> >> Okay, so now we are getting somewhere. The technical term >> and scientific field of this activity is known as >> computational linguistics, and the guys that do that do >> corpus analysis. Sweet! > > Two standard text books are Foundations of Computational > Linguistics by R Hausser, and Computational Linguistics: An > Introduction by R Grishman. > > Syntactical analysis of human and artificial (programming) > languages is well known. But how do you attach meaning to > the symbols? Semantics. How do you identify style and > emphasis? These are the kind of starting points for > computational linguistics. Okay, but do we have software in the Debian repositories, or anywhere else in the Unix and FOSS world for that matter, so we can try it out in practice? -- underground experts united https://dataswamp.org/~incal
Re: FOSS tool to do general stats from text indata
On Fri, Jun 30, 2023, 8:32 AM Emanuel Berg wrote: > Nicholas Geovanis wrote: > > >>> If you have python programming skills, you might consider > >>> NLTK > >> > >> Unbelievable if there are no such tools anywhere already, > >> but I don't have one either so maybe there aren't then? > >> > > > > There's a big subject called computational linguistics. > > They have some specialized tools for what they call corpus > > analysis. Because you mentioned statistics you threw > > everyone off :-) And I really like R. > > Okay, so now we are getting somewhere. The technical term and > scientific field of this activity is known as computational > linguistics, and the guys that do that do corpus > analysis. Sweet! > Two standard text books are Foundations of Computational Linguistics by R Hausser, and Computational Linguistics: An Introduction by R Grishman. Syntactical analysis of human and artificial (programming) languages is well known. But how do you attach meaning to the symbols? Semantics. How do you identify style and emphasis? These are the kind of starting points for computational linguistics. -- > underground experts united > https://dataswamp.org/~incal > >
Re: FOSS tool to do general stats from text indata
Nicholas Geovanis wrote: >>> If you have python programming skills, you might consider >>> NLTK >> >> Unbelievable if there are no such tools anywhere already, >> but I don't have one either so maybe there aren't then? >> > > There's a big subject called computational linguistics. > They have some specialized tools for what they call corpus > analysis. Because you mentioned statistics you threw > everyone off :-) And I really like R. Okay, so now we are getting somewhere. The technical term and scientific field of this activity is known as computational linguistics, and the guys that do that do corpus analysis. Sweet! -- underground experts united https://dataswamp.org/~incal
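A classic corpus-analysis view is the keyword-in-context (KWIC) concordance. It lists every hit of a word, lined up with a few words of context on each side. Below is a minimal Python sketch; the function name kwic and the width parameter are illustrative, and real concordancers handle tokenization, sorting and case far more carefully.

```python
import re

def kwic(text, keyword, width=3):
    """Keyword-in-context lines with `width` words of context either side."""
    words = re.findall(r"[A-Za-z']+", text)
    key = keyword.lower()
    lines = []
    for i, w in enumerate(words):
        if w.lower() == key:  # case-insensitive match on whole words
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so that the hits line up in a column.
            lines.append(f"{left:>30} [{w}] {right}")
    return lines
```

Running it over a mailbox with keyword="stats" would show at a glance how the word is actually used in the thread.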
Re: FOSS tool to do general stats from text indata
On Sat, Jun 24, 2023, 3:04 PM Emanuel Berg wrote: > Cousin Stanley wrote: > > > If you have python programming skills, you might consider > > NLTK > > Unbelievable if there are no such tools anywhere already, but > I don't have one either so maybe there aren't then? > There's a big subject called computational linguistics. They have some specialized tools for what they call corpus analysis. Because you mentioned statistics you threw everyone off :-) And I really like R. -- > underground experts united > https://dataswamp.org/~incal > >
Re: FOSS tool to do general stats from text indata
dvalin wrote: > As "stats" is a grab bag larger inside than the Tardis, > I suspect that only on that other ship with the infinite > improbability drive is a stats babelfish interpreter to be > found. For the last 30+ years, I've just thrown together > a few lines of Awk to generate the initially required stats, > then tweaked the C-like code and regexes to add the > inevitable nice-to-haves. Some result is immediate, and > dissatisfaction with completeness motivates > the tweaking/temporary_satisfaction cycle. Options are > limitless, as is needed for an undefined task [...] Haha, show us some stats then! *handclaps in anticipation* -- underground experts united https://dataswamp.org/~incal
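dvalin's approach uses Awk. The following is a Python analogue of that tweak cycle, not his code. Each "nice-to-have" becomes a named regex in a table, so adding a stat means adding one line. The patterns and names are illustrative assumptions.

```python
import re

# Each entry is one "nice-to-have": a label and the regex whose matches are counted.
# Tweaking the report means editing this table; the patterns are only examples.
PATTERNS = {
    "words": r"[A-Za-z']+",
    "all-caps words": r"\b[A-Z]{2,}\b",
    "urls": r"https?://\S+",
    "quoted lines": r"(?m)^>",   # lines starting with a mail quote marker
    "questions": r"\?",
}

def regex_stats(text, patterns=PATTERNS):
    """Count the matches of every pattern in the table."""
    return {name: len(re.findall(rx, text)) for name, rx in patterns.items()}
```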
Re: FOSS tool to do general stats from text indata
On Sun, Jun 25, 2023 at 08:28:05AM +0200, Emanuel Berg wrote: > tomas wrote: > > I mean a general tool, but with options to tweak the > report included, of course. > >>> > >>> If you can bear some tweaking, R is it. > >> > >> Sure! Let's run R on this e-mail. Does it work and if so, what > >> does it say? > > > > To a generic question -- a generic answer > > R is a programming language, I'm looking for a tool that > produces stats from text [...] I give up. Cheers -- t
Re: FOSS tool to do general stats from text indata
tomas wrote: I mean a general tool, but with options to tweak the report included, of course. >>> >>> If you can bear some tweaking, R is it. >> >> Sure! Let's run R on this e-mail. Does it work and if so, what >> does it say? > > To a generic question -- a generic answer R is a programming language, I'm looking for a tool that produces stats from text. If such a tool uses R, or any other programming language or stats engine to produce the outcome, for me as a potential user that is entirely up to them who write it. > I don't even know what you mean by "general stats" Some examples from doing stats on text are: average word length, most commonly used words, the longest paragraphs ... Those are simple examples. At the next step it gets more interesting, as it could show what is statistically unusual; those would be fun/exotic stats that a human user would probably not spot. E.g., parsing this mail, it could say "Emanuel Berg is almost always calm and collected, entirely professional in his approach, but here in the 4th paragraph of his mail he gets VISIBLY UPSET using CAPS ONLY, possibly expressing FRUSTRATION about NOT BEING UNDERSTOOD." > the sports example you put in the other mail suggests that > you want statistics gathered about a subject from written > text In the sports world they input the stats manually and that data is then crunched by computers to produce lists and neat graphics for their broadcasts. This is the first step described above. This isn't unlike for example Emacs `count-words-region' in combination with gnuplot - indeed, it is exactly the same, almost, as these chars I type now are produced manually, then Emacs could count and gnuplot could show. This first step would be neat depending on how much stuff is quantified, the more the better obviously. The second step however, that would be those "fun facts" the commentators say, these are more advanced, like, and now I just make something up, "Here is an amazing figure.
Player X has the worst stats on face-offs in his team, except when the team plays on its home field and is down by two or more goals, then he is 2nd best". That second step, applied to text, would of course be even more exciting. I don't know if those crazy stats are discovered by a bunch of fanatic hockey nerds just using the "step 1 stats" in creative combinations - maybe using some sort of relational algebra approach? - _or_ if they have some stats engine that crunches the stats further to the meta-stats level, if you will, to have the weird facts pop up automatically? But yeah, if we don't even have a proper "step 1 stats" tool for text - which is impossible to believe BTW - well, obviously one can only dream of a "step 2 stats", a stats tool on the meta level ... > involves "understanding texts written in human languages", > another big can of worms (which has become somewhat > fashionable as of late). It is not about understanding, it is about finding patterns and meta-patterns, finding statistics that are themselves statistically uncommon, which is why they are interesting. Think exceptions and unexpected interrelations. Again, the best example is probably a combination of the different stats available at the "step 1 stats" level. > If it's text statistics, good statistics packages have lots > of resources. R is a good statistics package Yeah, maybe I should ask them, but as Debian is such a huge system one would think someone here could show us how it, or similar software, can be used on a bunch of text, for example on a mail like this. It is already a bunch of data; surely you are not saying there isn't a tool to tell us something about that data? -- underground experts united https://dataswamp.org/~incal
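Here is a naive Python sketch of the "step 2" idea. It computes one "step 1" stat per paragraph (the share of uppercase letters) and flags any paragraph whose z-score sits far from the others, which would catch the VISIBLY UPSET paragraph in the example. Several choices are assumptions: the stat itself, the blank-line paragraph split, the threshold of 2.0, and the names caps_ratio and unusual_paragraphs. With the population standard deviation, a single outlier among n paragraphs can reach at most |z| = sqrt(n - 1). Short mails therefore need a lower threshold.

```python
import re
from statistics import mean, pstdev

def caps_ratio(paragraph):
    """Share of alphabetic characters that are uppercase (0.0 if there are none)."""
    letters = [c for c in paragraph if c.isalpha()]
    return sum(c.isupper() for c in letters) / len(letters) if letters else 0.0

def unusual_paragraphs(text, threshold=2.0):
    """Return (1-based paragraph number, z-score) for outlying caps ratios."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    ratios = [caps_ratio(p) for p in paragraphs]
    if len(ratios) < 2:
        return []
    mu, sigma = mean(ratios), pstdev(ratios)
    if sigma == 0:  # every paragraph looks the same, nothing stands out
        return []
    return [(i + 1, (r - mu) / sigma)
            for i, r in enumerate(ratios)
            if abs(r - mu) / sigma >= threshold]
```

Swapping caps_ratio for other per-paragraph stats, such as sentence length or question marks, gives other "fun facts" from the same machinery.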
Re: FOSS tool to do general stats from text indata
On Sat, Jun 24, 2023 at 10:00:05PM +0200, Emanuel Berg wrote: > tomas wrote: > > >> Is there a CLI and FOSS tool that creates stats from text > >> indata - e.g., > >> > >> $ txt2stats path/to/indata/*.txt > >> > >> I mean a general tool, but with options to tweak the report > >> included, of course. > > > > If you can bear some tweaking, R is it. > > Sure! Let's run R on this e-mail. Does it work and if so, what > does it say? To a generic question -- a generic answer. I don't even know what you mean by "general stats" -- the sports example you put in the other mail suggests that you want statistics gathered about a subject from written text: this is far more than "just" stats and involves "understanding texts written in human languages", another big can of worms (which has become somewhat fashionable as of late). If it's text statistics, good statistics packages have lots of resources. R is a good statistics package with a big community, so it has: https://towardsdatascience.com/a-light-introduction-to-text-analysis-in-r-ea291a9865a8?gi=001414a39e96 https://www.r-bloggers.com/2021/02/text-analysis-with-r/ https://bookdown.org/jdholster1/idsr/text-analysis.html https://m-clark.github.io/text-analysis-with-R/intro.html https://towardsdatascience.com/r-packages-for-text-analysis-ad8d86684adb?gi=4a426e671fe6 https://www.springboard.com/blog/data-science/text-mining-in-r/ https://m-clark.github.io/text-analysis-with-R/string-theory.html That said, there are others. In the Python galaxy, there is the Natural Language Toolkit https://www.nltk.org/ But your question was posed in a way that I don't even know whether I'm wasting both our time with this answer. Cheers -- t
Re: FOSS tool to do general stats from text indata
Emanuel Berg writes: > Sure! Let's run R on this e-mail. Does it work and if so, what > does it say? Run 'apt-cache show r-base'. You will want to look at all the 'r-cran' packages for one that does what you need. -- John Hasler j...@sugarbit.com Elmwood, WI USA
Re: FOSS tool to do general stats from text indata
tomas wrote: >> Is there a CLI and FOSS tool that creates stats from text >> indata - e.g., >> >> $ txt2stats path/to/indata/*.txt >> >> I mean a general tool, but with options to tweak the report >> included, of course. > > If you can bear some tweaking, R is it. Sure! Let's run R on this e-mail. Does it work and if so, what does it say? -- underground experts united https://dataswamp.org/~incal
Re: FOSS tool to do general stats from text indata
Cousin Stanley wrote: > If you have python programming skills, you might consider > NLTK Unbelievable if there are no such tools anywhere already, but I don't have one either so maybe there aren't then? -- underground experts united https://dataswamp.org/~incal
Re: FOSS tool to do general stats from text indata
Joel Roth wrote: > A basic search finds this web tool: > > https://www.usingenglish.com/resources/text-statistics/ Cool, I'll get back to you when I have tried it, God willing ... > Otherwise, I think you'll have to write your own -- or hire > someone (like me :^) to write one for you. Surely there must be some awesome stats-from-text CLI tool in the FOSS world? What about the commercial/proprietary world? How do they do it in professional sport (e.g. NHL) where the commentators sometimes say, "Here are some amazing stats. Some dude has now scored the most goals EVER in the last period of games that were at the time etc etc". They find out such things manually? What about the financial world? -- underground experts united https://dataswamp.org/~incal
Re: FOSS tool to do general stats from text indata
paulf wrote: >>> I don't know about all of your wishlist, but gnuplot is >>> the proper tool for taking data from, say, a CSV file, and >>> putting it into graphs of various types. >> >> Well, gnuplot is great obviously but is more a tool to >> visualize data, organized data, here we need a tool to >> analyze and find patterns in data that is in its original, >> raw form. > > What you desire sounds like a job for AI. And that's beyond > my ken. It depends on how you define AI; just stats from data sounds like a pretty mechanical job to me, but on the other hand no one said AI can't be mechanical, as in Asimov's robots for example. -- underground experts united https://dataswamp.org/~incal
Re: FOSS tool to do general stats from text indata
On 2023-06-23 13:30, Emanuel Berg wrote: > Is there a CLI and FOSS tool that creates stats from text > indata - e.g., > >$ txt2stats path/to/indata/*.txt > > I mean a general tool, but with options to tweak the report > included, of course. > > To produce neat stats, maybe even figures, and generate fun > facts of the kind [...] If you have python programming skills, you might consider NLTK: Natural Language Toolkit. Google-ize NLTK for information. I don't know if NLTK will lead you directly to the statistical numbers you are seeking, but it will parse text, count words, provide word frequencies, enumerate sentences, etc. NLTK results could be fed to other python programs for graphical analysis. My personal experience with NLTK is limited and I haven't used it for anything other than simple tests just to get a feel for it -- Stanley C. Kitching Human Being Phoenix, Arizona
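NLTK's FreqDist is a subclass of collections.Counter with extras such as freq() and hapaxes(). To try the idea without installing NLTK, here is a stdlib-only stand-in. WordFreq and word_freq are hypothetical names, and the regex tokenizer is much cruder than NLTK's.

```python
import re
from collections import Counter

class WordFreq(Counter):
    """Stdlib stand-in for NLTK's FreqDist, built on collections.Counter."""

    def freq(self, word):
        # Relative frequency: occurrences of `word` divided by all counted tokens.
        total = sum(self.values())
        return self[word] / total if total else 0.0

    def hapaxes(self):
        # Words that occur exactly once, in first-seen order.
        return [w for w, n in self.items() if n == 1]

def word_freq(text):
    """Lowercased word frequencies using a simple regex tokenizer."""
    return WordFreq(w.lower() for w in re.findall(r"[A-Za-z']+", text))
```

Because it is still a Counter, most_common(10) gives the top words, ready to be fed to a plotting program as suggested above.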
Re: FOSS tool to do general stats from text indata
On Fri, Jun 23, 2023 at 10:20:50PM +0200, Emanuel Berg wrote: > Is there a CLI and FOSS tool that creates stats from text > indata - e.g., > > $ txt2stats path/to/indata/*.txt > > I mean a general tool, but with options to tweak the report > included, of course. If you can bear some tweaking, R is it. Cheers -- t
Re: FOSS tool to do general stats from text indata
On Fri, Jun 23, 2023 at 10:20:50PM +0200, Emanuel Berg wrote: > Is there a CLI and FOSS tool that creates stats from text > indata - e.g., > > $ txt2stats path/to/indata/*.txt > > I mean a general tool, but with options to tweak the report > included, of course. > > To produce neat stats, maybe even figures, and generate fun > facts of the kind > >The longest word that occurs more frequently than 0.01 ... > >The most common words to start a sentence ... > >Average paragraph length ... > >And even more crazy facts and stuff that you never think >about until the stats tell you! > > What do we have on that area? A basic search finds this web tool: https://www.usingenglish.com/resources/text-statistics/ Otherwise, I think you'll have to write your own -- or hire someone (like me :^) to write one for you. -- Joel Roth
Re: FOSS tool to do general stats from text indata
On Fri, 23 Jun 2023 23:05:10 +0200 Emanuel Berg wrote: > paulf wrote: > > > I don't know about all of your wishlist, but gnuplot is the > > proper tool for taking data from, say, a CSV file, and > > putting it into graphs of various types. > > Well, gnuplot is great obviously but is more a tool to > visualize data, organized data, here we need a tool to analyze > and find patterns in data that is in its original, raw form. What you desire sounds like a job for AI. And that's beyond my ken. Paul -- Paul M. Foster Personal Blog: http://noferblatz.com Company Site: http://quillandmouse.com Software Projects: https://gitlab.com/paulmfoster
Re: FOSS tool to do general stats from text indata
paulf wrote: > I don't know about all of your wishlist, but gnuplot is the > proper tool for taking data from, say, a CSV file, and > putting it into graphs of various types. Well, gnuplot is great obviously but is more a tool to visualize data, organized data, here we need a tool to analyze and find patterns in data that is in its original, raw form. But just to promote gnuplot further, as I got happy just by you mentioning it, here is a cool diagram I once did: https://dataswamp.org/~incal/pimgs/comp/hits.png And some others: https://dataswamp.org/~incal/figures/gnuplot/ Not a gnuplot veteran! But absolutely cool software. -- underground experts united https://dataswamp.org/~incal
Re: FOSS tool to do general stats from text indata
On Fri, 23 Jun 2023 22:20:50 +0200 Emanuel Berg wrote: > Is there a CLI and FOSS tool that creates stats from text > indata - e.g., > > $ txt2stats path/to/indata/*.txt > > I mean a general tool, but with options to tweak the report > included, of course. > > To produce neat stats, maybe even figures, and generate fun > facts of the kind > >The longest word that occurs more frequently than 0.01 ... > >The most common words to start a sentence ... > >Average paragraph length ... > >And even more crazy facts and stuff that you never think >about until the stats tell you! > > What do we have on that area? I don't know about all of your wishlist, but gnuplot is the proper tool for taking data from, say, a CSV file, and putting it into graphs of various types. Paul -- Paul M. Foster Personal Blog: http://noferblatz.com Company Site: http://quillandmouse.com Software Projects: https://gitlab.com/paulmfoster
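gnuplot wants organized data, so one way to bridge the gap is a small script that turns raw text into a word,count CSV. Below is a minimal Python sketch. The function name freq_csv, the top-10 cutoff and the headerless output are assumptions.

```python
import csv
import io
import re
import sys
from collections import Counter

def freq_csv(text, top=10):
    """Return 'word,count' CSV rows (no header) for the `top` most common words."""
    counts = Counter(w.lower() for w in re.findall(r"[A-Za-z']+", text))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(counts.most_common(top))  # rows of (word, count)
    return buf.getvalue()

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(freq_csv(f.read()))
```

For example, save it as freqcsv.py (a hypothetical name) and run python3 freqcsv.py mail.txt > freq.csv. Then, in gnuplot, something like set datafile separator ","; set style fill solid; plot "freq.csv" using 0:2:xtic(1) with boxes notitle should draw a bar chart with the words as x tick labels.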
FOSS tool to do general stats from text indata
Is there a CLI and FOSS tool that creates stats from text indata - e.g.,

  $ txt2stats path/to/indata/*.txt

I mean a general tool, but with options to tweak the report included, of course.

To produce neat stats, maybe even figures, and generate fun facts of the kind

  The longest word that occurs more frequently than 0.01 ...

  The most common words to start a sentence ...

  Average paragraph length ...

  And even more crazy facts and stuff that you never think about until the stats tell you!

What do we have in that area?

-- 
underground experts united
https://dataswamp.org/~incal
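Here is a minimal Python sketch of the hypothetical txt2stats, covering three of the example facts. It assumes the following: paragraphs are separated by blank lines, sentences end in ., ! or ?, "more frequently than 0.01" means a relative word frequency above 1%, and "average paragraph length" is counted in words.

```python
import re
import sys
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z']+")

def text_stats(text, min_share=0.01):
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Naive sentence split: ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w.lower() for w in WORD_RE.findall(text)]
    counts = Counter(words)
    total = len(words)
    # Words whose share of all words exceeds min_share; keep the longest one.
    frequent = [w for w, n in counts.items() if n / total > min_share] if total else []
    longest_frequent = max(frequent, key=len, default=None)
    # First word of each sentence.
    starters = Counter()
    for s in sentences:
        m = WORD_RE.search(s)
        if m:
            starters[m.group().lower()] += 1
    return {
        "words": total,
        "paragraphs": len(paragraphs),
        "avg_paragraph_words": total / len(paragraphs) if paragraphs else 0.0,
        "longest_frequent_word": longest_frequent,
        "top_sentence_starters": starters.most_common(3),
    }

def main(paths):
    for path in paths:
        with open(path, encoding="utf-8") as f:
            stats = text_stats(f.read())
        print(path)
        for key, value in stats.items():
            print(f"  {key}: {value}")

if __name__ == "__main__":
    main(sys.argv[1:])
```

Invoked as python3 txt2stats.py path/to/indata/*.txt, it prints one small report per file. Tweaking the report means adding keys to the dictionary.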