No, 64 for trigrams in the case of the 4-letter DNA base alphabet, 'acgt'.

The number of possible trigrams in English text is at least 26^3 = 17576, if you admit impossible triples. It gets worse if you include punctuation and differentiate
upper case from lower ...
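
For anyone wanting to check these counts, here is a minimal J sketch. It computes the two bounds and enumerates every possible DNA trigram by counting in base 4. The name tri is only for illustration and does not appear in the earlier scripts.

```j
NB. upper bounds on the number of distinct trigrams
4 26 ^ 3                       NB. 64 17576

NB. all trigrams over 'acgt', one per row (tri is an illustrative name)
tri =: 'acgt' {~ 4 4 4 #: i. 4^3
$ tri                          NB. 64 3
({. ,: {:) tri                 NB. first and last rows: aaa and ttt
```

Boxing the rows with <"1 tri should give a list that can serve as the left argument ("alphabet") when counting boxed trigrams.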

FWIW,  I've just copied and pasted Hamlet, to misquote Shakespeare. I've
kept the stage directions - this isn't a real study! - and produced four texts:

   50 {. alltxt  NB. keeping all punctuation, but removing repeated spaces

ACT I SCENE I. Elsinore. A platform before the cas


   50 {. LtrsOnlyTxt   NB. removing all punctuation from alltxt

ACT I SCENE I Elsinore A platform before the castl


   50 {. AllLcTxt     NB. lower casing alltxt, keeping punctuation

act i scene i. elsinore. a platform before the cas


   50 {. LCLtrsOnlyTxt  NB. lower casing LtrsOnlyTxt

actisceneielsinoreaplatformbeforethecastlefrancisc


NB. Sizes of "alphabets"
   #@~.each alltxt;LtrsOnlyTxt;AllLcTxt;LCLtrsOnlyTxt
┌──┬──┬──┬──┐
│63│52│38│27│
└──┴──┴──┴──┘

NB. Sizes of Nubs of all Trigrams
   #~.3 <\ alltxt
6839
   #~.3 <\ LtrsOnlyTxt    NB. Why more?
9853
   #~.3 <\ AllLcTxt
5421
   #~.3 <\ LCLtrsOnlyTxt
3909
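
As a small illustration of what 3 <\ does, here is a sketch on a toy sequence (s is made up for this example and is not one of the Hamlet texts). The last line divides the nub sizes reported above by the crude bound (alphabet size)^3, which is only an upper limit on how many distinct trigrams could occur.

```j
s =: 'acgtacg'          NB. toy sequence, an assumption for this example
3 <\ s                  NB. boxed windows: acg cgt gta tac acg
# 3 <\ s                NB. 5 windows, two fewer than #s
# ~. 3 <\ s             NB. 4 distinct trigrams, since acg occurs twice

NB. observed nub sizes as a fraction of (alphabet size)^3
6839 9853 5421 3909 % 63 52 38 27 ^ 3
```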

Mike


On 12/10/2019 23:58, 'Jim Russell' via Programming wrote:
Only 64? So worst case ASCII text summaries would be 64^3 rows?

On Oct 12, 2019, at 6:36 PM, 'Mike Day' via Programming <programm...@jsoftware.com> wrote:

Yes, that’s me.

Sorry about the “Nb”s ... I was adding comments and trying to make the lines 
runnable.

I should have said that I wrote these example functions just for this 
discussion. They’re not well tested, and they may well not be proof against edge 
conditions such as empty datasets.
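
A quick way to probe such edge conditions is sketched here. fr restates the dyadic case of fr1 on one line so the snippet runs on its own, and frx is a hypothetical guarded variant, not part of the original functions. It assumes the alphabet has no repeated items.

```j
NB. dyadic case of fr1, restated on one line so this runs on its own
fr =: 4 : '<: #/.~ x , y'

'acgt' fr ''            NB. empty data: 0 0 0 0
'acgt' fr 'acgtn'       NB. 1 1 1 1 0, the stray n adds a fifth count

NB. hypothetical guard: keep only the counts belonging to the alphabet,
NB. assuming x has no repeated items
frx =: 4 : '(#x) {. <: #/.~ x , y'
'acgt' frx 'acgtn'      NB. 1 1 1 1
```

Because the alphabet is prepended to the data, its items always occupy the first #x positions of the key result, which is why taking that many counts is enough here.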

BTW, I guess the “alphabet” for trigrams of DNA sequences has 4^3 = 64 elements.

Mike



Sent from my iPad

On 12 Oct 2019, at 21:00, 'Jim Russell' via Programming <programm...@jsoftware.com> wrote:



On Oct 12, 2019, at 1:52 PM, 'Mike Day' via Programming <programm...@jsoftware.com> wrote:
Sorry, I wasn’t considering trigrams in my off the cuff stuff,

Mike
Thanks Mike (Day?). Can’t always tell with forum replies until I reply…

I appreciate (and am still studying) your stuff, and learning a lot from all 
these exchanges.  I converted your reply to a script also, and so far have 
{with my comments enclosed like this}:

NB.> eg for Ric Sherlock’s example, modified for unequal sample sizes:
NB.> Apologies for non-alignment, as seen on iPad anyway.

NB. {Defs moved to front, earlier results commented, and expected to
NB. change -- due to ?, but kept to compare shape and type.}
NB. {Still trying to understand the monad/dyad stuff...}
NB. {Changed a couple of Nb.'s to NB.}

NB. absolute frequencies for one set
fr1 =: 3 : 0
y fr1~ /:~ ~. y
:
alf =. x
ser =. y
<:@#/.~ alf, ser
)
NB. relative frequencies for one set
rfr1 =: (%+/)@:fr1
NB. compare frequencies
cfr =: 3 : 0
y cfr~ /:~ ~.;y
:
alf =. x
sers=. y
alf&fr1 every y
)
NB. compare relative frequencies
crfr =: 3 : 0
y crfr~ /:~ ~.;y NB. Default base is sorted nub of all inputs.
:
alf =. x
sers=. y
alf&rfr1 every y
)
]'X Y'=: 'actg' {~ 2 30 ?@$ 4
tttgcctataaacaatgcagaccagcacgt
ggcttcaacgactccagagtcttgctgagt
NB.catctaagtcgataatccacttacttccgg
NB.cagcaaggacaggtgctaatacacactcgc
[X =: 'actg' {~ 40?@$ 4
attaggtgccgacagaagtggccaacctcatcgacaaagg
NB.ttagcacttccctcagagttacccacactagctggtgcag
[each NB. in Z?
[&.>
fr1 each X;Y NB. Absolute frequencies, using sorted nubs as base
+----------+-------+
|13 10 11 6|6 8 8 8|
+----------+-------+
NB.+---------+--------+
NB.|9 13 8 10|10 9 7 4|
NB.+---------+--------+
rfr1 each X;Y NB. Relative freqs
+---------------------+------------------------------+
|0.325 0.25 0.275 0.15|0.2 0.266667 0.266667 0.266667|
+---------------------+------------------------------+
NB.+--------------------+------------------------------+
NB.|0.225 0.325 0.2 0.25|0.333333 0.3 0.233333 0.133333|
NB.+--------------------+------------------------------+
NB. Comparisons
cfr X;Y NB. Absolute
13 10 11 6
6 8 8 8
NB.9 13 8 10
NB.10 9 7 4
NB. load'~/user/temp.ijs'
crfr X;Y NB. Relative
0.325 0.25 0.275 0.15
0.2 0.266667 0.266667 0.266667
NB. 0.225 0.325 0.2 0.25
NB.0.333333 0.3 0.233333 0.133333
NB.You can supply your “alphabet” as Left argument..
'tagc' fr1 X NB. In your preferred order!
6 13 11 10
NB.10 9 8 13
NB.fns below sign off,{now moved to front}
NB.Mike
NB.
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm



