That thinking sounds about right to me. (Also, I think it's symmetric: having the tokenizer handle representations of both noun lists and character lists, rather than handling one but not the other.)
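A quick illustration of that symmetry, as a sketch from a J session (monadic ;: is the stock word former; the boxed display below is reproduced from memory, so treat it as approximate):

   ;: '1 2 3 + ''ab'''
┌─────┬─┬────┐
│1 2 3│+│'ab'│
└─────┴─┴────┘

The numeric list comes back as a single word, and so does the quoted character literal (quotes included) - both kinds of list constant are assembled during word formation rather than left to the parser.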
And, we're beginning to get into the territory Roger Hui was asking about.

Thanks,

--
Raul

On Thu, Oct 10, 2019 at 12:58 AM ethiejiesa via Programming
<[email protected]> wrote:
>
> Thank you for the thoughtful and lucid reply.
>
> If you don't mind, let me check my understanding.
>
> So, essentially, we keep forefront the general concept of numeric words
> requiring speculative tokenization in the sense you describe.
>
> Is the following true? Let sj be the state table as defined in the
> dictionary (at the entry for dyadic ;:). Then the following produces an
> equivalent (up to word partitioning) Mealy machine:
>
> 2 (6 0 1; 6 8 1)}sj
>
> In other words, we replace ev with em in the cases where numeric words
> terminate immediately.
>
> However, reading into your reply a bit, I gather that we choose ev here
> since it better codifies the overall concept of numbers as
> speculatively-tokenized words. That is, among the class of equivalent
> Mealy machines, we want to pick the one with the best "semantics."
>
> Does that sound about right?
>
> On Mon, Oct 07, 2019 at 11:19:41AM -0400, Raul Miller wrote:
> > On Sun, Oct 6, 2019 at 10:31 PM 'B. Wilson' via Programming
> > <[email protected]> wrote:
> > > Thank you for the confirmation.
> > >
> > > So in these two cases, word splitting happens exactly the same
> > > if we use em() instead, correct? Is there a particular reason to
> > > *not* use em() though? As far as I can tell, the main difference
> > > would be in the traces. I would really like to know if ev() was
> > > chosen here with some specific intention.
> > >
> > > Wildly speculating, ...
> >
> > Speculation is actually relevant here. Speculation implemented in the
> > state machine.
> >
> > What we've got is a state machine implementation with a one-level-deep
> > stack. This is not enough to handle parentheses (which can be nested
> > to an arbitrary depth), but it is enough to recognize when a numeric
> > list ends:
> >
> > It ends on whitespace when the next numeric word ends with a :
> >
> > It ends on whitespace when the next word begins with a
> > non-alphanumeric-word-forming character.
> >
> > It ends with a non-alphanumeric-word-forming character otherwise.
> >
> > Hypothetically speaking, this could have been deferred to "the
> > parser", but that would have made that parser more complicated.
> >
> > But, also, J has several parsers. There's the sentence parser
> > mentioned at https://www.jsoftware.com/help/dictionary/dicte.htm
> >
> > There's also the numeric parser mentioned at
> > https://www.jsoftware.com/help/dictionary/dcons.htm (which parses at
> > the character level, in a simplistic fashion).
> >
> > There's also the wd parser mentioned at
> > https://www.jsoftware.com/docs/help602/user/wd_commands.htm (which
> > seems to have its own state machine - though not one which is exposed
> > as a part of the core language).
> >
> > And, generally speaking, working with textual data often involves
> > building parsers...
> >
> > Thanks,
> >
> > --
> > Raul
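To make the numeric-list termination cases quoted above concrete, here is a small session sketch (again monadic ;:, the stock word former; output reproduced from memory, so treat the display as approximate):

   ;: '1 2 3: 4'    NB. next numeric word ends with :
┌───┬──┬─┐
│1 2│3:│4│
└───┴──┴─┘
   ;: '1 2 + 3'     NB. next word begins with a non-word-forming character
┌───┬─┬─┐
│1 2│+│3│
└───┴─┴─┘
   ;: '1 2+3'       NB. the + ends the numeric list with no whitespace at all
┌───┬─┬─┐
│1 2│+│3│
└───┴─┴─┘

In each case the speculative scan of the numeric list stops and 1 2 is emitted as a single word.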
