skipping a character is done "all the time", by resetting j. It can only skip over characters at the beginning of a word.
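
A minimal sketch of what that looks like, assuming a two-class, two-state machine made up for illustration (it is not one of the machines discussed in this thread, and the names m and sj are only the conventional ones). Separators are skipped only while no word is open (j=_1), and a word is started with action code 1 (j=.i):

```j
NB. Hypothetical comma-splitting machine, for illustration only.
m  =: 1 (a.i.',')} 256$0      NB. class 1 = comma, class 0 = anything else
NB. Each cell of sj is (next state , action code).
NB. state 0 (no word open): other -> state 1, j=.i (1) ; comma -> stay, no action (0)
NB. state 1 (inside a word): other -> stay, no action (0) ; comma -> state 0, emit word and j=._1 (3)
sj =: 2 2 2 $ 1 1  0 0   1 0  0 3
(0;sj;m) ;: 'a,,b,c'          NB. function code 0: boxed words
```

With the current rules the second comma is read in state 0 with j=_1, so there is nothing to emit at that point and the empty field between the two commas cannot appear in the result.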

> it's hard to imagine a case where empty tokens are meaningful and useful.

csv or other delimited data: a,,b,c has 4 fields, one of them empty. there'd be no change to the state machine that tokenizes the J language. You are currently not allowed to emit empty words. The change would not force you to start doing so.

> (from a comprehensibility point of view) to be using <;._1 or <;._2 for that

it's slower, and the sj matrix equivalent to <;._1 is a single "row".

________________________________
From: Raul Miller <[email protected]>
To: Programming forum <[email protected]>
Sent: Friday, April 7, 2017 1:49 PM
Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)

I am not sure that empty words make sense with ;:

Each state transition is a character. So, to achieve "empty box" you would need to be skipping a character. (So, for example, right now, you could dedicate a character to be a placeholder and then remove all instances of that character from all boxes.)

Anyways, the ;: handles the 'tokenizer' role for the language, and it's hard to imagine a case where empty tokens are meaningful and useful.

Presumably you do want the empties for some reason, but I am thinking it would make more sense (from a comprehensibility point of view) to be using <;._1 or <;._2 for that.

Thanks,

--
Raul

On Fri, Apr 7, 2017 at 1:25 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> So right now, ew when j=-1 is a syntax error. And, also currently, you can
> never emit empty boxes. If for some reason the intent of your machine is to
> never emit empty boxes, then that output will give you a clue that you did
> not define it correctly. No current machine would be affected. The speed
> boost though would be significant compared to the workarounds. You could
> also check with a: e. result if you want to discard all results with an error.
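
For comparison, a small sketch of the cut-based approach on the a,,b,c example from above (the name fields is hypothetical, chosen only for this illustration). <;._1 takes the first character of its argument as the delimiter, so a comma is prepended:

```j
fields =: <;._1 ',' , 'a,,b,c'   NB. cut on the leading delimiter, keeping empty fields
# fields                         NB. 4 fields, including the empty one
a: e. fields                     NB. 1, because at least one field is empty
```

The same a: e. test is the check suggested above for spotting a result that contains empty boxes.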
>
> ________________________________
> From: Raul Miller <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Friday, April 7, 2017 1:12 PM
> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>
> I'm not sure that eliminating syntax errors to get null words is a good idea.
>
> --
> Raul
>
> On Fri, Apr 7, 2017 at 12:35 PM, 'Pascal Jasmin' via Programming
> <[email protected]> wrote:
>> getting back to the idea of storing symbols by 3!:1 as delimited strings.
>> This would both be an improvement in storage, and eliminate the error prone
>> dependence on 10 s:
>>
>> I've got the esc (uses ;:) method in the latest jpp (
>> https://github.com/Pascal-J/jpp ) to get 2MB/s throughput using 2 passes,
>> and an escape code to handle both embedded escapes and nulls, and null
>> delimited data/symbols.
>>
>> Several improvements to ;: would make this significantly faster and more
>> flexible:
>>
>> emit null when j=-1, and emitword issued (previously suggested): This
>> allows null fields to easily be "parsed" (current method used is to use
>> function code 2, and examine gaps in order to add nulls as a 2nd pass.
>> function code 2 is slower than 0, and overhead in calculating gaps, and
>> inserting nulls)
>>
>> Add an action code that suspends/pauses current word. Next start word will
>> append to current word, skipping any characters that were scanned during
>> pause. This would allow "deleting" items in the middle of a word in a
>> single pass instead of using the 2 pass approach (with 2nd pass using
>> function code 1). Alternatively, it could function like ev, but if ew is in
>> same state, it discards the elements between startword's.
>>
>> A custom action code (one interpretation of Henry's inclination, though he
>> may have thought of custom function codes) that has a way of inserting a
>> character. This would allow building an escaped sequence by inserting the
>> escape character prior to last seen.
>>
>> Custom action codes would need to return characters to include (if it is not
>> an ew,ev class), newi, newj at least. A new function code would be a
>> variation on 2, emit i (i-j), actioncode, though "characters to include"
>> would interact directly with function codes 0 and 1.
>>
>> A powerful tool for nested structures (see parenw machine in fsm.ijs that
>> builds trees from parentheses groups) would be an emitwordandIncreaseDepth
>> and emitwordandDecreaseDepth actions. So, as part of the return parameters
>> for custom actions would be a code for the action: (noword, word,
>> WordincreaseDepth, WordDecreaseDepth, vector)
>>
>> ________________________________
>> From: 'Pascal Jasmin' via Programming <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Sent: Sunday, March 19, 2017 12:38 PM
>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>
>> idea for double nullchars doesn't work as there's no way to know if a null
>> is embedded at the end of one "string" or beginning of next string. Though
>> null followed by a code of the number of consecutive nulls would work. If
>> there are 255 nulls, the code 255 0 would be used. 510 consecutive nulls
>> 255 255 0...
>>
>> ________________________________
>> From: 'Pascal Jasmin' via Programming <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Sent: Sunday, March 19, 2017 11:33 AM
>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>
>> Assuming that this comes with some improvement for s: then it would be easy
>> to favour that improvement.
>>
>> things not to like about a global symbol table is that every typo is
>> included, and any "app"/set that is loaded joins that table.
>> AFAIU, corruption happens if you create symbols, and then restore a table
>> with 10&s:, and so any application that relies on 10&s: can crash another
>> previously loaded "application"
>>
>> A problem is that 3!:1, or 3!:3 anyway, seems to just store indexes for
>> symbols, which relies on 10 s: for actual persistence.
>>
>> A suggestion for 3!:1 of symbols would be to scan the array containing
>> symbols for null (\0), then store 2&s: if not included, or 5&s: if there is
>> a \0. AFAIU, utf8 is safe to not include 0 as an extended byte.
>>
>> An alternative to 5&s: would be a new 8&s: where "data nulls" are encoded
>> similarly to embedded ' in strings. double nullchars encode a data nullchar.
>> single nullchar encodes terminating 2&s: nullchar. This format c/would be
>> used for 3!:1. 2&s: could be modified to be the 8&s: proposal.
>>
>> 10&s: could store in this new format for portability. But the problem of
>> previously assigned symbols in session persists, and so a locale level
>> symbol table would make the most sense for robustness. Also, an
>> "application"/locale that just uses `true`false symbols (bad example but
>> replace with small set of enums), would (presumably) be faster if it didn't
>> share a symbol table with a very large symbol array principally used to
>> avoid string fills.
>>
>> A question about symbols/3!:1... the documentation suggests that indexes are
>> limited to 32bit values. Is that true for j64 too? Query (new) and query
>> (old) is not completely clear in documentation either, and does that differ
>> from i. or e. ?
>>
>> ________________________________
>> From: Henry Rich <[email protected]>
>> To: Programming forum <[email protected]>
>> Sent: Sunday, March 19, 2017 12:14 AM
>> Subject: [Jprogramming] Show cause hearing - (10 s: y)
>>
>> >> Does anyone use (10 s: y)?
>> >>
>> >> It is problematic in that the hash table (0 s: 4) may depend on the CPU
>> >> and the J release level.
>> >>
>> >> I would rather decommit (10 s: y) and have the user reload the symbol
>> >> table de novo. Any objections?
>> >>
>> >> Henry Rich
>> >> ----------------------------------------------------------------------
>> >> For information about J forums see http://www.jsoftware.com/forums.htm
