So then you can never use ;. either :P At the beginning of the day's thread, I linked to a machine that parses delimited data and also handles an escape character, so that the escape and the delimiter can be embedded in a field.
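For anyone who has not looked at that machine, here is a minimal sketch of the idea in Python rather than J. It is not the jpp code and not how ;: is implemented; the function name split_escaped, the backslash as escape character, and the choice to keep a trailing lone escape literally are all assumptions made for illustration.

```python
def split_escaped(text, delim=",", esc="\\"):
    """Split text on delim in one pass; esc makes the next character literal.

    Hypothetical helper for illustration only (not part of J or jpp).
    """
    fields = []
    current = []
    escaped = False  # state flag: the previous character was an escape
    for ch in text:
        if escaped:
            # Literal character, even if it is the delimiter or the escape.
            current.append(ch)
            escaped = False
        elif ch == esc:
            # Drop the escape itself but keep the current field open.
            escaped = True
        elif ch == delim:
            # Close the field; this may emit an empty field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    if escaped:
        # Assumption: a lone escape at end of input is kept as a literal.
        current.append(esc)
    fields.append("".join(current))
    return fields


print(split_escaped("a,,b,c"))
print(split_escaped(r"a\,b,c\\d"))
```

The first call yields four fields with the second one empty; the second call yields two fields, one containing an embedded delimiter and the other an embedded escape.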
2 of the 3 suggestions I made were entirely about speeding up that operation. The empty field for emit when j=-1 is one. The other is EmitPause: the next StartWord/Emitword operation group will append to the last (paused) word.

________________________________
From: Raul Miller <[email protected]>
To: Programming forum <[email protected]>
Sent: Friday, April 7, 2017 3:20 PM
Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)

CSV has a lot of issues (consider, for example, quoting and escaping quotes), and if you wanted ;: to handle csv you would probably think about adding new operations to deal with the complexity.

Thanks,

--
Raul

On Fri, Apr 7, 2017 at 3:03 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> skipping a character is done "all the time", by resetting j. It can only
> skip over the beginning of word characters.
>
>>it's hard to imagine a case where empty tokens are meaningful and
>>useful.
>
> csv or other delimited data:
>
> a,,b,c has 4 fields. one empty.
>
> there'd be no change to the state machine tokenizing J language. You are
> currently not allowed to emit empty. The change would not force you to start
> doing so.
>
>>(from a comprehensibility point of view) to
>>be using <;._1 or <;._2 for that
>
> it's slower. and the equivalent sj matrix to <;._1 is a single "row".
> ________________________________
> From: Raul Miller <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Friday, April 7, 2017 1:49 PM
> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>
> I am not sure that empty words make sense with ;:
>
> Each state transition is a character. So, to achieve "empty box" you
> would need to be skipping a character.
>
> (So, for example, right now, you could dedicate a character to be a
> placeholder and then remove all instances of that character from all
> boxes.)
>
> Anyways, the ;: handles the 'tokenizer' role for the language, and
> it's hard to imagine a case where empty tokens are meaningful and
> useful.
>
> Presumably you do want the empties for some reason, but I am thinking
> it would make more sense (from a comprehensibility point of view) to
> be using <;._1 or <;._2 for that.
>
> Thanks,
>
> --
> Raul
>
>
> On Fri, Apr 7, 2017 at 1:25 PM, 'Pascal Jasmin' via Programming
> <[email protected]> wrote:
>> So right now, ew when j=-1 is a syntax error. And, also currently, you can
>> never emit empty boxes. If for some reason the intent of your machine is to
>> never emit empty boxes, then that output will give you a clue that you did
>> not define it correctly. No current machine would be affected. The speed
>> boost though would be significant compared to the workarounds. You could
>> also check with a: e. result if you want to discard all results with an
>> error.
>>
>> ________________________________
>> From: Raul Miller <[email protected]>
>> To: Programming forum <[email protected]>
>> Sent: Friday, April 7, 2017 1:12 PM
>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>
>> I'm not sure that eliminating syntax errors to get null words is a good idea.
>>
>> --
>> Raul
>>
>>
>> On Fri, Apr 7, 2017 at 12:35 PM, 'Pascal Jasmin' via Programming
>> <[email protected]> wrote:
>>> getting back to the idea of storing symbols by 3!:1 as delimited strings.
>>> This would both be an improvement in storage, and eliminate the error prone
>>> dependence on 10 s:
>>>
>>> I've got the esc (uses ;:) method in the latest jpp (
>>> https://github.com/Pascal-J/jpp ) to get 2MB/s throughput using 2 passes,
>>> and an escape code to handle both embedded escapes and nulls, and null
>>> delimited data/symbols.
>>>
>>> Several improvements to ;: would make this significantly faster and more
>>> flexible:
>>>
>>> emit null when j=-1, and emitword issued (previously suggested): This
>>> allows null fields to easily be "parsed" (current method used is to use
>>> function code 2, and examine gaps in order to add nulls as a 2nd pass.
>>> function code 2 is slower than 0, and overhead in calculating gaps, and
>>> inserting nulls)
>>>
>>> Add an action code that suspends/pauses current word. Next start word will
>>> append to current word, skipping any characters that were scanned during
>>> pause. This would allow "deleting" items in the middle of a word in a
>>> single pass instead of using the 2 pass approach (with 2nd pass using
>>> function code 1). Alternatively, it could function like ev, but if ew is
>>> in same state, it discards the elements between startword's.
>>>
>>>
>>> A custom action code (one interpretation of Henry's inclination, though he
>>> may have thought of custom function codes) that has a way of inserting a
>>> character. This would allow building an escaped sequence by inserting the
>>> escape character prior to last seen.
>>>
>>> Custom action codes would need to return characters to include (if it is
>>> not an ew,ev class), newi, newj at least. A new function code would be a
>>> variation on 2, emit i (i-j), actioncode, though "characters to include"
>>> would interact directly with function codes 0 and 1.
>>>
>>>
>>> A powerful tool for nested structures (see parenw machine in fsm.ijs that
>>> builds trees from parentheses groups) would be an emitwordandIncreaseDepth
>>> and emitwordandDecreaseDepth actions.
>>> So, as part of the return parameters
>>> for custom actions would be a code for the action: (noword, word,
>>> WordincreaseDepth, WordDecreaseDepth, vector)
>>>
>>>
>>> ________________________________
>>> From: 'Pascal Jasmin' via Programming <[email protected]>
>>> To: "[email protected]" <[email protected]>
>>> Sent: Sunday, March 19, 2017 12:38 PM
>>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>>
>>> idea for double nullchars doesn't work as there's no way to know if a null
>>> is embedded at the end of one "string" or beginning of next string. Though
>>> null followed by a code of the number of consecutive nulls would work. If
>>> there are 255 nulls, the code 255 0 would be used. 510 consecutive nulls
>>> 255 255 0...
>>>
>>>
>>> ________________________________
>>> From: 'Pascal Jasmin' via Programming <[email protected]>
>>> To: "[email protected]" <[email protected]>
>>> Sent: Sunday, March 19, 2017 11:33 AM
>>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>>
>>> Assuming that this comes with some improvement for s: then it would be easy
>>> to favour that improvement.
>>>
>>> things not to like about a global symbol table is that every typo is
>>> included, and any "app"/set that is loaded joins that table. AFAIU,
>>> Corruption happens if you create symbols, and then restore a table with
>>> 10&s:, and so any application that relies on 10&s: can crash another
>>> previously "loaded application"
>>>
>>>
>>> A problem is that 3!:1, or 3!:3 anyway, seems to just store indexes for
>>> symbols, which relies on 10 s: for actual persistence.
>>>
>>> A suggestion for 3!:1 of symbols would be to scan the array containing
>>> symbols for null (\0), then store 2&s: if not included, or 5&s: if there is
>>> a \0. AFAIU, utf8 is safe to not include 0 as an extended byte.
>>>
>>> An alternative to 5&s: would be a new 8&s: where "data nulls" are encoded
>>> similar to embedded ' in strings.
>>> double nullchars encode a data nullchar.
>>> single nullchar encodes terminating 2&s: nullchar. This format c/would be
>>> used for 3!:1. 2&s: could be modified to be the 8&s: proposal.
>>>
>>> 10&s: could store in this new format for portability. But the problem of
>>> previously assigned symbols in session persists, and so a locale level
>>> symbol table would make the most sense for robustness. Also, an
>>> "application"/locale that just uses `true`false symbols (bad example but
>>> replace with small set of enums), would (presumably) be faster if it didn't
>>> share a symbol table with a very large symbol array principally used to
>>> avoid string fills.
>>>
>>>
>>> A question about symbols/3!:1... the documentation suggests that indexes
>>> are limited to 32bit values. Is that true for j64 too? Query (new) and
>>> query (old) is not completely clear in documentation either, and does that
>>> differ from i. or e. ?
>>>
>>>
>>> ________________________________
>>> From: Henry Rich <[email protected]>
>>> To: Programming forum <[email protected]>
>>> Sent: Sunday, March 19, 2017 12:14 AM
>>> Subject: [Jprogramming] Show cause hearing - (10 s: y)
>>>
>>> Does anyone use (10 s: y)?
>>>
>>> It is problematic in that the hash table (0 s: 4) may depend on the CPU
>>> and the J release level.
>>>
>>> I would rather decommit (10 s: y) and have the user reload the symbol
>>> table de novo. Any objections?
>>>
>>> Henry Rich
>>>
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
