Yes. I have a working implementation that takes 2 passes with two different machines. The first machine uses function code 2 and then scans for gaps in order to add empty cells (arguably 2+ passes just for this stage). The 2nd machine uses function code 1 on each non-empty box, which lets it skip arbitrary characters.
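The two-pass workaround can be sketched outside J. The following is a minimal Python illustration under simplifying assumptions (single-character delimiter, no escaping). It is not the jpp code and does not reproduce ;: itself. The hypothetical `word_spans` plays the role of the first machine with function code 2: it returns a (start, length) pair for each non-empty field, so empty fields simply vanish. The hypothetical `fields_with_empties` is the gap scan. A run of d delimiters between two words means d - 1 empty cells, and leading or trailing delimiters produce empties at the ends.

```python
# Hypothetical helper names; a sketch of the two-pass idea, not J's ;:.

def word_spans(s, delim=","):
    """Pass 1: (start, length) of each maximal run of non-delimiter characters.

    Like a tokenizer that can only emit non-empty words, empty fields
    produce no span at all.
    """
    spans = []
    start = None  # index where the current word began, or None if no word is open
    for i, ch in enumerate(s):
        if ch == delim:
            if start is not None:
                spans.append((start, i - start))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(s) - start))
    return spans


def fields_with_empties(s, delim=","):
    """Pass 2: rebuild the full field list by examining the gaps between spans."""
    fields = []
    prev_end = -1  # -1 makes the first gap count exactly the leading delimiters
    for start, length in word_spans(s, delim):
        # d delimiters between two words means d - 1 empty fields
        fields.extend([""] * (start - prev_end - 1))
        fields.append(s[start:start + length])
        prev_end = start + length
    # each trailing delimiter (or, with no words at all, each delimiter plus one)
    # contributes one empty field
    fields.extend([""] * (len(s) - prev_end))
    return fields


print(word_spans("a,,b,c"))           # [(0, 1), (3, 1), (5, 1)]
print(fields_with_empties("a,,b,c"))  # ['a', '', 'b', 'c']
```

The second pass allocates the empty cells and does the gap arithmetic. That work is the kind of overhead the proposed "emit empty when j=-1" action would remove.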
________________________________
From: Raul Miller <[email protected]>
To: Programming forum <[email protected]>
Sent: Friday, April 7, 2017 6:34 PM
Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)

I'd probably use ;: for csv parsing, but I'd leave all characters in place (including any leading comma - and I'd be sure to prefix each line with a comma) and then post-process the contents of each box with a second ;: (which I would then raze).

But that's just me.

Thanks,

--
Raul

On Fri, Apr 7, 2017 at 6:26 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> So then you can never use ;. either :P
>
> At the beginning of the day's thread, I linked to a machine that parses
> delimited data by also handling an escape character to embed the escape and
> delimiter.
>
> 2 of the 3 suggestions I made were entirely about speeding up that operation.
>
> The empty field for emit when j=-1 is one. The other is EmitPause: the next
> StartWord/EmitWord operation group will append to the last (paused) word.
>
> ________________________________
> From: Raul Miller <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Friday, April 7, 2017 3:20 PM
> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>
> CSV has a lot of issues (consider, for example, quoting and escaping
> quotes), and if you wanted ;: to handle csv you would probably think
> about adding new operations to deal with the complexity.
>
> Thanks,
>
> --
> Raul
>
> On Fri, Apr 7, 2017 at 3:03 PM, 'Pascal Jasmin' via Programming
> <[email protected]> wrote:
>> Skipping a character is done "all the time", by resetting j. It can only
>> skip over characters at the beginning of a word.
>>
>>> it's hard to imagine a case where empty tokens are meaningful and
>>> useful.
>>
>> csv or other delimited data:
>>
>> a,,b,c has 4 fields, one of them empty.
>>
>> There'd be no change to the state machine tokenizing the J language. You are
>> currently not allowed to emit empty.
>> The change would not force you to start doing so.
>>
>>> (from a comprehensibility point of view) to
>>> be using <;._1 or <;._2 for that
>>
>> It's slower. And the equivalent sj matrix to <;._1 is a single "row".
>> ________________________________
>> From: Raul Miller <[email protected]>
>> To: Programming forum <[email protected]>
>> Sent: Friday, April 7, 2017 1:49 PM
>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>
>> I am not sure that empty words make sense with ;:
>>
>> Each state transition is a character. So, to achieve "empty box" you
>> would need to be skipping a character.
>>
>> (So, for example, right now, you could dedicate a character to be a
>> placeholder and then remove all instances of that character from all
>> boxes.)
>>
>> Anyways, the ;: handles the 'tokenizer' role for the language, and
>> it's hard to imagine a case where empty tokens are meaningful and
>> useful.
>>
>> Presumably you do want the empties for some reason, but I am thinking
>> it would make more sense (from a comprehensibility point of view) to
>> be using <;._1 or <;._2 for that.
>>
>> Thanks,
>>
>> --
>> Raul
>>
>> On Fri, Apr 7, 2017 at 1:25 PM, 'Pascal Jasmin' via Programming
>> <[email protected]> wrote:
>>> So right now, ew when j=-1 is a syntax error. And, also currently, you can
>>> never emit empty boxes. If for some reason the intent of your machine is
>>> to never emit empty boxes, then that output will give you a clue that you
>>> did not define it correctly. No current machine would be affected. The
>>> speed boost, though, would be significant compared to the workarounds. You
>>> could also check with a: e. result if you want to discard all results with
>>> an error.
>>>
>>> ________________________________
>>> From: Raul Miller <[email protected]>
>>> To: Programming forum <[email protected]>
>>> Sent: Friday, April 7, 2017 1:12 PM
>>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>>
>>> I'm not sure that eliminating syntax errors to get null words is a good
>>> idea.
>>>
>>> --
>>> Raul
>>>
>>> On Fri, Apr 7, 2017 at 12:35 PM, 'Pascal Jasmin' via Programming
>>> <[email protected]> wrote:
>>>> Getting back to the idea of storing symbols by 3!:1 as delimited strings:
>>>> this would both be an improvement in storage and eliminate the
>>>> error-prone dependence on 10 s:
>>>>
>>>> I've got the esc (uses ;:) method in the latest jpp
>>>> ( https://github.com/Pascal-J/jpp ) to get 2MB/s throughput using 2 passes,
>>>> and an escape code to handle both embedded escapes and nulls, and
>>>> null-delimited data/symbols.
>>>>
>>>> Several improvements to ;: would make this significantly faster and more
>>>> flexible:
>>>>
>>>> Emit null when j=-1 and emitword is issued (previously suggested): this
>>>> allows null fields to be "parsed" easily. (The current method is to use
>>>> function code 2 and examine the gaps in order to add nulls as a 2nd pass.
>>>> Function code 2 is slower than 0, and there is overhead in calculating the
>>>> gaps and inserting the nulls.)
>>>>
>>>> Add an action code that suspends/pauses the current word. The next start
>>>> word will append to the current word, skipping any characters that were
>>>> scanned during the pause. This would allow "deleting" items in the middle
>>>> of a word in a single pass instead of using the 2-pass approach (with the
>>>> 2nd pass using function code 1). Alternatively, it could function like ev,
>>>> but if ew is in the same state, it discards the elements between
>>>> startwords.
>>>>
>>>> A custom action code (one interpretation of Henry's inclination, though he
>>>> may have thought of custom function codes) that has a way of inserting a
>>>> character.
>>>> This would allow building an escaped sequence by inserting the
>>>> escape character before the last character seen.
>>>>
>>>> Custom action codes would need to return at least the characters to
>>>> include (if it is not an ew/ev class), newi, and newj. A new function code
>>>> would be a variation on 2: emit i (i-j), actioncode, though "characters to
>>>> include" would interact directly with function codes 0 and 1.
>>>>
>>>> A powerful tool for nested structures (see the parenw machine in fsm.ijs,
>>>> which builds trees from parenthesized groups) would be
>>>> emitwordandIncreaseDepth and emitwordandDecreaseDepth actions. So part of
>>>> the return parameters for custom actions would be a code for the action:
>>>> (noword, word, WordincreaseDepth, WordDecreaseDepth, vector)
>>>>
>>>> ________________________________
>>>> From: 'Pascal Jasmin' via Programming <[email protected]>
>>>> To: "[email protected]" <[email protected]>
>>>> Sent: Sunday, March 19, 2017 12:38 PM
>>>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>>>
>>>> The idea for double nullchars doesn't work, as there's no way to know if a
>>>> null is embedded at the end of one "string" or at the beginning of the next
>>>> string. A null followed by a code for the number of consecutive nulls
>>>> would work, though. If there are 255 nulls, the code 255 0 would be used;
>>>> for 510 consecutive nulls, 255 255 0...
>>>>
>>>> ________________________________
>>>> From: 'Pascal Jasmin' via Programming <[email protected]>
>>>> To: "[email protected]" <[email protected]>
>>>> Sent: Sunday, March 19, 2017 11:33 AM
>>>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>>>
>>>> Assuming that this comes with some improvement for s: then it would be
>>>> easy to favour that improvement.
>>>>
>>>> Things not to like about a global symbol table are that every typo is
>>>> included, and any "app"/set that is loaded joins that table. AFAIU,
AFAIU, >>>> Corruption happens if you create symbols, and then restore a table with >>>> 10&s:, and so any application that relies on 10&s: can crash another >>>> previously "loaded application" >>>> >>>> >>>> A problem is that 3!:1, or 3!:3 anyway, seems to just store indexes for >>>> symbols, which relies on 10 s: for actual persistence. >>>> >>>> A suggestion for 3!:1 of symbols would be to scan the array containing >>>> symbols for null (\0), then store 2&s: if not included, or 5&s: if there >>>> is a \0. AFAIU, utf8 is safe to not include 0 as an extended byte. >>>> >>>> An alternative to 5&s: would be a new 8&s: where "data nulls" are encodes >>>> similar to embedded ' in strings. double nullchars encode a data >>>> nullchar. single nullchar encodes terminating 2&s: nullchar. This format >>>> c/would be used for 3!:1. 2&s: could be modified to be the 8&s: proposal. >>>> >>>> 10&s: could store in this new format for portability. But the problem of >>>> previously assigned symbols in session persists, and so a locale level >>>> symbol table would make the most sense for robustness. Also, an >>>> "application"/locale that just uses `true`false symbols (bad example but >>>> replace with small set of enums), would (presumably) be faster if it >>>> didn't share a symbol table with a very large symbol array principally >>>> used to avoid string fills. >>>> >>>> >>>> A question about symbols/3!:1... the documentation suggests that indexes >>>> are limited to 32bit values. Is that true for j64 too? Query (new) and >>>> query (old) is not completely clear in documentation either, and does that >>>> differ from i. or e. ? >>>> >>>> >>>> >>>> ________________________________ >>>> From: Henry Rich <[email protected]> >>>> To: Programming forum <[email protected]> >>>> Sent: Sunday, March 19, 2017 12:14 AM >>>> Subject: [Jprogramming] Show cause hearing - (10 s: y) >>>> >>>> >>>> >>>> Does anyone use (10 s: y)? 
>>>>
>>>> It is problematic in that the hash table (0 s: 4) may depend on the CPU
>>>> and the J release level.
>>>>
>>>> I would rather decommit (10 s: y) and have the user reload the symbol
>>>> table de novo. Any objections?
>>>>
>>>> Henry Rich
>>>>
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
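The two competing ideas in this thread for getting empty fields can be made concrete with a small Python sketch. It is not J and does not reproduce the real ;: semantics. `split_comma_prefixed` follows Raul's suggestion: prefix the line with a comma and keep every character, so each token starts with its own comma. An empty field then survives as a lone ",", and stripping the first character of each token recovers the fields. `split_single_pass` approximates the behaviour Pascal proposes. It is a small state machine that emits an empty field when a delimiter arrives while no characters have been collected, which stands in for the j=-1 case. It also drops an escape character while keeping the character after it, roughly what the proposed EmitPause action would allow in one pass. The function names and the use of backslash as the escape character are assumptions for illustration.

```python
import re

# Hypothetical helper names; backslash as the escape character is an assumption.

def split_comma_prefixed(line):
    """Raul's idea: prefix a comma so every field, even an empty one,
    becomes a token that starts with a comma; then drop that comma."""
    tokens = re.findall(r",[^,]*", "," + line)
    return [t[1:] for t in tokens]


def split_single_pass(line, delim=",", esc="\\"):
    """One pass: emit an empty field when a delimiter arrives with nothing
    collected, and skip an escape character while keeping the next one."""
    fields = []
    word = []        # characters of the current field
    escaped = False  # True right after an escape character
    for ch in line:
        if escaped:
            word.append(ch)      # an escaped delimiter or escape is literal data
            escaped = False
        elif ch == esc:
            escaped = True       # the escape itself is not copied into the field
        elif ch == delim:
            fields.append("".join(word))  # "" when no characters were collected
            word = []
        else:
            word.append(ch)
    # a dangling escape at the very end of the line is simply dropped
    fields.append("".join(word))
    return fields


print(split_comma_prefixed("a,,b,c"))       # ['a', '', 'b', 'c']
print(split_single_pass(r"a\,b,,c\\d"))     # ['a,b', '', 'c\\d']
```

The comma-prefix version handles only the empty fields. In Raul's description, the escape handling would happen in the second ;: applied to the contents of each box.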
