Yes.  I have a working implementation that uses 2 passes with different machines.  
The first machine uses function code 2, and its output is then scanned for gaps in 
order to add empty cells (arguably 2+ passes just for this step).  The 2nd machine 
uses function code 1 on each non-empty box, which lets it skip arbitrary characters.
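A rough Python sketch of the empty-cell step, assuming a single-character delimiter. `word_spans` stands in for the (start, length) pairs that function code 2 returns for non-empty words, and `fields_with_empties` is the gap scan that puts back one empty field per extra delimiter. Both names are hypothetical, and this is not the jpp code.

```python
def word_spans(s, delim=","):
    """(start, length) of each non-empty word, standing in for ;: function code 2."""
    spans, start = [], None
    for i, ch in enumerate(s):
        if ch == delim:
            if start is not None:
                spans.append((start, i - start))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(s) - start))
    return spans


def fields_with_empties(s, delim=","):
    """Gap scan: every delimiter beyond the one separating two words is an empty field."""
    spans = word_spans(s, delim)
    if not spans:                        # only delimiters: n of them give n+1 empty fields
        return [""] * (len(s) + 1)
    fields, prev_end = [], None
    for start, length in spans:
        gap = start if prev_end is None else start - prev_end - 1
        fields.extend([""] * gap)        # empty fields hidden in the gap
        fields.append(s[start:start + length])
        prev_end = start + length
    fields.extend([""] * (len(s) - prev_end))  # trailing delimiters
    return fields
```

For example, `fields_with_empties("a,,b,c")` returns `['a', '', 'b', 'c']`.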




________________________________
From: Raul Miller <[email protected]>
To: Programming forum <[email protected]> 
Sent: Friday, April 7, 2017 6:34 PM
Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)



I'd probably use ;: for CSV parsing, but I'd leave all characters in
place (including any leading comma, and I'd be sure to prefix each
line with a comma) and then post-process the contents of each box with
a second ;: (whose results I would then raze).
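One way to read this suggestion, sketched in Python rather than with ;: (the names `comma_led_words` and `fields` are hypothetical, and a single-character delimiter is assumed). Once the line is prefixed with a comma, every word starts at a comma and keeps it. An empty field therefore still shows up as a one-character word, and post-processing only has to drop that leading comma.

```python
def comma_led_words(line, delim=","):
    """Split so that each delimiter starts a new word and stays at its front."""
    words = []
    for ch in delim + line:              # the prefix guarantees the first word exists
        if ch == delim:
            words.append(ch)             # a delimiter always begins a new word
        else:
            words[-1] += ch              # anything else extends the current word
    return words


def fields(line, delim=","):
    """Post-process: drop the leading delimiter from each word."""
    return [w[1:] for w in comma_led_words(line, delim)]
```

This ignores quoting. In the suggestion above, the second ;: pass over each box is presumably where that handling would go.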

But that's just me.

Thanks,

-- 
Raul






On Fri, Apr 7, 2017 at 6:26 PM, 'Pascal Jasmin' via Programming
<[email protected]> wrote:
> So then you can never use ;. either :P
>
> At the beginning of the day's thread, I linked to a machine that parses 
> delimited data and also handles an escape character used to embed the escape 
> character itself and the delimiter.
>
>
> 2 of the 3 suggestions I made were entirely about speeding up that operation.
>
> Emitting an empty field when ew is issued with j=-1 is one.  The other is 
> EmitPause: the next StartWord/EmitWord operation group would append to the last (paused) word.
>
>
> ________________________________
> From: Raul Miller <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Friday, April 7, 2017 3:20 PM
> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>
>
>
> CSV has a lot of issues (consider, for example, quoting and escaping
> quotes), and if you wanted ;: to handle CSV you would probably think
> about adding new operations to deal with the complexity.
>
> Thanks,
>
> --
> Raul
>
>
> On Fri, Apr 7, 2017 at 3:03 PM, 'Pascal Jasmin' via Programming
> <[email protected]> wrote:
>> Skipping a character is done "all the time", by resetting j.  It can only 
>> skip over beginning-of-word characters.
>>
>>>it's hard to imagine a case where empty tokens are meaningful and useful.
>>
>> CSV or other delimited data:
>>
>> a,,b,c has 4 fields, one of them empty.
>>
>> There'd be no change to the state machine tokenizing the J language.  You are 
>> currently not allowed to emit an empty word, and the change would not force you 
>> to start doing so.
>>
>>>(from a comprehensibility point of view) to be using <;._1 or <;._2 for that
>>
>> It's slower, and the equivalent sj matrix for <;._1 is a single "row".
>> ________________________________
>> From: Raul Miller <[email protected]>
>> To: Programming forum <[email protected]>
>> Sent: Friday, April 7, 2017 1:49 PM
>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>
>>
>>
>> I am not sure that empty words make sense with ;:
>>
>> Each state transition is a character. So, to achieve "empty box" you
>> would need to be skipping a character.
>>
>> (So, for example, right now, you could dedicate a character to be a
>> placeholder and then remove all instances of that character from all
>> boxes.)
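A Python sketch of this placeholder workaround. It assumes `\x01` never occurs in the data. The placeholder is written at the start and after every delimiter, so no field is empty while tokenizing, and it is stripped from every word afterwards. `nonempty_words` is a hypothetical stand-in for a machine that can never emit empty words.

```python
def nonempty_words(s, delim=","):
    """Stand-in for a tokenizer that can never emit an empty word."""
    words, cur = [], ""
    for ch in s:
        if ch == delim:
            if cur:
                words.append(cur)
            cur = ""
        else:
            cur += ch
    if cur:
        words.append(cur)
    return words


def fields_via_placeholder(s, delim=",", placeholder="\x01"):
    """Mark every field with the placeholder, tokenize, then strip the placeholder."""
    marked = placeholder + s.replace(delim, delim + placeholder)
    return [w.replace(placeholder, "") for w in nonempty_words(marked, delim)]
```

The final strip would also delete any placeholder present in the data, which is why the character has to be reserved.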
>>
>> Anyways, ;: handles the 'tokenizer' role for the language, and
>> it's hard to imagine a case where empty tokens are meaningful and
>> useful.
>>
>> Presumably you do want the empties for some reason, but I am thinking
>> it would make more sense (from a comprehensibility point of view) to
>> be using <;._1 or <;._2 for that.
>>
>> Thanks,
>>
>> --
>> Raul
>>
>>
>>
>>
>> On Fri, Apr 7, 2017 at 1:25 PM, 'Pascal Jasmin' via Programming
>> <[email protected]> wrote:
>>> So right now, ew when j=-1 is a syntax error.  And, also currently, you can 
>>> never emit empty boxes.  If for some reason your machine is intended to 
>>> never emit empty boxes, then an empty box in the output would give you a clue 
>>> that you did not define it correctly.  No current machine would be affected, 
>>> and the speed boost would be significant compared to the workarounds.  You 
>>> could also check with a: e. result if you want to reject any result that 
>>> contains an empty box as an error.
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Raul Miller <[email protected]>
>>> To: Programming forum <[email protected]>
>>> Sent: Friday, April 7, 2017 1:12 PM
>>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>>
>>>
>>>
>>> I'm not sure that eliminating syntax errors to get null words is a good 
>>> idea.
>>>
>>> --
>>> Raul
>>>
>>>
>>> On Fri, Apr 7, 2017 at 12:35 PM, 'Pascal Jasmin' via Programming
>>> <[email protected]> wrote:
>>>> Getting back to the idea of storing symbols with 3!:1 as delimited strings:  
>>>> this would both improve storage and eliminate the error-prone dependence 
>>>> on 10 s:
>>>>
>>>> The esc method (which uses ;:) in the latest jpp ( 
>>>> https://github.com/Pascal-J/jpp ) gets 2MB/s throughput using 2 passes 
>>>> and an escape code to handle embedded escapes and nulls in null-delimited 
>>>> data/symbols.
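For readers unfamiliar with the escape-code approach, here is a minimal Python sketch of the encoding such a parser has to undo. It assumes backslash as the escape character and NUL as the terminator after each string; both are assumptions, and jpp's actual choices may differ. The escape character is doubled, a data NUL is preceded by the escape character, and decoding is a single left-to-right scan.

```python
ESC = "\\"   # assumed escape character
NUL = "\0"   # assumed terminator after each string


def encode(strings):
    """Escape ESC first, then NUL, and terminate every string with a bare NUL."""
    return "".join(s.replace(ESC, ESC + ESC).replace(NUL, ESC + NUL) + NUL
                   for s in strings)


def decode(data):
    """Single pass: ESC makes the next character literal; a bare NUL ends a string."""
    strings, cur, i = [], [], 0
    while i < len(data):
        ch = data[i]
        if ch == ESC:
            cur.append(data[i + 1])      # assumes well-formed input (no trailing ESC)
            i += 2
            continue
        if ch == NUL:
            strings.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    return strings
```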
>>>>
>>>> Several improvements to ;: would make this significantly faster and more 
>>>> flexible:
>>>>
>>>> Emit a null (empty) word when j=-1 and emitword is issued (previously 
>>>> suggested):  this allows null fields to be "parsed" easily.  (The current 
>>>> method is to use function code 2 and examine the gaps to add the nulls in a 
>>>> 2nd pass.  Function code 2 is slower than 0, and there is overhead in 
>>>> calculating the gaps and inserting the nulls.)
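To make the proposal concrete, here is a small Python model of a sequential machine. It is not J's ;: itself, and its actions are only modelled on ;: codes 0, 1 and 4 (no action; j=i; ew then j=-1). The only change from current behaviour is in `emit`: ew with j=-1 yields an empty word instead of an error. With that change, a two-state machine returns empty CSV fields in one pass. The final flush at end of input is a choice made for this sketch.

```python
NONE, START, EMIT_RESET = 0, 1, 4   # modelled on ;: actions: none, j=i, ew then j=-1
OTHER, DELIM = 0, 1                 # character classes

# TABLE[state][char_class] = (next_state, action)
TABLE = [
    [(1, START), (0, EMIT_RESET)],   # state 0: no word in progress (j = -1)
    [(1, NONE),  (0, EMIT_RESET)],   # state 1: inside a field
]


def emit(s, j, i):
    """Proposed ew: with j = -1 emit an empty word instead of signalling an error."""
    return "" if j == -1 else s[j:i]


def run_machine(s, delim=","):
    words, state, j = [], 0, -1
    for i, ch in enumerate(s):
        state, action = TABLE[state][DELIM if ch == delim else OTHER]
        if action == START:
            j = i
        elif action == EMIT_RESET:
            words.append(emit(s, j, i))
            j = -1
    words.append(emit(s, j, len(s)))   # final flush (sketch choice): last field, possibly empty
    return words
```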
>>>>
>>>> Add an action code that suspends/pauses the current word.  The next 
>>>> StartWord would append to the current word, skipping any characters 
>>>> scanned during the pause.  This would allow "deleting" items in the middle 
>>>> of a word in a single pass instead of using the 2-pass approach (with the 
>>>> 2nd pass using function code 1).  Alternatively, it could function like ev, 
>>>> but if ew is in the same state, it discards the elements between StartWords.
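A Python sketch of how such a pause action could behave, using a small table-driven model. `PAUSE` = 7 is a hypothetical action code, and backslash as an escape character is an assumption made for the example. On the escape character the current word is paused and the escape itself is skipped. The next StartWord then appends to the paused text, so escape characters are removed from the middle of a word in a single pass.

```python
NONE, START, EMIT_RESET, PAUSE = 0, 1, 4, 7   # 7 is a hypothetical "pause" action code
OTHER, DELIM, ESC = 0, 1, 2                    # character classes

# TABLE[state][char_class] = (next_state, action)
TABLE = [
    #  OTHER        DELIM            ESC
    [(1, START), (0, EMIT_RESET), (2, PAUSE)],   # 0: between fields (j = -1)
    [(1, NONE),  (0, EMIT_RESET), (2, PAUSE)],   # 1: inside a field
    [(1, START), (1, START),      (1, START)],   # 2: paused after ESC; next char is literal
]


def split_escaped(s, delim=",", esc="\\"):
    words = []
    paused = ""          # text saved by PAUSE; the next segment is appended to it
    state, j = 0, -1

    def current(i):
        return paused + ("" if j == -1 else s[j:i])

    for i, ch in enumerate(s):
        cls = DELIM if ch == delim else ESC if ch == esc else OTHER
        state, action = TABLE[state][cls]
        if action == START:
            j = i                          # new segment; paused text is kept
        elif action == PAUSE:
            paused, j = current(i), -1     # suspend the word; ESC itself is skipped
        elif action == EMIT_RESET:
            words.append(current(i))
            paused, j = "", -1
    words.append(current(len(s)))          # final flush (sketch choice)
    return words
```

For example, `split_escaped("a\\,b,c")` (the text `a\,b,c`) returns `['a,b', 'c']`.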
>>>>
>>>>
>>>> A custom action code (one interpretation of Henry's inclination, though he 
>>>> may have had custom function codes in mind) that has a way of inserting a 
>>>> character.  This would allow building an escaped sequence by inserting the 
>>>> escape character before the last character seen.
>>>>
>>>> Custom action codes would need to return at least the characters to include 
>>>> (if the action is not of the ew/ev class), newi, and newj.  A new function 
>>>> code would be a variation on 2 (emit i (i-j), actioncode), though "characters 
>>>> to include" would interact directly with function codes 0 and 1.
>>>>
>>>>
>>>> A powerful tool for nested structures (see the parenw machine in fsm.ijs, 
>>>> which builds trees from parenthesized groups) would be emitwordandIncreaseDepth 
>>>> and emitwordandDecreaseDepth actions.  So the return parameters for custom 
>>>> actions would include a code for the action: (noword, word, 
>>>> WordincreaseDepth, WordDecreaseDepth, vector)
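A Python sketch of what emitword-and-increase/decrease-depth actions would provide, assuming words are separated by spaces and grouped by parentheses. `(` emits the pending word and opens a child list. `)` emits the pending word and closes the list. The nesting therefore comes out of a single scan. This is a stand-in written for illustration, not the parenw machine from fsm.ijs.

```python
def paren_tree(s):
    """Build a nested list from parenthesised text, emitting words at the current depth."""
    root = []
    stack = [root]   # stack[-1] is the list receiving words at the current depth
    j = -1           # start of the current word, -1 if none (as in ;:)

    def emit_word(i):
        nonlocal j
        if j != -1:
            stack[-1].append(s[j:i])
        j = -1

    for i, ch in enumerate(s):
        if ch == "(":                     # emitword and increase depth
            emit_word(i)
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif ch == ")":                   # emitword and decrease depth
            emit_word(i)
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' at index {i}")
            stack.pop()
        elif ch == " ":                   # plain emitword
            emit_word(i)
        elif j == -1:                     # start a new word
            j = i
    emit_word(len(s))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root
```

For example, `paren_tree("a(b c(d))e")` returns `['a', ['b', 'c', ['d']], 'e']`.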
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: 'Pascal Jasmin' via Programming <[email protected]>
>>>> To: "[email protected]" <[email protected]>
>>>> Sent: Sunday, March 19, 2017 12:38 PM
>>>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>>>
>>>>
>>>>
>>>> The idea of double nullchars doesn't work, as there's no way to know whether 
>>>> a null is embedded at the end of one "string" or at the beginning of the next.  
>>>> A null followed by a count of the consecutive nulls would work, though.  
>>>> For 255 nulls, the code 255 0 would be used; for 510 consecutive nulls, 
>>>> 255 255 0...
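A Python sketch of the count scheme, operating on bytes. Each run of embedded nulls becomes a 0 followed by the run length, written as one 255 for every full 255 nulls and then the remainder. This gives 255 0 for 255 nulls and 255 255 0 for 510. The message does not say how a string ends. This sketch assumes a null followed by a count of 0 as the terminator, which cannot collide with a real run.

```python
def encode_strings(strings):
    """Encode bytes strings: runs of data nulls -> 0 + count bytes; 0 0 ends each string."""
    out = bytearray()
    for s in strings:
        i = 0
        while i < len(s):
            if s[i] != 0:
                out.append(s[i])
                i += 1
                continue
            run = 0
            while i < len(s) and s[i] == 0:   # measure the run of consecutive nulls
                run += 1
                i += 1
            out.append(0)
            out.extend([255] * (run // 255))  # one 255 per full block of 255 nulls
            out.append(run % 255)             # remainder (< 255) closes the count
        out.extend(b"\x00\x00")               # terminator: null + count 0 (assumption)
    return bytes(out)


def decode_strings(data):
    strings, cur, i = [], bytearray(), 0
    while i < len(data):
        b = data[i]
        i += 1
        if b != 0:
            cur.append(b)
            continue
        run = 0
        while data[i] == 255:                 # each 255 adds a full block
            run += 255
            i += 1
        run += data[i]                        # final byte < 255
        i += 1
        if run == 0:                          # null + count 0 marks end of string
            strings.append(bytes(cur))
            cur = bytearray()
        else:
            cur.extend(b"\x00" * run)
    return strings
```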
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: 'Pascal Jasmin' via Programming <[email protected]>
>>>> To: "[email protected]" <[email protected]>
>>>> Sent: Sunday, March 19, 2017 11:33 AM
>>>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Assuming that this comes with some improvement for s:, it would be 
>>>> easy to favour that improvement.
>>>>
>>>> Things not to like about a global symbol table are that every typo is 
>>>> included, and any "app"/set that is loaded joins that table.  AFAIU, 
>>>> corruption happens if you create symbols and then restore a table with 
>>>> 10&s:, so any application that relies on 10&s: can crash another, 
>>>> previously loaded "application".
>>>>
>>>>
>>>> A problem is that 3!:1 (or 3!:3, anyway) seems to store just the indexes 
>>>> for symbols, so it relies on 10 s: for actual persistence.
>>>>
>>>> A suggestion for 3!:1 of symbols would be to scan the array containing 
>>>> symbols for null (\0), then store 2&s: if there is none, or 5&s: if there 
>>>> is a \0.  AFAIU, UTF-8 never uses 0 as a byte within a multi-byte character.
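The UTF-8 point can be checked exhaustively in Python. For every encodable code point other than U+0000, the encoded bytes contain no 0, and every byte of a multi-byte sequence is 0x80 or above. A \0 in UTF-8 text can therefore only be a literal NUL character.

```python
def utf8_nul_only_for_u0000():
    """True if no code point other than U+0000 produces a 0 byte in UTF-8."""
    for cp in range(1, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue                      # surrogates are not encodable in UTF-8
        encoded = chr(cp).encode("utf-8")
        if 0 in encoded:
            return False
        if len(encoded) > 1 and min(encoded) < 0x80:
            return False                  # multi-byte sequences use only bytes >= 0x80
    return True
```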
>>>>
>>>> An alternative to 5&s: would be a new 8&s: where "data nulls" are encoded 
>>>> similarly to embedded ' in strings: double nullchars encode a data 
>>>> nullchar, and a single nullchar encodes the terminating 2&s: nullchar.  This 
>>>> format could/would be used for 3!:1.  2&s: could be modified to be the 8&s: proposal.
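A quick Python check of the double-nullchar scheme shows the ambiguity raised in the follow-up message of March 19 (the function name is hypothetical). A data null at the end of one string and a data null at the start of the next produce the same encoding.

```python
NUL = "\0"


def encode_double_null(strings):
    """Double each data NUL (like '' inside a J string); a single NUL ends each string."""
    return "".join(s.replace(NUL, NUL + NUL) + NUL for s in strings)


# Two different symbol lists...
first = encode_double_null(["a" + NUL, "b"])
second = encode_double_null(["a", NUL + "b"])
# ...produce the same encoding, a NUL NUL NUL b NUL, so a decoder cannot tell them apart.
assert first == second == "a" + NUL * 3 + "b" + NUL
```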
>>>>
>>>> 10&s: could store in this new format for portability.  But the problem of 
>>>> symbols previously assigned in the session persists, so a locale-level 
>>>> symbol table would make the most sense for robustness.  Also, an 
>>>> "application"/locale that just uses `true`false symbols (a bad example, but 
>>>> substitute a small set of enums) would presumably be faster if it 
>>>> didn't share a symbol table with a very large symbol array used principally 
>>>> to avoid string fills.
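A Python sketch of the locale-level idea combined with storing names rather than indexes (`SymbolTable`, `save` and `load` are hypothetical names). Each table interns its own names, persistence writes the names themselves, and loading re-interns them. The result does not depend on a saved table snapshot, and other tables are unaffected. The sketch assumes names contain no NUL, which is exactly the case the escaping schemes discussed in this thread have to cover.

```python
class SymbolTable:
    """Hypothetical per-locale symbol table: interns names to small integer indexes."""

    def __init__(self):
        self.index = {}
        self.names = []

    def intern(self, name):
        if name not in self.index:
            self.index[name] = len(self.names)
            self.names.append(name)
        return self.index[name]

    def name(self, i):
        return self.names[i]


def save(table, symbols):
    """Persist the names themselves, NUL-terminated, instead of table indexes."""
    return "".join(table.name(i) + "\0" for i in symbols)


def load(table, data):
    """Re-intern the saved names into whatever table the loading locale uses."""
    return [table.intern(n) for n in data.split("\0")[:-1]]
```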
>>>>
>>>>
>>>> A question about symbols/3!:1: the documentation suggests that indexes 
>>>> are limited to 32-bit values.  Is that true for j64 too?  Query (new) and 
>>>> query (old) are not completely clear in the documentation either; do they 
>>>> differ from i. or e. ?
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Henry Rich <[email protected]>
>>>> To: Programming forum <[email protected]>
>>>> Sent: Sunday, March 19, 2017 12:14 AM
>>>> Subject: [Jprogramming] Show cause hearing - (10 s: y)
>>>>
>>>>
>>>>
>>>> Does anyone use (10 s: y)?
>>>>
>>>>
>>>> It is problematic in that the hash table (0 s: 4) may depend on the CPU
>>>> and the J release level.
>>>>
>>>>
>>>> I would rather decommit (10 s: y) and have the user reload the symbol
>>>> table de novo.  Any objections?
>>>>
>>>>
>>>> Henry Rich
>>>>
>>>> ----------------------------------------------------------------------
>>>>
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
