Re: [Jprogramming] sequential machine and empty word output

Raul Miller Mon, 24 Apr 2023 09:09:11 -0700

Parsing csv seems like the motivation here.

If so, it would also be good to have a more complete test suite.


In particular, csv double quote handling (see, for example,
https://stackoverflow.com/questions/66096193/having-multiple-double-quotes-inside-quoted-string-csv-file
on multiple double quotes inside a quoted string) means that your
opcode 9 deserves some careful thought.

(Traditionally, cleanup of the double quotes would happen in a
separate step, after ;: had completed breaking out the words. Here, I
think you mean something slightly different for opcode 9. Instead of
ev (which would include all text from the end of the previous word),
you might have been thinking of some different concept which would
skip over one of the double quote characters?)

(I haven't emulated a machine implementation here; there's enough
detail involved that I would much rather look at a working demo and
see how it handles test cases.)

Thanks,

--
Raul

On Mon, Apr 24, 2023 at 5:34 AM Danil Osipchuk <danil.osipc...@gmail.com> wrote:
>
> I wonder if I'm the only one bothered by the sequential machine's (;:)
> assertion that i>j strictly, which rules out empty words.
>
> Generally, empty words can be used as markers to impose some additional
> regularity on the output, to make it easier to process later.
>
> An obvious example would be parsing a csv file with 3 fields per record
> where any can be empty:
> ,,
> ,1st field, Is empty
> Full record, 3, "Hello, world"
>
> It is natural to parse it into empty strings where appropriate, but i>j
> gets in the way.
>
>
> Allowing i>:j and adding three new opcodes (for the sake of completeness)
> like those below would seem to increase the sequential machine's
> usefulness considerably, in a mostly backwards-compatible way. What do
> others think?
>
> 8    j=.i+1
> 9    j=.i+1  [ ew(i,j,r,c)
> 10   j=.i+1  [ ev(i,j,r,c)
>
> NB. Rows: 0: Waiting for terminating comma, 1: Inside of quotes
> NB. Columns: 0: comma, 1: double quotes, 2: other
>
>    <"1 (2 3 2$ 0 9 1 1 0 0   1 0 0 0 1 0)
> +---+---+---+
> |0 9|1 1|0 0|
> +---+---+---+
> |1 0|0 0|1 0|
> +---+---+---+
>    csv =: (0;(2 3 2$ 0 9 1 1 0 0   1 0 0 0 1 0   );(',';'"');0 0 0 _1 ) & ;:
>    csv ',,'
> ++++
> ||||
> ++++
>    csv ',1st field, Is empty'
> ++---------+---------+
> ||1st field| Is empty|
> ++---------+---------+
>    csv 'Full record, 3, "Hello, world"'
> +-----------+--+--------------+
> |Full record| 3|"Hello, world"|
> +-----------+--+--------------+
>
> ====
>
> dlab:~/Sources/jsource-master/jsrc$ diff w.c w.c.orig
> 251c251
> < #define CHKJ(j)             ASSERT(BETWEENC((j),0,i),EVINDEX);
> ---
> > #define CHKJ(j)             ASSERT(BETWEENO((j),0,i),EVINDEX);
> 272,274d271
> <   case 8:         j=i+1; break;          \
> <   case 9:         if(0<=vi){EMIT(T,vj,vi,vr,vc); vi=vr=-1;} EMIT(T,j,i,r,c);        j=i+1; break;  \
> <   case 10:        if(r!=vr){if(0<=vi)EMIT(T,vj,vi,vr,vc); vj=j; vr=r; vc=c;} vi=i;  j=i+1; break;  \
> 339c336
> <  v=sv; DQ(p*q, k=*v++; e=*v++; ASSERT((UI)k<(UI)p&&(UI)e<=(UI)10,EVINDEX););
> ---
> >  v=sv; DQ(p*q, k=*v++; e=*v++; ASSERT((UI)k<(UI)p&&(UI)e<=(UI)7,EVINDEX););
> 346c343
> <   if(2<=n){ijrd[1]=j=*v++; ASSERT(BETWEENC(j, -1, i),EVINDEX);}
> ---
> >   if(2<=n){ijrd[1]=j=*v++; ASSERT(BETWEENO(j, -1, i),EVINDEX);}
>
>
> regards,
>  Danil
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm