Re: [Jprogramming] Beginner Understanding CSV file reading/writing

Joe Bogner Tue, 10 Dec 2013 06:36:26 -0800

Just to expand on Devon's post: I often use a combination of cut and each
to split up a string.


This does the same thing (with a few more steps behind the scenes):

   > ',' cut each LF cut ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF) -. CR

as

   <;._1&>',',&.><;._2 CR-.~('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)

Jon, in case it helps to break it down:

[Split on comma] [each] [Split on LF] [Remove CR] ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)


Step 1 - Remove the extra CR

CR -.~ removes the carriage returns from the string. They are unnecessary
since we are splitting on LF.

You can accomplish the same by doing this:

('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF) -. CR

As Brian mentioned, the tilde just reverses the arguments.

CR -.~ ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
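To make Step 1 concrete, here is a small sketch on a shorter string. The name s is only an illustration; CR and LF are assumed to come from the standard library, as in the examples above.

```j
NB. A small string with Windows-style line endings (s is an illustrative name).
s =: 'ab',CR,LF,'cd',CR,LF

# s                        NB. 8 characters, counting both CRs and both LFs
# s -. CR                  NB. 6 once the CRs are removed
(s -. CR) -: CR -.~ s      NB. 1: ~ only swaps the arguments of -.
```

Either spelling works; the -.~ form just lets the sentence read right to left without parentheses.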

Step 2 - Split on the last character, which is now LF

http://www.jsoftware.com/jwiki/Vocabulary/semidot

<;._2 will split on the last character of the string and drop it

<;._2 ('A',LF,'B',LF,'C',LF)
┌─┬─┬─┐
│A│B│C│
└─┴─┴─┘

If you check out the definition of 'cut' you will see it has this same
operation
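One difference is worth checking before relying on cut for CSV data. As far as I know, the standard-library cut appends the delimiter, applies <;._2 and then discards empty boxes, so it does not behave like the raw ;._1 split when a field is empty. A quick sketch, assuming the stock definition of cut is loaded:

```j
NB. Assumes the standard-library cut is loaded (the default in a normal J session).
(LF cut 'A',LF,'B',LF) -: <;._2 'A',LF,'B',LF   NB. 1: same boxes either way

# ',' cut 'a,,b'       NB. 2: cut drops the empty middle field
# <;._1 ',a,,b'        NB. 3: the raw ;._1 cut keeps it
```

So for rows that may contain empty fields, the ;._1 version keeps the columns aligned while the cut version may not.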

Step 3 - Split on comma for each item

In Step 2 we created a list of boxed strings, one box per line. We now need
to operate on the contents of each box and split them on commas.

The 'each' adverb will do this; it is what Devon has as "&.>".

[Split on comma] is <;._1&>',' , followed by each.

You can see it in action here:

   <;._1&>',' , each ('a,b';'c,d')
┌─┬─┐
│a│b│
├─┼─┤
│c│d│
└─┴─┘

The trick here is to use the cut conjunction (;.) to split on commas. Cut
uses either the first item (;._1) or the last item (;._2) of the array as
the delimiter. A CSV line won't have a comma at the beginning or the end, so
we first need to add a comma at the beginning of each boxed string so we can
tell cut to split on it.

That is what ',' ,&.> is doing: it prepends a comma to the contents of each
box. The &> in <;._1&> then applies the cut to the contents of each box.

 ',' ,&.> ('a,b';'c,d')
┌────┬────┐
│,a,b│,c,d│
└────┴────┘

',' , each ('a,b';'c,d')

┌────┬────┐
│,a,b│,c,d│
└────┴────┘


Now that each boxed string starts with a comma, we can cut on the first
character and drop it

<;._1 &> ',' , each ('a,b';'c,d')


Back to the beginning:

Split on comma, for each item, in an LF-split string with the CRs removed:

   <;._1 &> ',' , each <;._2 CR -.~ ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
┌─┬─┬────────────────┬───┐
│1│2│"embedded comma"│3.4│
├─┼─┼────────────────┼───┤
│5│6│"no comma"      │7.8│
└─┴─┴────────────────┴───┘
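The three steps can also be wrapped in one verb. This is only a sketch: splitcsv is a made-up name, not a library verb, and it assumes the last line of the text ends with LF. It knows nothing about quoting, so a comma inside a quoted field is still treated as a separator.

```j
NB. splitcsv is a hypothetical name, not a library verb.
NB. y is the raw text of a CSV file whose last line ends with LF.
splitcsv =: 3 : 0
lines =. <;._2 y -. CR       NB. Steps 1 and 2: drop CRs, box each LF-terminated line
<;._1&> ',' ,&.> lines       NB. Step 3: prepend a comma to each line, then cut on it
)

t =: splitcsv 'ab,cd',CR,LF,'ef,gh',CR,LF
$ t                              NB. 2 2
t -: 2 2 $ 'ab';'cd';'ef';'gh'   NB. 1
$ splitcsv '1,"a, b"',LF         NB. 1 3: the quoted comma is split too
```

When rows have different numbers of fields, the final &> pads the shorter rows with empty boxes, as in Devon's output quoted below.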


Hope that helps. I learned more by going through it and wanted to share

On Sat, Dec 7, 2013 at 5:44 PM, Devon McCormick <[email protected]> wrote:

> Yes - sorry for typing it in w/o testing it.  Note that the point at which
> the error was picked up is indicated by extra spaces in the returned line:
>    mat=.<.; _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
> |domain error
> |   mat=.<.;    _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
>
> A good way to debug a line like this is to look at successively longer
> pieces, starting w/the rightmost one, e.g. (on my system):
>    jpath '~temp/test.csv'
> c:/users/devonmcc/j64-701-user/temp/test.csv
>
> Do I have this file?
>    fexist jpath '~temp/test.csv'
> 0
>
> So, I don't have this file - I only used it to mimic the example you sent.
> If I create this file locally so I can continue looking at longer pieces:
>    ('1,2,"embedded, comma",3.4',CR,LF,'5,6,"no comma",7.8') fwrite 'test.csv'
> 45
>    fexist 'test.csv'
> 1
>
> BTW - "fexist" is defined
>    fexist=: 1:@(1!:4) ::0:@(([: < 8 u: >) ::]&>)@(<^:(L. = 0:))
> in case you don't have it.
>
> Continuing with longer fragments shows us what the data looks like at each
> step:
>    NB. mat=. <.;_1&>',',&.><;._2 CR-.~freads 'test.csv'
>    freads 'test.csv'
> 1,2,"embedded, comma",3.4
> 5,6,"no comma",7.8
>
>    CR-.~freads 'test.csv'
> 1,2,"embedded, comma",3.4
> 5,6,"no comma",7.8
>
>    <;._2 CR-.~freads 'test.csv'
> +-------------------------+------------------+
> |1,2,"embedded, comma",3.4|5,6,"no comma",7.8|
> +-------------------------+------------------+
>    ',',&.><;._2 CR-.~freads 'test.csv'
> +--------------------------+-------------------+
> |,1,2,"embedded, comma",3.4|,5,6,"no comma",7.8|
> +--------------------------+-------------------+
>    <.;_1&>',',&.><;._2 CR-.~freads 'test.csv'
> |domain error
> |   <.;    _1&>',',&.><;._2 CR-.~freads'test.csv'
>
> Fixing the error:
>    <;._1&>',',&.><;._2 CR-.~freads 'test.csv'
> +-+-+----------+-------+---+
> |1|2|"embedded | comma"|3.4|
> +-+-+----------+-------+---+
> |5|6|"no comma"|7.8    |   |
> +-+-+----------+-------+---+
>
> On Sat, Dec 7, 2013 at 10:27 AM, Brian Schott <[email protected]> wrote:
>
> > It looks like there is a typo in the command with `mat`: .; should be ;.
> > `mat` is not a verb but a noun, btw.
> > I think the tilde is dyadic, not monadic, and swaps the arguments of -.
> > in this case.
> >
> > On Sat, Dec 7, 2013 at 9:08 AM, Jon Hough <[email protected]> wrote:
> >
> > > I'd like to thank everyone for replying.
> > > I suppose I should think about using J7.
> > >
> > > I did try Devon's example:
> > > "You can read CSV files in J pretty simply without using any predefined
> > >  verbs like this:
> > >
> > >  mat=. <.;_1&>',',&.><;._2 CR-.~freads jpath '~temp/test.csv'
> > >
> > > and I got the error:
> > > |domain error
> > > |   mat=.<.;    _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
> > >
> > > As an aside, I don't really understand what the "mat" function is doing.
> > > I'm still reading "J for C Programmers" so my understanding is a little
> > > shaky, but mat seems to be monadic, with the argument as the file to read.
> > > I'm not sure if this is an example of a tacit verb, because the argument
> > > ('~temp/test.csv') seems to be hardcoded into the verb.
> > >
> > > I assume:
> > > freads jpath '~temp/test.csv'
> > > reads the file.(http://www.jsoftware.com/user/script_files.htm)
> > > I do not really understand this: ~freads (I do not understand this use of
> > > the monadic tilde)
> > > I am trying to read this verb from right to left, but am not getting very
> > > far, even using the J dictionary and reference card for support.
> > > I would really appreciate any help at all in deciphering this.
> > >
> > > Thanks and regards,
> > > Jon
> > >
> > >
> > --
> > (B=) <-----my sig
> > Brian Schott
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >
>
>
>
> --
> Devon McCormick, CFA
>
