Re: [Jprogramming] Beginner Understanding CSV file reading/writing

Jon Hough Wed, 11 Dec 2013 08:34:23 -0800

Thanks for the replies. It's going to take a while to take all this in. 

Regards,
Jon


> Date: Tue, 10 Dec 2013 11:38:30 -0500
> From: [email protected]
> To: [email protected]
> Subject: Re: [Jprogramming] Beginner Understanding CSV file reading/writing
> 
> You may also want to look at this:
> http://www.jsoftware.com/jwiki/NYCJUG/2012-12-11#Example_of_Free-Form_Text_Wrangling.
> 
> 
> On Tue, Dec 10, 2013 at 11:34 AM, Devon McCormick <[email protected]> wrote:
> 
> > Just to gild the lily, one of our NYCJUG members implemented CSV parsing
> > using J's finite-state machine primitives:
> > http://www.jsoftware.com/jwiki/NYCJUG/2013-06-11?action=AttachFile&do=view&target=Parsing+CSV+Files+with+a+Finite+State+Machine.pdf.
> >
> >
> > On Tue, Dec 10, 2013 at 9:35 AM, Joe Bogner <[email protected]> wrote:
> >
> >> Just to expand on Devon's post, I often use a combination of cut and each
> >> to split up a string
> >>
> >> This will do the same  (with a few more steps behind the scenes)
> >>
> >> > ',' cut each LF cut ('1,2,"embedded comma",3.4',CR, LF,'5,6,"no comma",7.8',CR, LF) -. CR
> >>
> >> as
> >>
> >> <;._1&>',',&.><;._2 CR-.~('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
> >>
> >> Jon, in case it helps to break it down:
> >>
> >> [Split on comma] [each] [Split on LF] [Remove CR] ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
> >>
> >>
> >> Step 1 - Remove the extra CR
> >>
> >> CR-.~ removes extra carriage returns from the string. They are unnecessary
> >> since we are splitting on LF
> >>
> >> You can accomplish the same by doing this:
> >>
> >> ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF) -. CR
> >>
> >> As Brian mentioned, the tilde just reverses the arguments.
> >>
> >> CR -.~ ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
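
A small check of Step 1, offered as a sketch: the names s, a and b are made up for this illustration and do not appear in the thread. It shows that `-. CR` and `CR -.~` give the same result, and that only the CR characters are dropped.

```j
NB. s, a, b are illustrative names, not from the original posts
s =: 'ab',CR,LF,'cd',CR,LF
a =: s -. CR        NB. "less": remove every CR from s
b =: CR -.~ s       NB. same thing, with the arguments swapped by ~
a -: b              NB. 1: both results match
(#s) , #a           NB. 8 6: the two CRs are gone, the LFs remain
```

The LF characters are untouched, so the string can still be split on LF in the next step.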
> >>
> >> Step 2 - Split on the last character, which is now LF
> >>
> >> http://www.jsoftware.com/jwiki/Vocabulary/semidot
> >>
> >> <;._2 will split on the last character of the string and drop it
> >>
> >> <;._2 ('A',LF,'B',LF,'C',LF)
> >> ┌─┬─┬─┐
> >> │A│B│C│
> >> └─┴─┴─┘
> >>
> >> If you check out the definition of 'cut' you will see it has this same
> >> operation
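
To make the two cut variants concrete, here is a minimal sketch; the names viaLast and viaFirst and the sample strings are assumptions for illustration. `;._2` treats the last item as the delimiter and `;._1` treats the first item as the delimiter. As far as I know, the standard library's `cut` also discards empty pieces, which matters for CSV lines with empty fields, so it is worth checking the definition on your own J version.

```j
NB. viaLast, viaFirst are illustrative names, not from the original posts
viaLast  =: <;._2 'one',LF,'two',LF     NB. delimiter is the last item (LF)
viaFirst =: <;._1 LF,'one',LF,'two'     NB. delimiter is the first item (LF)
viaLast -: viaFirst                     NB. 1: both give the boxes 'one' and 'two'

NB. An empty field: the primitive keeps it; the library verb cut is
NB. assumed here to drop it (check the definition of cut on your system)
# <;._1 ',a,,b'                         NB. 3 boxes, the middle one empty
# ',' cut 'a,,b'                        NB. 2 boxes if empties are dropped
```

If empty fields need to survive, the primitive form is probably the safer choice.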
> >>
> >> Step 3 - Split on comma for each item
> >>
> >> In Step 2 - we created a boxed array of strings for each LF. We now need to
> >> operate on each box and split based on comma
> >>
> >> The 'each' adverb will do this, which is what Devon has as "&.>"
> >>
> >> [Split on comma] is <;._1&>',' ,
> >>
> >> You can see it in action here:
> >>
> >>    <;._1&>',' , each ('a,b';'c,d')
> >> ┌─┬─┐
> >> │a│b│
> >> ├─┼─┤
> >> │c│d│
> >> └─┴─┘
> >>
> >> The trick here is to use the cut conjunction to split on commas. The cut
> >> conjunction uses either the first or the last item in the array as the
> >> delimiter. A CSV line won't have a comma at the beginning or the end, so we
> >> first need to add a comma at the beginning of each boxed string so we can
> >> tell cut to split on it
> >>
> >> That is what ',' ,&.> is doing. It's adding a comma at the beginning of each
> >> item
> >>
> >>  ',' ,&.> ('a,b';'c,d')
> >> ┌────┬────┐
> >> │,a,b│,c,d│
> >> └────┴────┘
> >>
> >> ',' , each ('a,b';'c,d')
> >>
> >> ┌────┬────┐
> >> │,a,b│,c,d│
> >> └────┴────┘
> >>
> >>
> >> Now that each boxed string starts with a comma, we can cut on the first
> >> character and drop it
> >>
> >> <;._1 &> ',' , each ('a,b';'c,d')
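
A sketch tying Step 3 together, with x and r as made-up names. It checks that `each` and `&.>` behave the same here, and that applying `<;._1` with `&>` assembles the per-line results into a table with one row per line. When lines have different numbers of fields, the shorter rows are padded with empty boxes, which is what the 5-column output in Devon's message further down shows.

```j
NB. x and r are illustrative names, not from the original posts
x =: 'a,b';'c,d'
(',' ,&.> x) -: ',' , each x     NB. 1: each is the same as &.>
r =: <;._1&> ',' ,&.> x          NB. cut each line on its leading comma
$ r                              NB. 2 2: two rows, two fields per row
$ <;._1&> ',' ,&.> 'a,b,c';'d'   NB. 2 3: the short row is padded
```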
> >>
> >>
> >> Back to the beginning:
> >>
> >>    <;._1 &> ',' , each <;._2 CR -.~ ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
> >>
> >> Split on comma - for each item - in a LF split string (with CR removed first)
> >>
> >> ┌─┬─┬────────────────┬───┐
> >> │1│2│"embedded comma"│3.4│
> >> ├─┼─┼────────────────┼───┤
> >> │5│6│"no comma"      │7.8│
> >> └─┴─┴────────────────┴───┘
> >>
> >>
> >> Hope that helps. I learned more by going through it and wanted to share
> >>
> >> On Sat, Dec 7, 2013 at 5:44 PM, Devon McCormick <[email protected]> wrote:
> >>
> >> > Yes - sorry for typing it in w/o testing it.  Note that the point at which
> >> > the error was picked up is indicated by extra spaces in the returned line:
> >> >    mat=.<.; _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
> >> > |domain error
> >> > |   mat=.<.;    _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
> >> >
> >> > A good way to debug a line like this is to look at successively longer
> >> > pieces, starting w/the rightmost one, e.g. (on my system):
> >> >    jpath '~temp/test.csv'
> >> > c:/users/devonmcc/j64-701-user/temp/test.csv
> >> >
> >> > Do I have this file?
> >> >    fexist jpath '~temp/test.csv'
> >> > 0
> >> >
> >> > So, I don't have this file - I only used it to mimic the example you sent.
> >> > I can create this file locally so I can continue looking at longer pieces:
> >> >    ('1,2,"embedded, comma",3.4',CR,LF,'5,6,"no comma",7.8') fwrite 'test.csv'
> >> > 45
> >> >    fexist 'test.csv'
> >> > 1
> >> >
> >> > BTW - "fexist" is defined
> >> >    fexist=: 1:@(1!:4) ::0:@(([: < 8 u: >) ::]&>)@(<^:(L. = 0:))
> >> > in case you don't have it.
> >> >
> >> > Continuing with longer fragments shows us what the data looks like at each
> >> > step:
> >> >    NB. mat=. <.;_1&>',',&.><;._2 CR-.~freads 'test.csv'
> >> >    freads 'test.csv'
> >> > 1,2,"embedded, comma",3.4
> >> > 5,6,"no comma",7.8
> >> >
> >> >    CR-.~freads 'test.csv'
> >> > 1,2,"embedded, comma",3.4
> >> > 5,6,"no comma",7.8
> >> >
> >> >    <;._2 CR-.~freads 'test.csv'
> >> > +-------------------------+------------------+
> >> > |1,2,"embedded, comma",3.4|5,6,"no comma",7.8|
> >> > +-------------------------+------------------+
> >> >    ',',&.><;._2 CR-.~freads 'test.csv'
> >> > +--------------------------+-------------------+
> >> > |,1,2,"embedded, comma",3.4|,5,6,"no comma",7.8|
> >> > +--------------------------+-------------------+
> >> >    <.;_1&>',',&.><;._2 CR-.~freads 'test.csv'
> >> > |domain error
> >> > |   <.;    _1&>',',&.><;._2 CR-.~freads'test.csv'
> >> >
> >> > Fixing the error:
> >> >    <;._1&>',',&.><;._2 CR-.~freads 'test.csv'
> >> > +-+-+----------+-------+---+
> >> > |1|2|"embedded | comma"|3.4|
> >> > +-+-+----------+-------+---+
> >> > |5|6|"no comma"|7.8    |   |
> >> > +-+-+----------+-------+---+
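
The output above splits the quoted field at its embedded comma. One way around that, offered only as a rough sketch and not as what anyone in the thread used, is to cut with a boolean left argument that marks only the commas outside quotes. The names s, inq and frets are hypothetical, and the sketch assumes every quote is closed; it keeps the quote characters in the field.

```j
NB. s, inq, frets, r are illustrative names, not from the original posts
s     =: ',' , '1,2,"embedded, comma",3.4'   NB. line with a leading comma
inq   =: ~:/\ s = '"'                        NB. 1 from an opening quote up to (not including) its closing quote
frets =: (s = ',') *. -. inq                 NB. commas that are not inside quotes
r     =: frets <;._1 s                       NB. start a new field at each marked comma
# r                                          NB. 4 fields
2 { r                                        NB. the quoted field, comma intact
```

The running `~:` (not-equal) scan flips between 0 and 1 at every quote character, so it acts as an inside-quotes flag. For anything beyond simple input, the finite-state-machine approach linked earlier in the thread is likely more robust.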
> >> >
> >> > On Sat, Dec 7, 2013 at 10:27 AM, Brian Schott <[email protected]> wrote:
> >> >
> >> > > It looks like there is a typo in the command with `mat`: .; should be ;. .
> >> > > `mat` is not a verb but a noun, btw.
> >> > > I think the tilde here is dyadic, not monadic, and swaps the arguments of -.
> >> > > in this case.
> >> > >
> >> > > On Sat, Dec 7, 2013 at 9:08 AM, Jon Hough <[email protected]> wrote:
> >> > >
> >> > > > I'd like to thank everyone for replying.
> >> > > > I suppose I should think about using J7.
> >> > > >
> >> > > > I did try Devon's example:
> >> > > > "You can read CSV files in J pretty simply without using any predefined
> >> > > >  verbs like this:
> >> > > >
> >> > > >  mat=. <.;_1&>',',&.><;._2 CR-.~freads jpath '~temp/test.csv'
> >> > > >
> >> > > > and I got the error:
> >> > > > |domain error
> >> > > > |   mat=.<.;    _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
> >> > > >
> >> > > > As an aside, I don't really understand what the "mat" function is doing.
> >> > > > I'm still reading "J for C Programmers" so my understanding is a little
> >> > > > shaky, but mat seems to be monadic, with the argument as the file to read.
> >> > > > I'm not sure if this is an example of a tacit verb, because the argument
> >> > > > ('~temp/test.csv') seems to be hardcoded into the verb.
> >> > > >
> >> > > > I assume:
> >> > > > freads jpath '~temp/test.csv'
> >> > > > reads the file (http://www.jsoftware.com/user/script_files.htm).
> >> > > > I do not really understand this: ~freads (I do not understand this use of
> >> > > > the monadic tilde)
> >> > > > I am trying to read this verb from right to left, but am not getting very
> >> > > > far, even using the J dictionary and reference card for support.
> >> > > > I would really appreciate any help at all in deciphering this.
> >> > > >
> >> > > > Thanks and regards,
> >> > > > Jon
> >> > > >
> >> > > >
> >> > > --
> >> > > (B=) <-----my sig
> >> > > Brian Schott
> >> > > ----------------------------------------------------------------------
> >> > > For information about J forums see http://www.jsoftware.com/forums.htm
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Devon McCormick, CFA
> >> >
> >>
> >
> >
> >
> > --
> > Devon McCormick, CFA
> >
> >
> 
> 
> -- 
> Devon McCormick, CFA