Thanks for the replies. It's going to take a while to take all this in.

Regards,
Jon

> Date: Tue, 10 Dec 2013 11:38:30 -0500
> Subject: Re: [Jprogramming] Beginner Understanding CSV file reading/writing
>
> You may also want to look at this:
> http://www.jsoftware.com/jwiki/NYCJUG/2012-12-11#Example_of_Free-Form_Text_Wrangling.
>
> On Tue, Dec 10, 2013 at 11:34 AM, Devon McCormick wrote:
>
> > Just to gild the lily, one of our NYCJUG members implemented CSV parsing
> > using J's finite-state machine primitives:
> > http://www.jsoftware.com/jwiki/NYCJUG/2013-06-11?action=AttachFile&do=view&target=Parsing+CSV+Files+with+a+Finite+State+Machine.pdf.
> >
> > On Tue, Dec 10, 2013 at 9:35 AM, Joe Bogner wrote:
> >
> >> Just to expand on Devon's post, I often use a combination of cut and each
> >> to split up a string.
> >>
> >> This will do the same (with a few more steps behind the scenes):
> >>
> >>    ',' cut each LF cut ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF) -. CR
> >>
> >> as
> >>
> >>    <;._1&>',',&.><;._2 CR-.~('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
> >>
> >> Jon, in case it helps to break it down:
> >>
> >>    [Split on comma] [each] [Split on LF] [Remove CR] ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
> >>
> >> Step 1 - Remove the extra CR
> >>
> >> CR-.~ removes the extra carriage returns from the string. They are
> >> unnecessary since we are splitting on LF.
> >>
> >> You can accomplish the same by doing this:
> >>
> >>    ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF) -. CR
> >>
> >> As Brian mentioned, the tilde just reverses the arguments.
> >>
> >>    CR -.~ ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
> >>
> >> Step 2 - Split on the last character, which is now LF
> >>
> >> http://www.jsoftware.com/jwiki/Vocabulary/semidot
> >>
> >> <;._2 uses the last character of the string as the delimiter, splits on
> >> every occurrence of it, and drops the delimiters:
> >>
> >>    <;._2 ('A',LF,'B',LF,'C',LF)
> >> ┌─┬─┬─┐
> >> │A│B│C│
> >> └─┴─┴─┘
> >>
> >> If you check out the definition of 'cut' you will see it uses this same
> >> operation.
> >>
> >> Step 3 - Split on comma for each item
> >>
> >> In Step 2 we created a boxed array with one string per LF-terminated line.
> >> We now need to operate on each box and split it on commas.
> >>
> >> The 'each' adverb will do this; it is what Devon wrote as "&.>".
> >>
> >> [Split on comma] is <;._1&>',' ,
> >>
> >> You can see it in action here:
> >>
> >>    <;._1&>',' , each ('a,b';'c,d')
> >> ┌─┬─┐
> >> │a│b│
> >> ├─┼─┤
> >> │c│d│
> >> └─┴─┘
> >>
> >> The trick here is to use the cut conjunction to split on commas. The cut
> >> conjunction uses either the first or the last item in the array as the
> >> delimiter. A CSV line won't normally start or end with a comma, so we first
> >> need to add a comma at the beginning of each boxed string so that cut can
> >> split on it.
> >>
> >> That is what the ',' , part is doing.
> >> It's adding a comma at the beginning of each item:
> >>
> >>    ',' ,&.> ('a,b';'c,d')
> >> ┌────┬────┐
> >> │,a,b│,c,d│
> >> └────┴────┘
> >>
> >>    ',' , each ('a,b';'c,d')
> >> ┌────┬────┐
> >> │,a,b│,c,d│
> >> └────┴────┘
> >>
> >> Now that each boxed string starts with a comma, we can cut on the first
> >> character and drop it:
> >>
> >>    <;._1 &> ',' , each ('a,b';'c,d')
> >>
> >> Back to the beginning:
> >>
> >>    <;._1 &> ',' , each <;._2 ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
> >>
> >> Split on comma - for each item - in an LF-split string:
> >>
> >> ┌─┬─┬────────────────┬────┐
> >> │1│2│"embedded comma"│3.4 │
> >> ├─┼─┼────────────────┼────┤
> >> │5│6│"no comma"      │7.8 │
> >> └─┴─┴────────────────┴────┘
> >>
> >> Hope that helps. I learned more by going through it and wanted to share.
> >>
> >> On Sat, Dec 7, 2013 at 5:44 PM, Devon McCormick wrote:
> >>
> >> > Yes - sorry for typing it in w/o testing it. Note that the point at which
> >> > the error was picked up is indicated by extra spaces in the returned line:
> >> >    mat=.<.; _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
> >> > |domain error
> >> > | mat=.<.; _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
> >> >
> >> > A good way to debug a line like this is to look at successively longer
> >> > pieces, starting w/the rightmost one, e.g. (on my system):
> >> >    jpath '~temp/test.csv'
> >> > c:/users/devonmcc/j64-701-user/temp/test.csv
> >> >
> >> > Do I have this file?
> >> >    fexist jpath '~temp/test.csv'
> >> > 0
> >> >
> >> > So, I don't have this file - I only used it to mimic the example you sent.
> >> > If I create this file locally so I can continue looking at longer pieces:
> >> >    ('1,2,"embedded, comma",3.4',CR,LF,'5,6,"no comma",7.8') fwrite 'test.csv'
> >> > 45
> >> >    fexist 'test.csv'
> >> > 1
> >> >
> >> > BTW - "fexist" is defined as
> >> >    fexist=: 1:@(1!:4) ::0:@(([: < 8 u: >) ::]&>)@(<^:(L. = 0:))
> >> > in case you don't have it.
> >> >
> >> > Continuing with longer fragments shows us what the data looks like at each
> >> > step:
> >> >    NB. mat=. <.;_1&>',',&.><;._2 CR-.~freads 'test.csv'
> >> >    freads 'test.csv'
> >> > 1,2,"embedded, comma",3.4
> >> > 5,6,"no comma",7.8
> >> >
> >> >    CR-.~freads 'test.csv'
> >> > 1,2,"embedded, comma",3.4
> >> > 5,6,"no comma",7.8
> >> >
> >> >    <;._2 CR-.~freads 'test.csv'
> >> > +-------------------------+------------------+
> >> > |1,2,"embedded, comma",3.4|5,6,"no comma",7.8|
> >> > +-------------------------+------------------+
> >> >    ',',&.><;._2 CR-.~freads 'test.csv'
> >> > +--------------------------+-------------------+
> >> > |,1,2,"embedded, comma",3.4|,5,6,"no comma",7.8|
> >> > +--------------------------+-------------------+
> >> >    <.;_1&>',',&.><;._2 CR-.~freads 'test.csv'
> >> > |domain error
> >> > | <.; _1&>',',&.><;._2 CR-.~freads'test.csv'
> >> >
> >> > Fixing the error:
> >> >    <;._1&>',',&.><;._2 CR-.~freads 'test.csv'
> >> > +-+-+----------+-------+---+
> >> > |1|2|"embedded | comma"|3.4|
> >> > +-+-+----------+-------+---+
> >> > |5|6|"no comma"|7.8    |   |
> >> > +-+-+----------+-------+---+
> >> >
> >> > On Sat, Dec 7, 2013 at 10:27 AM, Brian Schott wrote:
> >> >
> >> > > It looks like there is a typo in the command with `mat`: `.;` should be `;.`.
> >> > > `mat` is not a verb but a noun, btw.
> >> > > I think the tilde is a dyadic tilde, not monadic, and swaps the arguments
> >> > > of -. in this case.
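Brian's reading of the tilde can be checked in a J session. In `CR-.~freads jpath '~temp/test.csv'` the adverb `~` binds to the verb on its left, `-.`, so the verb is `-.~` and `freads jpath '~temp/test.csv'` is simply its right argument; `x -.~ y` means `y -. x`. Monadic `~` (reflex) is a different thing: `u~ y` means `y u y`. A minimal sketch; the names `s`, `t1` and `t2` are hypothetical and only for illustration:

```j
NB. Hypothetical sample text with CRLF line endings
s=: 'a,b',CR,LF,'c,d',CR,LF

t1=: s -. CR      NB. remove every CR from s
t2=: CR -.~ s     NB. dyadic ~ (passive) swaps the arguments: same as s -. CR

t1 -: t2          NB. 1: both forms give the same result
+~ 3              NB. monadic ~ (reflex) uses y on both sides: 3 + 3, i.e. 6
```

Read right to left, `CR-.~freads ...` therefore reads the file first and then removes every CR from its contents.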
> >> > >
> >> > > On Sat, Dec 7, 2013 at 9:08 AM, Jon Hough wrote:
> >> > >
> >> > > > I'd like to thank everyone for replying.
> >> > > > I suppose I should think about using J7.
> >> > > >
> >> > > > I did try Devon's example:
> >> > > > "You can read CSV files in J pretty simply without using any predefined
> >> > > > verbs like this:
> >> > > >
> >> > > >    mat=. <.;_1&>',',&.><;._2 CR-.~freads jpath '~temp/test.csv'"
> >> > > >
> >> > > > and I got the error:
> >> > > > |domain error
> >> > > > | mat=.<.; _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
> >> > > >
> >> > > > As an aside, I don't really understand what the "mat" function is doing.
> >> > > > I'm still reading "J for C Programmers", so my understanding is a little
> >> > > > shaky, but mat seems to be monadic, with the argument as the file to read.
> >> > > > I'm not sure if this is an example of a tacit verb, because the argument
> >> > > > ('~temp/test.csv') seems to be hardcoded into the verb.
> >> > > >
> >> > > > I assume:
> >> > > >    freads jpath '~temp/test.csv'
> >> > > > reads the file (http://www.jsoftware.com/user/script_files.htm).
> >> > > > I do not really understand this: ~freads (I do not understand this use of
> >> > > > the monadic tilde).
> >> > > > I am trying to read this verb from right to left, but am not getting very
> >> > > > far, even using the J dictionary and reference card for support.
> >> > > > I would really appreciate any help at all in deciphering this.
> >> > > >
> >> > > > Thanks and regards,
> >> > > > Jon
> >> > >
> >> > > --
> >> > > (B=) <-----my sig
> >> > > Brian Schott
> >> > > ----------------------------------------------------------------------
> >> > > For information about J forums see http://www.jsoftware.com/forums.htm
> >> >
> >> > --
> >> > Devon McCormick, CFA
> >
> > --
> > Devon McCormick, CFA
>
> --
> Devon McCormick, CFA
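For reference, the pipeline discussed in this thread can be packaged as an explicit verb, which also makes the embedded-comma problem in Devon's last table easy to reproduce. The sketch below is not the finite-state-machine method linked above. `splitcsvq` swaps in a simpler technique: a comma counts as a separator only when an even number of `"` characters come before it on the line, computed with the running XOR `~:/\`. All names (`splitcsv`, `splitline`, `splitcsvq`, `txt`) are hypothetical. The sketch assumes every line ends in LF (optionally preceded by CR), that fields keep their surrounding quotes, and that doubled quotes inside a field are not unescaped.

```j
NB. The thread's naive split: one row per LF-terminated line and one box
NB. per comma-separated field. Quoted commas are NOT protected.
splitcsv=: 3 : 0
<;._1&> ',' ,&.> <;._2 CR -.~ y
)

NB. Hypothetical quote-aware split of a single line (no trailing LF).
NB. A comma is a separator only when it lies outside a "..." region;
NB. ~:/\ is a running XOR, i.e. 1 inside quotes and 0 outside.
splitline=: 3 : 0
s=. ',' , y                      NB. leading comma acts as the first fret
f=. (s = ',') > ~:/\ s = '"'     NB. 1 only for commas outside quotes
f <;._1 s                        NB. cut at marked commas and drop them
)

NB. Apply splitline to every LF-terminated line; short rows are padded
NB. with empty boxes.
splitcsvq=: 3 : 0
splitline;._2 CR -.~ y
)

txt=: '1,2,"embedded, comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF
$ splitcsv txt     NB. 2 5: the quoted comma splits a field in two
$ splitcsvq txt    NB. 2 4: the quoted comma stays inside its field
```

Displaying `splitcsv txt` should give the same layout as Devon's last table, while `splitcsvq txt` keeps `"embedded, comma"` in one box; the quote characters themselves remain part of the field text.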
