You may also want to look at this:
http://www.jsoftware.com/jwiki/NYCJUG/2012-12-11#Example_of_Free-Form_Text_Wrangling.


On Tue, Dec 10, 2013 at 11:34 AM, Devon McCormick <[email protected]> wrote:

> Just to gild the lily, one of our NYCJUG members implemented CSV parsing
> using J's finite-state machine primitives:
> http://www.jsoftware.com/jwiki/NYCJUG/2013-06-11?action=AttachFile&do=view&target=Parsing+CSV+Files+with+a+Finite+State+Machine.pdf.
>
>
> On Tue, Dec 10, 2013 at 9:35 AM, Joe Bogner <[email protected]> wrote:
>
>> Just to expand on Devon's post, I often use a combination of cut and each
>> to split up a string.
>>
>> This will do the same (with a few more steps behind the scenes)
>>
>> > ',' cut each LF cut ('1,2,"embedded comma",3.4',CR, LF,'5,6,"no comma",7.8',CR, LF) -. CR
>>
>> as
>>
>> <;._1&>',',&.><;._2 CR-.~('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
>>
>> Jon, in case it helps to break it down:
>>
>> [Split on comma] [each] [Split on LF] [Remove CR] ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
>>
>>
>> Step 1 - Remove the extra CR
>>
>> CR-.~ removes the carriage returns from the string. They are unnecessary
>> since we are splitting on LF.
>>
>> You can accomplish the same by doing this:
>>
>> ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF) -. CR
>>
>> As Brian mentioned, the tilde just reverses the arguments.
>>
>> CR -.~ ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
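>>
>> A small sketch of the same swap on a shorter string (the sample string is
>> made up for illustration): x -.~ y gives the same result as y -. x.
>>
>> ```j
>>    'hello world' -. 'o'     NB. remove every 'o' from the left argument
>> hell wrld
>>    'o' -.~ 'hello world'    NB. same verb with the arguments swapped by ~
>> hell wrld
>> ```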
>>
>> Step 2 - Split on the last character, which is now LF
>>
>> http://www.jsoftware.com/jwiki/Vocabulary/semidot
>>
>> <;._2 will split on the last character of the string and drop it
>>
>> <;._2 ('A',LF,'B',LF,'C',LF)
>> ┌─┬─┬─┐
>> │A│B│C│
>> └─┴─┴─┘
>>
>> If you check out the definition of 'cut' you will see it has this same
>> operation
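>>
>> As a rough illustration (the sample strings are made up), ;._1 takes the
>> first character as the delimiter and ;._2 takes the last one; both drop the
>> delimiter from the boxed pieces.
>>
>> ```j
>>    <;._1 ',a,b,c'    NB. delimiter is the first character, a comma
>> ┌─┬─┬─┐
>> │a│b│c│
>> └─┴─┴─┘
>>    <;._2 'a;b;c;'    NB. delimiter is the last character, a semicolon
>> ┌─┬─┬─┐
>> │a│b│c│
>> └─┴─┴─┘
>> ```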
>>
>> Step 3 - Split on comma for each item
>>
>> In Step 2 we created a boxed array of strings, one for each LF. We now need
>> to operate on each box and split it on commas.
>>
>> The 'each' adverb will do this, which is what Devon has as "&.>"
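>>
>> A quick sketch showing that each and &.> behave the same way (each is
>> defined as &.> in the standard library; the sample data is made up):
>>
>> ```j
>>    # each 'ab';'cde'    NB. count the characters inside each box
>> ┌─┬─┐
>> │2│3│
>> └─┴─┘
>>    #&.> 'ab';'cde'      NB. the same thing spelled with the primitives
>> ┌─┬─┐
>> │2│3│
>> └─┴─┘
>> ```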
>>
>> [Split on comma] is <;._1&>',' ,
>>
>> You can see it in action here:
>>
>>    <;._1&>',' , each ('a,b';'c,d')
>> ┌─┬─┐
>> │a│b│
>> ├─┼─┤
>> │c│d│
>> └─┴─┘
>>
>> The trick here is to use the cut conjunction to split on commas. The cut
>> conjunction uses either the first or the last item in the array as the
>> delimiter. A line of a CSV file won't have a comma at the beginning or the
>> end, so we first need to add a comma at the beginning of each boxed string
>> so we can tell cut to split on it.
>>
>> That is what ',' ,&.> is doing. It's adding a comma at the beginning of each
>> item:
>>
>>  ',' ,&.> ('a,b';'c,d')
>> ┌────┬────┐
>> │,a,b│,c,d│
>> └────┴────┘
>>
>> ',' , each ('a,b';'c,d')
>>
>> ┌────┬────┐
>> │,a,b│,c,d│
>> └────┴────┘
>>
>>
>> Now that each boxed string starts with a comma, we can cut on the first
>> character and drop it
>>
>> <;._1 &> ',' , each ('a,b';'c,d')
>>
>>
>> Back to the beginning:
>>
>>    <;._1 &> ',' , each <;._2 ('1,2,"embedded comma",3.4',CR,LF,'5,6,"no comma",7.8',CR,LF)
>>
>> Split on comma - for each item - in a LF split string
>>
>> ┌─┬─┬────────────────┬────┐
>> │1│2│"embedded comma"│3.4 │
>> ├─┼─┼────────────────┼────┤
>> │5│6│"no comma"      │7.8 │
>> └─┴─┴────────────────┴────┘
>>
>>
>> Hope that helps. I learned more by going through it and wanted to share
>>
>> On Sat, Dec 7, 2013 at 5:44 PM, Devon McCormick <[email protected]>
>> wrote:
>>
>> > Yes - sorry for typing it in w/o testing it.  Note that the point at which
>> > the error was picked up is indicated by extra spaces in the returned line:
>> >    mat=.<.; _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
>> > |domain error
>> > |   mat=.<.;    _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
>> >
>> > A good way to debug a line like this is to look at successively longer
>> > pieces, starting w/the rightmost one, e.g. (on my system):
>> >    jpath '~temp/test.csv'
>> > c:/users/devonmcc/j64-701-user/temp/test.csv
>> >
>> > Do I have this file?
>> >    fexist jpath '~temp/test.csv'
>> > 0
>> >
>> > So, I don't have this file - I only used it to mimic the example you sent.
>> > I'll create this file locally so I can continue looking at longer pieces:
>> >    ('1,2,"embedded, comma",3.4',CR,LF,'5,6,"no comma",7.8') fwrite 'test.csv'
>> > 45
>> >    fexist 'test.csv'
>> > 1
>> >
>> > BTW - "fexist" is defined
>> >    fexist=: 1:@(1!:4) ::0:@(([: < 8 u: >) ::]&>)@(<^:(L. = 0:))
>> > in case you don't have it.
>> >
>> > Continuing with longer fragments shows us what the data looks like at each
>> > step:
>> >    NB. mat=. <.;_1&>',',&.><;._2 CR-.~freads 'test.csv'
>> >    freads 'test.csv'
>> > 1,2,"embedded, comma",3.4
>> > 5,6,"no comma",7.8
>> >
>> >    CR-.~freads 'test.csv'
>> > 1,2,"embedded, comma",3.4
>> > 5,6,"no comma",7.8
>> >
>> >    <;._2 CR-.~freads 'test.csv'
>> > +-------------------------+------------------+
>> > |1,2,"embedded, comma",3.4|5,6,"no comma",7.8|
>> > +-------------------------+------------------+
>> >    ',',&.><;._2 CR-.~freads 'test.csv'
>> > +--------------------------+-------------------+
>> > |,1,2,"embedded, comma",3.4|,5,6,"no comma",7.8|
>> > +--------------------------+-------------------+
>> >    <.;_1&>',',&.><;._2 CR-.~freads 'test.csv'
>> > |domain error
>> > |   <.;    _1&>',',&.><;._2 CR-.~freads'test.csv'
>> >
>> > Fixing the error:
>> >    <;._1&>',',&.><;._2 CR-.~freads 'test.csv'
>> > +-+-+----------+-------+---+
>> > |1|2|"embedded | comma"|3.4|
>> > +-+-+----------+-------+---+
>> > |5|6|"no comma"|7.8    |   |
>> > +-+-+----------+-------+---+
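>> >
>> > The first row now has five fields because the comma inside the quotes was
>> > treated as a separator. A minimal sketch of one way around that, assuming
>> > the double quotes in each line are balanced (the names s and inq are made
>> > up for illustration): only split on commas that fall outside the quotes.
>> >
>> > ```j
>> >    s =: ',' , '1,2,"embedded, comma",3.4'   NB. leading comma, as before
>> >    inq =: ~:/\ '"' = s                      NB. 1 between an opening quote and its closing quote
>> >    (inq < ',' = s) <;._1 s                  NB. cut only at commas where inq is 0
>> > +-+-+-----------------+---+
>> > |1|2|"embedded, comma"|3.4|
>> > +-+-+-----------------+---+
>> > ```
>> >
>> > Here the left argument of ;._1 is a boolean list marking where each piece
>> > starts, rather than letting cut pick the first character as the delimiter.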
>> >
>> > On Sat, Dec 7, 2013 at 10:27 AM, Brian Schott <[email protected]> wrote:
>> >
>> > > It looks like there is a typo in the command with `mat`: .; should be ;.
>> > > instead. `mat` is not a verb but a noun, btw.
>> > > I think the tilde here is used dyadically, not monadically, and swaps
>> > > the arguments of -.
>> > > in this case.
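>> > >
>> > > A small sketch of the noun/verb difference (the names and values are
>> > > made up for illustration): a noun holds a result that has already been
>> > > computed, while a verb waits for an argument.
>> > >
>> > > ```j
>> > >    mat =: 2 3 $ i. 6      NB. noun: the right-hand side is evaluated now
>> > >    mat
>> > > 0 1 2
>> > > 3 4 5
>> > >    inc =: 1&+             NB. verb: nothing runs until it gets an argument
>> > >    inc 2 3
>> > > 3 4
>> > >    4!:0 'mat';'inc'       NB. name class: 0 is a noun, 3 is a verb
>> > > 0 3
>> > > ```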
>> > >
>> > > On Sat, Dec 7, 2013 at 9:08 AM, Jon Hough <[email protected]> wrote:
>> > >
>> > > > I'd like to thank everyone for replying.
>> > > > I suppose I should think about using J7.
>> > > >
>> > > > I did try Devon's example:
>> > > > "You can read CSV files in J pretty simply without using any predefined
>> > > >  verbs like this:
>> > > >
>> > > >  mat=. <.;_1&>',',&.><;._2 CR-.~freads jpath '~temp/test.csv'
>> > > >
>> > > > and I got the error:
>> > > > |domain error
>> > > > |   mat=.<.;    _1&>',',&.><;._2 CR-.~freads jpath'~temp/test.csv'
>> > > >
>> > > > As an aside, I don't really understand what the "mat" function is doing.
>> > > > I'm still reading "J for C Programmers" so my understanding is a little
>> > > > shaky, but mat seems to be monadic, with the argument as the file to
>> > > > read. I'm not sure if this is an example of a tacit verb, because the
>> > > > argument ('~temp/test.csv') seems to be hardcoded into the verb.
>> > > >
>> > > > I assume:
>> > > > freads jpath '~temp/test.csv'
>> > > > reads the file. (http://www.jsoftware.com/user/script_files.htm)
>> > > > I do not really understand this: ~freads (I do not understand this use
>> > > > of the monadic tilde)
>> > > > I am trying to read this verb from right to left, but am not getting
>> > > > very far, even using the J dictionary and reference card for support.
>> > > > I would really appreciate any help at all in deciphering this.
>> > > >
>> > > > Thanks and regards,
>> > > > Jon
>> > > >
>> > > >
>> > > --
>> > > (B=) <-----my sig
>> > > Brian Schott
>> > > ----------------------------------------------------------------------
>> > > For information about J forums see http://www.jsoftware.com/forums.htm
>> > >
>> >
>> >
>> >
>> > --
>> > Devon McCormick, CFA
>> >
>>
>
>
>
> --
> Devon McCormick, CFA
>
>


-- 
Devon McCormick, CFA
