The three invisible characters at the beginning are https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
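For reference, the UTF-8 byte order mark is the byte sequence EF BB BF. It appears as 239 187 191 in the `a. i.` output Rob shows below. Here is a minimal J sketch, not taken from the thread, that removes those three bytes only when a string starts with them. The names `BOM`, `stripbom` and `hdr` are hypothetical.

```j
NB. Hypothetical helpers, not from the thread: detect and drop a leading UTF-8 BOM.
BOM=: 239 187 191 { a.             NB. bytes EF BB BF as J characters
NB. stripbom y: drop the first 3 characters of y only if they equal BOM;
NB. any other string (including a shorter one) is returned unchanged
stripbom=: }.~ 3 * BOM -: 3 {. ]

NB. Example: the first header cell of the full file, as Rob reports it
hdr=: (239 187 191 68 97 116 101) { a.
a. i. stripbom hdr                 NB. 68 97 116 101, i.e. 'Date'
```

Inside Harvey's `ci2`, the header test would then become `if. 'DATE' -: toupper stripbom > 0 { ({. a) do. a=. }. a end.`.

This verb differs from the `printable` filter in what it touches. `stripbom` affects only a leading BOM. `printable` as written keeps codes 32 through 127, so DEL (127) passes. It also drops every byte above 127, which would strip legitimate UTF-8 characters if it were applied to other fields.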

FYI,

--
Raul

On Tue, May 26, 2020 at 10:01 AM 'Rob Hodgkinson' via Programming
<[email protected]> wrote:
>
> I requested the data sets from Harvey, the sp1500 (full) dataset had 3
> invisible chars in front of “Date”, so with a “printable” filter verb these
> could be removed.
>
> Hopefully Harvey’s data will now work. I have not yet heard back… my
> findings below.
>
> Henry and you were right & each {.b would have shown 7 (not 4) in the first
> cell.
>
>
> So the header record differs in the 2 datasets, shown here… aa is the short
> data set, bb is the full data set.
>
> aa=. ',' readdsv (jpath '\Users\rob\jwork\test1.csv')
> a=. removedoublequotes each aa
> 3{.a
> ┌──────┬─────┬──────┬──────┬──────┬──────┬────────┬──────┐
> │Date  │Price│Open  │High  │Low   │Vol.  │Change %│      │
> ├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
> │May 22│ 2020│670.21│668.40│670.41│665.21│-       │0.23% │
> ├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
> │May 21│ 2020│668.66│672.93│675.06│666.13│-       │-0.70%│
> └──────┴─────┴──────┴──────┴──────┴──────┴────────┴──────┘
> a.&i. each {.a
> ┌─────────────┬─────────────────┬──────────────┬──────────────┬──────────┬─────────────┬───────────────────────────┬┐
> │68 97 116 101│80 114 105 99 101│79 112 101 110│72 105 103 104│76 111 119│86 111 108 46│67 104 97 110 103 101 32 37││
> └─────────────┴─────────────────┴──────────────┴──────────────┴──────────┴─────────────┴───────────────────────────┴┘
>
>
> NB. All looks OK so far … ‘Date’ is ASCII 68 97 116 101
>
> a.{~ 68 97 116 101
> Date
>
>
> bb=. ',' readdsv (jpath '\Users\rob\jwork\sp1500.csv')
> b=. removedoublequotes each bb
> 3{.b
> ┌──────┬─────┬──────┬──────┬──────┬──────┬────────┬──────┐
> │Date  │Price│Open  │High  │Low   │Vol.  │Change %│      │
> ├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
> │May 22│ 2020│670.21│668.40│670.41│665.21│-       │0.23% │
> ├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
> │May 21│ 2020│668.66│672.93│675.06│666.13│-       │-0.70%│
> └──────┴─────┴──────┴──────┴──────┴──────┴────────┴──────┘
> a.&i. each {.b
> ┌─────────────────────────┬─────────────────┬──────────────┬──────────────┬──────────┬─────────────┬───────────────────────────┬┐
> │239 187 191 68 97 116 101│80 114 105 99 101│79 112 101 110│72 105 103 104│76 111 119│86 111 108 46│67 104 97 110 103 101 32 37││
> └─────────────────────────┴─────────────────┴──────────────┴──────────────┴──────────┴─────────────┴───────────────────────────┴┘
>
>
> The first cell contains 3 leading unprintable chars, removed with a
> ‘printable’ filter function such as:
>
> printable =: verb define
> y #~ (32&<: *.127&>:) a. i. y
> )
>
> >0{{.b
> Date
> a. i. >0{{.b
> 239 187 191 68 97 116 101
>
> printable >0{{.b
> Date
> a. i. printable >0{{.b
> 68 97 116 101
>
> This filters out the range correctly, and requires printable prior to calling
> toupper.
> if. 'DATE' -: toupper printable (> 0 { ({. a)) do. a=. }. a end.
>
> Harvey should confirm, but I anticipate this is solved.
>
> …/Rob
>
> > On 26 May 2020, at 11:49 pm, Raul Miller <[email protected]> wrote:
> >
> > You did not show the shapes of the data offending label for the
> > truncated 10 year case.
> >
> > Shapes have to match for -: to match.
> >
> > Shapes can be different because if you have 1 dimension(s), or if you
> > have invisible characters.
> >
> > Take care,
> >
> > --
> > Raul
> >
> > On Tue, May 26, 2020 at 4:53 AM HH PackRat <[email protected]> wrote:
> >>
> >> On 5/25/20, Henry Rich <[email protected]> wrote:
> >>> You used {: in the last line. Try it with {. .
> >>>
> >>> and on 5/25/20, 'robert therriault' via Programming
> >>> <[email protected]> wrote:
> >>> I noticed in the single line test that you used {: a and not {. a
> >>>
> >>> and on 5/25/20, bill lam <[email protected]> wrote:
> >>> ... Maybe there are some typos such as {: instead of {. inside your code.
> >>
> >> Thanks for your eagle eyes in catching that typo! I had an older
> >> remarked (NB.) line immediately above this line of code that had the
> >> {:a which I visually copied instead of the correct {.a that I had used
> >> everywhere else.
> >>
> >> Unfortunately, making this change did NOT change the results for the
> >> full 10-year data. (It did work for the 10-day test case.) I have
> >> no idea why this difference should exist. (The test case is the
> >> column header row plus the first 10 days of the full 10-year file.) I
> >> scanned the 10-year data, but nothing stood out as an anomaly. The
> >> ONLY difference I noticed in the running of the J program is that the
> >> initial boxing looks slightly different for the two sets of data.
> >> (The data is read into file aa by the dsv routine in J.) I have no
> >> idea why the two sets of data should look slightly different since the
> >> data is the same, except for quantity. The difference is in the
> >> display of the headers in the full data. (I tried my best to make
> >> these look like the originals. It's very hard with a proportional
> >> font.)
> >>
> >> Here is what a truncated output looks like for the 10-day test data:
> >>
> >> 3 {. aa
> >> ┌─────┬─────┬─────┬─────┬─────┬──────┬───────┬──────┐
> >> │ "Date" │ "Price"│"Open" │ "High" │"Low" │ "Vol." │"Change %"│ │
> >> ├─────┼─────┼─────┼─────┼─────┼──────┼───────┼──────┤
> >> │"May 22│ 2020" │"670.21"│"668.40"│"670.41"│"665.21"│"-" │"0.23%" │
> >> ├─────┼─────┼─────┼─────┼──────┼─────┼───────┼──────┤
> >> │"May 21│ 2020" │"668.66"│"672.93"│"675.06"│"666.13"│"-" │"-0.70%"│
> >> └─────┴─────┴─────┴─────┴──────┴─────┴───────┴──────┘
> >> 3 {. a
> >> ┌─────┬───┬─────┬────┬────┬────┬──────┬────┐
> >> │Date │Price│Open │High │Low │Vol. │Change %│ │
> >> ├─────┼───┼─────┼────┼────┼────┼──────┼────┤
> >> │May 22 │ 2020│670.21│668.40│670.41│665.21│- │0.23% │
> >> ├─────┼───┼─────┼────┼────┼────┼──────┼────┤
> >> │May 21 │ 2020│668.66│672.93│675.06│666.13│- │-0.70%│
> >> └─────┴───┴─────┴────┴────┴────┴──────┴────┘
> >> DATE
> >> 1 <-------- match is TRUE and column header row is deleted below
> >> 3 {. a
> >> ┌─────┬───┬────┬────┬─────┬────┬┬─────┐
> >> │May 22│ 2020│670.21│668.40│670.41│665.21│-│0.23% │
> >> ├─────┼───┼────┼────┼─────┼────┼┼─────┤
> >> │May 21│ 2020│668.66│672.93│675.06│666.13│-│-0.70%│
> >> ├─────┼───┼────┼────┼─────┼────┼┼─────┤
> >> │May 20│ 2020│673.36│670.68│675.42│670.26│-│1.73% │
> >> └─────┴───┴────┴────┴─────┴────┴┴─────┘
> >>
> >>
> >> And here is what a truncated output looks like for the full 10-year data:
> >>
> >> 3 {. aa
> >> ┌──────┬────┬──────┬─────┬─────┬──────┬────────┬──────┐
> >> │"Date"│"Price"│"Open" │"High" │"Low" │"Vol." │"Change %"│ │
> >> ├──────┼────┼──────┼─────┼─────┼──────┼────────┼──────┤
> >> │"May 22 │ 2020" │"670.21"│"668.40"│"670.41"│"665.21"│"-" │"0.23%" │
> >> ├──────┼────┼──────┼─────┼─────┼──────┼────────┼──────┤
> >> │"May 21 │ 2020" │"668.66"│"672.93"│"675.06"│"666.13"│"-" │"-0.70%"│
> >> └──────┴────┴──────┴─────┴─────┴──────┴────────┴──────┘
> >> 3 {. a
> >> ┌─────┬───┬-─-─-─┬─────┬────┬────┬──────┬────┐
> >> │Date│Price│Open │High │Low │Vol. │Change %│ │
> >> ├─────┼───┼-─-─-─┼─────┼────┼────┼──────┼────┤
> >> │May 22 │ 2020│670.21│668.40│670.41│665.21│- │0.23% │
> >> ├─────┼───┼-─-──-┼─────┼────┼────┼──────┼────┤
> >> │May 21 │ 2020│668.66│672.93│675.06│666.13│- │-0.70%│
> >> └─────┴───┴─-─-─-┴─────┴────┴────┴──────┴────┘
> >> DATE
> >> 0 <-------- match is FALSE and column header row is not deleted below
> >> 3 {. a
> >> ┌─────┬───┬-─-─-─┬─────┬────┬────┬──────┬────┐
> >> │Date│Price│Open │High │Low │Vol. │Change %│ │
> >> ├─────┼───┼-─-─-─┼─────┼────┼────┼──────┼────┤
> >> │May 22 │ 2020│670.21│668.40│670.41│665.21│- │0.23% │
> >> ├─────┼───┼-─-──-┼─────┼────┼────┼──────┼────┤
> >> │May 21 │ 2020│668.66│672.93│675.06│666.13│- │-0.70%│
> >> └─────┴───┴─-─-─-┴─────┴────┴────┴──────┴────┘
> >>
> >> I'm curious why the headers are completely in sync with the data box
> >> shapes when using the test data but out of sync with the box shapes
> >> when using the full data (as well as having a match that is false
> >> rather than true).
> >>
> >> Here is the very brief code that I've been using for testing:
> >>
> >> NB. ci2.ijs
> >> require 'files stdlib'
> >> require 'tables\dsv'
> >> root=: '!user\......' NB. wherever you wish
> >>
> >> NB. syntax: ci2 'datafilename'
> >> ci2=: 3 : 0
> >> aa=. ',' readdsv (jpath root,y)
> >> smoutput 3 {. aa
> >> a=. removedoublequotes each aa
> >> smoutput 3 {. a
> >> smoutput toupper (> 0 { ({. a))
> >> smoutput 'DATE' -: toupper (> 0 { ({. a))
> >> if. 'DATE' -: toupper (> 0 { ({. a)) do. a=. }. a end.
> >> smoutput 3 {. a
> >> )
> >> NB. ========================
> >> NB. adapted from source: "Special Matrices & Lists" in "Phrases"
> >> NB. [original: removeblanks]
> >> removedoublequotes=: -.@('"'&E.) # ]
> >> NB. ========================
> >>
> >> I would be happy to share the test file and full file via email with
> >> anyone who is interested in figuring out this puzzle. The test file
> >> is 699 B, and the full file is 159 KB.
> >>
> >> I'm eager to find out why the test file works but the full file
> >> doesn't (even though the test file is the first 11 rows of the full
> >> file).
> >>
> >> Harvey
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
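Raul's point was that `-:` compares shape as well as content. This can be checked directly against the bytes Rob reported for the full file. The small J sketch below does that; the name `hdr` is hypothetical and simply holds that first header cell.

```j
NB. Hypothetical name: first header cell of the full file, as Rob's a. i. output shows it
hdr=: (239 187 191 68 97 116 101) { a.
$ hdr                        NB. 7: three BOM bytes plus 'Date'
'DATE' -: toupper hdr        NB. 0: the shapes (7 versus 4) already differ
'DATE' -: toupper 3 }. hdr   NB. 1 once the three leading bytes are dropped
```

Checking `$` (or `#`) of the offending cell, as Raul suggested, exposes the three extra characters. They take up length in the cell even though they print as nothing.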
