The three invisible characters at the beginning are the UTF-8 byte order mark (bytes 239 187 191, hex EF BB BF):
https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
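Here is a minimal sketch of what those three bytes are. It is written in Python only because Python makes the byte-level view easy to show, and the sample header bytes are made up for illustration. Decoding with plain UTF-8 keeps the BOM as the invisible character U+FEFF. Python's "utf-8-sig" codec drops it.

```python
import codecs

# Hypothetical first bytes of a CSV header line saved with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b'"Date","Price"'

# The BOM is the three bytes EF BB BF, i.e. 239 187 191 in decimal.
bom_codes = list(codecs.BOM_UTF8)

# Plain UTF-8 decoding keeps the BOM as the invisible character U+FEFF.
with_bom = raw.decode("utf-8")

# The "utf-8-sig" codec strips a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")

print(bom_codes)          # [239, 187, 191]
print(repr(with_bom[0]))  # '\ufeff'
print(without_bom)        # "Date","Price"
```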

FYI,

-- 
Raul

On Tue, May 26, 2020 at 10:01 AM 'Rob Hodgkinson' via Programming
<[email protected]> wrote:
>
> I requested the data sets from Harvey. The sp1500 (full) dataset had 3
> invisible chars in front of “Date”, and these can be removed with a
> “printable” filter verb.
>
> Hopefully Harvey’s data will now work.  I have not yet heard back; my
> findings are below.
>
> Henry and you were right & each {.b would have shown 7 (not 4) in the first 
> cell.
>
>
> So the header record differs between the 2 datasets, as shown here: aa and a
> are the short data set, bb and b are the full data set.
>
>    aa=. ',' readdsv (jpath '\Users\rob\jwork\test1.csv')
>    a=. removedoublequotes each aa
>    3{.a
> ┌──────┬─────┬──────┬──────┬──────┬──────┬────────┬──────┐
> │Date │Price│Open │High │Low │Vol. │Change %│ │
> ├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
> │May 22│ 2020│670.21│668.40│670.41│665.21│- │0.23% │
> ├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
> │May 21│ 2020│668.66│672.93│675.06│666.13│- │-0.70%│
> └──────┴─────┴──────┴──────┴──────┴──────┴────────┴──────┘
>    a.&i. each {.a
> ┌─────────────┬─────────────────┬──────────────┬──────────────┬──────────┬─────────────┬───────────────────────────┬┐
> │68 97 116 101│80 114 105 99 101│79 112 101 110│72 105 103 104│76 111 119│86 111 108 46│67 104 97 110 103 101 32 37││
> └─────────────┴─────────────────┴──────────────┴──────────────┴──────────┴─────────────┴───────────────────────────┴┘
>
>
> NB. All looks OK so far … ‘Date’ is ASCII 68 97 116 101
>
>    a.{~ 68 97 116 101
> Date
>
>
>    bb=. ',' readdsv (jpath '\Users\rob\jwork\sp1500.csv')
>    b=. removedoublequotes each bb
>    3{.b
> ┌──────┬─────┬──────┬──────┬──────┬──────┬────────┬──────┐
> │Date │Price│Open │High │Low │Vol. │Change %│ │
> ├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
> │May 22│ 2020│670.21│668.40│670.41│665.21│- │0.23% │
> ├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
> │May 21│ 2020│668.66│672.93│675.06│666.13│- │-0.70%│
> └──────┴─────┴──────┴──────┴──────┴──────┴────────┴──────┘
>    a.&i. each {.b
> ┌─────────────────────────┬─────────────────┬──────────────┬──────────────┬──────────┬─────────────┬───────────────────────────┬┐
> │239 187 191 68 97 116 101│80 114 105 99 101│79 112 101 110│72 105 103 104│76 111 119│86 111 108 46│67 104 97 110 103 101 32 37││
> └─────────────────────────┴─────────────────┴──────────────┴──────────────┴──────────┴─────────────┴───────────────────────────┴┘
>
>
> The first cell contains 3 leading unprintable chars, removed with a 
> ‘printable’ filter function such as:
>
> printable =: verb define
>  NB. keep only printable ASCII (codes 32..126)
>  y #~ (32&<: *. 127&>) a. i. y
> )
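The same filtering idea is sketched here in Python, purely for illustration. The function name mirrors the J verb but is otherwise hypothetical. It keeps bytes 32 through 126, the printable ASCII range; 127 is the DEL control code. This approach removes the BOM, but it would also silently drop any legitimate non-ASCII text, because every byte of a multi-byte UTF-8 character is 128 or above.

```python
def printable(data: bytes) -> bytes:
    # Keep only bytes in the printable ASCII range 32..126 (space through '~').
    return bytes(b for b in data if 32 <= b <= 126)

header_cell = b"\xef\xbb\xbfDate"     # first header cell of the full data set
print(list(header_cell))              # [239, 187, 191, 68, 97, 116, 101]
print(list(printable(header_cell)))   # [68, 97, 116, 101]

# Caveat: legitimate non-ASCII text is dropped too, e.g. UTF-8 "é" (195 169).
print(printable("Café".encode("utf-8")))   # b'Caf'
```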
>
>    >0{{.b
> Date
>    a. i. >0{{.b
> 239 187 191 68 97 116 101
>
>    printable >0{{.b
> Date
>    a. i. printable >0{{.b
> 68 97 116 101
>
> This filters out the unwanted bytes correctly; the only change needed is to
> apply printable before calling toupper:
>     if. 'DATE' -: toupper printable (> 0 { ({. a)) do. a=. }. a end.
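Here is a Python sketch of the header test that shows why the order matters. drop_header and its printable_filter flag are hypothetical names. The rows are decoded text, so the BOM shows up as the single character U+FEFF rather than as three bytes.

```python
def drop_header(rows, printable_filter=True):
    """Drop the first row if its first cell reads DATE (case-insensitive).

    Hypothetical helper modeled on the J line
    if. 'DATE' -: toupper printable (> 0 { ({. a)) do. a=. }. a end.
    """
    first = rows[0][0]
    if printable_filter:
        # Strip everything outside printable ASCII, as the J 'printable' verb does.
        first = "".join(ch for ch in first if " " <= ch <= "~")
    return rows[1:] if first.upper() == "DATE" else rows

rows = [["\ufeffDate", "Price"], ["May 22", "2020"]]
print(len(drop_header(rows, printable_filter=False)))  # 2: BOM makes the match fail
print(len(drop_header(rows)))                          # 1: header removed
```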
>
> Harvey should confirm, but I anticipate this is solved.
>
> …/Rob
>
> > On 26 May 2020, at 11:49 pm, Raul Miller <[email protected]> wrote:
> >
> > You did not show the shapes of the offending data label for the
> > truncated 10-year case.
> >
> > Shapes have to match for -: to match.
> >
> > Shapes can be different if you have extra dimension(s), or if you
> > have invisible characters.
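A tiny Python illustration of the point about shapes follows. The sample values come from the first header cell shown in this thread. As with -: in J, unequal lengths alone make the comparison fail, so the first cell of the full file (7 bytes) can never match 'Date' (4 bytes).

```python
plain = b"Date"
with_bom = b"\xef\xbb\xbfDate"

# Like J's -:, == on byte strings fails when the lengths differ,
# so the three hidden BOM bytes alone are enough to break the match.
print(len(plain), len(with_bom))   # 4 7
print(with_bom == plain)           # False
print(with_bom[3:] == plain)       # True
```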
> >
> > Take care,
> >
> > --
> > Raul
> >
> > On Tue, May 26, 2020 at 4:53 AM HH PackRat <[email protected]> wrote:
> >>
> >> On 5/25/20, Henry Rich <[email protected]> wrote:
> >>> You used {: in the last line.  Try it with {. .
> >>>
> >>> and on 5/25/20, 'robert therriault' via Programming 
> >>> <[email protected]> wrote:
> >>> I noticed in the single line test that you used {: a and not {. a
> >>>
> >>> and on 5/25/20, bill lam <[email protected]> wrote:
> >>> ... Maybe there is a typo, such as {: instead of {., inside your code.
> >>
> >> Thanks for your eagle eyes in catching that typo!  I had an older
> >> commented-out (NB.) line immediately above this line of code that had
> >> the {:a, which I visually copied instead of the correct {.a that I had
> >> used everywhere else.
> >>
> >> Unfortunately, making this change did NOT change the results for the
> >> full 10-year data.   (It did work for the 10-day test case.)  I have
> >> no idea why this difference should exist.  (The test case is the
> >> column header row plus the first 10 days of the full 10-year file.)  I
> >> scanned the 10-year data, but nothing stood out as an anomaly.  The
> >> ONLY difference I noticed in the running of the J program is that the
> >> initial boxing looks slightly different for the two sets of data.
> >> (The data is read into the variable aa by the dsv routine in J.)  I have no
> >> idea why the two sets of data should look slightly different since the
> >> data is the same, except for quantity.  The difference is in the
> >> display of the headers in the full data.  (I tried my best to make
> >> these look like the originals.  It's very hard with a proportional
> >> font.)
> >>
> >> Here is what a truncated output looks like for the 10-day test data:
> >>
> >> 3 {. aa
> >> ┌─────┬─────┬─────┬─────┬─────┬──────┬───────┬──────┐
> >> │ "Date" │ "Price"│"Open"  │ "High"  │"Low"   │  "Vol."   │"Change %"│           │
> >> ├─────┼─────┼─────┼─────┼─────┼──────┼───────┼──────┤
> >> │"May 22│ 2020" │"670.21"│"668.40"│"670.41"│"665.21"│"-"             │"0.23%" │
> >> ├─────┼─────┼─────┼─────┼──────┼─────┼───────┼──────┤
> >> │"May 21│ 2020" │"668.66"│"672.93"│"675.06"│"666.13"│"-"             │"-0.70%"│
> >> └─────┴─────┴─────┴─────┴──────┴─────┴───────┴──────┘
> >> 3 {. a
> >> ┌─────┬───┬─────┬────┬────┬────┬──────┬────┐
> >> │Date     │Price│Open   │High   │Low    │Vol.   │Change %│        │
> >> ├─────┼───┼─────┼────┼────┼────┼──────┼────┤
> >> │May 22 │ 2020│670.21│668.40│670.41│665.21│-            │0.23% │
> >> ├─────┼───┼─────┼────┼────┼────┼──────┼────┤
> >> │May 21 │ 2020│668.66│672.93│675.06│666.13│-            │-0.70%│
> >> └─────┴───┴─────┴────┴────┴────┴──────┴────┘
> >> DATE
> >> 1   <-------- match is TRUE and column header row is deleted below
> >> 3 {. a
> >> ┌─────┬───┬────┬────┬─────┬────┬┬─────┐
> >> │May 22│ 2020│670.21│668.40│670.41│665.21│-│0.23% │
> >> ├─────┼───┼────┼────┼─────┼────┼┼─────┤
> >> │May 21│ 2020│668.66│672.93│675.06│666.13│-│-0.70%│
> >> ├─────┼───┼────┼────┼─────┼────┼┼─────┤
> >> │May 20│ 2020│673.36│670.68│675.42│670.26│-│1.73% │
> >> └─────┴───┴────┴────┴─────┴────┴┴─────┘
> >>
> >>
> >> And here is what a truncated output looks like for the full 10-year data:
> >>
> >> 3 {. aa
> >> ┌──────┬────┬──────┬─────┬─────┬──────┬────────┬──────┐
> >> │"Date"│"Price"│"Open"    │"High"  │"Low"    │"Vol."    │"Change %"│            │
> >> ├──────┼────┼──────┼─────┼─────┼──────┼────────┼──────┤
> >> │"May 22  │ 2020" │"670.21"│"668.40"│"670.41"│"665.21"│"-" │"0.23%" │
> >> ├──────┼────┼──────┼─────┼─────┼──────┼────────┼──────┤
> >> │"May 21  │ 2020" │"668.66"│"672.93"│"675.06"│"666.13"│"-" │"-0.70%"│
> >> └──────┴────┴──────┴─────┴─────┴──────┴────────┴──────┘
> >> 3 {. a
> >> ┌─────┬───┬──────┬─────┬────┬────┬──────┬────┐
> >> │Date│Price│Open  │High    │Low    │Vol.   │Change %│         │
> >> ├─────┼───┼──────┼─────┼────┼────┼──────┼────┤
> >> │May 22 │ 2020│670.21│668.40│670.41│665.21│-            │0.23% │
> >> ├─────┼───┼──────┼─────┼────┼────┼──────┼────┤
> >> │May 21 │ 2020│668.66│672.93│675.06│666.13│-            │-0.70%│
> >> └─────┴───┴──────┴─────┴────┴────┴──────┴────┘
> >> DATE
> >> 0   <-------- match is FALSE and column header row is not deleted below
> >> 3 {. a
> >> ┌─────┬───┬──────┬─────┬────┬────┬──────┬────┐
> >> │Date│Price│Open  │High    │Low    │Vol.   │Change %│         │
> >> ├─────┼───┼──────┼─────┼────┼────┼──────┼────┤
> >> │May 22 │ 2020│670.21│668.40│670.41│665.21│-            │0.23% │
> >> ├─────┼───┼──────┼─────┼────┼────┼──────┼────┤
> >> │May 21 │ 2020│668.66│672.93│675.06│666.13│-            │-0.70%│
> >> └─────┴───┴──────┴─────┴────┴────┴──────┴────┘
> >>
> >> I'm curious why the headers are completely in sync with the data box
> >> shapes when using the test data but out of sync with the box shapes
> >> when using the full data (as well as having a match that is false
> >> rather than true).
> >>
> >> Here is the very brief code that I've been using for testing:
> >>
> >> NB. ci2.ijs
> >> require 'files stdlib'
> >> require 'tables/dsv'
> >> root=: '!user\......'  NB. wherever you wish
> >>
> >> NB.               syntax:  ci2  'datafilename'
> >> ci2=: 3 : 0
> >>  aa=. ',' readdsv (jpath root,y)      NB. boxed table of raw fields
> >>  smoutput 3 {. aa
> >>  a=. removedoublequotes each aa       NB. strip " from every cell
> >>  smoutput 3 {. a
> >>  smoutput toupper (> 0 { ({. a))      NB. first header cell, uppercased
> >>  smoutput 'DATE' -: toupper (> 0 { ({. a))
> >>  NB. drop the header row if its first cell reads DATE
> >>  if. 'DATE' -: toupper (> 0 { ({. a)) do. a=. }. a end.
> >>  smoutput 3 {. a
> >> )
> >> NB. ========================
> >> NB. adapted from source: "Special Matrices & Lists" in "Phrases"
> >> NB. [original: removeblanks]
> >> removedoublequotes=: -.@('"'&E.) # ]
> >> NB. ========================
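For comparison, here is a Python sketch of the whole load step. It assumes the file is UTF-8 with an optional BOM, and load_rows and the sample bytes are hypothetical. Decoding with "utf-8-sig" removes the BOM before parsing, and the csv module handles the double quotes, so no separate quote-removal or printable step is needed.

```python
import csv
import io

# Hypothetical sample: the first lines of a CSV saved with a UTF-8 BOM.
raw = b'\xef\xbb\xbf"Date","Price","Open"\r\n"May 22, 2020","670.21","668.40"\r\n'

def load_rows(data: bytes):
    # "utf-8-sig" removes a leading BOM if there is one.
    text = data.decode("utf-8-sig")
    # csv.reader parses the quoted fields, dropping the double quotes.
    rows = list(csv.reader(io.StringIO(text)))
    # Drop the header row when its first cell is DATE (case-insensitive).
    if rows and rows[0] and rows[0][0].upper() == "DATE":
        rows = rows[1:]
    return rows

print(load_rows(raw))   # [['May 22, 2020', '670.21', '668.40']]
```

Note that csv.reader keeps a quoted field such as "May 22, 2020" in a single cell, whereas the boxed displays in this thread show the date split across two cells.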
> >>
> >> I would be happy to share the test file and full file via email with
> >> anyone who is interested in figuring out this puzzle.  The test file
> >> is 699 B, and the full file is 159 KB.
> >>
> >> I'm eager to find out why the test file works but the full file
> >> doesn't (even though the test file is the first 11 rows of the full
> >> file).
> >>
> >> Harvey
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm