Re: [Jprogramming] Why is this match false?

'Rob Hodgkinson' via Programming Tue, 26 May 2020 07:02:00 -0700

I requested the data sets from Harvey, the sp1500 (full) dataset had 3 
invisible chars in front of “Date”, so with a “printable” filter verb these 
could be removed.


Hopefully Harvey’s data will now work.  I have not yet heard back… my findings 
below.

Henry and you were right: checking the shape of each box in {.b would have 
shown 7 (not 4) in the first cell.


So the header record differs between the 2 datasets, shown here: aa is the 
short data set, bb is the full data set.

   aa=. ',' readdsv (jpath '\Users\rob\jwork\test1.csv')
   a=. removedoublequotes each aa
   3{.a
┌──────┬─────┬──────┬──────┬──────┬──────┬────────┬──────┐
│Date  │Price│Open  │High  │Low   │Vol.  │Change %│      │
├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
│May 22│ 2020│670.21│668.40│670.41│665.21│-       │0.23% │
├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
│May 21│ 2020│668.66│672.93│675.06│666.13│-       │-0.70%│
└──────┴─────┴──────┴──────┴──────┴──────┴────────┴──────┘
   a.&i. each {.a
┌─────────────┬─────────────────┬──────────────┬──────────────┬──────────┬─────────────┬───────────────────────────┬┐
│68 97 116 101│80 114 105 99 101│79 112 101 110│72 105 103 104│76 111 119│86 111 108 46│67 104 97 110 103 101 32 37││
└─────────────┴─────────────────┴──────────────┴──────────────┴──────────┴─────────────┴───────────────────────────┴┘


NB. All looks OK so far … ‘Date’ is ASCII 68 97 116 101

   a.{~ 68 97 116 101
Date


   bb=. ',' readdsv (jpath '\Users\rob\jwork\sp1500.csv')
   b=. removedoublequotes each bb
   3{.b
┌──────┬─────┬──────┬──────┬──────┬──────┬────────┬──────┐
│Date │Price│Open │High │Low │Vol. │Change %│ │
├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
│May 22│ 2020│670.21│668.40│670.41│665.21│- │0.23% │
├──────┼─────┼──────┼──────┼──────┼──────┼────────┼──────┤
│May 21│ 2020│668.66│672.93│675.06│666.13│- │-0.70%│
└──────┴─────┴──────┴──────┴──────┴──────┴────────┴──────┘
   a.&i. each {.b
┌─────────────────────────┬─────────────────┬──────────────┬──────────────┬──────────┬─────────────┬───────────────────────────┬┐
│239 187 191 68 97 116 101│80 114 105 99 101│79 112 101 110│72 105 103 104│76 111 119│86 111 108 46│67 104 97 110 103 101 32 37││
└─────────────────────────┴─────────────────┴──────────────┴──────────────┴──────────┴─────────────┴───────────────────────────┴┘
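
To see why those extra bytes make the match fail, here is a small sketch that rebuilds the two header cells by hand. The names hdrshort and hdrfull are hypothetical stand-ins for > 0 { {. a and > 0 { {. b, not something from Harvey's script, and the sketch assumes the standard library (for toupper) is loaded, as it is in the usual J front ends.

```j
NB. hypothetical stand-ins for the first header cell of each data set
hdrshort =: 'Date'
hdrfull  =: (239 187 191 { a.) , 'Date'

# hdrshort                        NB. 4
# hdrfull                         NB. 7
'DATE' -: toupper hdrshort        NB. 1
'DATE' -: toupper hdrfull         NB. 0, a 4-item list cannot match a 7-item list
'DATE' -: toupper 3 }. hdrfull    NB. 1 once the three leading bytes are dropped
```

The cell still displays as Date because the leading bytes are not visible, which is why only the character codes (or the length) reveal the difference.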


The first cell contains 3 leading unprintable chars (239 187 191, i.e. hex EF 
BB BF, which is the UTF-8 byte order mark). They can be removed with a 
‘printable’ filter function such as:

printable =: verb define
 NB. keep only printable ASCII (codes 32..126); 127 is DEL, so it is excluded
 y #~ (32&<: *. 126&>:) a. i. y
)

   >0{{.b
Date
   a. i. >0{{.b
239 187 191 68 97 116 101

   printable >0{{.b
Date
   a. i. printable >0{{.b
68 97 116 101

This filters out the unprintable bytes correctly, so the fix is to apply 
printable before calling toupper:
    if. 'DATE' -: toupper printable (> 0 { ({. a)) do. a=. }. a end.
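
One caveat: printable also drops every other byte outside the ASCII printable range, which would matter if a cell held accented or other multi-byte UTF-8 text. A narrower alternative, sketched here under the assumption that the only problem is a leading UTF-8 byte order mark (the three bytes 239 187 191), is to strip just that prefix. BOM and stripbom are hypothetical names, not part of the J library or of Harvey's script.

```j
NB. BOM and stripbom are hypothetical names used for illustration only
BOM =: 239 187 191 { a.

NB. drop the first 3 bytes only when they are exactly the byte order mark
stripbom =: 3 : 'if. BOM -: 3 {. y do. 3 }. y else. y end.'

a. i. stripbom BOM , 'Date'    NB. 68 97 116 101
stripbom 'Date'                NB. Date (unchanged, no mark present)
```

With this, the header test would read 'DATE' -: toupper stripbom (> 0 { ({. a)) and leave all other cells untouched.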

Harvey should confirm, but I anticipate this is solved.

…/Rob

> On 26 May 2020, at 11:49 pm, Raul Miller <[email protected]> wrote:
> 
> You did not show the shape of the offending label for the
> truncated 10-year case.
> 
> Shapes have to match for -: to match.
> 
> Shapes can be different if you have extra dimension(s) of length 1, or if
> you have invisible characters.
> 
> Take care,
> 
> -- 
> Raul
> 
> On Tue, May 26, 2020 at 4:53 AM HH PackRat <[email protected]> wrote:
>> 
>> On 5/25/20, Henry Rich <[email protected]> wrote:
>>> You used {: in the last line.  Try it with {. .
>>> 
>>> and on 5/25/20, 'robert therriault' via Programming 
>>> <[email protected]> wrote:
>>> I noticed in the single line test that you used {: a and not {. a
>>> 
>>> and on 5/25/20, bill lam <[email protected]> wrote:
>>> ... Maybe there is a typo such as {: instead of {. inside your code.
>> 
>> Thanks for your eagle eyes in catching that typo!  I had an older
>> remarked (NB.) line  immediately above this line of code that had the
>> {:a which I visually copied instead of the correct {.a that I had used
>> everywhere else.
>> 
>> Unfortunately, making this change did NOT change the results for the
>> full 10-year data.   (It did work for the 10-day test case.)  I have
>> no idea why this difference should exist.  (The test case is the
>> column header row plus the first 10 days of the full 10-year file.)  I
>> scanned the 10-year data, but nothing stood out as an anomaly.  The
>> ONLY difference I noticed in the running of the J program is that the
>> initial boxing looks slightly different for the two sets of data.
>> (The data is read into variable aa by the dsv routine in J.)  I have no
>> idea why the two sets of data should look slightly different since the
>> data is the same, except for quantity.  The difference is in the
>> display of the headers in the full data.  (I tried my best to make
>> these look like the originals.  It's very hard with a proportional
>> font.)
>> 
>> Here is what a truncated output looks like for the 10-day test data:
>> 
>> 3 {. aa
>> ┌─────┬─────┬─────┬─────┬─────┬──────┬───────┬──────┐
>> │ "Date" │ "Price"│"Open"  │ "High"  │"Low"   │  "Vol."   │"Change %"│          │
>> ├─────┼─────┼─────┼─────┼─────┼──────┼───────┼──────┤
>> │"May 22│ 2020" │"670.21"│"668.40"│"670.41"│"665.21"│"-"             │"0.23%" │
>> ├─────┼─────┼─────┼─────┼──────┼─────┼───────┼──────┤
>> │"May 21│ 2020" │"668.66"│"672.93"│"675.06"│"666.13"│"-"             │"-0.70%"│
>> └─────┴─────┴─────┴─────┴──────┴─────┴───────┴──────┘
>> 3 {. a
>> ┌─────┬───┬─────┬────┬────┬────┬──────┬────┐
>> │Date     │Price│Open   │High   │Low    │Vol.   │Change %│        │
>> ├─────┼───┼─────┼────┼────┼────┼──────┼────┤
>> │May 22 │ 2020│670.21│668.40│670.41│665.21│-            │0.23% │
>> ├─────┼───┼─────┼────┼────┼────┼──────┼────┤
>> │May 21 │ 2020│668.66│672.93│675.06│666.13│-            │-0.70%│
>> └─────┴───┴─────┴────┴────┴────┴──────┴────┘
>> DATE
>> 1   <-------- match is TRUE and column header row is deleted below
>> 3 {. a
>> ┌─────┬───┬────┬────┬─────┬────┬┬─────┐
>> │May 22│ 2020│670.21│668.40│670.41│665.21│-│0.23% │
>> ├─────┼───┼────┼────┼─────┼────┼┼─────┤
>> │May 21│ 2020│668.66│672.93│675.06│666.13│-│-0.70%│
>> ├─────┼───┼────┼────┼─────┼────┼┼─────┤
>> │May 20│ 2020│673.36│670.68│675.42│670.26│-│1.73% │
>> └─────┴───┴────┴────┴─────┴────┴┴─────┘
>> 
>> 
>> And here is what a truncated output looks like for the full 10-year data:
>> 
>> 3 {. aa
>> ┌──────┬────┬──────┬─────┬─────┬──────┬────────┬──────┐
>> │"Date"│"Price"│"Open"    │"High"  │"Low"    │"Vol."    │"Change %"│           │
>> ├──────┼────┼──────┼─────┼─────┼──────┼────────┼──────┤
>> │"May 22  │ 2020" │"670.21"│"668.40"│"670.41"│"665.21"│"-" │"0.23%" │
>> ├──────┼────┼──────┼─────┼─────┼──────┼────────┼──────┤
>> │"May 21  │ 2020" │"668.66"│"672.93"│"675.06"│"666.13"│"-" │"-0.70%"│
>> └──────┴────┴──────┴─────┴─────┴──────┴────────┴──────┘
>> 3 {. a
>> ┌─────┬───┬-─-─-─┬─────┬────┬────┬──────┬────┐
>> │Date│Price│Open  │High    │Low    │Vol.   │Change %│         │
>> ├─────┼───┼-─-─-─┼─────┼────┼────┼──────┼────┤
>> │May 22 │ 2020│670.21│668.40│670.41│665.21│-            │0.23% │
>> ├─────┼───┼-─-──-┼─────┼────┼────┼──────┼────┤
>> │May 21 │ 2020│668.66│672.93│675.06│666.13│-            │-0.70%│
>> └─────┴───┴─-─-─-┴─────┴────┴────┴──────┴────┘
>> DATE
>> 0   <-------- match is FALSE and column header row is not deleted below
>> 3 {. a
>> ┌─────┬───┬-─-─-─┬─────┬────┬────┬──────┬────┐
>> │Date│Price│Open  │High    │Low    │Vol.   │Change %│         │
>> ├─────┼───┼-─-─-─┼─────┼────┼────┼──────┼────┤
>> │May 22 │ 2020│670.21│668.40│670.41│665.21│-            │0.23% │
>> ├─────┼───┼-─-──-┼─────┼────┼────┼──────┼────┤
>> │May 21 │ 2020│668.66│672.93│675.06│666.13│-            │-0.70%│
>> └─────┴───┴─-─-─-┴─────┴────┴────┴──────┴────┘
>> 
>> I'm curious why the headers are completely in sync with the data box
>> shapes when using the test data but out of sync with the box shapes
>> when using the full data (as well as having a match that is false
>> rather than true).
>> 
>> Here is the very brief code that I've been using for testing:
>> 
>> NB. ci2.ijs
>> require 'files stdlib'
>> require 'tables\dsv'
>> root=: '!user\......'  NB. wherever you wish
>> 
>> NB.               syntax:  ci2  'datafilename'
>> ci2=: 3 : 0
>>  aa=. ',' readdsv (jpath root,y)
>> smoutput 3 {. aa
>>  a=. removedoublequotes each aa
>> smoutput 3 {. a
>> smoutput toupper (> 0 { ({. a))
>> smoutput 'DATE' -: toupper (> 0 { ({. a))
>>  if. 'DATE' -: toupper (> 0 { ({. a)) do. a=. }. a end.
>> smoutput 3 {. a
>> )
>> NB. ========================
>> NB. adapted from source: "Special Matrices & Lists" in "Phrases"
>> NB. [original: removeblanks]
>> removedoublequotes=: -.@('"'&E.) # ]
>> NB. ========================
>> 
>> I would be happy to share the test file and full file via email with
>> anyone who is interested in figuring out this puzzle.  The test file
>> is 699 B, and the full file is 159 KB.
>> 
>> I'm eager to find out why the test file works but the full file
>> doesn't (even though the test file is the first 11 rows of the full
>> file).
>> 
>> Harvey
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm