Does the order of the substrings matter? (In other words, would it be OK to treat 'B002|A001' like 'A001|B002'?) Why or why not?
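The two readings give different answers for the example data. Below is a minimal J sketch that uses two lines from the example. The names `a` and `b` are only for illustration. `E.` looks for a contiguous run in the same order, so 'B002|A001' is not found in 'C003|A001|B002'. An order-insensitive membership test such as `*./@e.` would count it as contained.

```j
NB. Hypothetical illustration: two lines from the example as symbol lists
a =: s: '|B002|A001'
b =: s: '|C003|A001|B002'
+./ a E. b    NB. order matters (contiguous run, as in the original): 0
*./ a e. b    NB. order ignored (every part number of a occurs in b): 1
```

Under the order-insensitive reading, 'B002|A001' and 'D004|C003' in the example would also count as contained in other lines, so the expected result would change.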
Are all the part numbers of equal width? Would any of the lines be duplicates of other lines? If so, how should that be handled?

My inclination would be to build a list of unique part numbers, replace each line with a list of indices into that list, and then treat this as a numeric problem (probably using sorting, possibly grouping the lines by length). But to do that properly, I would need a deeper understanding of the requirements.

Thanks,

--
Raul

On Friday, June 28, 2019, vadim . <vadim3128m...@gmail.com> wrote:
> Example: given a file (a string) where each line is a list of part
> numbers with separators, how can I exclude lines that are substrings of
> other lines (cut on separators)? E.g. (the first line is to be excluded):
>
>    z =: 0 : 0
> A001|B002
> C003|A001|B002
> B002|A001
> C003|D004|A001
> E005|F006
> D004|C003
> )
>    [lines =: <;._2 z
> +---------+--------------+---------+--------------+---------+---------+
> |A001|B002|C003|A001|B002|B002|A001|C003|D004|A001|E005|F006|D004|C003|
> +---------+--------------+---------+--------------+---------+---------+
>    [syms =: (s:@:('|'&,))&.> lines
> +-----------+-----------------+-----------+-----------------+-----------+-----------+
> |`A001 `B002|`C003 `A001 `B002|`B002 `A001|`C003 `D004 `A001|`E005 `F006|`D004 `C003|
> +-----------+-----------------+-----------+-----------------+-----------+-----------+
>    (+./@:E.)&.(> :.])/~ syms
> 1 1 0 0 0 0
> 0 1 0 0 0 0
> 0 0 1 0 0 0
> 0 0 0 1 0 0
> 0 0 0 0 1 0
> 0 0 0 0 0 1
>    [idx =: 1&= +/"1 (+./@:E.)&.(> :.])/~ syms
> 0 1 1 1 1 1
>    [result =: idx#lines
> +--------------+---------+--------------+---------+---------+
> |C003|A001|B002|B002|A001|C003|D004|A001|E005|F006|D004|C003|
> +--------------+---------+--------------+---------+---------+
>
> I'm worried about the performance of this line:
>
>    (+./@:E.)&.(> :.])/~ syms
>
> Other details are not very important, as I'm only learning. The phrase
> above uses the form "+./@:E.", which the J Wiki recommends for speed.
> I think the table adverb must be optimized, too. But, adding some weight:
>
>    z =: 0 : 0
> 2N0472|6N8595|9L1366|1189902|1413983|8B2026|1M3381|7K3377|3H5788|1F7854|8W1152|8R0721|9C5344|6W6672|9G7101|3023908|6Y1352|4P0489|2757803
> 3419308|3514531|3525716|3557019|3586192|3635776|3783741
> 3T3625|6T7765|9L1366|1189902|1413983|8B2026|1M3381|7K3377|3H5788|1F7854
> 3T3625|6T7765|9L1366|1189902|1413983|8B2026|1M3381|7K3377|3H5788|1F7854|8W1152|8R0721
> 3T3628|6T7765|9L1366|1189902|1413983|8B2026|1M3381|7K3377|3H5788|1F7854|8W1152|8R0721|9C5344|6W6672|9G7101|3023908|6Y1352|4P0489|1336934
> 4N4906|6N6481|9L1366|1189902|1413983|8B2026|1M3381|7K3377
> 4N4906|6N6481|9L1366|1189902|1413983|8B2026|1M3381|7K3377|3H5788
> 6N7936|6N5049|9L1366|1189902|1413983|8B2026|1M3381|7K3377|3H5788|1F7854|8W1152|8R0721|9C5344|6W6672|9G7101|3023908|6Y1352|4P0489|2757803
> 6Y0248|6T7765|9L1366|1189902|1413983|8B2026|1M3381|7K3377|3H5788|1F7854|8W1152|8R0721|9C5344|6W6672|9G7101|3023908|6Y1352|4P0489|1336934
> 6Y0248|6T7765|9L1366|1189902|1413983|8B2026|1M3381|7K3377
> 6Y0248|6T7765|9L1366|1189902|1413983|8B2026|1M3381|7K3377|3H5788|1F7854|8W1152
> )
>    lines =: <;._2 z
>    syms =: (s:@:('|'&,))&.> lines
>    syms10 =: ,(i.10) ]"0 _ syms
>    syms100 =: ,(i.100) ]"0 _ syms
>    syms1000 =: ,(i.1000) ]"0 _ syms
>    10 (6!:2) '(+./@:E.)&.(> :.])/~ syms'
> 3.936e_5
>    10 (6!:2) '(+./@:E.)&.(> :.])/~ syms10'
> 0.00652701
>    10 (6!:2) '(+./@:E.)&.(> :.])/~ syms100'
> 0.283609
>    1 (6!:2) '(+./@:E.)&.(> :.])/~ syms1000'
> 28.7405
>
> 28 seconds for 11000 short lines is unacceptable. Am I doing something
> totally wrong? For comparison, Perl shows a close-to-linear (definitely
> not quadratic) dependency on this task and runs hundreds to thousands of
> times faster.
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
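Here is one possible way around the quadratic lines-by-lines table from the quoted question, as a minimal J sketch. It assumes that order matters, as in the `E.` version, and that a line counts as contained only when it appears as a contiguous run inside another, longer line. It follows Raul's first step, mapping unique part numbers to integer indices. After that it swaps in a different technique in place of the sorting he mentions. Every proper contiguous sub-run of every line is collected into one list, and a line is dropped if it matches one of them, using a single `e.` (membership) test. The names `parts`, `nub`, `ids`, `subruns`, `runs` and `keep` are hypothetical.

```j
NB. Sketch only: assumes order matters and '|' separates part numbers
z =: 0 : 0
A001|B002
C003|A001|B002
B002|A001
C003|D004|A001
E005|F006
D004|C003
)
lines =: <;._2 z

NB. Step 1 (Raul's suggestion): unique part numbers, then each line
NB. becomes a list of integer indices into that list
parts =: <;._1@('|'&,)&.> lines   NB. each line -> boxed part numbers
nub   =: ~. ; parts               NB. unique part numbers, first-seen order
ids   =: nub&i.&.> parts          NB. each line -> indices into nub

NB. Step 2 (swapped-in technique): hypothetical helper returning every
NB. proper contiguous sub-run of y, i.e. runs of length 1 to (#y)-1
subruns =: 3 : 0
 r =. 0 $ a:
 for_k. }. i. # y do.
  r =. r , k <\ y
 end.
 r
)

NB. A line is dropped if it equals a proper sub-run of some line
runs   =: ~. ; subruns&.> ids
keep   =: -. ids e. runs
result =: keep # lines
```

A line with n part numbers has roughly n²/2 proper sub-runs. The work should therefore grow with the number of lines times that count, not with the square of the number of lines. This assumes that `e.` on large boxed arguments is not itself quadratic; I believe J hashes here, but I have not timed this sketch against the original. There is one behavioral difference. Identical duplicate lines do not exclude each other here, because a line is never a proper sub-run of an equal-length line. The original `1&= +/"1` test drops every copy.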