There has been a good bit written about this, and some of it has made it
into System/Interpreter/Requests. If you are asking for more than is
asked for there, please add to that page.
Henry Rich
On 6/11/2017 10:17 AM, 'Pascal Jasmin' via Programming wrote:
An "emit empty word" operation is strongly needed, IMO.
I will probably write a mini-regex implementation for ;: (with *?+ support)
suitable for tag extraction that should be faster than regex.
The basics of that are to start a word after the opening string, use ev
(no start) at the start of the closing string, and then ew (no start) at
the end of the closing string.
But have you confirmed that regex is too slow for tag extraction?
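
For anyone who wants to time the regex route first, here is a minimal
sketch. It assumes the standard J regex library (rxmatches and rxfrom) is
available; demo and pat are illustrative names that do not appear in the
thread, and the pattern assumes the tag content contains no '<'.

```j
require 'regex'   NB. standard library script providing rxmatches and rxfrom

NB. demo and pat are illustrative names, not taken from the thread
demo =: '<data>', LF, '<ab>1</ab>', LF, '<ab>20</ab>', LF, '</data>', LF
pat  =: '<ab>([^<]*)</ab>'

NB. rxmatches gives one table per match: row 0 is the whole match,
NB. row 1 is the parenthesized group; each row is (start, length)
m  =: pat rxmatches demo
sl =: 1 {"2 m        NB. keep only the group's (start, length) pairs
sl rxfrom demo       NB. box the substrings selected by those pairs
```

The sl table has the same start/length layout that the original question
asks about, so it can be passed to any of the adverbs under discussion and
timed against the E.-based search on real data.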
________________________________
From: Danil Osipchuk <[email protected]>
To: Programming forum <[email protected]>
Sent: Sunday, June 11, 2017 9:59 AM
Subject: Re: [Jprogramming] Apply at start/lengths pairs
I could not find a cutP definition with a quick look, but from your example
it seems that by token you mean a single-character separator. That is not
general enough.
Also, imagine a fluffy XML file with millions of records, where only a
minority of the fields, of different types, are interesting, and some nodes
are missing the fields you are interested in.
Parsing the whole file is plainly infeasible because of performance and
the complexity of the resulting code.
Applying at selected positions obtained and reshaped by whatever means
however works rather well and is easy to reason about.
As for fsm, I do remember that I found ;: inconvenient when I was trying
to apply it, and one issue was that there is no way to emit an empty word.
The other was that you have to have every possible value of the input
domain represented as a row. When the domain is characters the mapping is
manageable; for everything else, not so much. As a vague idea, if there
were a way to condense the input domain through a verb, possibly a dyad to
pass an additional state, it would considerably expand the use of ;:, with
some performance hit of course.
Also, code utilizing ;: is pretty much unreadable even by J standards.
Those were my initial impressions of it.
2017-06-11 15:43 GMT+03:00 'Pascal Jasmin' via Programming <
[email protected]>:
A more general procedure than your request is to cut your data so that
the segments between start and end tokens fall in odd positions.
In jpp (https://github.com/Pascal-J/jpp),
cutP is a process for cutting on start and end tokens, though there are
faster methods in the included fsm.ijs file. That process could get a
significant boost if ;: were enhanced to support emitting empty boxes, but:
cutP '(asdf)g()'
++----+-+++
||asdf|g|||
++----+-+++
cutP is dyadic for start and end tokens other than '()'.
Also from jpp, the AltM adverb takes a gerund and applies it cyclically to
a cut structure like the one above.
a:"_`u AltM would produce empties for non-odd positions.
But if you only care about the selections, then either regex, or a ;:
definition can extract them.
________________________________
From: Danil Osipchuk <[email protected]>
To: Programming forum <[email protected]>
Sent: Sunday, June 11, 2017 7:19 AM
Subject: [Jprogramming] Apply at start/lengths pairs
Hi all,
I wonder if there is an idiomatic way to apply a verb using an array of
start and length pairs. This is a recurring pattern when extracting data
from files.
I've tried 3 adverbs (the example at the end), and the first one is
slightly better on big files, but I'm still looking for possible
improvements (the need is to extract selected fields from multi-gigabyte
memory mapped csv/xml files)
'ab' xmlTagContentSL XML
11 1
22 2
34 3
47 4
'ab' <xmlTagDo XML
+-+--+---+----+
|1|20|300|4000|
+-+--+---+----+
(2 2 $ 'ab'xmlTagContentSL XML) <doSL XML
+---+----+
|1 |20 |
+---+----+
|300|4000|
+---+----+
regards,
Danil
doSL =: 1 : '(,."1@[)u;.0]' NB. SL stands for start len pair
NB. doSL =: 1 : '(0|:[:,:[)u;.0]'
NB. doSL =: 1 : '(u;.0~ ,.)~"1'
xmlTagOpn =: '<' ,'>',~]
xmlTagCls =: '</','>',~]
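xmlTagContentSL =: 4 : 0
NB. x: tag name, y: text. Result: (start, length) row for each tag's content
CS =. (xmlTagOpn >x) (#@[ + I.@E.) y   NB. content starts just past each opening tag
CE =. (xmlTagCls >x) I.@E. y           NB. content ends where each closing tag starts
CS ,. CE-CS
)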
xmlTagDo =: 1 : '(xmlTagContentSL (u doSL) ])f.'
XML =: 0 : 0
<data>
<ab>1</ab>
<ab>20</ab>
<ab>300</ab>
<ab>4000</ab>
</data>
)
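
For readers less familiar with ;.0, the following sketch spells out what
the first doSL variant does with one pair and then with a table of pairs.
The name demo and the literal indices are illustrative assumptions (a
two-record string shaped like XML above), not part of the original post.

```j
NB. demo is an illustrative two-record string, not the thread's XML
demo =: '<data>', LF, '<ab>1</ab>', LF, '<ab>20</ab>', LF, '</data>', LF

NB. Dyadic u;.0 takes a 2-row table on the left: row 0 = start, row 1 = length
(,. 11 1) ];.0 demo        NB. the 1 character starting at index 11

NB. ,."1 turns each (start, length) row into such a 2-by-1 table, and
NB. u;.0 has left rank 2, so u is applied once per pair
sl =: 2 2 $ 11 1 22 2
(,."1 sl) <;.0 demo        NB. one box per pair
```

Because the right argument is never cut or copied up front, only the
selected spans are touched, which is the property that matters for large
memory-mapped files.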
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm