Re: Accessing parts of arrays

Mark Brownell Thu, 30 Sep 2004 13:40:23 -0700


On Thursday, September 30, 2004, at 12:35 PM, Jan Schenkel wrote:

If my memory serves me well, Geoff Canyon started a
thread on the xTalk mailing list a while ago that
proposed functions itemOffsets, wordOffsets and
lineOffsets which would return all the occurrences'
locations.

So if we could have an elementOffsets() function, this
would be the best solution for the above request, I
think.
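To make the idea concrete, here is a minimal Python sketch of what itemOffsets and elementOffsets might return. It is not Revolution code. Both function names and the `whole_matches` flag are hypothetical, loosely modelled on Rev's wholeMatches property. It assumes "location" means the 1-based item number (or the array key) of each item whose text contains the string.

```python
def item_offsets(needle, container, delimiter=",", whole_matches=False):
    """Hypothetical itemOffsets: return the 1-based numbers of every item
    of `container` that contains `needle` (or equals it when
    whole_matches is True)."""
    result = []
    for number, item in enumerate(container.split(delimiter), start=1):
        matched = item == needle if whole_matches else needle in item
        if matched:
            result.append(number)
    return result


def element_offsets(needle, array, whole_matches=False):
    """Hypothetical elementOffsets: return, in sorted key order, the keys
    of every element of `array` (a dict with sortable keys) whose value
    contains (or equals) `needle`."""
    return [key for key in sorted(array)
            if (array[key] == needle if whole_matches else needle in array[key])]


print(item_offsets("b", "a,b,cb,d"))                               # [2, 3]
print(element_offsets("bar", {1: "foo", 2: "bar", 3: "barn"}))     # [2, 3]
```

The point of returning every location at once is that a script gets all the matches in one call, instead of looping over itemOffset with a moving skip value.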

Jan Schenkel.


I have been talking to people at the mothership about this.

First off, the software I created with Rev is starting to sell, which gives me the money to pay for the external Mark Waddingham suggests below. I bring this up in the open here on the list because the external would speed up my XML-based database, and if it were later added to the engine, it would also speed up the software I'm selling now. It sounds like it could be a powerful tool for working with arrays, too, if you are willing to use a parser for the manipulations.

Before I proceed, does this suggestion sound useful for this array thread? (See below.) Pull-parsing an XML structure at high speed could give us all kinds of array manipulations if you were to use numbered tag sets such as <1>[data]</1>, <2>[more data here]</2>, <3>[even more data]</3>, etc., and <1,1>, <1,2> and <1,3,1> for multidimensional arrays.
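Here is a rough Python sketch of the numbered-tag idea. It assumes a flat list of tag sets, with the array dimensions carried in the tag name itself (<1,3,1> rather than nesting). It also assumes the data never contains tag text. A regular-expression scan stands in for a real pull parser: pull() hands back one (key, data) pair at a time. Strictly speaking, tags such as <1> or <1,2> are not well-formed XML, because XML element names cannot start with a digit or contain commas. A conforming XML parser would reject them, so a custom parser like this one is implied. The names encode and pull are hypothetical.

```python
import re

# Matches <k>...</k> where k is one or more comma-separated integers,
# e.g. <1>, <1,2> or <1,3,1>. The back-reference \1 ties each close tag
# to its own open tag; re.S lets the data span several lines.
TAG_RE = re.compile(r"<(\d+(?:,\d+)*)>(.*?)</\1>", re.S)


def encode(array):
    """Serialize a dict keyed by tuples of ints into numbered tag sets."""
    return "".join(
        "<{0}>{1}</{0}>".format(",".join(map(str, key)), value)
        for key, value in array.items()
    )


def pull(text):
    """Yield (key, data) pairs one at a time, left to right."""
    for match in TAG_RE.finditer(text):
        key = tuple(int(part) for part in match.group(1).split(","))
        yield key, match.group(2)


doc = encode({(1,): "data", (1, 2): "more", (1, 3, 1): "deep"})
print(doc)              # <1>data</1><1,2>more</1,2><1,3,1>deep</1,3,1>
print(dict(pull(doc)))  # {(1,): 'data', (1, 2): 'more', (1, 3, 1): 'deep'}
```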

Mark Brownell


On Wednesday, September 29, 2004, at 09:50 AM, Mark Waddingham wrote:

Hi Mark,

[snip]

As for your request for the suggested matchGlobal function (see below): while it would be nice to have, it is difficult to justify putting development time into it compared with the other extensions, enhancements and features that people have requested.

However, as I mentioned before, we would be perfectly willing to develop an external with the functionality you require, which could then be integrated into the engine at the next opportunity. This both mitigates the development cost to us and gives you a more flexible solution, should you need to specialize or optimize the functions in the future.

If you are interested in proceeding in this manner then I will happily put together a more concrete proposal to you, including technical details and time costings, and leave you to negotiate with Kevin the costs and finer contractual details.
To give you an idea of the substance of such a proposal, I would suggest
implementing an external with the following functions:
  matchOffsets(<needle>, <haystack>, [ <from> ], [ <to> ])
  - return a list of the offsets of <needle> in char <from> to <to> of
    <haystack>, one per line.
  matchParallelOffsets(<needles>, <needle_sep>, <haystack>, [ <from> ], [ <to> ])
  - return a list of the offsets of each chunk of <needles> in char <from>
    to <to> of <haystack>. The chunks of <needles> would be delimited by
    the character <needle_sep>. Each line of this list would be of the form
      offset of <needle_1>, offset of <needle_2>, ..., offset of <needle_n>
    (i.e. the functionality of your parser would be given by a single call
    of matchParallelOffsets with two chunks in <needles>).
  matchSetCacheSize <size>
  - The Boyer-Moore algorithm has a set-up cost for each pattern, which
    incurs a memory overhead. This call would set the maximum number of
    patterns to be cached at any one time.
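Here is a minimal Python sketch of how matchOffsets and matchParallelOffsets might behave. The snake_case names are hypothetical stand-ins for the proposed external, not its actual interface. Offsets are 1-based char numbers, as in Rev, and an occurrence counts only if it lies wholly inside chars <from> to <to>. Two further points are assumptions, because the proposal does not specify them:

- Line k of the parallel result is taken to hold the k-th offset of each needle. Output stops at the shortest list.
- Overlapping occurrences are counted.

```python
def _offsets(needle, haystack, start, end):
    """1-based offsets of needle lying wholly inside chars start..end."""
    if end is None:
        end = len(haystack)
    window = haystack[start - 1:end]      # chars start..end, inclusive
    offsets = []
    i = window.find(needle) if needle else -1
    while i != -1:
        offsets.append(start + i)         # window index 0 is char `start`
        i = window.find(needle, i + 1)    # i + 1: overlapping matches count
    return offsets


def match_offsets(needle, haystack, start=1, end=None):
    """Hypothetical matchOffsets: offsets returned one per line."""
    return "\n".join(str(o) for o in _offsets(needle, haystack, start, end))


def match_parallel_offsets(needles, needle_sep, haystack, start=1, end=None):
    """Hypothetical matchParallelOffsets: line k lists the k-th offset of
    each needle, comma-separated."""
    columns = [_offsets(n, haystack, start, end)
               for n in needles.split(needle_sep)]
    return "\n".join(",".join(map(str, row)) for row in zip(*columns))


doc = "<a>foo</a><a>bar</a>"
print(match_offsets("<a>", doc))                      # 1 and 11, one per line
print(match_parallel_offsets("<a>|</a>", "|", doc))   # 1,7 and 11,17
```

With an opening and a closing tag as the two needles, each output line brackets one element. That is the single-call parser use the proposal describes.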
As for how these might be exposed in the engine, Jeanne's suggested syntax is a good one (assuming it doesn't cause any conflicts; I make no promises as to whether this syntax is feasible):
  the offsets of <needle> in <haystack>
  the offsets of the lines/words/items of <needles> in <haystack>
Anyway, I shall leave you to think on this way forward, and I promise to be more efficient in getting back to you next time.
Warmest Regards,
Mark.
On Thu, 16 Sep 2004, Mark Brownell wrote:
Hi Mark,
I was wondering, now that things might have become a little less hectic, whether any progress has been made on adding this to the Rev engine. This is exactly what I was hoping for. I can use it to isolate large portions of huge documents in order to build something I might need very badly in the next few months. Also, this single function could be highly useful to others, as you pointed out.
Thanks,
Mark Brownell
On Wednesday, August 18, 2004, at 03:10 AM, Mark Waddingham wrote:
The one of most interest is the Boyer-Moore algorithm, as this is reputed
to be the fastest.
So, one idea is to implement a function:
  matchGlobal(stringToSearch, token)
returning a list of all indices of token in stringToSearch.
e.g.
  get matchGlobal("<a>foo</a><a>bar</a><a>baz</a>", "<a>")
would give
  it[1] = 1
  it[2] = 11
  it[3] = 21
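Here is a short Python stand-in for the matchGlobal function, with a hypothetical name and behaviour. It returns a dict keyed 1, 2, 3, ... to mimic a Rev numbered array, and it reports 1-based char numbers, as Revolution chunk expressions do. Counting that way, each "<a>foo</a>" group is 10 chars long, so the three "<a>" tags in the example start at chars 1, 11 and 21. Overlapping occurrences are counted, which is an assumption; the original does not say.

```python
def match_global(string_to_search, token):
    """Return {1: offset, 2: offset, ...} with the 1-based char position
    of every occurrence of token, like a Rev numbered array."""
    result = {}
    if not token:
        return result
    i = string_to_search.find(token)
    while i != -1:
        result[len(result) + 1] = i + 1   # convert 0-based index to char number
        i = string_to_search.find(token, i + 1)
    return result


print(match_global("<a>foo</a><a>bar</a><a>baz</a>", "<a>"))   # {1: 1, 2: 11, 3: 21}
```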


_______________________________________________
use-revolution mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/use-revolution
