Hi,
The Ragel Guide has an excellent set of guidelines for how to "take on
some buffer management functions" when using the longest-match operator
(for scanners):
\begin{itemize}
\setlength{\parskip}{0pt}
\item Read a block of input data.
\item Run the execute code.
\item If \verb|ts| is set, the execute code will expect the incomplete
token to be preserved ahead of the buffer on the next invocation of the execute
code.
\begin{itemize}
\item Shift the data beginning at \verb|ts| and ending at \verb|pe| to the
beginning of the input buffer.
\item Reset \verb|ts| to the beginning of the buffer.
\item Shift \verb|te| by the distance from the old value of \verb|ts|
to the new value. The \verb|te| variable may or may not be valid. There is
no way to know if it holds a meaningful value because it is not kept at null
when it is not in use. It can be shifted regardless.
\end{itemize}
\item Read another block of data into the buffer, immediately following any
preserved data.
\item Run the scanner on the new data.
\end{itemize}
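To make the shifting step concrete, here is a minimal sketch in plain Ruby with no Ragel involved. The helper name carry_over and the sample input are made up for illustration; the sketch assumes data is an array of bytes and that pe equals data.length, as it does once the execute code has consumed the whole buffer.

```ruby
# Hypothetical helper, not part of Ragel: applies the shifting rules above to
# plain Ruby values so the arithmetic can be checked without a generated scanner.
def carry_over(data, ts, te)
  # No token in progress: nothing needs to be preserved.
  return [[], nil, te] if ts.nil?

  # Keep everything from ts up to, but not including, pe (== data.length).
  prefix = data[ts...data.length]
  # te moves by the same distance as ts. It may be stale, which is harmless.
  new_te = te.nil? ? nil : te - ts
  # ts now points at the beginning of the (new) buffer.
  [prefix, 0, new_te]
end

# Illustrative input: a token that starts at index 2 and is not yet closed.
data = "xxSTART_FOO abc".unpack("c*")
prefix, ts, te = carry_over(data, 2, 11)
puts prefix.pack("c*")
puts ts
puts te
```

The next block of input would then be appended to prefix before the execute code runs again.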
I believe the class below is a correct implementation in Ruby (see the
#scan! method for the buffering):
=begin
%%{
machine foo_scanner;
foo_open = 'START_FOO';
foo_close = 'STOP_FOO';
foo = foo_open any* :>> foo_close;
main := |*
foo => { emit data[ts...te].pack('c*') };
any;
*|;
}%%
=end
class FooScanner
# read stuff in 1 meg at a time
CHUNK_SIZE = 1_048_576
attr_reader :target
def initialize(target)
@target = target
%% write data;
end
def emit(foo_entity)
puts "I found a foo entity!"
puts foo_entity
end
def scan!
# Set pe so that ragel doesn't try to get it from data.length
pe = -1
eof = File.size(target)
%% write init;
prefix = []
File.open(target) do |f|
while chunk = f.read(CHUNK_SIZE)
# \item Read a block of input data.
data = prefix + chunk.unpack("c*")
# \item Run the execute code.
p = 0
pe = data.length
%% write exec;
# \item If \verb|ts| is set, the execute code will expect the
# incomplete token to be preserved ahead of the buffer on the next
# invocation of the execute code.
unless ts.nil?
# \begin{itemize}
# \item Shift the data beginning at \verb|ts| and ending at \verb|pe|
# to the beginning of the input buffer.
# pe is one past the last element, so the range excludes it.
prefix = data[ts...pe]
# \item Shift \verb|te| by the distance from the old value of
# \verb|ts| to the new value. The \verb|te| variable may or may not be
# valid. There is no way to know if it holds a meaningful value because
# it is not kept at null when it is not in use. It can be shifted
# regardless. [SWAPPED ORDER]
if te
te = te - ts
end
# \item Reset \verb|ts| to the beginning of the buffer. [SWAPPED ORDER]
ts = 0
# \end{itemize}
else
prefix = []
end
# \item Read another block of data into the buffer, immediately
# following any preserved data.
# \item Run the scanner on the new data.
end
end
end
end
You can run it with
foo_scanner = FooScanner.new 'foo.txt'
foo_scanner.scan!
If that is good code, then perhaps it could be added as another example
to the Ragel website?
Thanks,
Seamus
--
Seamus Abshere