Hello All,
This is similar to how I parse the file: read and split each line,
check line[0], then do what's needed based on that first string.
At present, my ALL.TXT is over 400MB. To prevent read-lock issues, I
create a daily diff file between a copy and the active ALL.TXT file,
then stick each diff file in a folder to process at whatever time I
wish without affecting WSJT-X operations:
alltxt-diff-20190629-0300.txt
alltxt-diff-20190630-0300.txt
etc, etc, etc
After each diff run, I update the copy so it's ready for the next day.
There are hundreds of ways to accomplish the same thing, but, I found
this to be easy and fairly painless (disk space is cheap these days :-) )
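One possible way to script the daily diff, as a minimal sketch only. It
swaps in a byte-offset approach instead of a real diff tool, and it
assumes ALL.TXT is only ever appended to, so everything past the size of
the copy is "new". The function name write_daily_diff and the paths are
made up for illustration.

```python
from datetime import datetime
from pathlib import Path


def write_daily_diff(active, copy, out_dir, now=None):
    """Save lines appended to `active` since `copy` was taken, then update `copy`.

    Assumption: `active` only grows by appending. Returns the path of the
    diff file, or None when there are no new complete lines.
    """
    active, copy, out_dir = Path(active), Path(copy), Path(out_dir)
    now = now or datetime.now()

    # Everything beyond the size of the copy is new data.
    offset = copy.stat().st_size if copy.exists() else 0
    with active.open("rb") as src:
        src.seek(offset)
        new_bytes = src.read()

    # Drop a trailing partial line in case a write is in progress.
    new_bytes = new_bytes[: new_bytes.rfind(b"\n") + 1]
    if not new_bytes:
        return None

    out_dir.mkdir(parents=True, exist_ok=True)
    diff_path = out_dir / f"alltxt-diff-{now:%Y%m%d-%H%M}.txt"
    diff_path.write_bytes(new_bytes)

    # Append exactly what was saved, so the copy is ready for the next run.
    with copy.open("ab") as dst:
        dst.write(new_bytes)
    return diff_path
```

The first run against a large ALL.TXT reads the whole file into memory;
after that only the day's new lines are read.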
What to do with the data afterwards has been my focus of late. I've
been playing around with MongoDB (a schema-less JSON/BSON document
storage database) to stick the decoded lines in. You can either split
the lines, or just stick the entire line in as a new document for long
term storage/later-date access.
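As a rough sketch of the two options, the helpers below turn each line
of a diff file into a document, either whole or whole plus split. The
field name "event" matches the query example in this message; the
"fields" key and both function names are assumptions made up for
illustration, and no particular ALL.TXT column layout is assumed.

```python
def line_to_document(line, split=False):
    """Build one document from one ALL.TXT line."""
    text = line.rstrip("\r\n")
    doc = {"event": text}  # the entire line in a single field
    if split:
        # Hypothetical extra field: the whitespace-separated tokens.
        doc["fields"] = text.split()
    return doc


def documents_from_file(path, split=False):
    """Yield one document per non-blank line in a diff file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if line.strip():
                yield line_to_document(line, split)
```

With the pymongo driver installed, list(documents_from_file(path)) could
be handed to a collection's insert_many().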
The $regex processing capability of MongoDB is extensive, and very fast!
One can easily parse a multitude of string combinations, even with the
entire line in one field, for example:
    use wsjtx;
    db.alltxt.find( { $and: [
        {event: {$regex: 'MY-CALL'}},
        {event: {$regex: 'HIS-CALL'}}
    ] });
That would print the lines (documents) that contain both 'MY-CALL' and
'HIS-CALL'.
You could add a further clause such as 'DATE_STRING', or any
combination you wish, to refine the search without having to split the
lines at all.
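For anyone driving this from a script, here is a small sketch that
builds the same kind of $and/$regex filter for any number of strings.
The helper name all_of is made up for illustration; the "event" field
name comes from the query above. Each string is escaped so that it is
matched literally rather than as a pattern.

```python
import re


def all_of(*substrings, field="event"):
    """Build a filter matching documents whose `field` contains every substring."""
    if not substrings:
        raise ValueError("at least one substring is required")
    return {
        "$and": [{field: {"$regex": re.escape(s)}} for s in substrings]
    }


# Example: both callsigns must appear somewhere in the line.
query = all_of("KI7MT", "DJ0OT")
```

Assuming the pymongo driver and a database handle db, the resulting
dictionary can be passed as the filter to db.alltxt.find(query).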
In case folks are worried about the number of documents in each
collection, I've added the entire WSPR Decode Archive (from WSPRnet) to
a MongoDB database/collection set (one collection for each year, 2008
thru 2019, at just over 95GB on disk). Later collections have
"millions" of decodes in them. Single-collection query times are
1 to 2 seconds or less. With added indexing, times are in the
millisecond range :-)
Aggregate queries, those spanning multiple collections/years, vary in
time depending on the data being sought but are well within an
acceptable time limit for most use cases I've had.
73's
Greg, KI7MT
On 7/1/19 1:27 AM, Claude Frantz wrote:
On 7/1/19 7:59 AM, Claude Frantz wrote:
Just as an example, a code extract in Perl:
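# Format like 2019-Jun-29 (four-digit year, alpha month)
if ($line =~ m/^(\d{4})-([A-Z][a-z]{2})-(\d{2})\b/ ) {
    $day = $3 ;
    $month_alpha = $2 ;
    $year = $1 ;
}
# Format like 190629_030000 (two-digit year, numeric month)
elsif ($line =~ m/^(\d\d)(\d\d)(\d\d)_\d{6}\b/ ) {
    $day = $3 ;
    $month_num = $2 ;
    $year = 2000 + $1 ;
}
# Format like 2019-06-29 (four-digit year, numeric month)
elsif ($line =~ m/^(\d{4})-(\d\d)-(\d\d)\b/ ) {
    $day = $3 ;
    $month_num = $2 ;
    $year = $1 ;
}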
I have not tested it; I hope there is no error. This decodes the 3
formats of ALL.TXT that I remember. Please note that the month can be
numeric or alpha. If alpha, you have to convert it to numeric if you
want to compare it to a numeric value. Please note also that mode
switching was recorded on an extra line in previous formats.
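The same three patterns can be sketched in Python, with the alpha
month converted to a number as described. This is an illustration only:
the function name parse_date is made up, and English three-letter
month abbreviations are assumed.

```python
import re

# Assumption: alpha months are English three-letter abbreviations.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

# Same regular expressions and order as the Perl extract.
ALPHA = re.compile(r"^(\d{4})-([A-Z][a-z]{2})-(\d{2})\b")
SHORT = re.compile(r"^(\d\d)(\d\d)(\d\d)_\d{6}\b")
NUMERIC = re.compile(r"^(\d{4})-(\d\d)-(\d\d)\b")


def parse_date(line):
    """Return (year, month, day) as integers, or None if no format matches."""
    m = ALPHA.match(line)
    if m:
        month = MONTHS.get(m.group(2))
        if month is None:
            return None  # unknown month abbreviation
        return (int(m.group(1)), month, int(m.group(3)))
    m = SHORT.match(line)
    if m:
        # Two-digit year: assume the 2000s, as the Perl code does.
        return (2000 + int(m.group(1)), int(m.group(2)), int(m.group(3)))
    m = NUMERIC.match(line)
    if m:
        return (int(m.group(1)), int(m.group(2)), int(m.group(3)))
    return None
```

Returning all three values as integers means dates from any of the
formats can be compared directly.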
Best wishes,
Claude (DJ0OT)
_______________________________________________
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel