Hello All,

This is similar to how I parse the file as well: read each line, split it, check line[0], then do whatever is needed based on that first string.

At present, my ALL.TXT is over 400MB. To prevent read-lock issues, I create a daily diff file between a copy and the active ALL.TXT file, and put each diff file in a folder to process whenever I wish without affecting WSJT-X operations:

alltxt-diff-20190629-0300.txt
alltxt-diff-20190630-0300.txt
etc, etc, etc

After each diff run, I update the copy so it's ready for the next day. There are hundreds of ways to accomplish the same thing, but I found this to be easy and fairly painless (disk space is cheap these days :-) )
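A minimal Python sketch of that daily routine, not necessarily how my own script does it. It assumes ALL.TXT is only ever appended to, so the "diff" is simply everything past the length of the copy. The function name write_daily_diff and the file paths are made up for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def write_daily_diff(active: Path, copy: Path, out_dir: Path, now: datetime) -> Path:
    """Save what was appended to `active` since `copy` was made, then update the copy."""
    # Assumption: the active file is append-only, so the new data starts
    # at the byte offset equal to the current size of the copy.
    offset = copy.stat().st_size if copy.exists() else 0
    out_dir.mkdir(parents=True, exist_ok=True)
    diff_path = out_dir / f"alltxt-diff-{now:%Y%m%d-%H%M}.txt"

    # Read-only access to the active file; nothing is locked or rewritten.
    with active.open("rb") as src, diff_path.open("wb") as dst:
        src.seek(offset)
        shutil.copyfileobj(src, dst)

    # Append exactly the bytes just saved, so the copy is ready for the next run
    # even if the active file grew while we were reading it.
    with diff_path.open("rb") as src, copy.open("ab") as dst:
        shutil.copyfileobj(src, dst)

    return diff_path
```

Note that if a line is being written at the moment of the run, the diff may end mid-line; the remainder would then start the next day's diff file.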

What to do with the data afterwards has been my focus of late. I've been playing around with MongoDB (a schema-less JSON/BSON document storage database) to stick the decoded lines in. You can either split the lines, or just stick the entire line in as a new document for long-term storage and later access.
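The two options can be sketched in Python roughly like this. The helper line_to_document is hypothetical, the field name "fields" is my own choice, and "event" is used for the whole line only because the query example in this message searches a field of that name. The sample line is made up and is not meant to show the exact ALL.TXT layout.

```python
def line_to_document(line: str, split: bool = False) -> dict:
    """Turn one decoded line into a dict that a document store could hold."""
    text = line.rstrip("\r\n")
    # Option 1: keep the entire line in a single field.
    doc = {"event": text}
    if split:
        # Option 2: additionally keep the whitespace-separated tokens as a list.
        doc["fields"] = text.split()
    return doc


# Made-up sample line, for illustration only.
sample = "190629_030000 14.074 Rx FT8 -10 0.2 1234 MY-CALL HIS-CALL R-05\n"
whole = line_to_document(sample)
parts = line_to_document(sample, split=True)

# With pymongo installed and a server running, a list of such dicts could be
# stored with something like: MongoClient()["wsjtx"]["alltxt"].insert_many(docs)
```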

The $regex processing capability of MongoDB is extensive, and very fast! One can easily parse a multitude of string combinations, even with the entire line in one field, for example:

use wsjtx;
db.alltxt.find( { $and: [
        {event:{$regex:'MY-CALL'}},
        {event:{$regex:'HIS-CALL'}}
    ]
});


That would print the lines (documents) that contain both 'MY-CALL' and 'HIS-CALL'. Note that $regex matching is case-sensitive unless you ask for case-insensitive matching.

You could add a 'DATE_STRING' clause, or any other combination you wish, to further refine the search without having to split the lines at all.
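As a rough illustration of how such a filter grows, here is a Python sketch that builds the same query shape for any number of terms. build_filter and matches are hypothetical helpers; matches is only a local stand-in that uses Python's re module to mimic this one query shape, and for anything beyond plain substrings Python and MongoDB regex syntax can differ.

```python
import re


def build_filter(*terms: str) -> dict:
    """Build an $and filter requiring every term to match the 'event' field."""
    return {"$and": [{"event": {"$regex": term}} for term in terms]}


def matches(doc: dict, query: dict) -> bool:
    """Local stand-in for the server-side match; handles only the shape above."""
    return all(
        re.search(clause["event"]["$regex"], doc["event"]) is not None
        for clause in query["$and"]
    )


# Both calls plus a date string, all against the unsplit line.
query = build_filter("MY-CALL", "HIS-CALL", "190629")
hit = matches({"event": "190629_030000 MY-CALL HIS-CALL R-05"}, query)
miss = matches({"event": "190629_030000 MY-CALL OTHER-CALL"}, query)
```

With pymongo, the dict returned by build_filter could be passed directly to a collection's find() method.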

In case folks are worried about the number of documents in each collection: I've added the entire WSPR Decode Archive (from WSPRnet) to a MongoDB database, one collection for each year, 2008 through 2019, at just over 95GB on disk. The later collections have millions of decodes in them. Single-collection query times are 1 to 2 seconds or less. With added indexing, times are in the millisecond range :-) Aggregate queries, those spanning multiple collections/years, vary in time depending on the data being sought, but are well within an acceptable time limit for most use cases I've had.

73's
Greg, KI7MT

On 7/1/19 1:27 AM, Claude Frantz wrote:
On 7/1/19 7:59 AM, Claude Frantz wrote:

Just as an example, here is a code extract in Perl:

if ($line =~ m/^(\d{4})-([A-Z][a-z]{2})-(\d{2})\b/ ) {
     $day = $3 ;
     $month_alpha = $2 ;
     $year = $1 ;
}
elsif ($line =~ m/^(\d\d)(\d\d)(\d\d)_\d{6}\b/ ) {
     $day = $3 ;
     $month_num = $2 ;
     $year = 2000 + $1 ;
}
elsif ($line =~ m/^(\d{4})-(\d\d)-(\d\d)\b/ ) {
     $day = $3 ;
     $month_num = $2 ;
     $year = $1 ;
}

I have not tested it, so I hope there are no errors. This should decode the three ALL.TXT formats that I remember. Please note that the month can be numeric or alphabetic. If alphabetic, you have to convert it to numeric if you want to compare it to a numeric value. Please also note that mode switching was recorded on an extra line in previous formats.
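For comparison, a Python sketch of the same three patterns, including the alphabetic-to-numeric month conversion mentioned above. It assumes English three-letter month abbreviations; the function name parse_line_date and the sample lines are made up for illustration.

```python
import re
from datetime import date

# Assumption: alphabetic months are English three-letter abbreviations.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

# Same three regular expressions as the Perl extract, in the same order.
PATTERNS = [
    (re.compile(r"^(\d{4})-([A-Z][a-z]{2})-(\d{2})\b"), "alpha"),  # 2019-Jun-29
    (re.compile(r"^(\d\d)(\d\d)(\d\d)_\d{6}\b"), "short"),         # 190629_030000
    (re.compile(r"^(\d{4})-(\d\d)-(\d\d)\b"), "numeric"),          # 2019-06-29
]


def parse_line_date(line: str):
    """Return the date at the start of a line, or None if no format matches."""
    for pattern, kind in PATTERNS:
        found = pattern.match(line)
        if found is None:
            continue
        year, month, day = found.groups()
        # Convert an alphabetic month to its number; unknown names give None.
        month_num = MONTHS.get(month) if kind == "alpha" else int(month)
        if month_num is None:
            return None
        # The short format carries a two-digit year.
        year_num = int(year) + (2000 if kind == "short" else 0)
        try:
            return date(year_num, month_num, int(day))
        except ValueError:
            return None  # digits matched, but they do not form a real date
    return None
```

Returning a date object for all three formats means the results can be compared directly, whichever format the line came from.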

Best wishes,
Claude (DJ0OT)


_______________________________________________
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel

