Re: [wsjt-devel] ALL.TXT (again)

Greg Beam Mon, 01 Jul 2019 09:43:12 -0700

Hello All,

This is similar to how I parse the file as well: read and split the line, check line[0], then do whatever is needed based on that first string.
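
As a rough illustration of the split-and-check approach, here is a minimal Python sketch. The function name `classify`, the category labels, and the sample lines in the usage note are made up for illustration; the two patterns only mirror the date prefixes quoted later in this thread, and they are not a complete description of the ALL.TXT format.

```python
import re

# Patterns for the first whitespace-separated field. They mirror the date
# prefixes quoted later in this thread; anything else is reported as "other".
STAMP = re.compile(r"^\d{6}_\d{6}$")                         # e.g. 190629_030000
DATE = re.compile(r"^\d{4}-(?:\d{2}|[A-Z][a-z]{2})-\d{2}$")  # e.g. 2019-06-29 or 2019-Jun-29


def classify(line):
    """Return (kind, fields) for one line, keyed on the first field."""
    fields = line.split()
    if not fields:
        return "blank", fields
    if STAMP.match(fields[0]):
        return "stamped", fields
    if DATE.match(fields[0]):
        return "date-header", fields
    return "other", fields
```

For example, `classify("190629_030000 7.074 Rx FT8")` yields the kind "stamped" together with the split fields, so the caller can branch on the kind and then work with `fields[1:]`.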

At present, my ALL.TXT is over 400MB. To prevent read-lock issues, I create a daily diff file between a copy and the active ALL.TXT file, and stick each diff file in a folder to process at whatever time I wish without affecting WSJT-X operations:


alltxt-diff-20190629-0300.txt
alltxt-diff-20190630-0300.txt
etc, etc, etc

After each diff run, I update the copy so it's ready for the next day. There are hundreds of ways to accomplish the same thing, but I found this to be easy and fairly painless (disk space is cheap these days :-) )
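
One of those hundreds of ways, sketched in Python. This is not the script used above; it assumes ALL.TXT only ever grows by appending, so the "diff" is simply everything past the length of the copy. The function name `write_daily_diff` and its parameters are hypothetical; the output file name follows the alltxt-diff-YYYYMMDD-HHMM.txt pattern shown above.

```python
import os
from datetime import datetime


def write_daily_diff(active, copy, out_dir, now=None):
    """Save the lines appended to `active` since `copy` was last updated.

    Assumption: `active` is append-only, so the new data is whatever lies
    past the current size of `copy`. Returns the diff path, or None when
    there is no complete new line yet.
    """
    now = now or datetime.now()
    old_size = os.path.getsize(copy) if os.path.exists(copy) else 0
    with open(active, "rb") as src:
        src.seek(old_size)
        new_data = src.read()
    # Keep whole lines only, in case the last line is still being written.
    new_data = new_data[: new_data.rfind(b"\n") + 1]
    if not new_data:
        return None
    os.makedirs(out_dir, exist_ok=True)
    diff_path = os.path.join(out_dir, now.strftime("alltxt-diff-%Y%m%d-%H%M.txt"))
    with open(diff_path, "wb") as dst:
        dst.write(new_data)
    # Bring the copy up to date with exactly the bytes that were saved.
    with open(copy, "ab") as dst:
        dst.write(new_data)
    return diff_path
```

Appending the saved bytes to the copy, rather than re-copying the whole file, keeps the copy in step with the diff even if more lines arrive while the script runs. On a first run with no copy present, the whole file is read into memory at once.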

What to do with the data afterwards has been my focus of late. I've been playing around with MongoDB (a schema-less JSON/BSON document storage database) to stick the decoded lines in. You can either split the lines into fields, or just stick the entire line in as a new document for long-term storage and later access.

The $regex processing capability of MongoDB is extensive, and very fast! One can easily match a multitude of string combinations, even with the entire line in one field, for example:


use wsjtx;
db.alltxt.find( { $and: [
        {event:{$regex:'MY-CALL'}},
        {event:{$regex:'HIS-CALL'}}
    ]
});

That would print the lines (documents) that contain both 'MY-CALL' and 'HIS-CALL'.

You could add 'DATE_STRING', or any combination you wish, to further refine the search without having to split the lines at all.
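
To make the "any combination" idea concrete, here is a small Python sketch that builds the same kind of $and / $regex filter from a list of search strings. The helper name `build_filter` is hypothetical, and the field name `event` is taken from the query above. Escaping each term so it is matched literally is an added choice, not something the query above does.

```python
import re


def build_filter(terms, field="event"):
    """Build a MongoDB filter that requires every term to appear in `field`.

    Each term is escaped so that characters such as '.' or '+' are matched
    literally instead of being treated as regex syntax.
    """
    return {"$and": [{field: {"$regex": re.escape(term)}} for term in terms]}
```

The resulting dictionary has the same shape as the shell query. Assuming the third-party pymongo driver and a collection object for `alltxt`, it could be passed to that collection's `find()` method; adding a date string is just one more entry in the list.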

In case folks are worried about the number of documents in each collection, I've added the entire WSPR Decode Archive (from WSPRnet) to a MongoDB database/collection set (one collection for each year, 2008 through 2019, at just over 95GB on disk). Later collections have "millions" of decodes in them. Single-collection query times are 1 to 2 seconds or less. With added indexing, times are in the millisecond range :-) Aggregate queries, those spanning multiple collections/years, vary in time depending on the data being sought, but are well within an acceptable time limit for most use cases I've had.


73's
Greg, KI7MT

On 7/1/19 1:27 AM, Claude Frantz wrote:

On 7/1/19 7:59 AM, Claude Frantz wrote:

Just as an example, a code extract in Perl:

if ($line =~ m/^(\d{4})-([A-Z][a-z]{2})-(\d{2})\b/ ) {
     $day = $3 ;
     $month_alpha = $2 ;
     $year = $1 ;
}
elsif ($line =~ m/^(\d\d)(\d\d)(\d\d)_\d{6}\b/ ) {
     $day = $3 ;
     $month_num = $2 ;
     $year = 2000 + $1 ;
}
elsif ($line =~ m/^(\d{4})-(\d\d)-(\d\d)\b/ ) {
     $day = $3 ;
     $month_num = $2 ;
     $year = $1 ;
}

I have not tested it; I hope there is no error. This allows decoding the 3 ALL.TXT formats that I remember. Please note that the month can be numeric or alpha. If it is alpha, you have to convert it to numeric if you want to compare it to a numeric value. Please note also that mode switching was an extra line in previous formats.
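
For comparison, here are the same three patterns as a minimal Python sketch that also does the alpha-to-numeric month conversion mentioned above. It assumes the alpha months are three-letter English abbreviations (Jan, Feb, ...); the function name `parse_date` is hypothetical.

```python
import re
from datetime import date

# Assumption: alpha months are three-letter English abbreviations.
MONTHS = {name: num for num, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Same order as the Perl extract: YYYY-Mon-DD, YYMMDD_HHMMSS, YYYY-MM-DD.
PATTERNS = [
    re.compile(r"^(?P<y>\d{4})-(?P<mon>[A-Z][a-z]{2})-(?P<d>\d{2})\b"),
    re.compile(r"^(?P<yy>\d{2})(?P<m>\d{2})(?P<d>\d{2})_\d{6}\b"),
    re.compile(r"^(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})\b"),
]


def parse_date(line):
    """Return the date at the start of `line`, or None if none is recognised."""
    for pattern in PATTERNS:
        match = pattern.match(line)
        if not match:
            continue
        g = match.groupdict()
        year = int(g["y"]) if "y" in g else 2000 + int(g["yy"])
        month = MONTHS.get(g["mon"]) if "mon" in g else int(g["m"])
        if month is None:
            continue  # three letters that are not a known month name
        try:
            return date(year, month, int(g["d"]))
        except ValueError:
            return None  # e.g. month 13 or day 32
    return None
```

Returning a `date` object means the three formats can be compared with each other directly, whichever one a given line uses.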
Best wishes,
Claude (DJ0OT)


_______________________________________________
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel


