Thanks, I will give it a try. I have Ubuntu 16.04.2 running in VirtualBox on my Mac, and I have about 900 PDF files to convert.
On Friday, April 21, 2017 at 1:37:14 PM UTC-4, Bob Weber wrote:
>
> I use Debian Linux. It was easy to convert the PDF file to text with the
> command "pdftotext -raw file.pdf". It produced a file like this:
>
> WeatherCat Daily Report For Jan 1, 2015
> Hour TempHiTempLo HeatHi HeatLo ChillHi ChillLo DewPHi DewPLo HumHi HumLo PresHi PresLo R/hHi R/hLo Rain AvWsHi AvWsLo GustHi GustLo WDir WRun SolHi SolLo UVHi UVLo
> 0 25.3 24.2 25.3 24.2 25.3 24.2 15.7 12.9 69 60 30.20 30.18 0.00 0.00 0.00 0 0 3 0 275 0.0 0 0 0.0 0.0
> 1 28.4 25.3 28.4 25.3 28.4 25.3 14.3 11.6 62 49 30.19 30.17 0.00 0.00 0.00 2 0 9 0 282 0.6 0 0 0.0 0.0
> 2 28.0 27.4 28.0 27.4 28.0 27.4 12.5 11.3 52 49 30.17 30.16 0.00 0.00 0.00 1 0 5 0 183 0.1 0 0 0.0 0.0
> 3 27.8 27.1 27.8 27.1 27.8 27.1 11.7 10.6 51 48 30.17 30.16 0.00 0.00 0.00 1 0 4 0 252 0.1 0 0 0.0 0.0
> 4 27.5 25.5 27.5 25.5 27.5 25.5 11.2 10.3 54 49 30.16 30.15 0.00 0.00 0.00 0 0 2 0 251 0.0 0 0 0.0 0.0
> 5 25.5 24.7 25.5 24.7 25.5 24.7 12.9 11.0 59 54 30.16 30.15 0.00 0.00 0.00 0 0 2 0 275 0.0 0 0 0.0 0.0
> 6 24.7 23.6 24.7 23.6 24.7 23.6 15.4 12.1 69 58 30.16 30.15 0.00 0.00 0.00 0 0 1 0 275 0.0 0 0 0.0 0.0
> 7 24.0 23.5 24.0 23.5 24.0 23.5 14.0 12.8 65 63 30.16 30.15 0.00 0.00 0.00 0 0 3 0 279 0.0 0 0 0.0 0.0
> 8 28.2 24.0 28.2 24.0 28.2 24.0 18.7 14.0 68 64 30.19 30.16 0.00 0.00 0.00 0 0 3 0 276 0.0 0 0 0.0 0.0
> 9 34.2 28.2 34.2 28.2 34.2 28.2 19.5 16.6 68 51 30.19 30.19 0.00 0.00 0.00 0 0 3 0 186 0.0 0 0 0.0 0.0
> 10 37.4 34.3 37.4 34.3 37.4 34.3 20.4 18.1 53 47 30.19 30.15 0.00 0.00 0.00 1 0 7 0 181 0.6 0 0 0.0 0.0
> 11 40.5 37.3 40.5 37.3 40.5 37.3 21.0 18.5 49 43 30.15 30.10 0.00 0.00 0.00 2 1 6 0 209 1.0 0 0 0.0 0.0
> 12 42.4 40.5 42.4 40.5 42.4 39.0 20.9 18.2 44 40 30.10 30.06 0.00 0.00 0.00 3 0 11 1 217 1.8 0 0 0.0 0.0
> 13 42.4 41.8 42.4 41.8 42.4 41.7 19.8 18.1 40 38 30.06 30.03 0.00 0.00 0.00 2 1 8 1 208 1.3 0 0 0.0 0.0
> 14 42.6 41.6 42.6 41.6 42.6 41.5 20.0 18.0 41 37 30.03 30.01 0.00 0.00 0.00 2 1 13 0 233 1.5 0 0 0.0 0.0
> 15 41.8 40.8 41.8 40.8 41.8 40.8 21.3 18.7 45 39 30.02 30.01 0.00 0.00 0.00 2 1 7 0 217 1.1 0 0 0.0 0.0
> 16 40.8 38.5 40.8 38.5 40.8 38.5 20.7 19.5 47 42 30.02 30.01 0.00 0.00 0.00 1 0 5 0 218 0.5 0 0 0.0 0.0
> 17 38.6 36.9 38.6 36.9 38.6 36.9 20.5 19.5 49 47 30.02 30.00 0.00 0.00 0.00 1 0 6 0 229 0.1 0 0 0.0 0.0
> 18 36.9 34.6 36.9 34.6 36.9 34.6 23.0 19.4 62 49 30.01 30.00 0.00 0.00 0.00 0 0 0 0 229 0.0 0 0 0.0 0.0
> 19 34.6 32.8 34.6 32.8 34.6 32.8 23.5 21.0 65 61 30.01 30.00 0.00 0.00 0.00 0 0 1 0 229 0.0 0 0 0.0 0.0
> 20 32.8 30.7 32.8 30.7 32.8 30.7 23.1 22.3 71 66 30.01 30.00 0.00 0.00 0.00 0 0 1 0 229 0.0 0 0 0.0 0.0
> 21 30.7 29.4 30.7 29.4 30.7 29.4 22.5 21.2 72 70 30.03 30.01 0.00 0.00 0.00 0 0 0 0 229 0.0 0 0 0.0 0.0
> 22 29.4 29.2 29.4 29.2 29.4 29.2 23.3 21.1 78 71 30.05 30.03 0.00 0.00 0.00 0 0 0 0 229 0.0 0 0 0.0 0.0
> 23 29.5 28.3 29.5 28.3 29.5 28.3 23.4 21.8 78 76 30.05 30.04 0.00 0.00 0.00 0 0 2 0 273 0.0 0 0 0.0 0.0
> Daily High 42.6 41.8 42.6 41.8 42.6 41.7 23.5 22.3 78 76 30.20 30.19 0.00 0.00 0.00 3 1 13 1 - 1.8 0 0 0.0 0.0
> Daily Low 24.0 23.5 24.0 23.5 24.0 23.5 11.2 10.3 40 37 30.01 30.00 0.00 0.00 0.00 0 0 0 0 - 0.0 0 0 0.0 0.0
> Daily Average 33.1 31.3 33.1 31.3 33.1 31.2 18.7 16.6 59 53 30.10 30.09 0.00 0.00 0.00 1 0 4 0 236 0.4 0 0 0.0 0.0
> Daily Total 0.00 8.9
>
> --------------
>
> So a little bash programming
> for f in *pdf; do pdftotext -raw "$f"; done
> and all the files will be converted in one command line. Note you need
> the " around $f since the file name has spaces in it.
>
> Now for a little python programming to take the first line to get the date
> and apply the hour for each row and convert to the time format you need
> (like epoch). The lines of interest appear to have the first character as
> a number (hour) and each field separated by white space. Just use some of
> the neat csv libraries to convert back out to csv format.
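The Python step described above might look roughly like the sketch below. It is only an illustration under several assumptions: the column names are copied from the header in the sample (with the fused "TempHiTempLo" split in two), the month is an English abbreviation like "Jan", the report times are treated as the local time of the machine running the script, and the names parse_report, write_csv and dateTime are made up for this example. It reads whitespace-separated tokens rather than whole lines, so it should not matter if a row is wrapped across two lines.

```python
import csv
import io
import re
from datetime import datetime, timedelta

# Assumed column names, taken from the header of the sample report.
COLUMNS = ["TempHi", "TempLo", "HeatHi", "HeatLo", "ChillHi", "ChillLo",
           "DewPHi", "DewPLo", "HumHi", "HumLo", "PresHi", "PresLo",
           "R/hHi", "R/hLo", "Rain", "AvWsHi", "AvWsLo", "GustHi", "GustLo",
           "WDir", "WRun", "SolHi", "SolLo", "UVHi", "UVLo"]

DATE_RE = re.compile(r"Daily Report For (\w+ \d{1,2}, \d{4})")

# Two hourly rows from the sample, wrapped the way the mail client wrapped them.
SAMPLE = """WeatherCat Daily Report For Jan 1, 2015
Hour TempHiTempLo HeatHi HeatLo ChillHi ChillLo DewPHi DewPLo HumHi HumLo
PresHi PresLo R/hHi R/hLo Rain AvWsHi AvWsLo GustHi GustLo WDir WRun SolHi
SolLo UVHi UVLo
0 25.3 24.2 25.3 24.2 25.3 24.2 15.7 12.9 69 60 30.20 30.18 0.00 0.00 0.00
0 0 3 0 275 0.0 0 0 0.0 0.0
1 28.4 25.3 28.4 25.3 28.4 25.3 14.3 11.6 62 49 30.19 30.17 0.00 0.00 0.00
2 0 9 0 282 0.6 0 0 0.0 0.0
Daily Total 0.00 8.9
"""


def parse_report(text):
    """Return one dict per hourly row of a single daily report."""
    match = DATE_RE.search(text)
    if match is None:
        raise ValueError("no 'Daily Report For <date>' line found")
    day = datetime.strptime(match.group(1), "%b %d, %Y")

    tokens = text[match.end():].split()
    start = tokens.index("UVLo") + 1   # data begins after the last header name
    width = len(COLUMNS) + 1           # the hour plus 25 readings
    rows = []
    # Hourly rows start with a number; the "Daily ..." summary lines end the loop.
    while start + width <= len(tokens) and tokens[start].isdigit():
        hour, *values = tokens[start:start + width]
        stamp = day + timedelta(hours=int(hour))
        row = {"dateTime": int(stamp.timestamp())}  # epoch, local time assumed
        row.update(zip(COLUMNS, values))
        rows.append(row)
        start += width
    return rows


def write_csv(rows, out):
    """Write the parsed rows to an open text file as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=["dateTime"] + COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv(parse_report(SAMPLE), buffer)
    print(buffer.getvalue())
```

For the real files you would read each .txt file that pdftotext produced and pass its contents to parse_report. Values are kept as strings so that a "-" in a column passes through unchanged.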
>
> I would just ignore the daily high and low lines since once you have the
> data in csv/sqlite you can find these values easily.
>
> I have just been playing with my own station data by downloading the WU
> data for my station (back to 2008) and converting it to sqlite/postgres in
> python as a way to learn python (I'm an old C programmer).
>
> If you don't use Debian or a similar Linux, then try a live Debian CD in a VM
> (like VirtualBox) to do the conversions. You will probably need to
> install pdftotext with "apt-get install poppler-utils".
>
> ...Bob
>
> On Friday, April 21, 2017 at 10:42:22 AM UTC-4, MRL wrote:
>>
>> Weather Cat daily data in a PDF file
>> I thought I had a program to convert the PDF file to a text file. The
>> conversion is a mess and unusable.
>> Any help? I have 3+ years of daily data.
>> I still have Weather Cat but have not been able to find an export
>> capability.
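On Bob's point that the Daily High and Daily Low lines can be skipped because those values are easy to get back once the hourly data is in sqlite: a small sketch of that idea follows. The table name hourly, its three columns, and the function daily_extremes are invented for this example, and only three hourly rows from the sample report are loaded, so treat it as an illustration and not as a finished schema.

```python
import sqlite3
from datetime import datetime

# (hour, TempHi, TempLo) for three hours of the Jan 1, 2015 sample report.
SAMPLE_ROWS = [(0, 25.3, 24.2), (7, 24.0, 23.5), (14, 42.6, 41.6)]


def daily_extremes(conn):
    """Return (day, highest TempHi, lowest TempLo) for each local calendar day."""
    return conn.execute(
        "SELECT date(dateTime, 'unixepoch', 'localtime') AS day, "
        "MAX(TempHi), MIN(TempLo) "
        "FROM hourly GROUP BY day ORDER BY day"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute(
    "CREATE TABLE hourly (dateTime INTEGER PRIMARY KEY, TempHi REAL, TempLo REAL)"
)
conn.executemany(
    "INSERT INTO hourly VALUES (?, ?, ?)",
    # Epoch seconds, with the report hour taken as local time (an assumption).
    [(int(datetime(2015, 1, 1, hour).timestamp()), hi, lo)
     for hour, hi, lo in SAMPLE_ROWS],
)

print(daily_extremes(conn))
```

The same GROUP BY works for any of the other columns, so the summary lines in the PDF text do not need to be parsed at all.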
