On Monday, April 2, 2018 at 2:24:41 AM UTC-7, Stelios Koroneos wrote:
>
> Greetings to all.
> A small intro before going into the request
>
> Used web2py 4 years as a web interface to a VoIP application and really 
> liked its features compared to other frameworks.
> Haven't used it since and it's nice to see that work has continued and it's 
> been improved in a lot of ways. Downloaded the new version and been going 
> over the examples and posts in the forum.
>
> I have another project that has come up and thought that it might be a good 
> fit.
> I am looking to build an offline rss reader/aggregator where it downloads 
> not just the article headers but the entire article, removes the images and 
> packages and compresses all the text to reduce size.
>
> I have looked and there are some apps to collect the rss feeds in a db but 
> have not found any that does the article download, cleaning etc.
>

If the article is in HTML, then fetching the article will not automatically 
fetch the images or the javascript/css/etc. In your browser, you get those 
because the browser parses the HTML and makes additional requests for the 
linked resources. If you have curl on your system, try "curl 
www.google.com" for an example: it prints only the HTML.
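
To make that concrete, here is a minimal sketch that lists the resources a browser would go on to request for a page. It uses only Python's standard-library html.parser; the class name and the sample page are made up for illustration, and the tag list (img, script, link) is an assumption about what you care about.

```python
from html.parser import HTMLParser


class LinkedResourceLister(HTMLParser):
    """Collect URLs a browser would request after parsing the page."""

    # tag -> attribute that holds the URL of the linked resource
    URL_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append((tag, value))


# Made-up sample page standing in for a downloaded article.
SAMPLE_HTML = """<html><head>
<link rel="stylesheet" href="/static/site.css">
<script src="/static/app.js"></script>
</head><body>
<h1>Headline</h1>
<p>Body text.</p><img src="/images/photo.jpg" alt="photo">
</body></html>"""

lister = LinkedResourceLister()
lister.feed(SAMPLE_HTML)
for tag, url in lister.resources:
    print(tag, url)
```

None of those three URLs is downloaded when you fetch the page itself; they are only references inside the HTML text.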

You will probably want to strip out the HTML tags that reference them (img, 
script, style and so on), and that means parsing the file. For that, I 
recommend BeautifulSoup4.
<URL:https://www.crummy.com/software/BeautifulSoup>
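
As a rough sketch of the strip-and-compress step, the following keeps only the visible text of a page and gzips it. Note that this is a standard-library stand-in (html.parser plus gzip), not BeautifulSoup itself, so it runs without installing anything; the class and function names and the sample markup are hypothetical. With BeautifulSoup4 the same idea is usually shorter.

```python
import gzip
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Keep visible text; drop tags and the bodies of script/style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # img tags carry no text, so images vanish without special handling.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_compressed_text(html):
    """Return the article text of `html` as gzip-compressed UTF-8 bytes."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return gzip.compress(parser.text().encode("utf-8"))


# Made-up sample markup standing in for a downloaded article.
sample = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Headline</h1><img src='a.png'><p>First paragraph.</p>"
    "<script>alert('x')</script></body></html>"
)
blob = html_to_compressed_text(sample)
print(gzip.decompress(blob).decode("utf-8"))
```

Real articles carry navigation, ads and comment sections in the same HTML, so expect to add per-site rules for picking out the article body.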

The collecting and parsing of articles should probably be done using the 
scheduler. Look in the book at
<URL:http://web2py.com/books/default/chapter/29/04/the-core#web2py-Scheduler>
(newer versions of web2py also have a queuing command with cron-like arguments).
Also, 
<URL:https://groups.google.com/d/msg/web2py-developers/cI7R-9hex7k/PfTsGodYEwAJ>
has been merged in Git, but the online book seems to be lagging a bit.
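
A minimal sketch of the scheduler setup, assuming a web2py app where `db` is your DAL connection. The task name `fetch_feeds`, its placeholder body and the helper `setup_scheduler` are hypothetical; the one-hour period and ten-minute timeout are arbitrary choices. The `gluon` import sits inside the function only so the snippet can be loaded outside web2py.

```python
def fetch_feeds():
    """Hypothetical task body: download feeds, clean articles, store them."""
    # Placeholder: the real fetch/strip/compress work would go here.
    return "done"


def setup_scheduler(db):
    """Create a Scheduler bound to `db` and queue one repeating task."""
    # Imported here so this sketch can be loaded outside web2py.
    from gluon.scheduler import Scheduler

    scheduler = Scheduler(db, tasks=dict(fetch_feeds=fetch_feeds))
    # period is in seconds; repeats=0 means "repeat indefinitely".
    scheduler.queue_task("fetch_feeds", period=3600, repeats=0, timeout=600)
    return scheduler
```

Queue the task once (not on every request), and start a worker with something like "python web2py.py -K yourapp" so the queued task actually runs.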


> If anyone has done something similar and wishes to share code I would be 
> most grateful.
> Also I am looking for advice from people who might have implemented a 
> similar system for "Best Practices" (tm), gotchas etc.
>
> I am a hardware/embedded engineer by trade so some of the web technologies 
> and solutions are not (well) known to me.
>
> Thank you for your time.
>

Good luck!

/dps
 
