Hi there, I can help with the wget commands and the tidying up of 
stuff (your step 3).

1. One option (that I have worked with before) is Portia 
<http://blog.scrapinghub.com/2014/04/01/announcing-portia/>, which is 
built on Scrapy <http://scrapy.org/>.

2. Python + (wget + BeautifulSoup 
<https://www.google.ro/search?q=get_text+beautifulsoup>): not tried it 
yet, but it seems better when you don't want to use any manually 
generated templates, which Portia requires.
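To give an idea of option 2, here is a minimal sketch of the BeautifulSoup side (the `bs4` package). The sample HTML and the list of tags to strip are just my guesses at typical page boilerplate, not anything specific to your articles:

```python
from bs4 import BeautifulSoup

html = """<html><head><title>Demo</title><style>body {}</style></head>
<body><nav>site menu</nav><p>The article text we care about.</p>
<script>track();</script></body></html>"""

soup = BeautifulSoup(html, "html.parser")
# remove tags that never contain article text
for tag in soup(["script", "style", "nav"]):
    tag.decompose()
# get_text() flattens whatever is left into plain text
text = soup.get_text(separator="\n", strip=True)
print(text)
```

For real pages you would swap the `html` string for the files wget saved, and tune the list of stripped tags per site.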

from subprocess import call

urlx = 'links.txt'  # one URL per line

call(['wget',
      '-p',                 # also grab page requisites (images, CSS, ...)
      '--timestamping',     # skip files that haven't changed
      '-P', 'tests/',       # save everything under tests/
      '--convert-links',    # rewrite links so pages work offline
      '-e', 'robots=off',   # -e and its command must be separate list items
      '--connect-timeout=20',
      '--adjust-extension', # append .html where appropriate
      '--keep-session-cookies',
      '--user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) '
      'Gecko/20100101 Firefox/40.1',  # no shell here, so no quoting needed
      '-i', urlx])          # read the URLs from links.txt

Or, if you only want to use wget, run it from the command line with the 
same arguments, ending with:

-i links.txt

where links.txt is the file storing your links, one link per line.

For point 4 from your list, we can extend our Python script to generate 
new .tid files containing the content you want to import. I suggest you 
tinker with the Node.js version of TiddlyWiki, because there each tiddler 
is stored in a separate file, which is easier to work with in this scenario.
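A sketch of what generating those .tid files could look like. A .tid file is header lines of the form "field: value", a blank line, then the tiddler body; the filename sanitising below is just an illustration, real titles may need smarter escaping:

```python
from pathlib import Path

def write_tid(directory, title, body, tags=""):
    """Write one tiddler as a .tid file: 'field: value' header lines,
    then a blank line, then the body text."""
    # keep the filename simple; this mangling is only a rough example
    safe = "".join(c if c.isalnum() or c in " -_" else "_" for c in title)
    header = f"title: {title}\ntags: {tags}\ntype: text/vnd.tiddlywiki\n"
    path = Path(directory) / (safe + ".tid")
    path.write_text(header + "\n" + body, encoding="utf-8")
    return path

tid = write_tid(".", "Saved Article", "The cleaned-up article text.",
                tags="pocket")
print(tid.read_text(encoding="utf-8"))
```

Dropping the generated files into the tiddlers directory of a Node.js wiki (or drag-and-dropping them) should then import them.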

Also, you can import a whole webpage into a tiddler by wrapping it in 
<html> ... webpage here, with all resources embedded ... </html>; this 
could be your step 4 if step 3 doesn't work as expected.

If step 3 doesn't work as expected, we can always fall back to 
Portia/Scrapy for extracting information from each page, but that is more 
time-consuming: you have to set it up and learn some regex.

Another thing to keep in mind is that you might want to follow the links 
inside the bookmarked pages; for example, a blog post might link to five 
other relevant resources, so why not grab them too? :-)
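Collecting those outgoing links is easy with BeautifulSoup once you have the saved pages; the page URL and hrefs below are made-up examples:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

page_url = "http://example.com/blog/post"   # hypothetical bookmarked page
html = ('<p>See <a href="/notes/1">this note</a> and '
        '<a href="http://example.org/deep-dive">this article</a>.</p>')

soup = BeautifulSoup(html, "html.parser")
# urljoin resolves relative hrefs against the page's own URL
links = [urljoin(page_url, a["href"])
         for a in soup.find_all("a", href=True)]
print(links)
```

The resulting URLs could be appended to links.txt and fed through the same wget step for a second pass.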

Regards,
Stefan


On Tuesday, 8 September 2015 04:00:43 UTC+3, RunningUtes wrote:
>
>
> I've been thinking lately about storing full text articles in a 
> tiddlywiki. Right now I'm using Pocket to save the links.
>
> My current thoughts on the process:
> 1. Export the links from pocket using https://getpocket.com/export
> 2. Use wget to grab all the articles and images (wget -i 
> ExportedPocketURLs.txt)
> 3. Clean up the article like pocket. This is the step that is giving me an 
> issue.
> 4. Drag and drop all the files into a tiddlywiki
>
> Another option I'm toying with is to:
> 1. Tag articles in pocket with "Save" tag
> 2. Use IFTTT to save articles with the "Save" tag to an Evernote notebook
> 3. Export Evernote articles to html
> 4. Drag and drop all the files into a tiddlywiki
>
> Has anyone had any luck with something that works?
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"TiddlyWiki" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tiddlywiki.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tiddlywiki/df1ccc2f-6349-422f-ab0c-30a6596f02de%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
