[tw5] Re: Intertwingling the TiddlyWiki - TF-IDF and tag inference

Rob Hoelz Mon, 21 Jan 2019 20:28:10 -0800

Again, thanks for sharing, Joe!  I looked through the PDF and had a few 
thoughts:


  * Did you do any additional processing of the tiddler bodies, e.g. 
stemming, chunking into bigrams/trigrams, or stripping out various wikitext 
elements like URLs?  If you did, I'd be curious to hear how that affected 
your results!
  * During the talk, you mentioned the idea of an "assistant" that sits off 
to the side and helps you work on tiddlers as you type.  I often think that 
it would be helpful if TiddlyWiki offered me suggestions for tiddlers that 
might be related to what I'm currently writing, and I think perhaps your 
TF-IDF "significant term" detection approach might make for a step in the 
right direction.  Perhaps the top N TF-IDF terms for each tiddler could be 
encoded as a vector, and tiddlers whose vectors have the highest cosine 
similarity could be offered as matches in this regard - what do you think?
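
To make the two points above concrete, here is a rough sketch in Python of what I have in mind. It is not Joe's Erlang prototype; every name in it (tokenize, tfidf_vectors, cosine, related) and the sample tiddlers are made up for illustration. It assumes a simple log(N/df) weighting for IDF, which is only one of several common variants, and it does no stemming or n-gram chunking, only URL stripping.

```python
import math
import re
from collections import Counter

# Hypothetical helpers for illustration only; not taken from Joe's code.
URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text, drop URLs, and split into word tokens (no stemming)."""
    return WORD_RE.findall(URL_RE.sub(" ", text.lower()))


def tfidf_vectors(tiddlers, top_n=10):
    """Map each tiddler title to a sparse {term: weight} dict of its top_n terms."""
    tokens = {title: tokenize(body) for title, body in tiddlers.items()}
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))  # count each term once per tiddler
    n_docs = len(tiddlers)
    vectors = {}
    for title, words in tokens.items():
        counts = Counter(words)
        total = len(words) or 1
        # Assumed weighting: relative term frequency times log(N / df).
        weights = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        vectors[title] = {term: w for term, w in top if w > 0}
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[term] for term, w in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related(title, vectors, limit=3):
    """Return up to `limit` other tiddlers with non-zero similarity, best first."""
    scores = [
        (other, cosine(vectors[title], vec))
        for other, vec in vectors.items()
        if other != title
    ]
    scores.sort(key=lambda kv: (-kv[1], kv[0]))
    return [(other, score) for other, score in scores[:limit] if score > 0]


# Made-up sample tiddlers.
tiddlers = {
    "Erlang sockets": "Connect the wiki to Erlang through a socket, see https://example.com/erlang",
    "Socket notes": "A socket interface lets Erlang exchange messages with the wiki",
    "Gardening": "Tomato seedlings need warmth and regular watering",
}
vectors = tfidf_vectors(tiddlers)
print(related("Erlang sockets", vectors))
```

One thing the sketch makes visible: terms shared between tiddlers have a lower IDF than terms unique to one tiddler, so a small N can cut off exactly the terms that would have produced a match. Keeping N fairly generous, or weighting by something other than the raw top-N cut, might be worth experimenting with.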

-Rob

On Monday, January 21, 2019 at 12:03:09 PM UTC-6, Rob Hoelz wrote:
>
> Thanks, Joe!  I'll read through the PDF you sent; as far as the code 
> goes, I think the PDF's description of the methodology should 
> suffice.
>
> -Rob
>
> On Monday, January 21, 2019 at 11:33:31 AM UTC-6, Joe Armstrong wrote:
>>
>> The code I wrote was a bit messy and was just an experiment. 
>> Good enough for a proof of concept but not for production - it was just 
>> written to test a few ideas.
>>
>> I don't mind sending you a private copy - but explaining how it works 
>> would be low priority.
>>
>> A better idea would be for me to put it up on github together with my 
>> library of Erlang code that
>> parses and mucks with tiddlers - I'm trying to programmatically create 
>> TWs from other data sources.
>>
>> If you saw the talk you'd see that we're interested in "Communicating 
>> TWs". I can imagine TWs sending messages
>> to each other - but this is a long way off ...
>>
>> I did make a little writeup that explains the method (enclosed). The 
>> code was just a prototype written in Erlang. The problem at the moment 
>> is that it is not integrated in any way with a live TW; our idea was to 
>> integrate it through a socket interface.
>>
>> At the moment I'm learning the TW so hopefully when I understand more 
>> I'll figure out how to
>> connect the TW to Erlang through a socket and fun and games will follow 
>> :-)
>>
>> The TF*IDF algorithm is very simple (see the writeup); most of the work is 
>> in tokenising the input
>> into words. From then on it's easy (in pure JS); integrating this with 
>> the TW would then be,
>> as they say, "an exercise for the reader" (that's what I say when I don't 
>> know how to do this :-)
>>
>> Cheers
>>
>> /Joe
>>
>>
>> On Monday, 21 January 2019 18:04:10 UTC+1, Rob Hoelz wrote:
>>>
>>> Hi everyone (especially Jeremy and Joe) -
>>>
>>> I finally got around to watching this talk, and I was enraptured the 
>>> whole time, especially by the part about inferring tags and using TF-IDF to 
>>> come up with more accurate suggestions.  Is the source code for your work 
>>> freely available?  I tried my hand at tag inference using forests of 
>>> decision trees a few months back, and I'd like to study alternative 
>>> approaches!
>>>
>>> Thanks,
>>> Rob
>>>
>>

