Ciao Joe

A few comments ...

Joe Armstrong wrote:
>
> What I did was to use Bayesian inference
>

Top notch. (Side note: Bayesian stats are great, based on an ontology that is quite different from other approaches. IMO Bayesian approaches would save a lot of lives if seriously applied to medical trials. They would get rid of existing placebo trials that kill people.)

> ...This way I could correctly predict about 80% of the tags from the text
> alone.
>

I wanted to comment that this is a surprisingly good result. Why? Because most TWs are authored in the secrecy of one's own attic. A TW is not written in a networked system with any shared reference lingo on tagging ... when you author, it's "me, myself and I" ... we have no "auto-wizards" saying "You perhaps don't mean Tagg, but Tag?" So 80% on the ball is pretty amazing IMO. **I think that is worth noting**.

> The problem was that, to me, many of the tags were meaningless and were
> used internally to organise the TW.
>

Right. Partly it's a mediation of "private language" ... I DO create tags like "miniFrugal" where I know what *I* mean, but anyone else would struggle with them ... they would need "translation". BUT, I never thought you were interested enough that it would go shareable and public ... :-) Partly (and often wholly) tags are content organisers, not semantic labels.

>
> In a second experiment I totally ignored the assigned tags, and predicted
> the tags from
> a TF*IDF analysis of the text. This made tags that made much more sense to
> me, but the
> predicted tags often missed the supplied tags.
>

That is interesting. I suspect part of that result may come down to the fact that a wiki "made in your own attic" will differ on *tags* from a wiki made on "served networks", where communal lingo may get more attention -- just a hypothesis.

> In my opinion the TF*IDF were better than the assigned tags since they had
> nothing
> to do with the organisation, but more to do with the actual words in the
> text.
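
(An aside for anyone unfamiliar with TF*IDF: below is a minimal Python sketch of how tags might be suggested from text alone. It is not Joe's actual code. The function names `tokenize` and `suggest_tags`, the crude tokenizer, and the sample tiddlers are all assumptions made up for illustration. The idea is only that a word scores highly when it is frequent in one tiddler but rare across the others.)

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately crude: lowercase runs of three or more letters.
    return re.findall(r"[a-z]{3,}", text.lower())


def suggest_tags(tiddlers, top_n=3):
    """Return {title: [top_n terms ranked by TF*IDF]} for {title: text}."""
    docs = {title: Counter(tokenize(text)) for title, text in tiddlers.items()}
    n_docs = len(docs)

    # Document frequency: in how many tiddlers does each term appear?
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    suggestions = {}
    for title, counts in docs.items():
        total = sum(counts.values()) or 1
        # TF (share of this tiddler's words) times IDF (log of rarity).
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        suggestions[title] = ranked[:top_n]
    return suggestions


# Hypothetical sample tiddlers, invented for this sketch.
tiddlers = {
    "Budget": "frugal budget tips and frugal shopping on a budget",
    "Garden": "garden tips for planting tomatoes in the garden",
    "Recipes": "tomatoes and basil recipes with shopping tips",
}
```

Calling `suggest_tags(tiddlers, top_n=2)` on these samples ranks words by how distinctive they are within each tiddler. A word such as "tips", which occurs in every tiddler, scores zero and so never looks like a good tag.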

Personally I like the idea that one derives "semantic heft" directly from the units (tiddlers), rather than from labels on them. For two reasons: (1) the less I have to do to add manual tags the better; (2) I know there are patterns I don't see that smart code likely can.

>> But, at the same time, any TW tag is a "label applied" to a tiddler -- a
>> distance between the tiddler and its manifest content.
>>
>> FYI I'm a big fan of Twitter where #hashtags are always inline. No
>> separation of content from organization. It's a neat approach on content
>> cognisance. Twitter is maybe extreme in its #hashtaggery but it's effective
>> in terms of finding stuff well enough. But, of course, Twitter usage of
>> #hashtags is purely about flagging content, whilst in TW tags do several
>> jobs.
>>
>
> YES :-) -- Given my earlier observations, perhaps we could distinguish
> two types of
> tags. The #inlineHashTags could have something to do with the content of
> the containing paragraph. The tiddler tags could mean "tags used to
> internally organise the TW itself"
>

Just FYI, at the moment TW does not support inline taggery out of the box, only the label type.

Best wishes
Josiah

