Hi Mario,
I like your idea, and I happen to know a few things about approximate
string matching techniques, so I'd be interested to look into it at some
point. I could probably do the offline analytic part, but I know nothing
about JavaScript, even less about node.js and the internals of TW.
Additionally, I think this is quite a complex problem, because it would
have to be efficient in both time and space, which is not a given with
this kind of algorithm.
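To give an idea of the cost: the classic approximate matching measure, Levenshtein (edit) distance, fills a dynamic-programming table with one cell per pair of characters, so comparing two strings takes O(|a| * |b|) time. The following is a minimal TypeScript sketch, not taken from natural or TW, that keeps only two rows of the table, so the extra space is O(|b|). Matching a query against every word of many wikis multiplies that cost by the vocabulary size, which is where the efficiency concern comes from.

```typescript
// Levenshtein edit distance between a and b, computed with two rows of
// the dynamic-programming table: O(|a| * |b|) time, O(|b|) extra space.
function levenshtein(a: string, b: string): number {
  // Row 0: turning "" into the first j characters of b takes j insertions.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    // Column 0: turning the first i characters of a into "" takes i deletions.
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}
```

For instance, levenshtein("kitten", "sitting") is 3, and "phonetic" vs "fonetik" is also 3 edits apart, so a pure edit-distance threshold would have to be fairly loose to treat those two as the same word.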
Erwan
On 15/12/14 12:03, PMario wrote:
On Monday, December 15, 2014 2:40:27 AM UTC+1, Erwan wrote:
Thank you for the comments. I wasn't very satisfied with the loss of
functionality/design either, so I thought of something slightly
different: the system is now presented as a "community search engine",
which is not meant to give access to the content directly but only to
point to the original wikis. It can be found here:
https://rawgit.com/erwanm/tw-aggregator/master/tw-aggregator.html
Hi Erwan,
I'm much more in favour of this approach. ....
Some time ago I looked a little into TW searchability for
interconnected, TiddlySpace-based TWs.
e.g.:
- phonetic search ... where you would get a result if the tiddler
contains "phonetic" but the user searches for "fonetik"
- search for the word stem ... e.g.: a tiddler contains "cat" but
the user searches for "cats" ...
- look up related words ... e.g.: you search for "child" and get
hits for "kid, youngster, minor, shaver, nipper, small fry, tiddler,
tike, tyke, fry, nestling" in the text. (Sorting by relevance would be
nice here, and so would limiting the number of related words :)
- suggesting useful search terms that are guaranteed to return hits,
when only 2 characters have been typed so far ...
and so on.
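The first two items can be illustrated without any library. The following is a TypeScript sketch and is not natural's API: `phoneticKey` follows the American Soundex rules, plus an extra, hypothetical "ph" to "f" rewrite (plain Soundex keeps the first letter, so on its own it would not match "phonetic" with "fonetik"), and `naiveStem` is a toy suffix stripper, far cruder than a real stemmer such as Porter's. `fuzzyMatch` is a hypothetical helper that treats two words as matching if either key agrees.

```typescript
// Soundex digit for each coded consonant; vowels, y, h and w have none.
const SOUNDEX_CODES: { [letter: string]: string | undefined } = {
  b: "1", f: "1", p: "1", v: "1",
  c: "2", g: "2", j: "2", k: "2", q: "2", s: "2", x: "2", z: "2",
  d: "3", t: "3",
  l: "4",
  m: "5", n: "5",
  r: "6",
};

// American Soundex key, after a hypothetical "ph" -> "f" rewrite
// that standard Soundex does not perform.
function phoneticKey(word: string): string {
  const w = word.toLowerCase().replace(/[^a-z]/g, "").replace(/ph/g, "f");
  if (w.length === 0) return "";
  let key = w[0].toUpperCase();
  let prev = SOUNDEX_CODES[w[0]] ?? "";
  for (let i = 1; i < w.length && key.length < 4; i++) {
    const code = SOUNDEX_CODES[w[i]];
    if (code !== undefined) {
      if (code !== prev) key += code; // collapse runs of the same code
      prev = code;
    } else if (w[i] !== "h" && w[i] !== "w") {
      prev = ""; // a vowel (or y) separates two letters with the same code
    }
    // h and w are skipped without resetting prev, as Soundex specifies
  }
  while (key.length < 4) key += "0";
  return key;
}

// Toy suffix stripper (NOT the Porter stemmer): removes one common
// English suffix if at least three letters remain.
const SUFFIXES = ["ing", "ed", "s"];

function naiveStem(word: string): string {
  const w = word.toLowerCase();
  for (const suf of SUFFIXES) {
    if (suf === "s" && w.slice(-2) === "ss") continue; // keep "class", "glass"
    const stem = w.slice(0, w.length - suf.length);
    if (w.slice(-suf.length) === suf && stem.length >= 3) return stem;
  }
  return w;
}

// Hypothetical matcher: same stem or same phonetic key counts as a hit.
function fuzzyMatch(textWord: string, queryWord: string): boolean {
  return naiveStem(textWord) === naiveStem(queryWord) ||
    phoneticKey(textWord) === phoneticKey(queryWord);
}
```

With these definitions "phonetic" and "fonetik" both get the key F532, and "cats" stems to "cat", so both examples from the list match. Related words and hit-guaranteed suggestions need data (a thesaurus, an index of existing terms) rather than a rule like these.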
The library I was thinking of is natural [1]. It provides all the
needed components.
Some components need a server side or a preprocessing step; others
could be part of the published TW.
To be useful, some components need preprocessing with large "lookup
databases", so it isn't practical to include them in the published TW.
... Since you need a preprocessing step anyway, I think it would fit
very well with an aggregated TW search index.
-----
This would remove the need to scrape and store whole tiddlers; instead
you would store and publish the aggregated metadata with the
corresponding source links.
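A rough TypeScript sketch of what such a published index could look like, under assumed names (the `ScrapedTiddler` shape, the simple lowercase tokenisation, and the helpers `buildIndex`, `lookup` and `suggest` are all hypothetical). The tiddler text is only read while building the index; what gets published maps each term to the source links where it occurs. Because `suggest` only returns terms taken from the index, every suggestion it offers has at least one hit.

```typescript
// Hypothetical shape of one scraped tiddler; only used at build time.
interface ScrapedTiddler {
  title: string;
  url: string; // link back to the tiddler in its original wiki
  text: string;
}

// What gets published: normalised term -> source links. No tiddler text.
type SearchIndex = { [term: string]: string[] };

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 0);
}

function buildIndex(tiddlers: ScrapedTiddler[]): SearchIndex {
  // Prototype-free object, so terms like "constructor" are safe keys.
  const index: SearchIndex = Object.create(null);
  for (const t of tiddlers) {
    for (const term of tokenize(t.title + " " + t.text)) {
      let links = index[term];
      if (links === undefined) {
        links = [];
        index[term] = links;
      }
      if (links.indexOf(t.url) === -1) links.push(t.url); // one link per tiddler
    }
  }
  return index;
}

// Source links for one search term (empty if the term is not indexed).
function lookup(index: SearchIndex, term: string): string[] {
  return index[term.toLowerCase()] || [];
}

// Indexed terms starting with what the user has typed so far, most
// linked first, ties broken alphabetically.
function suggest(index: SearchIndex, prefix: string, limit = 5): string[] {
  const p = prefix.toLowerCase();
  const hits = (term: string) => (index[term] || []).length;
  return Object.keys(index)
    .filter(term => term.slice(0, p.length) === p)
    .sort((a, b) => hits(b) - hits(a) || (a < b ? -1 : 1))
    .slice(0, limit);
}
```

Stemmed or phonetic forms of each term could be added as extra keys in the same preprocessing pass, so the published TW would only need plain lookups.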
... (still an issue) Storing the full text of third-party tiddlers in
the TW system area doesn't remove the licensing problem; it just makes
it less visible.
... on the other hand:
Scraping, aggregating and publishing the described metadata creates
new and useful content, similar to what well-known search engines do.
There are still some rules to follow, but they are much less critical.
have fun!
mario
[1] https://github.com/NaturalNode/natural
--
You received this message because you are subscribed to the Google
Groups "TiddlyWiki" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected]
<mailto:[email protected]>.
To post to this group, send email to [email protected]
<mailto:[email protected]>.
Visit this group at http://groups.google.com/group/tiddlywiki.
For more options, visit https://groups.google.com/d/optout.