On Fri, Feb 23, 2018 at 11:26 AM, Amirouche Boubekki <[email protected]> wrote:
>
>> > The goal of the atomspace is to eliminate human-curated datasets.
>>
>> Music to my ears. "Curated" means "detached from the actual source and
>> context of knowledge."
>
> Not always. Curated means fixed, patched and edited by a human being
> supervisor that knows best, until the correction is delivered in code. That
> is chance to avoid structural bias like racist bots.

Ah! Now this last is a very interesting philosophical observation. This is not quite the correct mailing list on which to discuss this, but it overlaps with a large number of political and mathematical issues that are very interesting to me. So here I go.

Political - if this were a human, not a bot, what amount of racism should be tolerated? Speech, thought, and action are interconnected. For example: the American constitution enshrines freedom of speech, and the freedom to practice religion. But clearly, we have lost our freedom of speech: say the wrong thing about Islam, you get bombed. Should we restrain freedom of religion? Religion is a form of thought. What about freedom of thought? You can think murderous thoughts, but if you commit murder, you are socially unwanted (usually). The ability to commit murder is correlated with the absence of certain neural circuitry in the brain having to do with empathy. Some humans lack these neurons, and thus are prone to be psychopaths. Those who do have those neurons, and commit (or even witness) murder, end up with PTSD.

The mathematical issues first arise if you think of bots as approximating humans. It's trivial to create a bot that prints random dictionary words. It's a bit harder, but not too hard, to create a bot that spews random dictionary words assembled into grammatical sentences (just run the random word sequences through a grammar-checker, e.g. link-grammar, and reject the ungrammatical ones; don't print them. Since most random word-sequences are not grammatical, this is not CPU-efficient, so better algorithms avoid obviously-ungrammatical word-sequences by working at higher abstraction layers).

What Microsoft did was just one single step beyond this: spew random grammatically correct sentences, using a probability weighting based on recently heard utterances. The system was too simple, and the gamers gamed the system: they trained up the probability weights to spew racist remarks.

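As a rough illustration of the three kinds of bot described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the eight-word vocabulary, the single DET NOUN VERB DET NOUN sentence pattern standing in for a real grammar checker such as link-grammar, and the function names step0/step1/step2. It is not how the Microsoft bot was actually built; it only shows the generate-and-reject idea and the weighting by recently heard utterances.

```python
import random
from collections import Counter

# Hypothetical toy vocabulary, tagged by part of speech.
VOCAB = {
    "the": "DET", "a": "DET",
    "robot": "NOUN", "cat": "NOUN", "idea": "NOUN",
    "sees": "VERB", "likes": "VERB", "eats": "VERB",
}
WORDS = sorted(VOCAB)
PATTERN = ["DET", "NOUN", "VERB", "DET", "NOUN"]


def is_grammatical(words):
    """Toy stand-in for a grammar checker: accept exactly one pattern."""
    return [VOCAB.get(w) for w in words] == PATTERN


def step0(rng, length=5):
    """Random word sequence, no grammar at all."""
    return [rng.choice(WORDS) for _ in range(length)]


def step1(rng, max_tries=100_000):
    """Generate random sequences and reject the ungrammatical ones."""
    for _ in range(max_tries):
        candidate = step0(rng)
        if is_grammatical(candidate):
            return candidate
    raise RuntimeError("no grammatical sequence found")


def step2(rng, heard, max_tries=100_000):
    """Like step1, but words heard recently are proportionally more likely."""
    counts = Counter(
        w for utterance in heard for w in utterance.split() if w in VOCAB
    )
    # Weight 1 for every word, plus one per recent occurrence.
    weights = [1 + counts[w] for w in WORDS]
    for _ in range(max_tries):
        candidate = rng.choices(WORDS, weights=weights, k=len(PATTERN))
        if is_grammatical(candidate):
            return candidate
    raise RuntimeError("no grammatical sequence found")


if __name__ == "__main__":
    rng = random.Random(0)
    print(" ".join(step0(rng)))
    print(" ".join(step1(rng)))
    print(" ".join(step2(rng, ["the cat eats the cat"] * 20)))
```

Feeding step2 the same utterance over and over skews the weights toward its words, which is the sense in which such a system can be "trained up" by whoever talks to it the most.
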
OK, suppose we can go one step beyond what Microsoft did: spew random sentences that are created by means of "logical deduction" or "reasoning" applied to "knowledge" obtained from some database (e.g. Wikipedia, or a triple store). This could certainly wow some people, as it would demonstrate a robot capable of logical inference.

So: this last is where your comment about "structural bias like racist bots" starts getting interesting. To recap:

Step 0: random word sequences
Step 1: random but grammatically correct word sequences
Step 2: random grammatical sentences weighted by recent input <-- the Microsoft bot
Step 3: grammatical sentences from random "logical inferences" <-- what opencog is currently attempting
...
Step n: crazy shit people say and do
...
Step p: crazy shit societies, cultures and civilizations do

What are the values of n and p? Some might argue that perhaps they are 4 and 5; others might argue that they are higher.

My point is: a curated database might make step 3 simpler. It's hopeless for step 4.

For a commercial product, curated data is super-important: Alexa and Siri and Cortana are operating at the step 2/3 level with carefully curated databases of capitalist value: locations of restaurants, household products, luxury goods.

The Russian Twitter bots, as well as Cambridge Analytica and the Facebook black-ops division, are working at the step 2/3 level with carefully curated databases of psychological profiles and political propaganda.

Scientists in general (and Ben in particular) would love to operate at the step 2/3 level with carefully curated databases of scientific knowledge, e.g. anti-aging, life-extension info. I'm getting old too. Medical breakthroughs are not happening fast enough for me.

So, yes, curated data is vitally important for commercial, political and scientific reasons. It's just that it does not really put us into steps 4 and 5, which are the steps along which AGI lies.

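To make step 3 from the recap a bit more concrete, here is a toy Python sketch of "sentences from random logical inferences" over a small hand-curated triple store. The triples, the two rules (is_a is transitive, and properties are inherited along is_a) and the names infer/render/random_inference are all hypothetical; this is not OpenCog code and makes no claim about how OpenCog does inference.

```python
import random

# Hypothetical hand-curated triple store: (subject, relation, object).
TRIPLES = {
    ("Socrates", "is_a", "human"),
    ("human", "is_a", "mammal"),
    ("mammal", "is_a", "animal"),
    ("animal", "can", "die"),
}


def infer(triples):
    """Forward-chain until nothing new appears.

    Rule: from (a is_a b) and (b r d) derive (a r d). With r == "is_a"
    this is transitivity; with any other relation it is inheritance.
    """
    known = set(triples)
    while True:
        new = set()
        for (a, r1, b) in known:
            if r1 != "is_a":
                continue
            for (c, r2, d) in known:
                if c == b:
                    new.add((a, r2, d))
        if new <= known:
            return known
        known |= new


def render(triple):
    """Turn one triple into a (crudely) grammatical English sentence."""
    subject, relation, obj = triple
    if relation == "is_a":
        article = "an" if obj[0] in "aeiou" else "a"
        text = f"{subject} is {article} {obj}."
    else:
        text = f"{subject} {relation} {obj}."
    return text[0].upper() + text[1:]


def random_inference(rng, triples):
    """Say one randomly chosen fact that was derived, not stored."""
    derived = sorted(infer(triples) - set(triples))
    return render(rng.choice(derived))


if __name__ == "__main__":
    print(random_inference(random.Random(0), TRIPLES))
```

Note that everything this bot can say is bounded by what the curated triples and the hand-written rules allow, which is the sense in which curation makes step 3 simpler without getting any closer to step 4.
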
The dream of AGI is to take those steps, without the curated bullshit (racism, religion, capitalism) that humankind generates, and yet also avoid the creation of a crisis that would threaten humanity/civilization.

Linas.

--
cassette tapes - analog TV - film cameras - you
