---------- Forwarded message ----------
Date: Tue, 27 Dec 2005 16:15:29 +0100
From: Tjebbe van Tijen/Imaginary Museum Projects <[EMAIL PROTECTED]>
To: nettime <[email protected]>
Subject: <nettime> Statistically Improbable Phrases and the 'real reader'


A few months ago Amazon Books introduced a new device:
Statistically Improbable Phrases (shortened to SIPs).

To give an example, for the book

Armstrong, David F. / Stokoe, William C., Jr. / Wilcox, Sherman
"Gesture and the Nature of Language", 1995, Cambridge University Press

The following SIPs are given:

Statistically Improbable Phrases (SIPs):
primary sign languages, visible gestural, spoken language phonology,
language modular, visible gestures, signed languages, sublexical
level, sign language word, gestural approach, semantic phonology,
spatial syntax, grammar module, gestural theory, vocal gestures, deaf
signers, associationist theories, perceptual categorization, image
schemata, grammatical processing, primary consciousness, global
mappings, iconic gestures, modular theories, adaptive complex, order
consciousness

Clicking on one of these 'phrases' generates a web page listing other
books that contain the same phrase, together with the number of
occurrences of that particular SIP.

The idea of SIP is explained on the Amazon site:

Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most
distinctive phrases in the text of books in the Search Inside!
program. To identify SIPs, our computers scan the text of all books in
the Search Inside! program. If they find a phrase that occurs a large
number of times in a particular book relative to all Search Inside!
books, that phrase is a SIP in that book.
SIPs are not necessarily improbable within a particular book, but they
are improbable relative to all books in Search Inside!.

and a new Wikipedia entry reads:

Statistically Improbable Phrases is a system developed by Amazon.com
to compare all of the books they index and find phrases in each that
are the most unlikely to be found in any other book indexed.
The system is used to find the most unique portions of books for use
as a summary or keyword.
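
Neither description says how the comparison is actually computed. As a
minimal sketch, and only under assumptions of my own, the idea could look
like this in Python. The assumptions: phrases are two-word sequences, and
the score is a phrase's share within one book divided by its share within
all books together. The names bigrams and improbable_phrases and the toy
texts are hypothetical; nothing here is Amazon's published method.

```python
from collections import Counter
import re


def bigrams(text):
    """Lowercase the text and return its two-word phrases."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]


def improbable_phrases(books, title, top=4, min_count=2):
    """Rank the phrases of one book by how over-represented they are.

    Assumed score (not Amazon's): the phrase's share of this book's
    phrases divided by its share of the phrases in all books together.
    """
    per_book = {name: Counter(bigrams(text)) for name, text in books.items()}
    corpus = Counter()
    for counts in per_book.values():
        corpus.update(counts)
    corpus_total = sum(corpus.values())

    book = per_book[title]
    book_total = sum(book.values())
    scores = {}
    for phrase, n in book.items():
        if n < min_count:
            continue  # ignore phrases that occur only once in the book
        scores[phrase] = (n / book_total) / (corpus[phrase] / corpus_total)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top]


# Toy corpus, invented purely for illustration.
books = {
    "gesture": "signed languages and the visible gestures "
               "signed languages and the visible gestures",
    "cooking": "salt and pepper and the soup and the bread",
    "travel": "the train and the boat and the road",
}
result = improbable_phrases(books, "gesture")
print(result)
```

In this toy corpus "and the" is frequent in the chosen book but just as
frequent everywhere else, so it drops out, while "signed languages"
stays in. Which books form the comparison set decides the outcome.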

This new device prompted me to send the following reaction to the Amazon
Books team:

(1) Well, statistics of what? That is my first question. I suggest you
supply basic statistics about the source of your SIPs:

- how many books/titles have you indexed?
- are these full-text indexes, or just indexes of the 'inside the book'
  pages you supply on the web?
- how many million words?
- how many sentences?

Without these figures, the SIPs are like the manipulative percentages of a
census or opinion poll reported without the total number of people on which
those percentages are based.


(2) Though it might seem stupid to say, I would also like you to state
explicitly that these SIPs are generated <automatically> according to a
certain algorithm, and to explain in more detail what that algorithm entails.

(3) Just as people have tried to jump 'up' Google's search engine rankings,
an unanticipated effect might be that writers, editors and publishers check
a new text before publication for the occurrence of SIPs and make alterations
to get a higher score. This might generate a text that is more "outstanding"
only in a statistical sense.

(4) We still need to value ourselves, us humans, the most, because we are the
only ones that can 'read' (machines can process text all right, but there is
no form of understanding in the sense that each human reader becomes a
re-writer when "processing" a text in her or his personal way). The readers'
reviews on your website do give that kind of understanding and are often very
helpful in learning about a book and its reception. Recently I started to
archive some of the Amazon Books customer reviews in my bibliographical
database. The on-line reader reviews are part of a very old tradition, like
the Renaissance 'commonplace books' and the Greek/Roman 'hypomnemata', filled
with quotations and remarks that students would make to keep for themselves
and show to each other.

The value of readers' comments lies in the rephrasing and synthesizing of the
content of a book, something that can only be appreciated by 'reading'.

The mechanisms of 'rating', choosing the top ten, hundred or whatever, are an
undeniable part of our market-oriented culture. Still, even in a purely
commercial setting like Amazon Books, there can be a prominent place for
personal exchange of opinions between 'real readers', beyond any automated
statistics: an exchange that allows for both praise and critique outside the
realm of professional and commercial reviewing.

(5) SIP ratings may well develop into a useful search instrument, but let it
be well understood that they are just a product of 'machine processing', a
secondary tool at most.




Tjebbe van Tijen

Imaginary Museum Projects
dramatizing historical information
http://imaginarymuseum.org





#  distributed via <nettime>: no commercial use without permission
#  <nettime> is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: [EMAIL PROTECTED] and "info nettime-l" in the msg body
#  archive: http://www.nettime.org contact: [email protected]
