Re: [CODE4LIB] analysis of referrer data

2006-03-30 Thread Roy Tennant

On Mar 30, 2006, at 5:12 AM, Eric Lease Morgan wrote:


I see that a lot of the hits to my site come from MySpace.com where
teenaged and college aged girls have incorporated some of my pictures
into their pages. Another common use is on bulletin board systems
where someone used one of my pictures as their avatar. In these
second and third cases should I expect some sort of remuneration or
at least a link back to infomotions.com?


Eric,
I have a comment on this (although I can't suggest any particular
strategies on processing referer (sic) hits). I have a web site where
I put up a lot of my photos, http://freelargephotos.com/, and I draw
a line between personal use, for which I expect no payment (even for
the largest version I have, which they can get by request), and
commercial use, where the photo is used a) to make money or b) in a
product or to promote a product that is sold.
Therefore, in my universe the uses you mention above I would classify
as personal use, and within the bounds of appropriate usage. YMMV.
Roy


Re: [CODE4LIB] analysis of referrer data

2006-03-30 Thread Edward Summers

On Mar 30, 2006, at 7:12 AM, Eric Lease Morgan wrote:

How would you go about doing this sort of analysis? All I have to
start with is my Apache combined access_log files?


I'm not sure about the 'morality' issue, but it might be interesting
to see whether the links are distributed according to a power law
[1]. This could go both ways: looking at the hosts that are linking
to you, and the target URLs on your site. My guess is that they will
be, given the research that has gone into showing this happens at the
host name level [2] on the web at large.
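
One rough way to start on that, as a sketch only: tally referring hosts and target paths from combined-format log lines, then look at counts by rank (a power law would show up as roughly a straight line on a log-log plot of rank against count). The names COMBINED, SAMPLE_LINES, tally and rank_frequency are made up for illustration, the sample lines are invented, and the regular expression assumes the standard Apache combined layout.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumes the standard Apache "combined" layout:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

# Invented sample lines, for illustration only.
SAMPLE_LINES = [
    '1.2.3.4 - - [30/Mar/2006:05:12:00 -0500] "GET /gallery/a.jpg HTTP/1.1" 200 5120 "http://www.myspace.com/someone" "Mozilla/4.0"',
    '1.2.3.5 - - [30/Mar/2006:05:13:00 -0500] "GET /gallery/a.jpg HTTP/1.1" 200 5120 "http://www.google.com/search?q=lease" "Mozilla/4.0"',
    '1.2.3.6 - - [30/Mar/2006:05:14:00 -0500] "GET /gallery/b.jpg HTTP/1.1" 200 4096 "http://www.myspace.com/other" "Mozilla/4.0"',
    '1.2.3.7 - - [30/Mar/2006:05:15:00 -0500] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/4.0"',
]


def tally(lines):
    """Count referring hosts and the paths they point at."""
    hosts, targets = Counter(), Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed layout
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # no referrer recorded for this hit
        host = urlsplit(referer).hostname
        if host:
            hosts[host] += 1
            targets[match.group("path")] += 1
    return hosts, targets


def rank_frequency(counter):
    """Return (rank, count) pairs, most frequent first."""
    return [(rank, count)
            for rank, (_, count) in enumerate(counter.most_common(), start=1)]
```

Calling tally(open("access_log")) and feeding either counter to rank_frequency gives the pairs to plot; the file name is whatever your logs are called.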

Using Google's API you could look up pages that are linking to your
stuff, and compare that with what you are seeing in your logs. It
might be interesting to extract the Google search query from the log
and plug it back into Google and see what page number your URL comes
up on. This could serve as a metric of how often people go past the
first page of search results in Google. Perhaps there is some other
interesting stuff you can do with the Google API.
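
Pulling the query out of a referrer could look something like the sketch below. It assumes the referrer carries the search terms in a q parameter and, where present, a result offset in a start parameter; the function name google_search and the "any host with a google label" test are my own loose heuristics, not anything Google documents for this purpose.

```python
from urllib.parse import urlsplit, parse_qs


def google_search(referer):
    """Return (query, start_offset) for a Google referrer, else None."""
    parts = urlsplit(referer)
    labels = (parts.hostname or "").split(".")
    if "google" not in labels:
        return None  # loose heuristic: any host with a "google" label
    params = parse_qs(parts.query)
    if "q" not in params:
        return None  # a Google page, but not a search with terms
    start = params.get("start", ["0"])[0]
    offset = int(start) if start.isdigit() else 0
    return params["q"][0], offset
```

With the usual ten results per page (an assumption), an offset of 10 would mean the click came from the second page of results, which gets at the "how often do people go past the first page" question without re-running the search.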

Referrer logs are a really interesting artifact of the operating web.
In exploring 'backlinks', you are in the good company of Bush [3],
Garfield [4], Nelson [5] and that late-comer Page [6]. Referrer logs
won't tell you everyone who is linking to you, but only some of the
links that have been travelled. In some ways this usage data is even
more valuable than the complete index of backlinks that google has,
since it records intention. Perhaps when google acquires enough dark
fiber they'll be able to capture this as well--but for the moment
they don't know what people are clicking on once they leave google.com.

//Ed

[1] http://www.kottke.org/03/02/weblogs-and-power-laws
[2] http://www.nd.edu/~networks/Linked/index.html
[3] http://www.theatlantic.com/doc/194507/bush
[4] http://www.garfield.library.upenn.edu/
[5] http://www.readwriteweb.com/archives/ted_nelsons_two.php
[6] http://en.wikipedia.org/wiki/PageRank


Re: [CODE4LIB] analysis of referrer data

2006-03-30 Thread Colleen Whitney

In my webmastering days we used AWStats to analyze our log files.

http://awstats.sourceforge.net/

It has been a while, but I remember it being very configurable and easy
to use.  It might be worth looking it over to see whether it would yield
what you want for your analysis...might save you some headaches.



Eric Lease Morgan wrote:


How would you go about doing some analysis of your website's referrer
data?

I have committed to writing an article for the anniversary issue of
First Monday (as if I don't already have enough to do). Here is the
accepted/proposed title and abstract:

  Ethical issues surrounding freely available information
  found on the Web

  By reverse engineering Google queries and by tracing back
  the referrer values found in Apache log files, the use of
  content made available from infomotions.com is examined and
  ethical questions are asked. While all the content from the
  site is freely available under the GNU Public License, the
  content is not always used in the intended manner. This
  raises interesting questions regarding the time spent making
  the content available, the expense of the hardware and
  network connections, and whether or not the application of
  the content is put to good and moral purposes. This essay
  addresses these and other ethical questions in an attempt to
  come to an understanding regarding the place of information
  and knowledge in an open environment.

I find it interesting to watch the content of my access_log scroll by
on my console. I am most interested in the referrer information. Most
of my hits originate as searches against Google. It is fun to feed these
queries back into Google and see what people searched for, watch what
the searches return, and see what page number my item is located on. I
see that a lot of the hits to my site come from MySpace.com where
teenaged and college aged girls have incorporated some of my pictures
into their pages. Another common use is on bulletin board systems
where someone used one of my pictures as their avatar. In these
second and third cases should I expect some sort of remuneration or
at least a link back to infomotions.com?

Some hits come from really weird places. For example, the search for
lease brings back many hits about equipment rental, but sometimes
my name and/or the Alex Catalogue of Electronic Texts is linked from
the equipment rental site. Sort of strange if you ask me. They are
using my name, sort of. (Is it 'my' name?)

In any event, I plan to take two months of access_log data, extract
the pages being looked at and the referrer information to more
systematically examine how the content on Infomotions is being
incorporated into other sites. How would you suggest I do this?
Presently I plan to extract the necessary information from my logs
and dump it into a flat database file where I will exploit various
incarnations of SQL SELECT statements. Count this. Group that. Sort
this way. Etc. Mind you, I am most interested in the one-off sort of
hits, not just the overall usage.
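
A minimal sketch of that "dump it into a database and SELECT" plan, using SQLite from Python. The table name hits, its two columns, and the function names are assumptions for illustration; the rows are expected to be (path, referer) pairs already extracted from the logs.

```python
import sqlite3


def load(rows):
    """Load (path, referer) pairs into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE hits (path TEXT, referer TEXT)")
    db.executemany("INSERT INTO hits VALUES (?, ?)", rows)
    return db


def top_referers(db):
    """Count this, group that, sort this way: referrers by hit count."""
    return db.execute(
        "SELECT referer, COUNT(*) AS n FROM hits "
        "GROUP BY referer ORDER BY n DESC, referer"
    ).fetchall()


def one_offs(db):
    """Referrers seen exactly once, with the single page they reached."""
    return db.execute(
        "SELECT referer, MIN(path) FROM hits "
        "GROUP BY referer HAVING COUNT(*) = 1 ORDER BY referer"
    ).fetchall()
```

The HAVING COUNT(*) = 1 query is the piece aimed at the one-off hits; swapping ":memory:" for a file name would keep the table around between sessions.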

How would you go about doing this sort of analysis? All I have to
start with is my Apache combined access_log files?

--
Eric Lease Morgan
University Libraries of Notre Dame


[CODE4LIB] external linking to your images (was Re: [CODE4LIB] analysis of referrer data)

2006-03-30 Thread Jonathan Rochkind

At 5:36 AM -0800 3/30/06, Roy Tennant wrote:

On Mar 30, 2006, at 5:12 AM, Eric Lease Morgan wrote:


I see that a lot of the hits to my site come from MySpace.com where
teenaged and college aged girls have incorporated some of my pictures
into their pages. [...]


[...]I draw a line between personal use for which I expect no payment for use
of the photos (even the largest version I have which they can get by
request), and commercial use for which the photo must be used a) to
make money or b) in a product or to promote a product that is sold.


There are kind of two issues: the intellectual property issue that Roy
talks about, but also the use of computing resources issue.  I'd be
worried more about the latter myself.  Those hypothetical teenage
girls could have copied the image files and put them on their own web
server. Instead, they are linking directly to the image files on
Eric's server.  Either way it would be a use of Eric's intellectual
property (if it is his!); either way it would be allowed 'personal
use' under Roy's policy for use of his IP.  But if so many people are
linking to your files on your server (for their own purposes that
have nothing to do with yours) that it causes bandwidth or CPU
problems for you, that's what I'd be concerned about. They're kind of
using your hardware as their own personal web server.

Many websites will refuse to serve images to a request with an
external referrer for just this reason.
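
The decision itself is simple enough to sketch. This is only an illustration of the policy, not how any particular site implements it (in practice it is usually done in the web server's own configuration); OWN_HOSTS and allow_image are hypothetical names, and letting requests with no referrer through is a choice I am assuming, since many clients and proxies omit the header.

```python
from urllib.parse import urlsplit

# Hypothetical list of hosts that count as "our own pages".
OWN_HOSTS = {"infomotions.com", "www.infomotions.com"}


def allow_image(referer, own_hosts=OWN_HOSTS):
    """Decide whether to serve an image, given the request's referrer."""
    if not referer or referer == "-":
        return True  # assumed policy: do not punish a missing referrer
    host = urlsplit(referer).hostname
    return host in own_hosts  # external or unparseable referrers are refused
```

A request that fails the check could get an error response or a small placeholder image instead of the real file.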

--Jonathan