Re: [CODE4LIB] analysis of referrer data
On Mar 30, 2006, at 5:12 AM, Eric Lease Morgan wrote:

I see that a lot of the hits to my site come from MySpace.com where teenaged and college aged girls have incorporated some of my pictures into their pages. Another common use is on bulletin board systems where someone used one of my pictures as their avatar. In these second and third cases should I expect some sort of remuneration or at least a link back to infomotions.com?

Eric,

I have a comment on this (although I can't suggest any particular strategies on processing referer (sic) hits). I have a web site where I put up a lot of my photos http://freelargephotos.com/, and I draw a line between personal use for which I expect no payment for use of the photos (even the largest version I have which they can get by request), and commercial use for which the photo must be used a) to make money or b) in a product or to promote a product that is sold. Therefore, in my universe the uses you mention above I would classify as personal use, and within the bounds of appropriate usage. YMMV.

Roy
Re: [CODE4LIB] analysis of referrer data
On Mar 30, 2006, at 7:12 AM, Eric Lease Morgan wrote:

How would you go about doing this sort of analysis? All I have to start with is my Apache combined access_log files?

I'm not sure about the 'morality' issue, but it might be interesting to see whether the links are distributed according to a power law [1]. This could go both ways: looking at the hosts that are linking to you, and at the target URLs on your site. My guess is that they will be, given the research that has gone into showing this happens at the host name level [2] on the web at large.

Using Google's API you could look up pages that are linking to your stuff, and compare and contrast them with what you are seeing in your logs. It might be interesting to extract the Google search query from the log, plug it back into Google, and see what page number your URL comes up on. This could serve as a metric of how often people go past the first page of search results in Google. Perhaps there is some other interesting stuff you can do with the Google API.

Referrer logs are a really interesting artifact of the operating web. In exploring 'backlinks', you are in the good company of Bush [3], Garfield [4], Nelson [5] and that late-comer Page [6]. Referrer logs won't tell you everyone who is linking to you, but only some of the links that have been travelled. In some ways this usage data is even more valuable than the complete index of backlinks that Google has, since it records intention. Perhaps when Google acquires enough dark fiber they'll be able to capture this as well--but for the moment they don't know what people are clicking on once they leave google.com.

//Ed

[1] http://www.kottke.org/03/02/weblogs-and-power-laws
[2] http://www.nd.edu/~networks/Linked/index.html
[3] http://www.theatlantic.com/doc/194507/bush
[4] http://www.garfield.library.upenn.edu/
[5] http://www.readwriteweb.com/archives/ted_nelsons_two.php
[6] http://en.wikipedia.org/wiki/PageRank
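
To make the two ideas above concrete (ranking referring hosts to eyeball a power law, and pulling the search terms out of a Google referrer), here is a minimal Python sketch. It assumes the referrer strings have already been extracted from the log, and that Google passes the search terms in a `q` query parameter. The function names are hypothetical and not part of any tool mentioned in this thread.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def referrer_host(referrer):
    """Return the host name of a referrer URL, or None if it has none."""
    return urlsplit(referrer).hostname


def google_query(referrer):
    """Return the search terms from a Google referrer URL, or None.

    Assumption: the terms are carried in the 'q' query parameter.
    """
    parts = urlsplit(referrer)
    host = parts.hostname or ""
    if "google." not in host:
        return None
    terms = parse_qs(parts.query).get("q")
    return terms[0] if terms else None


def rank_frequency(referrers):
    """Return (rank, host, hits) tuples, most frequent host first."""
    counts = Counter(h for h in map(referrer_host, referrers) if h)
    return [
        (rank, host, hits)
        for rank, (host, hits) in enumerate(counts.most_common(), start=1)
    ]
```

If the hosts do follow something like a power law, plotting log(rank) against log(hits) from `rank_frequency` should give a roughly straight line; a handful of hosts would account for most of the hits and a long tail of hosts would appear once or twice.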
Re: [CODE4LIB] analysis of referrer data
In my webmastering days we used AWStats to analyze our log files.

http://awstats.sourceforge.net/

It has been a while, but I remember it being very configurable and easy to use. It might be worth looking it over to see whether it would yield what you want for your analysis...might save you some headaches.

Eric Lease Morgan wrote:

How would you go about doing some analysis of your website's referrer data?

I have committed to writing an article for the anniversary issue of First Monday (as if I don't already have enough to do). Here is the accepted/proposed title and abstract:

Ethical issues surrounding freely available information found on the Web

By reverse engineering Google queries and by tracing back the referrer values found in Apache log files, the use of content made available from infomotions.com is examined and ethical questions are asked. While all the content from the site is freely available under the GNU Public License, the content is not always used in the intended manner. This raises interesting questions regarding the time spent making the content available, the expense of the hardware and network connections, and whether or not the application of the content is put to good and moral purposes. This essay addresses these and other ethical questions in an attempt to come to an understanding regarding the place of information and knowledge in an open environment.

I find it interesting to watch the content of my access_log scroll by on my console. I am most interested in the referrer information. Most of my hits originate as searches against Google. It is fun to feed these queries back into Google and see what people searched for, watch what the searches return, and see what page number my item is located on. I see that a lot of the hits to my site come from MySpace.com where teenaged and college aged girls have incorporated some of my pictures into their pages.
Another common use is on bulletin board systems where someone used one of my pictures as their avatar. In these second and third cases should I expect some sort of remuneration or at least a link back to infomotions.com?

Some hits come from really weird places. For example, the search for lease brings back many hits about equipment rental, but sometimes my name and/or the Alex Catalogue of Electronic Texts is linked from the equipment rental site. Sort of strange if you ask me. They are using my name, sort of. (Is it 'my' name?)

In any event, I plan to take two months of access_log data, extract the pages being looked at and the referrer information to more systematically examine how the content on Infomotions is being incorporated into other sites. How would you suggest I do this? Presently I plan to extract the necessary information from my logs and dump it into a flat database file where I will exploit various incarnations of SQL SELECT statements. Count this. Group that. Sort this way. Etc. Mind you, I am most interested in the one-off sort of hits, not just the overall usage.

How would you go about doing this sort of analysis? All I have to start with is my Apache combined access_log files?

--
Eric Lease Morgan
University Libraries of Notre Dame
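
The plan described above (pull the requested page and the referrer out of each combined-format line, load them into a flat database, then count, group, and sort) could be sketched roughly as follows in Python with the standard `sqlite3` module. This is only an illustration under stated assumptions: the regular expression assumes the stock Apache combined format with no escaped quotes inside fields, and the table name `hits` and all function names are hypothetical.

```python
import re
import sqlite3

# Assumed layout: host ident user [time] "request" status size "referrer" "agent"
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return (path, referrer) for one log line, or None if it does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    request = m.group("request").split()  # e.g. ['GET', '/a.jpg', 'HTTP/1.1']
    path = request[1] if len(request) >= 2 else ""
    return path, m.group("referrer")


def load(lines, db):
    """Insert every parsable line into a hypothetical 'hits' table."""
    db.execute("CREATE TABLE IF NOT EXISTS hits (path TEXT, referrer TEXT)")
    rows = [row for row in map(parse_line, lines) if row is not None]
    db.executemany("INSERT INTO hits VALUES (?, ?)", rows)
    db.commit()


def referrer_counts(db):
    """Count this, group that: hits per referrer, busiest first."""
    return db.execute(
        "SELECT referrer, COUNT(*) AS n FROM hits WHERE referrer != '-' "
        "GROUP BY referrer ORDER BY n DESC, referrer"
    ).fetchall()


def one_off_referrers(db):
    """The one-off hits: referrers that appear exactly once, with their page."""
    return db.execute(
        "SELECT referrer, MIN(path) FROM hits WHERE referrer != '-' "
        "GROUP BY referrer HAVING COUNT(*) = 1 ORDER BY referrer"
    ).fetchall()
```

To try it, open a database with `sqlite3.connect(":memory:")` (or a file name to keep the data), pass an open access_log file to `load`, and then call the two query functions. A referrer of `-` means the browser sent none, so both queries skip those rows.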
[CODE4LIB] external linking to your images (was Re: [CODE4LIB] analysis of referrer data)
At 5:36 AM -0800 3/30/06, Roy Tennant wrote:

On Mar 30, 2006, at 5:12 AM, Eric Lease Morgan wrote:

I see that a lot of the hits to my site come from MySpace.com where teenaged and college aged girls have incorporated some of my pictures into their pages. [...]

[...] I draw a line between personal use for which I expect no payment for use of the photos (even the largest version I have which they can get by request), and commercial use for which the photo must be used a) to make money or b) in a product or to promote a product that is sold.

There are kind of two issues: the intellectual property issue that Roy talks about, and also the use-of-computing-resources issue. I'd be worried more about the latter myself.

Those hypothetical teenage girls could have copied the image files and put them on their own web server. Instead, they are linking directly to the image files on Eric's server. Either way it would be a use of Eric's intellectual property (if it is his!); either way it would be allowed 'personal use' under Roy's policy for use of his IP.

But if so many people are linking to your files on your server (for their own purposes that have nothing to do with yours) that it causes bandwidth or CPU problems for you, that's what I'd be concerned about. They're kind of using your hardware as their own personal web server. Many websites will refuse to serve images to a request with an external referrer for just this reason.

--Jonathan
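
As a rough illustration of the "refuse external referrers" check mentioned above, here is a small Python sketch of the decision itself. It is not how any particular web server implements it: the function name and the set of "own" host names are assumptions, and treating a missing referrer as acceptable is a policy choice, since many browsers and proxies send none.

```python
from urllib.parse import urlsplit


def is_external_referrer(referrer, own_hosts):
    """Return True when the referrer names a host that is not one of ours.

    Assumption: a missing or unparsable referrer is treated as not
    external, because its origin cannot be determined.
    """
    if not referrer or referrer == "-":
        return False
    host = urlsplit(referrer).hostname
    if host is None:
        return False
    return host not in own_hosts
```

A server-side hook would call this for image requests, for example with `own_hosts = {"infomotions.com", "www.infomotions.com"}`, and refuse or substitute the image when it returns True. The same function can also be run over old log data to estimate how much image traffic comes from pages on other sites.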