Re: Apache logs and data

2007-11-20 Thread Karl Wettin
On 20 Nov 2007, at 20:28, Doug Cutting wrote: karl wettin wrote: On Nov 15, 2007 10:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: it is always good to have query logs http://thepiratebay.org/tor/3783572 It doesn't look as though there's click data, so we can't use this for relevance exp

Re: Apache logs and data

2007-11-20 Thread Doug Cutting
karl wettin wrote: On Nov 15, 2007 10:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: it is always good to have query logs I realize that it is not that politically correct, but the TPB collection is released to the public domain and contains 3.2 million user queries with session id, timesta

Re: Apache logs and data

2007-11-20 Thread Chris Hostetter
: I think the safest path is simply to not publish any queries, but rather to, : e.g., permit committers to run experiments using them and publish the results : of the experiments. But no queries would be made available to the general : public on a website. that would eliminate the goal of havin

Re: Apache logs and data

2007-11-20 Thread Grant Ingersoll
This may be worth asking legal-discuss about. I am not sure if there is an issue or not. -Grant On Nov 20, 2007, at 4:54 AM, karl wettin wrote: On Nov 15, 2007 10:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: it is always good to have query logs I realize that it is not that politic

Re: Apache logs and data

2007-11-20 Thread karl wettin
On Nov 15, 2007 10:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > it is always good to have query logs I realize that it is not that politically correct, but the TPB collection is released to the public domain and contains 3.2 million user queries with session id, timestamp, category etc to g

Re: Apache logs and data

2007-11-19 Thread Doug Cutting
Chris Hostetter wrote: right ... i'm not suggesting we do this in an automatic un-human-involved way; i'm suggesting that a "trusted" person generate this report, ignore anything with a count less than some number (both to remove noise, and eliminate most of the random "identifiable" queries),

Re: Apache logs and data

2007-11-19 Thread Grant Ingersoll
On Nov 19, 2007, at 3:41 PM, Chris Hostetter wrote: : info, etc. could be stripped fairly easily. So, we wouldn't necessarily know : who is searching for "Yonik Seeley" when we see that query term, just that it : was searched for. Maybe we can inquire to infrastructure what is even

Re: Apache logs and data

2007-11-19 Thread Chris Hostetter
: info, etc. could be stripped fairly easily. So, we wouldn't necessarily know : who is searching for "Yonik Seeley" when we see that query term, just that it : was searched for. Maybe we can inquire to infrastructure what is even It's a largely theoretical argument (particularly relating to

Re: Apache logs and data

2007-11-19 Thread Grant Ingersoll
I'm not sure where the personal info is leaked, we aren't proposing to make who made the query available, just what the query is and I suspect the IP info, etc. could be stripped fairly easily. So, we wouldn't necessarily know who is searching for "Yonik Seeley" when we see that query ter

Re: Apache logs and data

2007-11-19 Thread Chris Hostetter
: > report of (querystring,accesscount)->url mappings based on requests that : > had a major search engine as the referer URL, that should be fine right? : : Query strings can leak personal info too (think of someone googling : themselves or their SSN) right ... i'm not suggesting we do this in an
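A rough sketch of the aggregated report being discussed, assuming Apache's "combined" log format and assuming each search engine carries the user's query in a `q` parameter of the referer URL. The host list and the `min_count` cutoff are hypothetical placeholders; the thread leaves the actual threshold to a trusted human reviewer. Only (query, URL) pairs and their counts are kept, never client IPs or timestamps.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)"'
)

# Hypothetical list of search-engine hosts assumed to use a "q" parameter.
SEARCH_HOSTS = ("www.google.com", "search.yahoo.com", "www.bing.com")


def aggregate(lines, min_count=5):
    """Count (query, url) pairs from search-engine referrals, dropping rare ones."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # not a parseable combined-format line
        ref = urlsplit(m.group("referer"))
        if ref.hostname not in SEARCH_HOSTS:
            continue  # direct hits ("-") and non-search referers are ignored
        queries = parse_qs(ref.query).get("q")
        if not queries:
            continue
        query = queries[0].strip().lower()
        counts[(query, m.group("path"))] += 1
    # Dropping low-count pairs removes noise and most one-off identifying queries.
    return {pair: n for pair, n in counts.items() if n >= min_count}
```

With a threshold of 5, a query seen once (for example someone searching for their own name) never appears in the published report, while frequently repeated queries do.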

Re: Apache logs and data

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 1:29 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > : Note that logs are generally considered private data. So we could not make > : these available to the general public, but only to folks who've somehow sworn > : to keep them private. > > but in theory, it would be okay to m

Re: Apache logs and data

2007-11-19 Thread Chris Hostetter
: Note that logs are generally considered private data. So we could not make : these available to the general public, but only to folks who've somehow sworn : to keep them private. but in theory, it would be okay to make aggregated info from the logs available right? ie: we don't want to make

Re: Apache logs and data

2007-11-16 Thread Doug Cutting
Note that logs are generally considered private data. So we could not make these available to the general public, but only to folks who've somehow sworn to keep them private. Doug

Re: Apache logs and data

2007-11-15 Thread Grant Ingersoll
Not so sure about relevance, but it is always good to have query logs and we have the data, so we could start building up relevance judgments over time based on the data. Might be good for demos and other stuff too. -Grant On Nov 15, 2007, at 3:20 PM, Mike Klaas wrote: On 15-Nov-07, at

Re: Apache logs and data

2007-11-15 Thread Mike Klaas
On 15-Nov-07, at 5:33 AM, Grant Ingersoll wrote: Would people be interested in asking infrastructure to see if we can get our hands on things like JIRA search logs and any other search/query logs available? I'm thinking if we had this, plus the underlying data, we could start to use this i

Apache logs and data

2007-11-15 Thread Grant Ingersoll
Would people be interested in asking infrastructure to see if we can get our hands on things like JIRA search logs and any other search/query logs available? I'm thinking if we had this, plus the underlying data, we could start to use this in a number of places like benchmark, for testing