Alex McLintock <alex.mclintock <at> gmail.com> writes:
>
> I'm quite interested in OpenCalais - a Reuters/Thomson initiative. It
> is a web service to take your free text and identify important terms
> in it like people, businesses, places, and so on. If you are the
> document owner you can submit your document to their web site and get
> back important tags saying what this document is about. I'd like to
> tag this sort of data and feed it into a Lucene style index so that it
> can be used in searches AND in focussed/topical crawls.
>
> Now, here comes the problem. When we crawl the web we don't own the
> documents we are crawling so we don't really have permission to use
> Reuters' servers to do this analysis. (Maybe we could cut a deal
> though if we were a big enough company).
>
> So has anyone else looked at alternatives to OpenCalais which takes
> free text and tries to understand what it is about? I've been looking
> for software to do this but nothing seems suitable.
>
> Alex
>

Alex:

Ah, we enter the sticky area of ownership, IP, fair use rights and all
of that. The TOS themselves are the rule - but a few comments.

The OpenCalais TOS don't in themselves insist that you "own" the content
you submit. You have to make a decision for content that you don't "own"
on whether your usage of that content with Calais is covered by fair
use. We can't make that decision for you - but a review of the
OpenCalais gallery will show you many organizations that have made the
decision that web-derived content can be utilized by OpenCalais.

Our hard and fast limitations on use are in the TOS and are pretty
straightforward. They include no hate speech, no porn, no deep packet
inspection uses and a few others. Basically, do no evil.

The issue itself will persist regardless of the tool you choose. Open
source, commercial - whatever. In the end you'll still need to own the
decision on whether you're allowed to "use" other content in your
service.

Regards,
Tom

