Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-17 Thread Dan Nessett
On Fri, 10 Sep 2010 23:11:27 +, Dan Nessett wrote:

 We are currently attempting to refactor some specific modifications to
 the standard MW code we use (1.13.2) into an extension so we can upgrade
 to a more recent maintained version. One modification we have keeps a
 flag in the revisions table specifying that article text was imported
 from WP. This flag generates an attribution statement at the bottom of
 the article that acknowledges the import.
 
 I don't want to start a discussion about the various legal issues
 surrounding text licensing. However, assuming we must acknowledge use of
 licensed text, a legitimate technical issue is how to associate state
 with an article in a way that records the import of licensed text. I
 bring this up here because I assume we are not the only site that faces
 this issue.
 
 Some of our users want to encode the attribution information in a
 template. The problem with this approach is anyone can come along and
 remove it. That would mean the organization legally responsible for the
 site would entrust the integrity of site content to any arbitrary
 author. We may go this route, but for the sake of this discussion I
 assume such a strategy is not viable. So, the remainder of this post
 assumes we need to keep such licensing state in the db.
 
 After asking around, one suggestion was to keep the licensing state in
 the page_props table. This seems very reasonable and I would be
 interested in comments by this community on the idea. Of course, there
 has to be a way to get this state set, but it seems likely that could be
 achieved using an extension triggered when an article is edited.
 
 Since this post is already getting long, let me close by asking whether
 support for associating licensing information with articles might be
 useful to a large number of sites. If so, then perhaps it belongs in the
 core.

One thing I haven't seen so far (probably because it doesn't belong on 
Wikitech) is a discussion of the policy requirements. In open source 
software development, you have to carry forward licenses even if you 
substantially change the code content. The only way around this is a 
clean room implementation (e.g., how BSD Unix got around AT&T's 
original licensing for Unix).

Is this also true for textual content? If so, then once you import such 
content into an article you are obliged to carry forward any licensing 
conditions on that import for all subsequent revisions.

Where is the proper place to discuss these kinds of questions?

-- 
-- Dan Nessett


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-15 Thread Platonides
Dan Nessett wrote:
 The discussion about whether to support license data in the database has 
 settled down. There seems to be some support. So, I think the next step 
 is to determine the best technical approach. Below I provide a strawman 
 proposal. Note that this is only to foster discussion on technical 
 requirements and approaches. I have nothing invested in the strawman.
 
 Implementation location: In an extension
 
 Permissions: include two new permissions - 1) addlicensedata, and 2) 
 modifylicensedata. These are pretty self-explanatory. Sites that wish to 
 give all users the ability to provide and modify licensing data would 
 assign these permissions to everyone. Sites that wish to allow all users 
 to add licensing data, but restrict those who are allowed to modify it, 
 would give the first permission to everyone and the second to a limited 
 group.
 
 Database schema: Add a licensing table to the db with the following 
 columns - 1) revision_or_image, 2) revision_id, 3) image_id, 4) 
 content_source, 5) license_id, 6) user_id.
 
 The first three columns identify the revision or image to which the 
 licensing data is associated.

That's ugly. I would prefer having one licensing table for revisions and
another for images (btw, there's no such thing as image_id; images are
identified by name, or by the id of their description page, plus a
timestamp if you also want to address old versions).
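
A minimal sketch of the two-table layout suggested above, using Python's sqlite3 module purely for illustration. The table names (revision_licensing, image_licensing) and the rvl_/iml_ column names are assumptions made for this example; they are not an existing MediaWiki schema.

```python
import sqlite3

# Hypothetical schema: one licensing table per kind of object.
SCHEMA = """
CREATE TABLE revision_licensing (
    rvl_id         INTEGER PRIMARY KEY,
    rvl_rev_id     INTEGER NOT NULL,  -- revision the licensing data applies to
    rvl_source     TEXT NOT NULL,     -- URL or other reference to the origin
    rvl_license_id INTEGER NOT NULL
);
CREATE TABLE image_licensing (
    iml_id         INTEGER PRIMARY KEY,
    iml_img_name   TEXT NOT NULL,     -- images are identified by name ...
    iml_timestamp  TEXT NOT NULL,     -- ... plus a timestamp for old versions
    iml_source     TEXT NOT NULL,
    iml_license_id INTEGER NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Example rows (made-up ids and sources).
conn.execute(
    "INSERT INTO revision_licensing (rvl_rev_id, rvl_source, rvl_license_id) "
    "VALUES (?, ?, ?)",
    (1234, "https://en.wikipedia.org/wiki/Example", 1),
)
conn.execute(
    "INSERT INTO image_licensing "
    "(iml_img_name, iml_timestamp, iml_source, iml_license_id) "
    "VALUES (?, ?, ?, ?)",
    ("Example.jpg", "20100915120000", "https://example.org/photo", 1),
)
```

Splitting the tables removes the need for a revision_or_image discriminator column, at the cost of two queries when a page shows both text and image licensing.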

 The content_source column is a 
 string that is a URL or other reference that specifies the source of the 
 content under license. The license_id identifies the specific license for 
 the content. The user_id identifies the user that added the licensing 
 information. The user_id may be useful if a site wishes to allow someone 
 who added the licensing information to delete or modify it. However, 
 there are complications with this. Since IP addresses are easily spoofed, 
 it would mean this entry should only be valid for logged in users.

The user id could be stored at the logging table. You may want to add a
licensing_id to identify rows on this table.


 Add a license table with the following columns - 1) license_id, 2) 
 license_text, 3) license name and 4) license_version. The license_id in 
 the licensing table references rows in this table.

You could begin by hardcoding the available licenses in the extension,
and then add support for the license table.
There are a number of issues there: When can you remove a license? (maybe
never once it is used), which licenses are shown as available?
Do you have licenses which will change (e.g. when you may want to
change the default license from "CC-BY-SA 3.0 or later" to "CC-BY-SA 4.0
or later")?
Note that the license_version could also be part of the license_name. To
make it useful you probably need a boolean to mark that it is an "or
later" licensing.
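
To make the "or later" boolean concrete, here is a small sketch that starts from a hardcoded license list, as suggested above. The License class, the LICENSES dictionary and permits_version() are hypothetical names for this example, and the comparison only models the flag; it says nothing about what any real license actually permits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str       # license family, e.g. "CC-BY-SA"
    version: str    # dotted version string, e.g. "3.0"
    or_later: bool  # True when the grant reads "version X or later"


# Hardcoded in the extension as a first step, before a license table exists.
LICENSES = {
    1: License("CC-BY-SA", "3.0", or_later=True),
    2: License("CC-BY-SA", "3.0", or_later=False),
}


def permits_version(lic, version):
    """Return True if `version` of the same family is covered by `lic`."""
    have = tuple(int(part) for part in lic.version.split("."))
    want = tuple(int(part) for part in version.split("."))
    return want == have or (lic.or_later and want > have)
```

With this data, permits_version(LICENSES[1], "4.0") is true and permits_version(LICENSES[2], "4.0") is false, which is the distinction a plain license_version string cannot express.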


 One complication is when a page or image is reverted, the licensing table 
 must be modified to reflect the current state.

If you are associating licenses with revisions (instead of pages), you
don't need to change the state in the licensing table on further edits
(just copy the license of the previous revision).
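
A rough sketch of the copy-forward idea, assuming a simplified licensing table keyed by revision id. The table layout and the copy_licensing() helper are assumptions for illustration; in a real extension this would run from whatever hook fires when a new revision is saved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE licensing ("
    "rev_id INTEGER NOT NULL, content_source TEXT NOT NULL, "
    "license_id INTEGER NOT NULL)"
)


def copy_licensing(conn, parent_rev_id, new_rev_id):
    """Give the new revision the same licensing rows as its parent."""
    conn.execute(
        "INSERT INTO licensing (rev_id, content_source, license_id) "
        "SELECT ?, content_source, license_id FROM licensing WHERE rev_id = ?",
        (new_rev_id, parent_rev_id),
    )


# Revision 100 imported text; revision 101 is an ordinary edit on top of it.
conn.execute(
    "INSERT INTO licensing VALUES (100, 'https://en.wikipedia.org/wiki/Example', 1)"
)
copy_licensing(conn, 100, 101)
```

Because every revision carries its own rows, a revert just creates another revision whose rows are copied from the revision it restores, and nothing already in the table has to be rewritten.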


 Data manipulation: The extension would use suitable hooks to insert, 
 modify and render licensing data. Insertion and modification would 
 probably use a relevant Edit Page or Article Management hook. Rendering 
 would probably use a Page Rendering Hook.
 
 Page rendering: You probably don't want to dump licensing data directly 
 onto a page. Instead, it is preferable to output a short licensing 
 statement like:
 
 Content on this page uses licensed content. For details, see licensing 
 data.
 
 The phrase licensing data would be a link to a special page that 
 accesses the licensing table and displays the license data associated 
 with the page.

That's fine. You could even use  Content on this page uses licensed
content from  under [[Special:Licenses/YYY|YYY license]]

Do you want to support multilicensing? You could have revisions with
data coming from several sources.
That means you must allow duplicated revision_id in the licensing table.



Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-15 Thread Nikola Smolenski
On Tuesday 14 September 2010 21:01:40 Dan Nessett wrote:
 Database schema: Add a licensing table to the db with the following
 columns - 1) revision_or_image, 2) revision_id, 3) image_id, 4)
 content_source, 5) license_id, 6) user_id.

 The first three columns identify the revision or image to which the
 licensing data is associated. I am not particularly adept with SQL, so
 there may be a better way to do this. The content_source column is a
 string that is a URL or other reference that specifies the source of the
 content under license. The license_id identifies the specific license for
 the content. The user_id identifies the user that added the licensing
 information. The user_id may be useful if a site wishes to allow someone
 who added the licensing information to delete or modify it. However,
 there are complications with this. Since IP addresses are easily spoofed,
 it would mean this entry should only be valid for logged in users.

 Add a license table with the following columns - 1) license_id, 2)
 license_text, 3) license name and 4) license_version. The license_id in
 the licensing table references rows in this table.

How about a more generalised, more wiki solution?

Instead of licensing table, use revisionlinks table that would track what 
revision of a page was linking to what revision of another page.

rl_from: revision that is linking
rl_from_page: page in which the revision was included
rl_to: revision that is being linked to
rl_to_page: page that is being linked to
rl_type: template, category, article...

You could then use this to find what revision of a template was linked by what 
revision of a page. If used for a license template, this would effectively 
track licenses.

If this would be too database intensive, it could be used only for some pages 
(for example, only those with a specific magic word).
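
As a sketch only, the revisionlinks idea could look like this in SQL (run here through Python's sqlite3). The column names come from the list above; the column types, the sample ids and the LICENSE_TEMPLATE_PAGES set are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revisionlinks (
    rl_from      INTEGER NOT NULL,  -- revision that is linking
    rl_from_page INTEGER NOT NULL,  -- page in which the revision was included
    rl_to        INTEGER NOT NULL,  -- revision that is being linked to
    rl_to_page   INTEGER NOT NULL,  -- page that is being linked to
    rl_type      TEXT NOT NULL      -- 'template', 'category', 'article', ...
);
""")

# Hypothetical page ids of templates that the site treats as license tags.
LICENSE_TEMPLATE_PAGES = {500}

conn.executemany(
    "INSERT INTO revisionlinks VALUES (?, ?, ?, ?, ?)",
    [
        (1001, 10, 9001, 500, "template"),  # rev 1001 uses license template rev 9001
        (1001, 10, 9100, 600, "article"),   # ordinary link, not a license
        (1002, 10, 9002, 500, "template"),  # rev 1002 uses a newer template rev
    ],
)


def license_template_revisions(conn, page_id):
    """List (page revision, license template revision) pairs for one page."""
    marks = ",".join("?" for _ in LICENSE_TEMPLATE_PAGES)
    query = (
        "SELECT rl_from, rl_to FROM revisionlinks "
        "WHERE rl_from_page = ? AND rl_type = 'template' "
        f"AND rl_to_page IN ({marks}) ORDER BY rl_from"
    )
    return conn.execute(query, (page_id, *sorted(LICENSE_TEMPLATE_PAGES))).fetchall()
```

The table grows with every link of every tracked revision, which is why limiting it to pages that opt in may matter.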

Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-15 Thread Neil Kandalgaonkar
On 9/15/10 10:11 AM, Nikola Smolenski wrote:
 How about a more generalised, more wiki solution?

 Instead of licensing table, use revisionlinks table that would track what
 revision of a page was linking to what revision of another page.

I can't immediately see why this would be a bad idea, although it seems 
like a pretty radical idea to solve just this licensing problem. 
Maintaining a table of every link in every version of every page seems 
pretty expensive to me, even if it's limited to just some kinds of 
pages. It does open the door to a new way of thinking about wikis though.

That said, such a system is not quite the same as a table in a 
relational database.

- We would need to build separate search, metadata, and indexing systems 
if we wanted to do anything useful with link information.

- It is harder to enforce constraints.

That said, I've been thinking about metadata systems for wikis and this 
is an interesting idea.

-- 
Neil Kandalgaonkar  |) ne...@wikimedia.org


Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-15 Thread Neil Kandalgaonkar
I think this is a great start and I am willing to start drawing up plans 
about this on-wiki somewhere. I am rather slammed with pressing 
deadlines right now but I just wanted to contribute a little to the 
discussion.


 On Fri, 10 Sep 2010 23:11:27 +, Dan Nessett wrote:


 Implementation location: In an extension

 Permissions: include two new permissions - 1) addlicensedata, and 2)
 modifylicensedata.

Sounds good to me.


 Database schema: Add a licensing table to the db with the following
 columns - 1) revision_or_image, 2) revision_id, 3) image_id, 4)
 content_source, 5) license_id, 6) user_id.

 The first three columns identify the revision or image to which the
 licensing data is associated.

revision_or_image is a wart, as others pointed out. Each image has its 
own dedicated wiki page so it is probably more useful to use that.

Otherwise your schema is already similar to work I've been doing with 
UploadWizard, building on typical licensing workflows and templates.

There are Deeds which are composed of:

- Source -- some information that tells us where this came from. 
Currently we use a variety of wiki templates here. It could be a URL, a 
bibliographic record, anything.

- Author, which again is rather free-form. It can be a particular user 
on the wiki. But it is also common to use a Creator template for a famous 
artist, or to simply write in the name in plaintext.

- License, which is just a template license, but in this new world 
should be something like license_id.

If we want to get more structured, it would be nice to also record the 
Uploader, since that is not the same thing as the Author or Source. 
Things may get complex when image replacement happens as you noted.

Right now our major use case is when the uploader is the author. But it 
will not always be so. In the case where the uploader asserts that 
someone else has okayed their work to be distributed under a free 
license, we want the author in question, or OTRS, to be able to check 
off that this was verified. OTRS has a workflow like this already, and 
in the Multimedia project we had plans to simplify this but I'm afraid 
it's unlikely I will get to that soon.


 Add a license table with the following columns - 1) license_id, 2)
 license_text, 3) license name and 4) license_version. The license_id in
 the licensing table references rows in this table.

Sounds good to me although I would also add boolean columns that are 
useful to describe the salient features of the license in machine 
readable terms, like attribution_required, share_alike, etc. That 
will help with searching and with a uniform licensing display (see 
below). Another column which gives us the wiki-hosted image of a small 
icon for the license may also be helpful.

Licensing gets very complicated when it comes to country-by-country 
laws, so it may be useful to record the legal regime under which the 
deed falls, which could be something like a country code.

 Page rendering: You probably don't want to dump licensing data directly
 onto a page. Instead, it is preferable to output a short licensing
 statement like:

No, you almost certainly do want to describe the terms of the license 
(broadly) right on the main page for the content.

The description should be functional, from a potential re-user's point 
of view, in very plain language. As in:

WANT TO USE THIS IMAGE?

You are free to use this image for any purpose, even in works that 
you sell.
If you use this image, you must credit the author, Joe Blow 
joeb...@sample.com http://joeblow.sample.com/ .
If you use this image, you must allow others to share the image in 
the same way.

Read more about the licensing terms here: link to our template for 
cc-by-sa 3.0

That's why machine-readable license properties will help. Even for 
images which don't allow re-use at all, it should say so quite clearly. 
(We host Wikimedia trademarked images, for instance.)
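
A small sketch of how such boolean properties could drive the plain-language display. The function name reuse_statement() and the reuse_allowed flag are assumptions for this example; attribution_required and share_alike are the property names suggested above, and the wording is taken from the sample text.

```python
def reuse_statement(author, attribution_required, share_alike, reuse_allowed=True):
    """Build the lines of a plain-language licensing notice for one image."""
    if not reuse_allowed:
        # Even non-reusable content should say so clearly.
        return ["This image may not be re-used."]
    lines = [
        "You are free to use this image for any purpose, "
        "even in works that you sell."
    ]
    if attribution_required:
        lines.append(f"If you use this image, you must credit the author, {author}.")
    if share_alike:
        lines.append(
            "If you use this image, you must allow others to share "
            "the image in the same way."
        )
    return lines


notice = reuse_statement("Joe Blow", attribution_required=True, share_alike=True)
```

Because the notice is built from flags rather than from a template name, the same display code can serve every license in the table.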


PS: I apologize if the threading is screwed up here -- WMF mail was down 
for a few hours so I missed this message.

-- 
Neil Kandalgaonkar  |) ne...@wikimedia.org


Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-15 Thread Platonides
Neil Kandalgaonkar wrote:
 I think this is a great start and I am willing to start drawing up plans 
 about this on-wiki somewhere. I am rather slammed with pressing 
 deadlines right now but I just wanted to contribute a little to the 
 discussion.

(...)

 Database schema: Add a licensing table to the db with the following
 columns - 1) revision_or_image, 2) revision_id, 3) image_id, 4)
 content_source, 5) license_id, 6) user_id.

 The first three columns identify the revision or image to which the
 licensing data is associated.
 
 revision_or_image is a wart, as others pointed out. Each image has its 
 own dedicated wiki page so it is probably more useful to use that.

But then you encounter a GFDL / CC-BY-SA description of a CC-BY-any
reupload of a PD image.


 Otherwise your schema is already similar to work I've been doing with 
 UploadWizard, building on typical licensing workflows and templates.
 
 There are Deeds which are composed of:
 
 - Source -- some information that tells us where this came from. 
 Currently we use a variety of wiki templates here. It could be a URL, a 
 bibliographic record, anything.
 
 - Author, which again is rather free-form. It can be a particular user 
 on the wiki. But it is also common to use a Creator template for a famous 
 artist, or to simply write in the name in plaintext.
 
 - License, which is just a template license, but in this new world 
 should be something like license_id.
 
 If we want to get more structured, it would be nice to also record the 
 Uploader, since that is not the same thing as the Author or Source. 
 Things may get complex when image replacement happens as you noted.

That would be already stored in the upload log.


 Right now our major use case is when the uploader is the author. But it 
 will not always be so. In the case where the uploader asserts that 
 someone else has okayed their work to be distributed under a free 
 license, we want the author in question, or OTRS, to be able to check 
 off that this was verified. OTRS has a workflow like this already, and 
 in the Multimedia project we had plans to simplify this but I'm afraid 
 it's unlikely I will get to that soon.

Note that another common case is that the author was the uploader on a
different wiki.


 PS: I apologize if the threading is screwed up here -- WMF mail was down 
 for a few hours so I missed this message.

Seems well-threaded here :)



Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-15 Thread Platonides
Nikola Smolenski wrote:
 How about a more generalised, more wiki solution?
 
 Instead of licensing table, use revisionlinks table that would track what 
 revision of a page was linking to what revision of another page.
 
 rl_from: revision that is linking
 rl_from_page: page in which the revision was included
 rl_to: revision that is being linked to
 rl_to_page: page that is being linked to
 rl_type: template, category, article...
 
 You could then use this to find what revision of a template was linked by 
 what 
 revision of a page. If used for a license template, this would effectively 
 track licenses.
 
 If this would be too database intensive, it could be used only for some pages 
 (for example, only those with a specific magic word).

This is complete overkill. And you can't even be sure that it is
consistent. Page A includes {{GFDL}}, to which [[Image:Goatse]] is added
(and reverted 5 seconds later). That would mean Page B (and thousands
more) should have an entry in your table for Goatse. In fact, that won't
appear.
I see the benefit of marking some pages as "record anything that ever
links here", but then you start getting requests for listing revisions
older than the date on which the revision was tagged as such.

I'm not even sure why we would want to keep the licensing per revision
in such case*. If it's added via templates/links, then the history can
be seen in the page and also the licensing.
You would only want to track which templates are licenses (that's what
some toolserver projects already do). That would certainly be a much
easier goal than what Dan proposed.

Even with licensing information in a separate table, we could keep
them per page, and add dummy revisions where needed.



Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-14 Thread Dan Nessett
On Fri, 10 Sep 2010 23:11:27 +, Dan Nessett wrote:

 We are currently attempting to refactor some specific modifications to
 the standard MW code we use (1.13.2) into an extension so we can upgrade
 to a more recent maintained version. One modification we have keeps a
 flag in the revisions table specifying that article text was imported
 from WP. This flag generates an attribution statement at the bottom of
 the article that acknowledges the import.
 
 I don't want to start a discussion about the various legal issues
 surrounding text licensing. However, assuming we must acknowledge use of
 licensed text, a legitimate technical issue is how to associate state
 with an article in a way that records the import of licensed text. I
 bring this up here because I assume we are not the only site that faces
 this issue.
 
 Some of our users want to encode the attribution information in a
 template. The problem with this approach is anyone can come along and
 remove it. That would mean the organization legally responsible for the
 site would entrust the integrity of site content to any arbitrary
 author. We may go this route, but for the sake of this discussion I
 assume such a strategy is not viable. So, the remainder of this post
 assumes we need to keep such licensing state in the db.
 
 After asking around, one suggestion was to keep the licensing state in
 the page_props table. This seems very reasonable and I would be
 interested in comments by this community on the idea. Of course, there
 has to be a way to get this state set, but it seems likely that could be
 achieved using an extension triggered when an article is edited.
 
 Since this post is already getting long, let me close by asking whether
 support for associating licensing information with articles might be
 useful to a large number of sites. If so, then perhaps it belongs in the
 core.

The discussion about whether to support license data in the database has 
settled down. There seems to be some support. So, I think the next step 
is to determine the best technical approach. Below I provide a strawman 
proposal. Note that this is only to foster discussion on technical 
requirements and approaches. I have nothing invested in the strawman.

Implementation location: In an extension

Permissions: include two new permissions - 1) addlicensedata, and 2) 
modifylicensedata. These are pretty self-explanatory. Sites that wish to 
give all users the ability to provide and modify licensing data would 
assign these permissions to everyone. Sites that wish to allow all users 
to add licensing data, but restrict those who are allowed to modify it, 
would give the first permission to everyone and the second to a limited 
group.
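
To illustrate the second configuration (everyone may add, a limited group may modify), here is a toy model in Python. The GROUP_PERMISSIONS mapping, the "*" and "sysop" group names and user_can() are assumptions for this sketch and are not MediaWiki's actual configuration API.

```python
# Hypothetical mapping from group name to the rights it grants.
# "*" stands for every user, logged in or not.
GROUP_PERMISSIONS = {
    "*": {"addlicensedata"},
    "sysop": {"addlicensedata", "modifylicensedata"},
}


def user_can(groups, right):
    """Return True if any of the user's groups (plus "*") grants `right`."""
    effective = {"*", *groups}
    return any(right in GROUP_PERMISSIONS.get(group, set()) for group in effective)
```

Here user_can([], "addlicensedata") is true for an anonymous user, while user_can([], "modifylicensedata") is false and user_can(["sysop"], "modifylicensedata") is true.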

Database schema: Add a licensing table to the db with the following 
columns - 1) revision_or_image, 2) revision_id, 3) image_id, 4) 
content_source, 5) license_id, 6) user_id.

The first three columns identify the revision or image to which the 
licensing data is associated. I am not particularly adept with SQL, so 
there may be a better way to do this. The content_source column is a 
string that is a URL or other reference that specifies the source of the 
content under license. The license_id identifies the specific license for 
the content. The user_id identifies the user that added the licensing 
information. The user_id may be useful if a site wishes to allow someone 
who added the licensing information to delete or modify it. However, 
there are complications with this. Since IP addresses are easily spoofed, 
it would mean this entry should only be valid for logged in users.

Add a license table with the following columns - 1) license_id, 2) 
license_text, 3) license name and 4) license_version. The license_id in 
the licensing table references rows in this table.
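
A sketch of the two strawman tables and the reference between them, using Python's sqlite3. For brevity it leaves out the revision_or_image and image_id columns; the column types, the sample data and the details_for_revision() helper are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces references only when asked
conn.executescript("""
CREATE TABLE license (
    license_id      INTEGER PRIMARY KEY,
    license_text    TEXT NOT NULL,
    license_name    TEXT NOT NULL,
    license_version TEXT NOT NULL
);
CREATE TABLE licensing (
    revision_id    INTEGER NOT NULL,
    content_source TEXT NOT NULL,
    license_id     INTEGER NOT NULL REFERENCES license(license_id),
    user_id        INTEGER          -- NULL when added by a logged-out user
);
""")

conn.execute("INSERT INTO license VALUES (1, '(full text here)', 'CC-BY-SA', '3.0')")
conn.execute(
    "INSERT INTO licensing VALUES (100, 'https://en.wikipedia.org/wiki/Example', 1, 42)"
)


def details_for_revision(conn, revision_id):
    """Rows a special page could show: source, license name, license version."""
    return conn.execute(
        "SELECT g.content_source, l.license_name, l.license_version "
        "FROM licensing AS g JOIN license AS l ON l.license_id = g.license_id "
        "WHERE g.revision_id = ?",
        (revision_id,),
    ).fetchall()
```

With the reference enforced, a licensing row that points at a non-existent license_id is rejected at insert time instead of surfacing later as a broken link on the special page.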

One complication is when a page or image is reverted, the licensing table 
must be modified to reflect the current state.

Data manipulation: The extension would use suitable hooks to insert, 
modify and render licensing data. Insertion and modification would 
probably use a relevant Edit Page or Article Management hook. Rendering 
would probably use a Page Rendering Hook.

Page rendering: You probably don't want to dump licensing data directly 
onto a page. Instead, it is preferable to output a short licensing 
statement like:

Content on this page uses licensed content. For details, see licensing 
data.

The phrase licensing data would be a link to a special page that 
accesses the licensing table and displays the license data associated 
with the page.

-- 
-- Dan Nessett



Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-11 Thread Robert Rohde
Let me just say that getting licensing information into a database
would definitely have advantages for Commons, Wikisource, and a
variety of third parties.  It would also enable the development of
many potentially useful extensions to MediaWiki, such as creating
automatically updated attribution statements, as you mentioned.

However, in practice, there are some considerations that will limit
its ability to improve upon the template system in many use cases.

Content importing is usually done by ordinary users, and those users
doing the importing would generally need to be able to set those
flags.  Further, unless one wants to bathe in license tag errors and
vandalism, the same class of users also need the ability to unset or
change those flags when problems occur.

One could restrict the set of people allowed to modify license tags to
just admins, or just some other intermediate user class (i.e.
something between admins and newbies), but that limitation may or may
not work well in practice.  For example, it would be impractical to
impose many restrictions on a site like Commons where a significant
amount of content comes from infrequent contributors with little or no
established presence in the community.

In addition, if people are editing and changing attribution flags then
there is a natural need to have version histories for license flags.
In the existing system, this is accomplished by the revision histories
of the pages showing changes in templates.  This isn't ideal (for
example the history of license changes isn't easily searchable), but
it does fill a critical need.  One could create a new log of
attribution changes, but its effectiveness would be limited unless
one can see and revert to the attribution as it existed in the past
(which is not a feature generally enabled by logs).
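
One way to get both a history and a revert, sketched in Python: treat license changes as an append-only list and implement a revert as a new entry that copies an old state. LicenseChange, set_license(), revert_to() and current_license() are hypothetical names for this illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseChange:
    license_id: int  # license in force after this change
    user: str        # who made the change
    comment: str     # free-form reason


def set_license(history, license_id, user, comment=""):
    """Record a change; earlier entries are never edited or removed."""
    history.append(LicenseChange(license_id, user, comment))


def revert_to(history, index, user):
    """Restore an earlier state by appending a copy of it as a new entry."""
    old = history[index]
    set_license(history, old.license_id, user, f"revert to entry {index}")


def current_license(history):
    """Return the license id in force now, or None if nothing was ever set."""
    return history[-1].license_id if history else None


history = []
set_license(history, 1, "Alice", "imported from WP")
set_license(history, 2, "Mallory", "bogus change")
revert_to(history, 0, "Bob")
```

After the revert the list still holds all three entries, so the bad change and its correction both remain visible, much like page history does for templates.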

This is not to say that having attribution info in the database isn't
useful.  Personally, I think it would be very useful.  However, any
scheme to augment or replace other means of conveying attribution
information will need to carefully consider the variety of ways that
this information is used and managed.

-Robert Rohde


Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-11 Thread James Forrester
On 11 September 2010 08:40, Robert Rohde raro...@gmail.com wrote:
 In addition, if people are editing and changing attribution flags then
 there is a natural need to have version histories for license flags.
 In the existing system, this is accomplished by the revision histories
 of the pages showing changes in templates.  This isn't ideal (for
 example the history of license changes isn't easily searchable), but
 it does fill a critical need.  One could create a new log of
 attribution changes, but its effectiveness would be limited unless
 one can see and revert to the attribution as it existed in the past
 (which is not a feature generally enabled by logs).

Perhaps something like how the interface for file revisions works -
that is, a licensing-history tab (or somesuch?) with state links,
change data, diff links, and "revert to" buttons? But yes, this would
be quite distinct from normal logs. :-)

The opportunities such a system would potentially give would be hugely
cool, e.g. in-line automatic RDFa hooks on the licensing of images
on-page (and in-text, though I think community members for WMF wikis
might complain), or "only let me see the actually free bits of this
wiki, excluding the non-free components".

James
-- 
James D. Forrester
jdforres...@wikimedia.org | jdforres...@gmail.com
[[Wikipedia:User:Jdforrester|James F.]]


Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-11 Thread Bryan Tong Minh
On Sat, Sep 11, 2010 at 1:11 AM, Dan Nessett dness...@yahoo.com wrote:
 After asking around, one suggestion was to keep the licensing state in
 the page_props table. This seems very reasonable and I would be
 interested in comments by this community on the idea. Of course, there
 has to be a way to get this state set, but it seems likely that could be
 achieved using an extension triggered when an article is edited.

Note that the page_props table is parser-owned. This means that
entries for a specific page are cleared and reinserted when a page is
reparsed. You should take that into account when using the page_props
table.
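
The consequence of "parser-owned" can be shown with a toy model. This is plain Python and not MediaWiki code; the dictionary, the reparse() function and the property names are assumptions that only mimic the clear-and-reinsert behaviour described above.

```python
# Toy stand-in for the page_props table: (page_id, property name) -> value.
page_props = {}


def reparse(page_id, props_from_parser):
    """Drop every entry for the page, then reinsert what the parser produced."""
    for key in [k for k in page_props if k[0] == page_id]:
        del page_props[key]
    for name, value in props_from_parser.items():
        page_props[(page_id, name)] = value


reparse(10, {"parser_prop": "1"})                    # first parse of page 10
page_props[(10, "imported_license")] = "cc-by-sa"    # set from outside the parser
reparse(10, {"parser_prop": "1"})                    # page is reparsed later

survived = (10, "imported_license") in page_props    # False: the entry was cleared
```

So licensing state kept in page_props would have to be re-emitted on every parse (for example from something in the page text), which brings back the problem that any editor can remove it.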

I'm not sure if page_props is the correct way to go. Copyright is
associated with a specific text or image revision. Therefore, it seems
more obvious to put the licensing in the revision and image tables and
their respective archive tables.


Bryan


Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-11 Thread Conrad Irwin
There's a long outstanding bug [1] to ensure that accurate attribution
is maintained when templates are substituted. I don't think this is
the same as maintaining attribution of external imports, but it may be
that whatever solution is implemented for one can be generalised to
allow the other.

Conrad

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=6785


On 11 September 2010 01:51, Bryan Tong Minh bryan.tongm...@gmail.com wrote:
 On Sat, Sep 11, 2010 at 1:11 AM, Dan Nessett dness...@yahoo.com wrote:
 After asking around, one suggestion was to keep the licensing state in
 the page_props table. This seems very reasonable and I would be
 interested in comments by this community on the idea. Of course, there
 has to be a way to get this state set, but it seems likely that could be
 achieved using an extension triggered when an article is edited.

 Note that the page_props table is parser-owned. This means that
 entries for a specific page are cleared and reinserted when a page is
 reparsed. You should take that into account when using the page_props
 table.

 I'm not sure if page_props is the correct way to go. Copyright is
 associated to a specific text or image revision. Therefore, it seems
 more obvious to put the licensing in the revision and image table and
 their respective archive tables.


 Bryan


[Wikitech-l] Keeping record of imported licensed text

2010-09-10 Thread Dan Nessett
We are currently attempting to refactor some specific modifications to 
the standard MW code we use (1.13.2) into an extension so we can upgrade 
to a more recent maintained version. One modification we have keeps a 
flag in the revisions table specifying that article text was imported 
from WP. This flag generates an attribution statement at the bottom of 
the article that acknowledges the import.

I don't want to start a discussion about the various legal issues 
surrounding text licensing. However, assuming we must acknowledge use of 
licensed text, a legitimate technical issue is how to associate state 
with an article in a way that records the import of licensed text. I 
bring this up here because I assume we are not the only site that faces 
this issue.

Some of our users want to encode the attribution information in a 
template. The problem with this approach is anyone can come along and 
remove it. That would mean the organization legally responsible for the 
site would entrust the integrity of site content to any arbitrary author. 
We may go this route, but for the sake of this discussion I assume such a 
strategy is not viable. So, the remainder of this post assumes we need to 
keep such licensing state in the db.

After asking around, one suggestion was to keep the licensing state in 
the page_props table. This seems very reasonable and I would be 
interested in comments by this community on the idea. Of course, there 
has to be a way to get this state set, but it seems likely that could be 
achieved using an extension triggered when an article is edited.
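
A minimal sketch of the "extension triggered when an article is edited" idea, in Python for brevity. Everything here is an assumption for illustration: the hook name "article_saved", the register/run_hooks helpers, and the license_state dictionary (standing in for a database table) are made up and are not MediaWiki's hook API.

```python
from collections import defaultdict

hooks = defaultdict(list)   # hook name -> list of handlers (hypothetical registry)
license_state = {}          # page_id -> import source; stands in for a db table


def register(hook_name, handler):
    hooks[hook_name].append(handler)


def run_hooks(hook_name, **kwargs):
    for handler in hooks[hook_name]:
        handler(**kwargs)


def record_import(page_id, text, import_source=None):
    # Only record state when the save explicitly declares an import source.
    if import_source is not None:
        license_state[page_id] = import_source


register("article_saved", record_import)


def save_article(page_id, text, import_source=None):
    # Storing the text itself is omitted; only the hook dispatch is shown.
    run_hooks("article_saved", page_id=page_id, text=text, import_source=import_source)


save_article(1, "original prose")
save_article(2, "imported prose", import_source="WP")
save_article(2, "imported prose, lightly edited")  # a later ordinary edit
```

In this sketch a later ordinary edit of page 2 leaves the recorded state untouched, which is the property the template approach lacks.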

Since this post is already getting long, let me close by asking whether 
support for associating licensing information with articles might be 
useful to a large number of sites. If so, then perhaps it belongs in the 
core.

-- 
-- Dan Nessett




Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-10 Thread Platonides
Dan Nessett wrote:
(...)
 
 After asking around, one suggestion was to keep the licensing state in 
 the page_props table. This seems very reasonable and I would be 
 interested in comments by this community on the idea. Of course, there 
 has to be a way to get this state set, but it seems likely that could be 
 achieved using an extension triggered when an article is edited.

Seems a good approach.

 Since this post is already getting long, let me close by asking whether 
 support for associating licensing information with articles might be 
 useful to a large number of sites. If so, then perhaps it belongs in the 
 core.

Many sites could benefit, but I'd put it in an extension for now,
preferably in our svn. Note that not everything that many people use
belongs in core (e.g. ParserFunctions).





Re: [Wikitech-l] Keeping record of imported licensed text

2010-09-10 Thread Neil Kandalgaonkar
Support for licenses in the database would be a huge boon to Wikimedia 
Commons, for all the reasons you state. Commons' licensing is not 
uniform and making it easy to search and sort would be better for everyone.

Currently we display licenses in templates, which has many drawbacks.

I'd like it to be more concrete than just a page_prop -- for instance, 
you also want to associate properties with the licenses themselves, such 
as "requires attribution". So that would mean another table.
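
To sketch what "another table" might look like, here is a small example using Python's sqlite3 module with an in-memory database. The table and column names (license, page_license, requires_attribution) and the sample rows are assumptions for illustration only, not an existing MediaWiki schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE license (
    license_id           INTEGER PRIMARY KEY,
    name                 TEXT NOT NULL UNIQUE,
    requires_attribution INTEGER NOT NULL      -- 1 = yes, 0 = no
);
CREATE TABLE page_license (
    page_id    INTEGER NOT NULL,
    license_id INTEGER NOT NULL REFERENCES license(license_id),
    PRIMARY KEY (page_id, license_id)
);
""")

# Sample data: two licenses and three pages (page ids are made up).
conn.executemany(
    "INSERT INTO license VALUES (?, ?, ?)",
    [(1, "CC BY-SA 3.0", 1), (2, "Public domain", 0)],
)
conn.executemany(
    "INSERT INTO page_license VALUES (?, ?)",
    [(10, 1), (11, 2), (12, 1)],
)


def pages_requiring_attribution(db):
    """Return page ids whose license carries the requires_attribution property."""
    rows = db.execute("""
        SELECT pl.page_id
        FROM page_license pl
        JOIN license l ON l.license_id = pl.license_id
        WHERE l.requires_attribution = 1
        ORDER BY pl.page_id
    """)
    return [row[0] for row in rows]


result = pages_requiring_attribution(conn)
```

Keeping license properties in their own table means searching and sorting by license becomes a plain join, rather than something scraped out of templates.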





-- 
Neil Kandalgaonkar  ne...@wikimedia.org
