Re: [Foundation-l] [Wiki-research-l] WikiCite - new WMF project? Was: UPEI's proposal for a universal citation index

2010-07-21 Thread Daniel Mietchen
On Tue, Jul 20, 2010 at 9:26 PM, Brian J Mingus
brian.min...@colorado.edu wrote:
 I like your suggestion that the abc disambiguator be chosen based on the
 first date of publication, and I also like the prospect of using slashes
 since they can't be contained in names. Using the full year is a good idea
 too. We can combine these to come up with a key that, in principle, is
 guaranteed to be unique. This key would contain:

 1) The first three author names separated by slashes
Why not separate by pluses? They don't form part of names either, and
don't cause problems with wiki page titles.

 2) If there are more than three authors, an EtAl
I don't think that's necessary if we get the abc part right.

 3) Some or all of the date. For instance, if there is only one source by
 this set of authors that year, we can just use . However, once another
 source by that set of authors is added, the key should change to MMDD
 or similar.
I don't think it is a good idea to change one key as a function of
updates on another, except for a generic disambiguation tag.

 If there are multiple publications on the same day, we can
 resort to abc. Redirects and disambiguation pages can be set up when a key
 changes.
As Jodi pointed out already, the exact date is often not clearly
identifiable, so I would go simply for the year.
Instead of an alphabetic abc, one could use some function of the
article title (e.g. the first three words thereof, or the initials of
the first three words), always in lower case.

An even less ambiguous abc would be starting page (for printed stuff)
or article number (for online only) but this brings us back to the
7523225 problem you mentioned above.

 Since the slashes are somewhat cumbersome, perhaps we can not make them
 mandatory, but similarly use them only when they are necessary in order to
 escape a name. In the case that one of the authors does not have a slash
 in their name - the dominant case - we can stick to the easily legible and
 nicely compact CamelCase format.

 Example keys generated by this algorithm:

 KangHsuKrajbichEtAl2009
Kang+Hsu+Krajbich+2009+the+wick+in
or
Kang+Hsu+Krajbich+2009+twi

Also note that the CamelCase key does not yield results in a Google
search, whereas the first plus-separated variant brings up the right
work, and the plus-separated one with the title reduced to initials
tends to bring up at least something written by or cited from these
authors.

 Author1Author2/Author-Three/2009
Author1+Author2+Author-Three+2009+just+another+article
or
Author1+Author2+Author-Three+2009+jat

Of course, it does not have to be _exactly_ three authors, nor three
words from the title, and it does not solve the John Smith (or Zheng
Wang) problem.

Daniel

-- 
http://www.google.com/profiles/daniel.mietchen

___
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


Re: [Foundation-l] [Wiki-research-l] WikiCite - new WMF project? Was: UPEI's proposal for a universal citation index

2010-07-20 Thread Brian J Mingus
On Mon, Jul 19, 2010 at 8:08 PM, Rob Lanphier ro...@robla.net wrote:

 On Mon, Jul 19, 2010 at 1:20 PM, Brian J Mingus
 brian.min...@colorado.edu wrote:
  I have been working with Sam and others for some time now on brainstorming a
  proposal for the Foundation to create a centralized wiki of citations, a
  WikiCite so to speak, if that is not the eventual name. My plan is to
  continue to discuss with folks who are knowledgeable and interested in such
  a project and to have the feedback I receive go into the proposal which I
  hope to write this summer.

 This sounds great.  Just speaking as a community member, I've been
 thinking about this topic a long time myself, and have plenty to add
 to the conversation.

  The proposal white paper will then be sent around
  to interested parties for corrections and feedback, including on-wiki and
  mailing lists, before eventually landing at the Foundation officially. As we
  know WMF has not started a new project in some years, so there is no
  official process. Thus I find it important to get it right.

 I'd suggest finding an on-wiki spot to discuss this work.  Here's one
 place this has been discussed in the past that may be a good place to
 revive the conversation:

 http://strategy.wikimedia.org/wiki/Proposal:Building_a_database_of_all_books_ever_published

 Rather than commenting on list about the subject itself, I've
 commented on the discussion page there:

 http://strategy.wikimedia.org/wiki/Proposal_talk:Building_a_database_of_all_books_ever_published#Fact_database_6531

 Rob


Rob,

Thanks for bringing my attention to this proposal. It certainly has some of
the same ring as this project, with of course some important differences.
Commonalities between the projects are that they are multilingual and
require a powerful search engine. Differences are that this project is for
all literary sources and that I believe it is best suited at the WMF. The
widespread use of citations across the Wikipedias will drive user
contributions towards adding richer metadata to those citations. And having
a source of citations available will increase the quality of the Wikipedias
as it becomes easier and easier to cite sources.

Brian


Re: [Foundation-l] [Wiki-research-l] WikiCite - new WMF project? Was: UPEI's proposal for a universal citation index

2010-07-20 Thread Brian J Mingus
On Mon, Jul 19, 2010 at 9:37 PM, Samuel Klein meta...@gmail.com wrote:

 Brian,

 The meta process for new project proposals is still the cleanest one
 for suggesting a specific Project and presenting it alongside similar
 projects.

 It would be helpful if you could update a related project proposal on
 meta -- say, [[m:WikiBibliography]], if that seems relevant.  (I just
 cleaned that page up and merged in an older proposal that had been
 obfuscated.)


Thanks for your work on this - definitely in the right direction! I will
consider whether I feel it's the right way for me to get started. One point
is that I am pointing more in the direction of a long-form proposal, and I
have more experience writing white-paper proposals for academia. I certainly
want it to end up on wiki, but when TPTB finally read the proposal perhaps
they will find it more persuasive if it is a professional looking document
that lands in their inbox.


 Or you can create a new project proposal...  WikiCite as a name can be
 confusing, since it has been used to refer to this bibliographic idea,
 but also to refer to the idea of citations for every statement or fact
 - something closer to a blame or trust solution that includes
 citations in its transactions.


Another name that I have come up with is OpenScholar. I still rather like
it, but suspect it has too much of a scientific ring to it? Names are
certainly very important so we should do more work on this avenue. Including
a list of names in the proposal would be a good idea, and perhaps the final
name will be a combination of existing name proposals.


 We should figure out how this project would work with acawiki, and
 possibly bibdex.  Bibdex doesn't aim to   And it would be helpful to
 have a publicly-viewable demo to play with -- could you clone your
 current wiki and populate the result with dummy data?


The problem with WikiPapers is that it has too many features! A feature-thin
version would be ideal for the proposal though, so I will plan to have some
kind of a demo site available.


 I love the idea of having a global place to discuss citations -- ALL
 citations -- something that OpenLibrary, the arXiv, and anyone else
 hosting cited documents could point to for every one of its works.


Exactly :)

Brian


 Sam.


 On Mon, Jul 19, 2010 at 6:03 PM, Federico Leva (Nemo)
 nemow...@gmail.com wrote:
  Brian J Mingus, 19/07/2010 22:20:
  The basic idea is a centralized wiki that contains citation information that
  other MediaWikis and WMF projects can then reference using something like a
  {{cite}} template or a simple link. The community can document the citation,
  the author, the book etc., and, in one idealization, all citations across
  all wikis would point to the same article on WikiCite. Users can use this
  wiki as their personal bibliography as well, as collections of citations can
  be exported in arbitrary citation formats.
 
  I have already mentioned it before, but this description looks quite
  similar to http://bibdex.org/ . Maybe we should join forces (i.e., send
  your proposal also to Sunir Shah).
 
  Nemo
 
  ___
  Wiki-research-l mailing list
  wiki-researc...@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 



 --
 Samuel Klein  identi.ca:sj   w:user:sj



Re: [Foundation-l] [Wiki-research-l] WikiCite - new WMF project? Was: UPEI's proposal for a universal citation index

2010-07-20 Thread Brian J Mingus
On Tue, Jul 20, 2010 at 5:10 AM, Daniel Kinzler dan...@brightbyte.de wrote:

 Hi all

 A central place for managing bibliographic data for use with citations is
 something that has been discussed by the German community for a long time.
 To me, it consists of two parts: a project for managing the structured data,
 and a mechanism for using that data on the wikis.

 I have been working on the latter recently, and there's a working prototype:
 on http://prototype.wikimedia.org/wmde-sandbox-1/Wikipedia:DataTransclusion
 you can see how data records can be included from external sources. A demo
 for the actual on-wiki use can be found at
 http://prototype.wikimedia.org/wmde-sandbox-1/Ameisenigel#Literatur, where
 {{ISBN|0868400467}} is used to show the bibliographic info for that book.
 (Side note: the prototype wikis are slow. Sorry about that.)

 Fetching and showing the data is done using
 http://www.mediawiki.org/wiki/Extension:DataTransclusion. Care has been
 taken to make this secure and scalable.

 For a first demo, I'm using the ISBN as the key, but any kind of key could
 be used to reference resources other than books.

 To demo managing the data ourselves, I have set up an SMW instance. An
 example bib record is at
 http://prototype.wikimedia.org/wmde-bib/ISBN:0451526538; it's used across
 wikis at
 http://prototype.wikimedia.org/wmde-sandbox-1/Wikipedia:DataTransclusion.
 Note that changes will show up with a delay, as the data is cached for a
 while.
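
To make the ISBN-as-key idea concrete, here is a minimal Python sketch.
It is not taken from the DataTransclusion extension; the function names
are hypothetical, and the "ISBN:" page-title prefix is only inferred from
the example record URL above. The check itself is the standard ISBN-10
check-digit rule (weights 10 down to 1, sum divisible by 11), which a
wiki could apply before using an ISBN as a record key.

```python
def normalize_isbn10(isbn):
    """Strip hyphens and spaces and upper-case a trailing 'x'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_is_valid(isbn):
    """Return True if the string is a well-formed ISBN-10."""
    digits = normalize_isbn10(isbn)
    if len(digits) != 10:
        return False
    total = 0
    for position, char in enumerate(digits):
        if char == "X" and position == 9:
            value = 10  # 'X' stands for 10, allowed only as check digit
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def record_title(isbn):
    """Build a page title such as 'ISBN:0451526538' (assumed convention)."""
    if not isbn10_is_valid(isbn):
        raise ValueError("not a valid ISBN-10: %r" % isbn)
    return "ISBN:" + normalize_isbn10(isbn)


if __name__ == "__main__":
    print(record_title("0451526538"))
    print(isbn10_is_valid("0868400467"))
```

Rejecting malformed keys early would keep mistyped ISBNs from creating
stray records, though any other kind of key would need its own check.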


 When discussing these things, please keep in mind that there are two
 components: fetching and displaying external data records, and managing
 structured data in a wiki style. The former is much simpler than the latter.
 I think we should really aim at getting both, but we can start off with
 transcluding external data much faster, if we allow not-so-wiki data
 sources. For ISBN-based queries, we could simply fetch information from
 http://openlibrary.org - or the Open Knowledge Foundation's
 http://bibliographica.org, once it's working.

 In the context of bibdex, I recommend also having a look at
 http://bibsonomy.org - it's a university research project, open source, and
 is quite similar to bibdex (and to what citeulike used to be).

 As to managing structured data ourselves: I have talked a lot with Erik
 Möller and Markus Krötzsch about this, and I'm in touch with the people who
 make DBpedia and OntoWiki. Everyone wants this. But it's not simple at all
 to get it right (efficient versioning of multilingual data in a document
 oriented database, anyone? want inference? reasoning, even? yay...). So the
 plan is currently to hatch a concrete plan for this. And I imagine that
 bibliographical and biographical info will be among the first use cases.


Hi Daniel,

Have you considered that Lucene is the perfect backend for this kind of
project? What kinds of faults do you see with it? At least in my mind, we
can mold it to our needs here. It has the core capabilities found in
Semantic MediaWiki, and it is fast and scalable.

I say this as a serious user of Semantic MediaWiki. I have seen that it
can't scale well without an alternate backend, and I wonder what kind of
monumental effort will be required to make it scale to tens or hundreds of
millions of documents, each containing 20-50 properties. Lucene can
already do this, SMW, not so much ;-)

Brian



 cheers,
 daniel




Re: [Foundation-l] [Wiki-research-l] WikiCite - new WMF project? Was: UPEI's proposal for a universal citation index

2010-07-20 Thread Brian J Mingus
On Tue, Jul 20, 2010 at 11:56 AM, Jodi Schneider jodi.schnei...@deri.org wrote:

 Hi Brian,

 On 20 Jul 2010, at 18:02, Brian J Mingus wrote:

 On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen f...@imm.dtu.dk wrote:



 Hi Brian and others,

 I also think that it would be interesting to have some bibliographic support,
 for two-way citation tracking and commenting on articles (for example), but
 I furthermore find that particularly in science articles we often find data
 that is worth structuring and putting in a database or a structured wiki, so
 that we can extract the data for meta-analysis and specialized information
 retrieval. That is what I also do in the Brede Wiki. I use the templates to
 store such data. So if such a system as yours is implemented we should not
 just think of it as a bibliographic database but in broader terms: a
 data wiki.


 Although the technology required to make a WikiCite happen will be
 applicable to a more generalized wiki for storing data I think that is too
 broad for the current proposal. A WMF analogue to Google Base is an entirely
 new beast that has its own requirements. I certainly think it's an
 interesting and worthwhile idea, but I don't feel that we are there yet.

 As the 'key' (the wiki page title) I use the (lowercase) title of the
 article. That might be more reader friendly - but usually longer. I think
 that KangHsuKrajbichEtAl09 is too camel-cased. Neither the title nor author
 list + year will be unique, so we need some predictable disambig.


 I noticed that AcaWiki is using the title, but I am personally not a fan of
 it. The motivation for using a key comes from BibTeX. When you cite an entry
 in a publication in LaTeX, you type \cite{key}. Also, I think most
 bibliographic formats support such a key. The idea is that there is a
 universal token that you can type into Google that will lead you to the
 right item. The predictable disambig is in the format I sent out (which
 likely needs modification for other kinds of sources). The format is
 Author1Author2Author3EtAlYYb. Here is a real world example from a pair of
 very prolific scientists, Deco & Rolls, who published at least three papers
 together in 2005. In our lab we have really come to love these keys - they
 are very memorable tokens that you can verbally pass on to other scientists
 in the midst of a discussion. Eventually, if they enter the key you have
 given them into Google, they will get the right entry at WikiCite.


 DecoRolls05 - Synaptic and spiking dynamics underlying reward reversal in
 the orbitofrontal cortex.
 DecoRolls05b - Sequential memory: a putative neural and synaptic dynamical
 mechanism.
 DecoRolls05c - Attention, short-term memory, and action selection: a
 unifying theory.
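
 The Author1Author2Author3EtAlYYb format described above can be sketched
 in Python as follows. This is only an illustration: the function name
 camel_key is hypothetical, and handing out the b, c, ... suffixes in the
 order entries are added is an assumption, since the format itself does
 not say who decides the order.

```python
def camel_key(authors, year, existing_keys):
    """Build a CamelCase citation key (hypothetical helper).

    authors: list of surnames, in publication order
    year: publication year; only the last two digits are used
    existing_keys: collection of keys already taken
    """
    base = "".join(authors[:3])
    if len(authors) > 3:
        base += "EtAl"
    base += "%02d" % (year % 100)
    if base not in existing_keys:
        return base  # the first entry carries no suffix
    # Later entries by the same authors and year get b, c, d, ...
    for letter in "bcdefghijklmnopqrstuvwxyz":
        candidate = base + letter
        if candidate not in existing_keys:
            return candidate
    raise ValueError("ran out of single-letter suffixes for " + base)


if __name__ == "__main__":
    taken = set()
    for _ in range(3):
        key = camel_key(["Deco", "Rolls"], 2005, taken)
        taken.add(key)
        print(key)
```

 Run three times for the same pair of authors and year, the sketch yields
 DecoRolls05, DecoRolls05b and DecoRolls05c in that order.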


 Citation keys of this sort work, but they have to be decided on by some
 external system. Who decides which paper is -, b, and c? Publication order
 would be one way to do it -- but that's complicated, especially with online
 first publication, or overlapping conferences.

 I think whether they're memorable tokens might vary by person... Sure, the
 author and year will be identifiable, even memorable. But the a, b, c?

 If you want to support more than recent works, I'd urge  instead of YY.
 Then we only have an issue for pre-0 stuff. :)

 Also consider differentiating authors from title and year, perhaps with
 slashes.
 author1-author2-author3-etal//b
 I'm not convinced that -'s are better than capital letters (author last
 names can have both)...


The key seems to be a very important point, so it's important that we get it
right. My thinking is guided by several constraints. First, I strongly
dislike the numeric keys used at sites such as CiteULike and most database
sites (such as 7523225). To the greatest degree possible I believe the key
should actually convey what is behind the link. On the other hand, the key
should not be too long. Numeric keys maximize the shortness while telling
you nothing, whereas titles as keys are very long and don't give you some
of the most important information - the authors and the year it was
published. The key format I have suggested does seem to have a flaw, being
that it easily becomes ambiguous and you must resort to a token that is not
easily memorable. Then again, even though many authors and sets of authors
will publish multiple items in a year, the vast majority of works have a
unique set of authors for a given year.

I like your suggestion that the abc disambiguator be chosen based on the
first date of publication, and I also like the prospect of using slashes
since they can't be contained in names. Using the full year is a good idea
too. We can combine these to come up with a key that, in principle, is
guaranteed to be unique. This key would contain:

1) The first three author names separated by slashes
2) If there are more than three authors, an EtAl
3) Some or all of the date. For instance, if there is only one source by
this set of authors 

Re: [Foundation-l] [Wiki-research-l] WikiCite - new WMF project? Was: UPEI's proposal for a universal citation index

2010-07-20 Thread phoebe ayers
Hi guys! I'm glad my little post helped re-start such a productive
conversation.

Since some people are replying only to the research-l list and some to
both research-l and foundation-l (my fault for cc'ing both) maybe we
should centralize this discussion (at least of the nitty gritty
metadata issues) on the research list for now? thread here:
http://lists.wikimedia.org/pipermail/wiki-research-l/2010-July/thread.html

Of course the perennial issue of how to propose a new WMF project is
very much a foundation-l topic.

regards,
phoebe

On Tue, Jul 20, 2010 at 12:26 PM, Brian J Mingus
brian.min...@colorado.edu wrote:


 On Tue, Jul 20, 2010 at 11:56 AM, Jodi Schneider jodi.schnei...@deri.org
 wrote:

 Hi Brian,
 On 20 Jul 2010, at 18:02, Brian J Mingus wrote:

 On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen f...@imm.dtu.dk wrote:


 Hi Brian and others,

 I also think that it would be interesting to have some bibliographic
 support, for two-way citation tracking and commenting on articles (for
 example), but I furthermore find that particularly in science articles we
 often find data that is worth structuring and putting in a database or a
 structured wiki, so that we can extract the data for meta-analysis and
 specialized information retrieval. That is what I also do in the Brede Wiki.
 I use the templates to store such data. So if such a system as yours is
 implemented we should not just think of it as a bibliographic database but
 in broader terms: a data wiki.




Re: [Foundation-l] [Wiki-research-l] WikiCite - new WMF project? Was: UPEI's proposal for a universal citation index

2010-07-19 Thread Rob Lanphier
On Mon, Jul 19, 2010 at 1:20 PM, Brian J Mingus
brian.min...@colorado.edu wrote:
 I have been working with Sam and others for some time now on brainstorming a
 proposal for the Foundation to create a centralized wiki of citations, a
 WikiCite so to speak, if that is not the eventual name. My plan is to
 continue to discuss with folks who are knowledgeable and interested in such
 a project and to have the feedback I receive go into the proposal which I
 hope to write this summer.

This sounds great.  Just speaking as a community member, I've been
thinking about this topic a long time myself, and have plenty to add
to the conversation.

 The proposal white paper will then be sent around
 to interested parties for corrections and feedback, including on-wiki and
 mailing lists, before eventually landing at the Foundation officially. As we
 know WMF has not started a new project in some years, so there is no
 official process. Thus I find it important to get it right.

I'd suggest finding an on-wiki spot to discuss this work.  Here's one
place this has been discussed in the past that may be a good place to
revive the conversation:
http://strategy.wikimedia.org/wiki/Proposal:Building_a_database_of_all_books_ever_published

Rather than commenting on list about the subject itself, I've
commented on the discussion page there:
http://strategy.wikimedia.org/wiki/Proposal_talk:Building_a_database_of_all_books_ever_published#Fact_database_6531

Rob



Re: [Foundation-l] [Wiki-research-l] WikiCite - new WMF project? Was: UPEI's proposal for a universal citation index

2010-07-19 Thread Samuel Klein
Brian,

The meta process for new project proposals is still the cleanest one
for suggesting a specific Project and presenting it alongside similar
projects.

It would be helpful if you could update a related project proposal on
meta -- say, [[m:WikiBibliography]], if that seems relevant.  (I just
cleaned that page up and merged in an older proposal that had been
obfuscated.)

Or you can create a new project proposal...  WikiCite as a name can be
confusing, since it has been used to refer to this bibliographic idea,
but also to refer to the idea of citations for every statement or fact
- something closer to a blame or trust solution that includes
citations in its transactions.

We should figure out how this project would work with acawiki, and
possibly bibdex.  Bibdex doesn't aim to   And it would be helpful to
have a publicly-viewable demo to play with -- could you clone your
current wiki and populate the result with dummy data?

I love the idea of having a global place to discuss citations -- ALL
citations -- something that OpenLibrary, the arXiv, and anyone else
hosting cited documents could point to for every one of its works.

Sam.


On Mon, Jul 19, 2010 at 6:03 PM, Federico Leva (Nemo)
nemow...@gmail.com wrote:
 Brian J Mingus, 19/07/2010 22:20:
 The basic idea is a centralized wiki that contains citation information that
 other MediaWikis and WMF projects can then reference using something like a
 {{cite}} template or a simple link. The community can document the citation,
 the author, the book etc., and, in one idealization, all citations across
 all wikis would point to the same article on WikiCite. Users can use this
 wiki as their personal bibliography as well, as collections of citations can
 be exported in arbitrary citation formats.

 I have already mentioned it before, but this description looks quite
 similar to http://bibdex.org/ . Maybe we should join forces (i.e., send
 your proposal also to Sunir Shah).

 Nemo





-- 
Samuel Klein          identi.ca:sj           w:user:sj
