On Jul 26, 2011, at 3:13 PM, Bruce D'Arcus wrote:

> On Tue, Jul 26, 2011 at 2:36 PM, Simon Kornblith <[email protected]> wrote:
> 
>> So, I have a crazy idea of how to shift as much of the complexity of
>> generating CSL away from the user as possible. Essentially, I want to be
>> able to copy and paste bibliography entries from a journal's reference list
>> into a box and end up with a formatted style.
> 
> Indeed, this would probably be the ideal (except that, note: most of
> the time, the examples aren't extensive enough to account for what
> authors often need; code should account for that if it can).

That's the rationale behind using existing macros when they fit, instead of 
trying to infer everything, but there may still be some issues with this.
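
As a rough illustration of that idea, here is a minimal Python sketch of
reusing existing macros and isolating only what they cannot produce. The
function name, the greedy longest-match strategy, and the sample macro
outputs are all assumptions made up for this example; none of it is taken
from the csl-inference code.

```python
def cover_with_macros(target, macro_outputs):
    """Split target into ("macro", name) and ("gap", text) segments.

    macro_outputs maps a (hypothetical) macro name to the string that macro
    renders for the reference in question.
    """
    segments = []
    pos = 0
    gap_start = 0
    while pos < len(target):
        # Longest macro output that starts at this position, if any.
        best = max(
            (
                (name, text)
                for name, text in macro_outputs.items()
                if text and target.startswith(text, pos)
            ),
            key=lambda item: len(item[1]),
            default=None,
        )
        if best is None:
            pos += 1  # this character belongs to an uncovered gap
            continue
        if gap_start < pos:
            segments.append(("gap", target[gap_start:pos]))
        segments.append(("macro", best[0]))
        pos += len(best[1])
        gap_start = pos
    if gap_start < len(target):
        segments.append(("gap", target[gap_start:]))
    return segments


# Made-up reference and macro renderings, for illustration only.
example_target = "Smith, J. (2010). A title. Journal, 12, 3-4."
example_macros = {
    "author": "Smith, J.",
    "issued": "(2010)",
    "title": "A title",
    "locators": "12, 3-4",
}
for kind, value in cover_with_macros(example_target, example_macros):
    print(kind, repr(value))
```

Only the "gap" segments would then need a newly generated macro; everything
else is covered by macros that already exist.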

>> As far as the implementation goes, we would need to:
>> 1) Convert the bibliography entries to a series of labeled fields using a
>> parser such as FreeCite.
>> 2) Where possible, string together macros from existing styles to generate
>> the output.
>> 3) If the output contains a substring that cannot be generated using
>> existing macros, generate a new macro to generate only that substring and
>> use existing macros for the rest. In order to avoid generating macros that
>> work for only a limited set of references (e.g., "(" as a prefix on one
>> element and ")" as a suffix on a different element), this would need to be
>> done either using a statistical model based on the distribution of prefixes,
>> suffixes, and group delimiters in the CSL repository and choosing the most
>> likely macro, or by using a set of heuristics.
>> As far as (3) goes, I made a naive implementation of the former in
>> Scheme/MIT Church (https://github.com/simonster/csl-inference) that mostly
>> works. MIT Church is really nice in some ways, but the inference is
>> imperfect (samples are not actually independent). Heuristics would
>> undoubtedly be faster, and might work better.
> 
> Why MIT Church, and not, say, Python? Just something you'd been
> playing with, or is there some other reason?

MIT Church has a lot of rough edges, but it makes performing this kind of
inference very simple. Essentially, you write code that generates a random
sample from some distribution (a generative model), and Church finds samples
that satisfy a given condition, even when drawing such a sample by chance is
highly improbable. The code on GitHub contains a routine that generates a
random CSL substring from a distribution defined by the prefixes, suffixes,
and group delimiters in the CSL repository. That distribution is very large.
Church's mh-query function takes the routine and searches the distribution
for a CSL substring that produces the given output. Since the generating
routine is more likely to produce substrings that resemble those in the
repository, the substrings it finds tend to resemble the repository as well.
This is the kind of problem Church is designed to make easy to express.
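
To make the idea concrete, here is a minimal sketch in Python rather than
Church. It swaps in plain rejection sampling for mh-query, so it shows the
conditioning idea and not Church's actual algorithm. The affix counts are
invented stand-ins for statistics from the CSL repository, and all function
names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical counts; real values would come from counting prefix and
# suffix attributes across the styles in the CSL repository.
PREFIX_COUNTS = {"": 60, "(": 30, ", ": 10}
SUFFIX_COUNTS = {"": 55, ")": 30, ".": 10, "(": 5}


def draw(counts, rng):
    """Draw one key with probability proportional to its count."""
    keys = list(counts)
    return rng.choices(keys, weights=[counts[k] for k in keys], k=1)[0]


def generate(rng):
    """Generative model: affixes for two adjacent elements (volume, issue)."""
    return (
        draw(PREFIX_COUNTS, rng),  # volume prefix
        draw(SUFFIX_COUNTS, rng),  # volume suffix
        draw(PREFIX_COUNTS, rng),  # issue prefix
        draw(SUFFIX_COUNTS, rng),  # issue suffix
    )


def render(volume, issue, affixes):
    """Render the two elements with the sampled affixes."""
    vol_prefix, vol_suffix, iss_prefix, iss_suffix = affixes
    return vol_prefix + volume + vol_suffix + iss_prefix + issue + iss_suffix


def rejection_query(volume, issue, observed, n_samples=20000, seed=0):
    """Count the sampled affix tuples whose rendering equals the observed text."""
    rng = random.Random(seed)
    accepted = Counter()
    for _ in range(n_samples):
        affixes = generate(rng)
        if render(volume, issue, affixes) == observed:
            accepted[affixes] += 1
    return accepted


posterior = rejection_query("12", "3", "12(3)")
for affixes, count in posterior.most_common():
    print(affixes, count)
```

Two affix assignments can produce "12(3)": "(" as a suffix on the volume, or
"(" as a prefix on the issue. Because "(" is rare as a suffix in the assumed
counts, the second explanation is accepted far more often. Most draws are
rejected outright, which is the cost that a Metropolis-Hastings search is
meant to reduce.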

Unfortunately, Church is very computationally intensive, and judging by the
results, the algorithm it uses for inference (Metropolis-Hastings) might be
suboptimal for this kind of problem, so I'm not sure this code has much of a
future beyond serving as a proof of concept.

>> Implementing this might end up being a lot of work, but I think it's
>> possible in principle. The UI is very simple if it can be made to work well
>> enough; the difficulty is in programming it. I won't have any time to do
>> this for quite a while, but it could be a fun project.
> 
> Cool; thanks for putting it up on github!
> 
> If you get a chance, do you think you could convert the README to
> markdown, so that it will render correctly (complete with
> syntax-highlighting) in the browser?
> 
> If the source is LaTeX, pandoc will convert it for you, except maybe
> for the syntax highlighting. For that, see this source:
> 
> <https://raw.github.com/seancribbs/ripple/6b62eee9301b654d937b0f85706e6cc72ad88352/README.markdown>
> 
> .. which will render like:
> 
> <https://github.com/seancribbs/ripple/blob/master/README.markdown>
> 
> Hence, XML highlighting with this:
> 
> ``` xml
> <foo>bar</foo>
> ```

Thanks. The original file is in fact LaTeX, so I'll give this a try.

Simon