On Tue, Jul 26, 2011 at 3:51 PM, Simon Kornblith <[email protected]> wrote:
> On Jul 26, 2011, at 3:13 PM, Bruce D'Arcus wrote:
>
>> On Tue, Jul 26, 2011 at 2:36 PM, Simon Kornblith <[email protected]> wrote:
>>
>>> So, I have a crazy idea of how to shift as much of the complexity of
>>> generating CSL away from the user as possible. Essentially, I want to be
>>> able to copy and paste bibliography entries from a journal's reference list
>>> into a box and end up with a formatted style.
>>
>> Indeed, this would probably be the ideal (though note that, most of
>> the time, the examples aren't extensive enough to cover what
>> authors often need; the code should account for that if it can).
>
> That's the rationale behind using existing macros when they fit, instead of 
> trying to infer everything, but there may still be some issues with this.
>
>>> As far as the implementation goes, we would need to:
>>> 1) Convert the bibliography entries to a series of labeled fields using a
>>> parser such as FreeCite.
>>> 2) Where possible, string together macros from existing styles to generate
>>> the output.
>>> 3) If the output contains a substring that cannot be generated using
>>> existing macros, create a new macro that produces only that substring and
>>> use existing macros for the rest. To avoid creating macros that work for
>>> only a limited set of references (e.g., "(" as a prefix on one element
>>> and ")" as a suffix on a different element), this would need to be done
>>> either with a statistical model based on the distribution of prefixes,
>>> suffixes, and group delimiters in the CSL repository, choosing the most
>>> likely macro, or with a set of heuristics.
>>> As far as (3) goes, I made a naive implementation of the former in
>>> Scheme/MIT Church (https://github.com/simonster/csl-inference) that mostly
>>> works. MIT Church is really nice in some ways, but the inference is
>>> imperfect (samples are not actually independent). Heuristics would
>>> undoubtedly be faster, and might work better.
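As a rough illustration of the heuristic route in (3), the sketch below splits the text found between two adjacent fields into a suffix, a group delimiter, and a prefix, and keeps the split that scores highest against frequency tables. The tables, their counts, and the names `score` and `best_split` are all hypothetical; real counts would have to be extracted from the CSL repository, and this is not the approach taken in csl-inference.

```python
# Hypothetical frequency tables. The counts are invented for illustration;
# real ones would be gathered from styles in the CSL repository.
SUFFIX_COUNTS = {"": 500, ")": 120, ".": 200, ",": 80}
DELIM_COUNTS = {"": 100, " ": 300, ", ": 400, ". ": 350}
PREFIX_COUNTS = {"": 600, "(": 120, " ": 50}


def score(table, text, smoothing=0.5):
    """Smoothed relative frequency, so unseen strings are unlikely, not impossible."""
    total = sum(table.values()) + smoothing * (len(table) + 1)
    return (table.get(text, 0) + smoothing) / total


def best_split(gap):
    """Split the text between two fields into (suffix, delimiter, prefix).

    Every way of cutting `gap` into three consecutive pieces is scored as the
    product of the three smoothed frequencies; the highest-scoring cut wins.
    """
    best = None
    for i in range(len(gap) + 1):
        for j in range(i, len(gap) + 1):
            suffix, delim, prefix = gap[:i], gap[i:j], gap[j:]
            p = (score(SUFFIX_COUNTS, suffix)
                 * score(DELIM_COUNTS, delim)
                 * score(PREFIX_COUNTS, prefix))
            if best is None or p > best[0]:
                best = (p, suffix, delim, prefix)
    return best[1:]


# For an entry such as "Smith (2011). Title", the gaps around the year
# are " (" and "). ".
print(best_split(" ("))
print(best_split("). "))
```

With these made-up counts, the parentheses end up as a prefix and a suffix on the same element (the year) rather than being attached to its neighbours, which is the kind of result the step is after.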
>>
>> Why MIT Church, and not, say, Python? Just something you'd been
>> playing with, or is there some other reason?
>
> MIT Church has a lot of rough edges, but it makes performing this kind of
> inference very simple. Essentially, you write code that generates a random
> sample from some distribution (a generative model), and Church finds samples
> that match a given set of parameters, even when drawing a sample with those
> parameters by chance is highly improbable. That code contains a routine that
> generates a random CSL substring from a very large distribution defined by
> the prefixes, suffixes, and group delimiters in the CSL repository.
> Church's mh-query function takes that routine and searches that
> distribution for a CSL substring that produces the given output.
> Since the CSL-generating routine is more likely to produce samples that
> closely resemble the repository, the CSL substrings it finds are more likely
> than not to resemble those in the repository. Church is intended to make
> writing code that performs this kind of inference very easy.
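To make the idea of conditioning a generative model concrete, here is a minimal Python sketch. It uses plain rejection sampling, not the Metropolis-Hastings search that mh-query performs, and it models only a prefix and a suffix around a single field value. The weight tables and the names `sample_affixes`, `render`, and `rejection_query` are hypothetical and are not taken from csl-inference.

```python
import random

# Hypothetical prior weights standing in for frequencies in the CSL repository.
PREFIXES = {"": 60, "(": 25, "[": 10, ", ": 5}
SUFFIXES = {"": 50, ")": 25, "]": 10, ".": 15}


def sample_affixes(rng):
    """Generative model: draw a (prefix, suffix) pair from the weighted prior."""
    prefix = rng.choices(list(PREFIXES), weights=list(PREFIXES.values()))[0]
    suffix = rng.choices(list(SUFFIXES), weights=list(SUFFIXES.values()))[0]
    return prefix, suffix


def render(value, affixes):
    """Format a field value with the sampled prefix and suffix."""
    prefix, suffix = affixes
    return prefix + value + suffix


def rejection_query(value, observed, rng, max_tries=10_000):
    """Keep sampling from the prior until a sample reproduces the observed text."""
    for _ in range(max_tries):
        affixes = sample_affixes(rng)
        if render(value, affixes) == observed:
            return affixes
    return None  # nothing in the prior reproduced the observed text


print(rejection_query("2011", "(2011)", random.Random(0)))
```

Rejection sampling is only workable here because the space is tiny. Once matching samples become highly improbable, as in the case described above, almost every draw is thrown away, which is the motivation for a guided search.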
>
> Unfortunately, Church is very computationally intensive, and, judging by
> the results, the algorithm it uses for inference (Metropolis-Hastings) might
> be suboptimal for this kind of problem, so I'm not sure this code has much
> of a future beyond serving as a proof of concept.
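For readers unfamiliar with Metropolis-Hastings, the following is a minimal sketch of the algorithm on a toy version of the problem. It is not Church's implementation. The weight tables, the soft `mismatch_weight` penalty, and the function names are assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical prior weights standing in for frequencies in the CSL repository.
PREFIXES = {"": 60, "(": 25, "[": 10, ", ": 5}
SUFFIXES = {"": 50, ")": 25, "]": 10, ".": 15}


def target(state, value, observed, mismatch_weight=1e-6):
    """Unnormalized posterior: prior weight times a soft match likelihood."""
    prefix, suffix = state
    prior = PREFIXES[prefix] * SUFFIXES[suffix]
    matches = prefix + value + suffix == observed
    return prior * (1.0 if matches else mismatch_weight)


def propose(state, rng):
    """Symmetric proposal: redraw either the prefix or the suffix uniformly."""
    prefix, suffix = state
    if rng.random() < 0.5:
        prefix = rng.choice(list(PREFIXES))
    else:
        suffix = rng.choice(list(SUFFIXES))
    return prefix, suffix


def mh_samples(value, observed, steps, rng):
    """Run a Metropolis-Hastings chain and return the state after every step."""
    state = ("", "")
    samples = []
    for _ in range(steps):
        candidate = propose(state, rng)
        ratio = (target(candidate, value, observed)
                 / target(state, value, observed))
        if rng.random() < min(1.0, ratio):
            state = candidate  # accept the move; otherwise stay put
        samples.append(state)
    return samples


samples = mh_samples("2011", "(2011)", 5000, random.Random(0))
print(Counter(samples).most_common(1)[0][0])
```

Each state is derived from the previous one, so consecutive samples are correlated rather than independent, and a chain can spend a long time in low-probability regions before it finds a match.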

Do you have some thoughts on a possibly more appropriate algorithm,
should someone want to explore alternatives?

[...snip...]

Bruce

_______________________________________________
xbiblio-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel
