Re: [julia-users] Edit Distance?

2014-01-14 Thread Shashwat Anand
On Wed, Jan 15, 2014 at 10:18 AM, Matthias BUSSONNIER 
bussonniermatth...@gmail.com wrote:


 On 14 Jan 2014, at 16:57, John Myles White wrote:

  Thanks!
 
  I'll have to check that out. I was able to translate some of the
 Wikipedia code fast enough to get something working for my purposes.
 

 That was more or less what I did,

 I just found that trying to use the minimal memory (n+1) was
 significantly slower than using (2n), IIRC.


Just someone curious here.

Assuming two strings of lengths M and N, we can compute the naive edit distance
via dynamic programming
in O(MN) time and O(MN) space.

However, since each row depends only on the row before it, we can cut
the space to O(N) by keeping just two rows,
which is basically 2*N memory.

What I don't understand is: what is this minimal memory (n + 1) you just
mentioned?
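
For anyone following along, here is a minimal sketch of the two variants, written in Python purely for illustration (the thread itself contains no code). The function names are hypothetical, and reading "(n+1)" as "a single row of N+1 cells plus one saved diagonal value" is an assumption about what was meant, not something confirmed in the thread.

```python
def edit_distance_two_rows(a: str, b: str) -> int:
    """Levenshtein distance keeping two rows of the DP table (about 2*N cells)."""
    prev = list(range(len(b) + 1))          # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)           # fresh row for a[:i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[len(b)]


def edit_distance_one_row(a: str, b: str) -> int:
    """Same recurrence with a single row of N+1 cells, updated in place."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        diag = row[0]                       # D[i-1][j-1] for j = 1
        row[0] = i
        for j, cb in enumerate(b, start=1):
            above = row[j]                  # D[i-1][j], about to be overwritten
            cost = 0 if ca == cb else 1
            row[j] = min(above + 1,         # deletion
                         row[j - 1] + 1,    # insertion
                         diag + cost)       # substitution or match
            diag = above                    # becomes D[i-1][j-1] for the next j
    return row[len(b)]
```

Both functions return the same distances, for example 3 for "kitten" and "sitting"; they differ only in how much of the table is kept alive at once. Whether the in-place version is actually slower will depend on the language and implementation.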


 I looked a little into what else could be done in the literature,
 but most of the time you need some assumption about the alphabet,
 or the overhead seems too significant for short strings.

 You could also try to mimic how Python's difflib and its SequenceMatcher work;
 it's much faster but does not always give the right result [1].


 While you are working on diffs, would you be interested in trying to diff
 IPython notebooks? :-)
 --
 M




 [1]: http://carreau.github.io/posts/08-Dear-DiffLib.html


  -- John
 
  On Jan 14, 2014, at 3:18 PM, Matthias BUSSONNIER 
 bussonniermatth...@gmail.com wrote:
 
 
  On 14 Jan 2014, at 15:08, John Myles White wrote:
 
  Is there a package out there to compute edit distances between strings?
 
  I started at some point, never really finished.
 
  https://github.com/carreau/Diff.jl
 
  --
  M
 
 
  -- John
 
 
 




Re: [julia-users] Re: Style Guideline

2013-12-31 Thread Shashwat Anand
On Wed, Jan 1, 2014 at 12:35 AM, John Myles White
johnmyleswh...@gmail.com wrote:

 (4) Using both tabs and spaces is a huge problem in a shared codebase.
 This is probably the only rule in my entire list that I’m actually going to
 enforce in the code I maintain. IIRC, Python completely forbids mixing
 these kinds of space characters at the language level.


+1
IMHO tabs should be avoided entirely. We can configure our editors/IDEs to
insert spaces in place of tabs.
I know vim does that [set expandtab]: we can keep using the tab key as usual,
and it expands to spaces.
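
On the Python point, a small sketch of how Python 3 reacts when a block mixes tabs and spaces inconsistently. The helper name `rejected_with_tab_error` and the sample strings are made up for illustration; `compile` and `TabError` are the standard built-ins.

```python
def rejected_with_tab_error(source: str) -> bool:
    """Return True if Python 3 refuses to compile `source` because of TabError."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return True
    return False


# First indented line uses a tab, the second uses eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"

# Same block indented with spaces only.
spaces_only = "if True:\n    x = 1\n    y = 2\n"

print(rejected_with_tab_error(mixed))
print(rejected_with_tab_error(spaces_only))
```

The mixed block is rejected at compile time, while the spaces-only block compiles normally, so in Python 3 the problem is caught before any code runs.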



 (7) + (8) These rules are part of the official Google style guides for R,
 which is the language with the most similarity to Julia that’s being used
 at companies with public facing style guidelines. I think they’re quite
 sensible rules, which is why I decided to borrow them from published
 standards.

 (18) + (19): This is clearly an area of big disagreement in our community.
 I might pull them out into a suggestions section since I’d really prefer
 that code submitted to things like DataFrames.jl follow this rule, but
 don’t want to include a rule that’s going to be a big schism in the
 community.

 (22) + (23) + (24): I may take these out as well. I definitely agree that
 there’s a big difference between performance guidelines and style
 guidelines, although that line is blurry when you’re trying to keep a
 codebase written in a consistent style.

 (31): Comments aren’t PDFs or HTML or any other language designed for
 transmitting carefully formatted documents. You don’t get to use images,
 properly formatted tables, etc. I find diagrams are an essential part of
 good documentation. I think conflating documents with code leads to
 documents that are less readable and lots of lines in code that’s not
 actually worth reading.

 (35): I might take this one out as well. It’s somewhere on the boundary
 between a performance tip and a style habit worth developing.

  — John

 On Dec 31, 2013, at 11:12 AM, Daniel Carrera dcarr...@gmail.com wrote:

  Personally, I do not think that a more thorough style guide is
 necessarily better. That said, I will give you my comments:
 
  (4):  I like tabs and I use them.
 
  (7) + (8):  I disagree. Although I generally use comma+space as you say,
 at times I deviate from that when I feel that doing so will improve the
 clarity and readability of my code.
 
  (18)+(19):  I disagree. Although I could favour rules like this in a
 particular project, in many cases I think that adding type annotations just
 creates syntactic noise and can create a needless limitation.
 
  (22)+(23)+(24): I do not think that performance tips belong in a style
 guide. You could spend a lot of time writing performance tips and I don't
 see an obvious reason why the three tips you chose are more important than
 other performance tips.
 
  (31): I partially disagree. I like writing documentation (e.g. tutorial
 or explaining an algorithm) at the top of the file. I like having the
 documentation in the same file as the code that it refers to. I do not know
 what you mean when you say that English documents are more readable when
 not constrained by the rules of code comments. What rules are those?
 
  Also, I rarely want to have a diagram in my documentation because that
 involves starting a WYSIWYG program like LibreOffice or something like
 that. I haven't really felt a lot of need for diagrams.
 
 
  (35): This doesn't sound like a style thing either. Advice on the
 correct way to use a module, or how to maintain precision or avoid
 round-off errors, does not belong in a style guide. This sort of thing
 belongs in either the documentation for the module, or on some tutorial
 about numerical computation.
 
  Cheers,
  Daniel.
 
 
 
  On Tuesday, 31 December 2013 10:01:23 UTC-5, John Myles White wrote:
  One of the things that I really like about working with the Facebook
 codebase is that all of the code was written to comply with a very thorough
 internal style guideline. This prevents a lot of useless disagreement about
 code stylistics and discourages the creation of unreadable code before
 anything reaches the review stage.
 
  In an attempt to emulate that level of thoroughness, I decided to extend
 the main Julia manual’s style guide by writing my own personal style
 guideline, which can be found at
 https://github.com/johnmyleswhite/Style.jl
 
  I’d be really interested to know what others think of these rules and
 what they think is missing. Right now, my guidelines leave a lot of wiggle
 room.
 
   — John