Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-20 Thread Simon Richter
Hi,

On 21.12.2015 08:35, Lorenzo Marcantonio wrote:

> After this I'm *not* advocating a binary format, there would be no
> advantage for it in pcbnew. There are no performance constraints in
> either speed or space for kicad files so just use whatever is easier for
> you to manage (i.e. structured text)

The main advantage of text files is that you can fix them with a text
editor when stuff breaks, because there are no length fields to update
consistently. I'd consider that a major plus.

   Simon




___
Mailing list: https://launchpad.net/~kicad-developers
Post to : kicad-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kicad-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-20 Thread Nick Østergaard
Hi David

This discussion about binary formats was just a side comment from
someone; it is off topic. Please use your energy on something
useful, like commenting on Roszko's patch instead, which is about
making a saner sexpr parser than the one we have in KiCad at the moment.
Please try to be constructive.

2015-12-20 4:51 GMT+01:00 David Godfrey :
> Hi,
>
> I don't often post here, but I will add my ~30 years of experience with
> binary file formats.
> THEY ARE EVIL EVIL THINGS !!
>
> They tend to be extremely fragile, any change and something is likely to
> break.
> They require parsers that suck to maintain
> In most cases general purpose parsers *CAN NOT* be used, requiring a custom
> parser.
> it is *VERY* difficult to keep compatibility with an older version of the
> format.
> Essentially you need to keep multiple copies of the parser and have some
> form of VersionID that can be used to select between parsers.
> To all intents it is *IMPOSSIBLE* to handle forward compatibility.
> ie: open a file with version 1.1 that was written with version 1.1a
> A permanent and accurate copy of documentation is required to be maintained
> for *EVERY* version of the parser ever written
> Parser VersionID magic must uniquely identify the parser version and be the
> first entry in the file.
> NOTE: other locations are possible, but cause significant problems with
> extracting the version
> The more involved the protocol the worse all of these issues get.
> Documentation is hard to write, and even harder to read and understand
> sufficiently to code against.
> Debugging read/write issues is an absolute nightmare.
> You either have to provide a reference to test against by either
>
> hand encode or decode a file
> have an independent programmer write a "clean room" implementation of the
> parser
> This requires the previously mentioned documentation to be correct for this
> version
> It can introduce additional bugs (you now have 2 parsers to debug)
>
> Many potential developers of addons or core code may
>
> be scared off by the complexity
> accidentally introduce bugs or incompatibilities that don't show up until a
> user loses data
>
> Inadvertently force full or partial Vendor Lock In.
> If a protocol (file format) is
>
> too hard to implement
> too hard to keep up with changes
> too prone to breakage (fragile)
> too obscure (poorly documented or the documentation is not easily available)
>
> Then other vendors won't put in the effort to support the protocol (file
> format)
>
> I could go on with a lot more points but you get the picture.
>
> Binary file formats made a lot of sense back in the days of
>
> low speed serial communications
> very small storage devices
> expensive data links (anyone remember gprs @ 10c per kilo byte?)
>
> Expensive GPRS led to WAP which was (roughly) a tokenised binary HTML/XML
> It was not widely used due to some of the above problems, and it also scared
> off content developers
>
> memory constrained systems like
>
> Old PC's (486 and earlier)
> old microcontrollers were short on memory and clock speed
> some modern microcontrollers can be affected here, but only the lower end
> ones
>
> slow radio links (bluetooth and wifi are not slow)
>
> They do make sense when used internally to a compression algorithm or an
> encryption scheme.
> They don't make sense in (almost) all other cases today.
> (No flames here please, I am not trolling, just making a very generalized
> statement)
>
>
> For a format like .png binary is not so bad, the format is unlikely to
> change much if ever (even so there are at least 6 different png formats out
> there, some of them ascii and some binary), but it is a simple format to
> describe and implement. Any program that works with png's needs to implement
> support for all versions of png, or clearly explain to the user what is and
> is not supported with appropriate error messages. Any png programs that
> don't support all versions tend to fall into disuse and die a slow death.
>
> Text based formats on the other hand, if well designed, are (to a fair
> extent)
>
> human readable
> self documenting (although good documentation is *ALWAYS* recommended)
> general purpose parsers can be used allowing our code to focus on using the
> information, not extracting it
> tolerant of extra nodes (features)
> tolerant of missing nodes
> often a newer version of a parser can parse older versions of a file without
> problem.
> It is trivial for a newer version to cater for specific version based
> variations in a protocol.
> If the protocol is well designed new versions will only add nodes to the
> previous version, never alter the way data is stored.
>
> This allows a newer parser to parse any historical version file.
> In almost all cases an older version parser can parse a newer file, but
>
> any nodes that are unknown will just be skipped.
> Skipped nodes may or may not be a problem for the way a user sees the result,
> but there should be a 

Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-19 Thread David Godfrey

  
  
Hi,

I don't often post here, but I will add my ~30 years of experience
with binary file formats.
THEY ARE EVIL EVIL THINGS !!

- They tend to be extremely fragile; any change and something is
  likely to break.
- They require parsers that suck to maintain.
- In most cases general purpose parsers *CAN NOT* be used,
  requiring a custom parser.
- It is *VERY* difficult to keep compatibility with an older
  version of the format.
  Essentially you need to keep multiple copies of the parser and
  have some form of VersionID that can be used to select between
  parsers.
- To all intents it is *IMPOSSIBLE* to handle forward
  compatibility,
  ie: open a file with version 1.1 that was written with version
  1.1a
- A permanent and accurate copy of documentation is required to
  be maintained for *EVERY* version of the parser ever written.
- Parser VersionID magic must uniquely identify the parser
  version and be the first entry in the file.
  NOTE: other locations are possible, but cause significant
  problems with extracting the version
- The more involved the protocol the worse all of these issues
  get.
- Documentation is hard to write, and even harder to read and
  understand sufficiently to code against.
- Debugging read/write issues is an absolute nightmare.
  You have to provide a reference to test against by either
  - hand encoding or decoding a file, or
  - having an independent programmer write a "clean room"
    implementation of the parser.
    This requires the previously mentioned documentation to be
    correct for this version.
    It can introduce additional bugs (you now have 2 parsers to
    debug)
- Many potential developers of addons or core code may
  - be scared off by the complexity
  - accidentally introduce bugs or incompatibilities that
    don't show up until a user loses data
- Inadvertently force full or partial Vendor Lock In.
  If a protocol (file format) is
  - too hard to implement
  - too hard to keep up with changes
  - too prone to breakage (fragile)
  - too obscure (poorly documented or the documentation is not
    easily available)
  then other vendors won't put in the effort to support the
  protocol (file format)
  

I could go on with a lot more points but you get the picture.

Binary file formats made a lot of sense back in the days of

- low speed serial communications
- very small storage devices
- expensive data links (anyone remember gprs @ 10c per kilo
  byte?)
  Expensive GPRS led to WAP which was (roughly) a tokenised
  binary HTML/XML.
  It was not widely used due to some of the above problems,
  and it also scared off content developers
- memory constrained systems like
  - old PCs (486 and earlier)
  - old microcontrollers, which were short on memory and clock speed
  - some modern microcontrollers can be affected here, but
    only the lower end ones
- slow radio links (bluetooth and wifi are not slow)

They do make sense when used internally to a compression
algorithm or an encryption scheme.
They don't make sense in (almost) all other cases today.
(No flames here please, I am not trolling, just making a
very generalized statement)


For a format like .png binary is not so bad, the format is unlikely
to change much if ever (even so there are at least 6 different png
formats out there, some of them ascii and some binary), but it is a
simple format to describe and implement. Any program that works with
png's needs to implement support for all versions of png, or clearly
explain to the user what is and is not supported with appropriate
error messages. Any png programs that don't support all versions
tend to fall into disuse and die a slow death.

Text based formats on the other hand, if well designed, are (to a
fair extent)

- human readable
- self documenting (although good documentation is *ALWAYS*
  recommended)
- general purpose parsers can be used, allowing our code to focus
  on using the information, not extracting it
- tolerant of extra nodes (features)
- tolerant of missing nodes
- often a newer version of a parser can parse older versions of
  a file without problem.
  It is trivial for a newer version to cater for specific version
  based variations in a protocol.
- If the protocol is well designed new versions will only add
  nodes to the 

Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Lorenzo Marcantonio
On Fri, Dec 18, 2015 at 12:00:58AM -0500, Mark Roszko wrote:

> is just extremely silly and extra work compared to generate SEXPR
> trees in memory like SEXPR represents in the first place. God forbid
> you accidentally format that double wrong.

D'oh you actually want a full in-memory tree representation.. the full
lisp way (made of cons cells, obviously :D)

> Namely file size savings and sanity. Instead of all those ridiculous X

Well these are trivial IMHO. The biggest horror for me is splitting one
object in many data forms. I couldn't care less if the result is (data
aa bb cc) or (data "asdf==")... of course the second one is better. Even
better if you tagged it with a macro character like (data {asdf==}) so
that the reader could know that {} denotes a base64 encoded blob (the {}
is not lisp; in the '80s there was no base64 :P).

-- 
Lorenzo Marcantonio
CZ Srl - Parma



Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Lorenzo Marcantonio
On Thu, Dec 17, 2015 at 11:49:31PM -0500, Chris Pavlina wrote:
> Dude. The way it's stored currently is horrible - it's not congruent to 
> the structure of the file! The s-expr file is supposed to be a tree 
> structure, why is the binary data stored broken into multiple objects 
> like that? It's yet another facet of the parsing nightmare we have.
> 
> If you're going to use a "standard" format like s-expr, you should 
> actually understand it and use it the way it's meant to be used.

Yep you wouldn't need many data forms... why not a single big one?

Oh yeah, because it uses *strings* for that, and it couldn't newline in
them :D

For the curious, common lisp sexp syntax for arrays is like this:

#(1 2 3 4) ; 1 2 3 4 are the elements, of various type

...of course you could simply say (data 89 50 4E 47 ...) since the types
are currently hardcoded... it works that way for the timestamps.
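
A minimal C++ sketch of what decoding such a (data 89 50 4E 47 ...) form could look like, assuming the reader has already split the form into tokens and that every token is exactly two hex digits. `decodeHexBytes` is a hypothetical helper for illustration, not code from pcbnew.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: turn the tokens of a (data 89 50 4E 47 ...) form
// into raw bytes. Assumes every token is exactly two hex digits.
std::vector<std::uint8_t> decodeHexBytes(const std::vector<std::string>& tokens)
{
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw std::invalid_argument("not a hex digit");
    };

    std::vector<std::uint8_t> bytes;
    bytes.reserve(tokens.size());
    for (const std::string& tok : tokens) {
        if (tok.size() != 2)
            throw std::invalid_argument("expected a two-digit hex byte: " + tok);
        bytes.push_back(static_cast<std::uint8_t>(nibble(tok[0]) * 16 + nibble(tok[1])));
    }
    return bytes;
}
```

Calling `decodeHexBytes({"89", "50", "4E", "47"})` yields the four bytes 0x89 0x50 0x4E 0x47, which is how a PNG signature starts. Because the type is fixed by the enclosing `data` keyword, the reader still cannot tell these tokens from decimal numbers or symbols on its own.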

-- 
Lorenzo Marcantonio
CZ Srl - Parma



Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Lorenzo Marcantonio
On Thu, Dec 17, 2015 at 10:32:10PM -0500, Mark Roszko wrote:
> So awhile back, Wayne said to use sexpr for something I wanted to do.
> Then I looked at the sexpr parsing and said NOPE.

OK, being a lisper and having read the dragon book here's my view on the
subject:

1) the parser isn't actually bad, it's mostly a recursive descent i.e.
   it expects the *actual* grammar to be generated. You take the BNF
   form, process it and write the corresponding state machine. It's not
   VB6, it's a state machine :D

   So you tailor a parser for a specific grammar; if you do it by hand
   good luck if the grammar changes (there are tools like yacc that
   generate parsers but that's beyond the scope)

2) Given the point before it doesn't actually do sexps because it needs
   the type in the grammar beforehand. Given:

   (module smd:C0603 (layer F.Cu) (tedit 52108AB5) (tstamp 558BEA9B) ...)

   The parser *must* know that module, layer, tedit and tstamp are
   atoms, and where strings and hex values are placed. So the grammar
   should be (ignoring the fact that tedit and tstamp forms are
   optional):

   (module <string> (layer <string>) (tedit <hex>) (tstamp <hex>) ...)

   Any true sexp form has types self-evident (exactly how depends on the
   lisp flavour you're using, in common lisp reader macros define the
   behaviour), so the reader (the tokenizer in two piece parsers) can
   parse it WITHOUT knowing the grammar beforehand. The common lisp way
   (actually any lisp except for the hex numbers) would be:

   (module "smd:C0603" (layer "F.Cu") (tedit #x52108AB5) (tstamp #x558BEA9B) ...)

   ...actually I adapted the pcbnew code (it's about a ten line diff:P)
   to emit this

   (module "smd:C0603" (layer "F.Cu") (tedit 52108AB5) (tstamp 558BEA9B) ...)

   ...which is fully backward compatible with the 'official' kicad parser.
   Just quote every string; it also gets correctly font locked in emacs :D
   Hex values are only used for these fields AFAIK; I didn't special-case
   them in kicad but in lisp...

3) Using an XML similitude, usual sexp processing in lisp follows
   something like a DOM model: 'read' (that's the actual function name)
   pulls the whole sexp into memory (yes the whole file, in this case!
   there is a way to do partial processing but it's quite advanced and
   you need to temporarily rebind the system reader) and then you
   process it with your favourite list mangler; too bad you would need a
   full lisp environment to do it in pcbnew :P However something like
   SAX would be quite easy to implement; events would be 'start of list'
   'end of list' 'atom' 'string' 'number' and such

4) Even in lisp *you have parser generators*! They've been used for
   something like 40 years for, like, everything. Kicad has something
   for keywords but the grammar is still hand coded. That's the major
   flaw, IMHO
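
The SAX-like event idea from point 3 can be sketched roughly as follows in C++. This is only an illustration under stated assumptions: the names `SexpEvent` and `scanSexp` are made up here, strings have no escape sequences, and any bare token starting with a digit, '+', '-' or '.' is reported as a number.

```cpp
#include <cctype>
#include <functional>
#include <string>

// Event kinds, following the list in the text. Names are illustrative.
enum class SexpEvent { ListStart, ListEnd, String, Number, Atom };

// Scans `in` once and reports each event through `emit`.
// Simplifications (assumptions): no escapes inside strings, and anything
// that starts with a digit, '+', '-' or '.' is reported as a number.
void scanSexp(const std::string& in,
              const std::function<void(SexpEvent, const std::string&)>& emit)
{
    std::size_t i = 0;
    while (i < in.size()) {
        const char c = in[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (c == '(') { emit(SexpEvent::ListStart, ""); ++i; continue; }
        if (c == ')') { emit(SexpEvent::ListEnd, ""); ++i; continue; }
        if (c == '"') {
            // Quoted string: everything up to the next double quote.
            const std::size_t end = in.find('"', i + 1);
            const std::size_t stop = (end == std::string::npos) ? in.size() : end;
            emit(SexpEvent::String, in.substr(i + 1, stop - i - 1));
            i = (end == std::string::npos) ? in.size() : end + 1;
            continue;
        }
        // Bare token: runs until whitespace or a parenthesis.
        std::size_t j = i;
        while (j < in.size() && !std::isspace(static_cast<unsigned char>(in[j]))
               && in[j] != '(' && in[j] != ')')
            ++j;
        const bool numeric = std::isdigit(static_cast<unsigned char>(c))
                             || c == '+' || c == '-' || c == '.';
        emit(numeric ? SexpEvent::Number : SexpEvent::Atom, in.substr(i, j - i));
        i = j;
    }
}
```

Note that an unquoted hex value such as 52108AB5 would be reported as a number only because it happens to start with a digit; one starting with a letter would come out as an atom. That is the ambiguity point 2 describes when types are not self-evident in the syntax.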

As for the performance issues, I think they are non-existent. It's I/O
code and it's done only once every ten minutes, if you work like me...
probably the kernel works a lot more to handle buffers and schedule the
disk I/O than pcbnew does to form or decode the sexps. OTOH Dick is famous
for micromanaging performance stuff (like not checking types with
dynamic_cast because it's more expensive than reading the object type
tag:P), so that would be a given :D

-- 
Lorenzo Marcantonio
CZ Srl - Parma



Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Mark Roszko
>You take the BNF form, process it and write the corresponding state machine. 
>It's not VB6, it's a state machine :D

Writing a new state machine for every single list and every single
file over and over again is the part I have problems with. There
should be a single state machine that takes the tokens and gives you a
list. Not 500 over the whole codebase.

>Well these are trivial IMHO. The biggest horror for me is splitting one
>object in many data forms.

The definition of sanity is not splitting it into many data forms.


>3) Using an XML similitude, usual sexp processing in lisp follows
>   something like a DOM model

Yea that was the plan when I structured my end result. Walking it
later is trivial.
I'm more for manual walking of the lists after the fact than trying to
use an event based one. I don't really see a benefit; rather, I see it
increasing complexity by needing callback classes when manual
unrolling should work fairly well. BUT I am not exactly happy with how
manual unrolling looks, so it's something to play with.



Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Lorenzo Marcantonio
On Fri, Dec 18, 2015 at 07:55:07AM -0500, Mark Roszko wrote:
> Writing a new state machine for every single list and every single
> file over and over again is the part I have problems with. There
> should be a single state machine that takes the tokens and gives you a
> list. Not 500 over the whole codebase.

Also, maybe re-read the part about parser generation. And think
about grammar changes...

> The definition of sanity is not splitting it into many data forms.

It's nonetheless a 'curious' engineering approach:D

> >3) Using an XML similitude, usual sexp processing in lisp follows
>something like a DOM model
> 
> Yea that was the plan when I structured my end result. Walking it
> later is trivial.

I'd suggest using a proper list/vector container instead of the cons
approach (it was meant to be a joke). Cons handling is trickier without
the lisp runtime at hand :D

In pseudo-BNF

list :- sequence-of list-element
list-element :- one-of(symbol, string, number, whatever, list)

The sequence-of could be a vector of base pointers using push_back, the
one-of is obviously modeled with inheritance (if it were C a union would
be fine...). As for the lexing strategy: the traditional lisp reader has
*no* lookahead and dispatches on the first character:

- '(' starts a list
- [0123456789.+-] starts a number
- '"' starts a string
- a letter starts a symbol
- whitespace is eaten
- other characters trigger specific behaviour (like the '#' main macro
  character)

*if* you want to keep string quoting optional then you can't distinguish
a string from a symbol (because it depends on the semantic grammar, which
the reader doesn't have access to). Then you have to match keywords as
strings; not elegant but doable.
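
To make the dispatch-on-first-character strategy concrete, here is a small C++ sketch of a generic reader that builds the whole list in memory. It is a sketch under assumptions, not the KiCad parser or the proposed patch: `Sexp` and `readSexp` are invented names, a tagged struct stands in for the inheritance hierarchy, strings have no escape sequences, and numbers are kept as text.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// One node of the tree. A tagged struct is used here instead of the
// inheritance hierarchy mentioned above, purely to keep the sketch short.
struct Sexp {
    enum class Kind { List, Symbol, String, Number };
    Kind kind = Kind::List;
    std::string text;          // payload for Symbol, String and Number
    std::vector<Sexp> items;   // children for List (the "sequence-of")
};

// Reads one element starting at `pos`, dispatching on the first character.
// Assumptions: strings have no escape sequences; numbers are kept as text.
Sexp readSexp(const std::string& in, std::size_t& pos)
{
    auto isSpace = [](char ch) { return std::isspace(static_cast<unsigned char>(ch)) != 0; };
    while (pos < in.size() && isSpace(in[pos])) ++pos;    // whitespace is eaten
    if (pos >= in.size()) throw std::runtime_error("unexpected end of input");

    Sexp node;
    const char c = in[pos];
    if (c == '(') {                                        // '(' starts a list
        ++pos;
        for (;;) {
            while (pos < in.size() && isSpace(in[pos])) ++pos;
            if (pos >= in.size()) throw std::runtime_error("missing ')'");
            if (in[pos] == ')') { ++pos; return node; }
            node.items.push_back(readSexp(in, pos));
        }
    }
    if (c == ')') throw std::runtime_error("unexpected ')'");
    if (c == '"') {                                        // '"' starts a string
        const std::size_t end = in.find('"', pos + 1);
        if (end == std::string::npos) throw std::runtime_error("unterminated string");
        node.kind = Sexp::Kind::String;
        node.text = in.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        return node;
    }
    // Everything else is a bare token; its first character decides the kind.
    const std::size_t start = pos;
    while (pos < in.size() && !isSpace(in[pos]) && in[pos] != '(' && in[pos] != ')') ++pos;
    const bool numeric = std::isdigit(static_cast<unsigned char>(c))
                         || c == '+' || c == '-' || c == '.';
    node.kind = numeric ? Sexp::Kind::Number : Sexp::Kind::Symbol;
    node.text = in.substr(start, pos - start);
    return node;
}
```

With mandatory quoting, `(module "smd:C0603" (layer "F.Cu"))` comes back with `module` and `layer` as symbols and the two names as strings, with no grammar knowledge in the reader. With optional quoting, `F.Cu` would arrive as a symbol and the caller would have to sort it out by position.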

> I'm more for manual walking of the lists after the fact than trying to
> use an event based one. I don't really see a benefit; rather, I see it
> increasing complexity by needing callback classes when manual
> unrolling should work fairly well. BUT I am not exactly happy with how
> manual unrolling looks, so it's something to play with.

Given the relatively low amount of data to process a DOM approach is
quite feasible. Keep an iterator on the current list handy and loop
away. There are plenty of matching/binding/unifying/destructuring
methods to use when you have the whole list already in core. Personally
I would use a recursive descent driven by the tree elements (*not*
directly by the input file, as it is now); it should be the easiest to
do by hand.
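
As an illustration of a descent driven by the tree elements rather than by the input file, here is a short C++ sketch. Everything in it is hypothetical (`Node`, `ModuleInfo`, `parseModule` and the helper functions are invented for this example), and it assumes the tree was already produced by some generic reader.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimal in-memory tree; assumed to have been produced by a generic reader.
struct Node {
    bool isList = false;
    std::string text;            // atom or string payload
    std::vector<Node> items;     // children when isList is true
};

// Small constructors so trees can be written down by hand.
Node makeAtom(const std::string& t) { Node n; n.text = t; return n; }
Node makeList(std::vector<Node> items)
{
    Node n;
    n.isList = true;
    n.items = std::move(items);
    return n;
}

// Hypothetical target structure for a (module ...) form.
struct ModuleInfo {
    std::string name;
    std::string layer;
};

// True if `n` is a list whose first element is the atom `keyword`.
bool isForm(const Node& n, const std::string& keyword)
{
    return n.isList && !n.items.empty()
           && !n.items[0].isList && n.items[0].text == keyword;
}

// Descends over tree elements rather than over the input characters.
// Unknown child forms are skipped, so extra nodes do not break the walk.
bool parseModule(const Node& form, ModuleInfo& out)
{
    if (!isForm(form, "module") || form.items.size() < 2 || form.items[1].isList)
        return false;
    out.name = form.items[1].text;
    for (std::size_t i = 2; i < form.items.size(); ++i) {
        const Node& child = form.items[i];
        if (isForm(child, "layer") && child.items.size() == 2 && !child.items[1].isList)
            out.layer = child.items[1].text;
    }
    return true;
}
```

The loop over children is where per-form handlers would go; forms the walker does not recognise (tedit and tstamp in this sketch) are simply passed over.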

-- 
Lorenzo Marcantonio
CZ Srl - Parma



Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Edwin van den Oetelaar
On Fri, Dec 18, 2015 at 3:49 PM, Tomasz Wlostowski <
tomasz.wlostow...@cern.ch> wrote:

> On 18.12.2015 15:46, Edwin van den Oetelaar wrote:
> > Concerning changing the format of the PCB file again...
> > Making a new binary file format is a big NO NO NO (screaming) in my book.
>
> Hi Edwin,
>
> Don't worry, we are not going to change the format :)
>
> > I want to be able to edit it with VIM if needed.
>
> See, this is my point :) Why implement a new feature in pcbnew (so that
> everybody can use it) if you can edit the .kicad_pcb with VIM...
>

It is not that I want to do it, but I want to Be Able To Do It when I have
to.
I also have a HAM license for when phone, cellular and internet go dead...
I also have a large battery backup... just in case..

Thanks for working on the project, you do good work.
Greetings,
Edwin


>
> Best,
> Tom
>


Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Lorenzo Marcantonio
On Fri, Dec 18, 2015 at 10:40:44AM -0500, Mark Roszko wrote:

> Simply trying to make sexpr parsing simple and easy. No file format
> change required.

Still keep proposing mandatory quotes for strings :D completely backward
compatible.

> Yes we need a parser especially with some of our nuisances of allowing
> UTF8 in places (most parsers I've seen only allow ASCII and other
> funniness, and SPECCTRA which is sexpr-like is a different take). I
> wrote a parser that's simple to maintain in tree and can be extended
> to handle things easily.

Can of worms warning! Be careful, since going out-of-ASCII is not so
simple for some formats. Case in point, IPC is ASCII strict and SPECCTRA
has its own things for quoting. IIRC there are special cases in the
sexp code for handling SPECCTRA idiosyncrasies.

> I don't quite understand the point about "generic" formats not needing
> parsers. All formats need dedicated parsers, unless you are writing
> in Javascript and you eval(JSON).

Or (read) for lisp sexps... if it isn't dedicated, why is libxml2 so big? :D

Also generators. printf doesn't quite feel right. In C++ << is
overloaded for streams and called the 'insertion operator'; it would fit
into place for building sexps.

-- 
Lorenzo Marcantonio
CZ Srl - Parma



Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Edwin van den Oetelaar
Concerning changing the format of the PCB file again...

Just my $0.02 ...

I have been lurking on this list for years now...

Making a new binary file format is a big NO NO NO (screaming) in my book.

I want to be able to edit it with VIM if needed.
I personally do not care if the format is s-expression or Json or XML as
long as it can be read and changed by humans and text editors and parsers.
This was one of the reasons to leave all these proprietary binary formats
behind and start using KiCad.
I see no reason why parsing a text based format cannot work; possibly it
is less work to fix any issues than to start all over again with yet
another file format
(as if there were not enough file formats in the PCB industry already)

Just make the thing work as designed now, do not start another project
which brings nothing but trouble.

Thanks for listening,
Edwin van den Oetelaar




On Fri, Dec 18, 2015 at 3:23 PM, Tomasz Wlostowski <
tomasz.wlostow...@cern.ch> wrote:
>
> On 18.12.2015 13:55, Mark Roszko wrote:
> > Writing a new state machine for every single list and every single
> > file over and over again is the part I have problems with. There
> > should be a single state machine that takes the tokens and gives you a
> > list. Not 500 over the whole codebase.
>
> Hi Mark,
>
> May I add my 5 cents to this discussion...
>
> - My only big concern with the current parser so far is the lexer code
> generation done by CMake. I'm not a big fan of making scripts that
> produce code which then gets compiled...
>
> Anyway, since we got here, I have a devilish idea: why a text format at
> all? Of course we are not going to change the format again (we have a
> lot of more exciting things to do ;-), but to point out a few reasons:
>
> - PCB files (as opposed to netlists) represent graphical objects. For
> someone looking at a PCB file with a text editor, it's just meaningless
> numbers. Diffs look equally horrible (just look at a diff between two
> .kicad_pcb files after moving a couple of traces in P mode).
>
> - binary formats are generally easier to parse (unless someone made it
> deliberately difficult - but it's not our case). We could just serialize
> objects directly to a binary file, along with some version info (think
> of Google's protobuf). This would also let us implement introspection
> (e.g. a property editor tool) for no extra cost.
>
> - Our s-expr format needs a custom parser (correct me if I'm wrong),
> which contradicts the idea that generic text formats (e.g. json/lisp
> s-expr, even lua/python arrays) need no dedicated parsers.
>
> - Last but not least (and surely not least controversial): file format
> that is easy to hand-edit enables people to hack scripts that do stuff
> that Kicad is currently missing. This is an advantage for some (perhaps
> more advanced) users who can work around missing features this way, but
> is it really beneficial for Kicad as a complete PCB design tool? If the
> easiest way is to hack a PCB file with a perl script/text editor, what
> motivation is left to implement the missing feature in Kicad?
>
> Cheers,
> Tom
>
>
>


Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Tomasz Wlostowski
On 18.12.2015 15:46, Edwin van den Oetelaar wrote:
> Concerning changing the format of the PCB file again...
> Making a new binary file format is a big NO NO NO (screaming) in my book.

Hi Edwin,

Don't worry, we are not going to change the format :)

> I want to be able to edit it with VIM if needed.

See, this is my point :) Why implement a new feature in pcbnew (so that
everybody can use it) if you can edit the .kicad_pcb with VIM...

Best,
Tom



Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Tomasz Wlostowski
On 18.12.2015 13:55, Mark Roszko wrote:
> Writing a new state machine for every single list and every single
> file over and over again is the part I have problems with. There
> should be a single state machine that takes the tokens and gives you a
> list. Not 500 over the whole codebase.

Hi Mark,

May I add my 5 cents to this discussion...

- My only big concern with the current parser so far is the lexer code
generation done by CMake. I'm not a big fan of making scripts that
produce code which then gets compiled...

Anyway, since we got here, I have a devilish idea: why a text format at
all? Of course we are not going to change the format again (we have a
lot of more exciting things to do ;-), but to point out a few reasons:

- PCB files (as opposed to netlists) represent graphical objects. For
someone looking at a PCB file with a text editor, it's just meaningless
numbers. Diffs look equally horrible (just look at a diff between two
.kicad_pcb files after moving a couple of traces in P mode).

- binary formats are generally easier to parse (unless someone made it
deliberately difficult - but it's not our case). We could just serialize
objects directly to a binary file, along with some version info (think
of Google's protobuf). This would also let us implement introspection
(e.g. a property editor tool) for no extra cost.

- Our s-expr format needs a custom parser (correct me if I'm wrong),
which contradicts the idea that generic text formats (e.g. json/lisp
s-expr, even lua/python arrays) need no dedicated parsers.

- Last but not least (and surely not least controversial): file format
that is easy to hand-edit enables people to hack scripts that do stuff
that Kicad is currently missing. This is an advantage for some (perhaps
more advanced) users who can work around missing features this way, but
is it really beneficial for Kicad as a complete PCB design tool? If the
easiest way is to hack a PCB file with a perl script/text editor, what
motivation is left to implement the missing feature in Kicad?

Cheers,
Tom





Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Mark Roszko
So any actual comments on what I did in the commit I linked originally
in the first email?

>Still keep proposing mandatory quotes for strings :D completely backward
>compatible.

Well yea, my generator class does that all the time. Unless you
define a symbol explicitly, all strings are always quoted.


>Can of worms warning! Be careful, since going out-of-ASCII is not so
>simple for some formats. Case in point, IPC is ASCII strict and SPECCTRA
>has its own things for quoting. IIRC there are special cases in the
>sexp code for handling SPECCTRA idiosyncrasies.


Well yea, I am aware of the particular behaviors it has, I've gone
through what kicad does before I came rage posting.


>Also generators. printf doesn't quite feel right. In C++ << is
>overloaded for streams and called the 'insertion operator'; it would fit
>into place for building sexps.


I already have those overloads :P

roughly
SEXPR_LIST << 10 << "string thing" << OUTPUT_SYMBOL("symbol") << 4.0;
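
For readers without the patch at hand, a rough C++ sketch of how such insertion overloads could behave. This is not the class from the commit: `SexpBuilder` and `Symbol` are hypothetical names, quotes inside strings are not escaped, and doubles are deliberately rejected so that their formatting stays an explicit decision.

```cpp
#include <string>

// Wrapper marking text that must be written without quotes.
struct Symbol {
    std::string name;
};

// Hypothetical list builder; this is not the class from the patch.
// Assumption: strings never contain a double quote, so nothing is escaped.
class SexpBuilder {
public:
    SexpBuilder& operator<<(const Symbol& s) { return add(s.name); }
    SexpBuilder& operator<<(const std::string& s) { return add('"' + s + '"'); }
    SexpBuilder& operator<<(const char* s) { return *this << std::string(s); }
    SexpBuilder& operator<<(int v) { return add(std::to_string(v)); }
    SexpBuilder& operator<<(const SexpBuilder& child) { return add(child.str()); }

    // Doubles are rejected at compile time instead of silently becoming ints;
    // how to format them is left to the caller.
    SexpBuilder& operator<<(double) = delete;

    std::string str() const { return "(" + body_ + ")"; }

private:
    SexpBuilder& add(const std::string& token)
    {
        if (!body_.empty()) body_ += ' ';
        body_ += token;
        return *this;
    }
    std::string body_;
};
```

With this sketch, `layer << Symbol{"layer"} << "F.Cu"` produces `(layer "F.Cu")`: anything not wrapped in `Symbol` is quoted, which matches the "all strings are always quoted unless you define a symbol explicitly" behaviour described above.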



Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Mark Roszko
O god, your email is just going to start a chain of rage about
changing the format.
Not trying to do that folks, please rage in another thread if you want
to reply to him on that.



Simply trying to make sexpr parsing simple and easy. No file format
change required.


> Our s-expr format needs a custom parser (correct me if I'm wrong),
>which contradicts the idea that generic text formats (e.g. json/lisp
>s-expr, even lua/python arrays) need no dedicated parsers.

Yes we need a parser especially with some of our nuisances of allowing
UTF8 in places (most parsers I've seen only allow ASCII and other
funniness, and SPECCTRA which is sexpr-like is a different take). I
wrote a parser that's simple to maintain in tree and can be extended
to handle things easily.

I don't quite understand the point about "generic" formats not needing
parsers. All formats need dedicated parsers, unless you are writing
JavaScript and just eval() the JSON.



Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-18 Thread Wayne Stambaugh
Mark,

I haven't had time to look at your commit so I'm not going to comment on
that until I do.  I don't know when I'll have time as I have a lot of
stuff to do and I'm traveling over the holidays so my review time will
be limited.  In the meantime I will comment on some of the things I've
read in this thread.

Binary file formats are not going to happen while I'm project leader. I
have 30 years of less than positive experience with them and I'm not
about to start using them now.

Be careful with the C++ << operators. They may not do the correct thing
with floating point numbers. There may also be some overhead that the
current design does not have.  I will not accept a performance hit on
file parsing or any significant increase in formatted file size.
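
One way the << operators can bite with floating point is the default stream precision of 6 significant digits. The sketch below contrasts that with an explicit setting; NaiveFormat and FormatDouble are hypothetical helper names, and the round-trip approach shown is just one possible choice, not what KiCad does.

```cpp
#include <iomanip>
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Default stream settings: 6 significant digits, so 123.456789 is
// written as "123.457" and the original value is lost.
std::string NaiveFormat( double aValue )
{
    std::ostringstream os;
    os << aValue;
    return os.str();
}

// Hypothetical helper: pin the locale so the decimal separator is always
// '.', and ask for max_digits10 digits so the text converts back to the
// same double.
std::string FormatDouble( double aValue )
{
    std::ostringstream os;
    os.imbue( std::locale::classic() );
    os << std::setprecision( std::numeric_limits<double>::max_digits10 ) << aValue;
    return os.str();
}
```

The trade-off is file size: max_digits10 guarantees a round trip but can print more digits than a human would write for the same value, which is exactly the kind of growth in formatted output that would need measuring.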

On 12/17/2015 10:32 PM, Mark Roszko wrote:
> So awhile back, Wayne said to use sexpr for something I wanted to do.
> Then I looked at the sexpr parsing and said NOPE.
> 
> 
> Why NOPE? Because the current parsing regime is basically Visual Basic
> 6 parser written in a modern language with micro-optimizations meant
> for someone running a Windows 3.1 computer with 5.25" storage drives.
> 
> 
> This patch is a proposed sexpr parser that parses sexpr like it's sexpr
> and not a parenthesis format. Because if you are parsing with things
> like NeedLeft() and NeedRight(), you are parsing it wrong.
> 
> Especially when you see something like this:
> 
> case T_descr:
>     if( sawDesc )
>         in->Duplicate( tok );
>     sawDesc = true;
>     in->NeedSYMBOLorNUMBER();
>     row.SetDescr( in->FromUTF8() );
>     break;
> 
> 
> but then you realize that fp_lib_table has quoted strings for MOST
> descr entries. And you question why the method is called
> NeedSYMBOLorNUMBER when it's a string with quotes. Then you go deeper
> and realize the parser may as well be a csv reader/writer. You either
> parse like sexpressions or you stop calling it sexpressions.
> 
> 
> 
> New proposed system:
> 1. Reads and generates an in memory tree structure of data as it
> should be, i.e. lists, strings, numbers, etc.

I'm ok with the memory tree structure as long as it doesn't add any
significant parsing or conversion to internal object overhead.
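
For reference, the kind of in-memory tree under discussion can be built with a small recursive-descent reader. This is a minimal sketch under my own assumptions, not the parser from the commit: SEXPR and SEXPR_PARSER are hypothetical names, comments are not handled, and any atom that strtod fully consumes is treated as a number.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical node type: a list of children or a single atom.
struct SEXPR
{
    enum KIND { LIST, SYMBOL, STRING, NUMBER };
    KIND               kind;
    std::string        text;      // atom text (empty for LIST)
    std::vector<SEXPR> children;  // child nodes (LIST only)
};

class SEXPR_PARSER
{
public:
    explicit SEXPR_PARSER( const std::string& aInput ) : m_in( aInput ), m_pos( 0 ) {}

    // Parse one expression starting at the current position.
    SEXPR Parse()
    {
        skipSpace();

        if( m_pos >= m_in.size() )
            throw std::runtime_error( "unexpected end of input" );

        if( m_in[m_pos] == '(' )
            return parseList();

        if( m_in[m_pos] == ')' )
            throw std::runtime_error( "unexpected ')'" );

        return m_in[m_pos] == '"' ? parseString() : parseAtom();
    }

private:
    void skipSpace()
    {
        while( m_pos < m_in.size() && std::isspace( (unsigned char) m_in[m_pos] ) )
            ++m_pos;
    }

    SEXPR parseList()
    {
        SEXPR node{ SEXPR::LIST, "", {} };
        ++m_pos;  // consume '('

        for( ;; )
        {
            skipSpace();

            if( m_pos >= m_in.size() )
                throw std::runtime_error( "missing ')'" );

            if( m_in[m_pos] == ')' )
            {
                ++m_pos;
                return node;
            }

            node.children.push_back( Parse() );
        }
    }

    SEXPR parseString()
    {
        SEXPR node{ SEXPR::STRING, "", {} };
        ++m_pos;  // consume opening quote

        while( m_pos < m_in.size() && m_in[m_pos] != '"' )
        {
            if( m_in[m_pos] == '\\' && m_pos + 1 < m_in.size() )
                ++m_pos;  // keep the escaped character verbatim

            node.text += m_in[m_pos++];
        }

        if( m_pos >= m_in.size() )
            throw std::runtime_error( "unterminated string" );

        ++m_pos;  // consume closing quote
        return node;
    }

    SEXPR parseAtom()
    {
        std::size_t start = m_pos;

        while( m_pos < m_in.size() && !std::isspace( (unsigned char) m_in[m_pos] )
               && m_in[m_pos] != '(' && m_in[m_pos] != ')' )
            ++m_pos;

        std::string text = m_in.substr( start, m_pos - start );
        char*       end = nullptr;
        std::strtod( text.c_str(), &end );

        // A number if strtod consumed the whole atom, otherwise a symbol.
        return SEXPR{ *end == '\0' ? SEXPR::NUMBER : SEXPR::SYMBOL, text, {} };
    }

    std::string m_in;
    std::size_t m_pos;
};
```

Parsing `(lib (name "Air_Coils") (type KiCad))` this way yields a LIST whose first child is the symbol `lib`, followed by two nested lists. Whether walking such a tree and then converting it to board objects is cheap enough is the open question here; the sketch says nothing about performance.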

> 2. Helpers to pull out each item as need be

OK

> 3. Backwards compatible

Backwards compatibility is not optional.

> 4. Doesn't do silly keywords micro optimization at compile time. You
> do a string comparison to convert the value to integer anyway, using
> if/else is no different each time. Kicad isn't parsing gigabyte sized
> files nor hundreds of files, this optimization really isn't worth the
> overhead in maintenance.

This change will probably kill your performance as the file format gets
new features which it will.  I could be wrong but there is not going to
be much faster token look up than integers.  Before you decide this
optimization isn't worth it, you need to get an 8 layer+ high density
board file from someone and do some testing.
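
To show what an integer token lookup amounts to, here is a minimal sketch: each keyword string is mapped to an enum once, with a binary search over a sorted table, and everything after that can switch on the integer. The token names are illustrative ones borrowed from fp-lib-table, and FindToken is a hypothetical function, not the existing lexer code.

```cpp
#include <algorithm>
#include <cstring>

// Illustrative token set; a real keyword list would be much longer.
enum TOKEN { T_UNKNOWN = -1, T_descr, T_lib, T_name, T_options, T_type, T_uri };

struct KEYWORD
{
    const char* name;
    TOKEN       token;
};

// Must stay sorted by name, since the lookup is a binary search.
static const KEYWORD keywords[] = {
    { "descr", T_descr },     { "lib", T_lib },   { "name", T_name },
    { "options", T_options }, { "type", T_type }, { "uri", T_uri },
};

// Hypothetical lookup: O(log n) string compares per token, instead of
// walking an if/else chain of up to n compares.
TOKEN FindToken( const char* aText )
{
    const KEYWORD* end = keywords + sizeof( keywords ) / sizeof( keywords[0] );

    const KEYWORD* it = std::lower_bound( keywords, end, aText,
            []( const KEYWORD& aKeyword, const char* aKey )
            {
                return std::strcmp( aKeyword.name, aKey ) < 0;
            } );

    if( it != end && std::strcmp( it->name, aText ) == 0 )
        return it->token;

    return T_UNKNOWN;
}
```

With a handful of keywords the difference from an if/else chain is unlikely to be measurable; it only starts to matter as the keyword list grows, which is why testing on a large board file is the way to settle it.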

> 5. Generate saved files from in memory tree structures, this will
> avoid all possible formatting irregularities and differences because
> someone handwrote unrolling all the data members.

I need to see how you're doing this because keeping both the memory tree
and the board objects in memory doesn't make a lot of sense to me.

> 6. Avoid things like " ${KIGITHUB}/Air_Coils_SML_NEOSID.pretty" being
> defined as a symbol instead of a string in the future.
> 7. Explicit definitions of symbols and strings. Strings are always
> quoted. Period. No silly auto-detection logic.
> 
> 
> So my first goal is to have 1:1 parity with the existing stuff for
> kicad files for both reading and writing.
> 
> 
> Benchmarks:
> Old fp-lib-table read: 1ms
> New fp-lib-table read: 2ms
> 
> Here's the actual commit for fp-lib-table:
> https://github.com/marekr/kicad-sexpr/commit/9367f469be69962d14671411eddd6fd759ace1f2
> 
> Not expecting anyone to compile it or anything, more for input than
> anything. Yes, it's messy as it's an initial proof of concept.
> 
> 
> 
> Yes, string parsing and writing isn't escaping properly, TBD (easy, just 
> lazy).
> There would probably be a smaller kicad_sexpr wrapper to implement
> common sexpr pattern helpers such as the "list key-value pair" that's
> used.
> 
> 
> 
> Also played around with .kicad_pcb but it's not committed:
> Old read: ~350ms
> New read: ~230ms
> 
> 
> I await the abuse.
> 

Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-17 Thread Mark Roszko
I just saw this:
http://i.imgur.com/H8VxD3Z.png

storing a bitmap as hex values.

Next proposal item to add: storing binary data(bitmaps) as base64
quoted strings like every other browser, application and tool.



Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-17 Thread Chris Pavlina
Dude. The way it's stored currently is horrible - it's not congruent to 
the structure of the file! The s-expr file is supposed to be a tree 
structure, why is the binary data stored broken into multiple objects 
like that? It's yet another facet of the parsing nightmare we have.

If you're going to use a "standard" format like s-expr, you should 
actually understand it and use it the way it's meant to be used.

On Thu, Dec 17, 2015 at 11:46:31PM -0500, tiger12506 wrote:
> Not disagreeing with you on this one, but I would have to question why...
> 
> Why change this when what is already there means something more than what
> you're proposing...
> KiCad doesn't have to do internally what "every other browser, application,
> and tool" does, if it doesn't help anything.
> Not sure how storing base64 would help.
> 
> *backs away slowly*
> 
> On 12/17/2015 11:20 PM, Mark Roszko wrote:
> >I just saw this:
> >http://i.imgur.com/H8VxD3Z.png
> >
> >storing a bitmap as hex values.
> >
> >Next proposal item to add: storing binary data(bitmaps) as base64
> >quoted strings like every other browser, application and tool.
> >


Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-17 Thread Mark Roszko
>Not disagreeing with you on this one, but I would have to question why...

Did you skip the why section?


Because we are supposed to have a standard file parser and reader that
is maintainable. What there is now is a micro-optimization obsessed VB6
parser. There's a keyword enum pre-generation that just adds pointless
back and forth (a class with const strings is more than enough). If we
are reading "s-expressions" then there should be a common output,
every single kicad file writer shouldn't be implementing its own
interpretation of raw tokens from the files. When you have a standard
API from C++ objects to file, then you can ensure 100% sanity across
all files. Right now there can easily be one-off errors because
someone forgot a token or made assumptions.

Also writing files is insane. Every "file writer" inherits or creates
its own methods to write the file output with what is basically printf
statements.

Stuff like this:

m_out->Print( nestLevel, "(%s", getTokenName( T_setup ) );
m_out->Print( 0, "(textsize %s %s)",
              double2Str( WORKSHEET_DATAITEM::m_DefaultTextSize.x ).c_str(),
              double2Str( WORKSHEET_DATAITEM::m_DefaultTextSize.y ).c_str() );
m_out->Print( 0, "(linewidth %s)",
              double2Str( WORKSHEET_DATAITEM::m_DefaultLineWidth ).c_str() );
m_out->Print( 0, "(textlinewidth %s)",
              double2Str( WORKSHEET_DATAITEM::m_DefaultTextThickness ).c_str() );
m_out->Print( 0, "\n" );


is just extremely silly and extra work compared to generating SEXPR
trees in memory, which is what SEXPR represents in the first place.
God forbid you accidentally format that double wrong.




>Not sure how storing base64 would help.
Namely file size savings and sanity. Instead of all those ridiculous X
number of hex bytes on each line split with spaces, you have a single
quoted entry that's base64-encoded.
And you can actually manually edit that, god forbid someone wants to
do that then.
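
To put rough numbers on the size argument: hex bytes separated by spaces cost 3 characters per byte, while base64 costs 4 characters per 3 bytes, so about 1.33 per byte. Below is a minimal sketch of a standard base64 encoder; Base64Encode is a hypothetical name here, and an existing library routine would do the same job.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Standard base64 (RFC 4648 alphabet, '=' padding): every 3 input bytes
// become 4 output characters.
std::string Base64Encode( const std::vector<unsigned char>& aData )
{
    static const char table[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string out;
    out.reserve( ( aData.size() + 2 ) / 3 * 4 );

    for( std::size_t i = 0; i < aData.size(); i += 3 )
    {
        bool has2 = i + 1 < aData.size();
        bool has3 = i + 2 < aData.size();

        // Pack up to 3 bytes into a 24-bit group, missing bytes are zero.
        unsigned long chunk = (unsigned long) aData[i] << 16;

        if( has2 )
            chunk |= (unsigned long) aData[i + 1] << 8;

        if( has3 )
            chunk |= aData[i + 2];

        // Emit four 6-bit indices, padding with '=' where input ran out.
        out += table[( chunk >> 18 ) & 0x3F];
        out += table[( chunk >> 12 ) & 0x3F];
        out += has2 ? table[( chunk >> 6 ) & 0x3F] : '=';
        out += has3 ? table[chunk & 0x3F] : '=';
    }

    return out;
}
```

For a 3000-byte bitmap that is 4000 characters as one quoted string, versus roughly 9000 characters as space-separated hex pairs.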



Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-17 Thread tiger12506

Ok. I see. Sorry for the noise.


> > Not sure how storing base64 would help.
>
> Namely file size savings and sanity. Instead of all those ridiculous X
> number of hex bytes on each line split with spaces, you have a single
> quoted entry that's a base64ed.
> And you can actually manually edit that, god forbid someone wants to
> do that then.





Re: [Kicad-developers] [rfc] actual sexpression parsing

2015-12-17 Thread Chris Pavlina
It's not the base64 that's important, it's the structure. You can pick 
whatever encoding you like, base64 is just very common as a relatively 
dense but still text-safe one.

On Fri, Dec 18, 2015 at 12:06:25AM -0500, tiger12506 wrote:
> Ok. I see. Sorry for the noise.
> 
> 
> Not sure how storing base64 would help.
> 
> >Namely file size savings and sanity. Instead of all those ridiculous X
> >number of hex bytes on each line split with spaces, you have a single
> >quoted entry that's a base64ed.
> >And you can actually manually edit that, god forbid someone wants to
> >do that then.
> 
> 