Re: [Kicad-developers] [rfc] actual sexpression parsing
Hi,

On 21.12.2015 08:35, Lorenzo Marcantonio wrote:

> After this I'm *not* advocating a binary format, there would be no
> advantage for it in pcbnew. There are no performance restraints in
> either speed or space for kicad files so just use whatever is easier for
> you to manage (i.e. structured text)

The main advantage of text files is that you can fix them with a text editor when stuff breaks, because there are no length fields to update consistently. I'd consider that a major plus.

   Simon

___
Mailing list: https://launchpad.net/~kicad-developers
Post to     : kicad-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kicad-developers
More help   : https://help.launchpad.net/ListHelp
Re: [Kicad-developers] [rfc] actual sexpression parsing
Hi David

This discussion about binary formats was just a side comment from someone and is off topic here. Please use your energy on something useful, like commenting on Roszko's patch instead, which is about making a saner sexpr parser than the one we have in KiCad at the moment. Please try to be constructive.

2015-12-20 4:51 GMT+01:00 David Godfrey:

> Hi,
>
> I don't often post here, but I will add my ~30 years of experience with
> binary file formats.
> THEY ARE EVIL EVIL THINGS !!
>
> They tend to be extremely fragile, any change and something is likely to
> break.
> They require parsers that suck to maintain
> In most cases general purpose parsers *CAN NOT* be used, requiring a custom
> parser.
> it is *VERY* difficult to keep compatibility with an older version of the
> format.
> Essentially you need to keep multiple copies of the parser and have some
> form of VersionID that can be used to select between parsers.
> To all intents it is *IMPOSSIBLE* to handle forward compatibility.
> ie: open a file with version 1.1 that was written with version 1.1a
> A permanent and accurate copy of documentation is required to be maintained
> for *EVERY* version of the parser ever written
> Parser VersionID magic must uniquely identify the parser version and be the
> first entry in the file.
> NOTE: other locations are possible, but cause significant problems with
> extracting the version
> The more involved the protocol the worse all of these issues get.
> Documentation is hard to write, and even harder to read and understand
> sufficiently to code against.
> Debugging read/write issues is an absolute nightmare.
> You either have to provide a reference to test against by either
>
> hand encode or decode a file
> have an independent programmer write a "clean room" implementation of the
> parser
> This requires the previously mentioned documentation to be correct for this
> version
> It can introduce additional bugs (you now have 2 parsers to debug)
>
> Many potential developers of addons or core code may
>
> be scared off by the complexity
> accidentally introduce bugs or incompatibilities that don't show up until a
> user loses data
>
> Inadvertently force full or partial Vendor Lock In.
> If a protocol (file format) is
>
> too hard to implement
> too hard to keep up with changes
> too prone to breakage (fragile)
> too obscure (poorly documented or the documentation is not easily available)
>
> Then other vendors won't put in the effort to support the protocol (file
> format)
>
> I could go on with a lot more points but you get the picture.
>
> Binary file formats made a lot of sense back in the days of
>
> low speed serial communications
> very small storage devices
> expensive data links (anyone remember gprs @ 10c per kilo byte?)
>
> Expensive GPRS led to WAP which was (roughly) a tokenised binary HTML/XML
> It was not widely used due to some of the above problems, and it also scared
> off content developers
>
> memory constrained systems like
>
> Old PC's (486 and earlier)
> old microcontrollers were short on memory and clock speed
> some modern microcontrollers can be affected here, but only the lower end
> ones
>
> slow radio links (bluetooth and wifi are not slow)
>
> They do make sense when used internally to a compression algorithm or an
> encryption scheme.
> They don't make sense in (almost) all other cases today.
> (No flames here please, I am not trolling, just making a very generalized
> statement)
>
>
> For a format like .png binary is not so bad, the format is unlikely to
> change much if ever (even so there are at least 6 different png formats out
> there, some of them ascii and some binary), but it is a simple format to
> describe and implement. Any program that works with png's needs to implement
> support for all versions of png, or clearly explain to the user what is and
> is not supported with appropriate error messages. Any png programs that
> don't support all versions tend to fall into disuse and die a slow death.
>
> Text based formats on the other hand, if well designed, are (to a fair
> extent)
>
> human readable
> self documenting (although good documentation is *ALWAYS* recommended)
> general purpose parsers can be used allowing our code to focus on using the
> information, not extracting it
> tolerant of extra nodes (features)
> tolerant of missing nodes
> often a newer version of a parser can parse older versions of a file without
> problem.
> It is trivial for a newer version to cater for specific version based
> variations in a protocol.
> If the protocol is well designed new versions will only add nodes to the
> previous version, never alter the way data is stored.
>
> This allows a newer parser to parse any historical version file.
> In almost all cases an older version parser can parse a newer file, but
>
> any nodes that are unknown will just be skipped.
> Skipped nodes may or may not be a problem for the way a user sees the result,
> but there should be a
Re: [Kicad-developers] [rfc] actual sexpression parsing
Hi,

I don't often post here, but I will add my ~30 years of experience with binary file formats.
THEY ARE EVIL EVIL THINGS !!

- They tend to be extremely fragile; any change and something is likely to break.
- They require parsers that suck to maintain.
- In most cases general purpose parsers *CAN NOT* be used, requiring a custom parser.
- It is *VERY* difficult to keep compatibility with an older version of the format.
  - Essentially you need to keep multiple copies of the parser and have some form of VersionID that can be used to select between parsers.
- To all intents it is *IMPOSSIBLE* to handle forward compatibility.
  - ie: open a file with version 1.1 that was written with version 1.1a
- A permanent and accurate copy of documentation is required to be maintained for *EVERY* version of the parser ever written.
- Parser VersionID magic must uniquely identify the parser version and be the first entry in the file.
  - NOTE: other locations are possible, but cause significant problems with extracting the version.
- The more involved the protocol the worse all of these issues get.
- Documentation is hard to write, and even harder to read and understand sufficiently to code against.
- Debugging read/write issues is an absolute nightmare.
  - You have to provide a reference to test against by either
    - hand encoding or decoding a file
    - having an independent programmer write a "clean room" implementation of the parser
      - This requires the previously mentioned documentation to be correct for this version.
      - It can introduce additional bugs (you now have 2 parsers to debug).
- Many potential developers of addons or core code may
  - be scared off by the complexity
  - accidentally introduce bugs or incompatibilities that don't show up until a user loses data
- Inadvertently force full or partial Vendor Lock In.
  - If a protocol (file format) is
    - too hard to implement
    - too hard to keep up with changes
    - too prone to breakage (fragile)
    - too obscure (poorly documented or the documentation is not easily available)
  - Then other vendors won't put in the effort to support the protocol (file format).

I could go on with a lot more points but you get the picture.

Binary file formats made a lot of sense back in the days of

- low speed serial communications
- very small storage devices
- expensive data links (anyone remember gprs @ 10c per kilo byte?)
  - Expensive GPRS led to WAP which was (roughly) a tokenised binary HTML/XML.
  - It was not widely used due to some of the above problems, and it also scared off content developers.
- memory constrained systems like
  - Old PC's (486 and earlier)
  - old microcontrollers, which were short on memory and clock speed
  - some modern microcontrollers can be affected here, but only the lower end ones
- slow radio links (bluetooth and wifi are not slow)

They do make sense when used internally to a compression algorithm or an encryption scheme.
They don't make sense in (almost) all other cases today.
(No flames here please, I am not trolling, just making a very generalized statement)

For a format like .png binary is not so bad, the format is unlikely to change much if ever (even so there are at least 6 different png formats out there, some of them ascii and some binary), but it is a simple format to describe and implement. Any program that works with png's needs to implement support for all versions of png, or clearly explain to the user what is and is not supported with appropriate error messages. Any png programs that don't support all versions tend to fall into disuse and die a slow death.

Text based formats on the other hand, if well designed, are (to a fair extent)

- human readable
- self documenting (although good documentation is *ALWAYS* recommended)
- general purpose parsers can be used, allowing our code to focus on using the information, not extracting it
- tolerant of extra nodes (features)
- tolerant of missing nodes
- often a newer version of a parser can parse older versions of a file without problem.
  - It is trivial for a newer version to cater for specific version based variations in a protocol.
  - If the protocol is well designed new versions will only add nodes to the
Re: [Kicad-developers] [rfc] actual sexpression parsing
On Fri, Dec 18, 2015 at 12:00:58AM -0500, Mark Roszko wrote:

> is just extremely silly and extra work compared to generate SEXPR
> trees in memory like SEXPR represents in the first place. God forbid
> you accidentally format that double wrong.

D'oh you actually want a full in-memory tree representation.. the full lisp way (made of cons cells, obviously :D)

> Namely file size savings and sanity. Instead of all those ridiculous X

Well these are trivial IMHO. The biggest horror for me is splitting one object in many data forms. I couldn't care less if the result is (data aa bb cc) or (data "asdf==")... of course the second one is better. Even better if you tagged it with a macro character like (data {asdf==}) so that the reader could know that {} denotes a base64 encoded blob (the {} is not lisp, in the '80s there was no base64 :P).

--
Lorenzo Marcantonio
CZ Srl - Parma
Re: [Kicad-developers] [rfc] actual sexpression parsing
On Thu, Dec 17, 2015 at 11:49:31PM -0500, Chris Pavlina wrote:

> Dude. The way it's stored currently is horrible - it's not congruent to
> the structure of the file! The s-expr file is supposed to be a tree
> structure, why is the binary data stored broken into multiple objects
> like that? It's yet another facet of the parsing nightmare we have.
>
> If you're going to use a "standard" format like s-expr, you should
> actually understand it and use it the way it's meant to be used.

Yep you wouldn't need many data forms... why not a single big one? Oh yeah, because it uses *strings* for that, and it couldn't newline in them :D

For the curious, common lisp sexp syntax for arrays is like this:

#(1 2 3 4) ; 1 2 3 4 are the elements, of various type

...of course you could simply say

(data 89 50 4E 47 ...)

since the types are currently hardcoded... it works that way for the timestamps.

--
Lorenzo Marcantonio
CZ Srl - Parma
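A minimal sketch of the single-form idea above, where a blob is written as one `(data 89 50 4E 47 ...)` list of hex bytes rather than being split across several forms. The names `encodeData` and `decodeData` are hypothetical and not part of KiCad; the two-digit uppercase hex layout is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: write a byte blob as one form, e.g. (data 89 50 4E 47).
std::string encodeData(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out = "(data";
    for (std::uint8_t b : bytes) {
        out += ' ';
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    out += ')';
    return out;
}

// Hypothetical helper: read the bytes back from a "(data XX XX ...)" form.
std::vector<std::uint8_t> decodeData(const std::string& form) {
    if (form.size() < 6 || form.compare(0, 5, "(data") != 0 || form.back() != ')')
        throw std::invalid_argument("not a (data ...) form");

    // Everything between "(data" and the closing ')' is whitespace-separated hex.
    std::istringstream tokens(form.substr(5, form.size() - 6));
    std::vector<std::uint8_t> bytes;
    std::string tok;
    while (tokens >> tok) {
        if (tok.size() != 2 ||
            !std::isxdigit(static_cast<unsigned char>(tok[0])) ||
            !std::isxdigit(static_cast<unsigned char>(tok[1])))
            throw std::invalid_argument("expected two hex digits, got: " + tok);
        unsigned long value = std::stoul(tok, nullptr, 16);
        assert(value <= 0xFF);  // two hex digits always fit in one byte
        bytes.push_back(static_cast<std::uint8_t>(value));
    }
    return bytes;
}
```

Because the whole blob lives in one list, a tree-building reader hands it over as a single node and the consumer does not have to stitch several `data` forms back together.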
Re: [Kicad-developers] [rfc] actual sexpression parsing
On Thu, Dec 17, 2015 at 10:32:10PM -0500, Mark Roszko wrote:

> So awhile back, Wayne said to use sexpr for something I wanted to do.
> Then I looked at the sexpr parsing and said NOPE.

OK, being a lisper and having read the dragon book here's my view on the subject:

1) the parser isn't actually bad, it's mostly a recursive descent i.e. it expects the *actual* grammar to be generated. You take the BNF form, process it and write the corresponding state machine. It's not VB6, it's a state machine :D So you tailor a parser for a specific grammar; if you do it by hand good luck if the grammar changes (there are tools like yacc that generate parsers but that's beyond the scope)

2) Given the point before it doesn't actually do sexps because it needs the type in the grammar beforehand. Given:

(module smd:C0603 (layer F.Cu) (tedit 52108AB5) (tstamp 558BEA9B) ...)

The parser *must* know that module, layer, tedit and tstamp are atoms, and where the strings and hex values are placed. So the grammar should be (ignoring the fact that tedit and tstamp forms are optional):

(module (layer ) (tedit ) (tstamp ) ...)

Any true sexp form has types self-evident (exactly how depends on the lisp flavour you're using, in common lisp reader macros define the behaviour), so the reader (the tokenizer in two piece parsers) can parse it WITHOUT knowing the grammar beforehand. The common lisp way (actually any lisp except for the hex numbers) would be:

(module "smd:C0603" (layer "F.Cu") (tedit #x52108AB5) (tstamp #x558BEA9B) ...)

...actually I adapted the pcbnew code (it's about a ten line diff :P) to emit this

(module "smd:C0603" (layer "F.Cu") (tedit 52108AB5) (tstamp 558BEA9B) ...)

...which is fully backward compatible with the 'official' kicad parser. Just quote every string; it also gets correctly font locked in emacs :D Hex values are only used for these fields AFAIK, I didn't special case them in kicad but in lisp...

3) Using an XML similitude, usual sexp processing in lisp follows something like a DOM model: 'read' (that's the actual function name) pulls up the whole sexp in memory (yes the whole file, in this case! there is a way to do partial processing but it's quite advanced and you need to temporarily rebind the system reader) and then you process it with your favourite list mangler; too bad you would need a full lisp environment to do it in pcbnew :P However something like SAX would be quite easy to implement; events would be 'start of list' 'end of list' 'atom' 'string' 'number' and such

4) Even in lisp *you have parser generators*! They've been used for something like 40 years for, like, everything. Kicad has something for keywords but the grammar is still hand coded. That's the major flaw, IMHO

As for the performance issues, I think these are non-existent. It's I/O code and it's done only once every ten minutes, if you work like me... probably the kernel works a lot more to handle buffers and schedule the disk I/O than pcbnew to form or decode the sexps. OTOH Dick is famous for micromanaging performance stuff (like not checking types with dynamic_cast because it's more expensive than reading the object type tag :P), so that would be a given :D

--
Lorenzo Marcantonio
CZ Srl - Parma
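The SAX-like approach from point 3 could look roughly like the sketch below: a single scanner walks the text once and reports 'start of list', 'end of list', 'symbol', 'string' and 'number' events to a handler. The `SexprHandler` interface, its `on...` callbacks, `TraceHandler` and `scanSexpr` are hypothetical names, not KiCad code. Telling numbers from symbols by the first character is a simplifying assumption that only holds when strings are always quoted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical event interface, one callback per event kind.
struct SexprHandler {
    virtual ~SexprHandler() = default;
    virtual void onStartList() = 0;
    virtual void onEndList() = 0;
    virtual void onSymbol(const std::string& text) = 0;
    virtual void onString(const std::string& text) = 0;
    virtual void onNumber(const std::string& text) = 0;
};

// Scans the whole input once and fires events; no tree is kept in memory.
void scanSexpr(const std::string& src, SexprHandler& handler) {
    std::size_t pos = 0;
    int depth = 0;
    while (pos < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[pos]);
        if (std::isspace(c)) {
            ++pos;
        } else if (c == '(') {
            ++depth; ++pos;
            handler.onStartList();
        } else if (c == ')') {
            if (depth == 0) throw std::runtime_error("unbalanced ')'");
            --depth; ++pos;
            handler.onEndList();
        } else if (c == '"') {
            std::string text;
            ++pos;  // consume opening quote
            while (pos < src.size() && src[pos] != '"') {
                if (src[pos] == '\\' && pos + 1 < src.size()) ++pos;  // take next char literally
                text += src[pos++];
            }
            if (pos >= src.size()) throw std::runtime_error("unterminated string");
            ++pos;  // consume closing quote
            handler.onString(text);
        } else {
            std::size_t start = pos;
            while (pos < src.size() && !std::isspace(static_cast<unsigned char>(src[pos])) &&
                   src[pos] != '(' && src[pos] != ')' && src[pos] != '"')
                ++pos;
            assert(pos > start);  // c was not a delimiter, so at least one char was consumed
            std::string text = src.substr(start, pos - start);
            bool numeric = std::isdigit(c) || c == '+' || c == '-' || c == '.';
            if (numeric) handler.onNumber(text); else handler.onSymbol(text);
        }
    }
    if (depth != 0) throw std::runtime_error("missing ')'");
}

// Example handler that records the event stream as text.
struct TraceHandler : SexprHandler {
    std::string trace;
    void onStartList() override { trace += "[ "; }
    void onEndList() override { trace += "] "; }
    void onSymbol(const std::string& t) override { trace += "sym:" + t + " "; }
    void onString(const std::string& t) override { trace += "str:" + t + " "; }
    void onNumber(const std::string& t) override { trace += "num:" + t + " "; }
};
```

Feeding `(layer "F.Cu" 1.5)` to `scanSexpr` with a `TraceHandler` leaves `[ sym:layer str:F.Cu num:1.5 ] ` in `trace`. The scanner itself never needs to know the grammar; all knowledge of `layer` lives in the handler.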
Re: [Kicad-developers] [rfc] actual sexpression parsing
>You take the BNF form, process it and write the corresponding state machine.
>It's not VB6, it's a state machine :D

Writing a new state machine for every single list and every single file over and over again is the part I have problems with. There should be a single state machine that takes the tokens and gives you a list. Not 500 over the whole codebase.

>Well these are trivial IMHO. The biggest horror for me is splitting one object in many data forms.

The definition of sanity is not splitting it into many data forms.

>3) Using an XML similitude, usual sexp processing in lisp follows something like a DOM model

Yea that was the plan when I structured my end result. Walking it later is trivial.

I'm more for manual walking of the lists after the fact than trying to use an event based one. I don't see a benefit really and rather see it increase complexity with needing callback classes when manual unrolling should work fairly well BUT I am not exactly happy with how manual unrolling looks, so it's something to play with.
Re: [Kicad-developers] [rfc] actual sexpression parsing
On Fri, Dec 18, 2015 at 07:55:07AM -0500, Mark Roszko wrote:

> Writing a new state machine for every single list and every single
> file over and over again is the part I have problems with. There
> should be a single state machine that takes the tokens and gives you a
> list. Not 500 over the whole codebase.

Also, eventually re-read the part about parser generation. And think about grammar changes...

> The definition of sanity is not splitting it into many data forms.

It's nonetheless a 'curious' engineering approach :D

> >3) Using an XML similitude, usual sexp processing in lisp follows
> >something like a DOM model
>
> Yea that was the plan when I structured my end result. Walking it
> later is trivial.

I'd suggest to use a proper list/vector container instead of the cons approach (it was meant to be a joke). Cons handling is trickier without the lisp runtime at hand :D

In pseudo-BNF

list :- sequence-of list-element
list-element :- one-of(symbol, string, number, whatever, list)

The sequence-of could be a vector of base pointers using push_back, the one-of is obviously modeled with inheritance (if it were C a union would be fine...).

As for the lexing strategy: the traditional lisp reader has *no* lookahead and dispatches on the first character:

- '(' starts a list
- [0123456789.+-] starts a number
- '"' starts a string
- a letter starts a symbol
- whitespace is eaten
- other characters trigger specific behaviour (like the '#' main macro character)

*if* you want to keep string quoting optional then you can't distinguish a string from a symbol (because it depends on the semantic grammar which the reader doesn't have access to). Then you have to match keywords as strings, not elegant but doable.

> I'm more for manual walking of the lists after the fact than trying to
> use an event based one. I don't see a benefit really and rather see it
> increase complexity with needing callback classes when manual
> unrolling should work fairly well BUT i am not exactly happy with
> manual unrolling looks so its something to play with.

Given the relatively low amount of data to process a DOM approach is quite feasible. Keep an iterator on the current list handy and loop away. There are plenty of matching/binding/unifying/destructuring methods to use when you have the whole list already in core. Personally I would use a recursive descent driven by the tree elements (*not* directly by the input file, as it is now); it should be the easiest to do by hand.

--
Lorenzo Marcantonio
CZ Srl - Parma
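The pseudo-BNF and the first-character dispatch described in this message can be sketched as a small generic reader that builds an in-memory tree, which is then walked by hand. Everything here (`Node`, `Reader`, `findForm`) is a hypothetical illustration and not the code from the patch under discussion. A tagged struct is used in place of the inheritance hierarchy to keep the sketch short, and quoted strings are assumed to be mandatory so that the reader needs no grammar knowledge.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// list-element :- one-of(symbol, string, number, list)
struct Node {
    enum class Kind { Symbol, String, Number, List };
    Kind kind;
    std::string text;            // raw text of a symbol, string or number
    std::vector<Node> children;  // list :- sequence-of list-element
};

class Reader {
public:
    explicit Reader(std::string input) : src_(std::move(input)) {}

    // Reads one element, dispatching on its first character (no lookahead).
    Node read() {
        skipSpace();
        if (pos_ >= src_.size()) throw std::runtime_error("unexpected end of input");
        char c = src_[pos_];
        if (c == '(') return readList();
        if (c == ')') throw std::runtime_error("unexpected ')'");
        if (c == '"') return readString();
        return readAtom();
    }

private:
    static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    void skipSpace() { while (pos_ < src_.size() && isSpace(src_[pos_])) ++pos_; }

    Node readList() {
        Node list{Node::Kind::List, {}, {}};
        ++pos_;  // consume '('
        for (;;) {
            skipSpace();
            if (pos_ >= src_.size()) throw std::runtime_error("missing ')'");
            if (src_[pos_] == ')') { ++pos_; return list; }
            list.children.push_back(read());
        }
    }

    Node readString() {
        Node str{Node::Kind::String, {}, {}};
        ++pos_;  // consume opening quote
        while (pos_ < src_.size() && src_[pos_] != '"') {
            if (src_[pos_] == '\\' && pos_ + 1 < src_.size()) ++pos_;  // take next char literally
            str.text += src_[pos_++];
        }
        if (pos_ >= src_.size()) throw std::runtime_error("unterminated string");
        ++pos_;  // consume closing quote
        return str;
    }

    // Unquoted token, classified by its first character only. A hex stamp such as
    // 558BEA9B therefore lands in Number with its raw text preserved.
    Node readAtom() {
        std::size_t start = pos_;
        while (pos_ < src_.size() && !isSpace(src_[pos_]) && src_[pos_] != '(' &&
               src_[pos_] != ')' && src_[pos_] != '"')
            ++pos_;
        assert(pos_ > start);  // read() only calls this on a non-delimiter character
        std::string text = src_.substr(start, pos_ - start);
        char c = text[0];
        bool numeric = std::isdigit(static_cast<unsigned char>(c)) || c == '+' || c == '-' || c == '.';
        return Node{numeric ? Node::Kind::Number : Node::Kind::Symbol, text, {}};
    }

    std::string src_;
    std::size_t pos_ = 0;
};

// Manual walking: returns the first child form whose head symbol matches, or nullptr.
const Node* findForm(const Node& list, const std::string& head) {
    for (const Node& child : list.children)
        if (child.kind == Node::Kind::List && !child.children.empty() &&
            child.children[0].kind == Node::Kind::Symbol && child.children[0].text == head)
            return &child;
    return nullptr;
}
```

With the tree in memory, a recursive descent "driven by the tree elements" reduces to calls such as `findForm(module, "layer")` followed by a check of the kinds of the children, and the same `Reader` serves every file type.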
Re: [Kicad-developers] [rfc] actual sexpression parsing
On Fri, Dec 18, 2015 at 3:49 PM, Tomasz Wlostowski <tomasz.wlostow...@cern.ch> wrote:

> On 18.12.2015 15:46, Edwin van den Oetelaar wrote:
> > Concerning changing the format of the PCB file again...
> > Making a new binary file format is a big NO NO NO (screaming) in my book.
>
> Hi Edwin,
>
> Don't worry, we are not going to change the format :)
>
> > I want to be able to edit it with VIM if needed.
>
> See, this is my point :) Why implement a new feature in pcbnew (so that
> everybody can use it) if you can edit the .kicad_pcb with VIM...

It is not that I want to do it, but I want to Be Able To Do It when I have to.
I also have a HAM license for when phone, cellular and internet go dead...
I also have a large battery backup... just in case..

Thanks for working on the project, you do good work.

Greetings,
Edwin

> Best,
> Tom
Re: [Kicad-developers] [rfc] actual sexpression parsing
On Fri, Dec 18, 2015 at 10:40:44AM -0500, Mark Roszko wrote:

> Simply trying to make parsing simple and easy of sexpr. No file format
> change required.

Still keep proposing mandatory quotes for strings :D completely backward compatible.

> Yes we need a parser especially with some of our nuisances of allowing
> UTF8 in places (most parsers I've seen only allow ASCII and other
> funnyness, and SPECTRA which is sexpr-like is a different take). I
> wrote a parser that's simple to maintain in tree and can be extended
> to handle things easily.

Can of worms warning! Be careful since going out-of-ASCII is not so simple for some formats. Case in point, IPC is ASCII strict and SPECCTRA has its own things for quoting. IIRC there are special cases in the sexp code for handling SPECCTRA idiosyncrasies.

> I don't quite understand the point about "generic" formats not needing
> parsers. All formats need dedicated parsers, unless you were written
> in Javascript and you eval(JSON).

Or (read) for lisp sexps... if it isn't dedicated why is libxml2 so big? :D

Also generators. printf doesn't quite feel right. In C++ << is overloaded for streams and called the 'insertion operator'; it would fit into place for building sexps.

--
Lorenzo Marcantonio
CZ Srl - Parma
Re: [Kicad-developers] [rfc] actual sexpression parsing
Concerning changing the format of the PCB file again...

Just my $0.02 ... I have been lurking on this list for years now...

Making a new binary file format is a big NO NO NO (screaming) in my book.
I want to be able to edit it with VIM if needed.
I personally do not care if the format is s-expression or Json or XML as long as it can be read and changed by humans and text editors and parsers.
This was one of the reasons to leave all these proprietary binary formats behind and start using KiCad.

I see no reason why parsing a text based format can not work; possibly it is less work to fix any issues than to start all over again with yet another file format (like there are not enough file formats in the PCB industry already).

Just make the thing work as designed now, do not start another project which brings nothing but trouble.

Thanks for listening,
Edwin van den Oetelaar

On Fri, Dec 18, 2015 at 3:23 PM, Tomasz Wlostowski <tomasz.wlostow...@cern.ch> wrote:

> On 18.12.2015 13:55, Mark Roszko wrote:
> > Writing a new state machine for every single list and every single
> > file over and over again is the part I have problems with. There
> > should be a single state machine that takes the tokens and gives you a
> > list. Not 500 over the whole codebase.
>
> Hi Mark,
>
> May I add my 5 cents to this discussion...
>
> - My only big concern with the current parser so far is the lexer code
> generation done by CMake. I'm not a big fan of making scripts that
> produce code which then gets compiled...
>
> Anyway, since we got here, I have a devilish idea: why a text format at
> all? Of course we are not going to change the format again (we have a
> lot of more exciting things to do ;-), but to point out a few reasons:
>
> - PCB files (as opposed to netlists) represent graphical objects. For
> someone looking at a PCB file with a text editor, it's just meaningless
> numbers. Diffs look equally horrible (just look at a diff between two
> .kicad_pcb files after moving a couple of traces in P mode).
>
> - binary formats are generally easier to parse (unless someone made it
> deliberately difficult - but it's not our case). We could just serialize
> objects directly to a binary file, along with some version info (think
> of Google's protobuf). This would also let us implement introspection
> (e.g. a property editor tool) for no extra cost.
>
> - Our s-expr format needs a custom parser (correct me if I'm wrong),
> which contradicts the idea that generic text formats (e.g. json/lisp
> s-expr, even lua/python arrays) need no dedicated parsers.
>
> - Last but not least (and surely not least controversial): file format
> that is easy to hand-edit enables people to hack scripts that do stuff
> that Kicad is currently missing. This is an advantage for some (perhaps
> more advanced) users who can work around missing features this way, but
> is it really beneficial for Kicad as a complete PCB design tool? If the
> easiest way is to hack a PCB file with a perl script/text editor, what
> motivation is left to implement the missing feature in Kicad?
>
> Cheers,
> Tom
Re: [Kicad-developers] [rfc] actual sexpression parsing
On 18.12.2015 15:46, Edwin van den Oetelaar wrote:

> Concerning changing the format of the PCB file again...
> Making a new binary file format is a big NO NO NO (screaming) in my book.

Hi Edwin,

Don't worry, we are not going to change the format :)

> I want to be able to edit it with VIM if needed.

See, this is my point :) Why implement a new feature in pcbnew (so that everybody can use it) if you can edit the .kicad_pcb with VIM...

Best,
Tom
Re: [Kicad-developers] [rfc] actual sexpression parsing
On 18.12.2015 13:55, Mark Roszko wrote:

> Writing a new state machine for every single list and every single
> file over and over again is the part I have problems with. There
> should be a single state machine that takes the tokens and gives you a
> list. Not 500 over the whole codebase.

Hi Mark,

May I add my 5 cents to this discussion...

- My only big concern with the current parser so far is the lexer code generation done by CMake. I'm not a big fan of making scripts that produce code which then gets compiled...

Anyway, since we got here, I have a devilish idea: why a text format at all? Of course we are not going to change the format again (we have a lot more exciting things to do ;-), but to point out a few reasons:

- PCB files (as opposed to netlists) represent graphical objects. For someone looking at a PCB file with a text editor, it's just meaningless numbers. Diffs look equally horrible (just look at a diff between two .kicad_pcb files after moving a couple of traces in P mode).

- binary formats are generally easier to parse (unless someone made it deliberately difficult - but it's not our case). We could just serialize objects directly to a binary file, along with some version info (think of Google's protobuf). This would also let us implement introspection (e.g. a property editor tool) for no extra cost.

- Our s-expr format needs a custom parser (correct me if I'm wrong), which contradicts the idea that generic text formats (e.g. json/lisp s-expr, even lua/python arrays) need no dedicated parsers.

- Last but not least (and surely not least controversial): file format that is easy to hand-edit enables people to hack scripts that do stuff that Kicad is currently missing. This is an advantage for some (perhaps more advanced) users who can work around missing features this way, but is it really beneficial for Kicad as a complete PCB design tool? If the easiest way is to hack a PCB file with a perl script/text editor, what motivation is left to implement the missing feature in Kicad?

Cheers,
Tom
Re: [Kicad-developers] [rfc] actual sexpression parsing
So any actual comments on what I did in the commit I linked originally in the first email?

>Still keep proposing mandatory quotes for strings :D completely backward compatible.

Well yea, my generator class does that all the time. Unless you define a symbol explicitly, all strings are always quoted.

>Can of worm warning! Be careful since going out-of-ASCII is not so
>simple for some formats. Case in point, IPC is ASCII strict and SPECCTRA
>has it's own things for quoting. IIRC there are special cases in the
>sexp code for handling SPECCTRA idiosyncrasies.

Well yea, I am aware of the particular behaviors it has, I've gone through what kicad does before I came rage posting.

>Also generators. printf doesn't quite feel right. In C++ << is overloaded for stream and called 'insertion operator', it would fit into place for building sexps.

I already have those overloads :P

roughly

    SEXPR_LIST << 10 << "string thing" << OUTPUT_SYMBOL("symbol") << 4.0;
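For readers who want to picture how such `<<` overloads might fit together, here is a minimal sketch. It is not the generator class from the linked commit: `SexprList` and `Symbol` are hypothetical stand-ins for `SEXPR_LIST` and `OUTPUT_SYMBOL`, and the choice of 10 significant digits for doubles is an assumption made for the example. It follows the rule stated above, that every string is quoted unless it is explicitly marked as a symbol.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical marker type: wraps text that must be written unquoted.
struct Symbol {
    std::string name;
};

// Hypothetical list builder; every operator<< appends one element.
class SexprList {
public:
    SexprList& operator<<(const Symbol& s) { return add(s.name); }
    SexprList& operator<<(const std::string& s) { return add(quote(s)); }
    SexprList& operator<<(const char* s) {
        assert(s != nullptr);
        return add(quote(s));
    }
    SexprList& operator<<(int v) { return add(std::to_string(v)); }
    SexprList& operator<<(double v) {
        std::ostringstream os;
        os.imbue(std::locale::classic());  // always '.' as the decimal separator
        os.precision(10);                  // assumed precision, chosen for the sketch
        os << v;
        return add(os.str());
    }
    SexprList& operator<<(const SexprList& child) { return add(child.str()); }

    // Renders the list as "(item item ...)".
    std::string str() const {
        std::string out = "(";
        for (std::size_t i = 0; i < items_.size(); ++i) {
            if (i != 0) out += ' ';
            out += items_[i];
        }
        out += ')';
        return out;
    }

private:
    SexprList& add(const std::string& item) {
        items_.push_back(item);
        return *this;
    }

    // Strings are always quoted; embedded quotes and backslashes are escaped.
    static std::string quote(const std::string& s) {
        std::string out = "\"";
        for (char c : s) {
            if (c == '"' || c == '\\') out += '\\';
            out += c;
        }
        out += '"';
        return out;
    }

    std::vector<std::string> items_;
};
```

Building `layer << Symbol{"layer"} << "F.Cu"` and then `mod << Symbol{"module"} << "smd:C0603" << layer << 10 << 4.0` yields `(module "smd:C0603" (layer "F.Cu") 10 4)`. Because the only way to get an unquoted word is to wrap it in `Symbol`, a path like `${KIGITHUB}/...` cannot end up as a symbol by accident.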
Re: [Kicad-developers] [rfc] actual sexpression parsing
O god, your email is just going to start a chain of rage about changing the format. Not trying to do that folks, please rage in another thread if you want to reply to him on that.

Simply trying to make parsing of sexpr simple and easy. No file format change required.

> Our s-expr format needs a custom parser (correct me if I'm wrong),
>which contradicts the idea that generic text formats (e.g. json/lisp
>s-expr, even lua/python arrays) need no dedicated parsers.

Yes we need a parser especially with some of our nuisances of allowing UTF8 in places (most parsers I've seen only allow ASCII and other funnyness, and SPECCTRA which is sexpr-like is a different take). I wrote a parser that's simple to maintain in tree and can be extended to handle things easily.

I don't quite understand the point about "generic" formats not needing parsers. All formats need dedicated parsers, unless you were written in Javascript and you eval(JSON).
Re: [Kicad-developers] [rfc] actual sexpression parsing
Mark,

I haven't had time to look at your commit, so I'm not going to comment on it until I do. I don't know when I'll have time, as I have a lot of stuff to do and I'm traveling over the holidays, so my review time will be limited. In the meantime, I will comment on some of the things I've read in this thread.

Binary file formats are not going to happen while I'm project leader. I have 30 years of less than positive experience with them and I'm not about to start using them now.

Be careful with the C++ << operators. They may not do the correct thing with floating point numbers. There may also be some overhead that the current design does not have. I will not accept a performance hit on file parsing or any significant increase in formatted file size.

On 12/17/2015 10:32 PM, Mark Roszko wrote:
> So awhile back, Wayne said to use sexpr for something I wanted to do.
> Then I looked at the sexpr parsing and said NOPE.
>
> Why NOPE? Because the current parsing regime is basically a Visual Basic
> 6 parser written in a modern language with micro-optimizations meant
> for someone running a Windows 3.1 computer with 5.25" storage drives.
>
> This patch is a proposed sexpr parser that parses sexpr like it's sexpr
> and not a parenthesis format. Because if you are parsing with things
> like NeedLeft() and NeedRight(), you are parsing it wrong.
>
> Especially when you see something like this:
>
>     case T_descr:
>         if( sawDesc )
>             in->Duplicate( tok );
>         sawDesc = true;
>         in->NeedSYMBOLorNUMBER();
>         row.SetDescr( in->FromUTF8() );
>         break;
>
> but then you realize that fp_lib_table has quoted strings for MOST
> descr entries. And you question why the method is called
> NeedSYMBOLorNUMBER when it's a string with quotes. Then you go deeper
> and realize the parser may as well be a csv reader/writer. You either
> parse like sexpressions or you stop calling it sexpressions.
>
> New proposed system:
> 1. Reads and generates an in memory tree structure of data as it
> should be, i.e. lists, strings, numbers, etc.

I'm OK with the memory tree structure as long as it doesn't add any significant overhead for parsing or for conversion to the internal objects.

> 2. Helpers to pull out each item as need be

OK

> 3. Backwards compatible

Backwards compatibility is not optional.

> 4. Doesn't do silly keywords micro optimization at compile time. You
> do a string comparison to convert the value to integer anyway, using
> if/else is no different each time. Kicad isn't parsing gigabyte sized
> files nor hundreds of files, this optimization really isn't worth the
> overhead in maintenance.

This change will probably kill your performance as the file format gets new features, which it will. I could be wrong, but there is not going to be a much faster token lookup than integers. Before you decide this optimization isn't worth it, you need to get an 8+ layer high density board file from someone and do some testing.

> 5. Generate saved files from in memory tree structures, this will
> avoid all possible formatting irregularities and differences because
> someone handwrote unrolling all the data members.

I need to see how you're doing this, because keeping both the memory tree and the board objects in memory doesn't make a lot of sense to me.

> 6. Avoid things like " ${KIGITHUB}/Air_Coils_SML_NEOSID.pretty" being
> defined as a symbol instead of a string in the future.
> 7. Explicit definitions of symbols and strings. Strings are always
> quoted. Period. No silly auto-detection logic.
>
> So my first goal is to have 1:1 parity with the existing stuff for
> kicad files for both reading and writing.
>
> Benchmarks:
> Old fp-lib-table read: 1ms
> New fp-lib-table read: 2ms
>
> Here's the actual commit for fp-lib-table:
> https://github.com/marekr/kicad-sexpr/commit/9367f469be69962d14671411eddd6fd759ace1f2
>
> Not expecting anyone to compile it or anything, more input than
> anything. Yes it's messy as it's an initial proof of concept.
>
> Yes, string parsing and writing isn't escaping properly, TBD (easy, just
> lazy).
> There would probably be a smaller kicad_sexpr wrapper to implement
> common sexpr pattern helpers such as the "list key-value pair" that's
> used.
>
> Also played around with .kicad-pcb but it's not committed:
> Old read: ~350ms
> New read: ~230ms
>
> I await the abuse.
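For readers following the proposal, the tree-based reading described in item 1 can be sketched roughly as follows. This is a minimal C++17 illustration, not the code from the linked commit: `Node`, `Parser`, and `parseSexpr` are hypothetical names, escape handling is deliberately simplified, and the rule that only quoted text becomes a string is an assumption based on item 7.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node for illustration; not KiCad's or the patch's type.
struct Node {
    enum class Kind { List, Symbol, String, Number };
    Kind kind;
    std::string text;            // payload for Symbol / String / Number
    std::vector<Node> children;  // payload for List
};

class Parser {
public:
    explicit Parser(std::string src) : s_(std::move(src)) {}

    // Parses exactly one top-level expression and rejects trailing input.
    Node parse() {
        Node n = parseNode();
        skipSpace();
        if (pos_ != s_.size()) throw std::runtime_error("trailing input");
        return n;
    }

private:
    void skipSpace() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }

    Node parseNode() {
        skipSpace();
        if (pos_ >= s_.size()) throw std::runtime_error("unexpected end of input");
        const char c = s_[pos_];
        if (c == '(') return parseList();
        if (c == '"') return parseString();
        if (c == ')') throw std::runtime_error("unexpected ')'");
        return parseAtom();
    }

    Node parseList() {
        ++pos_;  // consume '('
        Node list{Node::Kind::List, "", {}};
        for (;;) {
            skipSpace();
            if (pos_ >= s_.size()) throw std::runtime_error("missing ')'");
            if (s_[pos_] == ')') { ++pos_; return list; }
            list.children.push_back(parseNode());
        }
    }

    Node parseString() {
        ++pos_;  // consume opening quote
        std::string out;
        while (pos_ < s_.size() && s_[pos_] != '"') {
            // Simplified escaping: a backslash keeps the next character verbatim.
            if (s_[pos_] == '\\' && pos_ + 1 < s_.size()) ++pos_;
            out += s_[pos_++];
        }
        if (pos_ >= s_.size()) throw std::runtime_error("unterminated string");
        ++pos_;  // consume closing quote
        return Node{Node::Kind::String, out, {}};
    }

    Node parseAtom() {
        const std::size_t start = pos_;
        while (pos_ < s_.size() && !std::isspace(static_cast<unsigned char>(s_[pos_])) &&
               s_[pos_] != '(' && s_[pos_] != ')' && s_[pos_] != '"')
            ++pos_;
        const std::string tok = s_.substr(start, pos_ - start);
        return Node{isNumber(tok) ? Node::Kind::Number : Node::Kind::Symbol, tok, {}};
    }

    // Accepts plain decimals such as -1, 0.25, +3.; exponents are treated as symbols here.
    static bool isNumber(const std::string& t) {
        if (t.empty()) return false;
        std::size_t i = (t[0] == '-' || t[0] == '+') ? 1 : 0;
        bool digits = false, dot = false;
        for (; i < t.size(); ++i) {
            if (std::isdigit(static_cast<unsigned char>(t[i]))) digits = true;
            else if (t[i] == '.' && !dot) dot = true;
            else return false;
        }
        return digits;
    }

    std::string s_;
    std::size_t pos_ = 0;
};

Node parseSexpr(const std::string& src) { return Parser(src).parse(); }
```

With a reader of this shape, an unquoted `${KIGITHUB}/x.pretty` comes back as a symbol and a quoted `"Air Coils"` as a string, which is the distinction items 6 and 7 ask for. Whether such a tree costs more than the integer-token lexer on a large board file is exactly the open question raised in the reply above, and this sketch does not answer it.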
Re: [Kicad-developers] [rfc] actual sexpression parsing
I just saw this: http://i.imgur.com/H8VxD3Z.png storing a bitmap as hex values. Next proposal item to add: storing binary data(bitmaps) as base64 quoted strings like every other browser, application and tool.
Re: [Kicad-developers] [rfc] actual sexpression parsing
Dude. The way it's stored currently is horrible - it's not congruent to the structure of the file! The s-expr file is supposed to be a tree structure, so why is the binary data stored broken into multiple objects like that? It's yet another facet of the parsing nightmare we have. If you're going to use a "standard" format like s-expr, you should actually understand it and use it the way it's meant to be used.

On Thu, Dec 17, 2015 at 11:46:31PM -0500, tiger12506 wrote:
> Not disagreeing with you on this one, but I would have to question why...
>
> Why change this when what is already there means something more than what
> you're proposing...
> KiCad doesn't have to do internally what "every other browser, application,
> and tool" does, if it doesn't help anything.
> Not sure how storing base64 would help.
>
> *backs away slowly*
>
> On 12/17/2015 11:20 PM, Mark Roszko wrote:
> >I just saw this:
> >http://i.imgur.com/H8VxD3Z.png
> >
> >storing a bitmap as hex values.
> >
> >Next proposal item to add: storing binary data(bitmaps) as base64
> >quoted strings like every other browser, application and tool.
Re: [Kicad-developers] [rfc] actual sexpression parsing
>Not disagreeing with you on this one, but I would have to question why...

Did you skip the why section? Because we are supposed to have a standard file parser and reader that is maintainable. What is there now is a micro-optimization obsessed VB6 parser. There's a keyword enum pre-generation that just adds pointless back and forth (a class with const strings is more than enough).

If we are reading "s-expressions" then there should be a common output; every single kicad file writer shouldn't be implementing its own interpretation of raw tokens from the files. When you have a standard API from C++ objects to file, then you can ensure 100% sanity across all files. Right now there can easily be one-off errors because someone forgot a token or made assumptions.

Also, writing files is insane. Every "file writer" inherits or creates its own methods to write the file output with what are basically printf statements. Stuff like this:

    m_out->Print( nestLevel, "(%s", getTokenName( T_setup ) );
    m_out->Print( 0, "(textsize %s %s)",
                  double2Str( WORKSHEET_DATAITEM::m_DefaultTextSize.x ).c_str(),
                  double2Str( WORKSHEET_DATAITEM::m_DefaultTextSize.y ).c_str() );
    m_out->Print( 0, "(linewidth %s)",
                  double2Str( WORKSHEET_DATAITEM::m_DefaultLineWidth ).c_str() );
    m_out->Print( 0, "(textlinewidth %s)",
                  double2Str( WORKSHEET_DATAITEM::m_DefaultTextThickness ).c_str() );
    m_out->Print( 0, "\n" );

is just extremely silly and extra work compared to generating SEXPR trees in memory, which is what SEXPR represents in the first place. God forbid you accidentally format that double wrong.

>Not sure how storing base64 would help.

Namely file size savings and sanity. Instead of all those ridiculous X number of hex bytes on each line split with spaces, you have a single quoted entry that's base64-encoded. And you can actually manually edit that, god forbid someone wants to do that then.
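To make the "generate SEXPR trees in memory" argument concrete, here is a rough C++ sketch of a writer that serializes a tree through one shared routine, so that number formatting and string quoting live in a single place. Everything in it is an assumption for illustration: `Node`, `sym`, `str`, `num`, `makeList`, and `toSexpr` are hypothetical names, the numeric values are made up, and `"%.10g"` is just one possible formatting choice, not what KiCad's `double2Str` does.

```cpp
#include <cstddef>
#include <cstdio>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node for illustration only.
struct Node {
    enum class Kind { List, Symbol, String, Number };
    Kind kind = Kind::List;
    std::string text;            // Symbol / String payload
    double number = 0.0;         // Number payload
    std::vector<Node> children;  // List payload
};

Node sym(std::string s) { Node n; n.kind = Node::Kind::Symbol; n.text = std::move(s); return n; }
Node str(std::string s) { Node n; n.kind = Node::Kind::String; n.text = std::move(s); return n; }
Node num(double v)      { Node n; n.kind = Node::Kind::Number; n.number = v; return n; }
Node makeList(std::initializer_list<Node> items) {
    Node n;  // kind defaults to List
    n.children.assign(items.begin(), items.end());
    return n;
}

// The single place where doubles become text (assumes the default "C" locale).
std::string formatNumber(double v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.10g", v);
    return buf;
}

// The single place where strings are quoted and escaped.
std::string quote(const std::string& s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    out += '"';
    return out;
}

// Recursively serializes a tree; every file writer would share this routine.
std::string toSexpr(const Node& n) {
    switch (n.kind) {
    case Node::Kind::Symbol: return n.text;
    case Node::Kind::String: return quote(n.text);
    case Node::Kind::Number: return formatNumber(n.number);
    case Node::Kind::List: {
        std::string out = "(";
        for (std::size_t i = 0; i < n.children.size(); ++i) {
            if (i) out += ' ';
            out += toSexpr(n.children[i]);
        }
        return out + ")";
    }
    }
    return {};
}
```

A caller would build something like `makeList({sym("setup"), makeList({sym("textsize"), num(1.5), num(1.5)})})` and hand it to `toSexpr`, instead of hand-writing the parentheses in each `Print` call. The cost is building a temporary tree before writing, which is the memory concern raised earlier in the thread.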
Re: [Kicad-developers] [rfc] actual sexpression parsing
Ok. I see. Sorry for the noise.

> Not sure how storing base64 would help.
>
>Namely file size savings and sanity. Instead of all those ridiculous X
>number of hex bytes on each line split with spaces, you have a single
>quoted entry that's a base64ed.
>And you can actually manually edit that, god forbid someone wants to
>do that then.
Re: [Kicad-developers] [rfc] actual sexpression parsing
It's not the base64 that's important, it's the structure. You can pick whatever encoding you like, base64 is just very common as a relatively dense but still text-safe one.

On Fri, Dec 18, 2015 at 12:06:25AM -0500, tiger12506 wrote:
> Ok. I see. Sorry for the noise.
>
> > Not sure how storing base64 would help.
>
> >Namely file size savings and sanity. Instead of all those ridiculous X
> >number of hex bytes on each line split with spaces, you have a single
> >quoted entry that's a base64ed.
> >And you can actually manually edit that, god forbid someone wants to
> >do that then.
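As a rough illustration of the "relatively dense" point, the sketch below encodes the same bytes both ways in C++. The function names `toSpacedHex` and `toBase64` are hypothetical, and the space-separated two-digit hex layout is an assumption based on the description in this thread, not a check of the actual file writer. The base64 routine uses the standard alphabet with `=` padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Two hex digits per byte, separated by single spaces (assumed layout).
std::string toSpacedHex(const std::vector<std::uint8_t>& data) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i) out += ' ';
        out += digits[data[i] >> 4];
        out += digits[data[i] & 0x0F];
    }
    return out;
}

// Standard base64: every 3 input bytes become 4 output characters.
std::string toBase64(const std::vector<std::uint8_t>& data) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    std::size_t i = 0;
    for (; i + 2 < data.size(); i += 3) {
        const std::uint32_t v = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8) |
                                static_cast<std::uint32_t>(data[i + 2]);
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += tbl[(v >> 6) & 63];
        out += tbl[v & 63];
    }
    if (i + 1 == data.size()) {  // one trailing byte: two characters plus "=="
        const std::uint32_t v = static_cast<std::uint32_t>(data[i]) << 16;
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += "==";
    } else if (i + 2 == data.size()) {  // two trailing bytes: three characters plus "="
        const std::uint32_t v = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8);
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += tbl[(v >> 6) & 63];
        out += '=';
    }
    return out;
}
```

Under these assumptions, n bytes take 3n - 1 characters as spaced hex and 4 * ceil(n / 3) characters as base64, so base64 comes out at a bit under half the size for anything beyond a few bytes. Either encoding can sit inside a single quoted string, which is the structural point made above.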