Re: Efficient C++ XML validating parser?

2008-03-06 Thread Shachar Shemesh

Amos Shapira wrote:


Another one is CodeSynthesis XSD
(http://www.codesynthesis.com/products/xsd/), it's GPL so we can't
link it with our proprietary code.

  
gcc is also GPL, and yet you can link the programs you compile with it 
to your proprietary code.


Is it GPL, or is its output GPL?

Shachar

=
To unsubscribe, send mail to [EMAIL PROTECTED] with
the word unsubscribe in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]



Re: Efficient C++ XML validating parser?

2008-03-06 Thread Shachar Shemesh

Shachar Shemesh wrote:


Amos Shapira wrote:


Another one is CodeSynthesis XSD
(http://www.codesynthesis.com/products/xsd/), it's GPL so we can't
link it with our proprietary code.

  


Is it GPL, or is its output GPL?
Having said that (and it makes sense that such a program will have GPL 
output as well), that is exactly the reason the GPL is so popular. If 
you want to take but not give back, pay someone for the privilege. 
Otherwise, you can have your implementation for free.


Shachar




Re: Efficient C++ XML validating parser?

2008-03-06 Thread Amos Shapira
On Thu, Mar 6, 2008 at 9:04 AM, Shachar Shemesh [EMAIL PROTECTED] wrote:

 Amos Shapira wrote:

  Another one is CodeSynthesis XSD
  (http://www.codesynthesis.com/products/xsd/), it's GPL so we can't
  link it with our proprietary code.
 
 
 gcc is also GPL, and yet you can link the programs you compile with it
 to your proprietary code.

 Is it GPL, or is its output GPL?


I know what GPL means - and we can't link the output of their code together
with our proprietary code without paying them tens of thousands of dollars
for the right to do so.

--Amos


Re: Efficient C++ XML validating parser?

2008-03-06 Thread Amos Shapira
On Wed, Mar 5, 2008 at 10:43 AM, Shachar Shemesh [EMAIL PROTECTED]
wrote:

 Without knowing Xerces too deeply, I think you can do MUCH faster than
 it, by feeding the schema beforehand. Theoretically (though, the last
 time I said this word on this list an actual project came out [1]), you
 can write a parser that receives the schema, and produces yacc (or
 bison++) output for parsing it. That would, of course, make a compiler
 compiler compiler, but who's counting? You can then take the input file,
 and follow the usual procedures for generating C++, and then binary,
 from them.


BTW - since XML schema is just XML, I suspect it should be relatively easy to
parse it and produce code based on it without resorting to bison.
E.g. see Perl's XML::Compile::Schema and friends at
http://search.cpan.org/search?query=XML%3A%3ACompile%3A%3ASchema&mode=module
Actually, maybe the above can be tweaked to produce C++ code...
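
A minimal sketch of that idea in C++, assuming the schema has already been
read into a simple in-memory description by some XML parser (that step is
not shown). FieldDecl, TypeDecl, cppTypeFor and generateStruct are
hypothetical names made up for this illustration, and only a handful of
built-in schema types are mapped:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one element inside an xs:complexType.
struct FieldDecl {
    std::string name;     // element name, e.g. "customer"
    std::string xsdType;  // schema type, e.g. "xs:string"
};

// Hypothetical description of one xs:complexType.
struct TypeDecl {
    std::string name;
    std::vector<FieldDecl> fields;
};

// Map a few XSD built-in types to C++ types. Anything unknown falls back
// to std::string so the generated code still compiles.
std::string cppTypeFor(const std::string& xsdType) {
    static const std::map<std::string, std::string> table = {
        {"xs:int", "int"},
        {"xs:double", "double"},
        {"xs:boolean", "bool"},
        {"xs:string", "std::string"},
    };
    const auto it = table.find(xsdType);
    return it != table.end() ? it->second : "std::string";
}

// Emit the source text of a plain struct for the given declaration.
std::string generateStruct(const TypeDecl& decl) {
    assert(!decl.name.empty());
    std::ostringstream out;
    out << "struct " << decl.name << " {\n";
    for (const FieldDecl& f : decl.fields) {
        out << "    " << cppTypeFor(f.xsdType) << ' ' << f.name << ";\n";
    }
    out << "};\n";
    return out.str();
}
```

A real generator would also have to emit the reading and validation code
for each type, which is where most of the work is.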

--Amos


Re: Efficient C++ XML validating parser?

2008-03-06 Thread Amos Shapira
On Thu, Mar 6, 2008 at 10:50 AM, Gilad Ben-Yossef [EMAIL PROTECTED]
wrote:

 Shachar Shemesh wrote:
  Amos Shapira wrote:
 
  Another one is CodeSynthesis XSD
  (http://www.codesynthesis.com/products/xsd/), it's GPL so we can't
  link it with our proprietary code.
 
 
  gcc is also GPL, and yet you can link the programs you compile with it
  to your proprietary code.

 hmpf... that's not a good example.

 It is true that GCC output (the object and ELF files) are themselves not
 derived work of GCC. However, to actually use them for something you
 need to link (statically or otherwise) with libgcc.

 Mind you, the produced programs are indeed not subject to the GPL - but
 only because the FSF has made an explicit exception for libgcc.


Whatever it is with GCC, libgcc, mingw or whatever, the issue is that the
codesynthesis program creates code which relies on linking with their own
run-time libraries, which are covered by the GPL and therefore I can't link
proprietary code with them and distribute them outside my own legal entity.
From http://www.codesynthesis.com/products/xsd/license.xhtml:

By linking with the XSD runtime library and/or the generated
code (directly or indirectly, statically or dynamically,
at compile time or runtime), your application is subject to the
terms of the GPL or the FLOSS Exception, which both require that
you release the source code of your application if and when you
distribute it. Distributing an application includes giving it
to customers, contractors, parent companies, subsidiaries, or any
legal entity other than your own.

--Amos


Re: Efficient C++ XML validating parser?

2008-03-06 Thread Gilad Ben-Yossef

Shachar Shemesh wrote:

Amos Shapira wrote:


Another one is CodeSynthesis XSD
(http://www.codesynthesis.com/products/xsd/), it's GPL so we can't
link it with our proprietary code.

  
gcc is also GPL, and yet you can link the programs you compile with it 
to your proprietary code.


hmpf... that's not a good example.

It is true that GCC output (the object and ELF files) are themselves not 
derived work of GCC. However, to actually use them for something you 
need to link (statically or otherwise) with libgcc.


Mind you, the produced programs are indeed not subject to the GPL - but 
only because the FSF has made an explicit exception for libgcc.


See for example: http://www.mingw.org/MinGWiki/index.php/SharedLibgccLegal

I suspect for most real world cases there is some variant of libgcc 
lurking about.



Gilad


--
Gilad Ben-Yossef [EMAIL PROTECTED]
Chief Coffee Drinker

Codefidence Ltd.| Web: http://codefidence.com
Work: +972-3-7515563 ext. 201   | Mobile: +972-52-8260388

Your hovercraft is full of eels. For information on
 emptying your hovercraft, turn to Section 2.6.a.17
 of your hovercraft user manual.
- The Monty Python technical writer





RE: Efficient C++ XML validating parser?

2008-03-05 Thread ronys
Hi,

If you decide to do without validation, then I've used TinyXML in a couple
of projects, and am pretty happy with the footprint & performance.
http://www.grinninglizard.com/tinyxml/

You're right in that it will require you to write the parser manually, though,
with all that that implies.

Cheers,

Rony

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Amos Shapira
Sent: Wednesday, March 05, 2008 2:56 AM
To: Israel Linux Mailing list
Subject: Efficient C++ XML validating parser?

Hello,

Currently we use Xerces for C (http://xerces.apache.org/xerces-c/) to
read XML files but are looking at making this as efficient as
possible.

The XML files are generated by our own software so some of us thought
that maybe we can get rid of validation of the input and go straight
to event handling using SAX parsers.

My concern with this approach is that it sounds like we'll end up with
a hand-written parser for a very specific version of the input schema,
which will require us to keep the code in pace with changes in the
schema.

Instead, I was wondering what would be the best way to ask the XML
parser to validate the input. Maybe some tool which converts an XML
schema to tightly integrated C++ code would do the trick? I found
http://tinyurl.com/2wqqp8 but it's just a research paper (NOT free),
not open source code.

What do people around here like to use for EFFICIENT XML parsing?

Thanks,

--Amos




Re: Efficient C++ XML validating parser?

2008-03-05 Thread Dotan Shavit
On Wednesday 05 March 2008, Amos Shapira wrote:
 What do people around here like to use for EFFICIENT XML parsing?

A stronger machine.

Don't laugh... it may be much cheaper than developing and maintaining 
software.

#




Re: Efficient C++ XML validating parser?

2008-03-05 Thread ik
Two suggestions:

There are GNU/GTK (depends on how you look at it) tools for handling
XML, and one of the tools in the library is to validate (in code)
XML...

And sorry if it will sound like a troll's answer, but you can take the
native XML implementation in FPC, write a C++ binding for it, and then
use it with your application :)

Ido

On Wed, Mar 5, 2008 at 2:56 AM, Amos Shapira [EMAIL PROTECTED] wrote:
 Hello,

  Currently we use Xerces for C (http://xerces.apache.org/xerces-c/) to
  read XML files but are looking at making this as efficient as
  possible.

  The XML files are generated by our own software so some of us thought
  that maybe we can get rid of validation of the input and go straight
  to event handling using SAX parsers.

  My concern with this approach is that it sounds like we'll end up with
  a hand-written parser for a very specific version of the input schema,
  which will require us to keep the code in pace with changes in the
  schema.

  Instead, I was wondering what would be the best way to ask the XML
  parser to validate the input. Maybe some tool which converts an XML
  schema to tightly integrated C++ code would do the trick? I found
  http://tinyurl.com/2wqqp8 but it's just a research paper (NOT free),
  not open source code.

  What do people around here like to use for EFFICIENT XML parsing?

  Thanks,

  --Amos






-- 
http://ik.homelinux.org/




Re: Efficient C++ XML validating parser?

2008-03-05 Thread Shachar Shemesh

Gilad Ben-Yossef wrote:


Amos Shapira wrote:



What do people around here like to use for EFFICIENT XML parsing?



Isn't Efficient XML an oxymoron?

Seriously, and despite the flame bait way I've introduced the subject, 
if you need to do XML parsing in a way which is more efficient than 
Xerces, maybe it is an indication that XML is not a proper way to 
encode your data.

I'll bite.

Without knowing Xerces too deeply, I think you can do MUCH faster than 
it, by feeding the schema beforehand. Theoretically (though, the last 
time I said this word on this list an actual project came out [1]), you 
can write a parser that receives the schema, and produces yacc (or 
bison++) output for parsing it. That would, of course, make a compiler 
compiler compiler, but who's counting? You can then take the input file, 
and follow the usual procedures for generating C++, and then binary, 
from them.


How about using a binary format which is compiled from XML ?
In this day and age, is it really all that much faster? What makes XML hard 
to parse, IMHO, is not the fact that it's text, it's the fact that it's 
hierarchical.


You get all the benefits of using XML and no parsing overhead.
Well, you lose one benefit - it's no longer in a standard parsable, nor 
even textual, format.


Gilad

Shachar




Re: Efficient C++ XML validating parser?

2008-03-05 Thread Omer Zak
On Wed, 2008-03-05 at 12:43 +0200, Shachar Shemesh wrote:
 Gilad Ben-Yossef wrote:
  How about using a binary format which is compiled from XML ?
 In this day and age, is it really all that faster? What makes XML hard 
 to parse, IMHO, is not the fact that it's text, it's the fact that it's 
 hierarchical.
 
  You get all the benefits of using XML and no parsing overhead.
 Well, you lose one benefit - it's no longer in a standard parsable, nor 
 even textual, format.

Not necessarily.
If you have a converter between XML and your binary format, and make it
available everywhere your application is available, then the messages
would still effectively be available in XML.  You'll need also some way
to force people to modify the converter whenever they modify the schema.

Another way is to use one of the serializer/unserializer modules
available in scripting languages such as Python or Perl.  This will
transform between your data structure's internal representation and a
binary format.
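
A minimal sketch of such a converter pair in C++. The Message type, its
fields and the binary layout (id, text length, text bytes, integers stored
most significant byte first) are all assumptions made up for this
illustration, not any existing format:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical message type; the field names are made up.
struct Message {
    std::uint32_t id;
    std::string text;
};

using Bytes = std::vector<std::uint8_t>;

// Append a 32-bit value, most significant byte first.
void putU32(Bytes& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
    }
}

// Read a 32-bit value written by putU32 and advance pos past it.
std::uint32_t getU32(const Bytes& in, std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v = (v << 8) | in[pos++];
    }
    return v;
}

// Internal representation -> binary.
Bytes toBinary(const Message& m) {
    Bytes out;
    putU32(out, m.id);
    putU32(out, static_cast<std::uint32_t>(m.text.size()));
    out.insert(out.end(), m.text.begin(), m.text.end());
    return out;
}

// Binary -> internal representation.
Message fromBinary(const Bytes& in) {
    std::size_t pos = 0;
    Message m;
    m.id = getU32(in, pos);
    const std::uint32_t len = getU32(in, pos);
    assert(pos + len <= in.size());
    m.text.assign(reinterpret_cast<const char*>(in.data() + pos), len);
    return m;
}

// Internal representation -> XML, so the data stays readable as XML.
std::string toXml(const Message& m) {
    std::string escaped;
    for (char c : m.text) {
        switch (c) {
            case '&': escaped += "&amp;"; break;
            case '<': escaped += "&lt;"; break;
            case '>': escaped += "&gt;"; break;
            default:  escaped += c;
        }
    }
    return "<message id=\"" + std::to_string(m.id) + "\">" + escaped +
           "</message>";
}
```

Every function here has to change whenever a field is added to the
schema, which is exactly the maintenance burden mentioned above.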

--- Omer
-- 
May the holy trinity of  $_, @_ and %_ be hallowed.
My own blog is at http://www.zak.co.il/tddpirate/

My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS:  at http://www.zak.co.il/spamwarning.html





Re: Efficient C++ XML validating parser?

2008-03-05 Thread Amos Shapira
On Wed, Mar 5, 2008 at 9:43 PM, Shachar Shemesh [EMAIL PROTECTED] wrote:
 Gilad Ben-Yossef wrote:

   Amos Shapira wrote:
  
  
   What do people around here like to use for EFFICIENT XML parsing?
  
  
   Isn't Efficient XML an oxymoron?
  
   Seriously, and despite the flame bait way I've introduced the subject,
   if you need to do XML parsing in a way which is more efficient than
   Xerces, maybe it is an indication that XML is not a proper way to
   encode your data.
  I'll bite.

Thanks to everyone for your answers.

I'm replying to Shachar's reply because his is the closest to what I
have to add to this, plus some more info about my question as I
learned since I sent it.


  Without knowing Xerces too deeply, I think you can do MUCH faster than
  it, by feeding the schema beforehand. Theoretically (though, the last

Xerces is apparently the Lincoln of XML parsers, i.e. it supports
everything there is to support in the standard but it comes with a
huge weight attached to it. On my desktop it's the 9th largest library
at almost 4 MB, coming just before libkhtml and at twice the size of libc.
But library size is not all I can say against it - it adheres to the
standard approaches of DOM (tons of objects, lots of memory) or SAX (i.e.
you have to manually handle each event in the code which uses SAX).
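
To illustrate what handling each event by hand means, here is a minimal
sketch in C++. EventHandler and PriceCollector are hypothetical names made
up for this example; this is not the Xerces-C++ API, which uses its own
string and attribute types. The "price" element is likewise an assumption:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical callback interface in the spirit of SAX.
struct EventHandler {
    virtual ~EventHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Collects the numeric content of every <price> element. All the state a
// DOM tree would hold for us has to be tracked by hand here.
class PriceCollector : public EventHandler {
public:
    void startElement(const std::string& name) override {
        ++depth_;
        if (name == "price") {
            inPrice_ = true;
            buffer_.clear();
        }
    }

    void characters(const std::string& text) override {
        // Text may arrive in several chunks, so accumulate it.
        if (inPrice_) {
            buffer_ += text;
        }
    }

    void endElement(const std::string& name) override {
        assert(depth_ > 0);  // the parser is assumed to send matched events
        --depth_;
        if (name == "price") {
            prices_.push_back(std::stod(buffer_));  // throws on bad input
            inPrice_ = false;
        }
    }

    const std::vector<double>& prices() const { return prices_; }

private:
    int depth_ = 0;
    bool inPrice_ = false;
    std::string buffer_;
    std::vector<double> prices_;
};
```

Memory use stays small because nothing but the collected values is kept,
but every new element of interest means more hand-written state.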

There are a few newer approaches to parse XML files, there is a pretty
good list at http://en.wikipedia.org/wiki/Xml_parser#Processing_XML_files

The one that appeals the most to me is Data Binding
(http://en.wikipedia.org/wiki/Xml_parser#Data_binding), i.e., as
Shachar describes below - it's based on a program which reads the
schema and builds code (in my case, C++ class) which reads files of
this specific schema, its objects are strongly-typed in-memory
representations of the data in the XML file and provide convenient
accessors.

Presumably, because these classes are schema-specific, they can cut a
lot of checks for irrelevant execution paths.
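
As a rough picture of what such a generated class could look like, here is
a minimal C++ sketch. The Order class, its "id" and "customer" elements and
the GenericRecord input type are all assumptions made up for illustration,
not the output of any of the tools mentioned in this thread:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for what a generic parser hands back: every value
// is a string keyed by element name.
using GenericRecord = std::map<std::string, std::string>;

// What a binding generator might emit for a schema with an integer "id"
// and a string "customer".
class Order {
public:
    // Converts and checks the values once, while reading.
    static Order fromRecord(const GenericRecord& rec) {
        const std::string& rawId = rec.at("id");  // throws if "id" is absent
        std::size_t used = 0;
        const int id = std::stoi(rawId, &used);   // throws if not a number
        if (used != rawId.size()) {
            throw std::invalid_argument("id is not a plain integer");
        }
        Order o;
        o.id_ = id;
        o.customer_ = rec.at("customer");
        return o;
    }

    // Strongly-typed accessors: no lookups or conversions at the call site.
    int id() const { return id_; }
    const std::string& customer() const { return customer_; }

private:
    Order() = default;
    int id_ = 0;
    std::string customer_;
};

// Small usage example.
void exampleUsage() {
    const Order o = Order::fromRecord({{"id", "42"}, {"customer", "ACME"}});
    assert(o.id() == 42);
    assert(o.customer() == "ACME");
}
```

The point of the design is that type errors in the input surface once,
when the object is built, instead of at every place the data is used.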

If you ever wrote XDR/RPC stuff (I'm talking about the stuff that NFS
and friends use for network-level representation) then it might be
something similar - it used to have a program to convert a
language-independent data representation to various language-specific
implementations of classes to marshal and unmarshal data (only I
forgot the name of the XDR compiler right now).

The snag about Data Binding is that all the implementations I found so
far are either for Java or Proprietary and cost a fortune (thousands
of dollars per developer seat, where you have to buy a license for
every developer who links his code with the output of the programs).

Ah - and our final programs (the ones we ship to customers) have to
support all sorts of UNIX variants, and Windows, not just Linux.

The only one which keeps our hopes alive is xmlbeansxx
(http://xmlbeansxx.touk.pl/). I'm struggling with getting it to
compile and run for now.

Another one is CodeSynthesis XSD
(http://www.codesynthesis.com/products/xsd/), it's GPL so we can't
link it with our proprietary code.

Here is a pretty complete list of XML Data Binding resources, almost
all options for C/C++ are commercial:
http://www.rpbourret.com/xml/XMLDataBinding.htm

Thanks again for everyone's input.

--Amos




Re: Efficient C++ XML validating parser?

2008-03-04 Thread Gilad Ben-Yossef

Amos Shapira wrote:



What do people around here like to use for EFFICIENT XML parsing?



Isn't Efficient XML an oxymoron?

Seriously, and despite the flame bait way I've introduced the subject, 
if you need to do XML parsing in a way which is more efficient than 
Xerces, maybe it is an indication that XML is not a proper way to 
encode your data.


How about using a binary format which is compiled from XML ?

You get all the benefits of using XML and no parsing overhead.

Gilad




--
Gilad Ben-Yossef [EMAIL PROTECTED]
Chief Coffee Drinker

Codefidence Ltd.| Web: http://codefidence.com
Work: +972-3-7515563 ext. 201   | Mobile: +972-52-8260388

Your hovercraft is full of eels. For information on
 emptying your hovercraft, turn to Section 2.6.a.17
 of your hovercraft user manual.
- The Monty Python technical writer

