Re: Code compilation alternative to Function()

2014-01-21 Thread Sean Silva
On Mon, Jan 20, 2014 at 4:21 PM, Gustavs Tēbergs fixplzsecr...@gmail.com wrote:

 Following up with a benchmark.

 (I thought of this idea while working on a parser generator library,
 but unfortunately for my argument the library turned out really
 fast...)

 I decided to write a converter for Asm.js code to see roughly what
 happens when building code expression by expression, and it gives me
 existing programs to test with.

 https://gist.github.com/fixplz/8529003

 Results:

 * I took Asm.js code from https://github.com/kripken/ammo.js as a testing
 source
 * Tested with Node 0.10.17 Win7 64
 * The JS engine parses the source (1.6 MB) in 180 ms
 * The opcode reader traverses the corresponding opcode binary (1.8 MB)
 in 120 ms (the binary could be some 30% smaller by adding special case
 codes)
 * With conversion enabled the opcode reader takes 260 ms to convert back
 to JS

 The output concatenation is very fast. But it could be better.


Your `load` routine is not how I described and is inducing a ton of
cache-busting heap traffic. You need to put *all* strings in a single,
constant array and append references to those constant strings to a single
growing array. There should be no concatenations except for the final
.join(). I.e. something like:

var STRINGPOOL = ['do{', 'function', '(', ')', ...];
var KW_DO = 0;
var KW_FUNCTION = 1;
var LPAREN = 2;
var RPAREN = 3;
...
var outArr = [];

and then in order to concatenate a string to the output do something like:

outArr.push(STRINGPOOL[KW_DO]);

and then finally, after outArr contains the entire output, do:

return outArr.join('');

Really, what you need to do is to run a slight modification of your current
`load` routine *during compilation* and trace all strings that are
appended. Then uniquify all the needed strings and add them to a string
pool. Then you send down the wire:
1. The string pool.
2. A binary blob of indices into that string pool (or other opcodes).
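
Concretely, the compile-time pass could be as small as this (a rough
sketch; `emitString` is a hypothetical hook that you call wherever your
current `load` routine appends a piece of text):

var stringPool = [];                      // unique strings, shipped once
var poolIndex = Object.create(null);      // string -> index into stringPool
var opcodes = [];                         // indices to serialize as the blob

function emitString(s) {
  var idx = poolIndex[s];
  if (idx === undefined) {
    idx = stringPool.length;
    stringPool.push(s);
    poolIndex[s] = idx;
  }
  opcodes.push(idx);
}

Run the generator once at build time with every append routed through
emitString, then ship stringPool plus the serialized opcodes.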

Then your decoder can be as simple as:
var stringPool = getStringPool();
var indices = getIndices();
var outArr = [];
for (var i = 0, e = indices.length; i !== e; ++i) {
outArr.push(stringPool[indices[i]]);
}
return outArr.join('');

Really, this is a coding problem, and you just need to develop an efficient
codec for describing a string of JS. The approach in the Gist you linked is
not a very efficient codec. The codec used in the above simple loop is
probably not the most efficient, but if you add a couple special case
opcodes and/or use a cheap varint encoding for the indices (and
appropriately optimize the order of the string pool array to make good use
of it), then you can really get fast and small.
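
For instance, a bare-bones LEB128-style varint codec for the index stream
might look like this (just a sketch, not tuned; one byte covers indices
0-127, so ordering the pool by frequency keeps most entries at one byte):

function writeVarint(bytes, n) {
  while (n >= 0x80) {
    bytes.push((n & 0x7f) | 0x80);   // low 7 bits plus continuation flag
    n >>>= 7;
  }
  bytes.push(n);
}

function readVarint(bytes, state) {  // state = { pos: 0 }
  var n = 0, shift = 0, b;
  do {
    b = bytes[state.pos++];
    n |= (b & 0x7f) << shift;
    shift += 7;
  } while (b & 0x80);
  return n;
}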

-- Sean Silva



 The conversion forgoes minimizing parentheses so the output is larger
 than the source (2.5 MB), so the total time of using a converter like
 this would be 260 ms to convert + 300 ms to parse.

 That means about 80% of the running time is taken up by conversion to
 JS here. If the conversion involved something more complex, like
 reading Ruby code, the proportion might decrease.

 I think my point stands - this performance is undesirable. It might be
 difficult to deploy web apps with dependencies on large foreign
 codebases, especially on mobile devices which likely take much longer
 to perform all of this.

 This is combined with the fact that use of Function() seems to be
 discouraged by the CSP proposal.


 On the other hand - I learned a bit about execution semantics in V8
 and found it effectively does what I wanted, only a little indirectly
 by building a string of JS syntax instead of IR. (So the problem
 severity in my mind is downgraded from "mysterious action" to
 "Function() kinda sucks".)


Re: Code compilation alternative to Function()

2014-01-21 Thread Gustavs Tēbergs
On 22 January 2014 01:50, Sean Silva sil...@purdue.edu wrote:
 Your `load` routine is not how I described and is inducing a ton of
 cache-busting heap traffic. You need to put *all* strings in a single,
 constant array and append references to those constant strings to a single
 growing array. There should be no concatenations except for the final
 .join(). I.e. something like:

I think that's what it already does. Using += is faster than
push-and-join in isolated tests. I could cache uses of num.toString()
though.
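
Something like this would do for the caching (a sketch, assuming small
non-negative integers dominate in the opcode stream):

var numStrings = [];
function numToString(n) {
  if (n >= 0 && n < 4096 && (n | 0) === n) {
    var s = numStrings[n];
    if (s === undefined) numStrings[n] = s = n.toString();
    return s;
  }
  return n.toString();
}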


Re: Code compilation alternative to Function()

2014-01-21 Thread Sean Silva
On Tue, Jan 21, 2014 at 9:06 PM, Gustavs Tēbergs fixplzsecr...@gmail.com wrote:

 On 22 January 2014 01:50, Sean Silva sil...@purdue.edu wrote:
  Your `load` routine is not how I described and is inducing a ton of
  cache-busting heap traffic. You need to put *all* strings in a single,
  constant array and append references to those constant strings to a
 single
  growing array. There should be no concatenations except for the final
  .join(). I.e. something like:

 I think that's what it already does. Using += is faster than
 push-and-join in isolated tests. I could cache uses of num.toString()
 though.


I'm not so worried about that. Rather, it is all the temporary strings that
are being created. For example, the single line `app( getIdent(read16()) +
'[' )` is creating at least 2 temporary strings.
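
A sketch of the same line without the intermediate concatenation, assuming
`app` just pushes a string onto the output array:

app(getIdent(read16()));  // getIdent's result should itself come from the pool
app('[');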

-- Sean Silva


Re: Code compilation alternative to Function()

2014-01-20 Thread Gustavs Tēbergs
Following up with a benchmark.

(I thought of this idea while working on a parser generator library,
but unfortunately for my argument the library turned out really
fast...)

I decided to write a converter for Asm.js code to see roughly what
happens when building code expression by expression, and it gives me
existing programs to test with.

https://gist.github.com/fixplz/8529003

Results:

* I took Asm.js code from https://github.com/kripken/ammo.js as a testing source
* Tested with Node 0.10.17 Win7 64
* The JS engine parses the source (1.6 MB) in 180 ms
* The opcode reader traverses the corresponding opcode binary (1.8 MB)
in 120 ms (the binary could be some 30% smaller by adding special case
codes)
* With conversion enabled the opcode reader takes 260 ms to convert back to JS

The output concatenation is very fast. But it could be better.

The conversion forgoes minimizing parentheses so the output is larger
than the source (2.5 MB), so the total time of using a converter like
this would be 260 ms to convert + 300 ms to parse.

That means about 80% of the running time is taken up by conversion to
JS here. If the conversion involved something more complex, like
reading Ruby code, the proportion might decrease.

I think my point stands - this performance is undesirable. It might be
difficult to deploy web apps with dependencies on large foreign
codebases, especially on mobile devices which likely take much longer
to perform all of this.

This is combined with the fact that use of Function() seems to be
discouraged by the CSP proposal.


On the other hand - I learned a bit about execution semantics in V8
and found it effectively does what I wanted, only a little indirectly
by building a string of JS syntax instead of IR. (So the problem
severity in my mind is downgraded from "mysterious action" to
"Function() kinda sucks".)


Re: Code compilation alternative to Function()

2014-01-12 Thread Andrea Giammarchi
Besides what Sean and Brendan said, I think it makes sense to ask for a
code generator/compositor under CSP.

Actually, I never understood why CSP does not allow Function, since
everything that could be done inside Function can basically be done
programmatically too (manually parsing the string, or similar things that
are even more problematic if used).

There are a few libraries that use feature detection to create the best
possible function at runtime, which can be a big performance boost
(NWMatcher, lo-dash, others), and it's a pity these might cause problems
under CSP.
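
as a trivial sketch of the pattern (hypothetical helpers, not taken from
any of those libraries): detect once, bake the fast path into a generated
function, instead of branching on every single call.

var body = document.querySelectorAll ?
  'return context.querySelectorAll(selector);' :
  'return legacyQuery(context, selector);';  // legacyQuery: hypothetical fallback
var query = Function('context', 'selector', body);  // blocked under CSP without 'unsafe-eval'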

that being said, <troll>have you tried execScript already?</troll> :D


On Sat, Jan 11, 2014 at 10:54 PM, Brendan Eich bren...@mozilla.com wrote:

 fixplzsecr...@gmail.com wrote:

 Asm.js programs are larger than equivalent native executables


 Not on the (gzipped) wire:

 http://mozakai.blogspot.com/2011/11/code-size-when-compiling-to-javascript.html

 (Emscripten has improved since then, IINM.)

 In memory, the issue isn't executable vs. JS source (asm.js or not), since
 safety is not optional. You'd do better to consider an SFI compiler (NaCl
 or PNaCl) code size than raw executable.

 Anyway, I see Sean Silva replied, so I'll stop here.

 /be



Re: Code compilation alternative to Function()

2014-01-12 Thread Gustavs Tēbergs
On 12 January 2014 07:48, Sean Silva sil...@purdue.edu wrote:
 Can you provide some measurements where concatenating a string and parsing it 
 is a bottleneck?

I would not call it a bottleneck, because the concatenating in most use
cases only happens once at startup; that's why I used the word "latency".
If you want to deliver data or programs to a browser client in a custom
format, generating Javascript can cause a delay of 100-500 ms before your
application can start operating.

I could string up some demo that takes this long in practice and
emulates an approach that authors might want to use in the future.

A simple and acute example: take PEG.js and the included Javascript parser and
parse some Javascript library (pretending this library is written in a
different language), or the CSS parser and a CSS framework.

I am proposing this because it occurred to me that the addition of
this API would remove much of the cost of something being foreign to
the Javascript/JSON/HTML system, the same way we all want to remove
the cost of web applications being foreign to the operating system.

 LLVM-dev here. Other than this being an IR, I don't see any resemblance to 
 LLVM at all.

 This is nothing like LLVM at all (not SSA).

It is in the same vicinity. I meant to point out that insertion should
not be done with ASTs or stack opcodes like I've seen in other places.

Note that `op` returns a token that represents the result of the
operation and can be used in subsequent operations. `get` and `set`
would load and store these value tokens from/to variables and
properties. I hope you see what I mean when I say this part is like
LLVM syntax (quite SSA).
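
To make the shape concrete, here is how I imagine the earlier example
extends to a store and a reuse of a value token (still the same
hypothetical API from my first mail, not any existing engine interface):

func = CompiledFunction.create()
arg0 = func.arg(0)
v    = func.get(arg0)                     // value token for the argument
sum  = func.op('+', v, func.literal(1))   // new value token
func.set(arg0, sum)                       // store back into the argument
func.return(func.op('*', sum, func.literal(2)))
result = func.done()   // builds function(a){ a = a + 1; return a * 2 }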

 I'm not convinced that decompressing megabytes of JavaScript and parsing it 
 (all in native code) is going to be any slower than having to JIT a 
 JavaScript code generation routine, build up some horribly cache-unfriendly 
 sea of GC-heap-allocated linked nodes in JS AST structure, then convert 
 that sea of linked nodes in JS to the JIT's internal program 
 representation. Decompress+parse is already a highly optimized code path as 
 it is on the critical path of virtually every page load.

That would depend on what the input to the generation routine is. You
are right that SpiderMonkey AST is not more compact than the textual
representation.

With other AST types one node can correspond to a complex Javascript
construct. For example, the PEG expression `a / b` executes as something like:

tmp = rule_a()
if(ok) result = tmp
if(!ok) {
  tmp = rule_b()
  if(ok) result = tmp
}
return result
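
In runnable form the generated ordered choice comes out roughly as below
(a sketch; the FAIL sentinel, the pos variable and the rule names are made
up, and PEG.js's real output differs in detail):

function parse_choice() {        // a / b
  var saved = pos;
  var result = parse_a();
  if (result === FAIL) {
    pos = saved;                 // backtrack before trying the alternative
    result = parse_b();
  }
  return result;
}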

The output from a parser generator can quickly become large. This is
problematic if I want to deliver the parser definition to the client
to be generated clientside. I estimate that concatenating and
evaluating 100KB of code takes 100ms in Chrome. Make that 2x if using
a last-gen CPU and 2x if using Firefox. How long on an ARM device
would that be?

If we're talking about megabytes of Javascript I assume we're talking
about Asm.js. Then the input would be something specialized to
representing Asm.js programs, probably an array of opcodes. Like I
mentioned, a common criticism of Asm.js is that it needs multiple
characters to represent fundamental operations like (x+y|0) and
[x>>2]. Our custom opcode format could represent these with a single
byte - the native parser may be fast, but can it parse 5x as many
chars spanning multiple AST nodes more efficiently than our opcode
decoder?
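
To illustrate, the decoder for such a format could expand one byte straight
into the multi-character pattern (a sketch; the opcode numbers are made up,
readByte is a hypothetical cursor helper, and app/getIdent/read16 follow the
Gist's conventions):

var OP_ADD_INT = 1, OP_LOAD32 = 2;
function decodeExpr() {
  switch (readByte()) {
    case OP_ADD_INT:   // one byte for the whole (x + y | 0) wrapper
      app('('); app(getIdent(read16())); app('+');
      app(getIdent(read16())); app('|0)');
      break;
    case OP_LOAD32:    // one byte for HEAP32[x >> 2]
      app('HEAP32['); app(getIdent(read16())); app('>>2]');
      break;
  }
}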

Also, if I'm not mistaken, both SM and V8 build GC-allocated AST trees
while parsing, only to later copy the AST to a flat IR.

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/frontend/Parser.cpp
https://github.com/v8/v8/blob/master/src/parser.cc

That's why the API suggestion example I gave aims to provide an
abstract write-only interface to a flat SSA register IR. Our Asm.js
decoder could skip tree node allocation altogether (ideally, maybe).
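
As a sketch of what "flat" means here (nothing engine-specific, just the
data layout I have in mind; the opcode ids are invented): each instruction
is appended to parallel arrays and the value token is simply its index.

var OP_ARG = 0, OP_CONST = 1, OP_MUL = 2;
var op = [], lhs = [], rhs = [];          // one slot per instruction
function emit(o, a, b) {
  op.push(o); lhs.push(a); rhs.push(b);
  return op.length - 1;                   // the index doubles as the value token
}
// function(a){ return a * 2 } becomes:
var t0 = emit(OP_ARG, 0, 0);
var t1 = emit(OP_MUL, t0, emit(OP_CONST, 2, 0));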

 Unless the program representation used internally by the JS engines is 
 exactly the same as the standardized representation (it isn't, and I doubt JS 
 engines would let that implementation detail be standardized), then they 
 still need to do a format conversion, but this time walking a linked sea of 
 GC-heap allocated JS objects instead of doing a linear scan of the program 
 text.

Indeed, I am unsure about the viability of the API fitting all
engines, and whether those that have an SSA IR could write it directly
through these API calls and have only a verification pass upon
.done() being called, or if it should be buffered and copied to the
real IR later.

However, the intention of my proposed API is exactly that it can be
written in one go with no intermediate linked parts.
The intermediate representation does not need to be standardized,
because the implementation of the API calls would convert the
parameters to what is suitable for the IR - though the IR does need to
be similar.

If anyone knows of engines that do not fit this flat 

Re: Code compilation alternative to Function()

2014-01-12 Thread Gustavs Tēbergs
I see what you mean. I will investigate why I see long running times
with parser generators.


Re: Code compilation alternative to Function()

2014-01-12 Thread Boris Zbarsky

On 1/12/14 6:58 PM, Gustavs Tēbergs wrote:

I see what you mean. I will investigate why I see long running times
with parser generators.


If you can post a link to a testcase, I'm happy to look into this on the 
SpiderMonkey end.


-Boris



Re: Code compilation alternative to Function()

2014-01-11 Thread Sean Silva
On Sat, Jan 11, 2014 at 7:34 PM, fixplzsecr...@gmail.com wrote:


 I want to propose an idea and get feedback on viability of this or some
 related addition to the language.

 ES should have a way to build functions at runtime - like the approach of
 building a string of code and using Function(code) - but using API calls to
 assemble the code. Special support for this from language implementations
 would allow programs that depend on dynamic evaluation of large blocks of
 code to execute with less latency - bypassing the cost of concatenating a
 large string and then requesting the runtime to parse it character by
 character as with the use of Function().


Can you provide some measurements where concatenating a string and parsing
it is a bottleneck?



 Here is my naive guess at what the API should look like:

 func  = CompiledFunction.create()
 arg0  = func.arg(0)
 val   = func.get(arg0)
 val2  = func.op('*', val, func.literal(2))
 func.return(val2)
 result = func.done()

 This builds function(a){ return a * 2 }

 As you can see, this API provides an LLVM-like language.


LLVM-dev here. Other than this being an IR, I don't see any resemblance to
LLVM at all.



 There would be more methods: get, set for reading and writing
 variables and fields, call for function calls, startIfElse for writing
 if/else blocks, startFor, startWhile for loops, break, continue,
 and others for every possible construct.


This is nothing like LLVM at all (not SSA).




 It is important that runtime implementations support these operations with
 low overhead. Ideally it should be a nearly direct means of writing to the
 intermediate representation format of the runtime.


 Motivation:
 There are a few types of programs that use generated Javascript in
 browsers:

 - PEG.js
 PEG.js is a library for facilitating parsing of custom text formats. It
 works by converting a parser notation to Javascript code that performs the
 specified parsing rules.
 Ability to use custom data formats is a piece of Unix philosophy that it
 seems should be supported on the Web. As it is, PEG.js has to depend on the
 Function() quirk, which may not work if content security policy is enabled.

 - Opal http://opalrb.org/try/
 Opal aims to implement Ruby running in Javascript environments. It is one
 among several projects with similar aims for different languages, each
 operating in different ways.
 If special support for this API is available, web applications depending
 on tools like this could deliver their code and initialize with lower
 latency.

 - Asm.js
 Asm.js is a special format of Javascript suitable for representing native
 executable code. However, one complaint concerning it is that it is not a
 suitable lexical representation of the content it encodes. Asm.js programs
 are larger than equivalent native executables, and parsing time is
 proportional to this size because the content has to be scanned character by
 character; code size in present experiments can exceed 10 megabytes, and
 parsing can be inefficient unless the runtime implementation includes a
 specialized parser for this format. Because of this, I have seen
 criticism suggesting that proponents of Asm.js should design a bytecode
 format for this use case instead of sending blobs of Javascript.

 But defining an adequate future-proof bytecode format ahead of time is
 difficult, and is almost a separate concern to what Asm.js is concerned
 with.


 What if independent parties could design their own program representation
 format that can be converted to executable form as needed?

 A party delivering a web application can choose a format for their Asm.js
 code, and include a decoder for it in their application startup procedure.
 If the runtime receiving this application supports the code creation API,
 the decoder can bypass the usual process of decompressing megabytes of
 Javascript code and then having to parse it.


I'm not convinced that decompressing megabytes of JavaScript and parsing it
(all in native code) is going to be any slower than having to JIT a
JavaScript code generation routine, build up some horribly cache-unfriendly
sea of GC-heap-allocated linked nodes in JS AST structure, then convert
that sea of linked nodes in JS to the JIT's internal program
representation. Decompress+parse is already a highly optimized code path as
it is on the critical path of virtually every page load.




 To illustrate what I mean:
 Eval example:

 load('./blob.js', function(input) {
   var ts = {}
   ts.start = Date.now()

   Function(input)()
   MyBlob.foo

   ts.end = Date.now()
   console.log('Eval time', ts.end - ts.start)
 })

 Codegen example:

 load('./blob.js', function(input) {
   var ast = esprima.parse(input)

   var ts = {}
   ts.start = Date.now()

   Function(escodegen.generate(ast))()
   MyBlob.foo

   ts.end = Date.now()
   console.log('Codegen time', ts.end - ts.start)
 })

 In the 

Re: Code compilation alternative to Function()

2014-01-11 Thread Brendan Eich

fixplzsecr...@gmail.com wrote:

Asm.js programs are larger than equivalent native executables


Not on the (gzipped) wire:

http://mozakai.blogspot.com/2011/11/code-size-when-compiling-to-javascript.html

(Emscripten has improved since then, IINM.)

In memory, the issue isn't executable vs. JS source (asm.js or not), 
since safety is not optional. You'd do better to consider an SFI 
compiler (NaCl or PNaCl) code size than raw executable.


Anyway, I see Sean Silva replied, so I'll stop here.

/be