Re: [JS-internals] Dynamic Analysis API discussion

2014-06-27 Thread Manu Sridharan
Some additional points that hopefully are not entirely redundant with what 
others have already said:

* There is a growing ecosystem of JavaScript parsing and instrumentation 
toolkits, beyond Jalangi, e.g.:

https://github.com/wala/JS_WALA
https://github.com/substack/node-falafel

The nice thing about supporting a source-to-source API is that it will 
encourage experimentation with yet more approaches, which might lead to new 
insights into how to do low-overhead instrumentation, what would be appropriate 
for a lower-level API, and so on.

* A source-to-source API enables greater portability, e.g., for writing 
analyses / transformations that work for node.js programs and (at least 
partially) on other browsers.  Even if some analyses support FF-specific 
features, probably much of the logic would be shareable across runtimes.

* As Koushik mentioned on another thread, Michael Pradel has already created a 
modified version of Firefox that supports S2S instrumentation:

https://github.com/Berkeley-Correctness-Group/Jalangi-Berkeley

Given that an outside developer could do this without deep expertise on the 
Firefox JS engine, I imagine the maintenance burden of an S2S API would be 
fairly low, making it worth doing even in addition to a lower-level API.

* While the overhead of Jalangi instrumentation is high, this is not 
fundamental.  Instrumentation customized to a particular client could have much 
lower overhead.

* Regarding a lower-level API, one motivating client might be Event Racer:

http://eventracer.org/

Event Racer cannot be built using JS instrumentation alone, as it requires 
detailed information about DOM and event-loop operations.  Right now, it's 
built upon a modified WebKit (with work on porting to Blink in progress).  We 
had a very preliminary discussion with Servo developers about designing an API 
upon which Event Racer could be built, but we didn't pursue it further.  If you 
think supporting such a client analysis might be desirable, I can ask the Event 
Racer developers to chime in with more feedback.

Best,
Manu


Re: [JS-internals] Dynamic Analysis API discussion

2014-06-26 Thread Robert O'Callahan
Your email is unclear as to whether you're proposing integrating some
particular analysis engine or framework into Spidermonkey (or more than
one), or just some minimal set of hooks to enable others to supply such
engines/frameworks. I'm going to assume the latter since I think the former
makes no sense at all.

In terms of hooks, an API enabling arbitrary program transformation has big
advantages as a basis for implementing dynamic analyses, compared to other
kinds of API:
1) Maximum flexibility for tool builders. You can do essentially anything
you want with the program execution.
2) Simple interface to the underlying VM. So it's easy to maintain as the
VM evolves. And, importantly, minimal work for Mozilla.
3) Potentially very low overhead, because instrumentation code can be
inlined into application code by the JIT.
I spent a few years writing dynamic analysis tools for Java, and they all
used bytecode transformation for all these reasons.

You identified some disadvantages:
1) It may be difficult to keep the language support of code transformation
tools in sync with Spidermonkey.
2) Code transformation tools may introduce application bugs (e.g. by
polluting the global or due to a bug in translation).
3) Transformed code may incur unacceptable slowdown (e.g. due to ubiquitous
boxing).
(Did I miss anything?)

I think #2 really only matters for people who want to deploy dynamic
analysis in customer-facing production systems, and I don't think that will
be important anytime soon.

#1 doesn't seem like a big problem to me. Extending a JS parser is not that
hard. New language features with complex semantics require significant tool
updates whatever API we use. If we're using these tools ourselves, we'd
have to update the tools sometime between landing the feature in
Spidermonkey and starting to use it in FirefoxOS or elsewhere where we're
depending on analysis.

#3 is interesting and perhaps where lessons learned from Java and other
contexts do not apply. I think we should dig into specific tool examples
for this; maybe some combination of more intelligent translation and
judicious API extensions can solve the problems.

Nicolas B. Pierron wrote:

 Personally, I think that these issues imply that we should avoid relying
 on a source-to-source mapping if we want to provide meaningful security
 results. We could replicate the same or a similar API in SpiderMonkey, and
 even make one compatible with Jalangi analyses.


It's not clear what you mean by "the same or a similar API" here.

If we add opcodes dedicated to monitoring values (at the bytecode emitter
 level), instead of doing a source-to-source transformation, one of the
 advantages would be that frontend developers would not have to maintain
 Jalangi sources when we are adding new features in SpiderMonkey; moreover,
 the bytecode emitter already breaks everything down into opcodes, which are
 easier to wrap than the source.

 Analyses are usually made to observe the execution of code, not to mutate
 it.  So if we only monitor the execution, instead of emulating it, we might
 be able to batch analysis calls.  Doing batches asynchronously means that
 the overhead of running an analysis is minimal while the analyzed code is
 running.


Logging and log analysis have their place, but a lot of dynamic analysis
tools rely on efficient synchronous online data processing in
instrumentation code. For example, if you want to count the number of times
a program point is reached, it's much more efficient to increment a global
variable at that program point than to log to a buffer every time that
point is reached, and count log entries offline. For many analyses of
real-world applications, high-volume data logging is neither efficient nor
scalable. Here are a couple of examples of Java tools I worked on where
synchronous online data processing was essential:
-- http://fsl.cs.illinois.edu/images/e/e8/P385-goldsmith.pdf
-- http://web5.cs.columbia.edu/~junfeng/09fa-e6998/papers/hybrid.pdf
So I think injection of synchronously executed instrumentation is essential
for a large class of analyses.
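
To make the contrast concrete, here is a minimal JavaScript sketch (all names
invented) of the two styles of instrumenting one program point:

  // synchronous online processing: a plain counter the JIT can inline
  var hits = 0;
  function foo() {
    hits++;              // near-zero overhead per call
    /* original body */
  }

  // logging for offline analysis: every execution allocates a log entry
  var log = [];
  function fooLogged() {
    log.push({ point: "foo:entry" });   // buffer grows with execution count
    /* original body */
  }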

Rob


Re: [JS-internals] Dynamic Analysis API discussion

2014-06-26 Thread Nicolas B. Pierron

On 06/26/2014 04:50 AM, Robert O'Callahan wrote:

Your email is unclear as to whether you're proposing integrating some
particular analysis engine or framework into Spidermonkey (or more than
one), or just some minimal set of hooks to enable others to supply such
engines/frameworks. I'm going to assume the latter since I think the former
makes no sense at all.


Yes, the idea I have in mind is to have some kind of self-hosted compartment 
dedicated to analysis, where if a function named xyz is declared on the 
global, then it can be used preferably asynchronously (as we might not want 
to pay for the cross-compartment call), or synchronously (waiting for the day 
we inline cross-compartment calls in ICs / code), or maybe both.
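
As a very rough sketch of what this could look like (the hook name is purely
hypothetical, nothing is implemented):

  // inside the dedicated analysis compartment: declaring a function with a
  // recognized name on the global would register it as a hook
  var reads = [];
  function onPropertyRead(obj, name, value) {
    // delivered asynchronously in batches by default, to amortize the
    // cross-compartment call; synchronous delivery would be opt-in
    reads.push(name);
  }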



In terms of hooks, an API enabling arbitrary program transformation has big
advantages as a basis for implementing dynamic analyses, compared to other
kinds of API:
1) Maximum flexibility for tool builders. You can do essentially anything
you want with the program execution.
2) Simple interface to the underlying VM. So it's easy to maintain as the
VM evolves. And, importantly, minimal work for Mozilla.


Except if Mozilla is maintaining these tools, as we want to rely on them. 
For example, the Security team wants to rely on some taint analysis, or even 
other simple analyses, to check whether events have been validated before 
being processed.



3) Potentially very low overhead, because instrumentation code can be
inlined into application code by the JIT.


I have a question for you, and also for people who have implemented such 
analyses in SpiderMonkey.  Why take all the pain of integrating such an 
analysis into SpiderMonkey's code, which is hard and changes frequently, when 
it would be easy (based on what you mention) to just do a source-to-source 
transformation?


Why have we received 3 proposals for implementing taint analysis in SpiderMonkey 
so far?  It sounds to me that there is something which is not easily 
accessible from a source-to-source transformation, and which might be easier to 
hook once you are deep inside the engine.



I spent a few years writing dynamic analysis tools for Java, and they all
used bytecode transformation for all these reasons.


I understand your argument that we should support transformations on a 
substrate which is standardized.  Maybe this is just a matter of naming the 
API properly, so that analyses feel like they are being hooked onto the 
spec's definitions of JavaScript.



You identified some disadvantages:
1) It may be difficult to keep the language support of code transformation
tools in sync with Spidermonkey.
2) Code transformation tools may introduce application bugs (e.g. by
polluting the global or due to a bug in translation).
3) Transformed code may incur unacceptable slowdown (e.g. due to ubiquitous
boxing).
(Did I miss anything?)


Source-to-source implies that analysis developers have to know about the JS 
implementation and JS syntax, while such work belongs to the JavaScript 
engine developers.


The goal of such an API is to shift the work to where the knowledge is: I do 
not expect analysis developers to understand all the subtle details of 
JavaScript (cf. the Jalangi issues), and on the other hand, I do not expect 
JavaScript engine developers to maintain any kind of analysis integrated into 
the JS engine (except for optimization purposes).


Having a Dynamic Analysis API is just a middle way that lets each group deal 
with the problems they know.



I think #2 really only matters for people who want to deploy dynamic
analysis in customer-facing production systems, and I don't think that will
be important anytime soon.


On the contrary, I think/hope we could have a trivial taint analysis to 
monitor privacy, similar to what Lightbeam (formerly Collusion) is doing.



#1 doesn't seem like a big problem to me. Extending a JS parser is not that
hard.


Extending one JS parser, maybe.  Extending two JS parsers the same way is harder.


New language features with complex semantics require significant tool
updates whatever API we use.


Not as much as the syntax; the bytecode is an example of this, as the bytecode 
is a kind of subset that we target with the bytecode emitter.  As you 
mentioned, manipulating bytecode is easy, but manipulating the source to 
ensure that we keep the same semantics might be more complex.


A trivial example is the destructuring syntax:

  var [a, b] = c;

Where do you hook the getters?  Or do you have to understand it to translate 
it to:


  var a = $.arrayGet(c, 0);
  var b = $.arrayGet(c, 1);

And I do not have to go far to see that this is already done by the parser, 
and that the parser handles the name clashes for us (what if, instead of c, 
the right-hand side were a?).  Do we want every analysis developer to make the 
same mistakes, or should we just provide them with an API, as Jalangi does?
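
To spell the clash out (reusing the hypothetical $.arrayGet helper, with an
invented $tmp temporary):

  // a naive rewrite of `var [a, b] = a;` clobbers a before its second read
  var a = $.arrayGet(a, 0);      // a is overwritten here...
  var b = $.arrayGet(a, 1);      // ...so this reads from the wrong object

  // the rewrite has to introduce a fresh temporary, as the parser does
  var $tmp = a;
  var a = $.arrayGet($tmp, 0);
  var b = $.arrayGet($tmp, 1);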



If we're using these tools ourselves, we'd
have to update the tools sometime between landing the feature in
Spidermonkey and starting to use it in FirefoxOS or 

Re: [JS-internals] Dynamic Analysis API discussion

2014-06-26 Thread Shu-yu Guo
On Jun 26, 2014, at 6:57 AM, Nicolas B. Pierron nicolas.b.pier...@mozilla.com 
wrote:

 I have a question for you, and also for people who have implemented such 
 analyses in SpiderMonkey.  Why take all the pain of integrating such an 
 analysis into SpiderMonkey's code, which is hard and changes frequently, when 
 it would be easy (based on what you mention) to just do a source-to-source 
 transformation?

 Why have we received 3 proposals for implementing taint analysis in SpiderMonkey 
 so far?  It sounds to me that there is something which is not easily 
 accessible from a source-to-source transformation, and which might be easier 
 to hook once you are deep inside the engine.

Perhaps we can get those who tried to implement taint analysis in SpiderMonkey 
before to chime in about the pain points they experienced. Do we know who they 
are?

 I understand your argument that we should support transformations on a 
 substrate which is standardized.  Maybe this is just a matter of naming the 
 API properly, so that analyses feel like they are being hooked onto the 
 spec's definitions of JavaScript.

It seems to me you can’t have your cake and eat it too. IIUC, you are proposing 
an API that’s SpiderMonkey-specific and is tailored to its extensions. How is 
that going to play with the official spec? Why entangle spec language? Should 
this API be kept up to date with the spec?

I also foresee scaling/implementation problems by designing hooks around the ES 
spec, as that goes well beyond monitoring bytecode, but would require manual 
instrumentation of many VM functions (e.g., prototype chain walks, lexical 
environments).
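
For instance, a single property read that looks like one bytecode op already
fans out into several spec-level operations (a sketch):

  // one GETPROP on obj.x, but several ES-spec-level events to hook:
  var proto = { get x() { return 42; } };
  var obj = Object.create(proto);
  obj.x;  // [[Get]] on obj (own-property miss), a prototype-chain walk
          // to proto, then a getter invocation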

 Extending one JS parser, maybe.  Extending two JS parsers the same way is harder.

 New language features with complex semantics require significant tool
 updates whatever API we use.

 Not as much as the syntax; the bytecode is an example of this, as the bytecode 
 is a kind of subset that we target with the bytecode emitter.  As you 
 mentioned, manipulating bytecode is easy, but manipulating the source to 
 ensure that we keep the same semantics might be more complex.

It seems a worse maintenance burden to me to have to update all the analyses 
ever written whenever we decide to change the bytecode in SpiderMonkey, say, 
by decomposing some more fat ops. Exposing bytecode-based instrumentation on a 
private bytecode makes the bytecode a de facto public and frozen API, which is 
undesirable.

As I’ve said before, which I’ll repeat here for the benefit of the discussion 
thread, I am in favor of a source-to-source approach because it seems to me 
that source-to-source is just as expressive as the API proposed here. I remain 
optimistic that an out-of-engine tool can be made performant, for some of the 
points roc mentioned. For maintenance, if nothing else, an out-of-engine tool 
is open to be maintained by a larger number of developers instead of just JS 
engine developers.

If we discover fundamental performance and expressivity limitations that an 
out-of-engine tool poses, I would be swayed in the other direction.

- shu


Re: [JS-internals] Dynamic Analysis API discussion

2014-06-26 Thread Nicolas B. Pierron

On 06/26/2014 10:49 AM, Shu-yu Guo wrote:

On Jun 26, 2014, at 6:57 AM, Nicolas B. Pierron nicolas.b.pier...@mozilla.com 
wrote:


I have a question for you, and also for people who have implemented such 
analyses in SpiderMonkey.  Why take all the pain of integrating such an 
analysis into SpiderMonkey's code, which is hard and changes frequently, when it 
would be easy (based on what you mention) to just do a source-to-source 
transformation?

Why have we received 3 proposals for implementing taint analysis in SpiderMonkey 
so far?  It sounds to me that there is something which is not easily accessible 
from a source-to-source transformation, and which might be easier to hook once 
you are deep inside the engine.


Perhaps we can get those who tried to implement taint analysis in SpiderMonkey 
before to chime in about the pain points they experienced. Do we know who they 
are?


Yes, we know who they are, and we have contacted all of them.

But I know that at least one of them does not want to go public right now.


Extending one JS parser, maybe.  Extending two JS parsers the same way is harder.


New language features with complex semantics require significant tool
updates whatever API we use.


Not as much as the syntax; the bytecode is an example of this, as the bytecode 
is a kind of subset that we target with the bytecode emitter.  As you 
mentioned, manipulating bytecode is easy, but manipulating the source to 
ensure that we keep the same semantics might be more complex.


It seems a worse maintenance burden to me to have to update all the analyses 
ever written whenever we decide to change the bytecode in SpiderMonkey, say, 
by decomposing some more fat ops. Exposing bytecode-based instrumentation on a 
private bytecode makes the bytecode a de facto public and frozen API, which is 
undesirable.

As I’ve said before, which I’ll repeat here for the benefit of the discussion 
thread, I am in favor of a source-to-source approach because it seems to me 
that source-to-source is just as expressive as the API proposed here. I remain 
optimistic that an out-of-engine tool can be made performant, for some of the 
points roc mentioned. For maintenance, if nothing else, an out-of-engine tool 
is open to be maintained by a larger number of developers instead of just JS 
engine developers.


I do not disagree that source-to-source is more expressive, but it also makes 
it easier to shoot yourself in the foot when doing such modifications.


I want to make sure that this is as easy for analysis developers to write 
analyses as it is for us to maintain such an API.


--
Nicolas B. Pierron



Re: [JS-internals] Dynamic Analysis API discussion

2014-06-26 Thread Robert O'Callahan
On Fri, Jun 27, 2014 at 1:57 AM, Nicolas B. Pierron 
nicolas.b.pier...@mozilla.com wrote:

 Yes, the idea I have in mind is to have some kind of self-hosted
 compartment dedicated to analysis, where if a function named xyz is
 declared on the global, then it can be used preferably asynchronously (as
 we might not want to pay for the cross-compartment call), or synchronously
 (waiting for the day we inline cross-compartment calls in ICs / code), or
 maybe both.

  In terms of hooks, an API enabling arbitrary program transformation has big
 advantages as a basis for implementing dynamic analyses, compared to other
 kinds of API:
 1) Maximum flexibility for tool builders. You can do essentially anything
 you want with the program execution.
 2) Simple interface to the underlying VM. So it's easy to maintain as the
 VM evolves. And, importantly, minimal work for Mozilla.


 Except if Mozilla is maintaining these tools, as we want to rely on them.
 For example, the Security team wants to rely on some taint analysis, or even
 other simple analyses, to check whether events have been validated before
 being processed.


Yes, but we can collaborate on that with a large group of people --- at
least, a larger group than the set of people who want to hack on
Spidermonkey.



  3) Potentially very low overhead, because instrumentation code can be
 inlined into application code by the JIT.


 I have a question for you, and also for people who have implemented such
 analyses in SpiderMonkey.  Why take all the pain of integrating such an
 analysis into SpiderMonkey's code, which is hard and changes frequently, when
 it would be easy (based on what you mention) to just do a source-to-source
 transformation?

 Why have we received 3 proposals for implementing taint analysis in
 SpiderMonkey so far?  It sounds to me that there is something which is not
 easily accessible from a source-to-source transformation, and which might be
 easier to hook once you are deep inside the engine.


I don't know. One reasonable guess would be that if you're doing a research
project and you want to minimize overhead and don't care about
maintainability, you can't go wrong by modifying the engine directly.


  You identified some disadvantages:
 1) It may be difficult to keep the language support of code transformation
 tools in sync with Spidermonkey.
 2) Code transformation tools may introduce application bugs (e.g. by
 polluting the global or due to a bug in translation).
 3) Transformed code may incur unacceptable slowdown (e.g. due to ubiquitous
 boxing).
 (Did I miss anything?)


 Source-to-source implies that analysis developers have to know about the
 JS implementation and JS syntax, while such work belongs to the JavaScript
 engine developers.


Analysis developers would be exposed to fewer JS implementation details by
working at the source level than by working at the bytecode or some more
Spidermonkey-internal level. Yes, they would have to have detailed
knowledge of JS syntax and semantics, but that's OK; people developing
program analysis frameworks expect to have to know those things :-).

And not every analysis developer will have to know everything. A good
framework will present higher-level abstractions that make it easy to write
simple analyses (while still being possible to write deep ones). I trust
Manu and his friends to write a good framework :-).


  I think #2 really only matters for people who want to deploy dynamic
 analysis in customer-facing production systems, and I don't think that will
 be important anytime soon.


 On the contrary, I think/hope we could have a trivial taint analysis to
 monitor privacy, similar to what Lightbeam (formerly Collusion) is doing.


I hesitate to use "trivial" and "taint analysis" in the same sentence, but
OK. I still think we can leave this up to the developers of the analysis
framework. They are just as smart as us, trust me :-).

Asynchrony is one suggestion to make recording analyses faster, by
 avoiding frequent cross-compartment calls.  I do not see any issue with
 having synchronous requests; on the contrary, I think it might be interesting
 to interrupt the program execution on such a request, or even change the
 program execution (things that we can only do synchronously) to prevent
 security holes / privacy leaks.


OK but fast synchronous calls to instrumentation code will very quickly
become important. It's not clear to me why we can't have instrumentation
code running in the same compartment.

I echo what Shu said. Standardizing a code format lower-level than JS
syntax seems like a big maintenance burden for Spidermonkey. Better to have
a separate front end maintained outside Spidermonkey. These formats have
different requirements so it makes sense to allow them to evolve
independently. In practice, I don't think keeping the extra front end up to
date will be a problem. People are already doing this, e.g. Traceur.

Rob

[JS-internals] Dynamic Analysis API discussion

2014-06-25 Thread Nicolas B. Pierron

Hi,

This email echoes the dev.platform email; it contains a proposal for 
adding a JavaScript API for implementing dynamic analyses on top of 
SpiderMonkey, as opposed to implementing each of them inside the JavaScript 
engine.


1. Motivation

So far, we have received 3 different proposals for adding coarse- or 
fine-grained implementations of taint analysis to the JavaScript engine. (I 
cannot name all of them publicly yet.)


Accepting any of the taint analysis proposals has a price: either a 
maintenance cost, as these implementations are entangled in many parts of 
the JavaScript engine, and/or a performance overhead even when the analysis 
is disabled.


Assuming that we were to accept one, we would still have to settle on an 
acceptable trade-off between performance and the ability to check, while 
this choice should be made by the people who are running the analysis, and 
not by the JavaScript engine developers.


On the other hand, some external tools, such as Jalangi [1,2,3], are able to 
instrument web pages running in the browser and run dynamic analyses on 
JavaScript programs.


Sadly, Jalangi does not answer all our needs yet, because:
 - It does not support the SpiderMonkey extensions which are used in Gecko;
 - It modifies the current global to add the analysis framework;
 - It emulates JavaScript operators & language constructs (~26x slowdown [4] 
while recording);
 - It does not work on Firefox OS devices.


What Jalangi teaches us is that a dynamic analysis framework is quite 
capable of supporting analyses such as record & replay, tracing NaN, taint 
analysis, and some code coverage.


Firefox OS / Gecko could make use of such an API to implement a proper & 
correct way of doing code coverage. We could also see code coverage being 
used as a metric on tbpl.


Security teams can use such an API for taint analysis.  We can see multiple 
applications, such as using it under fuzzing to find potential code 
injections (.innerHTML, document.write, …) from untrusted sources, or 
detecting when SMS / emails are sent with untrusted data on your behalf.
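
With hypothetical hook and helper names, such a taint rule could be as small as:

  // sketch: flag untrusted data flowing into an injection sink
  // (isTainted is a stub; real taint bits would live in the engine)
  function isTainted(v) { return !!(v && v.__tainted); }
  function onPropertyWrite(obj, name, value) {
    if (name === "innerHTML" && isTainted(value))
      console.log("TAINT: untrusted data assigned to innerHTML");
  }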


Dev-tools teams can use such an API either to expose it to web developers 
and/or to implement Debugger features, such as finding the last 
assignment(s) to a property (I wish I had such a feature in gdb), or tracing 
where NaN / null / undefined values are produced (for game developers).
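
The last-assignment feature, for instance, is a few lines on top of the same 
kind of hypothetical write hook:

  var lastWrite = new WeakMap();
  function onPropertyWrite(obj, name, value) {
    var sites = lastWrite.get(obj) || {};
    sites[name] = new Error().stack;   // remember where the property was set
    lastWrite.set(obj, sites);
  }
  // lastWrite.get(someObj)["prop"] then answers where prop was last assigned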


In order to make the right design choices, we need to know what would be 
expected from such API:

 - Do we want to use it on web pages / Firefox OS apps / Gecko's JavaScript?
 - Do we want to use it during the start-up of Firefox / Firefox OS?
 - What speed overhead is acceptable? (for record & replay, code coverage, 
simple taint analysis, tracing NaN, …)

 - What memory overhead is acceptable?
 - Can we risk changing the semantics of the analyzed code?
 - Should this API cover JavaScript features used in Gecko / Firefox OS?
 - Can we rely on Source-to-Source transformation for Gecko's code?

[1] https://www.eecs.berkeley.edu/~gongliang13/jalangi_ff/
[2] https://github.com/SRA-SiliconValley/jalangi/tree/master/src/js/analyses
[3] https://air.mozilla.org/test-and-cure-your-javascript-blues-with-jalangi/
[4] http://srl.cs.berkeley.edu/~ksen/papers/jalangi.pdf
[5] https://github.com/SRA-SiliconValley/jalangi/tree/master/src/js/analyses

2. Dynamic Analysis API

In terms of implementation, we could replicate or embed Jalangi, but if we 
decide to embed Jalangi, we might have to fix the following issues:


Operators are emulated by Jalangi:
 - Analyses have to be synchronous, even if they can record (~26x slowdown) 
and replay the recorded execution to check other analysis results.
 - The analyzed code's environment is polluted, and stack traces are not 
correct; e.g., an object such as
  { valueOf: function () { throw new Error(); } }
will capture a stack trace that exposes the instrumentation frames.
 - All operators are handled in one function, which is likely megamorphic.

Each analysis has to add its own boxing and unboxing logic.  Even if we can 
verify that the core of Jalangi is safe and behaves as specified, the boxing 
and unboxing logic might be buggy, which defeats the purpose of the 
analysis.  So developers of such analyses have no safeguards.
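
As an illustration of what each analysis has to get right today (names 
invented, not Jalangi's actual API):

  function Box(value, shadow) { this.value = value; this.shadow = shadow; }
  function unbox(v)    { return v instanceof Box ? v.value : v; }
  function shadowOf(v) { return v instanceof Box ? v.shadow : null; }

  // every operator hook must unbox operands and re-box the result;
  // a bug in this plumbing silently corrupts the analysis
  function binaryAdd(l, r) {
    return new Box(unbox(l) + unbox(r), shadowOf(l) || shadowOf(r));
  }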


Also, using a source-to-source system implies that we have to restrict 
ourselves to analyzing only the intersection of SpiderMonkey and Jalangi 
(acorn.js [6]) which has identical semantics.  This might be problematic for 
new SpiderMonkey features, or for not-yet-standardized / not-yet-compatible 
features.


Personally, I think that these issues imply that we should avoid relying 
on a source-to-source mapping if we want to provide meaningful security 
results. We could replicate the same or a similar API in SpiderMonkey, and 
even make one compatible with Jalangi analyses.


If we add opcodes dedicated to monitoring values (at the bytecode emitter 
level), instead of doing a source-to-source transformation, one of the 
advantages would be that frontend developers would not have to maintain 
Jalangi sources when we are adding new features in SpiderMonkey; moreover, 
the bytecode emitter already breaks everything down into opcodes, which are 
easier to wrap than the source.


Analyses are usually made to observe the execution of code, not to mutate 
it.  So if we only monitor the execution, instead of emulating it, we might 
be able to batch analysis calls.  Doing batches asynchronously means that 
the overhead of running an analysis is minimal while the analyzed code is 
running.
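
Purely as a sketch of the analysis-side contract (the hook name is 
hypothetical; no such opcodes exist today):

  // the engine would buffer (pc, value) pairs emitted by monitor opcodes
  // and deliver them in batches: one cross-compartment call per batch
  var nanOrigins = [];
  function onValueBatch(batch) {
    for (var i = 0; i < batch.length; i++) {
      var v = batch[i].value;
      if (v !== v)                      // only NaN is not equal to itself
        nanOrigins.push(batch[i].pc);   // e.g. tracing where NaN is produced
    }
  }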

Re: [JS-internals] Dynamic Analysis API discussion

2014-06-25 Thread Bobby Holley
On Wed, Jun 25, 2014 at 8:49 AM, Nicolas B. Pierron
nicolas.b.pier...@mozilla.com wrote:
 So far, we have received 3 different proposals for adding coarse- or 
 fine-grained implementations of taint analysis to the JavaScript engine. (I 
 cannot name all of them publicly yet.)

Are we talking about taint analysis as an auditing tool, or as a
web-accessible feature? The latter has generally been considered too
costly to be worth using.

I've been lightly mentoring a research project with Stanford that does
information flow control at the level of the global, and leverages our
security wrappers to prevent exfiltration of sensitive data. It's a
very interesting approach:
http://www.scs.stanford.edu/~deian/cowl.pdf

 Accepting any of the taint analysis proposals has a price: either a
 maintenance cost, as these implementations are entangled in many parts of
 the JavaScript engine, and/or a performance overhead even when the analysis
 is disabled.

Yeah, I'm really not wild about this. Adding information flow analysis
to the engine is going to be _really_ invasive. And we can't just keep
the analysis confined to the interpreter / JIT, because then
information will leak all over the place via JSAPI/DOM.

What's our proposed SLA? Is it a security bug if we get part of this
wrong? What's our commitment to fixing it?

bholley