Re: [tboot-devel] TBOOT 1.8.3 fails to resume from S3

2015-09-14 Thread Sun, Ning
Try these commands in a script, and check the output after the system resumes from S3:
date
sudo rtcwake -u -s 20 -m mem
date


Thanks,
-ning

-----Original Message-----
From: Ross Philipson [mailto:ross.philip...@gmail.com] 
Sent: Sunday, September 13, 2015 11:55 AM
To: Sun, Ning; tboot-devel@lists.sourceforge.net
Subject: Re: [tboot-devel] TBOOT 1.8.3 fails to resume from S3

On 09/11/2015 08:20 PM, Sun, Ning wrote:
> Actually, the system resumes from S3 promptly in tboot, but I/O (keyboard, mouse,
> video) appears blocked for seconds in the kernel; this needs more investigation...

When S3 resume works, it is prompt, but that is not the issue. As I noted in
another reply to this thread, when I remove these 2 changesets, the hang and
subsequent system reset do not happen:

http://hg.code.sf.net/p/tboot/code/rev/9040e000ccc4
http://hg.code.sf.net/p/tboot/code/rev/78713e04bdd9

It is very clearly related to these. I also stated that our suspicion is a 
buffer overflow in or related to this code:

http://hg.code.sf.net/p/tboot/code/rev/9040e000ccc4#l3.62

We do have the logging level set to "all".

Thanks
Ross

>
> Thanks,
> -ning
>
> -----Original Message-----
> From: Ross Philipson [mailto:ross.philip...@gmail.com]
> Sent: Thursday, September 10, 2015 12:36 PM
> To: tboot-devel@lists.sourceforge.net
> Subject: [tboot-devel] TBOOT 1.8.3 fails to resume from S3
>
> I have been working on moving our project from TBOOT 1.7.0 to 1.8.3. I have 
> discovered that while our 1.7.0 version of TBOOT resumes from S3 just fine, 
> 1.8.3 does not.
>
> The most common symptom seems to be a hang just after TBOOT enters SMX mode. 
> The hang happens at different places so I don't think it is one specific 
> thing that TBOOT is doing to cause the hang. The hang can be short (on the 
> order of seconds) or long (several minutes). Then suddenly the platform will 
> "unhang". It looks like the platform restarts quickly and goes right back in 
> to TBOOT but it is a little hard to tell exactly what happens right around 
> this point. We see this on every system we have tried it on.
>
> I have backed all our patches out and I still see the problem with a clean 
> 1.8.3 code base. I have also tried using TBOOT with a Debian Jessie install 
> and I get similar problems there. I have been comparing the code between the 
> two versions and have so far not found anything that makes a difference.
>
> Any help in this matter is appreciated.
> Thanks
> --
> Ross Philipson
>
> ___
> tboot-devel mailing list
> tboot-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/tboot-devel
>


--
Ross Philipson



[tboot-devel] TBOOT 1.8.3 LZ_Compress very slow and buffer overrun in compression logic

2015-09-14 Thread Ross Philipson
I am starting a new thread for these issues because I now understand 
what is going on. The original thread was this one:

http://sourceforge.net/p/tboot/mailman/message/34455904/

We have identified 2 issues with the LZ-compressed log feature 
introduced in these changesets:

http://hg.code.sf.net/p/tboot/code/rev/9040e000ccc4
http://hg.code.sf.net/p/tboot/code/rev/78713e04bdd9

Issue #1

Calls to LZ_Compress can be extremely slow. On every system we have, 
the first compression takes from about 1 minute to 4 minutes or more. 
This is why we thought we saw hangs: they turned out to be temporary, 
and the system would eventually resume from S3.

The author of the LZ code states outright that it is very slow:

https://github.com/NordicSemiconductor/puck-central-ios/blob/master/PuckCentral/lz.c#L22

I assume this particular implementation was chosen because it is BSD 
(lzlib is GPL)? I also assume LZ_CompressFast was not used because of 
the very large buffer it needs? At any rate, I don't think this is an 
acceptable delay in S3 resume.
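
To illustrate why such a compressor can take this long, here is a minimal sketch that counts byte comparisons in a brute-force longest-match search. It assumes the compressor scans all earlier input for every position, which this thread does not confirm. The function `brute_force_compares` and its parameters are hypothetical and are not taken from TBOOT or lz.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration only (not the TBOOT or lz.c code).
 * For each input position, try every earlier position as a match
 * candidate and count how many byte comparisons are performed.
 * A match of 3 or more bytes lets the scan skip ahead by its length.
 */
static uint64_t brute_force_compares(const unsigned char *in, size_t n,
                                     size_t max_match)
{
    uint64_t compares = 0;
    size_t pos = 0;

    assert(in != NULL || n == 0);

    while (pos < n) {
        size_t best = 0;

        for (size_t cand = 0; cand < pos; cand++) {
            size_t len = 0;

            /* Extend the match byte by byte until it breaks. */
            while (len < max_match && pos + len < n) {
                compares++;
                if (in[cand + len] != in[pos + len])
                    break;
                len++;
            }
            if (len > best)
                best = len;
        }

        /* Skip over a usable match, otherwise advance one byte. */
        pos += (best >= 3) ? best : 1;
    }
    return compares;
}
```

For input with no repeated bytes, the count is n(n-1)/2 (6 for 4 bytes, 28 for 8), so the work grows quadratically with the amount of log text being compressed.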

Issue #2

The new log compression logic is susceptible to buffer overruns after a 
sequence of S3 sleeps and resumes. This is because there is no 
terminating condition in the logic. The second check shown below needs 
to be made before copying more zipped blobs and text into the log buffer:

if (g_log->curr_pos + count > g_log->max_size) {
    g_log->zip_size = LZ_Compress(&g_log->buf[g_log->zip_pos], out,
                                  g_log->curr_pos - g_log->zip_pos);

    /* This is the new condition that needs to be tested for here */
    if (g_log->zip_pos + g_log->zip_size + count > g_log->max_size) {
        /* Not sure what the right thing to do here is. Reset logging
           pointers, disable further mem logging, etc... */
    }
    ...
}

I have confirmed in code that this condition is reached, and that 
logging past it results in a buffer overrun.
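
A minimal sketch of how the extra check could be wired in follows. It assumes a simplified log structure, and everything in it is hypothetical rather than TBOOT code: `demo_log_t`, `demo_compress` (a toy run-length stand-in for LZ_Compress), `log_append`, and the choice to disable logging when the data no longer fits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LOG_MAX 64u  /* hypothetical buffer size, kept small for the demo */

/* Hypothetical, simplified stand-in for the TBOOT memory log. */
typedef struct {
    uint32_t max_size;  /* capacity of buf */
    uint32_t curr_pos;  /* next free byte in buf */
    uint32_t zip_pos;   /* start of the text that is not yet compressed */
    uint32_t zip_size;  /* size of the most recent compressed blob */
    bool     disabled;  /* set once no more data can be logged safely */
    char     buf[LOG_MAX];
} demo_log_t;

/* Toy run-length coder standing in for LZ_Compress. Like a real
 * compressor it can expand its input (up to 2x here). */
static uint32_t demo_compress(const char *in, char *out, uint32_t insize)
{
    uint32_t o = 0;

    for (uint32_t i = 0; i < insize; ) {
        uint32_t run = 1;

        while (i + run < insize && in[i + run] == in[i] && run < 127)
            run++;
        out[o++] = in[i];
        out[o++] = (char)run;
        i += run;
    }
    return o;
}

/* Append count bytes; returns false if the data was dropped. */
static bool log_append(demo_log_t *log, const char *str, uint32_t count)
{
    static char out[2 * LOG_MAX];  /* worst-case compressed size */

    assert(log->zip_pos <= log->curr_pos);
    if (log->disabled)
        return false;

    if (log->curr_pos + count > log->max_size) {
        log->zip_size = demo_compress(&log->buf[log->zip_pos], out,
                                      log->curr_pos - log->zip_pos);

        /* The additional check: the blob plus the new text must fit. */
        if (log->zip_pos + log->zip_size + count > log->max_size) {
            log->disabled = true;  /* assumption: stop memory logging */
            return false;
        }
        memcpy(&log->buf[log->zip_pos], out, log->zip_size);
        log->zip_pos += log->zip_size;
        log->curr_pos = log->zip_pos;
    }

    memcpy(&log->buf[log->curr_pos], str, count);
    log->curr_pos += count;
    return true;
}
```

With this guard, a write that would run past `max_size` is rejected rather than copied. Resetting the log pointers instead of disabling logging would be an equally plausible policy.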

Thanks

-- 
Ross Philipson
