Re: How to control block size of input_filter data

2013-03-12 Thread Hoang-Vu Dang

Thank you for the quick reply!

The context is what I am looking into right now, and it is indeed the 
right solution to my original question. I just want to know a little 
more detail, if you do not mind. You said:


I typically destroy it by placing a callback in the cleanup hook of the 
request pool. 


What exactly is the callback function that I need to look for? When it 
executes, can we be sure that all the data has been processed and that 
our ctx is still in that state?


Best, Vu

On 03/12/2013 10:36 AM, Sorin Manolache wrote:

On 2013-03-12 10:16, Hoang-Vu Dang wrote:

Hi all,

When I write an input_filter, I notice that the data sent from the client
is not always available in one chunk if it is large.

In other words, the input_filter() function will be called multiple
times per request. My question is how to get some control over this (for
example, the chunk size at which the data is split into two)? And what
should we look at in order to check whether two filter invocations belong
to the same request?



You can keep the state from one filter invocation to the other in 
f->ctx, the filter's context.


There are many ways to do this.

One way I've seen is to check whether f->ctx is NULL: if it is NULL, 
this is the first invocation of the filter, and we build the context. 
Subsequent invocations then see a non-NULL context. You'll have to 
destroy the context at the end of the request. I typically destroy it 
by placing a callback in the cleanup hook of the request pool.
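
A minimal sketch of that pattern (my_ctx, my_ctx_cleanup and the
brigade pass-through are illustrative names, not the only way to
write it):

#include "httpd.h"
#include "util_filter.h"

typedef struct {
    int invocations;   /* example state carried across invocations */
} my_ctx;

static apr_status_t my_ctx_cleanup(void *data)
{
    /* release any non-pool resources held by the context here */
    return APR_SUCCESS;
}

static apr_status_t my_input_filter(ap_filter_t *f, apr_bucket_brigade *bb,
                                    ap_input_mode_t mode,
                                    apr_read_type_e block,
                                    apr_off_t readbytes)
{
    my_ctx *ctx = f->ctx;

    if (ctx == NULL) {
        /* first invocation for this request: build the context */
        ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
        apr_pool_cleanup_register(f->r->pool, ctx, my_ctx_cleanup,
                                  apr_pool_cleanup_null);
        f->ctx = ctx;
    }

    ctx->invocations++;
    return ap_get_brigade(f->next, bb, mode, block, readbytes);
}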


Another way to destroy it, but in my opinion a wrong way, is to 
destroy it when you encounter EOS in the data processed by the filter. 
I'd say it's wrong because a wrongly written filter could send data 
_after_ an EOS bucket and then you could not distinguish between a new 
request and a request sending data after EOS.
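
Just to illustrate what such an EOS check looks like (a sketch, with bb
being the brigade passed to the filter):

apr_bucket *b;
for (b = APR_BRIGADE_FIRST(bb);
     b != APR_BRIGADE_SENTINEL(bb);
     b = APR_BUCKET_NEXT(b)) {
   if (APR_BUCKET_IS_EOS(b)) {
      /* last data for this request -- but, as said above, a broken
         filter may still send more after this */
   }
}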


Another way to initialize the context is by placing a filter init 
function when you declare the filter and to initialize the context in 
this function. This is more elegant in my opinion, because the context 
is already initialized when the filter is called the first time.
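
Sketched against the same made-up names as above:

static int my_filter_init(ap_filter_t *f)
{
    /* runs once per request, before the filter's first invocation */
    f->ctx = apr_pcalloc(f->r->pool, sizeof(my_ctx));
    return OK;
}

static void my_register_hooks(apr_pool_t *p)
{
    /* the third argument is the filter init function */
    ap_register_input_filter("MY_FILTER", my_input_filter,
                             my_filter_init, AP_FTYPE_RESOURCE);
}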


The filter context could be any structure, so you can track the filter 
processing state in it.


Regards,
Sorin





Re: some key fields of request_rec are null

2013-03-12 Thread Tom Evans
On Mon, Mar 11, 2013 at 7:47 PM, Nce Rt nce...@yahoo.com wrote:
 that's what is null I've been talking about past 2 weeks!!


You still haven't confirmed which handler you are running in. Certain
parts of the request_rec are populated at different stages of request
processing; if your handler runs before those stages, then (surprise!)
those parts won't be filled in.

You've been asked several times which hook your code is running in;
providing context and/or code would be better than just moaning.
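
To illustrate the point (hook choice and fields here are generic
examples, not specific to your module):

static int early_hook(request_rec *r)
{
    /* post_read_request: r->uri and r->headers_in are set, but fields
     * filled in later, e.g. r->filename (translate_name) or r->user
     * (authentication), are still NULL */
    return DECLINED;
}

static int content_handler(request_rec *r)
{
    /* handler phase: translation, map_to_storage and authn/authz have
     * already run, so those fields are populated when the relevant
     * modules are configured */
    return DECLINED;
}

static void example_register_hooks(apr_pool_t *p)
{
    ap_hook_post_read_request(early_hook, NULL, NULL, APR_HOOK_MIDDLE);
    ap_hook_handler(content_handler, NULL, NULL, APR_HOOK_MIDDLE);
}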

Cheers

Tom


Re: How to control block size of input_filter data

2013-03-12 Thread Sorin Manolache

On 2013-03-12 10:52, Hoang-Vu Dang wrote:

Thank you for the quick reply!

The context is what I am looking into right now, and it is indeed the
right solution to my original question. I just want to know a little
more detail, if you do not mind. You said:

I typically destroy it by placing a callback in the cleanup hook of the
request pool. 


Now I remember: I use C++, so I need to create and destroy the context 
myself. But if you allocate your context from an apr_pool, you don't 
need to bother with destroying the context, because it is destroyed 
automatically. Sorry for confusing you.
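
In that case the whole thing reduces to one line in the init function 
(assuming the context is a plain C struct rather than a C++ object):

   flt->ctx = apr_pcalloc(flt->r->pool, sizeof(MyContext));

The memory goes away together with r->pool; no cleanup callback is needed.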


Just for information, I create/destroy the contexts like this:

int
flt_init_function(ap_filter_t *flt) {
   // I use C++; if you allocated ctx from a pool, you wouldn't even
   // need this destruction callback.
   flt->ctx = new MyContext();

   // destroy_ctx is called when r->pool is destroyed, i.e. at the very
   // end of the request processing, after the response has been sent to
   // the client and the request logged.
   apr_pool_cleanup_register(flt->r->pool, flt->ctx,
                             (apr_status_t (*)(void *))destroy_ctx,
                             apr_pool_cleanup_null);

   return OK;
}

and

apr_status_t
destroy_ctx(MyContext *ctx) {
   delete ctx;
   return APR_SUCCESS;
}

The filter function could be something like:

apr_status_t
input_filter(ap_filter_t *f, apr_bucket_brigade *bb, ap_input_mode_t mode,
             apr_read_type_e block, apr_off_t bytes) {

   MyContext *ctx = (MyContext *)f->ctx;

   switch (ctx->state()) {
   case FIRST_INVOCATION:
      ...
      break;
   case NTH_INVOCATION:
      ...
      break;
   case FOUND_EOS:
      ...
      break;
   ...
   }
}





Re: apr_memcache operation timeouts

2013-03-12 Thread Jeff Trawick
On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz jmara...@google.com wrote:

 ping!

 Please don't hesitate to push back and tell me if I can supply the patch or
 update in some easier-to-digest form.  In particular, while I have
 rigorously stress-tested this change using mod_pagespeed's unit test,
 system-test, and load-test framework, I don't really understand what the
 testing flow is for APR.  I'd be happy to add unit-tests for that if
 someone points me to a change-list or patch-file that does it properly.

 -Josh


I'll try hard to work on this in the next couple of days.  It would be
great to have fixes in APR-Util 1.5.x, which we hope to work on later this
week.



 On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz jmara...@google.com
 wrote:

  I have completed a solution to this problem, which can be a drop-in
 update
  for the existing apr_memcache.c.  It is now checked in for my module as
 
 http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
  .
 
  It differs from the solution in
  https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
 
 - It doesn't require an API change, but it enforces the 50ms
 timeout that already exists for apr_memcache_multgetp for all operations.
 - It works under my load test (which I found is not true of the patch
 in 51065).
 
  For my own purposes, I will be shipping my module with apr_memcache2 so I
  get the behavior I want regardless of what version of Apache is
 installed.
   But I'd like to propose my patch for apr_memcache.c.  The patch is
  attached, and I've also submitted it as an alternative patch to bug
 51065.
 
  If you agree with the strategy I used to solve this problem, then please
  let me know if I can help with any changes required to get this into the
  main distribution.
 
 
  On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz jmara...@google.com
 wrote:
 
  I've had some preliminary success with my own variant of apr_memcache.c
  (creatively called apr_memcache2.c).  Rather than setting the socket
  timeout, I've been mimicking the timeout strategy I saw in
  apr_memcache_multgetp, by adding a new helper method:
 
  static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
                                                 apr_memcache2_conn_t* conn) {
    apr_pollset_t* pollset;
    apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
    if (rv == APR_SUCCESS) {
      apr_pollfd_t pollfd;
      pollfd.desc_type = APR_POLL_SOCKET;
      pollfd.reqevents = APR_POLLIN;
      pollfd.p = temp_pool;
      pollfd.desc.s = conn->sock;
      pollfd.client_data = NULL;
      apr_pollset_add(pollset, &pollfd);
      apr_int32_t queries_recvd;
      const apr_pollfd_t* activefds;
      rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT, &queries_recvd,
                            &activefds);
      if (rv == APR_SUCCESS) {
        assert(queries_recvd == 1);
        assert(activefds->desc.s == conn->sock);
        assert(activefds->client_data == NULL);
      }
    }
    return rv;
  }
 
  And calling that before many of the existing calls to get_server_line
 as:
 
  rv = wait_for_server_or_timeout_no_pool(conn);
  if (rv != APR_SUCCESS) {
  ms_release_conn(ms, conn);
  return rv;
  }
 
  This is just an experiment; I think I can streamline this by
  pre-populating the pollfd structure as part of the apr_memcache_conn_t
  (actually now apr_memcache2_conn_t).
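
  A sketch of what that streamlining could look like (the field and
  function names below are hypothetical, not the real apr_memcache
  internals):

  typedef struct {
      apr_socket_t *sock;
      apr_pollfd_t pollfd;   /* built once, at connection setup */
      /* ... the rest of the connection state ... */
  } apr_memcache2_conn_t;

  static void init_conn_pollfd(apr_memcache2_conn_t *conn, apr_pool_t *p)
  {
      /* filled in once, so the per-call poll needs no re-setup */
      conn->pollfd.desc_type   = APR_POLL_SOCKET;
      conn->pollfd.reqevents   = APR_POLLIN;
      conn->pollfd.p           = p;
      conn->pollfd.desc.s      = conn->sock;
      conn->pollfd.client_data = NULL;
  }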
 
  I have two questions about this:
 
  1. I noticed the version of apr_memcache.c that ships with Apache 2.4 is
  somewhat different from the one that ships with Apache 2.2.  In particular,
  the 2.4 version cannot be compiled against the headers that come with a 2.2
  distribution.  Is there any downside to taking my hacked 2.2 apr_memcache.c
  and running it in Apache 2.4?  Or should I maintain two hacks?
 
  2. This seems wasteful in terms of system calls.  I am making an extra
  call to poll rather than relying on the socket timeout.  The socket
  timeout didn't work as well as this, though.  Does anyone have any
  theories as to why, or ideas on what could be done to make the patch in
  https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 work?
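
  For context, the socket-timeout approach amounts to something like this
  sketch (an illustration, not the actual 51065 patch):

  apr_socket_timeout_set(conn->sock, apr_time_from_sec(10));
  /* blocking reads on conn->sock now fail with APR_TIMEUP after 10
   * seconds instead of waiting indefinitely (timeout of -1) */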
 
  -Josh
 
 
  On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz jmara...@google.com
 wrote:
 
  Following up: I tried doing what I suggested above: patching that change
  into my own copy of apr_memcache.c.  It was, first of all, a bad idea to
  pull in only part of apr_memcache.c, because that file changed slightly
  between 2.2 and 2.4 and our module works in both.
 
  I succeeded in making my own version of apr_memcache (renaming the
  entry-points apr_memcache2*) that I could hack.  But if I changed the
  socket timeout from -1 to 10 seconds, then the system behaved very poorly
  under load test (though it worked fine in our unit-tests and system-tests).
   In other words,