Jason,

It's really hard to tell what's going on without more information about the
environment.

Reviewing the manpage for recv (the C API behind socket.recv), you can see
that the call returns 0 when the remote side has closed the connection (an
orderly shutdown). Python surfaces this as an empty buffer rather than
raising an error.
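
To make that concrete, here is a rough sketch of how a 0-byte read turns
into the "short packet length 0 - expected 4" message. It assumes the usual
PB framing of a 4-byte big-endian length header; the helper names
(read_length_prefix, demo) are made up for illustration and are not the
actual client code.

```python
import socket
import struct


def read_length_prefix(sock):
    """Read the 4-byte big-endian length header of a PB-style frame.

    Hypothetical helper for illustration; not the riak client's real code.
    """
    header = b""
    while len(header) < 4:
        chunk = sock.recv(4 - len(header))
        if not chunk:
            # recv() returned 0 bytes: the peer closed the connection.
            raise ConnectionError(
                "short packet length %d - expected 4" % len(header))
        header += chunk
    return struct.unpack("!I", header)[0]


def demo():
    # A connected pair of local sockets stands in for client and server.
    client, server = socket.socketpair()
    server.close()             # the "server" goes away without sending
    try:
        return client.recv(4)  # no exception, just an empty bytes object
    finally:
        client.close()
```

The point is that the client only learns about the close when it next reads,
so the error shows up at the length header even though the real cause is
whatever made the other end hang up.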

Now, why would the server side close the connection? The only thing I can
think of is that there is an unhandled error path in the server code such
that the Erlang process serving the requests exits abnormally, and the
linked server socket shuts down as a result.  This is why I asked about the
error logs from your Riak nodes.

So, any additional information you can give would be helpful, especially
what your deployment environment looks like. For example, are you on a
cloud provider or your own hardware? What OS/distribution are you using?
What other infrastructure is deployed for your app?


On Thu, Apr 17, 2014 at 12:43 PM, Jason Wang <[email protected]> wrote:

> Sean,
>
> RE http vs pbc port: Verified that our production environment is using the
> pbc protocol on port 8087.
>
> RE crashlog: There are no entries in console.log, crash.log and error.log
> for over a month.
>
> RE HAProxy: Our clients are connecting to the Riak servers. No reverse
> proxy systems are in between.
>
> Anything else I can look into?
>
> Jason
>
>
> On Wed, Apr 16, 2014 at 1:02 PM, Sean Cribbs <[email protected]> wrote:
>
>> Hi Jason,
>>
>> We usually see these errors when someone connects to the HTTP port
>> instead of the PB port. First, check that your client is configured
>> correctly. Also, please check the server logs for any crashes related to
>> riak_api_pb_server, and paste those here if present.
>>
>> It is also conceivable this could occur if the socket were closed before
>> anything were sent, that is, the read buffer on it is empty, so it returns
>> 0. I don't know of any specific reason that might happen unless you are
>> using haproxy between the client and the server (see related PR/issue
>> https://github.com/basho/riak_api/pull/54).
>>
>>
>> On Tue, Apr 15, 2014 at 6:34 PM, Jason Wang <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> In production, we are experiencing "Socket returned short packet length
>>> 0 - expected 4" exceptions whenever we try to store an object >20K in size.
>>> In addition, the exception typically takes over 60 seconds to manifest. The
>>> content of each object is a bytearray.
>>>
>>> Any idea what could be causing this exception?
>>>
>>> Other details:
>>> Library: Python
>>> Version:  riak==2.0.1, riak-pb==1.4.1.1
>>> Protocol: pbc
>>> Steps to reproduce: N/A. This only happens in production, not on dev
>>> machines.
>>>
>>> Thanks in advance,
>>>
>>> Jason
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> [email protected]
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>
>>
>> --
>> Sean Cribbs <[email protected]>
>> Software Engineer
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>
>
>
>


-- 
Sean Cribbs <[email protected]>
Software Engineer
Basho Technologies, Inc.
http://basho.com/
