On Fri, Aug 12, 2016 at 11:44 AM, Ioana Danes <ioanada...@gmail.com> wrote:

>
>
> On Fri, Aug 12, 2016 at 11:34 AM, Adrian Klaver
> <adrian.kla...@aklaver.com> wrote:
>
>> On 08/12/2016 08:30 AM, Ioana Danes wrote:
>>
>>>
>>>
>>> On Fri, Aug 12, 2016 at 11:26 AM, Adrian Klaver
>>> <adrian.kla...@aklaver.com> wrote:
>>>
>>>     On 08/12/2016 08:10 AM, Ioana Danes wrote:
>>>
>>>
>>>
>>>         On Fri, Aug 12, 2016 at 10:47 AM, Francisco Olarte
>>>         <fola...@peoplecall.com> wrote:
>>>
>>>             CCing to the list...
>>>
>>>         Thanks
>>>
>>>
>>>             On Fri, Aug 12, 2016 at 4:10 PM, Ioana Danes
>>>             <ioanada...@gmail.com> wrote:
>>>             >> given 318220 and 318216 are just a bit away ( 4db08/4db0c ),
>>>             >> and it repeats sporadically, have you ruled out ( by having
>>>             >> page checksums or other mechanism ) a potential disk
>>>             >> read/write error ?
>>>             >>
>>>             >>
>>>             >> > Also the index is correct on db3 as the record in case
>>>             >> > (with drawid = 318216) is retrieved if I filter by
>>>             >> > drawid = 318220
>>>             >>
>>>             >> Especially if this happens, you may have some slightly bad
>>>             >> disks/ram/ leading to this kind of problem.
>>>             >>
>>>             >
>>>             > Could be. I also had some issues with an rsync between db3
>>>             > and drdb a week ago that did not complete for bigger files
>>>             > (> 200MB) and gave me some corruption messages. Then the
>>>             > system was rebooted and everything seemed fine but
>>>             > apparently it is not.
>>>             > I am planning to drop & create the table from a good backup
>>>             > and if that does not fix the issue then I will rebuild the
>>>             > server.
>>>
>>>             I would check whatever logs you can ( syslog or eventlog,
>>>             smart log, etc. ) hunting for disk errors ( sometimes they
>>>             are reported ). This kind of problem, with programs as tested
>>>             as postgres and rsync, tends to indicate controller/RAM/disk
>>>             going bad ( in your case it could be caused by a single bit
>>>             getting flipped in a sector for the data portion of the
>>>             table, and not being propagated either because it happened
>>>             after your sync of drdb or because it was synced from the WAL
>>>             and not the table, or because it was read from the disk
>>>             cache ).
>>>
>>>         I agree, unfortunately I did not find any clues about corruption
>>>         or any anomalies in the logs.
>>>         I will work tonight to rebuild that table and see where I go
>>>         from there.
>>>
>>>
>>>     The db3 database is on a different machine from all the other
>>>     databases you set up, correct?
>>>
>>> Yes, they are all different VMs. The first 3 dbs are on the same
>>> cluster, but drdb is a remote machine.
>>>
>>
>> Aah, another player in the mix.
>>
>> What virtualization technology are you using?
>>
>
> kvm
>
Sorry, I should have added more info:
kernel 4.7
and the filesystem is xfs vs ext3/ext4
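
For reference, the single-bit-flip theory quoted above (318220 vs 318216,
i.e. 4db0c vs 4db08) can be checked with a few lines of Python. This is only
a rough sketch of the arithmetic, not a diagnosis; the helper name
differing_bits is made up for illustration and is not part of any tool
mentioned in this thread.

```python
def differing_bits(a, b):
    """Return the positions (0 = least significant) of the bits where a and b differ."""
    x = a ^ b  # XOR leaves a 1 exactly where the two values disagree
    return [i for i in range(x.bit_length()) if (x >> i) & 1]


# The two drawid values discussed in this thread.
a, b = 318216, 318220

print(hex(a), hex(b))        # hex forms of the two values
print(differing_bits(a, b))  # a list with one entry means a single flipped bit
```

If the list has exactly one entry, the two values are one flipped bit apart,
which is consistent with (but does not prove) the bad disk/RAM explanation.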



>
>>
>>> Thank you
>>>
>>>
>>>
>>>         Thanks,
>>>         ioana
>>>
>>>             Francisco Olarte.
>>>
>>>
>>>
>>>
>>>     --
>>>     Adrian Klaver
>>>     adrian.kla...@aklaver.com
>>>
>>>
>>>
>>
>> --
>> Adrian Klaver
>> adrian.kla...@aklaver.com
>>
>
>
