[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983584#comment-15983584 ] ASF GitHub Bot commented on GEODE-2398: --- Github user asfgit closed the pull request at: https://github.com/apache/geode/pull/477
> Sporadic Oplog corruption due to channel.write failure
> ------------------------------------------------------
>
> Key: GEODE-2398
> URL: https://issues.apache.org/jira/browse/GEODE-2398
> Project: Geode
> Issue Type: Bug
> Components: persistence
> Reporter: Kenneth Howe
> Assignee: Anilkumar Gingade
> Fix For: 1.2.0
>
> There have been some occurrences of Oplog corruption during testing that have
> been traced to failures in writing oplog entries to the .crf file. When it
> fails, Oplog.flush attempts to write a ByteBuffer to the file channel. The
> call to channel.write(bb) method returns 0 bytes written, but the source
> ByteBuffer position is moved to the ByteBuffer limit.
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983581#comment-15983581 ] ASF subversion and git services commented on GEODE-2398: Commit a3434e29976a7d54957878a074806d016eb59234 in geode's branch refs/heads/develop from [~lgallinat] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=a3434e2 ] GEODE-2398: fix oplog corruption in overflow oplogs * ported changes from original fix in Oplog.java to OverflowOplog.java This closes #477
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983360#comment-15983360 ] ASF GitHub Bot commented on GEODE-2398: --- Github user lgallinat commented on the issue: https://github.com/apache/geode/pull/477 Anil, I have made your suggested change and pushed it to the feature branch.
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983332#comment-15983332 ] ASF subversion and git services commented on GEODE-2398: Commit 5697f8c6f6d7e3a0272fc86be6d2c04ba87d9013 in geode's branch refs/heads/feature/GEODE-2398overflow from [~lgallinat] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=5697f8c ] GEODE-2398: fix oplog corruption in overflow oplogs * ported changes from original fix in Oplog.java to OverflowOplog.java
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982194#comment-15982194 ] ASF GitHub Bot commented on GEODE-2398: --- Github user agingade commented on a diff in the pull request: https://github.com/apache/geode/pull/477#discussion_r113087379
--- Diff: geode-core/src/main/java/org/apache/geode/internal/cache/OverflowOplog.java ---
@@ -724,8 +727,31 @@ public final void flush() throws IOException {
         if (bb != null && bb.position() != 0) {
           bb.flip();
           int flushed = 0;
+          int numChannelRetries = 0;
           do {
-            flushed += olf.channel.write(bb);
+            int channelBytesWritten = 0;
+            final int bbStartPos = bb.position();
+            final long channelStartPos = olf.channel.position();
+            // differentiate between bytes written on this channel.write() iteration and the
+            // total number of bytes written to the channel on this call
+            channelBytesWritten += olf.channel.write(bb);
--- End diff --
Instead of "+=", we could just assign the value. It doesn't really make any difference; it's just that when you read this line, you don't have to know its previous value.
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982095#comment-15982095 ] ASF GitHub Bot commented on GEODE-2398: --- GitHub user lgallinat opened a pull request: https://github.com/apache/geode/pull/477 GEODE-2398: fix oplog corruption in overflow oplogs * ported changes from original fix in Oplog.java to OverflowOplog.java @agingade @dschneider-pivotal @pivotal-eshu @pdxrunner You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/geode feature/GEODE-2398overflow Alternatively you can review and apply these changes as the patch at: https://github.com/apache/geode/pull/477.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #477
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982042#comment-15982042 ] ASF subversion and git services commented on GEODE-2398: Commit 7546b57dce91c079714a6fc0ef37785a4c2799cc in geode's branch refs/heads/feature/GEODE-2398overflow from [~lgallinat] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=7546b57 ] GEODE-2398: fix oplog corruption in overflow oplogs * ported changes from original fix in Oplog.java to OverflowOplog.java
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868350#comment-15868350 ] ASF subversion and git services commented on GEODE-2398: Commit fb14e9aab263654ed0176dcc3c9738be1b208a82 in geode's branch refs/heads/feature/GEODE-2449 from [~khowe] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=fb14e9a ] GEODE-2398: Updates from review https://reviews.apache.org/r/56506/
> Sporadic Oplog corruption due to channel.write failure
> ------------------------------------------------------
>
> Key: GEODE-2398
> URL: https://issues.apache.org/jira/browse/GEODE-2398
> Project: Geode
> Issue Type: Bug
> Components: persistence
> Reporter: Kenneth Howe
> Assignee: Kenneth Howe
> Fix For: 1.2.0
>
> There have been some occurrences of Oplog corruption during testing that have
> been traced to failures in writing oplog entries to the .crf file. When it
> fails, Oplog.flush attempts to write a ByteBuffer to the file channel. The
> call to channel.write(bb) method returns 0 bytes written, but the source
> ByteBuffer position is moved to the ByteBuffer limit.
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868348#comment-15868348 ] ASF subversion and git services commented on GEODE-2398: Commit 9b0f16570aad4abc82b71d0d16167a9774449d41 in geode's branch refs/heads/feature/GEODE-2449 from [~khowe] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=9b0f165 ] GEODE-2398: Retry oplog channel.write on silent failures Implemented limited retries in two forms of Oplog.flush() when channel.write() is called. If write() returns fewer bytes written than the change in the ByteBuffer positions, then reset buffer positions and retry writing a limited number of times. Throws IOException if the write doesn't succeed after a few retries (max number of retries is defined by a static). Added new unit tests.
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866600#comment-15866600 ] ASF subversion and git services commented on GEODE-2398: Commit 9b0f16570aad4abc82b71d0d16167a9774449d41 in geode's branch refs/heads/feature/GEODE-2402 from [~khowe] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=9b0f165 ] GEODE-2398: Retry oplog channel.write on silent failures Implemented limited retries in two forms of Oplog.flush() when channel.write() is called. If write() returns fewer bytes written than the change in the ByteBuffer positions, then reset buffer positions and retry writing a limited number of times. Throws IOException if the write doesn't succeed after a few retries (max number of retries is defined by a static). Added new unit tests.
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866601#comment-15866601 ] ASF subversion and git services commented on GEODE-2398: Commit fb14e9aab263654ed0176dcc3c9738be1b208a82 in geode's branch refs/heads/feature/GEODE-2402 from [~khowe] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=fb14e9a ] GEODE-2398: Updates from review https://reviews.apache.org/r/56506/
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866418#comment-15866418 ] ASF subversion and git services commented on GEODE-2398: Commit 9b0f16570aad4abc82b71d0d16167a9774449d41 in geode's branch refs/heads/feature/GEODE-2267 from [~khowe] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=9b0f165 ] GEODE-2398: Retry oplog channel.write on silent failures Implemented limited retries in two forms of Oplog.flush() when channel.write() is called. If write() returns fewer bytes written than the change in the ByteBuffer positions, then reset buffer positions and retry writing a limited number of times. Throws IOException if the write doesn't succeed after a few retries (max number of retries is defined by a static). Added new unit tests.
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866419#comment-15866419 ] ASF subversion and git services commented on GEODE-2398: Commit fb14e9aab263654ed0176dcc3c9738be1b208a82 in geode's branch refs/heads/feature/GEODE-2267 from [~khowe] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=fb14e9a ] GEODE-2398: Updates from review https://reviews.apache.org/r/56506/
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865063#comment-15865063 ] ASF subversion and git services commented on GEODE-2398: Commit 9b0f16570aad4abc82b71d0d16167a9774449d41 in geode's branch refs/heads/feature/GEODE-2474 from [~khowe] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=9b0f165 ] GEODE-2398: Retry oplog channel.write on silent failures Implemented limited retries in two forms of Oplog.flush() when channel.write() is called. If write() returns fewer bytes written than the change in the ByteBuffer positions, then reset buffer positions and retry writing a limited number of times. Throws IOException if the write doesn't succeed after a few retries (max number of retries is defined by a static). Added new unit tests.
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865064#comment-15865064 ] ASF subversion and git services commented on GEODE-2398: Commit fb14e9aab263654ed0176dcc3c9738be1b208a82 in geode's branch refs/heads/feature/GEODE-2474 from [~khowe] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=fb14e9a ] GEODE-2398: Updates from review https://reviews.apache.org/r/56506/
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863815#comment-15863815 ] ASF subversion and git services commented on GEODE-2398: Commit aacfa0685d6e8f1043cb485bbd7182a71e444043 in geode's branch refs/heads/feature/GEODE-2398 from [~khowe] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=aacfa06 ] GEODE-2398: Updates from review https://reviews.apache.org/r/56506/
> Sporadic Oplog corruption due to channel.write failure
> ------------------------------------------------------
>
> Key: GEODE-2398
> URL: https://issues.apache.org/jira/browse/GEODE-2398
> Project: Geode
> Issue Type: Bug
> Components: persistence
> Reporter: Kenneth Howe
> Assignee: Kenneth Howe
>
> There have been some occurrences of Oplog corruption during testing that have
> been traced to failures in writing oplog entries to the .crf file. When it
> fails, Oplog.flush attempts to write a ByteBuffer to the file channel. The
> call to channel.write(bb) method returns 0 bytes written, but the source
> ByteBuffer position is moved to the ByteBuffer limit.
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859927#comment-15859927 ] ASF subversion and git services commented on GEODE-2398: Commit 451c5b497662485bfb94b7f7afaacfd2cd82d043 in geode's branch refs/heads/feature/GEODE-2398 from [~khowe] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=451c5b4 ] GEODE-2398: Retry oplog channel.write on silent failures Implemented limited retries in two forms of Oplog.flush() when channel.write() is called. If write() returns fewer bytes written than the change in the ByteBuffer positions, then reset buffer positions and retry writing a limited number of times. Throws IOException if the write doesn't succeed after a few retries (max number of retries is defined by a static). Added new unit tests.
[jira] [Commented] (GEODE-2398) Sporadic Oplog corruption due to channel.write failure
[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859759#comment-15859759 ] Kenneth Howe commented on GEODE-2398:
This problem occurred while writing to the channel from within the method Oplog.flush(OplogFile olf, boolean doSync). There is also a channel write executed from within Oplog.flush(OplogFile olf, ByteBuffer b1, ByteBuffer b2). The second form of flush calls channel.write(ByteBuffer[] bbArray) instead of channel.write(ByteBuffer bb) as in the first form. Since the write has been seen to fail in the first form, there's presumably a remote chance of a similar failure in the second form.
The fix for this problem is to add a retry loop around the channel.write calls. A write is retried when the number of bytes written returned by write() is not consistent with the change in ByteBuffer positions. The number of retries is limited to a small number so that a hard failure cannot cause a thread to hang. IOException is thrown if the retry limit is exceeded.
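For readers following the thread, the retry loop described in this comment could be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the Geode commits: the class name RetryingFlush, the limit MAX_CHANNEL_RETRIES = 3, and the FlakyChannel test double are all hypothetical, and only the single-buffer channel.write(ByteBuffer) form is shown. The local variable names follow the diff quoted in the pull request review. The sketch assumes a blocking file channel, where a write that makes no progress at all is not expected.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

final class RetryingFlush {
  // Hypothetical limit; the commit message only says the maximum is "defined by a static".
  static final int MAX_CHANNEL_RETRIES = 3;

  /** Writes all remaining bytes of bb, retrying any write that reports fewer bytes than the buffer lost. */
  static int flush(SeekableByteChannel channel, ByteBuffer bb) throws IOException {
    int flushed = 0;
    int numChannelRetries = 0;
    while (bb.hasRemaining()) {
      final int bbStartPos = bb.position();
      final long channelStartPos = channel.position();
      final int channelBytesWritten = channel.write(bb);
      final int bbBytesConsumed = bb.position() - bbStartPos;
      if (channelBytesWritten < bbBytesConsumed) {
        // Silent failure: the buffer advanced further than the channel says it wrote.
        if (++numChannelRetries > MAX_CHANNEL_RETRIES) {
          throw new IOException("channel.write reported " + channelBytesWritten
              + " bytes written but the buffer advanced by " + bbBytesConsumed);
        }
        // Rewind both sides so the same bytes are written again at the same offset.
        bb.position(bbStartPos);
        channel.position(channelStartPos);
      } else {
        flushed += channelBytesWritten;
      }
    }
    return flushed;
  }

  /** Test double: the first `failures` writes drain the buffer, store nothing, and return 0. */
  static final class FlakyChannel implements SeekableByteChannel {
    final byte[] data = new byte[64];
    private int pos = 0;
    private int size = 0;
    private int failures;

    FlakyChannel(int failures) { this.failures = failures; }

    @Override public int write(ByteBuffer src) {
      if (failures > 0) {
        failures--;
        src.position(src.limit()); // the buffer looks fully consumed
        return 0;                  // yet nothing reached the "file"
      }
      int n = src.remaining();
      src.get(data, pos, n);
      pos += n;
      size = Math.max(size, pos);
      return n;
    }
    @Override public int read(ByteBuffer dst) { throw new UnsupportedOperationException(); }
    @Override public long position() { return pos; }
    @Override public SeekableByteChannel position(long newPosition) { pos = (int) newPosition; return this; }
    @Override public long size() { return size; }
    @Override public SeekableByteChannel truncate(long newSize) { throw new UnsupportedOperationException(); }
    @Override public boolean isOpen() { return true; }
    @Override public void close() {}
  }
}
```

Comparing the return value of write() with the change in the buffer position is what lets the loop detect the failure reported in this issue, where write() returns 0 but the buffer position has moved to its limit. Bounding the retries means a persistent fault surfaces as an IOException rather than a hung thread.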