[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-08-05 Thread Jay Kreps (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v7.patch

Attached a patch that takes the correct but slow approach of synchronously 
committing the checkpoint each time we truncate, before fetching restarts.
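
For illustration only, here is a minimal standalone sketch of the "truncate, 
then synchronously checkpoint" ordering described above. The helper name is 
invented and the file content is simplified to a single number; the point is 
only the blocking fsync before fetching is allowed to resume, not the actual 
checkpoint format or the patch's code.

    import java.io.{File, FileOutputStream, PrintWriter}

    // Sketch (assumed names): persist the new recovery point and block until it
    // is physically on disk before the fetcher restarts, so a crash in between
    // cannot leave a checkpoint that points past the truncated log.
    def writeRecoveryPointSync(checkpointFile: File, recoveryPoint: Long): Unit = {
      val out = new FileOutputStream(checkpointFile)
      try {
        val writer = new PrintWriter(out)
        writer.println(recoveryPoint)
        writer.flush()
        out.getFD.sync() // the slow but correct part: a synchronous fsync
      } finally {
        out.close()
      }
    }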

 Avoid fsync on log segment roll
 ---

 Key: KAFKA-615
 URL: https://issues.apache.org/jira/browse/KAFKA-615
 Project: Kafka
  Issue Type: Bug
Reporter: Jay Kreps
Assignee: Neha Narkhede
 Attachments: KAFKA-615-v1.patch, KAFKA-615-v2.patch, 
 KAFKA-615-v3.patch, KAFKA-615-v4.patch, KAFKA-615-v5.patch, 
 KAFKA-615-v6.patch, KAFKA-615-v7.patch


 It still isn't feasible to run without an application-level fsync policy. 
 This is a problem because fsync locks the file, and tuning such a policy is 
 very challenging: the flushes must not be so frequent that seeks reduce 
 throughput, yet not so infrequent that each fsync writes so much data that 
 there is a noticeable jump in latency.
 The remaining problem is the way that log recovery works. Our current policy 
 is that if a clean shutdown occurs we do no recovery. If an unclean shutdown 
 occurs we recover the last segment of all logs. To make this correct we need 
 to ensure that each segment is fsync'd before we create a new segment. Hence 
 the fsync during roll.
 Obviously, if the fsync during roll is the only time fsync occurs, then it 
 will potentially write out the entire segment, which for a 1 GB segment at 
 50 MB/sec might take on the order of 20 seconds. The goal of this JIRA is to 
 eliminate this and make it possible to run with no application-level fsyncs 
 at all, depending entirely on replication and background writeback for 
 durability.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-08-05 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v8.patch

Ack, yes, I did mean to fix the recoveryPoint/logEndOffset issue, I just 
forgot. Attached v8, which includes that. The fix is as you describe: I just 
reset the recovery point to the end of the log.
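
A small standalone sketch of that invariant (field names are stand-ins, not 
the patch's code): after any truncation, the recovery point is pulled back so 
it never points past the end of the log.

    // Sketch only: maintain recoveryPoint <= logEndOffset across truncation.
    final class TruncationSketch(var logEndOffset: Long, var recoveryPoint: Long) {
      def truncateTo(targetOffset: Long): Unit = {
        logEndOffset = math.min(logEndOffset, targetOffset)
        // the fix described above: reset the recovery point to the end of the log
        if (recoveryPoint > logEndOffset)
          recoveryPoint = logEndOffset
      }
    }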



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-08-04 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v6.patch

Updated patch:
- Removed bad scaladoc
- Improved log corruption test to cover corruption in a non-final segment to 
show that the existing logic works

Actually, the recoverLog method is right. It loops through the unflushed 
segments validating them. When it finds a bad one it truncates to the right 
position in that segment, then loops over all remaining segments and deletes 
them. The confusing part, I think, is that unflushed is an iterator, so 
unflushed.foreach(deleteSegment) actually ends the loop because a 
post-condition of that call is that unflushed.hasNext is false. I agree that 
is kind of tricky. Not sure if there is a clearer way to do it (I tried, that 
was what I came up with...wish we had break).
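
To make the control flow concrete, here is a standalone sketch of the pattern 
being described, with a stand-in segment type rather than the real classes. 
The inner foreach consumes the rest of the iterator, which is what terminates 
the outer while loop without an explicit break.

    // Stand-in for the real log segment class; "valid" replaces actual validation.
    final case class SegmentSketch(baseOffset: Long, valid: Boolean)

    def recoverSketch(segments: Seq[SegmentSketch], recoveryPoint: Long): Unit = {
      // only segments at or beyond the recovery point need re-validation
      val unflushed: Iterator[SegmentSketch] =
        segments.iterator.filter(_.baseOffset >= recoveryPoint)
      while (unflushed.hasNext) {
        val segment = unflushed.next()
        if (!segment.valid) {
          println(s"truncating within segment at ${segment.baseOffset}")
          // consuming the iterator here is what ends the outer loop:
          // after this foreach, unflushed.hasNext is false
          unflushed.foreach(s => println(s"deleting segment at ${s.baseOffset}"))
        }
      }
    }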



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-08-02 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v5.patch

Attached updated patch v5, rebased against trunk and with added support for 
compression in the write throughput test.



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-07-17 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v4.patch

Rebased patch to trunk.



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-07-11 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v3.patch

Patch version v3:
- Found a call to flush the index in Log.roll(). Removed this.



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-07-08 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v2.patch

New patch with a few improvements:
1. Found and fixed a bug in recovery that led to recovering logs even in the 
clean-shutdown case.
2. Now we always resize indexes for all segments during recovery, as the 
configured index size may have changed. Not doing this was a bug in the 
previous patch.
3. Added a unit test that intentionally corrupts a log and checks recovery.

I also did some performance testing on my desktop machine. We can sustain very 
high throughput, but as we approach the maximum throughput of the drive, 
latency gets worse and worse.

As one data point, I could do 75 MB/sec of sustained writes across 500 logs on 
a single-drive machine that peaks at about 120 MB/sec, with an average write 
latency under 1 ms and a maximum latency of about 350 ms.



[jira] [Updated] (KAFKA-615) Avoid fsync on log segment roll

2013-07-06 Thread Jay Kreps (JIRA)


Jay Kreps updated KAFKA-615:


Attachment: KAFKA-615-v1.patch

Attached a draft patch for a first version of this for early feedback. A few 
details remain to be worked out.

This patch removes the per-data-directory .kafka_cleanshutdown file as well as 
the concept of a clean shutdown. The concept of a clean shutdown is replaced 
with the concept of a recovery point. The recovery point is the offset from 
which the log must be recovered. Recovery points are checkpointed in a 
per-data-directory file called recovery-point-offset-checkpoint, which uses the 
normal offset checkpoint file format.
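
As a rough illustration of the checkpoint idea, here is a self-contained 
sketch that writes per-partition recovery points to such a file. The exact 
line layout (a version line, an entry count, then one "topic partition offset" 
line per entry) is my assumption about the offset checkpoint format, not 
something this patch defines.

    import java.io.{File, PrintWriter}

    // Sketch: write recovery points for every topic-partition in one data directory.
    def writeRecoveryPointCheckpoint(dir: File, offsets: Map[(String, Int), Long]): Unit = {
      val out = new PrintWriter(new File(dir, "recovery-point-offset-checkpoint"))
      try {
        out.println(0)             // assumed format version
        out.println(offsets.size)  // number of entries that follow
        for (((topic, partition), offset) <- offsets)
          out.println(s"$topic $partition $offset")
      } finally {
        out.close()
      }
    }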

Previously we always recovered the last log segment unless a clean shutdown was 
recorded. Now we recover from the recovery point--which may mean recovering 
many segments. We do not, however, recover partial segments: if the recovery 
point falls in the middle of a segment we recover that segment from the 
beginning.
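
A tiny standalone sketch of that segment selection (stand-in types, and it 
assumes the segment list is sorted by base offset): recovery starts at the 
base offset of the segment containing the recovery point, never mid-segment.

    final case class SegmentRef(baseOffset: Long)

    // Sketch: choose which segments to re-validate given a recovery point.
    def segmentsToRecover(sortedSegments: Seq[SegmentRef], recoveryPoint: Long): Seq[SegmentRef] = {
      // base offset of the segment that contains the recovery point (or the first segment)
      val startBase = sortedSegments.filter(_.baseOffset <= recoveryPoint)
                                    .lastOption.map(_.baseOffset).getOrElse(0L)
      sortedSegments.filter(_.baseOffset >= startBase)
    }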

On shutdown we force a flush and checkpoint, which has the same effect the 
.kafka_cleanshutdown file had before.

Deleting the recovery-point-offset-checkpoint file will cause a full recovery 
of the log on restart, which is a nice feature if you suspect any kind of 
corruption in the log.

Log.flush now takes an offset argument and flushes from the recovery point up 
to the given offset. This gives more granular control, so we can avoid syncing 
(and hence locking) the active segment.
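
Here is a simplified, self-contained sketch of that flush-up-to-an-offset 
shape (stand-in types; the real segment selection is more careful than this): 
only segments below the given offset are synced, and the recovery point is 
advanced afterwards.

    final case class FlushableSegment(baseOffset: Long) {
      def flush(): Unit = println(s"fsync segment starting at $baseOffset")
    }

    final class LogSketch(val sortedSegments: Vector[FlushableSegment], var recoveryPoint: Long) {
      // Flush data in [recoveryPoint, offset) and advance the recovery point;
      // segments at or beyond `offset` (the active one included) are untouched.
      def flush(offset: Long): Unit = {
        if (offset <= recoveryPoint) return
        val from = sortedSegments.filter(_.baseOffset <= recoveryPoint)
                                 .lastOption.map(_.baseOffset).getOrElse(0L)
        sortedSegments.filter(s => s.baseOffset >= from && s.baseOffset < offset)
                      .foreach(_.flush())
        recoveryPoint = offset
      }
    }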

Log.roll() now uses the scheduler to make its flush asynchronous. This flush 
only covers up to the segment that was just completed, not the newly created 
segment, so there should be no locking of the active segment anymore.
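
And a minimal sketch of the asynchronous flush on roll, using a plain executor 
as a stand-in for Kafka's scheduler; flushUpTo represents the offset-bounded 
flush sketched above.

    import java.util.concurrent.Executors

    val backgroundFlusher = Executors.newSingleThreadExecutor()

    // Sketch: after creating the new active segment, schedule a flush that only
    // covers offsets below the new segment's base offset, i.e. the completed
    // segment(s), never the segment that appends are now going to.
    def rollSketch(newSegmentBaseOffset: Long, flushUpTo: Long => Unit): Unit = {
      backgroundFlusher.submit(new Runnable {
        def run(): Unit = flushUpTo(newSegmentBaseOffset)
      })
    }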

The per-topic flush policy based on number of messages and time still remains, 
but it now defaults to off, so by default we rely only on replication and 
background writeback for durability.

I did some preliminary performance testing, and we can indeed run with no 
application-level flush policy with reasonable latency, which is both 
convenient (no tuning to do) and yields much better throughput. I will do more 
testing and report results.
