On Fri, Oct 23, 2015 at 7:59 AM, Howard Chu wrote:
> If the stream of writes is large enough, you could omit fsync because
> everything is being forced out of the cache to disk anyway. In that
> scenario, the only thing that matters is that the writes get forced out in
> the order
Ric Wheeler wrote:
On 10/23/2015 07:06 AM, Ric Wheeler wrote:
On 10/23/2015 02:21 AM, Howard Chu wrote:
Normally, best practice is to use batching to avoid paying worst case latency
>when you do a synchronous IO. Write a batch of files or appends without fsync,
>then go back and fsync and
On Thu, Oct 22, 2015 at 11:16 PM, Howard Chu wrote:
> Milosz Tanski adfin.com> writes:
>
>>
>> On Tue, Oct 20, 2015 at 4:00 PM, Sage Weil redhat.com> wrote:
>> > On Tue, 20 Oct 2015, John Spray wrote:
>> >> On Mon, Oct 19, 2015 at 8:49 PM, Sage Weil redhat.com> wrote:
>> >> >
On 10/23/2015 02:21 AM, Howard Chu wrote:
Normally, best practice is to use batching to avoid paying worst case latency
>when you do a synchronous IO. Write a batch of files or appends without fsync,
>then go back and fsync and you will pay that latency once (not per file/op).
If filesystems
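The batching pattern Ric describes (write everything first, then go back and fsync) can be sketched in Python. This is a minimal illustration under assumptions that are not in the thread: a POSIX system, plain files in a single directory, and a hypothetical helper named write_batch. It is not how NewStore or FileStore actually issue their writes. How much latency the second pass saves depends on the filesystem: on many journaling filesystems the first fsync may commit most of the batch, which makes the later calls cheaper.

```python
import os
import tempfile


def _write_all(fd, data):
    """os.write may write fewer bytes than requested, so loop until done."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_batch(dirpath, items):
    """Hypothetical helper: write every file first, then fsync in a second pass.

    Assumes POSIX; fsync on a directory descriptor is not portable.
    Returns the number of files written.
    """
    fds = []
    try:
        # Pass 1: create and fill every file without calling fsync.
        for name, data in items:
            fd = os.open(os.path.join(dirpath, name),
                         os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            fds.append(fd)
            _write_all(fd, data)
        # Pass 2: go back and flush each file. On many journaling
        # filesystems the first call commits most of the batch, so the
        # later calls are cheaper (filesystem-dependent).
        for fd in fds:
            os.fsync(fd)
        # Also flush the directory so the new names survive a crash.
        dfd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
    finally:
        for fd in fds:
            os.close(fd)
    return len(fds)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        count = write_batch(d, [("a.obj", b"hello"), ("b.obj", b"world")])
        print(count)  # prints 2
```

The contrast is with calling fsync immediately after each write, which pays the worst-case flush latency once per file or op instead of once per batch.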
On 10/23/2015 07:06 AM, Ric Wheeler wrote:
On 10/23/2015 02:21 AM, Howard Chu wrote:
Normally, best practice is to use batching to avoid paying worst case latency
>when you do a synchronous IO. Write a batch of files or appends without fsync,
>then go back and fsync and you will pay that
On 10/23/2015 10:59 AM, Howard Chu wrote:
Ric Wheeler wrote:
On 10/23/2015 07:06 AM, Ric Wheeler wrote:
On 10/23/2015 02:21 AM, Howard Chu wrote:
Normally, best practice is to use batching to avoid paying worst case latency
>when you do a synchronous IO. Write a batch of files or appends
Ric Wheeler redhat.com> writes:
>
> On 10/21/2015 09:32 AM, Sage Weil wrote:
> > On Tue, 20 Oct 2015, Ric Wheeler wrote:
> >>> Now:
> >>> 1 io to write a new file
> >>> 1-2 ios to sync the fs journal (commit the inode, alloc change)
> >>> (I see 2 journal IOs on XFS and
Gregory Farnum wrote:
On Fri, Oct 23, 2015 at 7:59 AM, Howard Chu wrote:
If the stream of writes is large enough, you could omit fsync because
everything is being forced out of the cache to disk anyway. In that
scenario, the only thing that matters is that the writes get forced
enue, San Jose, CA 95134
> T: +1 408 801 7030| M: +1 408 780 6416
> allen.samu...@sandisk.com
>
> -Original Message-
> From: Martin Millnert [mailto:mar...@millnert.se]
> Sent: Thursday, October 22, 2015 6:20 AM
> To: Mark Nelson <mnel...@redhat.com>
> Cc: Ric Wheele
On Wed, Oct 21, 2015 at 10:30:28AM -0700, Sage Weil wrote:
> For example: we need to do an overwrite of an existing object that is
> atomic with respect to a larger ceph transaction (we're updating a bunch
> of other metadata at the same time, possibly overwriting or appending to
> multiple
: Re: newstore direction
On Wed, 21 Oct 2015, Ric Wheeler wrote:
> You will have to trust me on this as the Red Hat person who spoke to
> pretty much all of our key customers about local file systems and
> storage - customers all have migrated over to using normal file systems under
>
l.org
> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Thursday, October 22, 2015 5:50 AM
> To: Ric Wheeler
> Cc: Orit Wasserman; ceph-devel@vger.kernel.org
> Subject: Re: newstore direction
>
> On Wed, 21 Oct 2015, Ric Wheeler wrote:
>> You will
Regards,
>> James
>>
>> -Original Message-
>> From: ceph-devel-ow...@vger.kernel.org
>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
>> Sent: Thursday, October 22, 2015 5:50 AM
>> To: Ric Wheeler
>> Cc: Orit Wasserman; ceph-de
mes@ssi.samsung.com>
Cc: Sage Weil <sw...@redhat.com>; Ric Wheeler <rwhee...@redhat.com>; Orit
Wasserman <owass...@redhat.com>; ceph-devel@vger.kernel.org
Subject: Re: newstore direction
Since the changes which moved the pg log and the pg info into the pg object
space, I think it
.com]
Sent: Thursday, October 22, 2015 10:17 AM
To: Allen Samuels <allen.samu...@sandisk.com>; Sage Weil <sw...@redhat.com>;
ceph-devel@vger.kernel.org
Subject: Re: newstore direction
On 10/21/2015 08:53 PM, Allen Samuels wrote:
Fixing the bug doesn't take a long time. Gettin
On 10/22/2015 08:50 AM, Sage Weil wrote:
On Wed, 21 Oct 2015, Ric Wheeler wrote:
You will have to trust me on this as the Red Hat person who spoke to pretty
much all of our key customers about local file systems and storage - customers
all have migrated over to using normal file systems under
Milosz Tanski adfin.com> writes:
>
> On Tue, Oct 20, 2015 at 4:00 PM, Sage Weil redhat.com> wrote:
> > On Tue, 20 Oct 2015, John Spray wrote:
> >> On Mon, Oct 19, 2015 at 8:49 PM, Sage Weil redhat.com> wrote:
> >> > - We have to size the kv backend storage (probably still an XFS
> >> >
On Wed, 21 Oct 2015, Ric Wheeler wrote:
> You will have to trust me on this as the Red Hat person who spoke to pretty
> much all of our key customers about local file systems and storage - customers
> all have migrated over to using normal file systems under Oracle/DB2.
> Typically, they use XFS
On Tue, Oct 20, 2015 at 4:00 PM, Sage Weil wrote:
> On Tue, 20 Oct 2015, John Spray wrote:
>> On Mon, Oct 19, 2015 at 8:49 PM, Sage Weil wrote:
>> > - We have to size the kv backend storage (probably still an XFS
>> > partition) vs the block storage. Maybe
er 20, 2015 11:32 AM
To: Sage Weil <sw...@redhat.com>; ceph-devel@vger.kernel.org
Subject: Re: newstore direction
On 10/19/2015 03:49 PM, Sage Weil wrote:
The current design is based on two simple ideas:
1) a key/value interface is a better way to manage all of our internal
metadata (obje
On 10/21/2015 09:32 AM, Sage Weil wrote:
On Tue, 20 Oct 2015, Ric Wheeler wrote:
Now:
1 io to write a new file
1-2 ios to sync the fs journal (commit the inode, alloc change)
(I see 2 journal IOs on XFS and only 1 on ext4...)
1 io to commit the rocksdb journal
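The per-write IO count in the list above can be tallied directly. This sketch counts only the items visible in this excerpt (the original list continues past the truncation). It uses the observation of 2 journal IOs on XFS versus 1 on ext4, and it treats the rocksdb journal commit as 1 IO by default, because another reply in the thread notes that it is currently 3. The function and table names are hypothetical.

```python
# Hypothetical tally of the IOs listed for writing a new file through
# the file-system path. Only the items visible in the excerpt are
# counted; the original list continues past the truncation.

# Observed in the thread: 2 journal IOs on XFS, only 1 on ext4.
FS_JOURNAL_IOS = {"xfs": 2, "ext4": 1}


def ios_for_new_file(fs, rocksdb_journal_ios=1):
    """Sum the listed IOs for one new-file write on the given filesystem.

    rocksdb_journal_ios defaults to the 1 IO in the list; another reply
    in the thread says it is currently 3.
    """
    data_write = 1                   # 1 io to write the new file
    fs_journal = FS_JOURNAL_IOS[fs]  # commit the inode + alloc change
    return data_write + fs_journal + rocksdb_journal_ios


if __name__ == "__main__":
    for fs in ("xfs", "ext4"):
        print(fs, ios_for_new_file(fs), ios_for_new_file(fs, rocksdb_journal_ios=3))
    # xfs 4 6
    # ext4 3 5
```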
Nelson [mailto:mnel...@redhat.com]
> Sent: Wednesday, October 21, 2015 9:36 PM
> To: Allen Samuels; Sage Weil; Chen, Xiaoxi
> Cc: James (Fei) Liu-SSI; Somnath Roy; ceph-devel@vger.kernel.org
> Subject: Re: newstore direction
>
> Thanks Allen! The devil is always in the details
ph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ric Wheeler
Sent: Tuesday, October 20, 2015 11:32 AM
To: Sage Weil <sw...@redhat.com>; ceph-devel@vger.kernel.org
Subject: Re: newstore direction
On 10/19/2015 03:49 PM, Sage Weil wrote:
The current design is bas
;xiaoxi.c...@intel.com>
Cc: James (Fei) Liu-SSI <james@ssi.samsung.com>; Somnath Roy
<somnath@sandisk.com>; ceph-devel@vger.kernel.org
Subject: Re: newstore direction
On 10/20/2015 07:30 AM, Sage Weil wrote:
On Tue, 20 Oct 2015, Chen, Xiaoxi wrote:
+1, nowadays K-V DB care m
ber 20, 2015 11:32 AM
To: Sage Weil <sw...@redhat.com>; ceph-devel@vger.kernel.org
Subject: Re: newstore direction
On 10/19/2015 03:49 PM, Sage Weil wrote:
The current design is based on two simple ideas:
1) a key/value interface is a better way to manage all of our internal
metadata (object
On Wed, 21 Oct 2015, Ric Wheeler wrote:
> On 10/21/2015 04:22 AM, Orit Wasserman wrote:
> > On Tue, 2015-10-20 at 14:31 -0400, Ric Wheeler wrote:
> > > On 10/19/2015 03:49 PM, Sage Weil wrote:
> > > > The current design is based on two simple ideas:
> > > >
> > > >1) a key/value interface is
Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ric Wheeler
Sent: Tuesday, October 20, 2015 11:32 AM
To: Sage Weil <sw...@redhat.com>; ceph-devel@vger.kernel.org
Subject: Re: newstore direction
On 10/19/2015 03:49 PM,
Adding 2c
On Wed, 2015-10-21 at 14:37 -0500, Mark Nelson wrote:
> My thought is that there is some inflection point where the userland
> kvstore/block approach is going to be less work, for everyone I think,
> than trying to quickly discover, understand, fix, and push upstream
> patches that
...@sandisk.com
-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com]
Sent: Wednesday, October 21, 2015 10:45 PM
To: Allen Samuels <allen.samu...@sandisk.com>; Ric Wheeler
<rwhee...@redhat.com>; Sage Weil <sw...@redhat.com>; ceph-devel@vger.kernel.org
Subject: Re: newstor
hat.com>
Cc: Ric Wheeler <rwhee...@redhat.com>; Allen Samuels
<allen.samu...@sandisk.com>; Sage Weil <sw...@redhat.com>;
ceph-devel@vger.kernel.org
Subject: Re: newstore direction
Adding 2c
On Wed, 2015-10-21 at 14:37 -0500, Mark Nelson wrote:
> My thought is that there
com]
Sent: Wednesday, October 21, 2015 8:24 PM
To: Allen Samuels <allen.samu...@sandisk.com>; Sage Weil <sw...@redhat.com>;
ceph-devel@vger.kernel.org
Subject: Re: newstore direction
On 10/21/2015 06:06 AM, Allen Samuels wrote:
> I agree that moving newStore to raw block is
llen.samu...@sandisk.com>; Sage Weil <sw...@redhat.com>
Cc: James (Fei) Liu-SSI <james@ssi.samsung.com>; Somnath Roy
<somnath@sandisk.com>; ceph-devel@vger.kernel.org
Subject: RE: newstore direction
We did evaluate whether NVMKV could be implemented on non-fusionIO SSDs, i.e
On 10/21/2015 08:53 PM, Allen Samuels wrote:
Fixing the bug doesn't take a long time. Getting it deployed is where the delay is. Many
companies standardize on a particular release of a particular distro. Getting them to
switch to a new release -- even a "bug fix" point release -- is a major
From: Ric Wheeler [mailto:rwhee...@redhat.com]
Sent: Thursday, October 22, 2015 10:17 AM
To: Allen Samuels <allen.samu...@sandisk.com>; Sage Weil <sw...@redhat.com>;
ceph-devel@vger.kernel.org
Subject: Re: newstore direction
On 10/21/2015 08:53 PM, Allen Samuels wrote:
> Fixing the bug
On Tue, 20 Oct 2015, Ric Wheeler wrote:
> > Now:
> > 1 io to write a new file
> > 1-2 ios to sync the fs journal (commit the inode, alloc change)
> > (I see 2 journal IOs on XFS and only 1 on ext4...)
> > 1 io to commit the rocksdb journal (currently 3, but will drop to
>
780 6416
allen.samu...@sandisk.com
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ric Wheeler
Sent: Tuesday, October 20, 2015 11:32 AM
To: Sage Weil <sw...@redhat.com>; ceph-devel@vger.kernel.org
Subject: Re: newstore directi
[mailto:ceph-devel-
>>> ow...@vger.kernel.org] On Behalf Of James (Fei) Liu-SSI
>>> Sent: Tuesday, October 20, 2015 6:21 AM
>>> To: Sage Weil; Somnath Roy
>>> Cc: ceph-devel@vger.kernel.org
>>> Subject: RE: newstore direction
>>>
>>> Hi Sage and So
r.kernel.org] On Behalf Of Ric Wheeler
Sent: Tuesday, October 20, 2015 11:32 AM
To: Sage Weil <sw...@redhat.com>; ceph-devel@vger.kernel.org
Subject: Re: newstore direction
On 10/19/2015 03:49 PM, Sage Weil wrote:
The current design is based on two simple ideas:
1) a key/value interfac
On 10/21/2015 04:22 AM, Orit Wasserman wrote:
On Tue, 2015-10-20 at 14:31 -0400, Ric Wheeler wrote:
On 10/19/2015 03:49 PM, Sage Weil wrote:
The current design is based on two simple ideas:
1) a key/value interface is a better way to manage all of our internal
metadata (object metadata,
On Tue, 2015-10-20 at 14:31 -0400, Ric Wheeler wrote:
> On 10/19/2015 03:49 PM, Sage Weil wrote:
> > The current design is based on two simple ideas:
> >
> > 1) a key/value interface is a better way to manage all of our internal
> > metadata (object metadata, attrs, layout, collection membership,
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Monday, October 19, 2015 9:49 PM
>
> The current design is based on two simple ideas:
>
> 1) a key/value interface is a better way to manage all of our
On Tue, 20 Oct 2015, Haomai Wang wrote:
> On Tue, Oct 20, 2015 at 3:49 AM, Sage Weil wrote:
> > The current design is based on two simple ideas:
> >
> > 1) a key/value interface is a better way to manage all of our internal
> > metadata (object metadata, attrs, layout, collection
2015 6:21 AM
> > To: Sage Weil; Somnath Roy
> > Cc: ceph-devel@vger.kernel.org
> > Subject: RE: newstore direction
> >
> > Hi Sage and Somnath,
> > In my humble opinion, there is another, more aggressive solution than a raw
> > block-device-based key/value sto
On Behalf Of Sage Weil
> Sent: Monday, October 19, 2015 1:55 PM
> To: Somnath Roy
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: newstore direction
>
> On Mon, 19 Oct 2015, Somnath Roy wrote:
> > Sage,
> > I fully support that. If we want to saturate SSDs,
On 10/19/2015 03:49 PM, Sage Weil wrote:
The current design is based on two simple ideas:
1) a key/value interface is a better way to manage all of our internal
metadata (object metadata, attrs, layout, collection membership,
write-ahead logging, overlay data, etc.)
2) a file system is well
ing block allocation.
-Neo
> Mark
>
>
>>
>>
>>>> -Original Message-
>>>> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
>>>> ow...@vger.kernel.org] On Behalf Of James (Fei) Liu-SSI
>>>> Sent: Tuesday, October 2
On Tue, 20 Oct 2015, Ric Wheeler wrote:
> On 10/19/2015 03:49 PM, Sage Weil wrote:
> > The current design is based on two simple ideas:
> >
> > 1) a key/value interface is a better way to manage all of our internal
> > metadata (object metadata, attrs, layout, collection membership,
> >
ither way, I strongly support having CEPH own its data format instead
> > of relying on a filesystem.
> >
> > Regards,
> > James
> >
> > -Original Message-
> > From: ceph-devel-ow...@vger.kernel.org
> > [mailto:ceph-devel-ow...@vger.kern
On Tue, Oct 20, 2015 at 12:44 PM, Sage Weil wrote:
> On Tue, 20 Oct 2015, Ric Wheeler wrote:
>> The big problem with consuming block devices directly is that you ultimately
>> end up recreating most of the features that you had in the file system. Even
>> enterprise databases
On 10/20/2015 03:44 PM, Sage Weil wrote:
On Tue, 20 Oct 2015, Ric Wheeler wrote:
On 10/19/2015 03:49 PM, Sage Weil wrote:
The current design is based on two simple ideas:
1) a key/value interface is a better way to manage all of our internal
metadata (object metadata, attrs, layout,
On Tue, 20 Oct 2015, Gregory Farnum wrote:
> On Tue, Oct 20, 2015 at 12:44 PM, Sage Weil wrote:
> > On Tue, 20 Oct 2015, Ric Wheeler wrote:
> >> The big problem with consuming block devices directly is that you
> >> ultimately
> >> end up recreating most of the features that
On Tue, 20 Oct 2015, John Spray wrote:
> On Mon, Oct 19, 2015 at 8:49 PM, Sage Weil wrote:
> > - We have to size the kv backend storage (probably still an XFS
> > partition) vs the block storage. Maybe we do this anyway (put metadata on
> > SSD!) so it won't matter. But what
On Tue, Oct 20, 2015 at 11:31 AM, Ric Wheeler wrote:
> On 10/19/2015 03:49 PM, Sage Weil wrote:
>>
>> The current design is based on two simple ideas:
>>
>> 1) a key/value interface is a better way to manage all of our internal
>> metadata (object metadata, attrs, layout,
/LightNVM-Vault2015.pdf
Regards,
James
-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Tuesday, October 20, 2015 5:34 AM
To: James (Fei) Liu-SSI
Cc: Somnath Roy; ceph-devel@vger.kernel.org
Subject: RE: newstore direction
On Mon, 19 Oct 2015, James (Fei) Liu-SSI wrote
Sent: Tuesday, October 20, 2015 5:34 AM
> To: James (Fei) Liu-SSI
> Cc: Somnath Roy; ceph-devel@vger.kernel.org
> Subject: RE: newstore direction
>
> On Mon, 19 Oct 2015, James (Fei) Liu-SSI wrote:
> > Hi Sage and Somnath,
> > In my humble opinion, there is anothe
el@vger.kernel.org>
> Sent: Tuesday, October 20, 2015 4:00:23 PM
> Subject: Re: newstore direction
>
> On Tue, 20 Oct 2015, John Spray wrote:
> > On Mon, Oct 19, 2015 at 8:49 PM, Sage Weil <sw...@redhat.com> wrote:
> > > - We have to size the kv backend storage
On 10/20/2015 05:47 PM, Sage Weil wrote:
On Tue, 20 Oct 2015, Gregory Farnum wrote:
On Tue, Oct 20, 2015 at 12:44 PM, Sage Weil wrote:
On Tue, 20 Oct 2015, Ric Wheeler wrote:
The big problem with consuming block devices directly is that you ultimately
end up recreating most
:55 PM
To: Somnath Roy
Cc: ceph-devel@vger.kernel.org
Subject: RE: newstore direction
On Mon, 19 Oct 2015, Somnath Roy wrote:
Sage,
I fully support that. If we want to saturate SSDs, we need to get
rid of this filesystem overhead (which I am in the process of measuring).
Also, it will be good if we
Sage,
I fully support that. If we want to saturate SSDs, we need to get rid of this
filesystem overhead (which I am in the process of measuring).
Also, it will be good if we can eliminate the dependency on the k/v dbs (for
storing allocators and all). The reason is the unknown write amps they
I think there is a lot that can be gained by Ceph managing a raw block
device. As I mentioned on ceph-users, I've given this some thought and
a lot of optimizations could be done that are conducive to storing
objects. I didn't think however to bypass
On Mon, 19 Oct 2015, Somnath Roy wrote:
> Sage,
> I fully support that. If we want to saturate SSDs, we need to get rid
> of this filesystem overhead (which I am in the process of measuring). Also,
> it will be good if we can eliminate the dependency on the k/v dbs (for
> storing allocators and
On 10/19/2015 09:49 PM, Sage Weil wrote:
> The current design is based on two simple ideas:
>
> 1) a key/value interface is a better way to manage all of our internal
> metadata (object metadata, attrs, layout, collection membership,
> write-ahead logging, overlay data, etc.)
>
> 2) a file
Hi Sage,
If we are managing the raw device, does it make sense to have a key/value store
to manage the whole space?
Having allocator metadata might cause other consistency problems.
Writing an fsck for that implementation could be tougher; we might
have to have strict crc
To: Somnath Roy
Cc: ceph-devel@vger.kernel.org
Subject: RE: newstore direction
On Mon, 19 Oct 2015, Somnath Roy wrote:
> Sage,
> I fully support that. If we want to saturate SSDs, we need to get
> rid of this filesystem overhead (which I am in the process of measuring).
> Also, it will be go
On Mon, Oct 19, 2015 at 8:49 PM, Sage Weil wrote:
> - We have to size the kv backend storage (probably still an XFS
> partition) vs the block storage. Maybe we do this anyway (put metadata on
> SSD!) so it won't matter. But what happens when we are storing gobs of
> rgw index
el.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Monday, October 19, 2015 1:55 PM
> To: Somnath Roy
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: newstore direction
>
> On Mon, 19 Oct 2015, Somnath Roy wrote:
> > Sage,
> > I
ri
> Sent: Tuesday, October 20, 2015 10:33 AM
> To: James (Fei) Liu-SSI; Sage Weil; Somnath Roy
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: newstore direction
>
> Hi James,
>
> Are you referring to SCSI OSD (http://www.t10.org/drafts.htm#OSD_Family)?
> If SCSI OSD i
.com>; Somnath Roy
> <somnath@sandisk.com>
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: newstore direction
>
> Hi Sage and Somnath,
> In my humble opinion, there is another, more aggressive solution than a raw
> block-device-based key/value store as a backend
On Tue, Oct 20, 2015 at 3:49 AM, Sage Weil wrote:
> The current design is based on two simple ideas:
>
> 1) a key/value interface is a better way to manage all of our internal
> metadata (object metadata, attrs, layout, collection membership,
> write-ahead logging, overlay data,