Re: Fwd: how io works when backfill

2015-12-28 Thread Dong Wu
If we add in osd.7 and osd.7 becomes the primary, i.e. pg1.0 [1, 2, 3]
--> pg1.0 [7, 2, 3], is it similar to the example above?
Do we still install a pg_temp entry mapping the PG back to [1, 2, 3],
so that backfill happens to 7 while normal IO writes go to [1, 2, 3],
and IO to the portion of the PG that has already been backfilled is
also sent to osd.7?

How about these examples of removing an OSD:
- pg1.0 [1, 2, 3]
- osd.3 goes down and is removed
- the mapping changes to [1, 2, 5], but osd.5 has no data, so a pg_temp
entry is installed mapping the PG back to [1, 2], then backfill happens to 5
- normal IO writes go to [1, 2]; if IO hits an object that has already
been backfilled to osd.5, the IO is also sent to osd.5
- when backfill completes, the pg_temp entry is removed and the mapping
changes back to [1, 2, 5]


Another example:
- pg1.0 [1, 2, 3]
- osd.3 goes down and is removed
- the mapping changes to [5, 1, 2], but osd.5 has no data for the PG, so
a pg_temp entry is installed mapping the PG back to [1, 2], with osd.1
temporarily acting as the primary, then backfill happens to 5
- normal IO writes go to [1, 2]; if IO hits an object that has already
been backfilled to osd.5, the IO is also sent to osd.5
- when backfill completes, the pg_temp entry is removed and the mapping
changes back to [5, 1, 2]

Is my analysis right?
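
To make sure I have the sequence straight, here is the second example
written out as a small sketch (purely illustrative Python, not Ceph code;
all names are mine):

# Purely illustrative model of the acting-set transitions I describe above.

# Step 0: original mapping before osd.3 is removed.
acting = [1, 2, 3]

# Step 1: osd.3 is removed; CRUSH now maps the PG to [5, 1, 2],
# but osd.5 has no data for this PG yet.
crush_mapping = [5, 1, 2]

# Step 2: a pg_temp entry pins the acting set to the OSDs that still
# have the data, so osd.1 temporarily acts as the primary.
pg_temp = [1, 2]
acting = pg_temp           # client IO is served by [1, 2] during backfill
backfill_target = 5        # osd.5 is filled in the background

# Step 3: when backfill completes, the pg_temp entry is removed and the
# acting set becomes the CRUSH mapping again, with osd.5 as the primary.
pg_temp = None
acting = crush_mapping     # back to [5, 1, 2]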

2015-12-29 1:30 GMT+08:00 Sage Weil :
> On Mon, 28 Dec 2015, Zhiqiang Wang wrote:
>> 2015-12-27 20:48 GMT+08:00 Dong Wu :
>> > Hi,
>> > When we add or remove an osd, ceph will backfill to rebalance data.
>> > eg:
>> > - pg1.0 [1, 2, 3]
>> > - add an osd (eg. osd.7)
>> > - ceph starts backfill, then the pg1.0 osd set changes to [1, 2, 7]
>> > - if [a, b, c, d, e] are objects needing to be backfilled to osd.7
>> > and object a is now backfilling
>> > - when a write io hits object a, the io needs to wait for the
>> > backfill of object a to complete, then goes on.
>> > - but if io hits object b, which has not been backfilled yet, the io
>> > reaches osd.1, and osd.1 sends the io to osd.2 and osd.7; but osd.7
>> > does not have object b, so does osd.7 need to wait for object b to be
>> > backfilled before writing? Is that right? Or does osd.1 only send the
>> > io to osd.2, not both?
>>
>> I think in this case, when the write to object b reaches osd.1, it
>> holds the client write, raises the priority of the recovery of object
>> b, and kicks off its recovery. When the recovery of object b is done,
>> it requeues the client write, and then everything goes on as usual.
>
> It's more complicated than that.  In a normal (log-based) recovery
> situation, it is something like the above: if the acting set is [1,2,3]
> but 3 is missing the latest copy of A, a write to A will block on the
> primary while the primary initiates recovery of A immediately.  Once that
> completes the IO will continue.
>
> For backfill, it's different.  In your example, you start with [1,2,3]
> then add in osd.7.  The OSD will see that 7 has no data for the PG and
> install a pg_temp entry mapping the PG back to [1,2,3] temporarily.  Then
> things will proceed normally while backfill happens to 7.  Backfill won't
> interfere with normal IO at all, except that IO to the portion of the PG
> that has already been backfilled will also be sent to the backfill target
> (7) so that it stays up to date.  Once it completes, the pg_temp entry is
> removed and the mapping changes back to [1,2,7].  Then osd.3 is allowed to
> remove its copy of the PG.
>
> sage
>


Fwd: how io works when backfill

2015-12-28 Thread Zhiqiang Wang
2015-12-27 20:48 GMT+08:00 Dong Wu :
> Hi,
> When we add or remove an osd, ceph will backfill to rebalance data.
> eg:
> - pg1.0 [1, 2, 3]
> - add an osd (eg. osd.7)
> - ceph starts backfill, then the pg1.0 osd set changes to [1, 2, 7]
> - if [a, b, c, d, e] are objects needing to be backfilled to osd.7
> and object a is now backfilling
> - when a write io hits object a, the io needs to wait for the
> backfill of object a to complete, then goes on.
> - but if io hits object b, which has not been backfilled yet, the io
> reaches osd.1, and osd.1 sends the io to osd.2 and osd.7; but osd.7
> does not have object b, so does osd.7 need to wait for object b to be
> backfilled before writing? Is that right? Or does osd.1 only send the
> io to osd.2, not both?

I think in this case, when the write to object b reaches osd.1, it
holds the client write, raises the priority of the recovery of object
b, and kicks off its recovery. When the recovery of object b is done,
it requeues the client write, and then everything goes on as usual.
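
In other words, the flow I have in mind for a degraded object looks
roughly like the sketch below (the names and data structures are made up
for illustration; this is not the actual OSD code):

# A rough mental model only: the names and data structures below are made
# up for illustration, this is not the actual OSD code.

def handle_write(acting_set, missing, obj, recovered_log, write_log):
    # `missing` maps an osd id to the set of objects that osd is missing.
    missing_on = [osd for osd in acting_set if obj in missing.get(osd, set())]

    if missing_on:
        # Hold the client write: first recover the object to the peers that
        # miss it, at elevated priority.
        for osd in missing_on:
            missing[osd].discard(obj)          # "push" the object to the peer
            recovered_log.append((osd, obj))
        # ...then the held write is requeued...

    # ...and replicated to the whole acting set as usual.
    for osd in acting_set:
        write_log.append((osd, obj))

# Tiny usage example: osd.3 is missing object 'b'.
missing = {3: {"b"}}
recovered, writes = [], []
handle_write([1, 2, 3], missing, "b", recovered, writes)
print(recovered)   # [(3, 'b')]  -- recovery happens before the write
print(writes)      # [(1, 'b'), (2, 'b'), (3, 'b')]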



Speeding up rbd_stat() in libvirt

2015-12-28 Thread Wido den Hollander
Hi,

libvirt's storage pools have a mechanism called 'refresh' which scans a
storage pool to refresh its contents.

The current implementation does:
* List all images via rbd_list()
* Call rbd_stat() on each image

Source:
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/storage/storage_backend_rbd.c;h=cdbfdee98505492407669130712046783223c3cf;hb=master#l329

This works, but an RBD pool with 10k images takes a couple of minutes to
scan.

Now, Ceph is distributed, so this could be done in parallel, but before
I start on that I was wondering if somebody has a better idea for fixing this?

I don't know whether libvirt allows spawning multiple threads and having
workers do this, but it is something that came to mind.

libvirt only wants to know the size of an image, and this is not stored in
the rbd_directory object, so the rbd_stat() call is required.

Suggestions or ideas? I would like this process to be as fast as
possible.
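
Something along the lines of the sketch below is what I have in mind,
written in Python with the rbd/rados bindings purely to illustrate the
idea (the pool name and worker count are made up); the real change would
of course live in the libvirt C backend:

# Illustrative sketch only: stat all RBD images in a pool with a small
# pool of worker threads instead of one by one.
from concurrent.futures import ThreadPoolExecutor
import rados
import rbd

def stat_all_images(pool_name, conffile='/etc/ceph/ceph.conf', workers=16):
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    ioctx = cluster.open_ioctx(pool_name)
    try:
        names = rbd.RBD().list(ioctx)

        def stat_one(name):
            # One image handle per task; sharing a single ioctx between
            # threads is assumed to be safe here.
            image = rbd.Image(ioctx, name, read_only=True)
            try:
                return name, image.stat()['size']
            finally:
                image.close()

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(pool.map(stat_one, names))
    finally:
        ioctx.close()
        cluster.shutdown()

if __name__ == '__main__':
    sizes = stat_all_images('libvirt-pool')   # pool name made up
    print(len(sizes), 'images')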

Wido


Re: FreeBSD Building and Testing

2015-12-28 Thread Willem Jan Withagen

Hi,

Can somebody help me out and explain why the test

Func: test/mon/osd-crush
Func: TEST_crush_reject_empty started

fails with a Python error which sort of startles me:
test/mon/osd-crush.sh:227: TEST_crush_reject_empty:  local empty_map=testdir/osd-crush/empty_map
test/mon/osd-crush.sh:228: TEST_crush_reject_empty:  :
test/mon/osd-crush.sh:229: TEST_crush_reject_empty:  ./crushtool -c testdir/osd-crush/empty_map.txt -o testdir/osd-crush/empty_map.map
test/mon/osd-crush.sh:230: TEST_crush_reject_empty:  expect_failure testdir/osd-crush 'Error EINVAL' ./ceph osd setcrushmap -i testdir/osd-crush/empty_map.map
../qa/workunits/ceph-helpers.sh:1171: expect_failure:  local dir=testdir/osd-crush
../qa/workunits/ceph-helpers.sh:1172: expect_failure:  shift
../qa/workunits/ceph-helpers.sh:1173: expect_failure:  local 'expected=Error EINVAL'
../qa/workunits/ceph-helpers.sh:1174: expect_failure:  shift
../qa/workunits/ceph-helpers.sh:1175: expect_failure:  local success
../qa/workunits/ceph-helpers.sh:1176: expect_failure:  pwd
../qa/workunits/ceph-helpers.sh:1177: expect_failure:  printenv
../qa/workunits/ceph-helpers.sh:1178: expect_failure:  echo ./ceph osd setcrushmap -i testdir/osd-crush/empty_map.map
../qa/workunits/ceph-helpers.sh:1180: expect_failure:  ./ceph osd setcrushmap -i testdir/osd-crush/empty_map.map

*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
Traceback (most recent call last):
  File "./ceph", line 936, in <module>
    retval = main()
  File "./ceph", line 874, in main
    sigdict, inbuf, verbose)
  File "./ceph", line 457, in new_style_command
    inbuf=inbuf)
  File "/usr/srcs/Ceph/wip-freebsd-wjw/ceph/src/pybind/ceph_argparse.py", line 1208, in json_command
    raise RuntimeError('"{0}": exception {1}'.format(argdict, e))
RuntimeError: "{'prefix': u'osd setcrushmap'}": exception "['{"prefix": "osd setcrushmap"}']": exception 'utf8' codec can't decode byte 0x86 in position 56: invalid start byte

Which is certainly not the type of error I expected.
And it is hard to find any 0x86 byte in the arguments.

And yes, Python is right: there are no valid UTF-8 sequences that start
with 0x86. The questions are:
Why does it want to parse this as UTF-8?
And how do I switch that off?
Or how do I fix this error?
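
For what it's worth, the failure itself is easy to reproduce outside of
ceph; the snippet below is just a minimal illustration of the decode error
and of the sort of workaround I am wondering about, not a proposed fix for
ceph_argparse.py:

# Minimal reproduction of the decode error, outside of ceph entirely.
data = b'osd setcrushmap \x86 payload'   # any byte string containing 0x86

try:
    data.decode('utf-8')
except UnicodeDecodeError as e:
    print(e)   # ... can't decode byte 0x86 ...: invalid start byte

# The kind of workarounds I am asking about (not a proposed fix):
print(data.decode('utf-8', errors='replace'))  # substitute the offending byte
print(data.decode('latin-1'))                  # never fails, maps bytes 1:1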

Thanx,
--WjW


Re: Fwd: how io works when backfill

2015-12-28 Thread Sage Weil
On Mon, 28 Dec 2015, Zhiqiang Wang wrote:
> 2015-12-27 20:48 GMT+08:00 Dong Wu :
> > Hi,
> > When we add or remove an osd, ceph will backfill to rebalance data.
> > eg:
> > - pg1.0 [1, 2, 3]
> > - add an osd (eg. osd.7)
> > - ceph starts backfill, then the pg1.0 osd set changes to [1, 2, 7]
> > - if [a, b, c, d, e] are objects needing to be backfilled to osd.7
> > and object a is now backfilling
> > - when a write io hits object a, the io needs to wait for the
> > backfill of object a to complete, then goes on.
> > - but if io hits object b, which has not been backfilled yet, the io
> > reaches osd.1, and osd.1 sends the io to osd.2 and osd.7; but osd.7
> > does not have object b, so does osd.7 need to wait for object b to be
> > backfilled before writing? Is that right? Or does osd.1 only send the
> > io to osd.2, not both?
> 
> I think in this case, when the write to object b reaches osd.1, it
> holds the client write, raises the priority of the recovery of object
> b, and kicks off its recovery. When the recovery of object b is done,
> it requeues the client write, and then everything goes on as usual.

It's more complicated than that.  In a normal (log-based) recovery 
situation, it is something like the above: if the acting set is [1,2,3] 
but 3 is missing the latest copy of A, a write to A will block on the 
primary while the primary initiates recovery of A immediately.  Once that 
completes the IO will continue.

For backfill, it's different.  In your example, you start with [1,2,3] 
then add in osd.7.  The OSD will see that 7 has no data for the PG and
install a pg_temp entry mapping the PG back to [1,2,3] temporarily.  Then 
things will proceed normally while backfill happens to 7.  Backfill won't 
interfere with normal IO at all, except that IO to the portion of the PG 
that has already been backfilled will also be sent to the backfill target 
(7) so that it stays up to date.  Once it completes, the pg_temp entry is
removed and the mapping changes back to [1,2,7].  Then osd.3 is allowed to 
remove its copy of the PG.
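
Concretely, the decision looks something like this rough sketch (not the
actual code; the names are made up, and objects are assumed to compare in
backfill order):

# Rough sketch of the decision on the primary, not the actual OSD code.
# "last_backfill" marks how far backfill has progressed, in object order.

def replicas_for_write(acting_set, backfill_target, last_backfill, obj):
    targets = list(acting_set)           # e.g. [1, 2, 3] while pg_temp is in place
    if backfill_target is not None and obj <= last_backfill:
        # The object has already been copied to the backfill target, so the
        # write is also sent there to keep it up to date.
        targets.append(backfill_target)  # e.g. osd.7
    # Objects beyond last_backfill are not sent to the target; backfill will
    # copy them, including this write, when it gets there.
    return targets

# Backfill has progressed through object "c": writes to "b" also go to osd.7,
# writes to "e" do not.
print(replicas_for_write([1, 2, 3], 7, "c", "b"))   # [1, 2, 3, 7]
print(replicas_for_write([1, 2, 3], 7, "c", "e"))   # [1, 2, 3]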

sage



Re: How to configure if there are two network cards in the client

2015-12-28 Thread Sage Weil
On Fri, 25 Dec 2015, ?? wrote:
> Hi all,
> When we read the code, we haven't found a way for the client to
> bind to a specific IP. In Ceph's configuration, we could only find the
> parameter 'public network', but it seems to act on the OSD rather than the
> client.
> There is a scenario where the client has two network cards, NIC1 and
> NIC2. NIC1 is responsible for communicating with the cluster (monitors and
> RADOS) and NIC2 carries other services besides Ceph's client. So we need
> the client to be able to bind to a specific IP, in order to separate the IP
> communicating with the cluster from the IP serving other applications. We
> want to know whether there is any configuration in Ceph to achieve this.
> If there is, how should we configure the IP? If not, could this function
> be added to Ceph? Thank you so much.

Right.  There isn't a configurable to do this now--we've always just let
the kernel network layer sort it out.  Is this just a matter of calling
bind() on the socket before connecting?  I've never done this before...
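
Something like the plain-socket sketch below is what I have in mind
(Python just to illustrate the mechanism; the messenger code itself is
C++, and the addresses here are made up):

# Plain-socket illustration of binding to a specific local IP (NIC) before
# connecting; this is the general mechanism, not Ceph code.
import socket

def connect_from(local_ip, remote_ip, remote_port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Bind the source address to the NIC we want to use; port 0 lets the
    # kernel pick any free local port.
    s.bind((local_ip, 0))
    s.connect((remote_ip, remote_port))
    return s

# e.g. force traffic to the monitor through the NIC that owns 192.168.1.10
# (addresses made up for illustration):
# sock = connect_from('192.168.1.10', '192.168.1.1', 6789)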

sage


ceph branch status

2015-12-28 Thread ceph branch robot
-- All Branches --

Abhishek Varshney 
2015-11-23 11:45:29 +0530   infernalis-backports

Adam C. Emerson 
2015-12-21 16:51:39 -0500   wip-cxx11concurrency

Adam Crume 
2014-12-01 20:45:58 -0800   wip-doc-rbd-replay

Alfredo Deza 
2015-03-23 16:39:48 -0400   wip-11212
2015-12-23 11:25:13 -0500   wip-doc-style

Alfredo Deza 
2014-07-08 13:58:35 -0400   wip-8679
2014-09-04 13:58:14 -0400   wip-8366
2014-10-13 11:10:10 -0400   wip-9730

Ali Maredia 
2015-11-25 13:45:29 -0500   wip-10587-split-servers
2015-12-23 12:01:46 -0500   wip-cmake
2015-12-23 16:12:47 -0500   wip-cmake-rocksdb

Barbora Ančincová 
2015-11-04 16:43:45 +0100   wip-doc-RGW

Boris Ranto 
2015-09-04 15:19:11 +0200   wip-bash-completion

Daniel Gryniewicz 
2015-11-11 09:06:00 -0500   wip-rgw-storage-class
2015-12-09 12:56:37 -0500   cmake-dang

Danny Al-Gaaf 
2015-04-23 16:32:00 +0200   wip-da-SCA-20150421
2015-04-23 17:18:57 +0200   wip-nosetests
2015-04-23 18:20:16 +0200   wip-unify-num_objects_degraded
2015-11-03 14:10:47 +0100   wip-da-SCA-20151029
2015-11-03 14:40:44 +0100   wip-da-SCA-20150910

David Zafman 
2014-08-29 10:41:23 -0700   wip-libcommon-rebase
2015-04-24 13:14:23 -0700   wip-cot-giant
2015-09-28 11:33:11 -0700   wip-12983
2015-12-22 16:19:25 -0800   wip-zafman-testing

Dongmao Zhang 
2014-11-14 19:14:34 +0800   thesues-master

Greg Farnum 
2015-04-29 21:44:11 -0700   wip-init-names
2015-07-16 09:28:24 -0700   hammer-12297
2015-10-02 13:00:59 -0700   greg-infernalis-lock-testing
2015-10-02 13:09:05 -0700   greg-infernalis-lock-testing-cacher
2015-10-07 00:45:24 -0700   greg-infernalis-fs
2015-10-21 17:43:07 -0700   client-pagecache-norevoke
2015-10-27 11:32:46 -0700   hammer-pg-replay
2015-11-24 07:17:33 -0800   greg-fs-verify
2015-12-11 00:24:40 -0800   greg-fs-testing

Greg Farnum 
2014-10-23 13:33:44 -0700   wip-forward-scrub

Guang G Yang 
2015-06-26 20:31:44 +   wip-ec-readall
2015-07-23 16:13:19 +   wip-12316

Guang Yang 
2014-09-25 00:47:46 +   wip-9008
2015-10-20 15:30:41 +   wip-13441

Haomai Wang 
2015-10-26 00:02:04 +0800   wip-13521

Haomai Wang 
2014-07-27 13:37:49 +0800   wip-flush-set
2015-04-20 00:47:59 +0800   update-organization
2015-07-21 19:33:56 +0800   fio-objectstore
2015-08-26 09:57:27 +0800   wip-recovery-attr
2015-10-24 23:39:07 +0800   fix-compile-warning

Hector Martin 
2015-12-03 03:07:02 +0900   wip-cython-rbd

Ilya Dryomov 
2014-09-05 16:15:10 +0400   wip-rbd-notify-errors

Ivo Jimenez 
2015-08-24 23:12:45 -0700   hammer-with-new-workunit-for-wip-12551

James Page 
2015-11-04 11:08:42 +   javacruft-wip-ec-modules

Jason Dillaman 
2015-08-31 23:17:53 -0400   wip-12698
2015-11-13 02:00:21 -0500   wip-11287-rebased

Jenkins 
2015-11-04 14:31:13 -0800   rhcs-v0.94.3-ubuntu

Jenkins 
2014-07-29 05:24:39 -0700   wip-nhm-hang
2014-10-14 12:10:38 -0700   wip-2
2015-02-02 10:35:28 -0800   wip-sam-v0.92
2015-08-21 12:46:32 -0700   last
2015-08-21 12:46:32 -0700   loic-v9.0.3
2015-09-15 10:23:18 -0700   rhcs-v0.80.8
2015-09-21 16:48:32 -0700   rhcs-v0.94.1-ubuntu

Joao Eduardo Luis 
2014-09-10 09:39:23 +0100   wip-leveldb-get.dumpling

Joao Eduardo Luis 
2014-07-22 15:41:42 +0100   wip-leveldb-misc

Joao Eduardo Luis 
2014-09-02 17:19:52 +0100   wip-leveldb-get
2014-10-17 16:20:11 +0100   wip-paxos-fix
2014-10-21 21:32:46 +0100   wip-9675.dumpling
2015-07-27 21:56:42 +0100   wip-11470.hammer
2015-09-09 15:45:45 +0100   wip-11786.hammer

Joao Eduardo Luis 
2014-11-17 16:43:53 +   wip-mon-osdmap-cleanup
2014-12-15 16:18:56 +   wip-giant-mon-backports
2014-12-17 17:13:57 +   wip-mon-backports.firefly
2014-12-17 23:15:10 +   wip-mon-sync-fix.dumpling
2015-01-07 23:01:00 +   wip-mon-blackhole-mlog-0.87.7
2015-01-10 02:40:42 +   wip-dho-joao
2015-01-10 02:46:31 +   

Cordial greeting

2015-12-28 Thread Zahra Robert



Cordial greeting message from Fatima, I am seeking for your help,I will be
very glad if you do assist me to relocate a sum of (US$4 Million Dollars)
into your Bank account in your country for the benefit of both of us i
want to use this money for investment. I will give you more details as you
reply Yours Eva Zahra Robert



Re: CEPH build

2015-12-28 Thread Odintsov Vladislav
Hi,

Resending my letter below.
Thank you for your attention.


Best regards,

Vladislav Odintsov


From: Sage Weil 
Sent: Monday, December 28, 2015 19:49
To: Odintsov Vladislav
Subject: Re: CEPH build

Can you resend this to ceph-devel, and copy ad...@redhat.com?

On Fri, 25 Dec 2015, Odintsov Vladislav wrote:

>
> Hi, Sage!
>
>
> I'm working at a cloud provider as a system engineer, and I'm currently
> trying to build different versions of Ceph (0.94, 9.2, 10.0) with libxio
> enabled. I've got a problem understanding how the ceph maintainers
> create the official tarballs and builds from the git repo.
>
> I saw you listed as a maintainer of the build-related files in the repo,
> and thought you could help me :) If I'm wrong, please tell me who can.
>
> I've found many information sources with differing descriptions of the
> ceph build process:
>
> - https://github.com/ceph/ceph-build
>
> - https://github.com/ceph/autobuild-ceph
>
> - the documentation in the ceph docs.
>
>
> But I'm unable to reproduce the same tarball as the one at
> http://download.ceph.com/tarballs/, for example for version v0.94.5.
> What else should I read? Or maybe there is some magic...)
>
>
> Actually, I want to understand how the official builds are made (and with
> which tools); I'd like to go through all the build-related steps myself to
> understand the upstream build process.
>
>
> Thanks a lot for your help!
>
>
> 
> Best regards,
>
> Vladislav Odintsov