Re: [ceph-users] Debian repo down?

2015-09-29 Thread Wido den Hollander
On 09/26/2015 03:58 PM, Iban Cabrillo wrote:
> HI cepher,
>   I am getting download error form debian repos (I check it with firefly
> and hammer) :
> 
>   W: Failed to fetch http://ceph.com/debian-hammer/dists/trusty/InRelease
> 
> W: Failed to fetch http://ceph.com/debian-hammer/dists/trusty/Release.gpg
>  Cannot initiate the connection to download.ceph.com:80
> (2607:f298:6050:51f3:f816:3eff:fe50:5ec). - connect (101: Network is
> unreachable) [IP: 2607:f298:6050:51f3:f816:3eff:fe50:5ec 80]
> 
> URL is not available from browser either (http://ceph.com/debian-).
> 

Seems like an IPv6 routing issue. If you need to, you can always use
eu.ceph.com to download your packages.
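
For example, a sources.list entry pointing at the mirror could look like
this (the hammer/trusty combination is taken from the URLs above; adjust
to your release and distribution):

deb http://eu.ceph.com/debian-hammer/ trusty main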

> Saludos
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Storage Cluster on Amazon EC2 across different regions

2015-09-29 Thread Christian Balzer

Hello,

On Tue, 29 Sep 2015 10:21:00 +0200 Raluca Halalai wrote:

> In my understanding, the deployment you suggested (local Ceph clusters +
> Rados Gateways) would imply giving up strong consistency guarantees. In
> that case, it is not what we are aiming for.
>
Indeed, there is another planned project to replicate Ceph clusters at the
RBD level (not rados-gw), but with similar constraints.
Personally, I'd be happy with a slightly lagging replication like that.

WAN storage without quantum-entanglement data links will always be
slower than what most Ceph users are willing to accept.
 
Christian

> Thank you for your replies.
> 
> On Tue, Sep 29, 2015 at 10:02 AM, Robert Sander <
> r.san...@heinlein-support.de> wrote:
> 
> > On 29.09.2015 09:54, Raluca Halalai wrote:
> > > What do you want to prove with such a setup?
> > >
> > >
> > > It's for research purposes. We are trying different storage systems
> > > in a WAN environment.
> >
> > Then Ceph can be ticked off the list of candidates.
> > Its purpose is not to be a WAN storage system.
> >
> > It would be different if you setup local Ceph clusters and have Rados
> > Gateways (S3 Interfaces) interact with them (geo replication with the
> > radosgw agent).
> >
> > Regards
> > --
> > Robert Sander
> > Heinlein Support GmbH
> > Schwedter Str. 8/9b, 10119 Berlin
> >
> > http://www.heinlein-support.de
> >
> > Tel: 030 / 405051-43
> > Fax: 030 / 405051-19
> >
> > Zwangsangaben lt. §35a GmbHG:
> > HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> > Geschäftsführer: Peer Heinlein -- Sitz: Berlin
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> 
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Peering algorithm questions

2015-09-29 Thread Balázs Kossovics
Hey!

I'm trying to understand the peering algorithm based on [1] and [2]. There
are things that aren't really clear, or that I'm not sure I understood
correctly, so I'd like to ask for clarification on the points below:

1. Is it right that the primary writes an operation to the PG log
immediately upon receiving it?

2. Is it possible that an operation is persisted but never acknowledged?
Imagine this situation: a write arrives for an object, the operation is
copied to the replicas and written to their journals, but the primary
OSD dies before it can acknowledge to the client and never recovers. Upon
the next peering, will this operation become part of the authoritative
history?

3. Quoting the second step of the peering algorithm: "generate a list of
past intervals since last epoch started".
If there was no peering failure, is there then exactly one past interval?

4. Quoting the same step: "the subset for which peering could have
completed before the acting set changed to another set of OSDs".
Are the other intervals ignored because we can be sure that no write
operations were allowed during them?

5. At any given moment, is the Up set either equal to, or a strict subset
of, the Acting set?

6. When do OSDs re-peer? Only when an OSD goes from in -> out, or also when
an OSD goes down (but has not yet been automatically marked out)?

7. For what reasons can peering fail? If the OSD map changes before
peering completes, is that a failure? If the OSD map doesn't change, is
a reason for failure not being able to contact "at least one OSD from
each of past interval's acting set"?

8. up_thru is a per-OSD value in the OSD map, which is updated for the
primary after successfully agreeing on the authoritative history, but
before completing the peering. What about the secondaries?

Thanks,
Balázs Kossovics

[1] http://docs.ceph.com/docs/master/dev/peering/
[2] http://docs.ceph.com/docs/master/dev/osd_internals/last_epoch_started/
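
For reference, the up and acting sets and the peering/recovery state of a
single PG can be dumped on a live cluster with (the PG id is a placeholder):

$ ceph pg 1.2f query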
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Consulting

2015-09-29 Thread Robert Sander
On 28.09.2015 20:47, Robert LeBlanc wrote:
> Ceph consulting was provided by Inktank[1], but the Inktank website is
> down. How do we go about getting consulting services now?

Have a look at the Red Hat site for Ceph:

https://www.redhat.com/en/technologies/storage/ceph

There are also several independent consulting companies which provide
Ceph support.

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Storage Cluster on Amazon EC2 across different regions

2015-09-29 Thread Robert Sander
On 29.09.2015 09:54, Raluca Halalai wrote:
> What do you want to prove with such a setup?
> 
> 
> It's for research purposes. We are trying different storage systems in a
> WAN environment.

Then Ceph can be ticked off the list of candidates.
Its purpose is not to be a WAN storage system.

It would be different if you setup local Ceph clusters and have Rados
Gateways (S3 Interfaces) interact with them (geo replication with the
radosgw agent).

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Storage Cluster on Amazon EC2 across different regions

2015-09-29 Thread Robert Sander
On 28.09.2015 19:55, Raluca Halalai wrote:

> I am trying to deploy a Ceph Storage Cluster on Amazon EC2, in different
> regions.

Don't do this.

What do you want to prove with such a setup?

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW: Can't download a big file

2015-09-29 Thread Luis Periquito
I'm having some issues downloading a big file (60G+).

After some investigation it seems to be very similar to
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001272.html,
however I'm currently running Hammer 0.94.3. The files were, however,
uploaded when the cluster was running Firefly (IIRC 0.80.7).

From the radosgw --debug-ms=1 --debug-rgw=20 output:
[...]
2015-09-28 16:08:14.596299 7f6361f93700  1 -- 10.248.33.13:0/1004902 -->
10.249.33.12:6814/15506 -- osd_op(client.129331531.0:817
default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_2 [read 0~4194304]
9.7a8bb21d ack+read+known_if_redirected e59831) v5 -- ?+0 0x7f63e507aca0
con 0x7f63e4d2ef50
2015-09-28 16:08:14.596361 7f6361f93700 20 rados->aio_operate r=0
bl.length=0
2015-09-28 16:08:14.596371 7f6361f93700 20 rados->get_obj_iterate_cb
oid=default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_3
obj-ofs=1795162112 read_ofs=0 len=4194304
2015-09-28 16:08:14.596385 7f6361f93700  1 -- 10.248.33.13:0/1004902 -->
10.249.33.14:6825/14173 -- osd_op(client.129331531.0:818
default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_3 [read 0~4194304]
9.6407682a ack+read+known_if_redirected e59831) v5 -- ?+0 0x7f63e4e77370
con 0x7f63e4d8aa60
2015-09-28 16:08:14.596421 7f6361f93700 20 rados->aio_operate r=0
bl.length=0
2015-09-28 16:08:14.596445 7f6361f93700 20 rados->get_obj_iterate_cb
oid=default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_4
obj-ofs=1799356416 read_ofs=0 len=4194304
2015-09-28 16:08:14.596474 7f6361f93700  1 -- 10.248.33.13:0/1004902 -->
10.249.33.39:6805/28145 -- osd_op(client.129331531.0:819
default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_4 [read 0~4194304]
9.ceaa43a4 ack+read+known_if_redirected e59831) v5 -- ?+0 0x7f63e4e77370
con 0x7f63e4f94220
2015-09-28 16:08:14.596502 7f6361f93700 20 rados->aio_operate r=0
bl.length=0
2015-09-28 16:08:14.597073 7f6337d30700  1 -- 10.248.33.13:0/1004902 <==
osd.48 10.248.33.12:6821/27677 9  osd_op_reply(816
default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_1 [read 0~4194304]
v0'0 uv0 ack = -2 ((2) No such file or directory)) v6  260+0+0
(3636490520 0 0) 0x7f6414011d30 con 0x7f63e4d9dc10
2015-09-28 16:08:14.597152 7f640bfff700 20 get_obj_aio_completion_cb: io
completion ofs=1786773504 len=4194304
2015-09-28 16:08:14.597159 7f640bfff700  0 ERROR: got unexpected error when
trying to read object: -2
2015-09-28 16:08:14.597184 7f6361f93700 20 get_obj_data::cancel_all_io()
2015-09-28 16:08:14.597234 7f6339641700  1 -- 10.248.33.13:0/1004902 <==
osd.42 10.249.33.12:6814/15506 8  osd_op_reply(817
default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_2 [read 0~4194304]
v0'0 uv0 ack = -2 ((2) No such file or directory)) v6  260+0+0
(4010761036 0 0) 0x7f63fc011820 con 0x7f63e4d2ef50
2015-09-28 16:08:14.597257 7f640bfff700 20 get_obj_aio_completion_cb: io
completion ofs=1790967808 len=4194304
2015-09-28 16:08:14.597263 7f640bfff700  0 ERROR: got unexpected error when
trying to read object: -2
2015-09-28 16:08:14.597298 7f6336320700  1 -- 10.248.33.13:0/1004902 <==
osd.23 10.249.33.14:6825/14173 8  osd_op_reply(818
default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_3 [read 0~4194304]
v0'0 uv0 ack = -2 ((2) No such file or directory)) v6  260+0+0 (8601650
0 0) 0x7f63ec015050 con 0x7f63e4d8aa60
2015-09-28 16:08:14.597323 7f640bfff700 20 get_obj_aio_completion_cb: io
completion ofs=1795162112 len=4194304
2015-09-28 16:08:14.597326 7f640bfff700  0 ERROR: got unexpected error when
trying to read object: -2
2015-09-28 16:08:14.597177 7f6336d26700  1 -- 10.248.33.13:0/1004902 <==
osd.66 10.249.33.39:6805/28145 9  osd_op_reply(819
default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_4 [read 0~4194304]
v0'0 uv0 ack = -2 ((2) No such file or directory)) v6  260+0+0
(3449160520 0 0) 0x7f63d80173e0 con 0x7f63e4f94220
2015-09-28 16:08:14.597338 7f640bfff700 20 get_obj_aio_completion_cb: io
completion ofs=1799356416 len=4194304
2015-09-28 16:08:14.597339 7f640bfff700  0 ERROR: got unexpected error when
trying to read object: -2
2015-09-28 16:08:14.597507 7f6361f93700  2 req 1:32.448590:s3:GET
/Exchange%20Stores/Directors.edb:get_obj:http status=404
2015-09-28 16:08:14.597515 7f6361f93700  1 == req done
req=0x7f6420019d70 http_status=404 ==


Doing a rados ls -p .rgw.buckets and searching for the ".35_" parts, I only
get these...

default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_11
default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_10
default.26848827.2__shadow_Exchange
Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_12
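
For reference, that check can be reproduced with the rados CLI (the pool
name and object names are taken from the log excerpt above; adjust them to
your own bucket marker):

$ rados -p .rgw.buckets ls | grep '\.35_'
$ rados -p .rgw.buckets stat 'default.26848827.2__shadow_Exchange Stores/Directors.edb.WA70vanNXi08cmwrdmURjXFIUCgaIVv.35_1'

If the _1 part really is missing, the stat should fail with the same (2) No
such file or directory the OSDs returned.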

Re: [ceph-users] CephFS Attributes Question Marks

2015-09-29 Thread John Spray
Hmm, so apparently a similar bug was fixed in 0.87: Scott, can you confirm
that your *clients* were 0.94 (not just the servers)?

Thanks,
John

On Tue, Sep 29, 2015 at 11:56 AM, John Spray  wrote:

> Ah, this is a nice clear log!
>
> I've described the bug here:
> http://tracker.ceph.com/issues/13271
>
> In the short term, you may be able to mitigate this by increasing
> client_cache_size (on the client) if your RAM allows it.
>
> John
>
> On Tue, Sep 29, 2015 at 12:58 AM, Scottix  wrote:
>
>> I know this is an old one but I got a log in ceph-fuse for it.
>> I got this on OpenSuse 12.1
>> 3.1.10-1.29-desktop
>>
>> Using ceph-fuse
>> ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>>
>> I am running an rsync in the background and then doing a simple ls -la so
>> the log is large.
>>
>> I am guessing this is the problem. The file is there and if I list the
>> directory again it shows up properly.
>>
>> 2015-09-28 16:34:21.548631 7f372effd700  3 client.28239198 ll_lookup
>> 0x7f370d1b1c50 data.2015-08-23_00-00-00.csv.bz2
>> 2015-09-28 16:34:21.548635 7f372effd700 10 client.28239198 _lookup
>> concluded ENOENT locally for 19d72a1.head(ref=4 ll_ref=5 cap_refs={}
>> open={} mode=42775 size=0/0 mtime=2015-09-28 05:57:57.259306
>> caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE parents=0x7f3732ff97c0 0x7f370d1b1c50)
>> dn 'data.2015-08-23_00-00-00.csv.bz2'
>>
>>
>> [image: Selection_034.png]
>>
>> It seems to show up more if multiple things are accessing the ceph mount,
>> just my observations.
>>
>> Best,
>> Scott
>>
>> On Tue, Mar 3, 2015 at 3:05 PM Scottix  wrote:
>>
>>> Ya we are not at 0.87.1 yet, possibly tomorrow. I'll let you know if it
>>> still reports the same.
>>>
>>> Thanks John,
>>> --Scottie
>>>
>>>
>>> On Tue, Mar 3, 2015 at 2:57 PM John Spray  wrote:
>>>
 On 03/03/2015 22:35, Scottix wrote:
 > I was testing a little bit more and decided to run the
 cephfs-journal-tool
 >
 > I ran across some errors
 >
 > $ cephfs-journal-tool journal inspect
 > 2015-03-03 14:18:54.453981 7f8e29f86780 -1 Bad entry start ptr
 > (0x2aebf6) at 0x2aeb32279b
 > 2015-03-03 14:18:54.539060 7f8e29f86780 -1 Bad entry start ptr
 > (0x2aeb000733) at 0x2aeb322dd8
 > 2015-03-03 14:18:54.584539 7f8e29f86780 -1 Bad entry start ptr
 > (0x2aeb000d70) at 0x2aeb323415
 > 2015-03-03 14:18:54.669991 7f8e29f86780 -1 Bad entry start ptr
 > (0x2aeb0013ad) at 0x2aeb323a52
 > 2015-03-03 14:18:54.707724 7f8e29f86780 -1 Bad entry start ptr
 > (0x2aeb0019ea) at 0x2aeb32408f
 > Overall journal integrity: DAMAGED

 I expect this is http://tracker.ceph.com/issues/9977, which is fixed in
 master.

 You are in *very* bleeding edge territory here, and I'd suggest using
 the latest development release if you want to experiment with the latest
 CephFS tooling.

 Cheers,
 John

>>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [sepia] debian jessie repository ?

2015-09-29 Thread Gregory Farnum
On Tue, Sep 29, 2015 at 3:59 AM, Jogi Hofmüller  wrote:
> Hi,
>
> On 2015-09-25 at 22:23, Udo Lembke wrote:
>
>> you can use this sources-list
>>
>> cat /etc/apt/sources.list.d/ceph.list
>> deb http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/ref/v0.94.3
>> jessie main
>
> The thing is:  whatever I write into ceph.list, ceph-deploy just
> overwrites it with "deb http://ceph.com/debian-hammer/ jessie main"
> which does not exist :(
>
> Here is what the log says after "ceph-deploy install":
>
> [ceph1][DEBUG ] Err http://ceph.com jessie/main amd64 Packages
> [ceph1][DEBUG ]   404  Not Found [IP:
> 2607:f298:6050:51f3:f816:3eff:fe50:5ec 80]
> [ceph1][DEBUG ] Ign http://ceph.com jessie/main Translation-en_US
> [ceph1][DEBUG ] Ign http://ceph.com jessie/main Translation-en
> [ceph1][WARNIN] W: Failed to fetch
> http://ceph.com/debian-hammer/dists/jessie/main/binary-amd64/Packages
> 404  Not Found [IP: 2607:f298:6050:51f3:f816:3eff:fe50:5ec 80]
> [ceph1][WARNIN]
> [ceph1][WARNIN] E: Some index files failed to download. They have been
> ignored, or old ones used instead.
> [ceph1][ERROR ] RuntimeError: command returned non-zero exit status: 100
> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env
> DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get
> --assume-yes -q update

Can you create a ceph-deploy ticket at tracker.ceph.com, please?
And maybe make sure you're running the latest ceph-deploy, but
honestly I've no idea what it's doing these days or if this is a
resolved issue.
-Greg
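
As a possible workaround until that's sorted out, ceph-deploy can be told
not to rewrite the repo file, or to use an explicit repo URL; a sketch,
assuming these flags are available in your ceph-deploy version:

$ ceph-deploy install --no-adjust-repos ceph1
$ ceph-deploy install --repo-url http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/ref/v0.94.3 ceph1

The first form keeps whatever ceph.list you wrote yourself; the second
points ceph-deploy at the jessie gitbuilder repository mentioned earlier in
the thread.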
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rsync broken?

2015-09-29 Thread Wido den Hollander
On 09/28/2015 12:55 PM, Paul Mansfield wrote:
> 
> Hi,
> 
> We used to rsync from eu.ceph.com into a local mirror for when we build
> our code. We need to re-do this to pick up fresh packages built since
> the intrusion.
> 
> it doesn't seem possible to rsync from any current ceph download site
> 

That's odd? I just tried this:

$ rsync -avr --stats --progress eu.ceph.com::ceph .

Worked just fine. What error did you get?

> thanks
> Paul
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [sepia] debian jessie repository ?

2015-09-29 Thread Björn Lässig

On 09/25/2015 03:10 PM, Jogi Hofmüller wrote:

On 2015-09-11 at 13:20, Florent B wrote:


Jessie repository will be available on next Hammer release ;)


And how should I continue installing ceph meanwhile?  ceph-deploy new ...
overwrites the /etc/apt/sources.list.d/ceph.list and hence throws an
error :(


I installed hammer on jessie a couple of weeks ago:

add
deb http://ceph.com/debian-hammer/ wheezy main

and add some debian-wheezy sources to your sources.list

Do the ''ceph-deploy install'' step by hand. It is crap, but it works
with aptitude in interactive mode: search for ceph, select the hammer
version and then resolve the dependencies manually.
The first time it takes 15 minutes, every repetition another 5
minutes. If your cluster has 100 nodes ... do not try this unless you
are bored to death ;-(


Björn
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS file to rados object mapping

2015-09-29 Thread Andras Pataki
Thanks, that worked.  Is there a mapping in the other direction easily
available, i.e. to find where all the 4 MB pieces of a file are?

On 9/28/15, 4:56 PM, "John Spray"  wrote:

>On Mon, Sep 28, 2015 at 9:46 PM, Andras Pataki
> wrote:
>> Hi,
>>
>> Is there a way to find out which rados objects a file in cephfs is
>> mapped
>> to from the command line?  Or vice versa, which file a particular rados
>> object belongs to?
>
>The part of the object name before the period is the inode number (in
>hex).
>
>John
>
>> Our ceph cluster has some inconsistencies/corruptions and I am trying to
>> find out which files are impacted in cephfs.
>>
>> Thanks,
>>
>> Andras
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Andrija Panic
Jiri,

if you colocate more journals on 1 SSD (we do...), make sure to understand
the following:

- if the SSD dies, all OSDs that had their journals on it are lost...
- the more journals you put on a single SSD (1 journal being 1 partition),
the worse the performance, since the total SSD performance is not
dedicated/available to only 1 journal; if you are colocating 6
journals on 1 SSD, performance is roughly 1/6 for each journal...

Latency will go up, bandwidth will go down, the more journals you colocate...
XFS recommended...

I suggest finding a balance between wanted performance and $$$ for SSDs...

best

On 29 September 2015 at 13:32, Jiri Kanicky  wrote:

> Hi Lionel.
>
> Thank you for your reply. In this case I am considering to create separate
> partitions for each disk on the SSD drive. Would be good to know what is
> the performance difference, because creating partitions is kind of waste of
> space.
>
> One more question, is it a good idea to move journal for 3 OSDs to a
> single SSD considering if SSD fails the whole node with 3 HDDs will be
> down? Thinking of it, leaving journal on each OSD might be safer, because
> journal on one disk does not affect other disks (OSDs). Or do you think
> that having the journal on SSD is better trade off?
>
> Thank you
> Jiri
>
>
> On 29/09/2015 21:10, Lionel Bouton wrote:
>
>> On 29/09/2015 07:29, Jiri Kanicky wrote:
>>
>>> Hi,
>>>
>>> Is it possible to create journal in directory as explained here:
>>>
>>> http://wiki.skytech.dk/index.php/Ceph_-_howto,_rbd,_lvm,_cluster#Add.2Fmove_journal_in_running_cluster
>>>
>> Yes, the general idea (stop, flush, move, update ceph.conf, mkjournal,
>> start) is valid for moving your journal wherever you want.
>> That said it probably won't perform as well on a filesystem (LVM has
>> lower overhead than a filesystem).
>>
>> 1. Create BTRFS over /dev/sda6 (assuming this is SSD partition alocate
>>> for journal) and mount it to /srv/ceph/journal
>>>
>> BTRFS is probably the worst idea for hosting journals. If you must use
>> BTRFS, you'll have to make sure that the journals are created NoCoW
>> before the first byte is ever written to them.
>>
>> 2. Add OSD: ceph-deploy osd create --fs-type btrfs
>>> ceph1:sdb:/srv/ceph/journal/osd$id/journal
>>>
>> I've no experience with ceph-deploy...
>>
>> Best regards,
>>
>> Lionel
>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS file to rados object mapping

2015-09-29 Thread Gregory Farnum
The formula for objects in a file is <inode>.<sequence>. So you'll have noticed they all look something like
12345.0001, 12345.0002, 12345.0003, ...

So if you've got a particular inode and file size, you can generate a
list of all the possible objects in it. To find the object->OSD
mapping you'd need to run crush, by making use of the crushtool or
similar.
-Greg
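
A minimal sketch of that lookup (the mount point, pool name and inode value
below are placeholders; `ceph osd map` can do the CRUSH calculation for a
single object):

$ printf '%x\n' $(stat -c %i /mnt/cephfs/some/file)   # inode number in hex, e.g. 10000001234
$ rados -p data ls | grep '^10000001234\.'            # that file's objects: 10000001234.00000000, ...
$ ceph osd map data 10000001234.00000000              # object -> PG -> OSDs for the first 4MB piece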

On Tue, Sep 29, 2015 at 6:29 AM, Andras Pataki
 wrote:
> Thanks, that worked.  Is there a mapping in the other direction easily
> available, I.e. To find where all the 4MB pieces of a file are?
>
> On 9/28/15, 4:56 PM, "John Spray"  wrote:
>
>>On Mon, Sep 28, 2015 at 9:46 PM, Andras Pataki
>> wrote:
>>> Hi,
>>>
>>> Is there a way to find out which rados objects a file in cephfs is
>>> mapped
>>> to from the command line?  Or vice versa, which file a particular rados
>>> object belongs to?
>>
>>The part of the object name before the period is the inode number (in
>>hex).
>>
>>John
>>
>>> Our ceph cluster has some inconsistencies/corruptions and I am trying to
>>> find out which files are impacted in cephfs.
>>>
>>> Thanks,
>>>
>>> Andras
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw and keystone version 3 domains

2015-09-29 Thread Robert Duncan
Hi Shinobu,

My keystone version is
2014.2.2

Thanks again.
Rob.

The information contained and transmitted in this e-mail is confidential 
information, and is intended only for the named recipient to which it is 
addressed. The content of this e-mail may not have been sent with the authority 
of National College of Ireland. Any views or opinions presented are solely 
those of the author and do not necessarily represent those of National College 
of Ireland. If the reader of this message is not the named recipient or a 
person responsible for delivering it to the named recipient, you are notified 
that the review, dissemination, distribution, transmission, printing or 
copying, forwarding, or any other use of this message or any part of it, 
including any attachments, is strictly prohibited. If you have received this 
communication in error, please delete the e-mail and destroy all record of this 
communication. Thank you for your assistance.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rsync broken?

2015-09-29 Thread Paul Mansfield
On 29/09/15 08:24, Wido den Hollander wrote:
>> We used to rsync from eu.ceph.com into a local mirror for when we build
> 
> $ rsync -avr --stats --progress eu.ceph.com::ceph .
> 
> Worked just fine. What error did you get?

A colleague asked me to post the message; I'm now not sure what he
might have done wrong, as I asked only cursory questions.

I'll double-check what he's actually trying to do.

apologies for the distraction.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Jiri Kanicky

Hi Lionel.

Thank you for your reply. In this case I am considering creating
separate partitions for each disk on the SSD drive. It would be good to
know what the performance difference is, because creating partitions is
kind of a waste of space.


One more question: is it a good idea to move the journals for 3 OSDs to a
single SSD, considering that if the SSD fails the whole node with 3 HDDs
will be down? Thinking about it, leaving the journal on each OSD might be
safer, because a journal on one disk does not affect the other disks
(OSDs). Or do you think that having the journals on SSD is the better
trade-off?


Thank you
Jiri

On 29/09/2015 21:10, Lionel Bouton wrote:

On 29/09/2015 07:29, Jiri Kanicky wrote:

Hi,

Is it possible to create journal in directory as explained here:
http://wiki.skytech.dk/index.php/Ceph_-_howto,_rbd,_lvm,_cluster#Add.2Fmove_journal_in_running_cluster

Yes, the general idea (stop, flush, move, update ceph.conf, mkjournal,
start) is valid for moving your journal wherever you want.
That said it probably won't perform as well on a filesystem (LVM has
lower overhead than a filesystem).


1. Create BTRFS over /dev/sda6 (assuming this is SSD partition alocate
for journal) and mount it to /srv/ceph/journal

BTRFS is probably the worst idea for hosting journals. If you must use
BTRFS, you'll have to make sure that the journals are created NoCoW
before the first byte is ever written to them.


2. Add OSD: ceph-deploy osd create --fs-type btrfs
ceph1:sdb:/srv/ceph/journal/osd$id/journal

I've no experience with ceph-deploy...

Best regards,

Lionel
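
For what it's worth, the move sequence described above boils down to
something like this per OSD (the OSD id and service commands are
placeholders; a sketch only):

$ service ceph stop osd.0          # or: systemctl stop ceph-osd@0
$ ceph-osd -i 0 --flush-journal    # flush and retire the old journal
# point "osd journal" in ceph.conf (or the journal symlink) at the new partition
$ ceph-osd -i 0 --mkjournal        # create the journal at the new location
$ service ceph start osd.0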



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Attributes Question Marks

2015-09-29 Thread John Spray
Ah, this is a nice clear log!

I've described the bug here:
http://tracker.ceph.com/issues/13271

In the short term, you may be able to mitigate this by increasing
client_cache_size (on the client) if your RAM allows it.

John
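
A sketch of what that could look like in ceph.conf on the client side (the
default is 16384 cached inodes; 65536 below is only an example value):

[client]
    client cache size = 65536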

On Tue, Sep 29, 2015 at 12:58 AM, Scottix  wrote:

> I know this is an old one but I got a log in ceph-fuse for it.
> I got this on OpenSuse 12.1
> 3.1.10-1.29-desktop
>
> Using ceph-fuse
> ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>
> I am running an rsync in the background and then doing a simple ls -la so
> the log is large.
>
> I am guessing this is the problem. The file is there and if I list the
> directory again it shows up properly.
>
> 2015-09-28 16:34:21.548631 7f372effd700  3 client.28239198 ll_lookup
> 0x7f370d1b1c50 data.2015-08-23_00-00-00.csv.bz2
> 2015-09-28 16:34:21.548635 7f372effd700 10 client.28239198 _lookup
> concluded ENOENT locally for 19d72a1.head(ref=4 ll_ref=5 cap_refs={}
> open={} mode=42775 size=0/0 mtime=2015-09-28 05:57:57.259306
> caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE parents=0x7f3732ff97c0 0x7f370d1b1c50)
> dn 'data.2015-08-23_00-00-00.csv.bz2'
>
>
> [image: Selection_034.png]
>
> It seems to show up more if multiple things are accessing the ceph mount,
> just my observations.
>
> Best,
> Scott
>
> On Tue, Mar 3, 2015 at 3:05 PM Scottix  wrote:
>
>> Ya we are not at 0.87.1 yet, possibly tomorrow. I'll let you know if it
>> still reports the same.
>>
>> Thanks John,
>> --Scottie
>>
>>
>> On Tue, Mar 3, 2015 at 2:57 PM John Spray  wrote:
>>
>>> On 03/03/2015 22:35, Scottix wrote:
>>> > I was testing a little bit more and decided to run the
>>> cephfs-journal-tool
>>> >
>>> > I ran across some errors
>>> >
>>> > $ cephfs-journal-tool journal inspect
>>> > 2015-03-03 14:18:54.453981 7f8e29f86780 -1 Bad entry start ptr
>>> > (0x2aebf6) at 0x2aeb32279b
>>> > 2015-03-03 14:18:54.539060 7f8e29f86780 -1 Bad entry start ptr
>>> > (0x2aeb000733) at 0x2aeb322dd8
>>> > 2015-03-03 14:18:54.584539 7f8e29f86780 -1 Bad entry start ptr
>>> > (0x2aeb000d70) at 0x2aeb323415
>>> > 2015-03-03 14:18:54.669991 7f8e29f86780 -1 Bad entry start ptr
>>> > (0x2aeb0013ad) at 0x2aeb323a52
>>> > 2015-03-03 14:18:54.707724 7f8e29f86780 -1 Bad entry start ptr
>>> > (0x2aeb0019ea) at 0x2aeb32408f
>>> > Overall journal integrity: DAMAGED
>>>
>>> I expect this is http://tracker.ceph.com/issues/9977, which is fixed in
>>> master.
>>>
>>> You are in *very* bleeding edge territory here, and I'd suggest using
>>> the latest development release if you want to experiment with the latest
>>> CephFS tooling.
>>>
>>> Cheers,
>>> John
>>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS file to rados object mapping

2015-09-29 Thread Andras Pataki
Thanks, that makes a lot of sense.
One more question about checksumming objects in rados.  Our cluster uses
two copies per object, and I have some where the checksums mismatch between
the two copies (which deep scrub warns about).  Does ceph store an
authoritative checksum of what the block should look like?  I.e. is there
a way to tell which version of the block is correct?  I seem to recall
some changelog entry that Hammer is adding checksum storage for blocks, or
am I wrong?

Andras


On 9/29/15, 9:58 AM, "Gregory Farnum"  wrote:

>The formula for objects in a file is <inode>.<sequence>. So you'll have noticed they all look something like
>12345.0001, 12345.0002, 12345.0003, ...
>
>So if you've got a particular inode and file size, you can generate a
>list of all the possible objects in it. To find the object->OSD
>mapping you'd need to run crush, by making use of the crushtool or
>similar.
>-Greg
>
>On Tue, Sep 29, 2015 at 6:29 AM, Andras Pataki
> wrote:
>> Thanks, that worked.  Is there a mapping in the other direction easily
>> available, I.e. To find where all the 4MB pieces of a file are?
>>
>> On 9/28/15, 4:56 PM, "John Spray"  wrote:
>>
>>>On Mon, Sep 28, 2015 at 9:46 PM, Andras Pataki
>>> wrote:
 Hi,

 Is there a way to find out which radios objects a file in cephfs is
mapped
 to from the command line?  Or vice versa, which file a particular
radios
 object belongs to?
>>>
>>>The part of the object name before the period is the inode number (in
>>>hex).
>>>
>>>John
>>>
 Our ceph cluster has some inconsistencies/corruptions and I am trying
to
 find out which files are impacted in cephfs.

 Thanks,

 Andras



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Lionel Bouton
Hi,

On 29/09/2015 13:32, Jiri Kanicky wrote:
> Hi Lionel.
>
> Thank you for your reply. In this case I am considering to create
> separate partitions for each disk on the SSD drive. Would be good to
> know what is the performance difference, because creating partitions
> is kind of waste of space.

The difference is hard to guess: filesystems need more CPU power than
raw block devices, for example, so if you don't have much CPU power this
can make a significant difference. Filesystems might put more load on
your storage too (for example ext3/4 with data=journal will at least
double the disk writes). So there's a lot to consider, and nothing will
be faster for journals than a raw partition. LVM logical volumes come a
close second because usually (if you simply use LVM to create
your logical volumes and don't try to use anything else like snapshots)
they don't change access patterns and need almost no CPU power.

>
> One more question, is it a good idea to move journal for 3 OSDs to a
> single SSD considering if SSD fails the whole node with 3 HDDs will be
> down?

If your SSDs are working well with Ceph and aren't cheap models dying
under heavy writes, yes. I use one 200GB DC3710 SSD for 6 7200rpm SATA
OSDs (using 60GB of it for the 6 journals) and it works very well (they
were a huge performance boost compared to our previous use of internal
journals).
Some SSDs are slower than HDDs for Ceph journals though (there have been
a lot of discussions on this subject on this mailing list).

> Thinking of it, leaving journal on each OSD might be safer, because
> journal on one disk does not affect other disks (OSDs). Or do you
> think that having the journal on SSD is better trade off?

You will put significantly more stress on your HDDs by leaving the journals
on them, and good SSDs are far more robust than HDDs, so if you pick Intel
DC or equivalent SSDs for journals your infrastructure might even be more
robust than one using internal journals (HDDs are dropping like flies
when you have hundreds of them). There are other components able to take
down all your OSDs: the disk controller, the CPU, the memory, the power
supply, ... So adding one robust SSD shouldn't change the overall
availability much (you must check their wear level and choose the models
according to the amount of writes you want them to support over their
lifetime though).
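
As an aside, a quick way to keep an eye on wear level is the drive's SMART
data; the exact attribute name varies by vendor, for example:

$ smartctl -A /dev/sdX | grep -iE 'wear|wearout|lifetime'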

The main reason for journals on SSD is performance anyway. If your setup
is already fast enough without them, I wouldn't try to add SSDs.
Otherwise, if you can't reach the level of performance needed by adding
the OSDs already needed for your storage capacity objectives, go SSD.

Best regards,

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Simultaneous CEPH OSD crashes

2015-09-29 Thread Lionel Bouton
On 27/09/2015 10:25, Lionel Bouton wrote:
> On 27/09/2015 09:15, Lionel Bouton wrote:
>> Hi,
>>
>> we just had a quasi simultaneous crash on two different OSD which
>> blocked our VMs (min_size = 2, size = 3) on Firefly 0.80.9.
>>
>> the first OSD to go down had this error :
>>
>> 2015-09-27 06:30:33.257133 7f7ac7fef700 -1 os/FileStore.cc: In function
>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>> size_t, ceph::bufferlist&, bool)' thread 7f7ac7fef700 time 2015-09-27
>> 06:30:33.145251
>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>> || got != -5)
>>
>> the second OSD crash was similar :
>>
>> 2015-09-27 06:30:57.373841 7f05d92cf700 -1 os/FileStore.cc: In function
>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>> size_t, ceph::bufferlist&, bool)' thread 7f05d92cf700 time 2015-09-27
>> 06:30:57.260978
>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>> || got != -5)
>>
>> I'm familiar with this error : it happened already with a BTRFS read
>> error (invalid csum) and I could correct it after flush-journal/deleting
>> the corrupted file/starting OSD/pg repair.
>> This time though there isn't any kernel log indicating an invalid csum.
>> The kernel is different though : we use 3.18.9 on these two servers and
>> the others had 4.0.5 so maybe BTRFS doesn't log invalid checksum errors
>> with this version. I've launched btrfs scrub on the 2 filesystems just
>> in case (still waiting for completion).
>>
>> The first attempt to restart these OSDs failed: one OSD died 19 seconds
>> after start, the other 21 seconds. Seeing that, I temporarily brought
>> down the min_size to 1 which allowed the 9 incomplete PG to recover. I
>> verified this by bringing min_size again to 2 and then restarted the 2
>> OSDs. They didn't crash yet.
>>
>> For reference the assert failures were still the same when the OSD died
>> shortly after start :
>> 2015-09-27 08:20:19.332835 7f4467bd0700 -1 os/FileStore.cc: In function
>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>> size_t, ceph::bufferlist&, bool)' thread 7f4467bd0700 time 2015-09-27
>> 08:20:19.325126
>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>> || got != -5)
>>
>> 2015-09-27 08:20:50.626344 7f97f2d95700 -1 os/FileStore.cc: In function
>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>> size_t, ceph::bufferlist&, bool)' thread 7f97f2d95700 time 2015-09-27
>> 08:20:50.605234
>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>> || got != -5)
>>
>> Note that at 2015-09-27 06:30:11 a deep-scrub started on a PG involving
>> one (and only one) of these 2 OSD. As we evenly space deep-scrubs (with
>> currently a 10 minute interval), this might be relevant (or just a
>> coincidence).
>>
>> I made copies of the ceph osd logs (including the stack trace and the
>> recent events) if needed.
>>
>> Can anyone put some light on why these OSDs died ?
> I just had a thought. Could launching a defragmentation on a file in a
> BTRFS OSD filestore trigger this problem?

That's not it: we had another crash a couple of hours ago on one of the
two servers involved in the first crashes, and there was no concurrent
defragmentation going on.

2015-09-29 14:18:53.479881 7f8d78ff9700 -1 os/FileStore.cc: In function
'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
size_t, ceph::bufferlist&, bool)' thread 7f8d78ff9700 time 2015-09-29
14:18:53.425790
os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
|| got != -5)

 ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
 1: (FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned
long, ceph::buffer::list&, bool)+0x96a) [0x8917ea]
 2: (ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned
long, unsigned long, ceph::buffer::list*)+0x81) [0x90ecc1]
 3: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*,
std::vector&)+0x6a81) [0x801091]
 4: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x63)
[0x809f23]
 5: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xb6f) [0x80adbf]
 6: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x2ced) [0x815f4d]
 7: (ReplicatedPG::do_request(std::tr1::shared_ptr,
ThreadPool::TPHandle&)+0x70c) [0x7b047c]
 8: (OSD::dequeue_op(boost::intrusive_ptr,
std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x34a) [0x60c74a]
 9: (OSD::OpWQ::_process(boost::intrusive_ptr,
ThreadPool::TPHandle&)+0x628) [0x628808]
 10: (ThreadPool::WorkQueueVal, boost::intrusive_ptr
>::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x66ea8c]
 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xa60416]
 12: (ThreadPool::WorkThread::entry()+0x10) [0xa62430]
 13: (()+0x8217) [0x7f8dae984217]
 14: (clone()+0x6d) [0x7f8dad129f8d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
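
For reference, when chasing these EIO-triggered asserts, checking the kernel
log and the BTRFS scrub status on the affected host is usually the next
step, e.g. (the mount point is a placeholder):

$ dmesg | grep -iE 'btrfs|csum|i/o error'
$ btrfs scrub status /var/lib/ceph/osd/ceph-NN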

Re: [ceph-users] Simultaneous CEPH OSD crashes

2015-09-29 Thread Samuel Just
It's an EIO.  The osd got an EIO from the underlying fs.  That's what
causes those asserts.  You probably want to redirect to the relevant
fs mailing list.
-Sam

On Tue, Sep 29, 2015 at 7:42 AM, Lionel Bouton
 wrote:
> On 27/09/2015 10:25, Lionel Bouton wrote:
>> On 27/09/2015 09:15, Lionel Bouton wrote:
>>> Hi,
>>>
>>> we just had a quasi simultaneous crash on two different OSD which
>>> blocked our VMs (min_size = 2, size = 3) on Firefly 0.80.9.
>>>
>>> the first OSD to go down had this error :
>>>
>>> 2015-09-27 06:30:33.257133 7f7ac7fef700 -1 os/FileStore.cc: In function
>>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>>> size_t, ceph::bufferlist&, bool)' thread 7f7ac7fef700 time 2015-09-27
>>> 06:30:33.145251
>>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>>> || got != -5)
>>>
>>> the second OSD crash was similar :
>>>
>>> 2015-09-27 06:30:57.373841 7f05d92cf700 -1 os/FileStore.cc: In function
>>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>>> size_t, ceph::bufferlist&, bool)' thread 7f05d92cf700 time 2015-09-27
>>> 06:30:57.260978
>>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>>> || got != -5)
>>>
>>> I'm familiar with this error : it happened already with a BTRFS read
>>> error (invalid csum) and I could correct it after flush-journal/deleting
>>> the corrupted file/starting OSD/pg repair.
>>> This time though there isn't any kernel log indicating an invalid csum.
>>> The kernel is different though : we use 3.18.9 on these two servers and
>>> the others had 4.0.5 so maybe BTRFS doesn't log invalid checksum errors
>>> with this version. I've launched btrfs scrub on the 2 filesystems just
>>> in case (still waiting for completion).
>>>
>>> The first attempt to restart these OSDs failed: one OSD died 19 seconds
>>> after start, the other 21 seconds. Seeing that, I temporarily brought
>>> down the min_size to 1 which allowed the 9 incomplete PG to recover. I
>>> verified this by bringing min_size again to 2 and then restarted the 2
>>> OSDs. They didn't crash yet.
>>>
>>> For reference the assert failures were still the same when the OSD died
>>> shortly after start :
>>> 2015-09-27 08:20:19.332835 7f4467bd0700 -1 os/FileStore.cc: In function
>>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>>> size_t, ceph::bufferlist&, bool)' thread 7f4467bd0700 time 2015-09-27
>>> 08:20:19.325126
>>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>>> || got != -5)
>>>
>>> 2015-09-27 08:20:50.626344 7f97f2d95700 -1 os/FileStore.cc: In function
>>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>>> size_t, ceph::bufferlist&, bool)' thread 7f97f2d95700 time 2015-09-27
>>> 08:20:50.605234
>>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>>> || got != -5)
>>>
>>> Note that at 2015-09-27 06:30:11 a deep-scrub started on a PG involving
>>> one (and only one) of these 2 OSD. As we evenly space deep-scrubs (with
>>> currently a 10 minute interval), this might be relevant (or just a
>>> coincidence).
>>>
>>> I made copies of the ceph osd logs (including the stack trace and the
>>> recent events) if needed.
>>>
>>> Can anyone put some light on why these OSDs died ?
>> I just had a thought. Could launching a defragmentation on a file in a
>> BTRFS OSD filestore trigger this problem?
>
> That's not it : we had another crash a couple of hours ago on one of the
> two servers involved in the first crashes and there was no concurrent
> defragmentation going on.
>
> 2015-09-29 14:18:53.479881 7f8d78ff9700 -1 os/FileStore.cc: In function
> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
> size_t, ceph::bufferlist&, bool)' thread 7f8d78ff9700 time 2015-09-29
> 14:18:53.425790
> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
> || got != -5)
>
>  ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
>  1: (FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned
> long, ceph::buffer::list&, bool)+0x96a) [0x8917ea]
>  2: (ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned
> long, unsigned long, ceph::buffer::list*)+0x81) [0x90ecc1]
>  3: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*,
> std::vector&)+0x6a81) [0x801091]
>  4: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x63)
> [0x809f23]
>  5: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xb6f) [0x80adbf]
>  6: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x2ced) [0x815f4d]
>  7: (ReplicatedPG::do_request(std::tr1::shared_ptr,
> ThreadPool::TPHandle&)+0x70c) [0x7b047c]
>  8: (OSD::dequeue_op(boost::intrusive_ptr,
> std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x34a) [0x60c74a]
>  9: (OSD::OpWQ::_process(boost::intrusive_ptr,
> ThreadPool::TPHandle&)+0x628) [0x628808]
>  10: (ThreadPool::WorkQueueVal 

Re: [ceph-users] high density machines

2015-09-29 Thread J David
On Thu, Sep 3, 2015 at 3:49 PM, Gurvinder Singh
 wrote:
>> The density would be higher than the 36 drive units but lower than the
>> 72 drive units (though with shorter rack depth afaik).
> You mean the 1U solution with 12 disk is longer in length than 72 disk
> 4U version ?

This is a bit old and I apologize for dredging it up, but I wanted to
weigh in that we have used a couple of the 6017R-73THDP+ 12 x 3.5" 1U
chassis and will not be using any more.  The depth is truly obscene;
the 36" is not a misprint.  If you have open racks they may be
acceptable but in a cabinet they are so long that they have to
mismount (sticking past the rack both front and rear) to close the
doors and in doing so, occlude so much space they raise concerns about
cabinet airflow.

They are also *very* cut down to get that many drives into the space.
They don't even have a physical serial port for console; they depend
entirely on IPMI for management.  (And we have had very mixed success
with SuperMicro IPMI virtual serial consoles.)  Also, of course, no
drive servicing can be done without shutting down the entire node,
making a simple drive swap vastly more labor intensive.  If (as in our
case) the cluster is overprovisioned enough to survive the long-term
loss of several drives per unit until it makes sense to take the whole
thing down and replace them all, it may be OK.  In any situation where
you expect most/all platters to be spinning, they're a non-starter.

All in all, the money / rack units saved is not even close to worth
the extra hassle involved, particularly when you start counting up how
many of those 12 drives you are treating as spares to space out
servicing it.

The 5018A-AR12L looks like a better layout that trims down to a
"svelte" 32" of depth, but appears to keep most of the rest of the
downsides of its 36" cousin.  That wired-in Atom processor also raises
some concerns about CPU overload during any major OSD rebalance.

Anyway, sorry for raising an old issue, but if I can save even one
person from going with these for Ceph, it was worth it.

Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [puppet] Moving puppet-ceph to the Openstack big tent

2015-09-29 Thread Loic Dachary
Good move :-)

On 29/09/2015 23:45, Andrew Woodward wrote:
> [I'm cross posting this to the other Ceph threads to ensure that it's seen]
> 
> We've discussed this on Monday on IRC and again in the puppet-openstack IRC
> meeting. The current consensus is that we will move from the deprecated
> stackforge organization to the openstack one. At this time
> we will not be pursuing membership as a formal OpenStack project. This will
> allow puppet-ceph to retain its tight relationship with the OpenStack community
> and tools for the time being.
> 
> On Mon, Sep 28, 2015 at 8:32 AM David Moreau Simard wrote:
> 
> Hi,
> 
> puppet-ceph currently lives in stackforge [1] which is being retired
> [2]. puppet-ceph is also mirrored on the Ceph Github organization [3].
> This version of the puppet-ceph module was created from scratch and
> not as a fork of the (then) upstream puppet-ceph by Enovance [4].
> Today, the version by Enovance is no longer officially maintained
> since Red Hat has adopted the new release.
> 
> Being an Openstack project under Stackforge or Openstack brings a lot
> of benefits but it's not black and white, there are cons too.
> 
> It provides us with the tools, the processes and the frameworks to
> review and test each contribution to ensure we ship a module that is
> stable and is held to the highest standards.
> But it also means that:
> - We forego some level of ownership back to the Openstack foundation,
> its technical committee and the Puppet Openstack PTL.
> - puppet-ceph contributors will also be required to sign the
> Contributors License Agreement and jump through the Gerrit hoops [5]
> which can make contributing to the project harder.
> 
> We have put tremendous efforts into creating a quality module and as
> such it was the first puppet module in the stackforge organization to
> implement not only unit tests but also integration tests with third
> party CI.
> Integration testing for other puppet modules is just now starting to
> take shape using the Openstack CI infrastructure.
> 
> In the context of Openstack, RDO already ships with a means to install
> Ceph with this very module, and Fuel will be adopting it soon as well.
> This means the module will benefit from real world experience and
> improvements by the Openstack community and packagers.
> This will help further reinforce that not only Ceph is the best
> unified storage solution for Openstack but that we have means to
> deploy it in the real world easily.
> 
> We all know that Ceph is also deployed outside of this context and
> this is why the core reviewers make sure that contributions remain
> generic and usable outside of this use case.
> 
> Today, the core members of the project discussed whether or not we
> should move puppet-ceph to the Openstack big tent and we had a
> consensus approving the move.
> We would also like to hear the thoughts of the community on this topic.
> 
> Please let us know what you think.
> 
> Thanks,
> 
> [1]: https://github.com/stackforge/puppet-ceph
> [2]: https://review.openstack.org/#/c/192016/
> [3]: https://github.com/ceph/puppet-ceph
> [4]: https://github.com/redhat-cip/puppet-ceph
> [5]: https://wiki.openstack.org/wiki/How_To_Contribute
> 
> David Moreau Simard
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org 
> 
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -- 
> 
> --
> 
> Andrew Woodward
> 
> Mirantis
> 
> Fuel Community Ambassador
> 
> Ceph Community
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [puppet] Moving puppet-ceph to the Openstack big tent

2015-09-29 Thread Andrew Woodward
[I'm cross posting this to the other Ceph threads to ensure that it's seen]

We've discussed this on Monday on IRC and again in the puppet-openstack IRC
meeting. The current consensus is that we will move from the deprecated
stackforge organization to the openstack one. At this
time we will not be pursuing membership as a formal OpenStack project. This
will allow puppet-ceph to retain its tight relationship with the OpenStack
community and tools for the time being.

On Mon, Sep 28, 2015 at 8:32 AM David Moreau Simard  wrote:

> Hi,
>
> puppet-ceph currently lives in stackforge [1] which is being retired
> [2]. puppet-ceph is also mirrored on the Ceph Github organization [3].
> This version of the puppet-ceph module was created from scratch and
> not as a fork of the (then) upstream puppet-ceph by Enovance [4].
> Today, the version by Enovance is no longer officially maintained
> since Red Hat has adopted the new release.
>
> Being an Openstack project under Stackforge or Openstack brings a lot
> of benefits but it's not black and white, there are cons too.
>
> It provides us with the tools, the processes and the frameworks to
> review and test each contribution to ensure we ship a module that is
> stable and is held to the highest standards.
> But it also means that:
> - We forego some level of ownership back to the Openstack foundation,
> its technical committee and the Puppet Openstack PTL.
> - puppet-ceph contributors will also be required to sign the
> Contributors License Agreement and jump through the Gerrit hoops [5]
> which can make contributing to the project harder.
>
> We have put tremendous efforts into creating a quality module and as
> such it was the first puppet module in the stackforge organization to
> implement not only unit tests but also integration tests with third
> party CI.
> Integration testing for other puppet modules is just now starting to
> take shape using the Openstack CI infrastructure.
>
> In the context of Openstack, RDO already ships with a means to install
> Ceph with this very module, and Fuel will be adopting it soon as well.
> This means the module will benefit from real world experience and
> improvements by the Openstack community and packagers.
> This will help further reinforce that not only Ceph is the best
> unified storage solution for Openstack but that we have means to
> deploy it in the real world easily.
>
> We all know that Ceph is also deployed outside of this context and
> this is why the core reviewers make sure that contributions remain
> generic and usable outside of this use case.
>
> Today, the core members of the project discussed whether or not we
> should move puppet-ceph to the Openstack big tent and we had a
> consensus approving the move.
> We would also like to hear the thoughts of the community on this topic.
>
> Please let us know what you think.
>
> Thanks,
>
> [1]: https://github.com/stackforge/puppet-ceph
> [2]: https://review.openstack.org/#/c/192016/
> [3]: https://github.com/ceph/puppet-ceph
> [4]: https://github.com/redhat-cip/puppet-ceph
> [5]: https://wiki.openstack.org/wiki/How_To_Contribute
>
> David Moreau Simard
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
-- 

--

Andrew Woodward

Mirantis

Fuel Community Ambassador

Ceph Community
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Bill Sharer
I think I got over 10% improvement when I changed from a cooked journal
file on a btrfs-based system SSD to a raw partition on the system SSD.
The cluster I've been testing with is all consumer grade stuff running
on top of AMD piledriver and kaveri based mobo's with the on-board
SATA.  My SSDs are a hodgepodge of OCZ Vertex 4 and Samsung 840 and 850
(non-pro).  I'm also seeing a performance win by merging individual osds
into btrfs mirror sets after doing that and dropping the replica count
from 3 to 2.  I also consider this a better defense-in-depth strategy
since btrfs self-heals when it hits bit rot on the mirrors and raid sets.


That boost was probably aio and dio kicking in because of the raw versus 
cooked.  Note that I'm running Hammer on gentoo and my current WIP is 
moving kernels from 3.8 to 4.0.5 everywhere.  It will be interesting to 
see what happens with that.


Regards
Bill

On 09/29/2015 07:32 AM, Jiri Kanicky wrote:

Hi Lionel.

Thank you for your reply. In this case I am considering to create 
separate partitions for each disk on the SSD drive. Would be good to 
know what is the performance difference, because creating partitions 
is kind of waste of space.


One more question, is it a good idea to move journal for 3 OSDs to a 
single SSD considering if SSD fails the whole node with 3 HDDs will be 
down? Thinking of it, leaving journal on each OSD might be safer, 
because journal on one disk does not affect other disks (OSDs). Or do 
you think that having the journal on SSD is better trade off?


Thank you
Jiri

On 29/09/2015 21:10, Lionel Bouton wrote:

On 29/09/2015 07:29, Jiri Kanicky wrote:

Hi,

Is it possible to create journal in directory as explained here:
http://wiki.skytech.dk/index.php/Ceph_-_howto,_rbd,_lvm,_cluster#Add.2Fmove_journal_in_running_cluster 


Yes, the general idea (stop, flush, move, update ceph.conf, mkjournal,
start) is valid for moving your journal wherever you want.
That said it probably won't perform as well on a filesystem (LVM has
lower overhead than a filesystem).


1. Create BTRFS over /dev/sda6 (assuming this is SSD partition alocate
for journal) and mount it to /srv/ceph/journal

BTRFS is probably the worst idea for hosting journals. If you must use
BTRFS, you'll have to make sure that the journals are created NoCoW
before the first byte is ever written to them.


2. Add OSD: ceph-deploy osd create --fs-type btrfs
ceph1:sdb:/srv/ceph/journal/osd$id/journal

I've no experience with ceph-deploy...

Best regards,

Lionel



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw and keystone version 3 domains

2015-09-29 Thread Shinobu Kinjo
Hello,

Thanks!!
Anyhow, have you ever tried to access a Swift object using v3?

Shinobu

- Original Message -
From: "Robert Duncan" 
To: "Shinobu Kinjo" , ceph-users@lists.ceph.com
Sent: Tuesday, September 29, 2015 8:48:57 PM
Subject: Re: [ceph-users] radosgw and keystone version 3 domains

Hi Shinobu,

My keystone version is
2014.2.2

Thanks again.
Rob.

The information contained and transmitted in this e-mail is confidential 
information, and is intended only for the named recipient to which it is 
addressed. The content of this e-mail may not have been sent with the authority 
of National College of Ireland. Any views or opinions presented are solely 
those of the author and do not necessarily represent those of National College 
of Ireland. If the reader of this message is not the named recipient or a 
person responsible for delivering it to the named recipient, you are notified 
that the review, dissemination, distribution, transmission, printing or 
copying, forwarding, or any other use of this message or any part of it, 
including any attachments, is strictly prohibited. If you have received this 
communication in error, please delete the e-mail and destroy all record of this 
communication. Thank you for your assistance.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com