[ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Massimiliano Cuttini

I was building Ceph in order to use it with iSCSI.
But I just saw from the docs that it needs:

   *CentOS 7.5*
   (which is not available yet, it's still at 7.4)
   https://wiki.centos.org/Download

   *Kernel 4.17*
   (which is not available yet, it is still at 4.15.7)
   https://www.kernel.org/

So I guess there is no official support and this is just a bad prank.

Ceph has been ready to be used with S3 for many years.
But it needs the kernel of the next century to work with such an old 
technology as iSCSI.

So sad.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy won't install luminous (but Jewel instead)

2018-02-28 Thread Massimiliano Cuttini

This worked.

However, somebody should investigate why the default is still Jewel on CentOS 7.4.


On 28/02/2018 00:53, jorpilo wrote:

Try using:
ceph-deploy --release luminous host1...
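
For reference, a minimal sketch of that workaround (hostnames are placeholders; --release and --repo-url are standard ceph-deploy install options):

   ceph-deploy install --release luminous node1 node2 node3
   # or point it explicitly at the luminous repo:
   ceph-deploy install --release luminous \
       --repo-url https://download.ceph.com/rpm-luminous/el7 node1 node2 node3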

 Original message 
From: Massimiliano Cuttini <m...@phoenixweb.it>
Date: 28/2/18 12:42 a.m. (GMT+01:00)
To: ceph-users@lists.ceph.com
Subject: [ceph-users] ceph-deploy won't install luminous (but Jewel 
instead)


This is the 5th time that I install and then purge the installation.
ceph-deploy always installs Jewel instead of Luminous.

No way, even if I force the repo from the default to luminous:

   https://download.ceph.com/rpm-luminous/el7/noarch

It still installs Jewel; it's stuck.

I've already checked whether I had installed yum-plugin-priorities, and I 
had.

Everything is exactly as the documentation requires.
But I still always get Jewel and not Luminous.
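
As a side note, a quick way to check which release yum will actually resolve (a sketch, assuming the standard /etc/yum.repos.d/ceph.repo path):

   cat /etc/yum.repos.d/ceph.repo                    # baseurl should point at rpm-luminous
   yum list available ceph --showduplicates | tail   # shows the candidate versions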




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Linux Distribution: Is upgrading the kernel version a good idea?

2018-02-26 Thread Massimiliano Cuttini



Not good.
I'm not worried about time and effort.
I'm worried about having to fix this when there is no time.
Ceph is built to avoid downtime; it's not a good idea to build it on a
system with availability issues.

It is only when switching (when installing a node); subsequent kernel
updates should be installed without any issues


Yes, but I always get the feeling that a lot of "bad" software relies on the 
distribution in order to know in which mode to work or which features to use.

So, having a distribution with an unexpected kernel can be confusing.
Of course this is just a guess and only my paranoia.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Linux Distribution: Is upgrading the kernel version a good idea?

2018-02-25 Thread Massimiliano Cuttini

On 25/02/2018 13:42, Marc Roos wrote:

Maybe this is more a question of logic. I would by default always go
for my default/preferred Linux, because that is what you know, can
troubleshoot best, your team knows, and you have made a conscious choice
for it in the past, etc. Deviating from this should be backed by good
argumentation.

I think so.


AFAIK the newer kernel is only necessary for Ceph clients, so if you
don't do something hyper-converged there is no need to get it (verify
this). So first decide what you need, then see if your default setup is
capable of handling this.

So the Ceph server side doesn't need the latest kernel.
It's just the client that needs it... is that so?
This should be explained better in the docs.


As a side note, I am running a CentOS 7 test cluster; installing the elrepo
kernel gave me some booting problems because of the mpt2sas driver
changes, but nothing that cannot be fixed with a little time and effort.
There is also a different kernel offered there, but you have to search the
archives for where it is located.


Not good.
I'm not worried about time and effort.
I'm worried about having to fix this when there is no time.
Ceph is built to avoid downtime; it's not a good idea to build it on a 
system with availability issues.
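
For reference, the usual way to pull in the elrepo mainline kernel on CentOS 7 looks roughly like this (a sketch of the standard elrepo procedure, not taken from this thread):

   rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
   yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
   yum --enablerepo=elrepo-kernel install kernel-ml
   grub2-set-default 0
   grub2-mkconfig -o /boot/grub2/grub.cfg
   reboot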









-Original Message-
From: Massimiliano Cuttini [mailto:m...@phoenixweb.it]
Sent: zondag 25 februari 2018 13:18
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Linux Distribution: Is upgrading the kernel version
a good idea?

Hi everybody,

Just a simple question.
In order to deploy Ceph...


Do you use a default distribution that already supports the
recommended kernel version (> 4.4)?
Let's say Ubuntu.

OR

Do you use your preferred Linux distribution and just upgrade it
to a higher kernel version?
Let's say CentOS.

... and a bonus question:


Is upgrading the kernel to a new major version on a distribution a bad idea?
Or is it just as safe as upgrading any other package?
I prefer ultra-stable releases instead of the latest packages.
But maybe I'm wrong in thinking that a major kernel version not in the
default repository is more or less a "dev distribution", when instead it is
just a stable release like any other.


Thanks for all your tips, opinions and clarifications! :)








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Linux Distribution: Is upgrading the kernel version a good idea?

2018-02-25 Thread Massimiliano Cuttini

Hi everybody,

Just a simple question.
In order to deploy Ceph...

   Do you use a default distribution that already supports the
   recommended kernel version (> 4.4)?
   Let's say Ubuntu.

OR

   Do you use your preferred Linux distribution and just upgrade it
   to a higher kernel version?
   Let's say CentOS.

... and a bonus question:

Is upgrading the kernel to a new major version on a distribution a bad idea?
Or is it just as safe as upgrading any other package?
I prefer ultra-stable releases instead of the latest packages.
But maybe I'm wrong in thinking that a major kernel version not in the 
default repository is more or less a "dev distribution", when instead it is 
just a stable release like any other.


Thanks for all your tips, opinions and clarifications! :)


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini



On 23/01/2018 16:49, c...@jack.fr.eu.org wrote:

On 01/23/2018 04:33 PM, Massimiliano Cuttini wrote:
With Ceph you have to install a 3rd-party orchestrator in order to 
have a clear picture of what is going on.

Which can be OK, but not always feasible.

Just as with everything.
As Wikipedia says, for instance, "Proxmox VE supports local storage 
with LVM group, directory and ZFS, as well as network storage types 
with iSCSI, Fibre Channel, NFS, GlusterFS, CEPH and DRBD.[14]"


Maybe Fibre Channel shall provide a web interface. Maybe iSCSI shall 
too. Maybe DRBD & GlusterFS will provide another one.


Well, you are mixing different technologies:

1) iSCSI and Fibre Channel are *network communication protocols*.
They just allow a hypervisor to communicate with a SAN/NAS; by themselves 
they don't provide any kind of storage.


2) ZFS, GlusterFS and NFS are "network-ready" filesystems, not a software-
defined SAN/NAS.


3) Ceph, ScaleIO, FreeNAS, HP VirtualStore... they are all *Software 
Defined* storage.
This means that they set up disks, filesystems and network connections in 
order to be ready to use from the client.

They can be thought of as a kind of "storage orchestrator" by themselves.

So only group 3 contains comparable technologies.
In this competition I think that Ceph is the only one that can win in the 
long run.
It's open, it works, it's easy, it's free, it's improving faster than the 
others.
However, right now, it is the only one that misses a decent management 
dashboard.
This is, to me, incomprehensible. Ceph is by far a killer app of the 
market.

So why not just remove its last barriers and get mass adoption?




Or maybe this is not their job.

As you said, "Xen is just an hypervisor", thus you are using a 
bare-metal low-level tool, just like sane folks would use qemu. And 
yes, low-level tools are... low level.


XenServer is a hypervisor, but it has a truly great management dashboard, 
which is XenCenter.

I guess VMware has its own, and I guess it's good too.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini

On 23/01/2018 14:32, c...@jack.fr.eu.org wrote:


I think I was not clear.

There are VM management systems, look at 
https://fr.wikipedia.org/wiki/Proxmox_VE, 
https://en.wikipedia.org/wiki/Ganeti, probably 
https://en.wikipedia.org/wiki/OpenStack too.


These systems interact with Ceph.
When you create a VM, an rbd volume is created.
When you delete a VM, the associated volumes are deleted.
When you resize a disk, the volume is resized.

There is no need for manual interaction at the Ceph level in any way.

If I really understood the end of your email, you're stuck with a 
deficient VM management system, based on XenServer.

Your issues are not Ceph's issues, but Xen's.


Half and half.

Xen is just a hypervisor, while OpenStack is an orchestrator.
An orchestrator manages your nodes by API (both hypervisors and storage 
if you want).


The fact is that Ceph doesn't have its own web interface, while many 
other storage services have one (FreeNAS, or proprietary services 
like LeftHand/VirtualStorage).
With Ceph you have to install a 3rd-party orchestrator in order to have 
a clear picture of what is going on.

Which can be OK, but not always feasible.

Coming back to my case, Xen is just a hypervisor, not an orchestrator.
So this means that many tasks must be accomplished manually.
A simple web interface that wraps a few basic shell commands can save hours 
(and can probably be built within a few months starting from the current 
deployment).
I really think Ceph is the future... but it has to become a service ready 
to use in every kind of scenario (with or without an orchestrator).

Right now it seems not ready to me.

I'm taking a look at openATTIC right now.
Probably this can be the missing piece.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini



You're more than welcome - we have a lot of work ahead of us...
Feel free to join our Freenode IRC channel #openattic to get in touch!


A curiosity!
As far as I understood, this software was created to manage only Ceph. Is 
that right?

So... why such a "far away" name for a piece of software dedicated to Ceph?
I read about openATTIC some months ago, but I was thinking it was 
something completely different before you wrote to me.

 :)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini


On 23/01/2018 13:20, c...@jack.fr.eu.org wrote:
- USER tasks: create new images, increase image size, shrink image 
size, check daily status and change broken disks whenever needed.

Who does that?
For instance, Ceph can be used for VMs. Your VM system creates images, 
resizes images, whatever; not the Ceph admin.


I would like to have a single big remote storage, but as a best practice 
you should not.

The hypervisor can create images, resize them and so on... you're right.
However, sometimes the hypervisor messes up your LVM partitions, and this 
means corruption of all the VDIs on the same disk.


So... the best practice is to set up a remote storage for each VM (you 
can group a few if you really don't want to have 200 connections).
This reduces the risk of VDI corruption (it'll accidentally corrupt one, 
not all at once, and you can easily restore a snapshot).

XenServer as hypervisor doesn't support the Ceph client and needs to go through iSCSI.
You need to map RBD onto iSCSI, so you need to create an RBD for each LUN.
So in the end... you need to (a rough sketch of these steps follows below):
- create the rbd,
- map it to iSCSI,
- map the hypervisor to iSCSI,
- drink a coffee,
- create the hypervisor virtualization layer (because every HV wants to use its 
own snapshots),

- copy the template of the VM requested by the customer,
- drink a second coffee,
and finally run the VM.
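
Roughly, that sequence looks like this from the shell (a sketch with made-up pool/image/IQN names; the export step assumes a plain LIO/targetcli block backstore on top of the mapped nbd device, not the newer ceph-iscsi gateway):

   rbd create RBD_XenStorage/VHD-customer1 --size 102400
   rbd-nbd map RBD_XenStorage/VHD-customer1              # prints e.g. /dev/nbd0
   targetcli /backstores/block create vhd-customer1 /dev/nbd0
   targetcli /iscsi create iqn.2018-01.it.example:storage
   targetcli /iscsi/iqn.2018-01.it.example:storage/tpg1/luns \
       create /backstores/block/vhd-customer1
   # ...then point the hypervisor's iSCSI initiator at the target and let it
   # build its own SR/VDI layer on top.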

This is just a nightmare... of course just one of the many that a 
sysadmin has.

If you have 1000 VMs you need a GUI in order to scroll and see the panorama.
I don't think that you read your email on the command line.
Neither should you take a look at your VMs through a command line.

Probably one day I'll quit XenServer, with all its constraints.
However, right now I can't, and it still seems to be the most stable and 
safest way to virtualize.







___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini



   https://www.openattic.org/features.html

Oh god, THIS is the answer!
Lenz, if you need help I can also join the development.


Lenz



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini

Hey Lenz,

openATTIC seems to implement several good features and to be more or less 
what I was asking for.

I'll go through all the website. :)


THANKS!


On 16/01/2018 09:04, Lenz Grimmer wrote:

Hi Massimiliano,

On 01/11/2018 12:15 PM, Massimiliano Cuttini wrote:


_*3) Management complexity*_
Ceph is amazing, but it is just too big to have everything under control
(too many services).
Now there is a management console, but as far as I read, this management
console just shows basic data about performance.
So it doesn't manage at all... it's just a monitor...

In the end you just have to manage everything on your command line.

[...]


The management complexity can be completely overcome with a great Web
Manager.
A Web Manager, in the end, is just a wrapper for shell commands from the
Ceph admin node to the others.
If you think about it, a wrapper is tons of times easier to develop
than what has already been developed.
I do really see that Ceph is the future of storage. But there is some
easily avoidable complexity that needs to be reduced.

If there is already some plan for these issues, I really would like to know.

FWIW, there is openATTIC, which provides additional functionality beyond
what the current dashboard provides. It's a web application that
utilizes various existing APIs (e.g. librados, RGW Admin Ops API):

   https://www.openattic.org/features.html

Lenz



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini

On 22/01/2018 21:55, Jack wrote:

On 01/22/2018 08:38 PM, Massimiliano Cuttini wrote:

The web interface is needed because *command lines are prone to typos*.

And you never misclick, indeed;
Do you really mean: 1) misclick once on an option list, 2) misclick 
once on the form, 3) mistype the input and 4) misclick again on the 
confirmation dialog box?

No... I can brag that I never misclick that much in a row! :)
Well, if you misclick that much, it's better not to tell people around that 
you are a system engineer ;)


However, I think that everybody can have an opinion, and a different one.
But rejecting the evidence is just flaming.


Yeah, well, whatever, most system engineers know how to handle Ceph.
Most non-system engineers do not.
A task, a job; I don't master others' jobs, hence it feels natural that
others do not master mine.

Sorry if this sounds so strange to you.

Oh, this doesn't sound strange to me.

You simply don't see the big picture.
Ceph was born in order to simplify redundancy.
But what is the reason to build an architecture in high availability?
I guess to live in peace while hardware can break: change a broken disk 
within some days instead of within some hours (or minutes).
This is all made to set us free and to increase our comfort by reducing 
stressful issues.

Focus on big issues and tuning instead of ordinary issues.

My proposal is EXACTLY in the same direction, and I'll explain it to you. 
There are 2 kinds of tasks:
- USER tasks: create new images, increase image size, shrink image size, 
check daily status and change broken disks whenever needed.
- SYSTEM tasks: install, update, repair, improve, increase pool size, 
tune performance (this should be done by command line).


If you think your job is just being a slave of the Customer Care & Sales 
folks, well... be happy with this.
If you think your job is to be the broken-disks-replacer boy of the 
office, then... be that man.
But don't come to me saying you need to be a system engineer to do 
these slavery jobs.
I prefer to focus on maintaining and tuning instead of being the puppet of 
the customer care.


You should try to consider, instead of flaming around, that there are 
people who think differently, not because they are just not good enough to 
do your job but because they see things differently.
Creating a separation between User tasks (and moving them to a web 
interface proven for dummies) and Admin tasks is just good.
Of course all Admin tasks will always be done by command line, but User 
tasks should not.


I really want to know if you'll flame back again, or if you'll finally 
try to give me a real answer with a good reason not to have a 
web interface in order to get rid of slavery jobs.

But I suppose I know the answer.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Re: [ceph-users] Ceph Future

2018-01-22 Thread Massimiliano Cuttini
d 
before another.


A web interface can just make basic checks before submitting a new 
command to the pool.
Review input, check whether the elements included in the argument list 
exist, and afterwards ask you again whether you are sure to go on.

This is just a clever way to handle delicate data.
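
Just to illustrate the kind of check meant here, a tiny sketch (pool/image names and the size are placeholders):

   POOL=rbd; IMG=vm-disk-01; NEWSIZE=204800
   if rbd ls "$POOL" | grep -qx "$IMG"; then
       echo "About to resize $POOL/$IMG to ${NEWSIZE} MB, please confirm"
       rbd resize "$POOL/$IMG" --size "$NEWSIZE"
   else
       echo "No such image: $POOL/$IMG" >&2
   fi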

To say "/ceph is not for rookies, //it's better having a threshold"/ can 
be said only from a person that really don't love it's own data (keeping 
management as error free as possible), but instead just want to be the 
only one allowed to manage them.


Less complexity, fewer errors, faster deployment of new customers.
Sorry if this sounds so strange to you.






-Original Message-
From: Alex Gorbachev [mailto:a...@iss-integration.com]
Sent: dinsdag 16 januari 2018 6:18
To: Massimiliano Cuttini
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph Future

Hi Massimiliano,


On Thu, Jan 11, 2018 at 6:15 AM, Massimiliano Cuttini
<m...@phoenixweb.it> wrote:

Hi everybody,

I'm always looking at Ceph for the future.
But I do see several issues that are left unresolved and block near-future
adoption.
I would like to know if there are some answers already:

1) Separation between Client and Server distribution.
At this time you always have to update client & server in order to
match the same distribution of Ceph.
This is OK in the early releases, but in the future I do expect that the
ceph-client is ONE, not one for every major version.
The client should be able to determine by itself which version of the
protocol and which features can be enabled, and connect to at least 3 or 5
older major versions of Ceph.

2) Kernel is old -> feature mismatch
OK, the kernel is old, and so? Just do not use it and turn to NBD.
And please don't even let me know; just virtualize it under the hood.

3) Management complexity
Ceph is amazing, but it is just too big to have everything under control
(too many services).
Now there is a management console, but as far as I read, this
management console just shows basic data about performance.
So it doesn't manage at all... it's just a monitor...

In the end you just have to manage everything on your command line.
In order to manage by web it's mandatory to:

create, delete, enable, disable services. If I need to run an iSCSI
redundant gateway, do I really need to cut and paste commands from your
online docs?
Of course not. You just can script it better than what every admin can

do.

Just give a few arguments on the HTML forms and that's all.

create, delete, enable, disable users.
I have to create users and keys for 24 servers. Do you really think
it's possible to make it without some bad transcription or bad
cut-and-paste of the keys across all servers?
Everybody ends up just copying the admin keys across all servers, giving
very insecure full permissions to all clients.

create MAPS (server, datacenter, rack, node, osd).
This is mandatory to design how the data needs to be replicated.
It's not good to create this by script or shell; what's needed is a graph
editor which can give you the perspective of what will be copied where.

check hardware below the hood.
It's missing the checking of the health of the hardware below.
But Ceph was born as a storage software that ensures redundancy and
protects you from single failures.
So WHY just ignore checking the health of disks with SMART?
FreeNAS just does a better job on this, giving lots of tools to
understand which disk is which and whether it will fail in the near

future.

Of course Ceph too could really forecast issues by itself and needs to
start integrating with basic hardware I/O.
For example, it should be possible to enable/disable the UID light on the
disks in order to know which one needs to be replaced.

As a technical note, we ran into this need with Storcium, and it is
pretty easy to utilize UID indicators using both Areca and LSI/Avago
HBAs.  You will need the standard control tools available from their web
sites, as well as hardware that supports SGPIO (most enterprise JBODs
and drives do).  There are likely similar options for other HBAs.

Areca:

UID on:

cli64 curctrl=1 set password=<password>
cli64 curctrl=<controller#> disk identify drv=<drive#>

UID OFF:

cli64 curctrl=1 set password=<password>
cli64 curctrl=<controller#> disk identify drv=0

LSI/Avago:

UID on:

sas2ircu <controller#> locate <Enclosure:Bay> ON

UID OFF:

sas2ircu <controller#> locate <Enclosure:Bay> OFF

HTH,
Alex Gorbachev
Storcium


I guess this kind of feature is quite standard across all Linux
distributions.

The management complexity can be completely overcome with a great Web
Manager.
A Web Manager, in the end, is just a wrapper for shell commands from the
Ceph admin node to the others.
If you think about it, a wrapper is tons of times easier to develop
than what has already been developed.
I do really see that Ceph is the future of storage. But there is some
easily avoidable complexity that needs to be reduced.

If there is already some plan for these issues, I really would like to

know.

Thanks,
Max




___
ceph-users mailing list
ceph-us

[ceph-users] Ceph Future

2018-01-11 Thread Massimiliano Cuttini

Hi everybody,

I'm always looking at Ceph for the future.
But I do see several issues that are left unresolved and block near-future 
adoption.

I would like to know if there are some answers already:

_*1) Separation between Client and Server distribution.*_
At this time you always have to update client & server in order to match 
the same distribution of Ceph.
This is OK in the early releases, but in the future I do expect that the 
ceph-client is ONE, not one for every major version.
The client should be able to determine by itself which version of the 
protocol and which features can be enabled, and connect to at least 3 or 5 
older major versions of Ceph.


_*2) Kernel is old -> feature mismatch*_
OK, the kernel is old, and so? Just do not use it and turn to NBD.
And please don't even let me know; just virtualize it under the hood.

_*3) Management complexity*_
Ceph is amazing, but it is just too big to have everything under control 
(too many services).
Now there is a management console, but as far as I read, this management 
console just shows basic data about performance.

So it doesn't manage at all... it's just a monitor...

In the end you just have to manage everything on your command line.
In order to manage by web it's mandatory to:

 * _create, delete, enable, disable services_
   If I need to run an iSCSI redundant gateway, do I really need to
   cut and paste commands from your online docs?
   Of course not. You just can script it better than what every admin
   can do.
   Just give a few arguments on the HTML forms and that's all.

 * _create, delete, enable, disable users_
   I have to create users and keys for 24 servers. Do you really think
   it's possible to make it without some bad transcription or bad
   cut-and-paste of the keys across all servers?
   Everybody ends up just copying the admin keys across all servers, giving
   very insecure full permissions to all clients.
   (A sketch of what this looks like from the CLI today follows this list.)

 * _create MAPS (server, datacenter, rack, node, osd)._
   This is mandatory to design how the data needs to be replicated.
   It's not good to create this by script or shell; what's needed is a graph
   editor which can give you the perspective of what will be copied where.

 * _check hardware below the hood_
   It's missing the checking of the health of the hardware below.
   But Ceph was born as a storage software that ensures redundancy and
   protects you from single failures.
   So WHY just ignore checking the health of disks with SMART?
   FreeNAS just does a better job on this, giving lots of tools to
   understand which disk is which and whether it will fail in the near
   future.
   Of course Ceph too could really forecast issues by itself and needs
   to start integrating with basic hardware I/O.
   For example, it should be possible to enable/disable the UID light on
   the disks in order to know which one needs to be replaced.
   I guess this kind of feature is quite standard across all Linux
   distributions.
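
For reference, this is roughly what that per-server user/key creation looks like from the CLI today (a sketch; client names and caps are examples only):

   for i in $(seq 1 24); do
       ceph auth get-or-create client.xen$i \
           mon 'allow r' osd 'allow rwx pool=rbd' \
           -o /etc/ceph/ceph.client.xen$i.keyring
   done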

The management complexity can be completely overcome with a great Web 
Manager.
A Web Manager, in the end, is just a wrapper for shell commands from the 
Ceph admin node to the others.
If you think about it, a wrapper is tons of times easier to develop 
than what has already been developed.
I do really see that Ceph is the future of storage. But there is some 
easily avoidable complexity that needs to be reduced.


If there is already some plan for these issues, I really would like to know.

Thanks,
Max



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] missing feature 400000000000000 ?

2017-07-17 Thread Massimiliano Cuttini

Hi Riccardo,

using ceph-fuse will add an extra layer.
Consider using rbd-nbd instead, which is a port that uses network block 
devices.
This should be faster and allows you to use the latest tunables (which is 
better).
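
For reference, mapping an image through rbd-nbd looks roughly like this (pool/image names are placeholders):

   rbd-nbd map rbd/myimage        # prints the device it attached, e.g. /dev/nbd0
   mount /dev/nbd0 /mnt           # assuming the image already holds a filesystem
   umount /mnt
   rbd-nbd unmap /dev/nbd0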




On 17/07/2017 10:56, Riccardo Murri wrote:

Thanks a lot to all!  Both the suggestion to use "ceph osd tunables
hammer" and to use "ceph-fuse" instead solved the issue.

Riccardo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





[ceph-users] Mon on VM - centOS or Ubuntu?

2017-07-11 Thread Massimiliano Cuttini

Dear all,

I have to create several VMs in order to use them as MONs on my cluster.
All my Ceph clients are CentOS.
But I'm thinking about creating all the monitors using Ubuntu, because it 
seems lighter.


Is this a matter of taste?
Or is there something I should know before going with a mixed-OS cluster 
(the OSDs also use CentOS)?


Thanks for any useful tips

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Monitor as local VM on top of the server pool cluster?

2017-07-10 Thread Massimiliano Cuttini

Hi everybody,

I would like to separate MONs from OSDs as recommended.
In order to do so without new hardware, I'm planning to create all the 
monitors as virtual machines on top of my hypervisors (Xen).

I'm testing a pool of 8 Xen nodes.

I'm thinking about creating 8 monitors and pinning one monitor to each Xen node.
So, I'm guessing, every Ceph monitor will be local to each client node.
This should speed up the system by connecting to monitors locally, with a 
little overhead for the monitor sync between nodes.


Is it a good idea to have a local monitor virtualized on top of each 
hypervisor node?

Do you see any underestimation or wrong design in this?

Thanks for every helpful piece of info.


Regards,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New cluster - configuration tips and recommendations - NVMe

2017-07-06 Thread Massimiliano Cuttini

WOW!

Thanks to everybody!
Tons of suggestions and good tips!

At the moment we are already using 100Gb/s cards and we have already 
adopted a 100Gb/s switch, so we can go with 40Gb/s cards that are fully 
compatible with our switch.
About the CPU I was wrong: the model that we are looking at is not the 2603 
but the 2630, which is quite different.

Bad mistake!

This processor has 10 cores at 2.20GHz.
I think it's the best price/quality ratio from Intel.

About that, it seems that most of your recommendations go in the 
direction of fewer cores but much faster clock speed.

Is this right? So having 10 cores is not as good as having faster ones?



On 05/07/2017 12:51, Wido den Hollander wrote:

On 5 July 2017 at 12:39, c...@jack.fr.eu.org wrote:


Beware, a single 10G NIC is easily saturated by a single NVMe device


Yes, it is. But that what was what I'm pointing at. Bandwidth is usually not a 
problem, latency is.

Take a look at a Ceph cluster running out there, it is probably doing a lot of 
IOps, but not that much bandwidth.

A production cluster I took a look at:

"client io 405 MB/s rd, 116 MB/s wr, 12211 op/s rd, 13272 op/s wr"

This cluster is 15 machines with 10 OSDs (SSD, PM863a) each.

So 405/15 = 27MB/sec

It's doing 13k IOps now, that increases to 25k during higher load, but the 
bandwidth stays below 500MB/sec in TOTAL.

So yes, you are right, an NVMe device can saturate a single NIC, but most of the 
time latency and IOps are what count. Not bandwidth.

Wido


On 05/07/2017 11:54, Wido den Hollander wrote:

On 5 July 2017 at 11:41, "Van Leeuwen, Robert" <rovanleeu...@ebay.com> wrote:


Hi Max,

You might also want to look at the PCIE lanes.
I am not an expert on the matter but my guess would be the 8 NVME drives + 
2x100Gbit would be too much for
the current Xeon generation (40 PCIE lanes) to fully utilize.


Fair enough, but you might want to think about if you really, really need 
100Gbit. Those cards are expensive, same goes for the Gbics and switches.

Storage is usually latency bound and not so much bandwidth. Imho a lot of 
people focus on raw TBs and bandwidth, but in the end IOps and latency are what 
usually matters.

I'd probably stick with 2x10Gbit for now and use the money I saved on more 
memory and faster CPUs.

Wido


I think the upcoming AMD/Intel offerings will improve that quite a bit so you 
may want to wait for that.
As mentioned earlier. Single Core cpu speed matters for latency so you probably 
want to up that.

You can also look at the DIMM configuration.
TBH I am not sure how much it impacts Ceph performance but having just 2 DIMMS 
slots populated will not give you max memory bandwidth.
Having some extra memory for read-cache probably won’t hurt either (unless you 
know your workload won’t include any cacheable reads)

Cheers,
Robert van Leeuwen

From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Massimiliano 
Cuttini <m...@phoenixweb.it>
Organization: PhoenixWeb Srl
Date: Wednesday, July 5, 2017 at 10:54 AM
To: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Subject: [ceph-users] New cluster - configuration tips and recommendations - NVMe


Dear all,

luminous is coming and soon we should be allowed to avoid double writing.
This means using 100% of the speed of SSD and NVMe.
Clusters made entirely of SSD and NVMe will not be penalized and will start to make sense.

Looking forward, I'm building the next pool of storage which we'll set up next 
term.
We are taking into consideration a pool of 4 with the following single node 
configuration:

   *   2x E5-2603 v4 - 6 cores - 1.70GHz
   *   2x 32Gb of RAM
   *   2x NVMe M2 for OS
   *   6x NVMe U2 for OSD
   *   2x 100Gib ethernet cards

We are not yet sure about which Intel CPU and how much RAM we should put on it to 
avoid a CPU bottleneck.
Can you help me choose the right pair of CPUs?
Do you see any issue in the proposed configuration?

Thanks,
Max
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New cluster - configuration tips and recommendations - NVMe

2017-07-05 Thread Massimiliano Cuttini

Dear all,

luminous is coming and soon we should be allowed to avoid double writing.
This means using 100% of the speed of SSD and NVMe.
Clusters made entirely of SSD and NVMe will not be penalized and will start 
to make sense.


Looking forward, I'm building the next pool of storage which we'll set up 
next term.
We are taking into consideration a pool of 4 with the following single 
node configuration:


 * 2x E5-2603 v4 - 6 cores - 1.70GHz
 * 2x 32Gb of RAM
 * 2x NVMe M2 for OS
 * 6x NVMe U2 for OSD
 * 2x 100Gib ethernet cards

We are not yet sure about which Intel CPU and how much RAM we should put on 
it to avoid a CPU bottleneck.

Can you help me choose the right pair of CPUs?
Do you see any issue in the proposed configuration?


Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cannot open /dev/xvdb: Input/output error

2017-06-26 Thread Massimiliano Cuttini



On Sun, Jun 25, 2017 at 11:28:37PM +0200, Massimiliano Cuttini wrote:

Il 25/06/2017 21:52, Mykola Golub ha scritto:

On Sun, Jun 25, 2017 at 06:58:37PM +0200, Massimiliano Cuttini wrote:

I can see the error even if I easily run list-mapped:

# rbd-nbd list-mapped
/dev/nbd0
2017-06-25 18:49:11.761962 7fcdd9796e00 -1 asok(0x7fcde3f72810) 
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to 
bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) 
File exists/dev/nbd1

"AdminSocket::bind_and_listen: failed to bind" errors are harmless,
you can safely ignore them (or configure admin_socket in ceph.conf to
avoid names collisions).

I read around that this can lead to a lock on opening.
http://tracker.ceph.com/issues/7690
If the daemon exists, then you have to wait until it ends its operation before
you can connect.

In your case (rbd-nbd) this error is harmless. You can avoid them
setting in ceph.conf, [client] section something like below:

  admin socket = /var/run/ceph/$name.$pid.asok

Also to make every rbd-nbd process to log to a separate file you can
set (in [client] section):

  log file = /var/log/ceph/$name.$pid.log

I need to create all the users in the Ceph cluster before using this.
At the moment the whole cluster was running with the Ceph admin keyring.
However, this is not an issue; I can rapidly deploy all the users needed.
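
For reference, a minimal sketch of applying the two [client] settings suggested above (the paths are the ones mentioned in the thread):

cat >> /etc/ceph/ceph.conf << 'EOF'
[client]
admin socket = /var/run/ceph/$name.$pid.asok
log file = /var/log/ceph/$name.$pid.log
EOF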


root 12610  0.0  0.2 1836768 11412 ?   Sl   Jun23   0:43 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-602b05be-395d-442e-bd68-7742deaf97bd
 --name client.admin
root 17298  0.0  0.2 1644244 8420 ?Sl   21:15   0:01 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-3e16395d-7dad-4680-a7ad-7f398da7fd9e
 --name client.admin
root 18116  0.0  0.2 1570512 8428 ?Sl   21:15   0:01 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-41a76fe7-c9ff-4082-adb4-43f3120a9106
 --name client.admin
root 19063  0.1  1.3 2368252 54944 ?   Sl   21:15   0:10 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-6da2154e-06fd-4063-8af5-ae86ae61df50
 --name client.admin
root 21007  0.0  0.2 1570512 8644 ?Sl   21:15   0:01 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-c8aca7bd-1e37-4af4-b642-f267602e210f
 --name client.admin
root 21226  0.0  0.2 1703640 8744 ?Sl   21:15   0:01 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-cf2139ac-b1c4-404d-87da-db8f992a3e72
 --name client.admin
root 21615  0.5  1.4 2368252 60256 ?   Sl   21:15   0:33 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-acb2a9b0-e98d-474e-aa42-ed4e5534ddbe
 --name client.admin
root 21653  0.0  0.2 1703640 11100 ?   Sl   04:12   0:14 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-8631ab86-c85c-407b-9e15-bd86e830ba74
 --name client.admin

Do you observe the issue for all these volumes? I see many of them
were started recently (21:15) while others are older.

Only some of them.
But it's random.
Some old ones and some just plugged become unavailable to Xen.

Don't you observe sporadic crashes/restarts of rbd-nbd processes? You
can associate an nbd device with its rbd-nbd process (and rbd volume) by
looking at /sys/block/nbd*/pid and ps output.

I really don't know where to look for the rbd-nbd log.
Can you point it out?
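
For reference, the association Mykola describes can be done like this (nbd0 is just an example device):

   cat /sys/block/nbd0/pid                          # pid of the rbd-nbd process
   ps -p "$(cat /sys/block/nbd0/pid)" -o args=      # shows which rbd image it maps

With the per-process "log file" setting suggested above, each rbd-nbd process then writes to its own file under /var/log/ceph/ (the exact name depends on your ceph.conf).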
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ideas on the UI/UX improvement of ceph-mgr: Cluster Status Dashboard

2017-06-26 Thread Massimiliano Cuttini

Hi Saumay,

I think you should take into account tracking SMART on every SSD found.
If it has SMART capabilities, then track its tests (or commit tests) and 
display their values on the dashboard (or a separate graph).

This allows ADMINS to forecast which OSD will die next.

Preventing is better than restoring! :)
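
The kind of data meant here is what smartctl already exposes per device, e.g. (device names are examples):

   smartctl -H /dev/sda             # overall health self-assessment
   smartctl -A /dev/sda             # attributes: reallocated sectors, wear, etc.
   smartctl -a -d nvme /dev/nvme0   # NVMe variant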



On 26/06/2017 06:49, saumay agrawal wrote:

Hi everyone!

I am working on the improvement of the web-based dashboard for Ceph.
My intention is to add some UI elements to visualise some performance
counters of a Ceph cluster. This gives a better overview to the users
of the dashboard about how the Ceph cluster is performing and, if
necessary, where they can make necessary optimisations to get even
better performance from the cluster.

Here is my suggestion on the two perf counters, commit latency and
apply latency. They are visualised using line graphs. I have prepared
UI mockups for the same.
1. OSD apply latency
[https://drive.google.com/open?id=0ByXy5gIBzlhYNS1MbTJJRDhtSG8]
2. OSD commit latency
[https://drive.google.com/open?id=0ByXy5gIBzlhYNElyVU00TGtHeVU]

These mockups show the latency values (y-axis) against the instant of
time (x-axis). The latency values for different OSDs are highlighted
using different colours. The average latency value of all OSDs is
shown specifically in red. This representation allows the dashboard
user to compare the performances of an OSD with other OSDs, as well as
with the average performance of the cluster.

The line width in these graphs is specially kept less, so as to give a
crisp and clear representation for more number of OSDs. However, this
approach may clutter the graph and make it incomprehensible for a
cluster having significantly higher number of OSDs. For such
situations, we can retain only the average latency indications from
both the graphs to make things more simple for the dashboard user.

Also, higher latency values suggest bad performance. We can come up
with some specific values for both the counters, above which we can
say that the cluster is performing very bad. If the value of any of
the OSDs exceeds this value, we can highlight entire graph in a light
red shade to draw the attention of user towards it.

I am planning to use AJAX based templates and plugins (like
Flotcharts) for these graphs. This would allow real-time update of the
graphs without having any need to reload the entire dashboard page.

Another feature I propose to add is the representation of the version
distribution of all the clients in a cluster. This can be categorised
into distribution
1. on the basis of ceph version
[https://drive.google.com/open?id=0ByXy5gIBzlhYYmw5cXF2bkdTWWM] and,
2. on the basis of kernel version
[https://drive.google.com/open?id=0ByXy5gIBzlhYczFuRTBTRDcwcnc]

I have used doughnut charts instead of regular pie charts, as they
have some whitespace at their centre. This whitespace makes the chart
appear less cluttered, while properly indicating the appropriate
fraction of the total value. Also, we can later add some data to
display at this centre space when we hover over a particular slice of
the chart.

The main purpose of this visualisation is to identify any number of
clients left behind while updating the clients of the cluster. Suppose
a cluster has 50 clients running ceph jewel. In the process of
updating this cluster, 40 clients get updated to ceph luminous, while
the other 10 clients remain behind on ceph jewel. This may occur due
to some bug or any interruption in the update process. In such
scenarios, the user can find which clients have not been updated and
update them according to his needs.  It may also give a clear picture
for troubleshooting, during any package dependency issues due to the
kernel. The clients are represented in both, absolutes numbers as well
as the percentage of the entire cluster, for a better overview.

An interesting approach could be highlighting the older version(s)
specifically to grab the attention of the user. For example, a user
running ceph jewel may not need to update as necessarily compared to
the user running ceph hammer.

As of now, I am looking for plugins in AdminLTE to implement these two
elements in the dashboard. I would like to have feedbacks and
suggestions on these two from the ceph community, on how can I make
them more informative about the cluster.

Also a request to the various ceph users and developers. It would be
great if you could share the various metrics you are using as a
performance indicator for your cluster, and how you are using them.
Any metrics being used to identify the issues in a cluster can also be
shared.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Re: [ceph-users] cannot open /dev/xvdb: Input/output error

2017-06-25 Thread Massimiliano Cuttini


On 25/06/2017 21:52, Mykola Golub wrote:

On Sun, Jun 25, 2017 at 06:58:37PM +0200, Massimiliano Cuttini wrote:

I can see the error even if I easily run list-mapped:

# rbd-nbd list-mapped
/dev/nbd0
2017-06-25 18:49:11.761962 7fcdd9796e00 -1 asok(0x7fcde3f72810) 
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to 
bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) 
File exists/dev/nbd1

"AdminSocket::bind_and_listen: failed to bind" errors are harmless,
you can safely ignore them (or configure admin_socket in ceph.conf to
avoid names collisions).
I read around that this can lead to a lock on opening. 
http://tracker.ceph.com/issues/7690
If the daemon exists, then you have to wait until it ends its operation 
before you can connect.



Don't you see other errors?

I received errors from XAPI:

   `There was an SR backend failure.

   status: non-zero exit

   stdout:

   stderr: Traceback (most recent call last):

   File "/opt/xensource/sm/RBDSR", line 774, in

   SRCommand.run(RBDSR, DRIVER_INFO)

   File "/opt/xensource/sm/SRCommand.py", line 352, in run

   ret = cmd.run(sr)

   File "/opt/xensource/sm/SRCommand.py", line 110, in run

   return self._run_locked(sr)

   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked

   rv = self._run(sr, target)

   File "/opt/xensource/sm/SRCommand.py", line 338, in _run

   return sr.scan(self.params['sr_uuid'])

   File "/opt/xensource/sm/RBDSR", line 244, in scan

   scanrecord.synchronise_new()

   File "/opt/xensource/sm/SR.py", line 581, in synchronise_new

   vdi._db_introduce()

   File "/opt/xensource/sm/VDI.py", line 312, in _db_introduce

   vdi = self.sr.session.xenapi.VDI.db_introduce(uuid, self.label, 
self.description, self.sr.sr_ref, ty, self.shareable, self.read_only, {}, 
self.location, {}, sm_config, self.managed, str(self.size), 
str(self.utilisation), metadata_of_pool, is_a_snapshot, 
xmlrpclib.DateTime(snapshot_time), snapshot_of)

   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 248, in call

   return self.__send(self.__name, args)

   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 150, in 
xenapi_request

   result = _parse_result(getattr(self, methodname)(*full_params))

   File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in call

   return self.__send(self.__name, args)

   File "/usr/lib64/python2.7/xmlrpclib.py", line 1581, in __request

   allow_none=self.__allow_none)

   File "/usr/lib64/python2.7/xmlrpclib.py", line 1086, in dumps

   data = m.dumps(params)

   File "/usr/lib64/python2.7/xmlrpclib.py", line 633, in dumps

   dump(v, write)

   File "/usr/lib64/python2.7/xmlrpclib.py", line 655, in __dump

   f(self, value, write)

   File "/usr/lib64/python2.7/xmlrpclib.py", line 757, in dump_instance

   self.dump_struct(value.dict, write)

   File "/usr/lib64/python2.7/xmlrpclib.py", line 736, in dump_struct

   dump(v, write)

   File "/usr/lib64/python2.7/xmlrpclib.py", line 655, in __dump

   f(self, value, write)

   File "/usr/lib64/python2.7/xmlrpclib.py", line 757, in dump_instance

   self.dump_struct(value.dict, write)

   File "/usr/lib64/python2.7/xmlrpclib.py", line 736, in dump_struct

   dump(v, write)

   File "/usr/lib64/python2.7/xmlrpclib.py", line 655, in __dump

   f(self, value, write)

   *File "/usr/lib64/python2.7/xmlrpclib.py", line 666, in dump_int*

   *raise OverflowError, "int exceeds XML-RPC limits"*

   *OverflowError: int exceeds XML-RPC limits*

PS: nice line to get an overflow error.
Looking around, it seems that Python has a well-known issue receiving a > 
32-bit response.

But I cannot understand why this should ever happen.


What is output for `ps auxww |grep rbd-nbd`?

This is the result:

root 12610  0.0  0.2 1836768 11412 ?   Sl   Jun23   0:43 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-602b05be-395d-442e-bd68-7742deaf97bd
 --name client.admin

root 17298  0.0  0.2 1644244 8420 ?Sl   21:15   0:01 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-3e16395d-7dad-4680-a7ad-7f398da7fd9e
 --name client.admin

root 18116  0.0  0.2 1570512 8428 ?Sl   21:15   0:01 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-41a76fe7-c9ff-4082-adb4-43f3120a9106
 --name client.admin

root 19063  0.1  1.3 2368252 54944 ?   Sl   21:15   0:10 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f81054cc/VHD-6da2154e-06fd-4063-8af5-ae86ae61df50
 --name client.admin

root 21007  0.0  0.2 1570512 8644 ?Sl   21:15   0:01 rbd-nbd 
--nbds_max 64 map 
RBD_XenStorage-51a45fd8-a4d1-4202-899c-00a0f

Re: [ceph-users] cannot open /dev/xvdb: Input/output error

2017-06-25 Thread Massimiliano Cuttini

I can see the error even when I simply run list-mapped:

   # rbd-nbd list-mapped
   /dev/nbd0
   2017-06-25 18:49:11.761962 7fcdd9796e00 -1 asok(0x7fcde3f72810) 
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to 
bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) 
File exists/dev/nbd1

   /dev/nbd2
   /dev/nbd3
   /dev/nbd4
   /dev/nbd5
   /dev/nbd6
   /dev/nbd7

This issue downgrades my expectations about Ceph's readiness for production.
After 4 months of work and testing, this is just a thorn in my side.

I see that the issue was already raised in
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011938.html

Joecyw, how did you solve this issue?


On 25/06/2017 16:03, Massimiliano Cuttini wrote:
UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) 
File exists 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cannot open /dev/xvdb: Input/output error

2017-06-25 Thread Massimiliano Cuttini

Hi Jason,

I'm using rbd-nbd.

Under /var/log/ceph/client.log
I see this error:

2017-06-25 05:25:32.833202 7f658ff04e00  0 ceph version 10.2.7 
(50e863e0f4bc8f4b9e31156de690d765af245185), process rbd-nbd, pid 8524
2017-06-25 05:25:32.853811 7f658ff04e00 -1 asok(0x7f6599531900) 
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed 
to bind the UNIX domain socket to 
'/var/run/ceph/ceph-client.admin.asok': (17) File exists





On 25/06/2017 15:00, Jason Dillaman wrote:

Are you using librbd via QEMU or krbd? If librbd, what errors are
noted in the instance's librbd log file?

On Sun, Jun 25, 2017 at 4:30 AM, Massimiliano Cuttini <m...@phoenixweb.it> 
wrote:

After 4 months of testing we decided to go live and store real VDIs in
production.
However, on the very same day something suddenly went wrong.

The last copy of the VDI in Ceph was corrupted.
Fixing the filesystem made it possible to open it, but mysqld never came back online
even after reinstalling, and only after forcing InnoDB recovery.
Too many databases were corrupted. Finally, on the last reboot the VDI became inaccessible
and did not start.

At the same time 3 other VDIs got stuck and became inaccessible; trying to access
them gives this error:
cannot open /dev/xvdb: Input/output error

If I run ceph -s from the Dom0, everything seems all right:

 cluster 33693979-3177-44f4-bcb8-1b3bbc658352
  health HEALTH_OK
  monmap e2: 2 mons at
{ceph-node1=xx:6789/0,ceph-node2=yy:6789/0}
 election epoch 218, quorum 0,1 ceph-node1,ceph-node2
  osdmap e2560: 8 osds: 8 up, 8 in
 flags sortbitwise,require_jewel_osds
   pgmap v8311677: 484 pgs, 3 pools, 940 GB data, 444 kobjects
 1883 GB used, 5525 GB / 7408 GB avail
  484 active+clean
   client io 482 kB/s wr, 0 op/s rd, 5 op/s wr

Everything seems OK.
I can see the pool and the RBDs:

NAME
SIZE PARENT FMT PROT LOCK
VHD-1649fde4-6637-43d8-b815-656e4080887d
102400M  2
VHD-1bcd9d72-d8fe-4fc2-ad6a-ce84b18e4340
102400M  2
VHD-239ea931-5ddd-4aaf-bc89-b192641f6dcf
200G  2
VHD-3e16395d-7dad-4680-a7ad-7f398da7fd9e
200G  2
VHD-41a76fe7-c9ff-4082-adb4-43f3120a9106
102400M  2
VHD-41a76fe7-c9ff-4082-adb4-43f3120a9106@SNAP-346d01f4-cbd4-4e8a-af8f-473c2de3c60d
102400M  2 yes
VHD-48fdb12d-110b-419c-9330-0f05827fe41e
102400M  2
VHD-602b05be-395d-442e-bd68-7742deaf97bd
200G  2
VHD-691e4fb4-5f7b-4bc1-af7b-98780d799067
102400M  2
VHD-6da2154e-06fd-4063-8af5-ae86ae61df50
10240M  2  excl
VHD-8631ab86-c85c-407b-9e15-bd86e830ba74
200G  2
VHD-97cbc3d2-519c-4d11-b795-7a71179e193c
102400M  2
VHD-97cbc3d2-519c-4d11-b795-7a71179e193c@SNAP-ad9ab028-57d9-48e4-9d13-a9a02b06e68a
102400M  2 yes
VHD-acb2a9b0-e98d-474e-aa42-ed4e5534ddbe
102400M  2  excl
VHD-adaa639e-a7a4-4723-9c18-0c2b3ab9d99f
102400M  2
VHD-bb0f40c7-206b-4023-b56f-4a70d02d5a58
200G  2
VHD-bb0f40c7-206b-4023-b56f-4a70d02d5a58@SNAP-f8b1cc40-4a9b-4550-8bfa-7fad467e726b
200G  2 yes
VHD-c8aca7bd-1e37-4af4-b642-f267602e210f
102400M  2
VHD-c8aca7bd-1e37-4af4-b642-f267602e210f@SNAP-0e830e68-c7b4-49a9-8210-4c0b4ecca762
102400M  2 yes
VHD-cf2139ac-b1c4-404d-87da-db8f992a3e72
102400M  2
VHD-cf2139ac-b1c4-404d-87da-db8f992a3e72@SNAP-4fb59ee5-3256-4e75-9d48-facdd818d754
102400M  2 yes

however, they are unbootable and unreadable even as a secondary drive.
The hypervisor refuses to load them in the VMs.

What can I do?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com








[ceph-users] cannot open /dev/xvdb: Input/output error

2017-06-25 Thread Massimiliano Cuttini
After 4 months of testing we decided to go live and store real VDIs in 
production.

However, on the very same day something suddenly went wrong.

The last copy of the VDI in Ceph was corrupted.
Fixing the filesystem made it possible to open it, but mysqld never came 
back online even after reinstalling, and only after forcing InnoDB recovery.
Too many databases were corrupted. Finally, on the last reboot the VDI became 
inaccessible and did not start.


At the same time 3 other VDIs got stuck and became inaccessible; trying to 
access them gives this error:

cannot open /dev/xvdb: Input/output error

If I run ceph -s from the Dom0, everything seems all right:

cluster 33693979-3177-44f4-bcb8-1b3bbc658352
 health HEALTH_OK
 monmap e2: 2 mons at 
{ceph-node1=xx:6789/0,ceph-node2=yy:6789/0}
election epoch 218, quorum 0,1 ceph-node1,ceph-node2
 osdmap e2560: 8 osds: 8 up, 8 in
flags sortbitwise,require_jewel_osds
  pgmap v8311677: 484 pgs, 3 pools, 940 GB data, 444 kobjects
1883 GB used, 5525 GB / 7408 GB avail
 484 active+clean
  client io 482 kB/s wr, 0 op/s rd, 5 op/s wr

Everything seems OK.
I can see the pool and the RBDs:

   NAME 
 SIZE PARENT FMT PROT LOCK
   VHD-1649fde4-6637-43d8-b815-656e4080887d 
  102400M  2
   VHD-1bcd9d72-d8fe-4fc2-ad6a-ce84b18e4340 
  102400M  2
   VHD-239ea931-5ddd-4aaf-bc89-b192641f6dcf 
 200G  2
   VHD-3e16395d-7dad-4680-a7ad-7f398da7fd9e 
 200G  2
   VHD-41a76fe7-c9ff-4082-adb4-43f3120a9106 
  102400M  2
   
VHD-41a76fe7-c9ff-4082-adb4-43f3120a9106@SNAP-346d01f4-cbd4-4e8a-af8f-473c2de3c60d
 102400M  2 yes
   VHD-48fdb12d-110b-419c-9330-0f05827fe41e 
  102400M  2
   VHD-602b05be-395d-442e-bd68-7742deaf97bd 
 200G  2
   VHD-691e4fb4-5f7b-4bc1-af7b-98780d799067 
  102400M  2
   VHD-6da2154e-06fd-4063-8af5-ae86ae61df50 
   10240M  2  excl
   VHD-8631ab86-c85c-407b-9e15-bd86e830ba74 
 200G  2
   VHD-97cbc3d2-519c-4d11-b795-7a71179e193c 
  102400M  2
   
VHD-97cbc3d2-519c-4d11-b795-7a71179e193c@SNAP-ad9ab028-57d9-48e4-9d13-a9a02b06e68a
 102400M  2 yes
   VHD-acb2a9b0-e98d-474e-aa42-ed4e5534ddbe 
  102400M  2  excl
   VHD-adaa639e-a7a4-4723-9c18-0c2b3ab9d99f 
  102400M  2
   VHD-bb0f40c7-206b-4023-b56f-4a70d02d5a58 
 200G  2
   
VHD-bb0f40c7-206b-4023-b56f-4a70d02d5a58@SNAP-f8b1cc40-4a9b-4550-8bfa-7fad467e726b
200G  2 yes
   VHD-c8aca7bd-1e37-4af4-b642-f267602e210f 
  102400M  2
   
VHD-c8aca7bd-1e37-4af4-b642-f267602e210f@SNAP-0e830e68-c7b4-49a9-8210-4c0b4ecca762
 102400M  2 yes
   VHD-cf2139ac-b1c4-404d-87da-db8f992a3e72 
  102400M  2
   
VHD-cf2139ac-b1c4-404d-87da-db8f992a3e72@SNAP-4fb59ee5-3256-4e75-9d48-facdd818d754
 102400M  2 yes

however, they are unbootable and unreadable even as a secondary drive.
The hypervisor refuses to load them in the VMs.

What can I do?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help needed rbd feature enable

2017-06-23 Thread Massimiliano Cuttini
What seems strange is that features are *all disabled* when I 
create some images,
while Ceph should use at least the default settings of Jewel.

Do I need to place something in ceph.conf in order to use the default settings?
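
For reference, a minimal sketch of the two pieces being discussed below (the value 61 corresponds to layering + exclusive-lock + object-map + fast-diff + deep-flatten; the image name is a placeholder, and layering/deep-flatten can only be set at creation time):

cat >> /etc/ceph/ceph.conf << 'EOF'
[client]
rbd default features = 61
EOF

# existing images can have the mutable features switched on afterwards, in order:
rbd feature enable rbd/myimage exclusive-lock
rbd feature enable rbd/myimage object-map
rbd feature enable rbd/myimage fast-diff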




On 23/06/2017 23:43, Massimiliano Cuttini wrote:


I guess you updated those features before the commit that fixed this:

https://github.com/ceph/ceph/blob/master/src/include/rbd/features.h

As stated:

// features that make an image inaccessible for read or write by
/// clients that don't understand them
#define RBD_FEATURES_INCOMPATIBLE (RBD_FEATURE_LAYERING   | \
                                   RBD_FEATURE_STRIPINGV2 | \
                                   RBD_FEATURE_DATA_POOL)

/// features that make an image unwritable by clients that don't understand them
#define RBD_FEATURES_RW_INCOMPATIBLE (RBD_FEATURES_INCOMPATIBLE  | \
                                      RBD_FEATURE_EXCLUSIVE_LOCK | \
                                      RBD_FEATURE_OBJECT_MAP     | \
                                      RBD_FEATURE_FAST_DIFF      | \
                                      RBD_FEATURE_DEEP_FLATTEN   | \
                                      RBD_FEATURE_JOURNALING)


Some features are one-way: if you disable them you cannot re-enable 
them anymore.

As stated here: https://bugzilla.redhat.com/show_bug.cgi?id=1326645
This is the intended behaviour and Red Hat will not fix this.

So don't disable your features or you'll have to export-import all 
the images.

I'm getting crazy.



On 23/06/2017 22:36, David Turner wrote:


I upgraded to Jewel from Hammer and was able to enable those features 
on all of my rbds that were format 2, which yours is.  Just test it 
on some non customer data and see how it goes.



On Fri, Jun 23, 2017, 4:33 PM Massimiliano Cuttini <m...@phoenixweb.it> wrote:


Ok,

At the moment my clients use only rbd-nbd; can I use all these features,
or is this something unavoidable?
I guess it's OK.

Reading around, it seems that a lost feature cannot be re-enabled due
to backward compatibility with old clients.
... I guess I'll need to export and import into a new, fully featured
image.
Is that right?




Il 23/06/2017 22:25, David Turner ha scritto:

All of the features you are talking about likely require the
exclusive-lock which requires the 4.9 linux kernel.  You cannot
map any RBDs that have these features enabled with any kernel
older than that.

The features you can enable are layering, exclusive-lock,
object-map, and fast-diff.  You cannot enable deep-flatten on
any RBD ever.  RBD's can only be created with that feature.  You
may need to enable these in a specific order.  I believe that
the order I have the features listed is the order you need to
enable them in, at least that order should work.

On Fri, Jun 23, 2017 at 3:41 PM Massimiliano Cuttini
<m...@phoenixweb.it <mailto:m...@phoenixweb.it>> wrote:

Hi everybody,

I just realize that all my Images are completly without
features:

rbd info VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4
rbd image 'VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4':
 size 102400 MB in 51200 objects
 order 21 (2048 kB objects)
 block_name_prefix: rbd_data.5fde2ae8944a
 format: 2
 features:
 flags:

try to enabling them will get this error:

rbd: failed to update image features: (22) Invalid argument
2017-06-23 21:20:03.748746 7fdec1b34d80 -1 librbd: cannot update 
immutable features

I read on the guide I shoulded had place in the
config|rbd_default_features|

What can I do now to enable this feature all feature of
jewel on all images?
Can I insert all the feature of jewel or is there any issue
with old kernel?

Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com







___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help needed rbd feature enable

2017-06-23 Thread Massimiliano Cuttini

I guess you updated those features before the commit that fixed this:

https://github.com/ceph/ceph/blob/master/src/include/rbd/features.h

As stated:

   // features that make an image inaccessible for read or write by
   /// clients that don't understand them
   #define RBD_FEATURES_INCOMPATIBLE   (RBD_FEATURE_LAYERING       | \
                                        RBD_FEATURE_STRIPINGV2     | \
                                        RBD_FEATURE_DATA_POOL)

   /// features that make an image unwritable by clients that don't understand them
   #define RBD_FEATURES_RW_INCOMPATIBLE (RBD_FEATURES_INCOMPATIBLE  | \
                                         RBD_FEATURE_EXCLUSIVE_LOCK | \
                                         RBD_FEATURE_OBJECT_MAP     | \
                                         RBD_FEATURE_FAST_DIFF      | \
                                         RBD_FEATURE_DEEP_FLATTEN   | \
                                         RBD_FEATURE_JOURNALING)


Some features are one-way: once you drop them, you cannot get them back.
As stated here: https://bugzilla.redhat.com/show_bug.cgi?id=1326645
this is the intended behaviour and Red Hat will not change it.

So don't disable your features, or you'll have to export/import all the images.

I'm going crazy.



Il 23/06/2017 22:36, David Turner ha scritto:


I upgraded to Jewel from Hammer and was able to enable those features 
on all of my rbds that were format 2, which yours is.  Just test it on 
some non customer data and see how it goes.



On Fri, Jun 23, 2017, 4:33 PM Massimiliano Cuttini <m...@phoenixweb.it 
<mailto:m...@phoenixweb.it>> wrote:


Ok,

At moment my client use only nbd-rbd, can I use all these feature
or this is something unavoidable?
I guess it's ok.

Reading around seems that a lost feature cannot be re-enabled due
to back-compatibility with old clients.
... I guess I'll need to export and import in a new image fully
feature.
Is it?




Il 23/06/2017 22:25, David Turner ha scritto:

All of the features you are talking about likely require the
exclusive-lock which requires the 4.9 linux kernel.  You cannot
map any RBDs that have these features enabled with any kernel
older than that.

The features you can enable are layering, exclusive-lock,
object-map, and fast-diff.  You cannot enable deep-flatten on any
RBD ever.  RBD's can only be created with that feature.  You may
need to enable these in a specific order.  I believe that the
order I have the features listed is the order you need to enable
them in, at least that order should work.

On Fri, Jun 23, 2017 at 3:41 PM Massimiliano Cuttini
<m...@phoenixweb.it <mailto:m...@phoenixweb.it>> wrote:

Hi everybody,

I just realize that all my Images are completly without features:

rbd info VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4
rbd image 'VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4':
 size 102400 MB in 51200 objects
 order 21 (2048 kB objects)
 block_name_prefix: rbd_data.5fde2ae8944a
 format: 2
 features:
 flags:

try to enabling them will get this error:

rbd: failed to update image features: (22) Invalid argument
2017-06-23 21:20:03.748746 7fdec1b34d80 -1 librbd: cannot update 
immutable features

I read on the guide I shoulded had place in the
config|rbd_default_features|

What can I do now to enable this feature all feature of jewel
on all images?
Can I insert all the feature of jewel or is there any issue
with old kernel?

Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help needed rbd feature enable

2017-06-23 Thread Massimiliano Cuttini

Ok,

At the moment my clients only use rbd-nbd; can I use all these features, or is something
unavoidable?
I guess it's ok.

Reading around, it seems that a lost feature cannot be re-enabled, for backward compatibility
with old clients.
... I guess I'll need to export and import into a new image with all features enabled.
Is that right?




Il 23/06/2017 22:25, David Turner ha scritto:
All of the features you are talking about likely require the 
exclusive-lock which requires the 4.9 linux kernel. You cannot map any 
RBDs that have these features enabled with any kernel older than that.


The features you can enable are layering, exclusive-lock, object-map, 
and fast-diff.  You cannot enable deep-flatten on any RBD ever.  RBD's 
can only be created with that feature. You may need to enable these in 
a specific order.  I believe that the order I have the features listed 
is the order you need to enable them in, at least that order should work.


On Fri, Jun 23, 2017 at 3:41 PM Massimiliano Cuttini 
<m...@phoenixweb.it <mailto:m...@phoenixweb.it>> wrote:


Hi everybody,

I just realize that all my Images are completly without features:

rbd info VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4
rbd image 'VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4':
 size 102400 MB in 51200 objects
 order 21 (2048 kB objects)
 block_name_prefix: rbd_data.5fde2ae8944a
 format: 2
 features:
 flags:

try to enabling them will get this error:

rbd: failed to update image features: (22) Invalid argument
2017-06-23 21:20:03.748746 7fdec1b34d80 -1 librbd: cannot update 
immutable features

I read on the guide I shoulded had place in the
config|rbd_default_features|

What can I do now to enable this feature all feature of jewel on
all images?
Can I insert all the feature of jewel or is there any issue with
old kernel?

Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Which one should I sacrifice: Tunables or Kernel-rbd?

2017-06-23 Thread Massimiliano Cuttini

Ok,

so if I understand your opinion correctly: if you cannot choose the kernel, then you sacrifice
kernel-rbd immediately.

I was of the same opinion, but I'm still gathering opinions.

Can you tell me whether, by using rbd-nbd, I lose any features?
I just cannot understand whether nbd is a sort of "virtualized driver" that runs Ceph through a
less-featured, standardized driver, or whether the kernel client and nbd differ only in speed
(assuming the latest kernel on both sides).

Thanks Turner for any further info!
Max
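
To make the question concrete, this is how I use rbd-nbd today -- a minimal sketch, pool and
image names are just examples. Since rbd-nbd goes through librbd in user space, my understanding
is that it supports whatever features the installed librbd supports, independently of the kernel:

    rbd-nbd map rbd/VHD-test        # prints the device it creates, e.g. /dev/nbd0
    mount /dev/nbd0 /mnt            # use it like any other block device
    umount /mnt
    rbd-nbd unmap /dev/nbd0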



Il 23/06/2017 18:21, David Turner ha scritto:
If you have no control over what kernel the clients are going to use, 
then I wouldn't even consider using the kernel driver for the 
clients.  For me, I would do anything to maintain the ability to use 
the object map which would require the 4.9 kernel to use with the 
kernel driver.  Because of this and similar improvements to ceph that 
the kernel is requiring newer and newer versions to utilize, I've 
become a strong proponent of utilizing the fuse, rgw, and 
librados/librbd client options to keep my clients in feature parity 
with my cluster's ceph version.


On Fri, Jun 23, 2017 at 11:50 AM Massimiliano Cuttini 
<m...@phoenixweb.it <mailto:m...@phoenixweb.it>> wrote:


Not all server are real centOS servers.
Some of them are dedicated distribution locked at 7.2 with locked
kernel
fixed at 3.10.
Which as far as I can understand need CRUSH_TUNABLES2 and not even 3!


http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client

So what are you suggest to sacrifice?
Kernel-RBD or CRUSH_TUNABLE > 2?



Il 23/06/2017 14:51, Jason Dillaman ha scritto:
> CentOS 7.3's krbd supports Jewel tunables (CRUSH_TUNABLES5) and does
> not support NBD since that driver is disabled out-of-the-box. As an
> alternative for NBD, the goal is to also offer LIO/TCMU starting
with
> Luminous and the next point release of CentOS (or a vanilla
>=4.12-ish
> kernel).
    >
> On Fri, Jun 23, 2017 at 5:31 AM, Massimiliano Cuttini
<m...@phoenixweb.it <mailto:m...@phoenixweb.it>> wrote:
>> Dear all,
>>
>> running all server and clients a centOS release with a kernel
3.10.* I'm
>> facing this choiche:
>>
>> sacrifice TUNABLES and downgrade all the cluster to
>> CEPH_FEATURE_CRUSH_TUNABLES3 (which should be the right profile
for jewel on
>> old kernel 3.10)
>> sacrifice KERNEL RBD and map Ceph by NBD
>>
>> Which one should I sacrifice? And why?
>> Let me know your througth, pro & cons.
>>
>>
>> Thanks,
>> Max
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>

___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Squeezing Performance of CEPH

2017-06-23 Thread Massimiliano Cuttini

Ok,

I get the point.


Il 23/06/2017 17:42, Ashley Merrick ha scritto:
But your then have a very mismatch of performance across your OSD’s 
which is never recommend by CEPH.


It’s all about what you can do with your current boxes capacity to 
increase performance across the whole OSD set.


,Ashley

Sent from my iPhone

On 23 Jun 2017, at 10:40 PM, Massimiliano Cuttini <m...@phoenixweb.it 
<mailto:m...@phoenixweb.it>> wrote:



Ashley,

but.. instead of use NVMe as a journal, why don't add 2 OSD to the 
cluster?

Incresing number of OSD instead of improving performance of actual OSD?



Il 23/06/2017 15:40, Ashley Merrick ha scritto:

Sorry for the not inline reply.

If you can get 6 OSD’s per a NVME as long as your getting a decent 
rated NVME your bottle neck will be the NVME but will still improve 
over your current bottle neck.


You could add two NVME OSD’s, but their higher performance would be 
lost along with the other 12 OSD’s.


,Ashley

Sent from my iPhone

On 23 Jun 2017, at 8:34 PM, Massimiliano Cuttini <m...@phoenixweb.it 
<mailto:m...@phoenixweb.it>> wrote:



Hi Ashley,

You could move your Journal to another SSD this would remove the 
double write. 
If I move the journal to another SSD, I will loss an available OSD, 
so this is likely to say improve of *x2* and then decrease of *x½ *...
this should not improve performance in any case on a full SSD disks 
system.


Ideally you’d want one or two PCIe NVME in the servers for the 
Journal.
This seems a really good Idea, but image that I have only 2 slots 
for PCIe and 12 SSD disks.
I image that it's will not be possible place 12 Journal on 2 PCIe 
NVME without loss performance. or yes?


Or if you can hold off a bit then bluestore, which removes the 
double write, however is still handy to move some of the services 
to a seperate disk.
I hear that bluestore will remove double writing on journal (still 
not investigated), but I guess Luminous will be fully tested not 
before the end of the year.
About the today system really don't know if moving on a separate 
disks will have some impact considering that this is a full SSD 
disks system.


Even adding 2 PCIe NVME why should not use them as a OSD 
instead of journal solo?




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Help needed rbd feature enable

2017-06-23 Thread Massimiliano Cuttini

Hi everybody,

I just realized that all my images are completely without features:

   rbd info VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4
   rbd image 'VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4':
size 102400 MB in 51200 objects
order 21 (2048 kB objects)
block_name_prefix: rbd_data.5fde2ae8944a
format: 2
features:
flags:

Trying to enable them gives this error:

   rbd: failed to update image features: (22) Invalid argument
   2017-06-23 21:20:03.748746 7fdec1b34d80 -1 librbd: cannot update immutable 
features

I read in the guide that I should have set rbd_default_features in the config.

What can I do now to enable all the Jewel features on all images?
Can I enable all the Jewel features, or is there any issue with old kernels?
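
In other words, is it just a matter of running something like the following on each image
(a sketch -- I understand that deep-flatten can only be set at image creation time)?

    rbd feature enable VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4 exclusive-lock
    rbd feature enable VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4 object-map
    rbd feature enable VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4 fast-diff
    rbd info VHD-4c7ebb38-b081-48da-9b57-aac14bdf88c4          # check the features line again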

Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Squeezing Performance of CEPH

2017-06-23 Thread Massimiliano Cuttini

Hi everybody,

I also see that a VM on top of this storage gets an even lower speed:

    hdparm -Tt --direct /dev/xvdb

    /dev/xvdb:
     Timing O_DIRECT cached reads:   2596 MB in  2.00 seconds = 1297.42 MB/sec
     Timing O_DIRECT disk reads:      910 MB in  3.00 seconds =  303.17 MB/sec

It seems there is a huge difference between dom0 speed and VM speed.
Is it normal?
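
For a fairer comparison than hdparm (which does a single-threaded, queue-depth-1 sequential
read), I could run something like this both in dom0 and inside the guest -- a sketch, the device
name is just an example and reads are non-destructive:

    fio --name=seqread --filename=/dev/xvdb --rw=read --bs=4M \
        --direct=1 --ioengine=libaio --iodepth=16 --runtime=30 --time_based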



Il 23/06/2017 10:24, Massimiliano Cuttini ha scritto:


Hi Mark,

having 2 node for testing allow me to downgrade the replication to 2x 
(till the production).

SSD have the following product details:

  * sequential read: 540MB/sec
  * sequential write: 520MB/sec

As you state my sequential write should be:

~600 * 2 (copies) * 2 (journal write per copy) / 8 (ssds) =
~225,25MB/s

If you think that 2 copies should be *simultaneously *on different 
cards/networks/nodes my calculation are:


~600 * 2 (journal write per copy) / 8 (ssds) = ~112,625MB/s

So yes, I think that they are terrible low (but maybe I miss 
something), about 20,8% of the theorical speed of an SSD.

Sequential Read are quite low too.
Maybe only Random Read is good.

Any suggestion?



Il 22/06/2017 19:41, Mark Nelson ha scritto:

Hello Massimiliano,

Based on the configuration below, it appears you have 8 SSDs total (2 
nodes with 4 SSDs each)?


I'm going to assume you have 3x replication and are you using 
filestore, so in reality you are writing 3 copies and doing full data 
journaling for each copy, for 6x writes per client write. Taking this 
into account, your per-SSD throughput should be somewhere around:


Sequential write:
~600 * 3 (copies) * 2 (journal write per copy) / 8 (ssds) = ~450MB/s

Sequential read
~3000 / 8 (ssds) = ~375MB/s

Random read
~3337 / 8 (ssds) = ~417MB/s

These numbers are pretty reasonable for SATA based SSDs, though the 
read throughput is a little low.  You didn't include the model of 
SSD, but if you look at Intel's DC S3700 which is a fairly popular 
SSD for ceph:


https://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3700-spec.html 



Sequential read is up to ~500MB/s and Sequential write speeds up to 
460MB/s.  Not too far off from what you are seeing.  You might try 
playing with readahead on the OSD devices to see if that improves 
things at all.  Still, unless I've missed something these numbers 
aren't terrible.


Mark

On 06/22/2017 12:19 PM, Massimiliano Cuttini wrote:

Hi everybody,

I want to squeeze all the performance of CEPH (we are using jewel 
10.2.7).

We are testing a testing environment with 2 nodes having the same
configuration:

  * CentOS 7.3
  * 24 CPUs (12 for real in hyper threading)
  * 32Gb of RAM
  * 2x 100Gbit/s ethernet cards
  * 2x OS dedicated in raid SSD Disks
  * 4x OSD SSD Disks SATA 6Gbit/s

We are already expecting the following bottlenecks:

  * [ SATA speed x n° disks ] = 24Gbit/s
  * [ Networks speed x n° bonded cards ] = 200Gbit/s

So the minimum between them is 24 Gbit/s per node (not taking in 
account

protocol loss).

24Gbit/s per node x2 = 48Gbit/s of maximum hypotetical theorical gross
speed.

Here are the tests:
///IPERF2/// Tests are quite good scoring 88% of the 
bottleneck.
Note: iperf2 can use only 1 connection from a bond.(it's a well know 
issue).


[ ID] Interval   Transfer Bandwidth
[ 12]  0.0-10.0 sec  9.55 GBytes  8.21 Gbits/sec
[  3]  0.0-10.0 sec  10.3 GBytes  8.81 Gbits/sec
[  5]  0.0-10.0 sec  9.54 GBytes  8.19 Gbits/sec
[  7]  0.0-10.0 sec  9.52 GBytes  8.18 Gbits/sec
[  6]  0.0-10.0 sec  9.96 GBytes  8.56 Gbits/sec
[  8]  0.0-10.0 sec  12.1 GBytes  10.4 Gbits/sec
[  9]  0.0-10.0 sec  12.3 GBytes  10.6 Gbits/sec
[ 10]  0.0-10.0 sec  10.2 GBytes  8.80 Gbits/sec
[ 11]  0.0-10.0 sec  9.34 GBytes  8.02 Gbits/sec
[  4]  0.0-10.0 sec  10.3 GBytes  8.82 Gbits/sec
[SUM]  0.0-10.0 sec   103 GBytes  88.6 Gbits/sec

///RADOS BENCH

Take in consideration the maximum hypotetical speed of 48Gbit/s tests
(due to disks bottleneck), tests are not good enought.

  * Average MB/s in write is almost 5-7Gbit/sec (12,5% of the mhs)
  * Average MB/s in seq read is almost 24Gbit/sec (50% of the mhs)
  * Average MB/s in random read is almost 27Gbit/se (56,25% of the 
mhs).


Here are the reports.
Write:

# rados bench -p scbench 10 write --no-cleanup
Total time run: 10.229369
Total writes made:  1538
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 601.406
Stddev Bandwidth:   357.012
Max bandwidth (MB/sec): 1080
Min bandwidth (MB/sec): 204
Average IOPS:   150
Stddev IOPS:89
Max IOPS:   270
Min IOPS:   51
Average Latency(s): 0.106218
Stddev Latency(s):  0.198735
Max latency(s): 1.87401
Min latency(s): 0.0225438

sequential read:

# rados bench -p scbench 10 seq
Total time run:   2.054359
Total reads made: 1538
Read

Re: [ceph-users] Which one should I sacrifice: Tunables or Kernel-rbd?

2017-06-23 Thread Massimiliano Cuttini

Not all servers are plain CentOS servers.
Some of them are dedicated distributions locked at 7.2, with the kernel fixed at 3.10,
which as far as I can understand supports only CRUSH_TUNABLES2, not even 3!

http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client

So what do you suggest sacrificing?
Kernel RBD, or CRUSH tunables > 2?
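
If I went the tunables route, my understanding is that it would be something like the sketch
below -- and that switching profiles triggers a rebalance of basically the whole cluster, so it
is not a free operation:

    ceph osd crush show-tunables      # current profile
    ceph osd crush tunables bobtail   # profile old 3.10-era kernel clients should understand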



Il 23/06/2017 14:51, Jason Dillaman ha scritto:

CentOS 7.3's krbd supports Jewel tunables (CRUSH_TUNABLES5) and does
not support NBD since that driver is disabled out-of-the-box. As an
alternative for NBD, the goal is to also offer LIO/TCMU starting with
Luminous and the next point release of CentOS (or a vanilla >=4.12-ish
kernel).

On Fri, Jun 23, 2017 at 5:31 AM, Massimiliano Cuttini <m...@phoenixweb.it> 
wrote:

Dear all,

running all server and clients a centOS release with a kernel 3.10.* I'm
facing this choiche:

sacrifice TUNABLES and downgrade all the cluster to
CEPH_FEATURE_CRUSH_TUNABLES3 (which should be the right profile for jewel on
old kernel 3.10)
sacrifice KERNEL RBD and map Ceph by NBD

Which one should I sacrifice? And why?
Let me know your througth, pro & cons.


Thanks,
Max



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Squeezing Performance of CEPH

2017-06-23 Thread Massimiliano Cuttini

Ashley,

but... instead of using the NVMe as a journal, why not add 2 more OSDs to the cluster?
That is, increase the number of OSDs instead of improving the performance of the current OSDs?



Il 23/06/2017 15:40, Ashley Merrick ha scritto:

Sorry for the not inline reply.

If you can get 6 OSD’s per a NVME as long as your getting a decent 
rated NVME your bottle neck will be the NVME but will still improve 
over your current bottle neck.


You could add two NVME OSD’s, but their higher performance would be 
lost along with the other 12 OSD’s.


,Ashley

Sent from my iPhone

On 23 Jun 2017, at 8:34 PM, Massimiliano Cuttini <m...@phoenixweb.it 
<mailto:m...@phoenixweb.it>> wrote:



Hi Ashley,

You could move your Journal to another SSD this would remove the 
double write. 
If I move the journal to another SSD, I will loss an available OSD, 
so this is likely to say improve of *x2* and then decrease of *x½ *...
this should not improve performance in any case on a full SSD disks 
system.



Ideally you’d want one or two PCIe NVME in the servers for the Journal.
This seems a really good Idea, but image that I have only 2 slots for 
PCIe and 12 SSD disks.
I image that it's will not be possible place 12 Journal on 2 PCIe 
NVME without loss performance. or yes?


Or if you can hold off a bit then bluestore, which removes the 
double write, however is still handy to move some of the services to 
a seperate disk.
I hear that bluestore will remove double writing on journal (still 
not investigated), but I guess Luminous will be fully tested not 
before the end of the year.
About the today system really don't know if moving on a separate 
disks will have some impact considering that this is a full SSD disks 
system.


Even adding 2 PCIe NVME why should not use them as a OSD instead 
of journal solo?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Squeezing Performance of CEPH

2017-06-23 Thread Massimiliano Cuttini

Hi Ashley,

You could move your Journal to another SSD this would remove the 
double write. 
If I move the journal to another SSD, I lose an available OSD, so this would roughly improve
things by *x2* and then reduce them by *x1/2*...
It should not improve performance in any case on an all-SSD system.


Ideally you’d want one or two PCIe NVME in the servers for the Journal.
This seems a really good idea, but consider that I have only 2 PCIe slots and 12 SSD disks.
I imagine it will not be possible to put 12 journals on 2 PCIe NVMe devices without losing
performance... or will it?


Or if you can hold off a bit then bluestore, which removes the double 
write, however is still handy to move some of the services to a 
seperate disk.
I hear that BlueStore will remove the double write to the journal (I have not investigated it
yet), but I guess Luminous will not be fully tested before the end of the year.
As for the current system, I really don't know whether moving the journals to separate disks
will have much impact, considering that this is an all-SSD system.


Even if I add 2 PCIe NVMe devices, why shouldn't I use them as OSDs instead of journal-only
devices?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Squeezing Performance of CEPH

2017-06-23 Thread Massimiliano Cuttini

*Of course yes!*

The SSD bottleneck is the SATA controller.
With an NVMe/PCIe controller you get roughly 2400 MB/sec from almost the same SSD instead of
580 MB/sec:

    2400 MB/sec x 8 bit/byte = ~19 Gbit/sec per device
     580 MB/sec x 8 bit/byte = ~4.6 Gbit/sec per device

If you don't trust me, take a look at this benchmark between two really common consumer SSDs:

http://ssd.userbenchmark.com/Compare/Samsung-960-Pro-NVMe-PCIe-M2-512GB-vs-Samsung-850-Pro-256GB/m182182vs2385

So of course, when talking about SATA SSDs, your speed is roughly capped by the SATA controller
(minus protocol overhead).



Il 22/06/2017 20:35, c...@jack.fr.eu.org ha scritto:

On 22/06/2017 19:19, Massimiliano Cuttini wrote:

We are already expecting the following bottlenecks:

  * [ SATA speed x n° disks ] = 24Gbit/s
  * [ Networks speed x n° bonded cards ] = 200Gbit/s

6Gbps SATA does not mean you can read 6Gbps from that device

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Squeezing Performance of CEPH

2017-06-23 Thread Massimiliano Cuttini

Very good to know!
Thanks for the info.


Il 22/06/2017 20:15, Maged Mokhtar ha scritto:


Generally you can measure your bottleneck via a tool like 
atop/collectl/sysstat  and see how busy (ie %busy, %util ) your 
resources are: cpu/disks/net.


As was pointed out, in your case you will most probably have maxed out 
on your disks. But the above tools should help as you grow and tune 
your cluster.


Cheers,

Maged Mokhtar

PetaSAN

On 2017-06-22 19:19, Massimiliano Cuttini wrote:


Hi everybody,

I want to squeeze all the performance of CEPH (we are using jewel 
10.2.7).
We are testing a testing environment with 2 nodes having the same 
configuration:


  * CentOS 7.3
  * 24 CPUs (12 for real in hyper threading)
  * 32Gb of RAM
  * 2x 100Gbit/s ethernet cards
  * 2x OS dedicated in raid SSD Disks
  * 4x OSD SSD Disks SATA 6Gbit/s

We are already expecting the following bottlenecks:

  * [ SATA speed x n° disks ] = 24Gbit/s
  * [ Networks speed x n° bonded cards ] = 200Gbit/s

So the minimum between them is 24 Gbit/s per node (not taking in 
account protocol loss).


24Gbit/s per node x2 = 48Gbit/s of maximum hypotetical theorical 
gross speed.


Here are the tests:
///IPERF2/// Tests are quite good scoring 88% of the bottleneck.
Note: iperf2 can use only 1 connection from a bond.(it's a well know 
issue).



[ ID] Interval   Transfer Bandwidth
[ 12]  0.0-10.0 sec  9.55 GBytes  8.21 Gbits/sec
[  3]  0.0-10.0 sec  10.3 GBytes  8.81 Gbits/sec
[  5]  0.0-10.0 sec  9.54 GBytes  8.19 Gbits/sec
[  7]  0.0-10.0 sec  9.52 GBytes  8.18 Gbits/sec
[  6]  0.0-10.0 sec  9.96 GBytes  8.56 Gbits/sec
[  8]  0.0-10.0 sec  12.1 GBytes  10.4 Gbits/sec
[  9]  0.0-10.0 sec  12.3 GBytes  10.6 Gbits/sec
[ 10]  0.0-10.0 sec  10.2 GBytes  8.80 Gbits/sec
[ 11]  0.0-10.0 sec  9.34 GBytes  8.02 Gbits/sec
[  4]  0.0-10.0 sec  10.3 GBytes  8.82 Gbits/sec
[SUM]  0.0-10.0 sec   103 GBytes  88.6 Gbits/sec


///RADOS BENCH

Take in consideration the maximum hypotetical speed of 48Gbit/s tests 
(due to disks bottleneck), tests are not good enought.


  * Average MB/s in write is almost 5-7Gbit/sec (12,5% of the mhs)
  * Average MB/s in seq read is almost 24Gbit/sec (50% of the mhs)
  * Average MB/s in random read is almost 27Gbit/se (56,25% of the mhs).

Here are the reports.
Write:


# rados bench -p scbench 10 write --no-cleanup
Total time run: 10.229369
Total writes made:  1538
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 601.406
Stddev Bandwidth:   357.012
Max bandwidth (MB/sec): 1080
Min bandwidth (MB/sec): 204
Average IOPS:   150
Stddev IOPS:89
Max IOPS:   270
Min IOPS:   51
Average Latency(s): 0.106218
Stddev Latency(s):  0.198735
Max latency(s): 1.87401
Min latency(s): 0.0225438


sequential read:


# rados bench -p scbench 10 seq
Total time run:   2.054359
Total reads made: 1538
Read size:4194304
Object size:  4194304
Bandwidth (MB/sec):   2994.61
Average IOPS  748
Stddev IOPS:  67
Max IOPS: 802
Min IOPS: 707
Average Latency(s):   0.0202177
Max latency(s):   0.223319
Min latency(s):   0.00589238


random read:


# rados bench -p scbench 10 rand
Total time run:   10.036816
Total reads made: 8375
Read size:4194304
Object size:  4194304
Bandwidth (MB/sec):   3337.71
Average IOPS: 834
Stddev IOPS:  78
Max IOPS: 927
Min IOPS: 741
Average Latency(s):   0.0182707
Max latency(s):   0.257397
Min latency(s):   0.00469212


//

It's seems like that there are some bottleneck somewhere that we are 
understimating.

Can you help me to found it?



___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Squeezing Performance of CEPH

2017-06-23 Thread Massimiliano Cuttini

Hi Mark,

Having 2 nodes for testing allows me to lower the replication to 2x (until production).

The SSDs have the following product specs:

 * sequential read: 540 MB/sec
 * sequential write: 520 MB/sec

Following your formula, with 2 copies my per-SSD sequential write load should be:

   ~600 * 2 (copies) * 2 (journal write per copy) / 8 (ssds) = ~300 MB/s

If you consider that the 2 copies are written *simultaneously* on different
cards/networks/nodes, my calculation becomes:

   ~600 * 2 (journal write per copy) / 8 (ssds) = ~150 MB/s

So yes, I think the numbers are terribly low (but maybe I am missing something): about 29% of
the theoretical write speed of an SSD.

Sequential read is quite low too.
Maybe only random read is good.

Any suggestion?
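
About the readahead idea: I understand it would be something like this per OSD data disk
(device name and values are just examples):

    blockdev --getra /dev/sdb                        # current readahead, in 512-byte sectors
    blockdev --setra 4096 /dev/sdb                   # 4096 sectors = 2 MB
    echo 2048 > /sys/block/sdb/queue/read_ahead_kb   # equivalent sysfs knob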



Il 22/06/2017 19:41, Mark Nelson ha scritto:

Hello Massimiliano,

Based on the configuration below, it appears you have 8 SSDs total (2 
nodes with 4 SSDs each)?


I'm going to assume you have 3x replication and are you using 
filestore, so in reality you are writing 3 copies and doing full data 
journaling for each copy, for 6x writes per client write. Taking this 
into account, your per-SSD throughput should be somewhere around:


Sequential write:
~600 * 3 (copies) * 2 (journal write per copy) / 8 (ssds) = ~450MB/s

Sequential read
~3000 / 8 (ssds) = ~375MB/s

Random read
~3337 / 8 (ssds) = ~417MB/s

These numbers are pretty reasonable for SATA based SSDs, though the 
read throughput is a little low.  You didn't include the model of SSD, 
but if you look at Intel's DC S3700 which is a fairly popular SSD for 
ceph:


https://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3700-spec.html 



Sequential read is up to ~500MB/s and Sequential write speeds up to 
460MB/s.  Not too far off from what you are seeing.  You might try 
playing with readahead on the OSD devices to see if that improves 
things at all.  Still, unless I've missed something these numbers 
aren't terrible.


Mark

On 06/22/2017 12:19 PM, Massimiliano Cuttini wrote:

Hi everybody,

I want to squeeze all the performance of CEPH (we are using jewel 
10.2.7).

We are testing a testing environment with 2 nodes having the same
configuration:

  * CentOS 7.3
  * 24 CPUs (12 for real in hyper threading)
  * 32Gb of RAM
  * 2x 100Gbit/s ethernet cards
  * 2x OS dedicated in raid SSD Disks
  * 4x OSD SSD Disks SATA 6Gbit/s

We are already expecting the following bottlenecks:

  * [ SATA speed x n° disks ] = 24Gbit/s
  * [ Networks speed x n° bonded cards ] = 200Gbit/s

So the minimum between them is 24 Gbit/s per node (not taking in account
protocol loss).

24Gbit/s per node x2 = 48Gbit/s of maximum hypotetical theorical gross
speed.

Here are the tests:
///IPERF2/// Tests are quite good scoring 88% of the bottleneck.
Note: iperf2 can use only 1 connection from a bond.(it's a well know 
issue).


[ ID] Interval   Transfer Bandwidth
[ 12]  0.0-10.0 sec  9.55 GBytes  8.21 Gbits/sec
[  3]  0.0-10.0 sec  10.3 GBytes  8.81 Gbits/sec
[  5]  0.0-10.0 sec  9.54 GBytes  8.19 Gbits/sec
[  7]  0.0-10.0 sec  9.52 GBytes  8.18 Gbits/sec
[  6]  0.0-10.0 sec  9.96 GBytes  8.56 Gbits/sec
[  8]  0.0-10.0 sec  12.1 GBytes  10.4 Gbits/sec
[  9]  0.0-10.0 sec  12.3 GBytes  10.6 Gbits/sec
[ 10]  0.0-10.0 sec  10.2 GBytes  8.80 Gbits/sec
[ 11]  0.0-10.0 sec  9.34 GBytes  8.02 Gbits/sec
[  4]  0.0-10.0 sec  10.3 GBytes  8.82 Gbits/sec
[SUM]  0.0-10.0 sec   103 GBytes  88.6 Gbits/sec

///RADOS BENCH

Take in consideration the maximum hypotetical speed of 48Gbit/s tests
(due to disks bottleneck), tests are not good enought.

  * Average MB/s in write is almost 5-7Gbit/sec (12,5% of the mhs)
  * Average MB/s in seq read is almost 24Gbit/sec (50% of the mhs)
  * Average MB/s in random read is almost 27Gbit/se (56,25% of the mhs).

Here are the reports.
Write:

# rados bench -p scbench 10 write --no-cleanup
Total time run: 10.229369
Total writes made:  1538
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 601.406
Stddev Bandwidth:   357.012
Max bandwidth (MB/sec): 1080
Min bandwidth (MB/sec): 204
Average IOPS:   150
Stddev IOPS:89
Max IOPS:   270
Min IOPS:   51
Average Latency(s): 0.106218
Stddev Latency(s):  0.198735
Max latency(s): 1.87401
Min latency(s): 0.0225438

sequential read:

# rados bench -p scbench 10 seq
Total time run:   2.054359
Total reads made: 1538
Read size:4194304
Object size:  4194304
Bandwidth (MB/sec):   2994.61
Average IOPS  748
Stddev IOPS:  67
Max IOPS: 802
Min IOPS: 707
Average Latency(s):   0.0202177
Max latency(s):   0.223319
Min latency(s):   0.00589238

random read:

# rados bench -p scbench 10 rand
Total time run:   10.036816
Total

Re: [ceph-users] Squeezing Performance of CEPH

2017-06-23 Thread Massimiliano Cuttini

Hi Ashley,

I know; I was already expecting the bottleneck to be the minimum of network bandwidth and disks
(and it is currently the disks, as in my first email).

I still think the write numbers are too low.

I read that removing the journal is not a good idea.
However, I'm writing everything twice on an SSD... that does not seem like a good idea either.
How is it possible to remove this overhead?
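
One option mentioned in this thread is to keep filestore but move the journals off the data
SSDs, e.g. onto an NVMe partition. A sketch of how I understand that is provisioned with
ceph-deploy (host, disk and journal device names are just examples); it moves the journal load
off the data SSD but does not remove the double write itself:

    # data on sdb, journal on a partition of the NVMe device
    ceph-deploy osd create node1:sdb:/dev/nvme0n1p1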



Il 22/06/2017 19:47, Ashley Merrick ha scritto:

Hello,

Also as Mark put, one minute your testing bandwidth capacity, next 
minute your testing disk capacity.


No way is a small set of SSD’s going to be able to max your current 
bandwidth, even if you removed the CEPH / Journal overhead. I would 
say the speeds you are getting are what you should expect , see with 
many other setups.


,Ashley

Sent from my iPhone

On 23 Jun 2017, at 12:42 AM, Mark Nelson <mnel...@redhat.com 
<mailto:mnel...@redhat.com>> wrote:



Hello Massimiliano,

Based on the configuration below, it appears you have 8 SSDs total (2 
nodes with 4 SSDs each)?


I'm going to assume you have 3x replication and are you using 
filestore, so in reality you are writing 3 copies and doing full data 
journaling for each copy, for 6x writes per client write.  Taking 
this into account, your per-SSD throughput should be somewhere around:


Sequential write:
~600 * 3 (copies) * 2 (journal write per copy) / 8 (ssds) = ~450MB/s

Sequential read
~3000 / 8 (ssds) = ~375MB/s

Random read
~3337 / 8 (ssds) = ~417MB/s

These numbers are pretty reasonable for SATA based SSDs, though the 
read throughput is a little low.  You didn't include the model of 
SSD, but if you look at Intel's DC S3700 which is a fairly popular 
SSD for ceph:


https://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3700-spec.html

Sequential read is up to ~500MB/s and Sequential write speeds up to 
460MB/s.  Not too far off from what you are seeing.  You might try 
playing with readahead on the OSD devices to see if that improves 
things at all.  Still, unless I've missed something these numbers 
aren't terrible.


Mark

On 06/22/2017 12:19 PM, Massimiliano Cuttini wrote:

Hi everybody,

I want to squeeze all the performance of CEPH (we are using jewel 
10.2.7).

We are testing a testing environment with 2 nodes having the same
configuration:

 * CentOS 7.3
 * 24 CPUs (12 for real in hyper threading)
 * 32Gb of RAM
 * 2x 100Gbit/s ethernet cards
 * 2x OS dedicated in raid SSD Disks
 * 4x OSD SSD Disks SATA 6Gbit/s

We are already expecting the following bottlenecks:

 * [ SATA speed x n° disks ] = 24Gbit/s
 * [ Networks speed x n° bonded cards ] = 200Gbit/s

So the minimum between them is 24 Gbit/s per node (not taking in account
protocol loss).

24Gbit/s per node x2 = 48Gbit/s of maximum hypotetical theorical gross
speed.

Here are the tests:
///IPERF2/// Tests are quite good scoring 88% of the bottleneck.
Note: iperf2 can use only 1 connection from a bond.(it's a well know 
issue).


   [ ID] Interval   Transfer Bandwidth
   [ 12]  0.0-10.0 sec  9.55 GBytes  8.21 Gbits/sec
   [  3]  0.0-10.0 sec  10.3 GBytes  8.81 Gbits/sec
   [  5]  0.0-10.0 sec  9.54 GBytes  8.19 Gbits/sec
   [  7]  0.0-10.0 sec  9.52 GBytes  8.18 Gbits/sec
   [  6]  0.0-10.0 sec  9.96 GBytes  8.56 Gbits/sec
   [  8]  0.0-10.0 sec  12.1 GBytes  10.4 Gbits/sec
   [  9]  0.0-10.0 sec  12.3 GBytes  10.6 Gbits/sec
   [ 10]  0.0-10.0 sec  10.2 GBytes  8.80 Gbits/sec
   [ 11]  0.0-10.0 sec  9.34 GBytes  8.02 Gbits/sec
   [  4]  0.0-10.0 sec  10.3 GBytes  8.82 Gbits/sec
   [SUM]  0.0-10.0 sec   103 GBytes  88.6 Gbits/sec

///RADOS BENCH

Take in consideration the maximum hypotetical speed of 48Gbit/s tests
(due to disks bottleneck), tests are not good enought.

 * Average MB/s in write is almost 5-7Gbit/sec (12,5% of the mhs)
 * Average MB/s in seq read is almost 24Gbit/sec (50% of the mhs)
 * Average MB/s in random read is almost 27Gbit/se (56,25% of the mhs).

Here are the reports.
Write:

   # rados bench -p scbench 10 write --no-cleanup
   Total time run: 10.229369
   Total writes made:  1538
   Write size: 4194304
   Object size:4194304
   Bandwidth (MB/sec): 601.406
   Stddev Bandwidth:   357.012
   Max bandwidth (MB/sec): 1080
   Min bandwidth (MB/sec): 204
   Average IOPS:   150
   Stddev IOPS:89
   Max IOPS:   270
   Min IOPS:   51
   Average Latency(s): 0.106218
   Stddev Latency(s):  0.198735
   Max latency(s): 1.87401
   Min latency(s): 0.0225438

sequential read:

   # rados bench -p scbench 10 seq
   Total time run:   2.054359
   Total reads made: 1538
   Read size:4194304
   Object size:  4194304
   Bandwidth (MB/sec):   2994.61
   Average IOPS  748
   Stddev IOPS:  67
   Max IOPS: 802
   Min IOPS: 707
   Average Latency(s):   0.0202177
   Max latency(s):   0.223319
   Min latency(s):   0.00589238

[ceph-users] Which one should I sacrifice: Tunables or Kernel-rbd?

2017-06-23 Thread Massimiliano Cuttini

Dear all,

with all servers and clients running a CentOS release with a 3.10.* kernel, I'm facing this
choice:

 * sacrifice TUNABLES and downgrade the whole cluster to
   CEPH_FEATURE_CRUSH_TUNABLES3 (which should be the right profile for
   Jewel clients on the old 3.10 kernel)
 * sacrifice KERNEL RBD and map Ceph via NBD

Which one should I sacrifice? And why?
Let me know your thoughts, pros & cons.

Thanks,
Max


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Squeezing Performance of CEPH

2017-06-22 Thread Massimiliano Cuttini

Hi everybody,

I want to squeeze all the performance out of Ceph (we are using Jewel 10.2.7).
We are testing an environment with 2 nodes having the same
configuration:


 * CentOS 7.3
 * 24 CPUs (12 for real in hyper threading)
 * 32Gb of RAM
 * 2x 100Gbit/s ethernet cards
 * 2x OS dedicated in raid SSD Disks
 * 4x OSD SSD Disks SATA 6Gbit/s

We are already expecting the following bottlenecks:

 * [ SATA speed x n° disks ] = 24Gbit/s
 * [ Networks speed x n° bonded cards ] = 200Gbit/s

So the minimum of the two is 24 Gbit/s per node (not taking protocol overhead into account).

24 Gbit/s per node x 2 nodes = 48 Gbit/s of maximum theoretical gross speed.


Here are the tests:
///IPERF2/// Tests are quite good, scoring 88% of the bottleneck.
Note: iperf2 can use only 1 link of a bond (it's a well-known issue).

   [ ID] Interval   Transfer Bandwidth
   [ 12]  0.0-10.0 sec  9.55 GBytes  8.21 Gbits/sec
   [  3]  0.0-10.0 sec  10.3 GBytes  8.81 Gbits/sec
   [  5]  0.0-10.0 sec  9.54 GBytes  8.19 Gbits/sec
   [  7]  0.0-10.0 sec  9.52 GBytes  8.18 Gbits/sec
   [  6]  0.0-10.0 sec  9.96 GBytes  8.56 Gbits/sec
   [  8]  0.0-10.0 sec  12.1 GBytes  10.4 Gbits/sec
   [  9]  0.0-10.0 sec  12.3 GBytes  10.6 Gbits/sec
   [ 10]  0.0-10.0 sec  10.2 GBytes  8.80 Gbits/sec
   [ 11]  0.0-10.0 sec  9.34 GBytes  8.02 Gbits/sec
   [  4]  0.0-10.0 sec  10.3 GBytes  8.82 Gbits/sec
   [SUM]  0.0-10.0 sec   103 GBytes  88.6 Gbits/sec

///RADOS BENCH///

Taking into consideration the maximum hypothetical speed of 48 Gbit/s (due to the disk
bottleneck), the results are not good enough.

 * Average write bandwidth is roughly 5-7 Gbit/sec (about 12.5% of the mhs)
 * Average sequential read is almost 24 Gbit/sec (50% of the mhs)
 * Average random read is almost 27 Gbit/sec (56.25% of the mhs).

Here are the reports.
Write:

   # rados bench -p scbench 10 write --no-cleanup
   Total time run: 10.229369
   Total writes made:  1538
   Write size: 4194304
   Object size:4194304
   Bandwidth (MB/sec): 601.406
   Stddev Bandwidth:   357.012
   Max bandwidth (MB/sec): 1080
   Min bandwidth (MB/sec): 204
   Average IOPS:   150
   Stddev IOPS:89
   Max IOPS:   270
   Min IOPS:   51
   Average Latency(s): 0.106218
   Stddev Latency(s):  0.198735
   Max latency(s): 1.87401
   Min latency(s): 0.0225438

sequential read:

   # rados bench -p scbench 10 seq
   Total time run:   2.054359
   Total reads made: 1538
   Read size:4194304
   Object size:  4194304
   Bandwidth (MB/sec):   2994.61
   Average IOPS  748
   Stddev IOPS:  67
   Max IOPS: 802
   Min IOPS: 707
   Average Latency(s):   0.0202177
   Max latency(s):   0.223319
   Min latency(s):   0.00589238

random read:

   # rados bench -p scbench 10 rand
   Total time run:   10.036816
   Total reads made: 8375
   Read size:4194304
   Object size:  4194304
   Bandwidth (MB/sec):   3337.71
   Average IOPS: 834
   Stddev IOPS:  78
   Max IOPS: 927
   Min IOPS: 741
   Average Latency(s):   0.0182707
   Max latency(s):   0.257397
   Min latency(s):   0.00469212

It seems like there is some bottleneck somewhere that we are underestimating.

Can you help me find it?
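
As a first step in locating it, this is the kind of check I plan to run on each node while a
rados bench is in flight (assuming the sysstat package is installed):

    iostat -x 1      # per-disk utilisation (%util) and latency
    sar -n DEV 1     # per-NIC throughput
    top              # CPU saturation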




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph UPDATE (not upgrade)

2017-04-27 Thread Massimiliano Cuttini

Exactly,

I don't want to upgrade Ceph. Thanks for "yum update --exclude=ceph*".
That was exactly what I needed.

However, yum shows me many pending Ceph packages.
These can probably be considered "minor updates" rather than a major upgrade from one release
to another.
Is that right?

So the new question is: "Can I run yum to apply minor updates, or should I always use
ceph-deploy for both major and minor updates?"
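
To make the exclusion permanent, so that a plain "yum update" can never pull in Ceph by
accident, I understand something like this can go into the yum configuration (the package globs
are just an example, adjust them to what is actually installed):

    # /etc/yum.conf (or per-repo in /etc/yum.repos.d/ceph.repo)
    exclude=ceph* librados* librbd* python-rados python-rbd

    # later, when a Ceph update is really wanted:
    yum --disableexcludes=all update "ceph*"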


Thanks,
Max


Il 26/04/2017 19:29, David Turner ha scritto:
He's asking how NOT to upgrade Ceph, but to update the rest of the 
packages on his system.  In Ubuntu, you have to type `apt-get 
dist-upgrade` instead of just `apt-get upgrade` when you want to 
upgrade ceph.  That becomes a problem when trying to update the 
kernel, but not too bad.  I think in CentOS you need to do something 
like `yum update --exclude=ceph*`.  You should also be able to disable 
the packages in the repo files and make it so that you have to include 
the packages to update the ceph packages.


On Wed, Apr 26, 2017 at 1:12 PM German Anders <gand...@despegar.com 
<mailto:gand...@despegar.com>> wrote:


Hi Massimiiano,

I think you best go with the upgrade process from Ceph site, take
a look at it, since you need to do it in an specific order:

1. the MONs
2. the OSDs
3. the MDS
4. the Object gateways

http://docs.ceph.com/docs/master/install/upgrading-ceph/

it's better to do it like that and get things fine :)

hope it helps,

Best,


**

*German Anders*


    2017-04-26 11:21 GMT-03:00 Massimiliano Cuttini <m...@phoenixweb.it
<mailto:m...@phoenixweb.it>>:

On a Ceph Monitor/OSD server can i run just:

*yum update -y*

in order to upgrade system and packages or did this mess up Ceph?


___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph UPDATE (not upgrade)

2017-04-26 Thread Massimiliano Cuttini

On a Ceph monitor/OSD server, can I just run:

   *yum update -y*

in order to update the system packages, or will this mess up Ceph?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - reclaim free space - aka trim rbd image

2017-03-02 Thread Massimiliano Cuttini

Ah ...


Il 02/03/2017 15:56, Jason Dillaman ha scritto:

I'll refer you to the man page for blkdiscard [1]. Since it operates
on the block device, it doesn't know about filesystem holes and
instead will discard all data specified (i.e. it will delete all your
data).

[1] http://man7.org/linux/man-pages/man8/blkdiscard.8.html


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - reclaim free space - aka trim rbd image

2017-03-02 Thread Massimiliano Cuttini



Il 02/03/2017 14:11, Jason Dillaman ha scritto:

On Thu, Mar 2, 2017 at 8:09 AM, Massimiliano Cuttini <m...@phoenixweb.it> wrote:

Ok,

then, if the command comes from the hypervisor that hold the image is it
safe?

No, it needs to be issued from the guest VM -- not the hypervisor that
is running the guest VM. The reason is that it's a black box to the
hypervisor and it won't know what sectors can be safely discarded.

This is true if we talk about the filesystem.
So for the command

    fstrim

that would certainly be the case.
But if we talk about the block device, the command

    blkdiscard

cannot be run inside a VM that sees the image as a plain local disk without any thin
provisioning.

That command would have to be issued by the hypervisor, not the guest.

... or not?



But if the guest VM on the same Hypervisor try to using the image, what
happen?

If you trim from outside the guest, I would expect you to potentially
corrupt the image (if the fstrim tool doesn't stop you first since the
filesystem isn't mounted).


Ok, that makes sense for fstrim, but what about blkdiscard?


Are these safe tools? (aka: safely exit with error instead of try the
command and ruin the image?).
Should I consider a snapshot before go?


As I mentioned, the only safe way to proceed would be to run the trim
from within the guest VM or wait until Ceph adds the rbd CLI tooling
to safely sparsify an image.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - reclaim free space - aka trim rbd image

2017-03-02 Thread Massimiliano Cuttini

Ok,

then, if the command comes from the hypervisor that holds the image, is it safe?
And if the guest VM on the same hypervisor is trying to use the image at the same time, what
happens?
Are these safe tools? (i.e. do they exit safely with an error instead of running anyway and
ruining the image?)

Should I take a snapshot before I go ahead?




Il 02/03/2017 13:53, Jason Dillaman ha scritto:

In that case, the trim/discard requests would need to come directly
from the guest virtual machines to avoid damaging the filesystems. We
do have a backlog feature ticket [1] to allow an administrator to
transparently sparsify a in-use image via the rbd CLI, but no work has
been started on it yet.

[1] http://tracker.ceph.com/issues/13706

On Thu, Mar 2, 2017 at 5:16 AM, Massimiliano Cuttini <m...@phoenixweb.it> wrote:

Thanks Jason,

I need some further info, because I'm really worried about ruin my data.
On this pool I have only XEN virtual disks.
Did I have to run the command directly on the "pool" or on the "virtual
disks" ?

I guess that I have to run it on the pool.
As Admin I don't have access to local filesystem of the customer's virtual
disk and neither I can temporarly mount it to trim them.
Are my assumptions right?

Another info: did I need to umount the image from every device that is
actually using the image while I'm trimming it?

Thanks,
Max



Il 01/03/2017 20:11, Jason Dillaman ha scritto:

You should be able to issue an fstrim against the filesystem on top of
the nbd device or run blkdiscard against the raw device if you don't
have a filesystem.

On Wed, Mar 1, 2017 at 1:26 PM, Massimiliano Cuttini <m...@phoenixweb.it>
wrote:

Dear all,

i use the rbd-nbd connector.
Is there a way to reclaim free space from rbd image using this component
or
not?


Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - reclaim free space - aka trim rbd image

2017-03-02 Thread Massimiliano Cuttini

Thanks Jason,

I need some further info, because I'm really worried about ruining my data.
On this pool I have only XEN virtual disks.
Do I have to run the command directly on the "pool", or on the individual "virtual disks"?

I guess that I have to run it on the pool.
As the admin I don't have access to the local filesystem inside the customers' virtual disks,
and I cannot temporarily mount them to trim them either.
Are my assumptions right?

Another question: do I need to unmount the image from every device that is actually using it
while I'm trimming it?
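
For my own notes, the per-image options as I understand them from your reply -- a sketch,
mount points and device names are just examples:

    # inside the guest that has the RBD-backed filesystem mounted (no unmount needed):
    fstrim -v /

    # only for an image with no filesystem/data to keep, mapped somewhere and not in use:
    blkdiscard /dev/nbd0    # discards the whole device, i.e. everything on it

so there would be nothing to run on the "pool" itself.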


Thanks,
Max


Il 01/03/2017 20:11, Jason Dillaman ha scritto:

You should be able to issue an fstrim against the filesystem on top of
the nbd device or run blkdiscard against the raw device if you don't
have a filesystem.

On Wed, Mar 1, 2017 at 1:26 PM, Massimiliano Cuttini <m...@phoenixweb.it> wrote:

Dear all,

i use the rbd-nbd connector.
Is there a way to reclaim free space from rbd image using this component or
not?


Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph - reclaim free space - aka trim rbd image

2017-03-01 Thread Massimiliano Cuttini

Dear all,

I use the rbd-nbd connector.
Is there a way to reclaim free space from an rbd image using this component, or not?



Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery ceph cluster down OS corruption

2017-02-28 Thread Massimiliano Cuttini

Exactly what happened to me!
Very good.


Il 28/02/2017 12:49, Mehmet ha scritto:
I assume this is the right Way. I had done a Disaster Recovery test a 
few months ago with Jewel on an only OSD Server. Just reinstall the OS 
an than ceph. Do not Touch the osds they will automatically Start.


Am 27. Februar 2017 10:52:56 MEZ schrieb Massimiliano Cuttini 
<m...@phoenixweb.it>:


It happens to my that OS being corrupted.
I just reinstalled the OS and deploy the monitor.
While I was going for zap and reinstal OSD I found that my OSD
were already running again.

Magically



Il 27/02/2017 10:07, Iban Cabrillo ha scritto:

Hi,

  Could I reinstall the server and try only to activate de OSD
again (without zap and prepare)?
Regards, I

2017-02-24 18:25 GMT+01:00 Iban Cabrillo <cabri...@ifca.unican.es
<mailto:cabri...@ifca.unican.es>>:

HI Eneko,
  yes the three mons are up and running.
  I do not have any other servers to plug-in these disk, but
could i reinstall the server and in some way mount the again
the osd-disk, ? I do not know the steps to do this

Regards, I

2017-02-24 14:52 GMT+01:00 Eneko Lacunza <elacu...@binovo.es
<mailto:elacu...@binovo.es>>:

Hi Iban,

Is the monitor data safe? If it is, just install jewel in
other servers and plug in the OSD disks, it should work.

El 24/02/17 a las 14:41, Iban Cabrillo escribió:

Hi,
  We have a serious issue. We have a mini cluster (jewel
version) with two server (Dell RX730), with 16Bays and
the OS intalled on dual 8 GB sd card, But this
configuration is working really really bad.


  The replication is 2, but yesterday one server crash
and this morning the other One, this is not the first
time, but others we had one server up and the data could
be replicated without any troubles, reinstalling the
osdserver completely.

  Until I understand, Ceph data and metadata is still on
bays (data on SATA and metadata on 2 fast SSDs), I think
only the OS installed on SD cards is corrupted.

  Is there any way to solve this situation?
  Any Idea will be great!!

Regards, I


-- 


Iban Cabrillo Bartolome
Instituto de Fisica de Cantabria (IFCA)
Santander, Spain
Tel: +34942200969 <tel:+34%20942%2020%2009%2069>
PGP PUBLIC KEY:
http://pgp.mit.edu/pks/lookup?op=get=0xD9DF0B3D6C8C08AC
<http://pgp.mit.edu/pks/lookup?op=get=0xD9DF0B3D6C8C08AC>


Bertrand Russell:/"El problema con el mundo es que los
estúpidos están seguros de todo y los inteligentes están
//llenos de dudas/"



___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
<http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>


-- 
Zuzendari Teknikoa / Director Técnico

Binovo IT Human Project, S.L.
Telf. 943493611
   943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun 
(Gipuzkoa)
www.binovo.es <http://www.binovo.es>

___
ceph-users mailing list ceph-users@lists.ceph.com
<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
<http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com> 

-- 


Iban Cabrillo Bartolome Instituto de Fisica de Cantabria
(IFCA) Santander, Spain Tel: +34942200969
<tel:+34%20942%2020%2009%2069>
PGP PUBLIC KEY:
http://pgp.mit.edu/pks/lookup?op=get=0xD9DF0B3D6C8C08AC
<http://pgp.mit.edu/pks/lookup?op=get=0xD9DF0B3D6C8C08AC>


Bertrand Russell:/"El problema con el mundo es que los
estúpidos están seguros de todo y los inteligentes están
//llenos de dudas/"

-- 


Iban Cabrillo Bartolome Instituto de Fisica de Cantabria (IFCA)
Santander,

Re: [ceph-users] krbd and kernel feature mismatches

2017-02-27 Thread Massimiliano Cuttini

Not really tested,

but searching around, many people say that at the moment RBD-NBD has more or less the same
performance, while RBD-FUSE is really slow.

At the moment I can no longer test the kernel client, because downgrading and re-upgrading the
CRUSH tunables would be a nightmare.

But you can try.




Il 27/02/2017 19:41, Simon Weald ha scritto:

Is there a performance hit when using rbd-nbd?

On 27/02/17 18:34, Massimiliano Cuttini wrote:

But if everybody get Kernel Mismatch (me too)

... why don't use directly rbd-nbd and forget about kernel-rbd

All feature, almost same performance.

No?




Il 27/02/2017 18:54, Ilya Dryomov ha scritto:

On Mon, Feb 27, 2017 at 6:47 PM, Shinobu Kinjo <ski...@redhat.com>
wrote:

We already discussed this:

https://www.spinics.net/lists/ceph-devel/msg34559.html

What do you think of comment posted in that ML?
Would that make sense to you as well?

Sorry, I dropped the ball on this.  I'll try to polish and push my man
page branch this week.

Thanks,

  Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] krbd and kernel feature mismatches

2017-02-27 Thread Massimiliano Cuttini

But if everybody gets a kernel feature mismatch (me too)...

... why not use rbd-nbd directly and forget about kernel-rbd?

All the features, almost the same performance.

No?




Il 27/02/2017 18:54, Ilya Dryomov ha scritto:

On Mon, Feb 27, 2017 at 6:47 PM, Shinobu Kinjo  wrote:

We already discussed this:

https://www.spinics.net/lists/ceph-devel/msg34559.html

What do you think of comment posted in that ML?
Would that make sense to you as well?

Sorry, I dropped the ball on this.  I'll try to polish and push my man
page branch this week.

Thanks,

 Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery ceph cluster down OS corruption

2017-02-27 Thread Massimiliano Cuttini

It happened to me that the OS got corrupted.
I just reinstalled the OS and redeployed the monitor.
While I was about to zap and reinstall the OSDs, I found that my OSDs were
already running again.


Magically
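For the record, what brings them back is simply re-activating the existing,
already-prepared data partitions. A rough sketch of doing it by hand, assuming
the OSD data disk shows up as /dev/sdb and the bootstrap-osd keyring is back
in place (both are assumptions):

    # list partitions that ceph-disk recognizes as prepared OSDs
    ceph-disk list

    # re-activate an already-prepared OSD data partition (no zap, no prepare)
    ceph-disk -v activate /dev/sdb1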



Il 27/02/2017 10:07, Iban Cabrillo ha scritto:

Hi,

  Could I reinstall the server and try only to activate the OSDs again
(without zap and prepare)?

Regards, I

2017-02-24 18:25 GMT+01:00 Iban Cabrillo >:


Hi Eneko,
  yes, the three mons are up and running.
  I do not have any other servers to plug these disks into, but could
I reinstall the server and in some way mount the OSD disks again?
I do not know the steps to do this.

Regards, I

2017-02-24 14:52 GMT+01:00 Eneko Lacunza >:

Hi Iban,

Is the monitor data safe? If it is, just install jewel in
other servers and plug in the OSD disks, it should work.

El 24/02/17 a las 14:41, Iban Cabrillo escribió:

Hi,
  We have a serious issue. We have a mini cluster (jewel
version) with two servers (Dell RX730), with 16 bays and the OS
installed on dual 8 GB SD cards, but this configuration is
working really, really badly.


  The replication is 2, but yesterday one server crashed and
this morning the other one did too. This is not the first time, but
on the other occasions we had one server up and the data could be replicated
without any trouble after reinstalling the OSD server completely.

  As far as I understand, the Ceph data and metadata are still in the bays
(data on SATA and metadata on 2 fast SSDs); I think only the
OS installed on the SD cards is corrupted.

  Is there any way to solve this situation?
  Any idea would be great!!

Regards, I


-- 


Iban Cabrillo Bartolome
Instituto de Fisica de Cantabria (IFCA)
Santander, Spain
Tel: +34942200969 
PGP PUBLIC KEY:
http://pgp.mit.edu/pks/lookup?op=get=0xD9DF0B3D6C8C08AC



Bertrand Russell:/"El problema con el mundo es que los
estúpidos están seguros de todo y los inteligentes están
//llenos de dudas/"



___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Zuzendari Teknikoa / Director Técnico

Binovo IT Human Project, S.L.
Telf. 943493611
   943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es 

___ ceph-users
mailing list ceph-users@lists.ceph.com

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

-- 


Iban Cabrillo Bartolome Instituto de Fisica de Cantabria (IFCA)
Santander, Spain Tel: +34942200969 
PGP PUBLIC KEY:
http://pgp.mit.edu/pks/lookup?op=get=0xD9DF0B3D6C8C08AC


Bertrand Russell:/"El problema con el mundo es que los estúpidos
están seguros de todo y los inteligentes están //llenos de dudas/"



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph on XenServer

2017-02-26 Thread Massimiliano Cuttini

Hi Lindsay,

as far as I know, KVM stands for Kernel-based Virtual Machine.
When a VM talks to KVM, in reality it is talking directly to the kernel
hypervisor.
There is no additional software layer running the virtualization
for you.

It's just the kernel.

This means really high performance (no intermediaries), but a kernel
exposed to attacks coming up from the guests.



Il 26/02/2017 06:04, Lindsay Mathieson ha scritto:

On 26/02/2017 12:12 AM, Massimiliano Cuttini wrote:
The pity is that it is based on KVM, which as far as I know is a light
hypervisor that is not able to isolate the virtual machines properly.
Because of this it is possible to freeze the hypervisor kernel from a guest
virtual machine, allowing somebody to freeze all your VMs at once.


Um ... No. KVM/Qemu is fully virtualised.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph on XenServer

2017-02-25 Thread Massimiliano Cuttini

Hi Brian,

I had never heard of it before.
However, it seems nice and fully featured.

The pity is that it is based on KVM, which as far as I know is a light
hypervisor that is not able to isolate the virtual machines properly.
Because of this it is possible to freeze the hypervisor kernel from a guest
virtual machine, allowing somebody to freeze all your VMs at once.
Xen is completely isolated (of course not as light as KVM). You can
freeze one VM, but the others cannot be affected in any way.

Security should not be considered optional.

Thanks anyway,
Max


Il 25/02/2017 14:02, Brian : ha scritto:

Hi Max,

Have you considered Proxmox at all? Nicely integrates with Ceph 
storage. I moved from Xenserver longtime ago and have no regrets.


Thanks
Brians

On Sat, Feb 25, 2017 at 12:47 PM, Massimiliano Cuttini 
<m...@phoenixweb.it <mailto:m...@phoenixweb.it>> wrote:


Hi Iban,

you are running xen (just the software), not XenServer (the ad hoc
Linux distribution).
XenServer is a Linux distribution based on CentOS.
You cannot recompile the kernel on your own (... well, you can,
but it's not a good idea).
And you should not install RPMs on your own (... but, I'm going to do
this).

So I'm stuck with some plugins and just the RPMs compatible with
kernel 3.10.0-514.6.1.el7.x86_64.
However, I found a way to install more or less the Ceph client.

I have forked the plugin from https://github.com/rposudnevskiy/RBDSR
<https://github.com/rposudnevskiy/RBDSR> It seems the most up to date.
Here you can find my findings and updates:
https://github.com/phoenixweb/RBDSR
<https://github.com/phoenixweb/RBDSR>
However, at the moment I'm stuck with an unsupported kernel feature. :(
Let's see if I can make this work.

Thanks for sharing,
Max





Il 24/02/2017 18:46, Iban Cabrillo ha scritto:

Hi Massimiliano,
  We are running CEPH agains our openstack instance running Xen:

ii  xen-hypervisor-4.6-amd64 4.6.0-1ubuntu4.3  
 amd64  Xen Hypervisor on AMD64

ii  xen-system-amd64 4.6.0-1ubuntu4.1amd64
 Xen System on AMD64 (meta-package)
ii  xen-utils-4.6  4.6.0-1ubuntu4.3amd64  XEN
administrative tools
ii  xen-utils-common 4.6.0-1ubuntu4.3all  Xen
administrative tools - common files
ii  xenstore-utils 4.6.0-1ubuntu4.1amd64
 Xenstore command line utilities for Xen


2017-02-24 15:52 GMT+01:00 Massimiliano Cuttini
<m...@phoenixweb.it <mailto:m...@phoenixweb.it>>:

Dear all,

even if Ceph should be officially supported by Xen since 4 years.

  * 
http://xenserver.org/blog/entry/tech-preview-of-xenserver-libvirt-ceph.html

<http://xenserver.org/blog/entry/tech-preview-of-xenserver-libvirt-ceph.html>
  * https://ceph.com/geen-categorie/xenserver-support-for-rbd/
<https://ceph.com/geen-categorie/xenserver-support-for-rbd/>

rbd is supported on libvirt 1.3.2, but it has to be recompiled:

ii  libvirt-bin 1.3.2-0~15.10~ amd64  programs for the
libvirt library
ii  libvirt0:amd64   1.3.2-0~15.10~ amd64library
for interfacing with different virtualization systems

Still there is no support yet.

At this point there are only some self-made plugin and
solution around.
Here some:

  * https://github.com/rposudnevskiy/RBDSR
<https://github.com/rposudnevskiy/RBDSR>
  * https://github.com/mstarikov/rbdsr
<https://github.com/mstarikov/rbdsr>
  * https://github.com/FlorianHeigl/xen-ceph-rbd
<https://github.com/FlorianHeigl/xen-ceph-rbd>

Nobody knows how compatible they are, or whether they are going
to break Xen at the next update.

The ugly truth is that XEN is still not able to fully support
Ceph, and we can only pray that one of the plugins above will
not destroy our precious data or VDIs.

Does anybody have some experience with the plugins above?
Which one would you recommend?
Is there any good installation guide to make this work correctly?
Can I just install Ceph with ceph-deploy, or do I have to
unlock repos on the XenServer and install it manually? Thanks
for any kind of support.

Attaching volumes from the OpenStack portal (or using the Cinder volume
manager) has been working fine for 6 months now, but I do not
know what will happen with the next updates


Regards,
Max



___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
<http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>




___

Re: [ceph-users] Ceph on XenServer

2017-02-25 Thread Massimiliano Cuttini

Hi Andrei,

I don't think so.
The future way to support Ceph in XenServer is the kernel.

XenServer is based on CentOS, and CentOS is the downstream of RHEL.
This means that some day in the future the kernel of RHEL will already be
compiled to completely support RADOS.

At that time, having Ceph working will be as easy as having NFS working.

Xen is just the software, and it already works with Ceph if you set up a
server yourself with some Linux distribution.

XenServer is the ad hoc Linux distribution, with a customized kernel,
which makes installing third-party components harder.

The fact that it is based on CentOS is why you can more or less make this work.
However, nobody knows why it should be as painful as this.
Ceph is already well supported by all the main Linux distributions.
In XenServer it seems that they just forgot to include some RPMs and
don't want to let you use it
(while 4 years ago they claimed they were going to support it in the near
future).


This is so bad.

Andrei, are you using Ceph at the moment?




Il 24/02/2017 18:17, Andrei Mikhailovsky ha scritto:

Hi Max,

I've played around with ceph on xenserver about 2-3 years ago. I made 
it work, but it was all hackish and a lot of manual work. It didn't 
play well with the cloud orchestrator and I gave up hoping that either 
Citrix or Ceph team would make it work. Currently, I would not 
recommend using it in production. There was very little done since and 
I am not sure if it is even safe to use it on the current versions of 
ceph. You should be able to run nfs over ceph and use ceph this way, 
but I am not sure how well it would perform.


Taking into account that ceph has been converted into the redhat 
house, I doubt that they will play nicely with xenserver. Please 
correct my way of thinking, but my guess is that from RH point of view 
all the efforts would go to make ceph work well in kvm environment and 
forget about other hypervisors. From the xenserver point of view, I 
doubt people will be taking it seriously until ceph proves itself to 
be rock solid, or have commercial development backing, which so far, 
doesn't seem like the case. Just reading about issues with ceph during 
install, updates, usage, etc. on this thread tells me that ceph is 
still rough and needs to be polished.


Andrei



*From: *"Massimiliano Cuttini" <m...@phoenixweb.it>
*To: *"ceph-users" <ceph-users@lists.ceph.com>
*Sent: *Friday, 24 February, 2017 14:52:37
*Subject: *[ceph-users] Ceph on XenServer

Dear all,

even if Ceph should be officially supported by Xen since 4 years.

  * 
http://xenserver.org/blog/entry/tech-preview-of-xenserver-libvirt-ceph.html
  * https://ceph.com/geen-categorie/xenserver-support-for-rbd/

Still there is no support yet.

At this point there are only some self-made plugin and solution
around.
Here some:

  * https://github.com/rposudnevskiy/RBDSR
  * https://github.com/mstarikov/rbdsr
  * https://github.com/FlorianHeigl/xen-ceph-rbd

Nobody knows how compatible they are, or whether they are going to
break Xen at the next update.

The ugly truth is that XEN is still not able to fully support Ceph,
and we can only pray that one of the plugins above will not destroy
our precious data or VDIs.

Does anybody have some experience with the plugins above?
Which one would you recommend?
Is there any good installation guide to make this work correctly?
Can I just install Ceph with ceph-deploy, or do I have to unlock
repos on the XenServer and install it manually?

Thanks for any kind of support.

Regards,
Max



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph on XenServer

2017-02-25 Thread Massimiliano Cuttini

Hi Iban,

you are running xen (just the software), not XenServer (the ad hoc Linux
distribution).

XenServer is a Linux distribution based on CentOS.
You cannot recompile the kernel on your own (... well, you can, but
it's not a good idea).

And you should not install RPMs on your own (... but, I'm going to do this).

So I'm stuck with some plugins and just the RPMs compatible with kernel
3.10.0-514.6.1.el7.x86_64.

However, I found a way to install more or less the Ceph client.

I have forked the plugin from https://github.com/rposudnevskiy/RBDSR
<https://github.com/rposudnevskiy/RBDSR> It seems the most up to date.
Here you can find my findings and updates:
https://github.com/phoenixweb/RBDSR

However, at the moment I'm stuck with an unsupported kernel feature. :(
Let's see if I can make this work.
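In case it helps others hitting the same wall: with an old 3.10 kernel the
usual workaround is to disable the RBD image features the kernel client does
not understand. A rough sketch (the image name is just an example, and which
features you actually have to drop depends on the kernel):

    # see which features the image has enabled
    rbd info rbd/xenserver-disk1

    # drop the features an old krbd cannot negotiate, keeping only layering
    rbd feature disable rbd/xenserver-disk1 \
        deep-flatten fast-diff object-map exclusive-lock

    # for new images, limit the features at creation time instead
    rbd create --size 10G --image-feature layering rbd/xenserver-disk2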

Thanks for sharing,
Max




Il 24/02/2017 18:46, Iban Cabrillo ha scritto:

Hi Massimiliano,
  We are running CEPH agains our openstack instance running Xen:

ii  xen-hypervisor-4.6-amd64 4.6.0-1ubuntu4.3 
 amd64Xen Hypervisor on AMD64
ii  xen-system-amd64 4.6.0-1ubuntu4.1 
 amd64Xen System on AMD64 (meta-package)
ii  xen-utils-4.64.6.0-1ubuntu4.3 
 amd64XEN administrative tools
ii  xen-utils-common 4.6.0-1ubuntu4.3 
 all  Xen administrative tools - common files
ii  xenstore-utils   4.6.0-1ubuntu4.1 
 amd64Xenstore command line utilities for Xen



2017-02-24 15:52 GMT+01:00 Massimiliano Cuttini <m...@phoenixweb.it 
<mailto:m...@phoenixweb.it>>:


Dear all,

even if Ceph should be officially supported by Xen since 4 years.

  * 
http://xenserver.org/blog/entry/tech-preview-of-xenserver-libvirt-ceph.html

<http://xenserver.org/blog/entry/tech-preview-of-xenserver-libvirt-ceph.html>
  * https://ceph.com/geen-categorie/xenserver-support-for-rbd/
<https://ceph.com/geen-categorie/xenserver-support-for-rbd/>

rbd is supported on libvirt 1.3.2, but it has to be recompiled:

ii  libvirt-bin   1.3.2-0~15.10~ amd64programs for 
the libvirt library
ii  libvirt0:amd64   1.3.2-0~15.10~ amd64library for 
interfacing with different virtualization systems


Still there is no support yet.

At this point there are only some self-made plugin and solution
around.
Here some:

  * https://github.com/rposudnevskiy/RBDSR
<https://github.com/rposudnevskiy/RBDSR>
  * https://github.com/mstarikov/rbdsr
<https://github.com/mstarikov/rbdsr>
  * https://github.com/FlorianHeigl/xen-ceph-rbd
<https://github.com/FlorianHeigl/xen-ceph-rbd>

Nobody knows how compatible they are, or whether they are going to
break Xen at the next update.

The ugly truth is that XEN is still not able to fully support Ceph,
and we can only pray that one of the plugins above will not destroy
our precious data or VDIs.

Does anybody have some experience with the plugins above?
Which one would you recommend?
Is there any good installation guide to make this work correctly?
Can I just install Ceph with ceph-deploy, or do I have to unlock
repos on the XenServer and install it manually? Thanks for any kind
of support.

Attaching volumes from the OpenStack portal (or using the Cinder volume
manager) has been working fine for 6 months now, but I do not know
what will happen with the next updates



Regards,
Max



___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
<http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph on XenServer

2017-02-24 Thread Massimiliano Cuttini

Dear all,

even though Ceph should have been officially supported by Xen for 4 years:

 * http://xenserver.org/blog/entry/tech-preview-of-xenserver-libvirt-ceph.html
 * https://ceph.com/geen-categorie/xenserver-support-for-rbd/

still there is no support yet.

At this point there are only some self-made plugins and solutions around.
Here are some:

 * https://github.com/rposudnevskiy/RBDSR
 * https://github.com/mstarikov/rbdsr
 * https://github.com/FlorianHeigl/xen-ceph-rbd

Nobody knows how compatible they are, or whether they are going to break
Xen at the next update.


The ugly truth is that XEN is still not able to fully support Ceph, and
we can only pray that one of the plugins above will not destroy our
precious data or VDIs.


Does anybody have some experience with the plugins above?
Which one would you recommend?
Is there any good installation guide to make this work correctly?
Can I just install Ceph with ceph-deploy, or do I have to unlock repos on
the XenServer and install it manually?


Thanks for any kind of support.

Regards,
Max


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-11-25 Thread Massimiliano Cuttini

Hi Travis,

can I have a developer or tester account so that I can submit issues
by myself?


Thanks,
Massimiliano Cuttini


Il 18/11/2014 23:03, Travis Rhoden ha scritto:

I've captured this at http://tracker.ceph.com/issues/10133

On Tue, Nov 18, 2014 at 4:48 PM, Travis Rhoden trho...@gmail.com 
mailto:trho...@gmail.com wrote:


Hi Massimiliano,

I just recreated this bug myself.  Ceph-deploy is supposed to
install EPEL automatically on the platforms that need it.  I just
confirmed that it is not doing so, and will be opening up a bug in
the Ceph tracker.  I'll paste it here when I do so you can follow
it.  Thanks for the report!

 - Travis

On Tue, Nov 18, 2014 at 4:41 PM, Massimiliano Cuttini
m...@phoenixweb.it mailto:m...@phoenixweb.it wrote:

I solved by installing EPEL repo on yum.
I think that somebody should write down in the documentation
that EPEL is mandatory



Il 18/11/2014 14:29, Massimiliano Cuttini ha scritto:

Dear all,

i try to install ceph but i get errors:

#ceph-deploy install node1
[]
[ceph_deploy.install][DEBUG ] Installing stable version
*firefly *on cluster ceph hosts node1
[ceph_deploy.install][DEBUG ] Detecting platform for host
node1 ...
[]
[node1][DEBUG ] --- Pacchetto libXxf86vm.x86_64
0:1.1.3-2.1.el7 settato per essere installato
[node1][DEBUG ] --- Pacchetto mesa-libgbm.x86_64
0:9.2.5-6.20131218.el7_0 settato per essere installato
[node1][DEBUG ] --- Pacchetto mesa-libglapi.x86_64
0:9.2.5-6.20131218.el7_0 settato per essere installato
[node1][DEBUG ] -- Risoluzione delle dipendenze completata
[node1][WARNIN] Errore: Pacchetto:
ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][WARNIN] Richiede:
libtcmalloc.so.4()(64bit)
[node1][WARNIN] Errore: Pacchetto:
ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][DEBUG ]  Si può provare ad usare --skip-broken
per aggirare il problema
[node1][WARNIN] Richiede:
libleveldb.so.1()(64bit)
[node1][WARNIN] Errore: Pacchetto:
ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][WARNIN] Richiede:
libtcmalloc.so.4()(64bit)
[node1][DEBUG ]  Provare ad eseguire: rpm -Va --nofiles
--nodigest
[node1][ERROR ] RuntimeError: command returned non-zero
exit status: 1
*[ceph_deploy][ERROR ] RuntimeError: Failed to execute
command: yum -y install ceph*

I installed GIANT version not FIREFLY on admin-node.
Is it a typo error in the config file or is it truly trying
to install FIREFLY instead of GIANT.

About the error, i see that it's related to wrong python
default libraries.
It seems that CEPH require libraries not available in the
current distro:

[node1][WARNIN] Richiede: libtcmalloc.so.4()(64bit)
[node1][WARNIN] Richiede:
libleveldb.so.1()(64bit)
[node1][WARNIN] Richiede:
libtcmalloc.so.4()(64bit)

This seems strange.
Can you fix this?


Thanks,
Massimiliano Cuttini





___
ceph-users mailing list
ceph-users@lists.ceph.com  mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy osd activate Hang - (doc followed step by step)

2014-11-24 Thread Massimiliano Cuttini

Every time I try to create a second OSD I get this hang:

   $ ceph-deploy osd activate ceph-node2:/var/local/osd1
   [cut ...]
   [ceph_deploy.cli][INFO  ] Invoked (1.5.20): /usr/bin/ceph-deploy osd
   activate ceph-node2:/var/local/osd1
   [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
   ceph-node2:/var/local/osd1:
   [ceph-node2][DEBUG ] connection detected need for sudo
   [ceph-node2][DEBUG ] connected to host: ceph-node2
   [ceph-node2][DEBUG ] detect platform information from remote host
   [ceph-node2][DEBUG ] detect machine type
   [ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.0.1406 Core
   [ceph_deploy.osd][DEBUG ] activating host ceph-node2 disk
   /var/local/osd1
   [ceph_deploy.osd][DEBUG ] will use init type: sysvinit
   [ceph-node2][INFO  ] Running command: sudo ceph-disk -v activate
   --mark-init sysvinit --mount /var/local/osd1
   [ceph-node2][WARNIN] DEBUG:ceph-disk:Cluster uuid is
   9f774eb5-e430-4d38-a470-e39d76c98c2b
   [ceph-node2][WARNIN] INFO:ceph-disk:Running command:
   /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
   [ceph-node2][WARNIN] DEBUG:ceph-disk:Cluster name is ceph
   [ceph-node2][WARNIN] DEBUG:ceph-disk:OSD uuid is
   b8d7c3c1-d436-4f52-8b3d-05a9d4af64ba
   [ceph-node2][WARNIN] DEBUG:ceph-disk:Allocating OSD id...
   [ceph-node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph
   --cluster ceph --name client.bootstrap-osd --keyring
   /var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise
   b8d7c3c1-d436-4f52-8b3d-05a9d4af64ba
   *[ceph-node2][WARNIN] 2014-11-24 18:39:35.259728 7f95207c1700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f951c026300 sd=4 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f951c026590).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:38.259891 7f95206c0700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f951c00 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f951e90).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:41.260031 7f95207c1700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f95100030e0 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510003370).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:44.260242 7f95206c0700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f9510003a60 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510003cf0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:47.260442 7f95207c1700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f9510002510 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f95100027a0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:50.260763 7f95206c0700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f9510003a60 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510003cf0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:53.260952 7f95207c1700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f9510002510 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f95100027a0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:56.261208 7f95206c0700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f9510003fb0 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510004240).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:59.261422 7f95207c1700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f9510004830 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510004ac0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:40:02.261659 7f95206c0700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f9510005f40 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f95100061d0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:40:05.261885 7f95207c1700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f95100092d0 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510009560).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:40:08.262083 7f95206c0700  0 --
   :/1017705  172.20.20.105:6789/0 pipe(0x7f9510006010 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510006490).fault*

... and so on

Can somebody help me get out of this problem?
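Those repeating "... 172.20.20.105:6789/0 pipe ... .fault" lines mean the node
simply cannot reach the monitor at 172.20.20.105:6789. A minimal sanity check,
assuming a CentOS 7 monitor host running firewalld (a common culprit, but an
assumption here):

    # on the monitor host: is ceph-mon actually listening on 6789?
    ss -tlnp | grep 6789

    # on the monitor host: open the monitor port if firewalld is blocking it
    firewall-cmd --permanent --add-port=6789/tcp
    firewall-cmd --reload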


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-11-18 Thread Massimiliano Cuttini

Dear all,

I tried to install Ceph but I get errors:

   #ceph-deploy install node1
   []
   [ceph_deploy.install][DEBUG ] Installing stable version *firefly *on
   cluster ceph hosts node1
   [ceph_deploy.install][DEBUG ] Detecting platform for host node1 ...
   []
   [node1][DEBUG ] --- Pacchetto libXxf86vm.x86_64 0:1.1.3-2.1.el7
   settato per essere installato
   [node1][DEBUG ] --- Pacchetto mesa-libgbm.x86_64
   0:9.2.5-6.20131218.el7_0 settato per essere installato
   [node1][DEBUG ] --- Pacchetto mesa-libglapi.x86_64
   0:9.2.5-6.20131218.el7_0 settato per essere installato
   [node1][DEBUG ] -- Risoluzione delle dipendenze completata
   [node1][WARNIN] Errore: Pacchetto:
   ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
   [node1][WARNIN] Richiede: libtcmalloc.so.4()(64bit)
   [node1][WARNIN] Errore: Pacchetto: ceph-0.80.7-0.el7.centos.x86_64
   (Ceph)
   [node1][DEBUG ]  Si può provare ad usare --skip-broken per aggirare
   il problema
   [node1][WARNIN] Richiede: libleveldb.so.1()(64bit)
   [node1][WARNIN] Errore: Pacchetto: ceph-0.80.7-0.el7.centos.x86_64
   (Ceph)
   [node1][WARNIN] Richiede: libtcmalloc.so.4()(64bit)
   [node1][DEBUG ]  Provare ad eseguire: rpm -Va --nofiles --nodigest
   [node1][ERROR ] RuntimeError: command returned non-zero exit status: 1
   *[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum
   -y install ceph*

I installed the GIANT version, not FIREFLY, on the admin node.
Is it a typo in the config file, or is it truly trying to install
FIREFLY instead of GIANT?


About the error, I see that it is related to missing default libraries.
It seems that Ceph requires libraries not available in the current distro:

   [node1][WARNIN] Richiede: libtcmalloc.so.4()(64bit)
   [node1][WARNIN] Richiede: libleveldb.so.1()(64bit)
   [node1][WARNIN] Richiede: libtcmalloc.so.4()(64bit)

This seems strange.
Can you fix this?
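For what it's worth, a quick way to see where those missing libraries are
supposed to come from (on CentOS 7 they should be provided by EPEL packages,
which is a hint that the EPEL repo is missing):

    # find which packages ship the libraries ceph depends on
    yum provides '*/libtcmalloc.so.4'   # gperftools-libs (EPEL)
    yum provides '*/libleveldb.so.1'    # leveldb (EPEL)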


Thanks,
Massimiliano Cuttini



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-11-18 Thread Massimiliano Cuttini

I solved it by installing the EPEL repo in yum.
I think that somebody should write down in the documentation that EPEL
is mandatory.
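For the archives, this is all it took on CentOS 7 (epel-release comes from the
stock extras repo):

    # enable EPEL, then retry the ceph-deploy install
    yum install -y epel-release
    yum clean all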




Il 18/11/2014 14:29, Massimiliano Cuttini ha scritto:

Dear all,

i try to install ceph but i get errors:

#ceph-deploy install node1
[]
[ceph_deploy.install][DEBUG ] Installing stable version *firefly
*on cluster ceph hosts node1
[ceph_deploy.install][DEBUG ] Detecting platform for host node1 ...
[]
[node1][DEBUG ] --- Pacchetto libXxf86vm.x86_64 0:1.1.3-2.1.el7
settato per essere installato
[node1][DEBUG ] --- Pacchetto mesa-libgbm.x86_64
0:9.2.5-6.20131218.el7_0 settato per essere installato
[node1][DEBUG ] --- Pacchetto mesa-libglapi.x86_64
0:9.2.5-6.20131218.el7_0 settato per essere installato
[node1][DEBUG ] -- Risoluzione delle dipendenze completata
[node1][WARNIN] Errore: Pacchetto:
ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][WARNIN] Richiede: libtcmalloc.so.4()(64bit)
[node1][WARNIN] Errore: Pacchetto: ceph-0.80.7-0.el7.centos.x86_64
(Ceph)
[node1][DEBUG ]  Si può provare ad usare --skip-broken per
aggirare il problema
[node1][WARNIN] Richiede: libleveldb.so.1()(64bit)
[node1][WARNIN] Errore: Pacchetto: ceph-0.80.7-0.el7.centos.x86_64
(Ceph)
[node1][WARNIN] Richiede: libtcmalloc.so.4()(64bit)
[node1][DEBUG ]  Provare ad eseguire: rpm -Va --nofiles --nodigest
[node1][ERROR ] RuntimeError: command returned non-zero exit status: 1
*[ceph_deploy][ERROR ] RuntimeError: Failed to execute command:
yum -y install ceph*

I installed GIANT version not FIREFLY on admin-node.
Is it a typo error in the config file or is it truly trying to install 
FIREFLY instead of GIANT.


About the error, i see that it's related to wrong python default 
libraries.

It seems that CEPH require libraries not available in the current distro:

[node1][WARNIN] Richiede: libtcmalloc.so.4()(64bit)
[node1][WARNIN] Richiede: libleveldb.so.1()(64bit)
[node1][WARNIN] Richiede: libtcmalloc.so.4()(64bit)

This seems strange.
Can you fix this?


Thanks,
Massimiliano Cuttini





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-11-18 Thread Massimiliano Cuttini

Then.
...very good! :)

OK, the next bad thing is that I have installed GIANT on the admin node.
However, ceph-deploy ignores the admin node installation and installs FIREFLY.
Now I have the Giant ceph-deploy on my admin node and my first OSD node
with FIREFLY.

It seems odd to me. Is it fine, or should I prepare myself to format again?
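If it helps, a sketch of forcing matching versions instead of reformatting
(node1 is just an example host name; --release tells ceph-deploy which stable
series to install):

    # reinstall the OSD node with the same release as the admin node
    ceph-deploy install --release giant node1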



Il 18/11/2014 23:03, Travis Rhoden ha scritto:

I've captured this at http://tracker.ceph.com/issues/10133

On Tue, Nov 18, 2014 at 4:48 PM, Travis Rhoden trho...@gmail.com 
mailto:trho...@gmail.com wrote:


Hi Massimiliano,

I just recreated this bug myself.  Ceph-deploy is supposed to
install EPEL automatically on the platforms that need it.  I just
confirmed that it is not doing so, and will be opening up a bug in
the Ceph tracker.  I'll paste it here when I do so you can follow
it.  Thanks for the report!

 - Travis

On Tue, Nov 18, 2014 at 4:41 PM, Massimiliano Cuttini
m...@phoenixweb.it mailto:m...@phoenixweb.it wrote:

I solved by installing EPEL repo on yum.
I think that somebody should write down in the documentation
that EPEL is mandatory



Il 18/11/2014 14:29, Massimiliano Cuttini ha scritto:

Dear all,

i try to install ceph but i get errors:

#ceph-deploy install node1
[]
[ceph_deploy.install][DEBUG ] Installing stable version
*firefly *on cluster ceph hosts node1
[ceph_deploy.install][DEBUG ] Detecting platform for host
node1 ...
[]
[node1][DEBUG ] --- Pacchetto libXxf86vm.x86_64
0:1.1.3-2.1.el7 settato per essere installato
[node1][DEBUG ] --- Pacchetto mesa-libgbm.x86_64
0:9.2.5-6.20131218.el7_0 settato per essere installato
[node1][DEBUG ] --- Pacchetto mesa-libglapi.x86_64
0:9.2.5-6.20131218.el7_0 settato per essere installato
[node1][DEBUG ] -- Risoluzione delle dipendenze completata
[node1][WARNIN] Errore: Pacchetto:
ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][WARNIN] Richiede:
libtcmalloc.so.4()(64bit)
[node1][WARNIN] Errore: Pacchetto:
ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][DEBUG ]  Si può provare ad usare --skip-broken
per aggirare il problema
[node1][WARNIN] Richiede:
libleveldb.so.1()(64bit)
[node1][WARNIN] Errore: Pacchetto:
ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][WARNIN] Richiede:
libtcmalloc.so.4()(64bit)
[node1][DEBUG ]  Provare ad eseguire: rpm -Va --nofiles
--nodigest
[node1][ERROR ] RuntimeError: command returned non-zero
exit status: 1
*[ceph_deploy][ERROR ] RuntimeError: Failed to execute
command: yum -y install ceph*

I installed GIANT version not FIREFLY on admin-node.
Is it a typo error in the config file or is it truly trying
to install FIREFLY instead of GIANT.

About the error, i see that it's related to wrong python
default libraries.
It seems that CEPH require libraries not available in the
current distro:

[node1][WARNIN] Richiede: libtcmalloc.so.4()(64bit)
[node1][WARNIN] Richiede:
libleveldb.so.1()(64bit)
[node1][WARNIN] Richiede:
libtcmalloc.so.4()(64bit)

This seems strange.
Can you fix this?


Thanks,
Massimiliano Cuttini





___
ceph-users mailing list
ceph-users@lists.ceph.com  mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Admin Node Best Practices

2014-10-31 Thread Massimiliano Cuttini

Any hint?


Il 30/10/2014 15:22, Massimiliano Cuttini ha scritto:

Dear Ceph users,

I just received 2 fresh new servers and i'm starting to develop my 
Ceph Cluster.
The first step is: create the admin node in order to controll all the 
cluster by remote.
I have a big cluster of XEN servers and I'll setup there a new VM only 
for this.

I need some info:
1) As far as i know admin-node need only to deploy, it doesn't support 
any kind of service. Is it so or i missed something?
2) All my servers for the OSD nodes will be CENTOS7. Then do I need to 
setup the admin-node with the same OS or i can mix-up?
3) Can I delete the admin-node in the future and recreate it whenever 
i need it  or there are some unique informations (such keys) 
that i need always to preserve?

4) is it good having more than 1 ADMIN NODE or completly useless?
5) do you have some best practice to share? :)

Thanks,
Max


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Admin Node Best Practices

2014-10-30 Thread Massimiliano Cuttini

Dear Ceph users,

I just received 2 fresh new servers and I'm starting to build my Ceph
cluster.
The first step is to create the admin node in order to control the whole
cluster remotely.
I have a big cluster of XEN servers and I'll set up a new VM there just
for this.

I need some info:
1) As far as I know the admin node is only needed for deployment; it doesn't
run any kind of service. Is that so, or did I miss something?
2) All my servers for the OSD nodes will be CentOS 7. Do I need to
set up the admin node with the same OS, or can I mix them up?
3) Can I delete the admin node in the future and recreate it whenever I
need it... or is there some unique information (such as keys) that I
always need to preserve?

4) Is it good to have more than one admin node, or is it completely useless?
5) Do you have some best practices to share? :)

Thanks,
Max
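Regarding question 3: the state a ceph-deploy admin node holds is essentially
its working directory (ceph.conf plus the keyrings). A rough sketch of
preserving or regenerating it, assuming a monitor host called node1 (the host
name is just an example):

    # keep a copy of the ceph-deploy working directory
    tar czf ceph-admin-node-backup.tar.gz ceph.conf *.keyring

    # or, on a freshly rebuilt admin node, pull the config and keys back
    ceph-deploy config pull node1
    ceph-deploy gatherkeys node1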
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network hardware recommendations

2014-10-08 Thread Massimiliano Cuttini

Say you want to build it with Vyatta,
and this gives you the possibility of having a fully featured OS.
What kind of hardware would you use to build up a switch?


Il 08/10/2014 09:10, Christian Balzer ha scritto:

On Wed, 08 Oct 2014 00:45:06 + Scott Laird wrote:


IIRC, one thing to look out for is that there are two ways to do IP over
Infiniband.  You can either do IP over Infiniband directly (IPoIB), or
encapsulate Ethernet in Infiniband (EoIB), and then do IP over the fake
Ethernet network.

IPoIB is more common, but I'd assume that IB-Ethernet bridges really
only bridge EoIB.


Most of them do indeed, alas the 4036E supposedly has a FPGA based IPoIB
to Ethernet gateway.

Probably another reason to build your own, aside from the price tag. ^o^

Christian

On Tue Oct 07 2014 at 5:34:57 PM Christian Balzer ch...@gol.com wrote:


On Tue, 07 Oct 2014 20:40:31 + Scott Laird wrote:


I've done this two ways in the past.  Either I'll give each machine
an Infiniband network link and a 1000baseT link and use the
Infiniband one as the private network for Ceph, or I'll throw an
Infiniband card into a PC and run something like Vyatta/VyOS on it
and make it a router, so IP traffic can get out of the IB network.
Of course, those have both been for test labs.  YMMV.


That.

Of course in a production environment you would want something with 2
routers in a failover configuration.
And there are switches/gateways that combine IB and Ethernet, but they
tend to be not so cheap. ^^

More below.


On Tue Oct 07 2014 at 11:05:23 AM Massimiliano Cuttini
m...@phoenixweb.it wrote:


  Hi Christian,

  When you say 10 gig infiniband, do you mean QDRx4 Infiniband
(usually flogged as 40Gb/s even though it is 32Gb/s, but who's
counting), which tends to be the same basic hardware as the 10Gb/s
Ethernet offerings from Mellanox?

A brand new 18 port switch of that caliber will only cost about
180$ per port, too.



I investigated Infiniband but I didn't find affordable
prices at all.

Then you're doing it wrong or comparing apples to oranges (you of
course need to compare IB switches to similar 10GbE ones).
And the prices of HCA (aka network cards in the servers) and cabling.


Moreover, how do you connect your legacy node servers to your
brand new storage if you have Infiniband only on the storage nodes and
switches? Is there any mixed switch that allows you to connect
with both Infiniband and Ethernet?

If there is, please send specs, because I cannot find one just by
googling it.


The moment you type in infiniband et google will already predict
amongst other pertinent things infiniband ethernet gateway and
infiniband ethernet bridge.
But even infiniband ethernet switch has a link telling you pretty
much what was said here now at the 6th position:
http://www.tomshardware.com/forum/44997-42-connect-
infiniband-switch-ethernet

Christian

Thanks,
Max

  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network hardware recommendations

2014-10-08 Thread Massimiliano Cuttini




Il 08/10/2014 14:39, Nathan Stratton ha scritto:
On Wed, Oct 8, 2014 at 8:15 AM, Massimiliano Cuttini 
m...@phoenixweb.it mailto:m...@phoenixweb.it wrote:


If you want to build up with Viatta.
And this give you the possibility to have a fully feature OS.
What kind of hardware would you use to build up a switch?


Hard to beat the Quanta T3048-LY2, 48 10 gig, 4 40 gig. Same chip as 
Cisco, Dell, HP, etc. Like I said, merchant silicon and white box 
switches are the wave of the future. You can use Quanta OS or what I 
recommend is get one with the ONIE bootloader, then you can put 
Cumulus software on it for more features of if you much more daring 
flash it with BigSwitch and go OpenFlow.



Is BigSwitch better than Vyatta, or just something different?
I'm building an OpenStack+Ceph solution, so BigSwitch seems to fit better.
About the Quanta you suggest... well, WOW!
I also see the top solution, the T5032-LY6, and even this one is affordable
(just $7,200).

About Infiniband, what kind of white-box switch would you suggest?

Thank you Nathan for sharing.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network hardware recommendations

2014-10-07 Thread Massimiliano Cuttini

Hi Christian,

When you say 10 gig infiniband, do you mean QDRx4 Infiniband (usually
flogged as 40Gb/s even though it is 32Gb/s, but who's counting), which
tends to be the same basic hardware as the 10Gb/s Ethernet offerings from
Mellanox?

A brand new 18 port switch of that caliber will only cost about 180$ per
port, too.



I investigated Infiniband but I didn't find affordable prices at all.
Moreover, how do you connect your legacy node servers to your brand
new storage if you have Infiniband only on the storage nodes and switches?
Is there any mixed switch that allows you to connect with both Infiniband
and Ethernet?


If there is, please send specs, because I cannot find one just by googling it.

Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD - choose the right controller card, HBA/IT mode explanation

2014-10-03 Thread Massimiliano Cuttini


Il 02/10/2014 17:24, Christian Balzer ha scritto:

On Thu, 02 Oct 2014 12:20:06 +0200 Massimiliano Cuttini wrote:

Il 02/10/2014 03:18, Christian Balzer ha scritto:

On Wed, 01 Oct 2014 20:12:03 +0200 Massimiliano Cuttini wrote:

Hello Christian,

Il 01/10/2014 19:20, Christian Balzer ha scritto:

Hello,

On Wed, 01 Oct 2014 18:26:53 +0200 Massimiliano Cuttini wrote:


Dear all,

i need few tips about Ceph best solution for driver controller.
I'm getting confused about IT mode, RAID and JBoD.
I read many posts about don't go for RAID but use instead a JBoD
configuration.

I have 2 storage alternatives right now in my mind:

   *SuperStorage Server 2027R-E1CR24L*
   which use SAS3 via LSI 3008 AOC; IT Mode/Pass-through
   http://www.supermicro.nl/products/system/2U/2027/SSG-2027R-E1CR24L.cfm

and

   *SuperStorage Server 2027R-E1CR24N*
   which use SAS3 via LSI 3108 SAS3 AOC (in RAID mode?)
   http://www.supermicro.nl/products/system/2U/2027/SSG-2027R-E1CR24N.cfm


Firstly, both of these use an expander backplane.
So if you're planning on putting SSDs in there (even if just like 6
for journals) you may be hampered by that.
The Supermicro homepage is vague as usual and the manual doesn't
actually have a section for that backplane. I guess it will be a
4link connection, so 4x12Gb/s aka 4.8 GB/s.
If the disks all going to be HDDs you're OK, but keep that bit in
mind.

ok i was thinking about connect 24 SSD disks connected with SATA3
(6Gbps). This is why i choose a 8x SAS3 port LSI card that use double
PCI 3.0 connection, that support even (12Gbps).
This should allow me to use the full speed of the SSD (i guess).


Given the SSD speeds you cite below, SAS2 aka SATA3 would do, too.
And of course be cheaper.

Also what SSDs are you planning to deploy?

I would go with bulk of cheap consumer SSD.
I just need to perform better than HDDs, and that's all.
Everything better is just fine.

Bad idea.
Read the current SSD MTBF thread.
If your cluster is even remotely busy cheap consumer SSDs will cost you
more than top end Enterprise ones in a short time (TBW/$).
And they are so unpredictable and likely to fail that a replication of 2
is going to be very risky proposition, so increasing your cost by 1/3rd
anyway if you really care about reliability.
I read the SSD MTBF post and I don't agree with the point that cheap SSDs are
bad (as I wrote).

The problem is not whether they are cheap or not, but the size of the disk.
Writing 50 GB of data every day to a 100 GB SSD or to a 1 TB SSD is
completely different.
The first will last just half a year, the second will last 5
years (and of course they are both cheap).
SSDs have no unpredictable failures; they are not mechanical, they just end
their life cycle after a predetermined amount of writes.

Just take more space and you get more writes.
Taking a 100 GB SSD, whether consumer or enterprise, is just silly IMHO.
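As a rough back-of-the-envelope, ignoring write amplification and assuming the
same per-cell endurance for both sizes (the ~90 full-drive writes figure below
is only what makes the numbers in this mail line up, not a datasheet value):

    lifetime (days) = capacity x rated full-drive writes / daily writes

    100 GB  x ~90 drive writes / 50 GB per day = ~180 days  (~half a year)
    1000 GB x ~90 drive writes / 50 GB per day = ~1800 days (~5 years)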


If you can't afford a cluster made entirely of SSDs, a typical HDDs with
SSDs for journal mix is probably going to be fast enough.

Ceph at this point in time can't utilize the potential of a pure SSD
cluster anyway, see the:
[Single OSD performance on SSD] Can't go over 3,2K IOPS
thread.
OK... this is a good point: why spend a lot if you will not get the
performance anyway?

I definitely have to take this recommendation into account.

I made this analysis:
- Total output: 8x12 = 96Gbps full speed available on the PCI3.0

That's the speed/capacity of the controller.

I'm talking about the actual backplane, where drives plug in.
And that is connected either by one cable  (and thus 48Gb/s) or two
(and thus the 96GB/s you're expecting), the documentation is unclear
on the homepage and not in the manual of that server. Digging around I
found http://www.supermicro.com.tw/manuals/other/BPN-SAS3-216EL.pdf
which suggests two ports, so your basic assumptions are correct.

This is what is written for the backplane: one SATA backplane
(BPN-SAS3-216EL1), "SAS3 2.5" drive slots and 4x mini-SAS3 HD connectors
for SAS3 uplink/downlink".
It supports 4x mini-SAS3 HD port connectors,
because somebody may buy an LSI AOC card to speed
up the backplane further.
I understood that it supports 1 or 2 expander cards, each one with 4x mini
SAS3 cables:
2 daughter cards to have failover on the backplane (however this
storage comes with just 1 port).
Then it should be 4x 12Gb/s? I'm getting confused.


No, read that PDF closely.
The single expander card of that server backplane has 2 uplink ports. Each
port usually (and in this case pretty much certainly) has 4 lanes at
12Gb/s each.

Definitely, thank you! I'm not a hardware guru and I couldn't have understood that.
Thank you, you heartened me! :)


But verify that with your Supermicro vendor and read up about SAS/SATA
expanders.

If you want/need full speed, the only option with Supermicro seems to
be
http://www.supermicro.com.tw/products/chassis/2U/216/SC216BAC-R920LP.cfm
at this time for SAS3.

That backplane (BPN

Re: [ceph-users] OSD - choose the right controller card, HBA/IT mode explanation

2014-10-02 Thread Massimiliano Cuttini


Il 02/10/2014 03:18, Christian Balzer ha scritto:

On Wed, 01 Oct 2014 20:12:03 +0200 Massimiliano Cuttini wrote:


Hello Christian,


Il 01/10/2014 19:20, Christian Balzer ha scritto:

Hello,

On Wed, 01 Oct 2014 18:26:53 +0200 Massimiliano Cuttini wrote:


Dear all,

i need few tips about Ceph best solution for driver controller.
I'm getting confused about IT mode, RAID and JBoD.
I read many posts about don't go for RAID but use instead a JBoD
configuration.

I have 2 storage alternatives right now in my mind:

  *SuperStorage Server 2027R-E1CR24L*
  which use SAS3 via LSI 3008 AOC; IT Mode/Pass-through
  http://www.supermicro.nl/products/system/2U/2027/SSG-2027R-E1CR24L.cfm

and

  *SuperStorage Server 2027R-E1CR24N*
  which use SAS3 via LSI 3108 SAS3 AOC (in RAID mode?)
  http://www.supermicro.nl/products/system/2U/2027/SSG-2027R-E1CR24N.cfm


Firstly, both of these use an expander backplane.
So if you're planning on putting SSDs in there (even if just like 6 for
journals) you may be hampered by that.
The Supermicro homepage is vague as usual and the manual doesn't
actually have a section for that backplane. I guess it will be a 4link
connection, so 4x12Gb/s aka 4.8 GB/s.
If the disks all going to be HDDs you're OK, but keep that bit in mind.
   

ok i was thinking about connect 24 SSD disks connected with SATA3
(6Gbps). This is why i choose a 8x SAS3 port LSI card that use double
PCI 3.0 connection, that support even (12Gbps).
This should allow me to use the full speed of the SSD (i guess).


Given the SSD speeds you cite below, SAS2 aka SATA3 would do, too.
And of course be cheaper.

Also what SSDs are you planning to deploy?

I would go with a bulk of cheap consumer SSDs.
I just need them to perform better than HDDs, and that's all.
Everything better is just fine.

I made this analysis:
- Total output: 8x12 = 96Gbps full speed available on the PCI3.0

That's the speed/capacity of the controller.

I'm talking about the actual backplane, where drives plug in.
And that is connected either by one cable  (and thus 48Gb/s) or two (and
thus the 96GB/s you're expecting), the documentation is unclear on the
homepage and not in the manual of that server. Digging around I found
http://www.supermicro.com.tw/manuals/other/BPN-SAS3-216EL.pdf
which suggests two ports, so your basic assumptions are correct.


This is what is written for the backplane: one SATA backplane (BPN-SAS3-216EL1),
"SAS3 2.5" drive slots and 4x mini-SAS3 HD connectors for SAS3
uplink/downlink".

It supports 4x mini-SAS3 HD port connectors,
because somebody may buy an LSI AOC card to speed
up the backplane further.

I understood that it supports 1 or 2 expander cards, each one with 4x mini
SAS3 cables:
2 daughter cards to have failover on the backplane (however this
storage comes with just 1 port).

Then it should be 4x 12Gb/s? I'm getting confused.


But verify that with your Supermicro vendor and read up about SAS/SATA
expanders.

If you want/need full speed, the only option with Supermicro seems to be
http://www.supermicro.com.tw/products/chassis/2U/216/SC216BAC-R920LP.cfm
at this time for SAS3.
That backplane (BPN-SAS3-216A) goes for $300, while the one on the
storage server is worth $600 (BPN-SAS3-216EL1).
I think they are both great; however, I cannot choose the backplane
for that model.



Of course a direct connect backplane chassis with SAS2/SATA3 will do fine
as I wrote above, like this one.
http://www.supermicro.com.tw/products/chassis/2U/216/SC216BA-R1K28LP.cfm

In either case get the fastest motherboard/CPUs (Ceph will need those for
SSDs) and the appropriate controller(s). If you're unwilling to build them
yourself, I'm sure some vendor will do BTO. ^^


I cannot change the motherboard (but it seems really good!).
About CPUs, I decided to go for dual E5-2620s.
http://ark.intel.com/products/64594/Intel-Xeon-Processor-E5-2620-15M-Cache-2_00-GHz-7_20-GTs-Intel-QPI
Not so fast... I went for quantity instead of quality (12 cores will
be enough, no?).

Do you think I need to change it for something better?
RAM is 4x 8 GB = 32 GB.


- Than i should have at least for each disk a maximum speed of 96Gbps/24
disks which 4Gbps each disk
- The disks are SATA3 6Gbps than i should have here a little bootleneck
that lower me at 4Gbps.
- However a common SSD never hit the interface speed, the tend to be at
450MB/s.

Average speed of an SSD (MB/s):
          Min    Avg    Max
Read      369    485    522
Write     162    428    504
Mixed     223    449    512


Then having a bottleneck to 4Gbps (which mean 400MB/s) should be fine
(should only if I'm not in wrong).
Is it right what i thougth?


Also expanders introduce some level of overhead, so you're probably going
to wind up with less than 400MB/s per drive.

Isn't 400 MB/s per drive good enough?
I don't think a SAS HDD would even reach this speed.


I think that the only bottleneck here is the 4x1Gb ethernet connection.


With a firebreathing storage server like that, you

[ceph-users] OSD - choose the right controller card, HBA/IT mode explanation

2014-10-01 Thread Massimiliano Cuttini

Dear all,

I need a few tips about the best Ceph choice of drive controller.
I'm getting confused about IT mode, RAID and JBOD.
I read many posts saying don't go for RAID but use a JBOD
configuration instead.


I have 2 storage alternatives right now in my mind:

   *SuperStorage Server 2027R-E1CR24L*
   which use SAS3 via LSI 3008 AOC; IT Mode/Pass-through
   http://www.supermicro.nl/products/system/2U/2027/SSG-2027R-E1CR24L.cfm

and

   *SuperStorage Server 2027R-E1CR24N*
   which use SAS3 via LSI 3108 SAS3 AOC (in RAID mode?)
   http://www.supermicro.nl/products/system/2U/2027/SSG-2027R-E1CR24N.cfm

OK, both of these solutions should support JBOD.
However, I read that only an LSI HBA and/or one flashed in IT mode allows you to:

 * plug & play a new drive and see it immediately on a Linux distribution
   (without rescanning the disks)
 * see S.M.A.R.T. data (because there is no volume layer between the
   motherboard and the disks)
 * reduce the disk latency

So I should probably avoid the LSI 3108 (which has a RAID config by
default) and go for the LSI 3008 (already flashed in IT mode).


Is it so, or am I completely wasting my time on useless specs?


Did I get the point? What would you recommend?
I'm getting dumb reading tons of specs by myself without a second
human opinion.


Thank you for any hint you'll give!

--
*Massimiliano Cuttini*

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD - choose the right controller card, HBA/IT mode explanation

2014-10-01 Thread Massimiliano Cuttini

Hello Christian,


Il 01/10/2014 19:20, Christian Balzer ha scritto:

Hello,

On Wed, 01 Oct 2014 18:26:53 +0200 Massimiliano Cuttini wrote:


Dear all,

i need few tips about Ceph best solution for driver controller.
I'm getting confused about IT mode, RAID and JBoD.
I read many posts about don't go for RAID but use instead a JBoD
configuration.

I have 2 storage alternatives right now in my mind:

 *SuperStorage Server 2027R-E1CR24L*
 which use SAS3 via LSI 3008 AOC; IT Mode/Pass-through
 http://www.supermicro.nl/products/system/2U/2027/SSG-2027R-E1CR24L.cfm

and

 *SuperStorage Server 2027R-E1CR24N*
 which use SAS3 via LSI 3108 SAS3 AOC (in RAID mode?)
 http://www.supermicro.nl/products/system/2U/2027/SSG-2027R-E1CR24N.cfm


Firstly, both of these use an expander backplane.
So if you're planning on putting SSDs in there (even if just like 6 for
journals) you may be hampered by that.
The Supermicro homepage is vague as usual and the manual doesn't actually
have a section for that backplane. I guess it will be a 4link connection,
so 4x12Gb/s aka 4.8 GB/s.
If the disks all going to be HDDs you're OK, but keep that bit in mind.
  

OK, I was thinking about connecting 24 SSD disks with SATA3 (6Gbps).
This is why I chose an 8x port SAS3 LSI card that uses a double PCI 3.0
connection and even supports 12Gbps per port.

This should allow me to use the full speed of the SSDs (I guess).

I made this analysis:
- Total output: 8x12 = 96Gbps full speed available on the PCI 3.0
- Then I should have for each disk a maximum speed of 96Gbps / 24
disks, which is 4Gbps per disk
- The disks are SATA3 6Gbps, so I have a little bottleneck here
that lowers me to 4Gbps.
- However, a common SSD never hits the interface speed; they tend to be at
450MB/s.


Average speed of an SSD (MB/s):
          Min    Avg    Max
Read      369    485    522
Write     162    428    504
Mixed     223    449    512


So having a bottleneck at 4Gbps (which means roughly 400MB/s) should be fine
(should, only if I'm not wrong).

Is what I thought right?

I think that the only real bottleneck here is the 4x 1Gb Ethernet connection.
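A rough back-of-the-envelope on the same numbers (assuming roughly 10 bits per
byte on the wire to account for encoding overhead):

    backplane:  8 lanes x 12 Gb/s  = 96 Gb/s ~= 9.6 GB/s shared by 24 disks
                96 Gb/s / 24 disks =  4 Gb/s ~= 400 MB/s per disk
    network:    4 x 1 GbE          =  4 Gb/s ~= 400-500 MB/s for the whole node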


Ok both of them solution should support JBoD.
However I read that only a LSI with HBA or/and flashed in IT MODE allow
to:

   * plugplay a new driver and see it already on a linux distribution
 (without recheck disks)
   * see S.M.A.R.T. data (because there is no volume layer between
 motherboard and disks)

smartctl can handle the LSI RAID stuff fine.

Good
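For what it's worth, a sketch of what that looks like on an LSI MegaRAID-style
controller (the device path and disk index are just examples; the index is the
physical slot as seen by the controller):

    # query SMART data for the disk behind a MegaRAID controller
    smartctl -a -d megaraid,0 /dev/sda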




   * reduce the disk latency


Not sure about that, depending on the actual RAID and configuration any
cache of the RAID subsystem might get used, so improving things.

The most important reason to use IT for me would be in conjunction with
SSDs, none of the RAIDs I'm aware allow for TRIM/DISCARD. to work.


Do you know if I can flash the LSI 3108 to IT mode?


Then i should probably avoid LSI 3108 (which have a RAID config by
default) and go for the LSI 3008 (already flashed in IT mode).


Of the 2 I would pick the IT mode one for a classic Ceph deployment.


OK, but why?
Can you suggest some good technical documentation about IT mode?




Is it so or I'm completly wasting my time on useless specs?


It might be a good idea to tell us what your actual plans are.
As in, how many nodes (these are quite dense ones with 24 drives!), how
much storage in total, what kind of use pattern, clients.

Right now we are just testing and experimenting.
We would start with a non-production environment with 2 nodes, learn
Ceph in depth, then replicate the test findings on another 2 nodes,
upgrade to 10Gb Ethernet and go live.
I don't want to start with a bad hardware environment from the
beginning, so I'm reading a lot to find the perfect config for our needs.

However, the LSI specs are a nightmare; they are completely confusing.

About the kind of use, keep in mind that we need Ceph to run XEN VMs
with high availability (LUNs on a NAS); they commonly run MySQL and other
low-latency applications.

Probably we'll be implementing them with OpenStack in the near future.
Let me know if you need some more specs.

Thanks,
Massimiliano Cuttini

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com