Re: [ceph-users] Minimize data loss with incomplete PGs

2017-01-30 Thread Shinobu Kinjo
First off, please provide the following (one way to collect these is sketched below):

 * ceph -s
 * ceph osd tree
 * ceph pg dump

and

 * what you actually did, with the exact commands.

Regards,
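A minimal collection sketch for the first three items; it assumes the ceph CLI
and an admin keyring are available on the node, and the output file names are
arbitrary:

    # Collect the requested diagnostics into text files for sharing.
    import subprocess

    commands = {
        "ceph-status.txt": ["ceph", "-s"],
        "ceph-osd-tree.txt": ["ceph", "osd", "tree"],
        "ceph-pg-dump.txt": ["ceph", "pg", "dump"],
    }
    for filename, cmd in commands.items():
        with open(filename, "w") as out:
            subprocess.run(cmd, stdout=out, check=True)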

On Tue, Jan 31, 2017 at 6:10 AM, José M. Martín  wrote:
> Dear list,
>
> I'm having some big problems with my setup.
>
> I was trying to increase the overall capacity by replacing some OSDs with
> bigger ones. I replaced them without waiting for the rebalance to finish,
> thinking the replicas were stored in other buckets, but I found a lot of
> incomplete PGs, so replicas of the same PG must have been placed in the same
> bucket. I assume I have lost that data, because I zapped the disks and reused
> them for other tasks.
>
> My question is: what should I do to recover as much data as possible?
> I'm using CephFS and RBD.
>
> Thank you so much,
>
> --
>
> Jose M. Martín
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-30 Thread Gregory Farnum
You might also check out "ceph osd tree" and "ceph osd crush dump" and make
sure they look the way you expect.

On Mon, Jan 30, 2017 at 1:23 PM, Gregory Farnum  wrote:
> On Sun, Jan 29, 2017 at 6:40 AM, Muthusamy Muthiah
>  wrote:
>> Hi All,
>>
>> We also tried an EC 3+1 profile on a 5-node cluster with bluestore enabled. When
>> an OSD is down, the cluster goes to ERROR state even though the cluster is n+1.
>> No recovery happens.
>>
>> health HEALTH_ERR
>> 75 pgs are stuck inactive for more than 300 seconds
>> 75 pgs incomplete
>> 75 pgs stuck inactive
>> 75 pgs stuck unclean
>>  monmap e2: 5 mons at
>> {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>> election epoch 10, quorum 0,1,2,3,4
>> ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>> mgr active: ca-cn1 standbys: ca-cn4, ca-cn3, ca-cn5, ca-cn2
>>  osdmap e264: 60 osds: 59 up, 59 in; 75 remapped pgs
>> flags sortbitwise,require_jewel_osds,require_kraken_osds
>>   pgmap v119402: 1024 pgs, 1 pools, 28519 GB data, 21548 kobjects
>> 39976 GB used, 282 TB / 322 TB avail
>>  941 active+clean
>>   75 remapped+incomplete
>>8 active+clean+scrubbing
>>
>> This seems to be an issue with bluestore; recovery is not happening properly
>> with EC.
>
> It's possible but it seems a lot more likely this is some kind of
> config issue. Can you share your osd map ("ceph osd getmap")?
> -Greg
>
>>
>> Thanks,
>> Muthu
>>
>> On 24 January 2017 at 12:57, Muthusamy Muthiah 
>> wrote:
>>>
>>> Hi Greg,
>>>
>>> We use EC 4+1 on a 5-node cluster in production deployments with filestore,
>>> and it does recovery and peering when one OSD goes down. After a few minutes,
>>> another OSD on the node where the faulty OSD lives takes over its PGs
>>> temporarily and all PGs go to active+clean. The cluster also does not go
>>> down during this recovery process.
>>>
>>> Only with bluestore do we see the cluster going to error state when one OSD
>>> is down. We are still validating this and will let you know any additional
>>> findings.
>>>
>>> Thanks,
>>> Muthu
>>>
>>> On 21 January 2017 at 02:06, Shinobu Kinjo  wrote:

 `ceph pg dump` should show you something like:

  * active+undersized+degraded ... [NONE,3,2,4,1]3[NONE,3,2,4,1]

 Sam,

 Am I wrong? Or is it due to something else?


 On Sat, Jan 21, 2017 at 4:22 AM, Gregory Farnum 
 wrote:
 > I'm pretty sure the default configs won't let an EC PG go active with
 > only "k" OSDs in its PG; it needs at least k+1 (or possibly more? Not
 > certain). Running an "n+1" EC config is just not a good idea.
 > For testing you could probably adjust this with the equivalent of
 > min_size for EC pools, but I don't know the parameters off the top of
 > my head.
 > -Greg
 >
 > On Fri, Jan 20, 2017 at 2:15 AM, Muthusamy Muthiah
 >  wrote:
 >> Hi ,
 >>
 >> We are validating kraken 11.2.0 with bluestore on a 5-node cluster with
 >> EC 4+1.
 >>
 >> When an OSD is down, peering does not happen and the ceph health status
 >> moves to ERR after a few minutes. This was working in previous development
 >> releases. Is any additional configuration required in v11.2.0?
 >>
 >> Following is our ceph configuration:
 >>
 >> mon_osd_down_out_interval = 30
 >> mon_osd_report_timeout = 30
 >> mon_osd_down_out_subtree_limit = host
 >> mon_osd_reporter_subtree_level = host
 >>
 >> and the recovery parameters set to default.
 >>
 >> [root@ca-cn1 ceph]# ceph osd crush show-tunables
 >>
 >> {
 >> "choose_local_tries": 0,
 >> "choose_local_fallback_tries": 0,
 >> "choose_total_tries": 50,
 >> "chooseleaf_descend_once": 1,
 >> "chooseleaf_vary_r": 1,
 >> "chooseleaf_stable": 1,
 >> "straw_calc_version": 1,
 >> "allowed_bucket_algs": 54,
 >> "profile": "jewel",
 >> "optimal_tunables": 1,
 >> "legacy_tunables": 0,
 >> "minimum_required_version": "jewel",
 >> "require_feature_tunables": 1,
 >> "require_feature_tunables2": 1,
 >> "has_v2_rules": 1,
 >> "require_feature_tunables3": 1,
 >> "has_v3_rules": 0,
 >> "has_v4_buckets": 0,
 >> "require_feature_tunables5": 1,
 >> "has_v5_rules": 0
 >> }
 >>
 >> ceph status:
 >>
 >>  health HEALTH_ERR
 >> 173 pgs are stuck inactive for more than 300 seconds
 >> 173 pgs incomplete
 >> 173 pgs stuck inactive
 >> 173 pgs stuck unclean
 >>  monmap e2: 5 

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-30 Thread Gregory Farnum
On Sun, Jan 29, 2017 at 6:40 AM, Muthusamy Muthiah
 wrote:
> Hi All,
>
> We also tried an EC 3+1 profile on a 5-node cluster with bluestore enabled. When
> an OSD is down, the cluster goes to ERROR state even though the cluster is n+1.
> No recovery happens.
>
> health HEALTH_ERR
> 75 pgs are stuck inactive for more than 300 seconds
> 75 pgs incomplete
> 75 pgs stuck inactive
> 75 pgs stuck unclean
>  monmap e2: 5 mons at
> {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
> election epoch 10, quorum 0,1,2,3,4
> ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
> mgr active: ca-cn1 standbys: ca-cn4, ca-cn3, ca-cn5, ca-cn2
>  osdmap e264: 60 osds: 59 up, 59 in; 75 remapped pgs
> flags sortbitwise,require_jewel_osds,require_kraken_osds
>   pgmap v119402: 1024 pgs, 1 pools, 28519 GB data, 21548 kobjects
> 39976 GB used, 282 TB / 322 TB avail
>  941 active+clean
>   75 remapped+incomplete
>8 active+clean+scrubbing
>
> This seems to be an issue with bluestore; recovery is not happening properly
> with EC.

It's possible but it seems a lot more likely this is some kind of
config issue. Can you share your osd map ("ceph osd getmap")?
-Greg
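
For reference, a minimal sketch of exporting the OSD map and turning it into
something human-readable (the output path is arbitrary; osdmaptool ships with
Ceph):

    # Export the current binary OSD map, then print a readable dump of it.
    import subprocess

    subprocess.run(["ceph", "osd", "getmap", "-o", "/tmp/osdmap.bin"], check=True)
    subprocess.run(["osdmaptool", "--print", "/tmp/osdmap.bin"], check=True)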

>
> Thanks,
> Muthu
>
> On 24 January 2017 at 12:57, Muthusamy Muthiah 
> wrote:
>>
>> Hi Greg,
>>
>> We use EC 4+1 on a 5-node cluster in production deployments with filestore,
>> and it does recovery and peering when one OSD goes down. After a few minutes,
>> another OSD on the node where the faulty OSD lives takes over its PGs
>> temporarily and all PGs go to active+clean. The cluster also does not go
>> down during this recovery process.
>>
>> Only with bluestore do we see the cluster going to error state when one OSD
>> is down. We are still validating this and will let you know any additional
>> findings.
>>
>> Thanks,
>> Muthu
>>
>> On 21 January 2017 at 02:06, Shinobu Kinjo  wrote:
>>>
>>> `ceph pg dump` should show you something like:
>>>
>>>  * active+undersized+degraded ... [NONE,3,2,4,1]3[NONE,3,2,4,1]
>>>
>>> Sam,
>>>
>>> Am I wrong? Or is it due to something else?
>>>
>>>
>>> On Sat, Jan 21, 2017 at 4:22 AM, Gregory Farnum 
>>> wrote:
>>> > I'm pretty sure the default configs won't let an EC PG go active with
>>> > only "k" OSDs in its PG; it needs at least k+1 (or possibly more? Not
>>> > certain). Running an "n+1" EC config is just not a good idea.
>>> > For testing you could probably adjust this with the equivalent of
>>> > min_size for EC pools, but I don't know the parameters off the top of
>>> > my head.
>>> > -Greg
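
As a rough illustration of the knob Greg refers to, the pool's min_size can be
inspected and changed as below. This is only a sketch (the pool name "ecpool"
and the value 4 are assumptions), and lowering min_size trades durability
guarantees for availability, so treat it as a test-cluster experiment.

    # Query and, if you accept the trade-off, lower min_size on an EC pool.
    import subprocess

    subprocess.run(["ceph", "osd", "pool", "get", "ecpool", "min_size"], check=True)
    subprocess.run(["ceph", "osd", "pool", "set", "ecpool", "min_size", "4"], check=True)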
>>> >
>>> > On Fri, Jan 20, 2017 at 2:15 AM, Muthusamy Muthiah
>>> >  wrote:
>>> >> Hi ,
>>> >>
>>> >> We are validating kraken 11.2.0 with bluestore on a 5-node cluster with
>>> >> EC 4+1.
>>> >>
>>> >> When an OSD is down, peering does not happen and the ceph health status
>>> >> moves to ERR after a few minutes. This was working in previous development
>>> >> releases. Is any additional configuration required in v11.2.0?
>>> >>
>>> >> Following is our ceph configuration:
>>> >>
>>> >> mon_osd_down_out_interval = 30
>>> >> mon_osd_report_timeout = 30
>>> >> mon_osd_down_out_subtree_limit = host
>>> >> mon_osd_reporter_subtree_level = host
>>> >>
>>> >> and the recovery parameters set to default.
>>> >>
>>> >> [root@ca-cn1 ceph]# ceph osd crush show-tunables
>>> >>
>>> >> {
>>> >> "choose_local_tries": 0,
>>> >> "choose_local_fallback_tries": 0,
>>> >> "choose_total_tries": 50,
>>> >> "chooseleaf_descend_once": 1,
>>> >> "chooseleaf_vary_r": 1,
>>> >> "chooseleaf_stable": 1,
>>> >> "straw_calc_version": 1,
>>> >> "allowed_bucket_algs": 54,
>>> >> "profile": "jewel",
>>> >> "optimal_tunables": 1,
>>> >> "legacy_tunables": 0,
>>> >> "minimum_required_version": "jewel",
>>> >> "require_feature_tunables": 1,
>>> >> "require_feature_tunables2": 1,
>>> >> "has_v2_rules": 1,
>>> >> "require_feature_tunables3": 1,
>>> >> "has_v3_rules": 0,
>>> >> "has_v4_buckets": 0,
>>> >> "require_feature_tunables5": 1,
>>> >> "has_v5_rules": 0
>>> >> }
>>> >>
>>> >> ceph status:
>>> >>
>>> >>  health HEALTH_ERR
>>> >> 173 pgs are stuck inactive for more than 300 seconds
>>> >> 173 pgs incomplete
>>> >> 173 pgs stuck inactive
>>> >> 173 pgs stuck unclean
>>> >>  monmap e2: 5 mons at
>>> >>
>>> >> {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>>> >> election epoch 106, quorum 0,1,2,3,4
>>> >> ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>>> >> mgr active: ca-cn1 standbys: ca-cn2, ca-cn4, 

[ceph-users] Minimize data loss with incomplete PGs

2017-01-30 Thread José M. Martín
Dear list,

I'm having some big problems with my setup.

I was trying to increase the overall capacity by replacing some OSDs with
bigger ones. I replaced them without waiting for the rebalance to finish,
thinking the replicas were stored in other buckets, but I found a lot of
incomplete PGs, so replicas of the same PG must have been placed in the same
bucket. I assume I have lost that data, because I zapped the disks and reused
them for other tasks.

My question is: what should I do to recover as much data as possible?
I'm using CephFS and RBD.

Thank you so much,

--

Jose M. Martín


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Python get_stats() gives wrong number of objects?

2017-01-30 Thread Kent Borg

On 01/30/2017 11:32 AM, John Spray wrote:
>> Is an object's existence and value synchronous?
>
> Yep.

Makes sense, as hashing is at the core of this design.

Thanks for the super fast response!

-kb
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Python get_stats() gives wrong number of objects?

2017-01-30 Thread John Spray
On Mon, Jan 30, 2017 at 4:22 PM, Kent Borg  wrote:
> On 01/30/2017 11:20 AM, John Spray wrote:
>>
>> Pool stats are not synchronous -- when you call get_stats it is not
>> querying every OSD in the system before giving you a response.
>
> Ah!
>
> Is an object's existence and value synchronous?

Yep.

John

>
> Thanks,
>
> -kb
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Python get_stats() gives wrong number of objects?

2017-01-30 Thread Kent Borg

On 01/30/2017 11:20 AM, John Spray wrote:
> Pool stats are not synchronous -- when you call get_stats it is not
> querying every OSD in the system before giving you a response.


Ah!

Is an object's existence and value synchronous?

Thanks,

-kb

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Python get_stats() gives wrong number of objects?

2017-01-30 Thread John Spray
On Mon, Jan 30, 2017 at 4:06 PM, Kent Borg  wrote:
> I have been playing with the Python version of librados and am getting
> startling answers from get_stats() on a pool. I am seeing 'num_objects' as
> zero at a point where I am expecting one. But if I loop, waiting for my
> expected one, I will get it in a second or so.
>
> I think I created this object with a synchronous write() but I sprinkled
> aio_flush() calls in my code for good measure, still have the same problem.
>
> My code is single threaded and the only code running against this cluster.
> Why would I get a delayed answer?

Pool stats are not synchronous -- when you call get_stats it is not
querying every OSD in the system before giving you a response.  The
OSDs send periodic stats reports about their PGs to the monitor, and
the stats you're getting are whatever was the most recent data on the
mon.

John
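
A minimal sketch of how this behaviour looks from the Python bindings, assuming
a pool named "testpool" and a readable /etc/ceph/ceph.conf: per-object
operations are consistent as soon as the write returns, while the pool stats
only catch up once the OSDs have reported to the mon.

    import time
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("testpool")
        try:
            ioctx.write_full("myobject", b"hello")       # synchronous write

            # Object existence/value are consistent: this succeeds right away.
            size, mtime = ioctx.stat("myobject")
            print("object exists, size =", size)

            # Pool stats lag behind the OSD reports, so poll until they catch up.
            while ioctx.get_stats()["num_objects"] < 1:
                time.sleep(1)
            print("pool stats caught up")
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()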

> Another hint: the pool has been freshly created by my same script, and I
> notice that pool creation and deletion is slow...is there some explicit wait
> I should do after creating a pool before using it?
>
> Thanks,
>
> -kb
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Python get_stats() gives wrong number of objects?

2017-01-30 Thread Kent Borg
I have been playing with the Python version of librados and am getting 
startling answers from get_stats() on a pool. I am seeing 'num_objects' 
as zero at a point where I am expecting one. But if I loop, waiting for 
my expected one, I will get it in a second or so.


I think I created this object with a synchronous write() but I sprinkled 
aio_flush() calls in my code for good measure, still have the same problem.


My code is single threaded and the only code running against this 
cluster. Why would I get a delayed answer?


Another hint: the pool has been freshly created by my same script, and I 
notice that pool creation and deletion is slow...is there some explicit 
wait I should do after creating a pool before using it?


Thanks,

-kb

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph rados gw, select objects by metadata

2017-01-30 Thread Casey Bodley


On 01/30/2017 06:11 AM, Johann Schwarzmeier wrote:

Hello Wido,
That is not good news, but it's what I expected. Thanks for your quick
answer.

Jonny

On 2017-01-30 11:57, Wido den Hollander wrote:
On 30 January 2017 at 10:29, Johann Schwarzmeier wrote:



Hello,
I’m quite new to ceph and radosgw. With the Python API, I found calls
for writing objects via the boto API. It’s also possible to add metadata
to our objects. But now I have a question: is it possible to select or
search objects via metadata? In a little more detail: I want to store
objects with metadata like color = blue, color = red, and so on, and then
select all objects with color = blue. Sorry for a stupid question,
but I’m not able to find an answer in the documentation.


The RADOS Gateway implements the S3 API from Amazon and doesn't allow 
for this.


The whole design for Ceph is also that it's object-name based and you
can't query for xattr values nor names.

So what you are trying to achieve will not work.

Wido


Br Jonny

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


You might be interested in the work Yehuda has done to integrate with 
elasticsearch, which does allow you to search for user-specified 
metadata. You can learn more about it here: 
http://tracker.ceph.com/projects/ceph/wiki/Rgw_metadata_search

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitoring

2017-01-30 Thread Andre Forigato
Matthew,

Very good documentation on performance counters.

Thank you for sharing with us.

Regards,

André

- Original Message -
> From: "Matthew Vernon" 
> To: "Marc Roos" , "ceph-users" 
> 
> Sent: Monday, 30 January 2017 9:18:55
> Subject: Re: [ceph-users] Ceph monitoring

> Dear Marc,
> 
> On 28/01/17 23:43, Marc Roos wrote:
> 
>> Is there a doc that describes all the parameters that are published by
>> collectd-ceph?
> 
> The best I've found is the Redhat documentation of the performance
> counters (which are what collectd-ceph is querying):
> 
> https://access.redhat.com/documentation/en/red-hat-ceph-storage/1.3/paged/administration-guide/chapter-9-performance-counters
> 
> HTH,
> 
> Matthew
> 
> 
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS flapping: how to increase MDS timeouts?

2017-01-30 Thread John Spray
On Mon, Jan 30, 2017 at 7:09 AM, Burkhard Linke
 wrote:
> Hi,
>
>
>
> On 01/26/2017 03:34 PM, John Spray wrote:
>>
>> On Thu, Jan 26, 2017 at 8:18 AM, Burkhard Linke
>>  wrote:
>>>
>>> HI,
>>>
>>>
>>> we are running two MDS servers in an active/standby-replay setup. Recently
>>> we had to disconnect the active MDS server, and failover to the standby
>>> worked as expected.
>>>
>>> The filesystem currently contains over 5 million files, so reading all the
>>> metadata information from the data pool took too long, since the
>>> information was not in the OSD page caches. The MDS was timed out by the
>>> mons, and a failover back to the former active MDS (which was available as
>>> a standby again) happened. This MDS in turn had to read the metadata, again
>>> running into a timeout, failover, and so on. I resolved the situation by
>>> disabling one of the MDSs, which kept the mons from failing the now solely
>>> available MDS.
>>
>> The MDS does not re-read every inode on startup -- rather, it replays
>> its journal (the overall number of files in your system does not
>> factor into this).
>>
>>> So given a large filesystem, how do I prevent failover flapping between
>>> MDS
>>> instances that are in the rejoin state and reading the inode information?
>>
>> The monitor's decision to fail an unresponsive MDS is based on the MDS
>> not sending a beacon to the mon -- there is no limit on how long an
>> MDS is allowed to stay in a given state (such as rejoin).
>>
>> So there are two things to investigate here:
>>   * Why is the MDS taking so long to start?
>>   * Why is the MDS failing to send beacons to the monitor while it is
>> in whatever process that is taking it so long?
>
>
> Under normal operation our system has about 4.5-4.9 million active caps.
> Most of them (~4 million) are associated with the machine running the
> nightly backups.
>
> I assume that during the rejoin phase, the MDS is renewing the clients'
> caps. We see a massive amount of small I/O on the data pool (up to
> 30,000-40,000 IOPS) during the rejoin phase. Does the MDS need to access the
> inode information to renew a cap? That would explain the high number of IOPS
> and why the rejoin phase can take up to 20 minutes.

Ah, I see.  You've identified the issue - the client is informing the
MDS about which inodes it has caps on, and the MDS is responding by
loading those inodes -- in order to dereference them it goes via the
data pool to read the backtrace on each of the inode objects.

This is not a great behaviour from the MDS: doing O(files with caps)
IOs, especially to the data pool, is not something we want to be doing
during failovers.

Things to try to mitigate this with the current code:
 * Using standby-replay daemons (if you're not already), so that the
standby has a better chance to already have the inodes in cache and
thereby avoid loading them
 * Increasing the MDS journal size ("mds log max segments") so that
the MDS will tend to keep a longer journal and have a better chance to
still have the inodes in the journal at the time the failover happens.
 * Decreasing "mds cache size" to limit the number of caps that can be
out there at any one time

I'll respond separately to ceph-devel about how we might change the
code to improve this case.

John
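
For the first two knobs above, a hedged sketch of changing them at runtime
follows; the MDS daemon name "ca-mds1" and the values are only examples, and
the option names are the pre-Luminous ones discussed in this thread. To make
the change persistent it would also need to go into the [mds] section of
ceph.conf.

    # Inject runtime config changes into a running MDS via "ceph tell".
    import subprocess

    def set_mds_option(mds_name, option, value):
        subprocess.run(
            ["ceph", "tell", "mds." + mds_name, "injectargs",
             "--{}={}".format(option, value)],
            check=True)

    set_mds_option("ca-mds1", "mds_log_max_segments", "128")  # keep a longer journal
    set_mds_option("ca-mds1", "mds_cache_size", "1000000")    # bound cached inodes (and thus caps)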



>
> Not sure about the second question, since the IOPS should not prevent
> beacons from reaching the monitors. We will have to move the MDS servers to
> different racks during this week. I'll try to bump up the debug level
> beforehand.
>
>
> Regards,
> Burkhard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitoring

2017-01-30 Thread Matthew Vernon

Dear Marc,

On 28/01/17 23:43, Marc Roos wrote:


Is there a doc that describes all the parameters that are published by
collectd-ceph?


The best I've found is the Red Hat documentation of the performance
counters (which are what collectd-ceph is querying):


https://access.redhat.com/documentation/en/red-hat-ceph-storage/1.3/paged/administration-guide/chapter-9-performance-counters

HTH,

Matthew
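
For anyone who wants to see exactly what collectd-ceph is reading, the same
counters can be pulled straight from a daemon's admin socket. A minimal
sketch, assuming a local osd.0; the exact counter names vary between releases,
hence the defensive lookups:

    # Dump an OSD's performance counters via its admin socket and pick a few out.
    import json
    import subprocess

    raw = subprocess.check_output(["ceph", "daemon", "osd.0", "perf", "dump"])
    counters = json.loads(raw.decode("utf-8"))
    print("sections:", sorted(counters.keys()))
    print("reads:", counters["osd"].get("op_r"), "writes:", counters["osd"].get("op_w"))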


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph rados gw, select objects by metadata

2017-01-30 Thread Johann Schwarzmeier

Hello Wido,
That is not good news, but it's what I expected. Thanks for your quick
answer.

Jonny

On 2017-01-30 11:57, Wido den Hollander wrote:
On 30 January 2017 at 10:29, Johann Schwarzmeier wrote:



Hello,
I’m quite new to ceph and radosgw. With the Python API, I found calls
for writing objects via the boto API. It’s also possible to add metadata
to our objects. But now I have a question: is it possible to select or
search objects via metadata? In a little more detail: I want to store
objects with metadata like color = blue, color = red, and so on, and then
select all objects with color = blue. Sorry for a stupid question,
but I’m not able to find an answer in the documentation.


The RADOS Gateway implements the S3 API from Amazon and doesn't allow 
for this.


The whole design for Ceph is also that it's object-name based and you
can't query for xattr values nor names.

So what you are trying to achieve will not work.

Wido


Br Jonny

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph rados gw, select objects by metadata

2017-01-30 Thread Wido den Hollander

> On 30 January 2017 at 10:29, Johann Schwarzmeier wrote:
> 
> 
> Hello,
> I’m quite new to ceph and radosgw. With the Python API, I found calls
> for writing objects via the boto API. It’s also possible to add metadata
> to our objects. But now I have a question: is it possible to select or
> search objects via metadata? In a little more detail: I want to store
> objects with metadata like color = blue, color = red, and so on, and then
> select all objects with color = blue. Sorry for a stupid question,
> but I’m not able to find an answer in the documentation.

The RADOS Gateway implements the S3 API from Amazon and doesn't allow for this.

The whole design for Ceph is also that it's object-name based and you can't 
query for xattr values nor names.

So what you are trying to achieve will not work.

Wido
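
For illustration, a boto sketch of what is and is not possible here: you can
attach and read back per-object metadata, but selecting by metadata has to be
done client-side by listing the bucket and issuing one HEAD per object. The
endpoint, credentials, bucket and key names below are placeholders.

    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
        host="rgw.example.com",
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    bucket = conn.get_bucket("mybucket")

    # Attach user metadata at upload time (stored as x-amz-meta-color).
    key = bucket.new_key("object-1")
    key.set_metadata("color", "blue")
    key.set_contents_from_string("payload")

    # There is no server-side metadata query, so filter on the client:
    # list the bucket and HEAD each object to read its metadata back.
    blue_objects = [k.name for k in bucket.list()
                    if bucket.get_key(k.name).get_metadata("color") == "blue"]
    print(blue_objects)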

> Br Jonny
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph rados gw, select objects by metadata

2017-01-30 Thread Johann Schwarzmeier

Hello,
I’m quite new to ceph and radosgw. With the Python API, I found calls
for writing objects via the boto API. It’s also possible to add metadata
to our objects. But now I have a question: is it possible to select or
search objects via metadata? In a little more detail: I want to store
objects with metadata like color = blue, color = red, and so on, and then
select all objects with color = blue. Sorry for a stupid question,
but I’m not able to find an answer in the documentation.

Br Jonny

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] bluestore osd failed

2017-01-30 Thread Eugene Skorlov
Hello,

We use a 3-node cluster with EC 8+2, Kraken 11.2.0.

The cluster was installed with 11.1.1 and upgraded to 11.2.0.

After a couple of days, one OSD stopped and failed to start.

This OSD was recreated from scratch on v11.2.0, but after some time it failed
again.


2017-01-27 11:21:35.333547 7fef07711940 -1 WARNING: the following dangerous and 
experimental features are enabled: bluestore,rocksdb
2017-01-27 11:21:35.333562 7fef07711940  0 set uid:gid to 64045:64045 
(ceph:ceph)
2017-01-27 11:21:35.333581 7fef07711940  0 ceph version 11.2.0 
(f223e27eeb35991352ebc1f67423d4ebc252adb7), process ceph-osd, pid 3701964
2017-01-27 11:21:35.333858 7fef07711940 -1 WARNING: experimental feature 
'bluestore' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster.  Do not use
feature with important data.

2017-01-27 11:21:35.336332 7fef07711940  0 pidfile_write: ignore empty 
--pid-file
2017-01-27 11:21:35.339124 7fef07711940 -1 WARNING: the following dangerous and 
experimental features are enabled: bluestore,rocksdb
2017-01-27 11:21:35.352842 7fef07711940  0 load: jerasure load: lrc load: isa 
2017-01-27 11:21:35.353651 7fef07711940  1 bluestore(/var/lib/ceph/osd/ceph-2) 
mount path /var/lib/ceph/osd/ceph-2
2017-01-27 11:21:35.353726 7fef07711940  1 bdev create path 
/var/lib/ceph/osd/ceph-2/block type kernel
2017-01-27 11:21:35.354569 7fef07711940  1 bdev(/var/lib/ceph/osd/ceph-2/block) 
open path /var/lib/ceph/osd/ceph-2/block
2017-01-27 11:21:35.354861 7fef07711940  1 bdev(/var/lib/ceph/osd/ceph-2/block) 
open size 2000293007360 (0x1d1bac11000, 1862 GB) block_size 4096 (4096 B) 
rotational
2017-01-27 11:21:35.355413 7fef07711940  1 bdev create path 
/var/lib/ceph/osd/ceph-2/block type kernel
2017-01-27 11:21:35.356189 7fef07711940  1 bdev(/var/lib/ceph/osd/ceph-2/block) 
open path /var/lib/ceph/osd/ceph-2/block
2017-01-27 11:21:35.356343 7fef07711940  1 bdev(/var/lib/ceph/osd/ceph-2/block) 
open size 2000293007360 (0x1d1bac11000, 1862 GB) block_size 4096 (4096 B) 
rotational
2017-01-27 11:21:35.356353 7fef07711940  1 bluefs add_block_device bdev 1 path 
/var/lib/ceph/osd/ceph-2/block size 1862 GB
2017-01-27 11:21:35.356425 7fef07711940  1 bluefs mount
2017-01-27 11:21:35.515297 7fef07711940  0  set rocksdb option compression = 
kNoCompression
2017-01-27 11:21:35.515325 7fef07711940  0  set rocksdb option 
max_write_buffer_number = 4
2017-01-27 11:21:35.515337 7fef07711940  0  set rocksdb option 
min_write_buffer_number_to_merge = 1
2017-01-27 11:21:35.515351 7fef07711940  0  set rocksdb option 
recycle_log_file_num = 4
2017-01-27 11:21:35.515361 7fef07711940  0  set rocksdb option 
write_buffer_size = 268435456
2017-01-27 11:21:35.515432 7fef07711940  0  set rocksdb option compression = 
kNoCompression
2017-01-27 11:21:35.515446 7fef07711940  0  set rocksdb option 
max_write_buffer_number = 4
2017-01-27 11:21:35.515457 7fef07711940  0  set rocksdb option 
min_write_buffer_number_to_merge = 1
2017-01-27 11:21:35.515466 7fef07711940  0  set rocksdb option 
recycle_log_file_num = 4
2017-01-27 11:21:35.515476 7fef07711940  0  set rocksdb option 
write_buffer_size = 268435456
2017-01-27 11:21:35.515747 7fef07711940  4 rocksdb: RocksDB version: 5.0.0

2017-01-27 11:21:35.515764 7fef07711940  4 rocksdb: Git sha 
rocksdb_build_git_sha:@0@
2017-01-27 11:21:35.515767 7fef07711940  4 rocksdb: Compile date Jan 19 2017
2017-01-27 11:21:35.515772 7fef07711940  4 rocksdb: DB SUMMARY

2017-01-27 11:21:35.515862 7fef07711940  4 rocksdb: CURRENT file:  CURRENT

2017-01-27 11:21:35.515870 7fef07711940  4 rocksdb: IDENTITY file:  IDENTITY

2017-01-27 11:21:35.515875 7fef07711940  4 rocksdb: MANIFEST file:  
MANIFEST-000694 size: 21330 Bytes

2017-01-27 11:21:35.515903 7fef07711940  4 rocksdb: SST files in db dir, Total 
Num: 72, files: 000219.sst 000341.sst 000394.sst 000435.sst 000499.sst 
000500.sst 000515.sst 000516.sst 000517.sst 

2017-01-27 11:21:35.515910 7fef07711940  4 rocksdb: Write Ahead Log file in db: 
000695.log size: 0 ; 

2017-01-27 11:21:35.515917 7fef07711940  4 rocksdb: Options.error_if_exists: 0
2017-01-27 11:21:35.515920 7fef07711940  4 rocksdb: Options.create_if_missing: 0
2017-01-27 11:21:35.515923 7fef07711940  4 rocksdb: Options.paranoid_checks: 1
2017-01-27 11:21:35.515925 7fef07711940  4 rocksdb: Options.env: 0x56477b543020
2017-01-27 11:21:35.515928 7fef07711940  4 rocksdb: Options.info_log: 0x56477b544160
2017-01-27 11:21:35.515930 7fef07711940  4 rocksdb: Options.max_open_files: -1
2017-01-27 11:21:35.515932 7fef07711940  4 rocksdb: Options.max_file_opening_threads: 16
2017-01-27 11:21:35.515933 7fef07711940  4 rocksdb: Options.disableDataSync: 0
2017-01-27 11:21:35.515936 7fef07711940