[ceph-users] cache tiering write vs read promotion

2017-05-18 Thread Webert de Souza Lima
Hello,

I'm using cache tiering with cephfs on latest ceph jewel release.

For my use case, I wanted to make new writes go "directly" to the cache
pool, and use other logic for promotion when reading, for example promoting
only after 2 reads.

I see that the following settings are available:

hit_set_count
hit_set_period
min_read_recency_for_promote
min_write_recency_for_promote

Playing with those values, I could see that the only way to make the first
writes go directly to the cache pool was to set hit_set_count = 0. With that
setting, the other options don't have any effect.

I tried setting hit_set_count and hit_set_period to numbers above zero and
setting min_write_recency_for_promote = 0, but that does not work as
expected.
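For reference, these knobs are set per cache pool; a minimal sketch, assuming a cache pool named "cachepool" (the pool name and values are only placeholders):

~~~
ceph osd pool set cachepool hit_set_type bloom
ceph osd pool set cachepool hit_set_count 4
ceph osd pool set cachepool hit_set_period 1200
ceph osd pool set cachepool min_read_recency_for_promote 2
ceph osd pool set cachepool min_write_recency_for_promote 0
~~~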

Is that possible? Could it be arranged?


Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting remapped PG's + OSD flaps

2017-05-18 Thread David Turner
"340 osds: 101 up, 112 in" This is going to be your culprit.  Your CRUSH
map is in a really weird state.  How many OSDs do you have in this
cluster?  When OSDs go down, secondary OSDs take over for it, but when OSDs
get marked out, the cluster re-balances to distribute the data according to
how many replicas your settings say it should have (remapped PGs).  Your
cluster thinks it has 340 OSDs in total, it believes that 112 of them are
added to the cluster, but only 101 of them are currently up and running.
That means that it is trying to put all of your data onto those 101 OSDs.
Your settings to have 16k PGs is fine for 340 OSDs, but with 101 OSDs
you're getting the error of too many PGs per OSD.

So next steps:
1) How many OSDs do you expect to be in your Ceph cluster?
2) Did you bring your OSDs back up during your rolling restart testing
BEFORE
a) They were marked down in the cluster?
b) You moved on to the next node?  Additionally, did you wait for all
backfilling to finish before you proceeded to the next node?
3) Do you have enough memory in your nodes, or are your OSDs being killed by
the OOM killer?  I see that you have a lot of peering PGs in your status
output, which indicates that the OSDs are continually restarting or
being marked down for not responding.
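A few commands that may help answer those questions; this is only a hedged sketch, so adjust host and daemon names to your environment:

~~~
# which OSDs the cluster currently considers down or out
ceph osd stat
ceph osd tree | grep down

# whether the kernel OOM killer has been terminating ceph-osd processes
dmesg -T | grep -i 'killed process'
journalctl -k | grep -i oom
~~~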

On Thu, May 18, 2017 at 2:41 PM nokia ceph  wrote:

> Hello,
>
>
> Env: Bluestore, EC 4+1, v11.2.0, RHEL 7.3, 16383 PGs
>
>
> We did our resiliency testing and found that OSDs keep flapping and the
> cluster went to an error state.
>
> What we did:-
>
> 1. We have a 5-node cluster.
> 2. We powered off / stopped ceph.target on the last node and waited until
> everything seemed to return to normal.
> 3. We then powered the last node back up and saw recovery stuck on
> remapped PGs.
> ~~~
>  osdmap e4829: 340 osds: 101 up, 112 in; 15011 *remapped pgs*
> *~~~*
> 4. Initially all OSDs came back up to 340; at the same time the remapped
> count reached 16384, at OSD map epoch e818.
> 5. Then, after 1 or 2 hours, the remapped PG count kept incrementing and
> decrementing, and as a result the OSDs started failing one by one. We also
> tested with the patch below, still with no change.
> patch -
> https://github.com/ceph/ceph-ci/commit/wip-prune-past-intervals-kraken
>
>
> #ceph -s
> 2017-05-18 18:07:45.876586 7fd6bb87e700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> 2017-05-18 18:07:45.900045 7fd6bb87e700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> cluster cb55baa8-d5a5-442e-9aae-3fd83553824e
>  health HEALTH_ERR
> 27056 pgs are stuck inactive for more than 300 seconds
> 744 pgs degraded
> 10944 pgs down
> 3919 pgs peering
> 11416 pgs stale
> 744 pgs stuck degraded
> 15640 pgs stuck inactive
> 11416 pgs stuck stale
> 16384 pgs stuck unclean
> 744 pgs stuck undersized
> 744 pgs undersized
> recovery 1279809/135206985 objects degraded (0.947%)
> too many PGs per OSD (731 > max 300)
> 11/112 in osds are down
>  monmap e3: 5 mons at {PL6-CN1=
> 10.50.62.151:6789/0,PL6-CN2=10.50.62.152:6789/0,PL6-CN3=10.50.62.153:6789/0,PL6-CN4=10.50.62.154:6789/0,PL6-CN5=1
> 0.50.62.155:6789/0}
> election epoch 22, quorum 0,1,2,3,4
> PL6-CN1,PL6-CN2,PL6-CN3,PL6-CN4,PL6-CN5
> mgr no daemons active
>  osdmap e4827: 340 osds: 101 up, 112 in; 15011 remapped pgs
> flags sortbitwise,require_jewel_osds,require_kraken_osds
>   pgmap v83202: 16384 pgs, 1 pools, 52815 GB data, 26407 kobjects
> 12438 GB used, 331 TB / 343 TB avail
> 1279809/135206985 objects degraded (0.947%)
> 4512 stale+down+remapped
> 3060 down+remapped
> 2204 stale+down
> 2000 stale+remapped+peering
> 1259 stale+peering
> 1167 down
>  739 stale+active+undersized+degraded
>  702 stale+remapped
>  557 peering
>  102 remapped+peering
>
>
>
> # ceph pg stat
> 2017-05-18 18:09:18.345865 7fe2f72ec700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> 2017-05-18 18:09:18.368566 7fe2f72ec700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> v83204: 16384 pgs: 1 inactive, 1259 stale+peering, 75 remapped, 2000
> stale+remapped+peering, 102 remapped+peering, 2204 stale+down, 739
> stale+active+undersized+degraded, 1 down+remapped+peering, 702
> stale+remapped, 557 peering, 4512 stale+down+remapped, 3060 down+remapped,
> 5 active+undersized+degraded, 1167 down; 52815 GB data, 12438 GB used, 331
> TB / 343 TB avail; 1279809/135206985 objects degraded (0.947%)
>
>
>
> Randomly captured some PG values:
> ~~~
> 3.3ffc 1646  0   

[ceph-users] Troubleshooting remapped PG's + OSD flaps

2017-05-18 Thread nokia ceph
Hello,


Env: Bluestore, EC 4+1, v11.2.0, RHEL 7.3, 16383 PGs


We did our resiliency testing and found that OSDs keep flapping and the
cluster went to an error state.

What we did:-

1. We have a 5-node cluster.
2. We powered off / stopped ceph.target on the last node and waited until
everything seemed to return to normal.
3. We then powered the last node back up and saw recovery stuck on
remapped PGs.
~~~
 osdmap e4829: 340 osds: 101 up, 112 in; 15011 *remapped pgs*
*~~~*
4. Initially all OSDs came back up to 340; at the same time the remapped
count reached 16384, at OSD map epoch e818.
5. Then, after 1 or 2 hours, the remapped PG count kept incrementing and
decrementing, and as a result the OSDs started failing one by one. We also
tested with the patch below, still with no change.
patch -
https://github.com/ceph/ceph-ci/commit/wip-prune-past-intervals-kraken


#ceph -s
2017-05-18 18:07:45.876586 7fd6bb87e700 -1 WARNING: the following dangerous
and experimental features are enabled: bluestore,rocksdb
2017-05-18 18:07:45.900045 7fd6bb87e700 -1 WARNING: the following dangerous
and experimental features are enabled: bluestore,rocksdb
cluster cb55baa8-d5a5-442e-9aae-3fd83553824e
 health HEALTH_ERR
27056 pgs are stuck inactive for more than 300 seconds
744 pgs degraded
10944 pgs down
3919 pgs peering
11416 pgs stale
744 pgs stuck degraded
15640 pgs stuck inactive
11416 pgs stuck stale
16384 pgs stuck unclean
744 pgs stuck undersized
744 pgs undersized
recovery 1279809/135206985 objects degraded (0.947%)
too many PGs per OSD (731 > max 300)
11/112 in osds are down
 monmap e3: 5 mons at {PL6-CN1=
10.50.62.151:6789/0,PL6-CN2=10.50.62.152:6789/0,PL6-CN3=10.50.62.153:6789/0,PL6-CN4=10.50.62.154:6789/0,PL6-CN5=1
0.50.62.155:6789/0}
election epoch 22, quorum 0,1,2,3,4
PL6-CN1,PL6-CN2,PL6-CN3,PL6-CN4,PL6-CN5
mgr no daemons active
 osdmap e4827: 340 osds: 101 up, 112 in; 15011 remapped pgs
flags sortbitwise,require_jewel_osds,require_kraken_osds
  pgmap v83202: 16384 pgs, 1 pools, 52815 GB data, 26407 kobjects
12438 GB used, 331 TB / 343 TB avail
1279809/135206985 objects degraded (0.947%)
4512 stale+down+remapped
3060 down+remapped
2204 stale+down
2000 stale+remapped+peering
1259 stale+peering
1167 down
 739 stale+active+undersized+degraded
 702 stale+remapped
 557 peering
 102 remapped+peering



# ceph pg stat
2017-05-18 18:09:18.345865 7fe2f72ec700 -1 WARNING: the following dangerous
and experimental features are enabled: bluestore,rocksdb
2017-05-18 18:09:18.368566 7fe2f72ec700 -1 WARNING: the following dangerous
and experimental features are enabled: bluestore,rocksdb
v83204: 16384 pgs: 1 inactive, 1259 stale+peering, 75 remapped, 2000
stale+remapped+peering, 102 remapped+peering, 2204 stale+down, 739
stale+active+undersized+degraded, 1 down+remapped+peering, 702
stale+remapped, 557 peering, 4512 stale+down+remapped, 3060 down+remapped,
5 active+undersized+degraded, 1167 down; 52815 GB data, 12438 GB used, 331
TB / 343 TB avail; 1279809/135206985 objects degraded (0.947%)



Randomly captured some PG values:
~~~
pg 3.3ffc: 1646 objects, 3451912192 bytes, version 846'1646, reported 872:1634,
  state stale+active+undersized+degraded (2017-05-18 11:06:32.453158),
  up [36,NONE,278,219,225] primary 36, acting [36,NONE,278,219,225] primary 36
pg 3.3ffb: 1711 objects, 3588227072 bytes, version 846'1711, reported 1602:1708,
  state down (2017-05-18 15:20:52.858840),
  up [150,161,NONE,NONE,83] primary 150, acting [150,161,NONE,NONE,83] primary 150
pg 3.3ffa: 1617 objects, 3391094784 bytes, version 846'1617, reported 2525:1637,
  state down+remapped (2017-05-18 17:12:54.943317),
  up [48,292,77,277,49] primary 48, acting [48,NONE,NONE,277,49] primary 48
pg 3.3ff9: 1682 objects, 3527409664 bytes, version 846'1682, reported 2195:1678,
  state down+remapped (2017-05-18 16:16:42.223632),
  up [266,79,NONE,309,258] primary 266, acting [NONE,NONE,NONE,NONE,258] primary 258
~~~

ceph.conf

[mon]
mon_osd_down_out_interval = 3600
mon_osd_reporter_subtree_level=host
mon_osd_down_out_subtree_limit=host
mon_osd_min_down_reporters = 4
mon_allow_pool_delete = true
[osd]
bluestore = true
bluestore_cache_size = 107374182
bluefs_buffered_io = true
osd_op_threads 

Re: [ceph-users] OSD crash loop - FAILED assert(recovery_info.oi.snaps.size())

2017-05-18 Thread Steve Anthony
Hmmm, after crashing every 30 seconds for a few days, it's apparently
running normally again. Weird. Since it's looking for a snapshot object, I
was thinking that re-enabling snap trimming and removing all the snapshots
in the pool might remove that object (and the problem). I never got to that
point this time, but I'm going to need to cycle more OSDs in and out of the
cluster, so if it happens again I might try that and update.
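In case it's useful, a hedged sketch of the commands that would be involved (pool and image names are placeholders, and the snap-trim setting should be restored to whatever its previous value was in this release):

~~~
# re-enable the snap trimming that was disabled earlier
ceph tell osd.\* injectargs '--osd_pg_max_concurrent_snap_trims 1'

# list and remove RBD snapshots in the affected pool
rbd -p rbd ls
rbd snap ls rbd/myimage
rbd snap purge rbd/myimage    # removes all snapshots of that image
~~~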

Thanks!

-Steve


On 05/17/2017 03:17 PM, Gregory Farnum wrote:
>
>
> On Wed, May 17, 2017 at 10:51 AM Steve Anthony  > wrote:
>
> Hello,
>
> After starting a backup (create snap, export and import into a second
> cluster - one RBD image still exporting/importing as of this message)
> the other day while recovery operations on the primary cluster were
> ongoing I noticed an OSD (osd.126) start to crash; I reweighted it
> to 0
> to prepare to remove it. Shortly thereafter I noticed the problem
> seemed
> to move to another OSD (osd.223). After looking at the logs, I noticed
> they appeared to have the same problem. I'm running Ceph version 9.2.1
> (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd) on Debian 8.
>
> Log for osd.126 from start to crash: https://pastebin.com/y4fn94xe
>
> Log for osd.223 from start to crash: https://pastebin.com/AE4CYvSA
>
>
> May 15 10:39:55 ceph13 ceph-osd[21506]: -9308> 2017-05-15
> 10:39:51.561342 7f225c385900 -1 osd.126 616621 log_to_monitors
> {default=true}
> May 15 10:39:55 ceph13 ceph-osd[21506]: 2017-05-15 10:39:55.328897
> 7f2236be3700 -1 osd/ReplicatedPG.cc: In function 'virtual void
> ReplicatedPG::on_local_recover(const hobject_t&, const
> object_stat_sum_t&, const ObjectRecoveryInfo&, ObjectContextRef,
> ObjectStore::Transaction*)' thread 7f2236be3700 time 2017-05-15
> 10:39:55.322306
> May 15 10:39:55 ceph13 ceph-osd[21506]: osd/ReplicatedPG.cc: 192:
> FAILED
> assert(recovery_info.oi.snaps.size())
>
> May 15 16:45:25 ceph19 ceph-osd[30527]: 2017-05-15 16:45:25.343391
> 7ff40f41e900 -1 osd.223 619808 log_to_monitors {default=true}
> May 15 16:45:30 ceph19 ceph-osd[30527]: osd/ReplicatedPG.cc: In
> function
> 'virtual void ReplicatedPG::on_local_recover(const hobject_t&, const
> object_stat_sum_t&, const ObjectRecoveryInfo&, ObjectContextRef,
> ObjectStore::Transaction*)' thread 7ff3eab63700 time 2017-05-15
> 16:45:30.799839
> May 15 16:45:30 ceph19 ceph-osd[30527]: osd/ReplicatedPG.cc: 192:
> FAILED
> assert(recovery_info.oi.snaps.size())
>
>
> I did some searching and thought it might be related to
> http://tracker.ceph.com/issues/13837 aka
> https://bugzilla.redhat.com/show_bug.cgi?id=1351320 so I disabled
> scrubbing and deep-scrubbing, and set osd_pg_max_concurrent_snap_trims
> to 0 for all OSDs. No luck. I had changed the systemd service file to
> automatically restart osd.223 while recovery was happening, but it
> appears to have stalled; I suppose it's needed up for the
> remaining objects.
>
>
> Yeah, these aren't really related that I can see — though I haven't
> spent much time in this code that I can recall. The OSD is receiving a
> "push" as part of log recovery and finds that the object it's
> receiving is a snapshot object without having any information about
> the snap IDs that exist, which is weird. I don't know of any way a
> client could break it either, but maybe David or Jason know something
> more.
> -Greg
>  
>
>
> I didn't see anything else online, so I thought I see if anyone
> has seen
> this before or has any other ideas. Thanks for taking the time.
>
> -Steve
>
>
> --
> Steve Anthony
> LTS HPC Senior Analyst
> Lehigh University
> sma...@lehigh.edu 
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-- 
Steve Anthony
LTS HPC Senior Analyst
Lehigh University
sma...@lehigh.edu



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-18 Thread Jason Dillaman
I'm unfortunately out of ideas at the moment. I think the best chance
of figuring out what is wrong is to repeat it while logs are enabled.
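As a hedged sketch, the relevant debug logging could be raised on that OSD while reproducing (osd.23 and the log levels are just the examples used earlier in the thread; lower them again afterwards):

~~~
# via the admin socket on the OSD host
ceph daemon osd.23 config set debug_osd 20
ceph daemon osd.23 config set debug_ms 1

# or remotely
ceph tell osd.23 injectargs '--debug-osd 20 --debug-ms 1'
~~~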

On Wed, May 17, 2017 at 4:51 PM, Stefan Priebe - Profihost AG
 wrote:
> No, I can't reproduce it with active logs. Any further ideas?
>
> Greets,
> Stefan
>
> Am 17.05.2017 um 21:26 schrieb Stefan Priebe - Profihost AG:
>> Am 17.05.2017 um 21:21 schrieb Jason Dillaman:
>>> Any chance you still have debug logs enabled on OSD 23 after you
>>> restarted it and the scrub froze again?
>>
>> No, but I can do that ;-) Hopefully it freezes again.
>>
>> Stefan
>>
>>>
>>> On Wed, May 17, 2017 at 3:19 PM, Stefan Priebe - Profihost AG
>>>  wrote:
 Hello,

 now it shows again:
>> 4095 active+clean
>>1 active+clean+scrubbing

 and:
 # ceph pg dump | grep -i scrub
 dumped all in format plain
 pg_stat objects mip degrmispunf bytes   log disklog
 state   state_stamp v   reportedup  up_primary
 acting  acting_primary  last_scrub  scrub_stamp last_deep_scrub
 deep_scrub_stamp
 2.aa40400   0   0   0   10128667136 3010
 3010active+clean+scrubbing  2017-05-11 09:37:37.962700
 181936'11196478  181936:8688051  [23,41,9]   23  [23,41,9]
 23  176730'10793226 2017-05-10 03:43:20.849784  171715'10548192
2017-05-04 14:27:39.210713

 So it seems the same scrub is stuck again... even after restarting the
 osd. It just took some time until the scrub of this pg happened again.

 Greets,
 Stefan
 Am 17.05.2017 um 21:13 schrieb Jason Dillaman:
> Can you share your current OSD configuration? It's very curious that
> your scrub is getting randomly stuck on a few objects for hours at a
> time until an OSD is reset.
>
> On Wed, May 17, 2017 at 2:55 PM, Stefan Priebe - Profihost AG
>  wrote:
>> Hello Jason,
>>
>> Minutes ago I had another case where I restarted the OSD which was shown
>> in the objecter_requests output.
>>
>> It seems also other scrubs and deep scrubs were hanging.
>>
>> Output before:
>> 4095 active+clean
>>1 active+clean+scrubbing
>>
>> Output after restart:
>> 4084 active+clean
>>7 active+clean+scrubbing+deep
>>5 active+clean+scrubbing
>>
>> Both values are changing every few seconds again, doing a lot of scrubs
>> and deep scrubs.
>>
>> Greets,
>> Stefan
>> Am 17.05.2017 um 20:36 schrieb Stefan Priebe - Profihost AG:
>>> Hi,
>>>
>>> that command does not exist.
>>>
>>> But at least ceph -s permanently reports 1 pg in scrubbing with no 
>>> change.
>>>
>>> Log attached as well.
>>>
>>> Greets,
>>> Stefan
>>> Am 17.05.2017 um 20:20 schrieb Jason Dillaman:
 Does your ceph status show pg 2.cebed0aa (still) scrubbing? Sure -- I
 can quickly scan the new log if you directly send it to me.

 On Wed, May 17, 2017 at 2:18 PM, Stefan Priebe - Profihost AG
  wrote:
> I can send the OSD log, if you want?
>
> Stefan
>
> Am 17.05.2017 um 20:13 schrieb Stefan Priebe - Profihost AG:
>> Hello Jason,
>>
>> the command
>> # rados -p cephstor6 rm rbd_data.21aafa6b8b4567.0aaa
>>
>> hangs as well. Doing absolutely nothing... waiting forever.
>>
>> Greets,
>> Stefan
>>
>> Am 17.05.2017 um 17:05 schrieb Jason Dillaman:
>>> OSD 23 notes that object rbd_data.21aafa6b8b4567.0aaa is
>>> waiting for a scrub. What happens if you run "rados -p  rm
>>> rbd_data.21aafa6b8b4567.0aaa" (capturing the OSD 23 logs
>>> during this)? If that succeeds while your VM remains blocked on that
>>> remove op, it looks like there is some problem in the OSD where ops
>>> queued on a scrub are not properly awoken when the scrub completes.
>>>
>>> On Wed, May 17, 2017 at 10:57 AM, Stefan Priebe - Profihost AG
>>>  wrote:
 Hello Jason,

 after enabling the log and generating a gcore dump, the request was
 successful ;-(

 So the log only contains the successful request; that was all I was able to
 catch. I can send you the log on request.

 Luckily i had another VM on another Cluster behaving the same.

 This time osd.23:
 # ceph --admin-daemon
 

[ceph-users] Mixing cache-mode writeback with read-proxy

2017-05-18 Thread Guillaume Comte
Hi list,

Does it make sense to split an SSD into two parts, one on which I would put a
cache pool in writeback mode and another with a cache pool in readproxy mode,
in order to benefit from both modes?
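For context, cache modes are set per cache pool, so two separately backed cache pools can indeed use different modes. A hedged sketch, with placeholder pool names:

~~~
ceph osd tier add basepool1 cache-wb
ceph osd tier cache-mode cache-wb writeback
ceph osd tier set-overlay basepool1 cache-wb

ceph osd tier add basepool2 cache-rp
ceph osd tier cache-mode cache-rp readproxy
ceph osd tier set-overlay basepool2 cache-rp
~~~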

Thks

-- 
*Guillaume Comte*
06 25 85 02 02  | guillaume.co...@blade-group.com

90 avenue des Ternes, 75 017 Paris
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing SSD Landscape

2017-05-18 Thread Reed Dier
> BTW, you asked about Samsung parts earlier. We are running these
> SM863's in a block storage cluster:
> 
> Model Family: Samsung based SSDs
> Device Model: SAMSUNG MZ7KM240HAGR-0E005
> Firmware Version: GXM1003Q
> 
>  
> 177 Wear_Leveling_Count 0x0013   094   094   005Pre-fail
> Always   -   2195
> 
> The problem is that I don't know how to see how many writes have gone
> through these drives.
> 
> But maybe they're EOL anyway?
> 
> Cheers, Dan

I have SM863a 1.9T’s in an all SSD pool.

Model Family: Samsung based SSDs
Device Model: SAMSUNG MZ7KM1T9HMJP-5

The easiest way to read the number of ‘drive writes’ is the Wear_Leveling_Count 
(177) attribute. ‘Value’ is the normalized percentage of life remaining (out of 
100, counting down), and the ‘raw value’ is the actual average Program/Erase 
cycle count, i.e. your drive writes.

> ID# ATTRIBUTE_NAME  FLAG VALUE WORST THRESH TYPE  UPDATED  
> WHEN_FAILED RAW_VALUE
>   9 Power_On_Hours  0x0032   099   099   000Old_age   Always  
>  -   1758
> 177 Wear_Leveling_Count 0x0013   099   099   005Pre-fail  Always  
>  -   7


So in my case, for the drive in question, the average of all the NAND has 
been fully written 7 times.

The 1.9T SM863 is rated at 12.32 PBW, with a warranty period of 5 years, so 
~3.6 DWPD, or ~6,500 drive writes for the total life of the drive.

Now your drive shows 2,195 PE cycles, which would be about 33% of the total PE 
cycles it's rated for. I’m guessing that some of the NAND may have higher PE 
cycles than others, and the raw value reported may be the max value, rather 
than the average.

Intel reports the min/avg/max on their drives using isdct.

> $ sudo isdct show -smart ad -intelssd 0
> 
> - SMART Attributes PHMD_400AGN -
> - AD -
> AverageEraseCycles : 256
> Description : Wear Leveling Count
> ID : AD
> MaximumEraseCycles : 327
> MinimumEraseCycles : 188
> Normalized : 98
> Raw : 1099533058236

This is a P3700, one of the oldest in use. So this one has seen ~2% of its life 
expectancy usage, where some NAND has seen 75% more PE cycles than others.

Would be curious what the raw value for Samsung is reporting, but that's an easy 
way to gauge drive writes.
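A hedged sketch of pulling the same attributes with smartmontools (the device path is a placeholder):

~~~
# raw Wear_Leveling_Count is roughly the average P/E cycle count, i.e. full drive writes
sudo smartctl -A /dev/sda | egrep 'Power_On_Hours|Wear_Leveling_Count|Total_LBAs_Written|NAND_Writes'
~~~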

Reed

> On May 18, 2017, at 3:30 AM, Dan van der Ster  wrote:
> 
> On Thu, May 18, 2017 at 3:11 AM, Christian Balzer  > wrote:
>> On Wed, 17 May 2017 18:02:06 -0700 Ben Hines wrote:
>> 
>>> Well, ceph journals are of course going away with the imminent bluestore.
>> Not really, in many senses.
>> 
> 
> But we should expect far fewer writes to pass through the RocksDB and
> its WAL, right? So perhaps lower endurance flash will be usable.
> 
> BTW, you asked about Samsung parts earlier. We are running these
> SM863's in a block storage cluster:
> 
> Model Family: Samsung based SSDs
> Device Model: SAMSUNG MZ7KM240HAGR-0E005
> Firmware Version: GXM1003Q
> 
>  9 Power_On_Hours  0x0032   098   098   000Old_age
> Always   -   9971
> 177 Wear_Leveling_Count 0x0013   094   094   005Pre-fail
> Always   -   2195
> 241 Total_LBAs_Written  0x0032   099   099   000Old_age
> Always   -   701300549904
> 242 Total_LBAs_Read 0x0032   099   099   000Old_age
> Always   -   20421265
> 251 NAND_Writes 0x0032   100   100   000Old_age
> Always   -   1148921417736
> 
> The problem is that I don't know how to see how many writes have gone
> through these drives.
> Total_LBAs_Written appears to be bogus -- it's based on time. It
> matches exactly the 3.6 DWPD spec'd for that model:
>  3.6 * 240 GB * (9971 hours / 24) = 358.95 TB
>  701300549904 LBAs * 512 bytes/LBA = 359.06 TB
> 
> If we trust Wear_Leveling_Count then we're only dropping 6% in a year
> -- these should be good.
> 
> But maybe they're EOL anyway?
> 
> Cheers, Dan
> 
>>> Are small SSDs still useful for something with Bluestore?
>>> 
>> Of course, the WAL and other bits for the rocksdb, read up on it.
>> 
>> On top of that is the potential to improve things further with things
>> like bcache.
>> 
>>> For speccing out a cluster today that is a many 6+ months away from being
>>> required, which I am going to be doing, i was thinking all-SSD would be the
>>> way to go. (or is all-spinner performant with Bluestore?) Too early to make
>>> that call?
>>> 
>> Your call and funeral with regards to all spinners (depending on your
>> needs).
>> Bluestore at the very best of circumstances could double your IOPS, but
>> there are other factors at play and most people who NEED SSD journals now
>> would want something with SSDs in Bluestore as well.
>> 
>> If you're planning to actually deploy a (entirely) Bluestore cluster in
>> production with mission critical data before next year, you're a lot
>> braver than me.
>> An early adoption scheme with Bluestore nodes being in their own failure
>> domain (rack) would 

Re: [ceph-users] Debian Wheezy repo broken

2017-05-18 Thread Alfredo Deza
On Thu, May 18, 2017 at 5:53 AM, Harald Hannelius  wrote:
>
>
> Has anyone got any suggestions on how to circumvent this problem?

It is not clear what version of Ceph you want to install and on what
server. A full paste of the log (this looks like ceph-deploy output?)
would be great here.
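In the meantime, a hedged sketch of what to check on the affected node (the paths are the usual ceph-deploy defaults and may differ on your system):

~~~
# which release/distro the Ceph repo line points at
cat /etc/apt/sources.list.d/ceph.list

# what apt can actually see from that repo
apt-cache policy ceph ceph-osd ceph-mon
~~~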

>
>
> On Fri, 12 May 2017, Harald Hannelius wrote:
>
>>
>> I am unable to perform ceph-deploy install {node} on Debian Wheezy.
>>
>>
>> [server][DEBUG ] Hit http://security.debian.org wheezy/updates/main
>> Translation-en
>> [server][DEBUG ] Hit http://download.ceph.com wheezy Release
>> [server][DEBUG ] Hit http://download.ceph.com wheezy/main amd64 Packages
>> [server][DEBUG ] Ign http://download.ceph.com wheezy/main
>> Translation-en_US
>> [server][DEBUG ] Ign http://download.ceph.com wheezy/main Translation-en
>> [server][DEBUG ] Reading package lists...
>> [server][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive
>> DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends
>> install -o Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon
>> radosgw
>> [server][DEBUG ] Reading package lists...
>> [server][DEBUG ] Building dependency tree...
>> [server][DEBUG ] Reading state information...
>> [server][DEBUG ] Package ceph-mds is not available, but is referred to by
>> another package.
>> [server][DEBUG ] This may mean that the package is missing, has been
>> obsoleted, or
>> [server][DEBUG ] is only available from another source
>> [server][DEBUG ]
>> [server][WARNIN] E: Unable to locate package ceph-osd
>> [server][WARNIN] E: Package 'ceph-mds' has no installation candidate
>> [server][WARNIN] E: Unable to locate package ceph-mon
>> [server][WARNIN] E: Unable to locate package radosgw
>> [server][ERROR ] RuntimeError: command returned non-zero exit status: 100
>> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env
>> DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes
>> -q --no-install-recommends install -o Dpkg::Options::=--force-confnew
>> ceph-osd ceph-mds ceph-mon radosgw
>>
>>
>>
>>
>
> --
>
> Harald Hannelius | harald.hannelius/a\arcada.fi | +358 50 594 1020
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Available tools for deploying ceph cluster as a backend storage ?

2017-05-18 Thread Shambhu Rajak
Let me explore the code and adapt it to my needs. Thanks, Chris.
Regards,
Shambhu

From: Bitskrieg [mailto:bitskr...@bitskrieg.net]
Sent: Thursday, May 18, 2017 6:40 PM
To: Shambhu Rajak; wes_dilling...@harvard.edu
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Available tools for deploying ceph cluster as a 
backend storage ?


Shambhu,

If you're looking for something turnkey/dead-simple, you should be talking to 
Red Hat.  Everything out there that is FOSS either (a) is not fully featured for 
life-cycle management (ceph-deploy), or (b) requires nontrivial amounts of 
time/expertise/etc. to put together and to add whatever extra management features 
you need (Ansible, Salt, etc.).

That said, we did create a relatively simple framework for complete life cycle 
management of ceph using salt, along with the pieces required for cinder, 
glance, and nova integration.  Some of the stuff is environment specific, but 
those pieces are easy enough to pull out and adjust to your needs.  Code is 
here: https://git.cybbh.space/vta/saltstack.

Chris

On May 18, 2017 8:56:52 AM Shambhu Rajak 
> wrote:
Hi Wes,
Since I want a production deployment, full-fledged management would be 
necessary for administration and maintenance; could you suggest something 
along these lines?
Thanks,
Shambhu

From: Wes Dillingham 
[mailto:wes_dilling...@harvard.edu]
Sent: Thursday, May 18, 2017 6:08 PM
To: Shambhu Rajak
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Available tools for deploying ceph cluster as a 
backend storage ?

If you don't want a full-fledged configuration management approach, ceph-deploy 
is your best bet: 
http://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-new/

On Thu, May 18, 2017 at 8:28 AM, Shambhu Rajak 
> wrote:
Hi ceph-users,

I want to deploy a Ceph cluster as backend storage for OpenStack, so I am 
trying to find the best tool available for deploying a Ceph cluster.
A few that come to mind:
https://github.com/ceph/ceph-ansible
https://github.com/01org/virtual-storage-manager/wiki/Getting-Started-with-VSM

Is there anything else available that would be easier to use and suitable for 
a production deployment?

Thanks,
Shambhu Rajak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu
Research Computing | Infrastructure Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 102

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Available tools for deploying ceph cluster as a backend storage ?

2017-05-18 Thread Lenz Grimmer
Hi,

On 05/18/2017 02:28 PM, Shambhu Rajak wrote:

> I want to deploy a Ceph cluster as backend storage for OpenStack, so I
> am trying to find the best tool available for deploying a Ceph cluster.
> 
> A few that come to mind:
> 
> https://github.com/ceph/ceph-ansible
> 
> https://github.com/01org/virtual-storage-manager/wiki/Getting-Started-with-VSM
> 
> Is there anything else available that would be easier to use and suitable
> for a production deployment?

If you're looking for a Salt-based solution, take a look at DeepSea as
well: https://github.com/SUSE/DeepSea

Lenz



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Available tools for deploying ceph cluster as a backend storage ?

2017-05-18 Thread Shambhu Rajak
Hi Wes,
Since I want a production deployment, full-fledged management would be 
necessary for administration and maintenance; could you suggest something 
along these lines?
Thanks,
Shambhu

From: Wes Dillingham [mailto:wes_dilling...@harvard.edu]
Sent: Thursday, May 18, 2017 6:08 PM
To: Shambhu Rajak
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Available tools for deploying ceph cluster as a 
backend storage ?

If you don't want a full-fledged configuration management approach, ceph-deploy 
is your best bet: 
http://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-new/

On Thu, May 18, 2017 at 8:28 AM, Shambhu Rajak 
> wrote:
Hi ceph-users,

I want to deploy a Ceph cluster as backend storage for OpenStack, so I am 
trying to find the best tool available for deploying a Ceph cluster.
A few that come to mind:
https://github.com/ceph/ceph-ansible
https://github.com/01org/virtual-storage-manager/wiki/Getting-Started-with-VSM

Is there anything else available that would be easier to use and suitable for 
a production deployment?

Thanks,
Shambhu Rajak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu
Research Computing | Infrastructure Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 102

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Available tools for deploying ceph cluster as a backend storage ?

2017-05-18 Thread Wes Dillingham
If you don't want a full-fledged configuration management approach,
ceph-deploy is your best bet:
http://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-new/

On Thu, May 18, 2017 at 8:28 AM, Shambhu Rajak  wrote:

> Hi ceph-users,
>
>
>
> I want to deploy a Ceph cluster as backend storage for OpenStack, so I am
> trying to find the best tool available for deploying a Ceph cluster.
>
> A few that come to mind:
>
> https://github.com/ceph/ceph-ansible
>
> https://github.com/01org/virtual-storage-manager/wiki/Getting-Started-with-VSM
>
>
>
> Is there anything else available that would be easier to use and suitable
> for a production deployment?
>
>
>
> Thanks,
> Shambhu Rajak
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu
Research Computing | Infrastructure Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 102
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Available tools for deploying ceph cluster as a backend storage ?

2017-05-18 Thread Eugen Block

Hi,

ceph-deploy is a nice tool; it does a lot of work for you and is not  
very hard to understand if you know the basics of Ceph.


http://docs.ceph.com/docs/master/rados/deployment/
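As a rough illustration only (hostnames and the disk are placeholders, and the exact syntax depends on your ceph-deploy version), a minimal bootstrap looks something like:

~~~
ceph-deploy new mon1 mon2 mon3
ceph-deploy install mon1 mon2 mon3 osd1
ceph-deploy mon create-initial
ceph-deploy osd create osd1:/dev/sdb
~~~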

Regards,
Eugen


Zitat von Shambhu Rajak :


Hi ceph-users,

I want to deploy a Ceph cluster as backend storage for OpenStack, so  
I am trying to find the best tool available for deploying a Ceph  
cluster.

A few that come to mind:
https://github.com/ceph/ceph-ansible
https://github.com/01org/virtual-storage-manager/wiki/Getting-Started-with-VSM

Is there anything else available that would be easier to use and  
suitable for a production deployment?


Thanks,
Shambhu Rajak




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Available tools for deploying ceph cluster as a backend storage ?

2017-05-18 Thread Shambhu Rajak
Hi ceph-users,

I want to deploy a Ceph cluster as backend storage for OpenStack, so I am 
trying to find the best tool available for deploying a Ceph cluster.
A few that come to mind:
https://github.com/ceph/ceph-ansible
https://github.com/01org/virtual-storage-manager/wiki/Getting-Started-with-VSM

Is there anything else available that would be easier to use and suitable for 
a production deployment?

Thanks,
Shambhu Rajak
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Debian Wheezy repo broken

2017-05-18 Thread Harald Hannelius



Has anyone got any suggestions on how to circumvent this problem?

On Fri, 12 May 2017, Harald Hannelius wrote:



I am unable to perform ceph-deploy install {node} on Debian Wheezy.


[server][DEBUG ] Hit http://security.debian.org wheezy/updates/main 
Translation-en

[server][DEBUG ] Hit http://download.ceph.com wheezy Release
[server][DEBUG ] Hit http://download.ceph.com wheezy/main amd64 Packages
[server][DEBUG ] Ign http://download.ceph.com wheezy/main Translation-en_US
[server][DEBUG ] Ign http://download.ceph.com wheezy/main Translation-en
[server][DEBUG ] Reading package lists...
[server][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive 
DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends 
install -o Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon radosgw

[server][DEBUG ] Reading package lists...
[server][DEBUG ] Building dependency tree...
[server][DEBUG ] Reading state information...
[server][DEBUG ] Package ceph-mds is not available, but is referred to by 
another package.
[server][DEBUG ] This may mean that the package is missing, has been 
obsoleted, or

[server][DEBUG ] is only available from another source
[server][DEBUG ]
[server][WARNIN] E: Unable to locate package ceph-osd
[server][WARNIN] E: Package 'ceph-mds' has no installation candidate
[server][WARNIN] E: Unable to locate package ceph-mon
[server][WARNIN] E: Unable to locate package radosgw
[server][ERROR ] RuntimeError: command returned non-zero exit status: 100
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env 
DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes 
-q --no-install-recommends install -o Dpkg::Options::=--force-confnew 
ceph-osd ceph-mds ceph-mon radosgw







--

Harald Hannelius | harald.hannelius/a\arcada.fi | +358 50 594 1020
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing SSD Landscape

2017-05-18 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dan 
> van der Ster
> Sent: 18 May 2017 09:30
> To: Christian Balzer 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Changing SSD Landscape
> 
> On Thu, May 18, 2017 at 3:11 AM, Christian Balzer  wrote:
> > On Wed, 17 May 2017 18:02:06 -0700 Ben Hines wrote:
> >
> >> Well, ceph journals are of course going away with the imminent bluestore.
> > Not really, in many senses.
> >
> 
> But we should expect far fewer writes to pass through the RocksDB and its 
> WAL, right? So perhaps lower endurance flash will be
> usable.

It depends. I flagged up an issue in Bluestore where client latency when writing to 
spinners was tied to the underlying disk's latency. Sage has introduced a new 
deferred-write feature which uses a double-write strategy similar to Filestore's: 
writes first go into the WAL, where they get coalesced, and are then written out to 
the disk. The deferred writes are tunable, e.g. you can choose to defer only writes 
up to 128 KB. But if you want the same write latency you see with Filestore, you 
will incur increased SSD wear to match it.
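As a rough illustration of what "tunable" means here (the option name is an assumption based on the Bluestore development code and may differ in your release, so treat this as a hedged sketch rather than a recommendation):

~~~
[osd]
# defer (double-write via the WAL) only writes up to this size; 128 KB in this example
bluestore_prefer_deferred_size = 131072
~~~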

> 
> BTW, you asked about Samsung parts earlier. We are running these SM863's in a 
> block storage cluster:
> 
> Model Family: Samsung based SSDs
> Device Model: SAMSUNG MZ7KM240HAGR-0E005
> Firmware Version: GXM1003Q
> 
>   9 Power_On_Hours  0x0032   098   098   000Old_age
> Always   -   9971
> 177 Wear_Leveling_Count 0x0013   094   094   005Pre-fail
> Always   -   2195
> 241 Total_LBAs_Written  0x0032   099   099   000Old_age
> Always   -   701300549904
> 242 Total_LBAs_Read 0x0032   099   099   000Old_age
> Always   -   20421265
> 251 NAND_Writes 0x0032   100   100   000Old_age
> Always   -   1148921417736
> 
> The problem is that I don't know how to see how many writes have gone through 
> these drives.
> Total_LBAs_Written appears to be bogus -- it's based on time. It matches 
> exactly the 3.6 DWPD spec'd for that model:
>   3.6 * 240 GB * (9971 hours / 24) = 358.95 TB
>   701300549904 LBAs * 512 bytes/LBA = 359.06 TB
> 
> If we trust Wear_Leveling_Count then we're only dropping 6% in a year
> -- these should be good.
> 
> But maybe they're EOL anyway?
> 
> Cheers, Dan
> 
> >> Are small SSDs still useful for something with Bluestore?
> >>
> > Of course, the WAL and other bits for the rocksdb, read up on it.
> >
> > On top of that is the potential to improve things further with things
> > like bcache.
> >
> >> For speccing out a cluster today that is a many 6+ months away from
> >> being required, which I am going to be doing, i was thinking all-SSD
> >> would be the way to go. (or is all-spinner performant with
> >> Bluestore?) Too early to make that call?
> >>
> > Your call and funeral with regards to all spinners (depending on your
> > needs).
> > Bluestore at the very best of circumstances could double your IOPS,
> > but there are other factors at play and most people who NEED SSD
> > journals now would want something with SSDs in Bluestore as well.
> >
> > If you're planning to actually deploy a (entirely) Bluestore cluster
> > in production with mission critical data before next year, you're a
> > lot braver than me.
> > An early adoption scheme with Bluestore nodes being in their own
> > failure domain (rack) would be the best I could see myself doing in my
> > generic cluster.
> > For the 2 mission critical production clusters, they are (will be)
> > frozen most likely.
> >
> > Christian
> >
> >> -Ben
> >>
> >> On Wed, May 17, 2017 at 5:30 PM, Christian Balzer  wrote:
> >>
> >> >
> >> > Hello,
> >> >
> >> > On Wed, 17 May 2017 11:28:17 +0200 Eneko Lacunza wrote:
> >> >
> >> > > Hi Nick,
> >> > >
> >> > > El 17/05/17 a las 11:12, Nick Fisk escribió:
> >> > > > There seems to be a shift in enterprise SSD products to larger
> >> > > > less
> >> > write intensive products and generally costing more than what
> >> > > > the existing P/S 3600/3700 ranges were. For example the new
> >> > > > Intel NVME
> >> > P4600 range seems to start at 2TB. Although I mention Intel
> >> > > > products, this seems to be the general outlook across all
> >> > manufacturers. This presents some problems for acquiring SSD's for
> >> > Ceph
> >> > > > journal/WAL use if your cluster is largely write only and
> >> > > > wouldn't
> >> > benefit from using the extra capacity brought by these SSD's to
> >> > > > use as cache.
> >> > > >
> >> > > > Is anybody in the same situation and is struggling to find good
> >> > > > P3700
> >> > 400G replacements?
> >> > > >
> >> > > We usually build tiny ceph clusters, with 1 gbit network and
> >> > > S3610/S3710 200GB SSDs for journals. We have been experiencing
> >> > > supply problems for those disks lately, although it seems that
> >> > > 400GB disks are available, at least for now.
> >> > >

Re: [ceph-users] Changing SSD Landscape

2017-05-18 Thread Dan van der Ster
On Thu, May 18, 2017 at 3:11 AM, Christian Balzer  wrote:
> On Wed, 17 May 2017 18:02:06 -0700 Ben Hines wrote:
>
>> Well, ceph journals are of course going away with the imminent bluestore.
> Not really, in many senses.
>

But we should expect far fewer writes to pass through the RocksDB and
its WAL, right? So perhaps lower endurance flash will be usable.

BTW, you asked about Samsung parts earlier. We are running these
SM863's in a block storage cluster:

Model Family: Samsung based SSDs
Device Model: SAMSUNG MZ7KM240HAGR-0E005
Firmware Version: GXM1003Q

  9 Power_On_Hours  0x0032   098   098   000Old_age
Always   -   9971
177 Wear_Leveling_Count 0x0013   094   094   005Pre-fail
Always   -   2195
241 Total_LBAs_Written  0x0032   099   099   000Old_age
Always   -   701300549904
242 Total_LBAs_Read 0x0032   099   099   000Old_age
Always   -   20421265
251 NAND_Writes 0x0032   100   100   000Old_age
Always   -   1148921417736

The problem is that I don't know how to see how many writes have gone
through these drives.
Total_LBAs_Written appears to be bogus -- it's based on time. It
matches exactly the 3.6 DWPD spec'd for that model:
  3.6 * 240 GB * (9971 hours / 24) = 358.95 TB
  701300549904 LBAs * 512 bytes/LBA = 359.06 TB

If we trust Wear_Leveling_Count then we're only dropping 6% in a year
-- these should be good.

But maybe they're EOL anyway?

Cheers, Dan

>> Are small SSDs still useful for something with Bluestore?
>>
> Of course, the WAL and other bits for the rocksdb, read up on it.
>
> On top of that is the potential to improve things further with things
> like bcache.
>
>> For speccing out a cluster today that is a many 6+ months away from being
>> required, which I am going to be doing, i was thinking all-SSD would be the
>> way to go. (or is all-spinner performant with Bluestore?) Too early to make
>> that call?
>>
> Your call and funeral with regards to all spinners (depending on your
> needs).
> Bluestore at the very best of circumstances could double your IOPS, but
> there are other factors at play and most people who NEED SSD journals now
> would want something with SSDs in Bluestore as well.
>
> If you're planning to actually deploy a (entirely) Bluestore cluster in
> production with mission critical data before next year, you're a lot
> braver than me.
> An early adoption scheme with Bluestore nodes being in their own failure
> domain (rack) would be the best I could see myself doing in my generic
> cluster.
> For the 2 mission critical production clusters, they are (will be) frozen
> most likely.
>
> Christian
>
>> -Ben
>>
>> On Wed, May 17, 2017 at 5:30 PM, Christian Balzer  wrote:
>>
>> >
>> > Hello,
>> >
>> > On Wed, 17 May 2017 11:28:17 +0200 Eneko Lacunza wrote:
>> >
>> > > Hi Nick,
>> > >
>> > > El 17/05/17 a las 11:12, Nick Fisk escribió:
>> > > > There seems to be a shift in enterprise SSD products to larger less
>> > write intensive products and generally costing more than what
>> > > > the existing P/S 3600/3700 ranges were. For example the new Intel NVME
>> > P4600 range seems to start at 2TB. Although I mention Intel
>> > > > products, this seems to be the general outlook across all
>> > manufacturers. This presents some problems for acquiring SSD's for Ceph
>> > > > journal/WAL use if your cluster is largely write only and wouldn't
>> > benefit from using the extra capacity brought by these SSD's to
>> > > > use as cache.
>> > > >
>> > > > Is anybody in the same situation and is struggling to find good P3700
>> > 400G replacements?
>> > > >
>> > > We usually build tiny ceph clusters, with 1 gbit network and S3610/S3710
>> > > 200GB SSDs for journals. We have been experiencing supply problems for
>> > > those disks lately, although it seems that 400GB disks are available, at
>> > > least for now.
>> > >
>> > This. Very much THIS.
>> >
>> > We're trying to get 200 or 400 or even 800GB DC S3710 or S3610s here
>> > recently with zero success.
>> > And I'm believing our vendor for a change that it's not their fault.
>> >
>> > What seems to be happening (no official confirmation, but it makes all the
>> > sense in the world to me) is this:
>> >
>> > Intel is trying to switch to 3DNAND (like they did with the 3520s), but
>> > while not having officially EOL'ed the 3(6/7)10s also allowed the supply
>> > to run dry.
>> >
>> > Which of course is not a smart move, because now people are massively
>> > forced to look for alternatives and if they work unlikely to come back.
>> >
>> > I'm looking at oversized Samsungs (base model equivalent to 3610s) and am
>> > following this thread for other alternatives.
>> >
>> > Christian
>> > --
>> > Christian BalzerNetwork/Systems Engineer
>> > ch...@gol.com   Global OnLine Japan/Rakuten Communications
>> > http://www.gol.com/
>> > ___
>> > ceph-users mailing list
>> > 

[ceph-users] 3mon cluster, after ifdown the public network interface of leader mon, sendQ of one peon monitor will suddenly increase sharply

2017-05-18 Thread Chenyehua
Dear Cephers,
I have run into a problem.
My cluster has 3 servers; each server has 1 mon and 1 osd:
node0 (mon0+osd0), node1 (mon1+osd1), node2 (mon2+osd2)

3 mons (mon0 192.168.202.35/24, mon1 192.168.202.36/24, mon2 192.168.202.37/24)

public network 192.168.2.*/24
cluster network 172.16.2.*/24

Steps to reproduce:
1. ifdown node0's public network interface for 30 s.
2. On node1, run "netstat -apn -c 1 | grep 6789"; the Send-Q of the TCP 
connection (src: mon1 -> dst: mon0) grows to several hundred or several 
thousand.
3. ifup node0's public network interface and run netstat again on node1; 
suddenly the Send-Q can reach 100 thousand or more. For the next several 
seconds mon1 cannot send its defer message to mon0 (because the TCP Send-Q is 
blocked), and during that time "ceph -s" shows mon1 as down. Only once mon1's 
Send-Q drops back to zero can the 3 mons finish the election.

My questions:
1. Why does the TCP Send-Q (data mon1 is preparing to send to mon0) suddenly 
increase sharply when mon0's public network comes back up?
2. When node0's public network interface is downed, the state of the TCP 
connection between mon0 and mon1 stays ESTABLISHED, and the connection only 
closes after 900 s; but if I down node0's cluster network interface, the TCP 
connection between osd0 and osd1 almost immediately goes to TIME_WAIT1.
Why does the mon-mon TCP connection policy differ from the osd-osd one?

Looking forward to your reply, thanks!
Best regards!

-
This e-mail and its attachments contain confidential information from New H3C, 
which is
intended only for the person or entity whose address is listed above. Any use 
of the
information contained herein in any way (including, but not limited to, total 
or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify 
the sender
by phone or email immediately and delete it!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com