Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
gene heskett writes:

> Is this info helpful?

I don't really know. I was thinking about file dialogs or requesters and how they often try to access previously used locations. For example, I've learned not to download with Firefox to a network drive. I don't know if Firefox is still like that, but in the past, after downloading to a network drive, Firefox wanted to put the next download in the same place, and if the network drive wasn't available, it just froze indefinitely; only getting that network drive going again would bring it out of its coma.

So I was just thinking that if your file dialogs try to access something that isn't available, it could cause this kind of delay, but I don't know; it doesn't seem to fit that well. Also, if the issue happened with any common app packaged in Debian, it might be easier to figure out what's happening.
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
> I've no idea how to start debugging this but I feel like the problem [...]

`strace` maybe?

Stefan
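Elsewhere in the thread strace is dismissed for producing far too much output with nowhere to put it. One way to keep it manageable, offered only as a minimal sketch: have strace record the time spent in each call (`-T`), write the log somewhere off the array, and then filter for the calls that actually stalled. The helper name `slow_syscalls`, the log path under /tmp, and the 5-second threshold are all assumptions for illustration.

```shell
#!/bin/sh
# Print the lines of a `strace -T` log whose syscall took at least MIN seconds.
# strace -T appends the time spent in each call as a trailing "<seconds>".
# Usage: slow_syscalls LOGFILE [MIN]   (MIN defaults to 1)
slow_syscalls() {
    log=$1
    min=${2:-1}
    awk -v min="$min" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            # Strip the angle brackets and compare numerically.
            t = substr($0, RSTART + 1, RLENGTH - 2)
            if (t + 0 >= min + 0) print
        }' "$log"
}
```

A possible way to use it, assuming /tmp lives on the root SSD rather than on the raid10:

    strace -f -tt -T -o /tmp/openscad.strace openscad
    slow_syscalls /tmp/openscad.strace 5

If the freeze is a timeout, the handful of lines this prints should show which call was waiting and on which file descriptor. Adding `-e trace=network,file` to the strace command would cut the log size further, at the cost of hiding other call types.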
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
Hello,

On Fri, Dec 15, 2023 at 12:00:00AM -0800, David Christensen wrote:
> On 12/14/23 18:36, gene heskett wrote:
> > Thunar, yes, but I don't use it, not my cup of tea.

[…]

> It sounds like OpenSCAD and qidislicer have something in common that is
> causing the issue, while the other apps do not have that something. So, the
> challenge is finding the shared object files (dynamic linking) and/or the
> source files (static linking) that are present in the affected programs and
> not present in the unaffected programs.

I will add that the Thunar file manager was included in Gene's "affected" list, and I think Gene posted some logs earlier that showed a dbus timeout.

I've no idea how to start debugging this, but I feel like the problem may exist somewhere in Gene's desktop environment and affect things that call its file dialog.

Anyway, I think we can all agree at this point that this has got nothing to do with RAID and mdadm. I might otherwise suggest a reinstall, but Gene said he has already tried that several times, I think, with the same outcome, so it's not something that another install is going to avoid.

Thanks,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
On 12/14/23 18:36, gene heskett wrote:
> On 12/14/23 16:36, Anssi Saari wrote:
>> gene heskett writes:
>>> It repeats per gui access. Starting a gfx program such as OpenSCAD, or
>>> qidislicer from an xfce4 terminal cli, is delayed for this similar but
>>> not always identical lag. And reports odd warnings etc while its
>>> getting ready to open its gui.
>>
>> Does this happen with common GUI tools too like, say, Firefox?
>
> firefox, no.
>
>> Or XFCE's file manager, Thunar I believe?
>
> Thunar, yes, but I don't use it, not my cup of tea. It wants to be a replacement for mc, but fails at 90% of what mc can do.
>
>> Or a text editor like Gedit?
>
> Gedit has been banned from any of my machines for at least 15 years, it made scrambled eggs out of several linuxcnc configuration files I had to re-write from scratch, but geany has never done that. And geany is as instant as nano.
>
>> Or even the XFCE terminal?
>
> Comes up instantly from the menu, I use it heavily because it has tabs. I use them much like workspaces.
>
> Is this info helpful?
>
> Thank you Anssi Saari
>
> Cheers, Gene Heskett.

It sounds like OpenSCAD and qidislicer have something in common that is causing the issue, while the other apps do not have that something. So, the challenge is finding the shared object files (dynamic linking) and/or the source files (static linking) that are present in the affected programs and not present in the unaffected programs.

It would be helpful if you posted a list of affected programs and a list of unaffected programs to provide alternatives for a search. Please note any programs that you did not install using conventional Debian packages (and that may be the root cause of the issue).

David
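The search for shared objects that the affected programs have in common, and the unaffected ones lack, can be roughed out with `ldd` and `comm`. This is only a sketch: the helper names `lib_names` and `suspects` are made up for illustration, and it assumes each program is a dynamically linked binary on PATH.

```shell
#!/bin/sh
# List the shared-library names a program links against, one per line, sorted.
# Assumes the program is a dynamically linked binary found on PATH.
lib_names() {
    ldd "$(command -v "$1")" | awk '{ print $1 }' | sort -u
}

# Print names present in both "affected" lists but absent from the "unaffected" list.
# Usage: suspects AFFECTED_1 AFFECTED_2 UNAFFECTED
# Each argument is a file of sorted names, e.g. saved output of lib_names.
suspects() {
    # comm -12 keeps lines common to both inputs;
    # comm -23 keeps lines found only in the first input.
    comm -12 "$1" "$2" | comm -23 - "$3"
}
```

For example (binary names are guesses at what is installed):

    lib_names openscad > /tmp/openscad.libs
    lib_names thunar   > /tmp/thunar.libs
    lib_names geany    > /tmp/geany.libs
    suspects /tmp/openscad.libs /tmp/thunar.libs /tmp/geany.libs

One caveat: `ldd` only sees libraries linked at build time. Modules loaded at run time, which is how GTK and GIO typically pull in file-dialog and volume-monitor backends, will not appear in this list, so an empty result does not clear the desktop stack.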
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
On 12/14/23 16:36, Anssi Saari wrote:
> gene heskett writes:
>> It repeats per gui access. Starting a gfx program such as OpenSCAD, or
>> qidislicer from an xfce4 terminal cli, is delayed for this similar but
>> not always identical lag. And reports odd warnings etc while its
>> getting ready to open its gui.
>
> Does this happen with common GUI tools too like, say, Firefox?

firefox, no.

> Or XFCE's file manager, Thunar I believe?

Thunar, yes, but I don't use it, not my cup of tea. It wants to be a replacement for mc, but fails at 90% of what mc can do.

> Or a text editor like Gedit?

Gedit has been banned from any of my machines for at least 15 years, it made scrambled eggs out of several linuxcnc configuration files I had to re-write from scratch, but geany has never done that. And geany is as instant as nano.

> Or even the XFCE terminal?

Comes up instantly from the menu, I use it heavily because it has tabs. I use them much like workspaces.

Is this info helpful?

Thank you Anssi Saari

Cheers, Gene Heskett.
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
gene heskett writes:

> It repeats per gui access. Starting a gfx program such as OpenSCAD, or
> qidislicer from an xfce4 terminal cli, is delayed for this similar but
> not always identical lag. And reports odd warnings etc while its
> getting ready to open its gui.

Does this happen with common GUI tools too like, say, Firefox? Or XFCE's file manager, Thunar I believe? Or a text editor like Gedit? Or even the XFCE terminal?
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
On 12/14/23 04:17, Nicolas George wrote:
> to...@tuxteam.de (12023-12-14):
>> I've skimmed some of the answers, and they correspond to your confusing
>> request. Someone mentions DNS timeouts to rule them out right away (do you
>> access your RAID over the net? Is DNS resolution involved at all?)

no, and no.

> He quoted:
>
> Error creating proxy: Error calling StartServiceByName for org.gtk.vfs.GPhoto2VolumeMonitor: Timeout was reached (g-io-error-quark, 24)

The odd part of that is that there is, stuck on screen on every workspace, a volume control gui of some kind that has no exit icon. I cannot get rid of it. It has a wrench icon where most gui's have an exit button. And that led to a red trash can icon labeled remove widget, and it did, whatever the heck a widget is.

> That means the issue is in the DBus monster moussaka¹. The odds of finding
> a solution in the current circumstances are vanishingly thin.
>
> Regards,

Cheers, Nik, Gene Heskett.
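The quoted error names the D-Bus service whose activation timed out, and that name is worth pulling out of the noise. A minimal sketch, assuming the application's stderr has been saved to a file; the helper name `timed_out_services` and the file path in the usage lines are illustrative only.

```shell
#!/bin/sh
# Print the D-Bus service names whose activation timed out, given a file
# holding an application's stderr output. Matches the GLib message format
# quoted in this thread; other toolkits may word the error differently.
timed_out_services() {
    sed -n 's/.*StartServiceByName for \([A-Za-z0-9_.]*\): Timeout was reached.*/\1/p' "$1" | sort -u
}
```

Possible usage, with the binary name taken from the process name in the log:

    qidi-slicer 2> /tmp/qidi.err
    timed_out_services /tmp/qidi.err

Each name printed can then be activated by hand to see whether it alone reproduces the stall:

    time gdbus call --session --dest org.freedesktop.DBus \
        --object-path /org/freedesktop/DBus \
        --method org.freedesktop.DBus.StartServiceByName \
        org.gtk.vfs.GPhoto2VolumeMonitor 0

D-Bus clients commonly default to a call timeout of about 25 seconds, which is in the neighbourhood of the freeze described here. That is a guess at a connection, not a diagnosis.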
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
On 12/14/23 00:39, to...@tuxteam.de wrote: On Wed, Dec 13, 2023 at 10:26:19AM -0500, gene heskett wrote: Greetings all; I thought I was doing things right a year back when I built a raid10 for my /home partition. but I'm tired of fighting with it for access. Anything that wants to open a file on it, is subjected to a freeze of at least 30 seconds BEFORE the file requester is drawn on screen. Once it has done the screen draw and the path is established, read/writes then proceed at multi-gigabyte speeds just like it should [...] - disk access latency - digikam - photo volume monitor - cache buffers (which?) - klipper I've been here several times with this problem without any constructive responses [...] So one more time: Why can't I use my software raid10 on 4 1T SSD's ? Gene, just a humble suggestion. I'm too short in time to wade through all this deep software cake, of which I know but a fraction. Perhaps if you structured your requests a bit better, the quality of the answers would improve? The latest info is that non-gui stuff works instantly. gui stuff lags at least 30 seconds, mouse still moves but the rest of the same screen is non-responsive until this tomeout has taken place, then everything returns to normal. I've skimmed some of the answers, and they correspond to your confusing request. Someone mentions DNS timeouts to rule them out right away (do you access your RAID over the net? Is DNS resolution involved at all?) no Other answers veer of in similar disparate directions, but that corresponds to your request's deeply confusing nature. Because I was not able to define it any better. Let me humbly suggest to structure your search a bit (you do have deep experience in fault searching, we all know). What I get from your post is that you seem to see the root of your problems in a long latency on (first?) storage access to your block device (whether it matters that it be a RAID10 or a RAID42 we just don't know!). 
As I also don't know, this raid10 is my only experience with a raid of any kind. This looks like a promising avenue, so let's pretend we start with this one. Do you experience this latency also with simpler tools (something which doesn't "draw a requester on screen", like, say, ls or find)? no, even a dd write is essentially instant. Let's thus try to rule out the deep pie of sh*** (uh, software stack) you are using to access the disk. Do you still observe this latency? Is there a pattern (like, when accessing something for the first time, and/or accessing things after a longer inactivity period, yadda, yadda). It repeats per gui access. Starting a gfx program such as OpenSCAD, or qidislicer from an xfce4 terminal cli, is delayed for this similar but not always identical lag. And reports odd warnings etc while its getting ready to open its gui. Might there be a clue there? IDK I could copy/paste some of it if you like to see it, but to me it doesn't look related or I would have already. If yes, you can follow the path "disk access latency". If no, the problem might lie further up the stack (and then, things like DNS latencies might play a role again!). With your posts, my head spins and my time slot in the mornings, before I go to $DAYJOB is used up before I can start even to think about how debug things. And I appreciate that Tomas, $DAYJOBS take precedence, always. Triply appreciated when you are the only tech person responsible for keeping a tv station on the air and working smoothly like I was for nearly 50 years before I retired, its not a $DAYJOB, its a $24/7/365.25JOB. In one short word: please focus. Debugging complex stuff becomes impossible otherwise. I now think I have a gui problem and can imagine something in the original debian gnome install getting the request to open the gui and has to fail before xfce4 even gets the request. Whether that is true or not, is up to ways to test the theory. That I'm clueless about. 
There is enough kde/plasma installed that it thinks it has to start kmail at boot time, lots of kde and gnome leftovers, but the popups asking for a pw, are also subjected to this delay after the bootup is completed. Is this all connected? At this time IDK. Cheers Thank you Tomas. Cheers, Gene Heskett. -- "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author, 1940) If we desire respect for the law, we must first make the law respectable. - Louis D. Brandeis
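The "rule out the software stack" step above amounts to timing the same storage with progressively simpler tools. A small sketch of a timing wrapper follows; the name `elapsed` is made up, and it only resolves whole seconds, which is enough to tell an instant response from a 30-second stall.

```shell
#!/bin/sh
# Run a command, discard its output, and report the wall-clock time taken
# in whole seconds together with the exit status.
elapsed() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$((end - start))s exit=$status: $*"
}
```

For instance, `elapsed ls -R /home/gene` or `elapsed find /home/gene -maxdepth 2` exercise the array without any GUI involved. If those report 0s or 1s while a file dialog still takes half a minute to appear, the disk-latency path can be set aside and attention moved up the stack.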
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
On 12/13/23 15:33, gene heskett wrote:
> gene@coyote:~$ time dd if=/dev/zero of=/home/gene/zero bs=1M count=100 oflag=sync
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB, 100 MiB) copied, 0.935655 s, 112 MB/s
>
> real    0m0.940s
> user    0m0.000s
> sys     0m0.254s

Thank you for providing a console session that confirms the issue is not md RAID.

For completeness, I suggest that you do both write and read benchmarks:

2023-12-13 17:56:58 root@taz ~ # smartctl -i /dev/sda | grep "Device Model"
Device Model:     INTEL SSDSC2CW060A3

2023-12-13 17:57:12 root@taz ~ # dd if=/dev/zero of=/home/dpchrist/100mb.zero bs=1M count=100 oflag=sync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.6329 s, 64.2 MB/s

2023-12-13 17:57:57 root@taz ~ # free && sync && echo 3 > /proc/sys/vm/drop_caches && free
               total        used        free      shared  buff/cache   available
Mem:        32698252     1941032    29428472      761160     1328748    29606508
Swap:         976892           0      976892
               total        used        free      shared  buff/cache   available
Mem:        32698252     1941516    29580404      717360     1176332    29650836
Swap:         976892           0      976892

2023-12-13 17:58:03 root@taz ~ # dd of=/dev/null if=/home/dpchrist/100mb.zero bs=1M count=100 oflag=sync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.263723 s, 398 MB/s

I have found that as I run computers, there is an accumulation of cruft over time. The more I mess with a computer, the sooner it becomes unstable. Eventually, "finding the needle in the haystack" and "putting Humpty Dumpty back together again" do not work -- the computer requires a backup-wipe-install-restore cycle. Your posts indicate that your computer is overdue.

And, I suspect a deeper issue -- you have one computer that is your workstation, your file server, and your backup server. This over-complicates everything and creates a strong disincentive to backup-wipe-install-restore. I have been there, done that, lost service, and lost data.

Now I have several laptops/ desktops/ workstations, a dedicated file server, and a dedicated backup server. Life is good. :-)

Again -- I suggest that you build a backup server, then build a file server, then rebuild the workstation. I am confident you will be rewarded with simpler administration and improved reliability.

David
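The write-then-read benchmark in the console session above can be wrapped into one repeatable function. The following is a sketch under stated assumptions, not a calibrated benchmark: the name `bench_rw` and the scratch file name are invented, GNU dd is assumed (for `bs=1M` and `oflag=sync`), and the page cache is only dropped when the script runs as root. Without that step the read figure mostly measures RAM.

```shell
#!/bin/sh
# Write COUNT MiB of zeros into DIR with synchronous writes, read the file
# back, print dd's summary line for each pass, and remove the scratch file.
# Usage: bench_rw DIR [COUNT]   (COUNT defaults to 100)
bench_rw() {
    dir=$1
    count=${2:-100}
    file="$dir/bench.$$.zero"

    # dd reports throughput on stderr; keep only its final summary line.
    printf 'write: '
    dd if=/dev/zero of="$file" bs=1M count="$count" oflag=sync 2>&1 | tail -n 1

    # Flush dirty pages, then drop the page cache if we are allowed to,
    # so that the read pass has to go to the disks.
    sync
    if [ "$(id -u)" -eq 0 ] && [ -w /proc/sys/vm/drop_caches ]; then
        { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true
    fi

    printf 'read:  '
    dd if="$file" of=/dev/null bs=1M 2>&1 | tail -n 1

    rm -f "$file"
}
```

Running `bench_rw /home/gene` and then `bench_rw /tmp` (assuming /tmp is on the root SSD) gives a like-for-like comparison between the array and a single drive.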
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
On 12/13/23 16:30, Andy Smith wrote:
> Hello,
>
> On Wed, Dec 13, 2023 at 10:26:19AM -0500, gene heskett wrote:
>> I thought I was doing things right a year back when I built a raid10 for my
>> /home partition. but I'm tired of fighting with it for access. Anything that
>> wants to open a file on it, is subjected to a freeze of at least 30 seconds
>> BEFORE the file requester is drawn on screen.
>
> I haven't chimed in to any of the multiple times you've brought this to the list, because it's just so bizarre. I've about 20 years' experience of using mdadm and have never seen anything like what you're reporting, so I just don't know what the problem could be or how to find it. The only times I've seen anything remotely like it have been when there's been hardware problems with failing writes, but I know you've been through this with the list several times and no such low level issues were ever uncovered.
>
> Would it be correct to say that you only experience these IO delays from GUI applications? Like, if you do a simple:
>
> $ time echo "test" > ~/foo
>
> does that complete in a normal time?

The lshw >lshw.txt would be in the "guiless" category, and it worked even quicker than if I left it to come out on the cli. Another item that may be of interest is that the gui is xfce4. That was a bit over a 40k write.

> And if you did a bigger write, again from the command line?
>
> $ dd if=/dev/zero of=/path/to/your/home/dir/zero bs=1m count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB, 100 MiB) copied, 0.0783268 s, 1.3 GB/s

I had to use an uppercase M in the bs=, to get:

gene@coyote:~$ time dd if=/dev/zero of=/home/gene/zero bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0829926 s, 1.3 GB/s

real    0m0.085s
user    0m0.005s
sys     0m0.080s

I'd have to say that's realtime, no lag.

> and with sync:
>
> $ dd if=/dev/zero of=/path/to/your/home/dir/zero bs=1m count=100 oflag=sync
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB, 100 MiB) copied, 0.463356 s, 226 MB/s

gene@coyote:~$ time dd if=/dev/zero of=/home/gene/zero bs=1M count=100 oflag=sync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.935655 s, 112 MB/s

real    0m0.940s
user    0m0.000s
sys     0m0.254s

A lot slower but still no lag, about an even second.

> But then when you try to use some GUI application to save something to a file, the initial save file dialog takes ages to appear and everything seems frozen?

Correct...

> If so then I feel like this may actually be some sort of problem with your desktop environment, but then I've no idea how to narrow that down.

I think we have made progress, Andy, having narrowed it down to the gui. To me that is progress. Now I need a gui expert, which I am for sure not. Never have been, never will be.

> Thanks,
> Andy

Thanks a bunch Andy, I think your logic was quite helpful in narrowing down the problem area. Take care, stay warm and well.

Cheers, Gene Heskett.
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
On Wed, Dec 13, 2023 at 02:19:07PM -0500, gene heskett wrote: > On 12/13/23 13:24, Andrew M.A. Cater wrote: > > On Wed, Dec 13, 2023 at 10:26:19AM -0500, gene heskett wrote: > > > Greetings all; > > > > > > > Hi Gene, > > > > Respectfully, if I were you, I might consider tearing down one machine > > and rebuilding the data on it bit by bit. > > > > Questions to answer first: > > > > 1. Are all the disks the same size? > > > yes > > 2. Are all the disks the same manufacturer? > All 1T Samsung 870's > > 3. Are they all connected to the same controller if this is an add-in card? > > > yes. add in card. > > > If not an add in card: > > > > 4. Are they all connected to the SATA sockets on the motherboard? > > motherboard? > > > No. All connected to a 6 port board, in port order. > > > 4. If to the motherboard, are they the only devices connected to the SATA > > sockets there? > No, main board is Asus Prime Z370-A II but it only has 6 ports, all busy. > > > > 5. What is the primary device that has / on it - NVME / SSD / spinning rust? > > SSD, another 1T samsung. > > > > So one more time: Why can't I use my software raid10 on 4 1T SSD's ? > > > > > > > _How did you set the RAID 10 up? > > > > Would you be willing to scrap the data in /home and start again? > No, I have a lot of work I'd be the rest of my life rebuilding. > Howevr, in preparation to restarting amanda, I've just installed a 2nd sata > add on card, this one with 16 ports, 4 of which are already loaded with 2T > gigastones so I do have the means to rsync /home to 1 or more of those. > Copy /home to another drive - then disconnect power and drive cables to it. You seem to like adding many disks to one machine: I'd honestly suggest grabbing another machine to put half these disks into. If you've got NVME - put that in as your boot drive, maybe. Maybe use LVM and guided partitioning with all files in one partition. 
Then use the four 1T disks and the four way card and mdadm to set up the mirrored RAID with LVM on top for /home and add that to your fstab. Rsync the data back from the one drive that you put the original /home onto and you're done with that disk.

Do all this with a brand new bookworm disk and linuxcnc and you're done.

Simplify, simplify, simplify :)

Andy
(amaca...@debian.org)
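The rebuild described above (mdadm array, LVM on top, data copied back with rsync) might look roughly like the outline below. It is deliberately a dry run: by default it only prints the commands. Every device name, the volume group name `vg_home`, and the /mnt/oldhome mount point are hypothetical placeholders, and the filesystem choice (ext4) is an assumption the thread does not state.

```shell
#!/bin/sh
# Dry-run outline of the rebuild: nothing is executed unless DRY_RUN=0.
# All device, volume group, and mount names below are hypothetical placeholders.
DRY_RUN=${DRY_RUN:-1}

# Either run the given command or just print it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

rebuild_home() {
    # 1. Build a four-disk RAID10 array from the 1T SSDs.
    run mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde
    # 2. Put LVM on top of the array and carve out one logical volume for /home.
    run pvcreate /dev/md0
    run vgcreate vg_home /dev/md0
    run lvcreate -n home -l 100%FREE vg_home
    run mkfs.ext4 /dev/vg_home/home
    # 3. Mount it and copy the data back from the drive holding the old /home.
    run mount /dev/vg_home/home /home
    run rsync -aHAX /mnt/oldhome/ /home/
}
```

Calling `rebuild_home` prints the seven steps for review. The matching /etc/fstab entry still has to be added by hand, and the mdadm step destroys whatever is on the listed devices, so the device names need checking against the real machine before anything is run with DRY_RUN=0.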
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
On 12/13/23 13:24, Andrew M.A. Cater wrote: On Wed, Dec 13, 2023 at 10:26:19AM -0500, gene heskett wrote: Greetings all; I thought I was doing things right a year back when I built a raid10 for my /home partition. but I'm tired of fighting with it for access. Anything that wants to open a file on it, is subjected to a freeze of at least 30 seconds BEFORE the file requester is drawn on screen. Once it has done the screen draw and the path is established, read/writes then proceed at multi-gigabyte speeds just like it should, but some applications refuse to wait that long, so digiKam cannot import from my camera for example one, QIDISlicer is another that get plumb upset and declares a segfault, core dumped, but it can't write the core dump for the same reason it declared a segfault. Here is a copy/paste of the last attempt to select the "device" tab in QIDISlicer: --- Error creating proxy: Error calling StartServiceByName for org.gtk.vfs.GPhoto2VolumeMonitor: Timeout was reached (g-io-error-quark, 24) ** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register URI scheme wxfs more than once ** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register URI scheme memory more than once (qidi-slicer:389574): Gtk-CRITICAL **: 04:55:47.084: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar [2023-12-13 05:10:27.325222] [0x7f77e6ffd6c0] [error] Socket created. Multicast: 255.255.255.255. Interface: 192.168.71.3 Unhandled unknown exception; terminating the application. Segmentation fault (core dumped) - This where it was attempting to open the cache buffers if needed to remember what moonraker, a web server driver which is part of the klipper install on the printer, addressed at 192.168.71.110: with an odd, high numbered port above 10,000. 
I've been here several times with this problem without any constructive responses other than strace, which of course does NOT work for network stuff, and would if my past history with it is any indication, generate several terabytes of output, but it fails for the same reason, no place to put its output because I assume, it can't write to the raid10 in a timely manner. Hi Gene, Respectfully, if I were you, I might consider tearing down one machine and rebuilding the data on it bit by bit. Questions to answer first: 1. Are all the disks the same size? yes 2. Are all the disks the same manufacturer? All 1T Samsung 870's 3. Are they all connected to the same controller if this is an add-in card? yes. add in card. If not an add in card: 4. Are they all connected to the SATA sockets on the motherboard? motherboard? No. All connected to a 6 port board, in port order. 4. If to the motherboard, are they the only devices connected to the SATA sockets there? No, main board is Asus Prime Z370-A II but it only has 6 ports, all busy. 5. What is the primary device that has / on it - NVME / SSD / spinning rust? SSD, another 1T samsung. So one more time: Why can't I use my software raid10 on 4 1T SSD's ? _How did you set the RAID 10 up? Would you be willing to scrap the data in /home and start again? No, I have a lot of work I'd be the rest of my life rebuilding. Howevr, in preparation to restarting amanda, I've just installed a 2nd sata add on card, this one with 16 ports, 4 of which are already loaded with 2T gigastones so I do have the means to rsync /home to 1 or more of those. Since lshw is a bit verbose, and this machine is stuffed, except the m2 sockets, I'll attach the output of lshw for those who want to peruse it. Maybe it will answer additional hdwe questions. I do have an m2 module I intend to add at some point, a WD_BLACK SN770 NVMe SSD of 2T capacity. Supposedly rate at 5160 MHZ/SEC. Its still in the box. 
All best, as ever,
Andy
(amaca...@debian.org)

Cheers, Gene Heskett.

coyote
    description: Desktop Computer
    product: System Product Name (SKU)
    vendor: System manufacturer
    version: System Version
    serial: System Serial Number
    width: 64 bits
    capabilities: smbios-3.1.1 dmi-3.1.1 smp vsyscall32
    configuration: boot=normal chassis=desktop family=To be filled by O.E.M. sku=SKU uuid=93a9e285-63b0-4a26-8f43-40b0765b113c
  *-core
       description: Motherboard
       product: PRIME Z370-A II
       vendor: ASUSTeK COMPUTER INC.
       physical id: 0
       ver
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
On 12/13/23 13:50, Dan Ritter wrote:
> Pocket wrote:
>> Many reasons
>>
>> If the RAID controller bites the bullet you are usually toast unless you
>> have another RAID controller (same manufacturer and type) as a spare.
>
> mdadm, zfs and btrfs all lack this problem.

Not for me as I am not going down that worm hole

>> I have zero luck replacing one companies raid controller with another and
>> ditto on raid built into the motherboard.
>
> As above.

As above

>> I really don't need any help losing my data/files as I do a good job of that
>> all by myself ;)
>
> btrfs and zfs have snapshots which really help avoiding losing data. On other machines, rsnapshot is often suitable.

I am exploring rdiff-backup

>> I found it is better to just have my data on several backup disks, that way
>> if one fails I get another disk and copy all the data to the newly purchased
>> disk.
>
> RAID isn't a backup solution, it's a way of keeping things going until you have time to restore. (And also a way of improving performance and/or manageability.)
>
> If you don't need or want it, you shouldn't use it. Same as any tool.

I don't need the expense or trouble. Raspberry pi(s) and USB drives equate to "just works"

-- 
It's not easy to be me
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
On 12/13/23 13:47, Nicolas George wrote:
> Pocket (12023-12-13):
>> If the RAID controller
>
> Then use software RAID with a Libre implementation.

Nope been there done that and I ain't doing that

>> I found it is better to just have my data on several backup disks
>
> Yeah, backups and RAID are not meant to protect against the same issues, so if you think one replaces the other…

>> After removing raid, I completely redesigned my network to be more inline
>> with the howtos and other information.
>
> You know that RAID has nothing to do with the setup of your network, right?

Not saying it did

-- 
It's not easy to be me
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
Pocket wrote:
>
> Many reasons
>
> If the RAID controller bites the bullet you are usually toast unless you
> have another RAID controller (same manufacturer and type) as a spare.

mdadm, zfs and btrfs all lack this problem.

> I have zero luck replacing one companies raid controller with another and
> ditto on raid built into the motherboard.

As above.

> I really don't need any help losing my data/files as I do a good job of that
> all by myself ;)

btrfs and zfs have snapshots which really help avoiding losing data. On other machines, rsnapshot is often suitable.

> I found it is better to just have my data on several backup disks, that way
> if one fails I get another disk and copy all the data to the newly purchased
> disk.

RAID isn't a backup solution, it's a way of keeping things going until you have time to restore. (And also a way of improving performance and/or manageability.)

If you don't need or want it, you shouldn't use it. Same as any tool.

-dsr-
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
gene heskett writes:

> It is a separate 6 port sata controller because the mobo is out of
> ports. There is no obvious lag during bios post or grub booting it.

That *should* rule out DNS then, unless something really strange is going on.

What does mdadm tell you about the raid device, and its component devices? Is the filesystem on the raid healthy?

Cheers,
Tom
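The array's own view of its health is available from `cat /proc/mdstat` and, per array, from `mdadm --detail /dev/md0` (the device name here is a guess). As a quick check, the sketch below scans mdstat for any array whose member map shows a missing device. The function name `degraded_arrays` is made up for illustration.

```shell
#!/bin/sh
# Print the name of every md array whose member-status map shows a missing
# device, e.g. "[UU_U]". Reads /proc/mdstat unless another file is given.
degraded_arrays() {
    awk '
        # Remember which array the following status line belongs to.
        /^md[^ ]* :/ { name = $1 }
        # The status map is the last field of the "blocks" line; "_" marks a missing member.
        /\[[U_]+\]/ && $NF ~ /_/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

No output means every array reports all members present. That would be consistent with the rest of the thread, where command-line writes to the array complete without delay.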
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
Pocket (12023-12-13):
> If the RAID controller

Then use software RAID with a Libre implementation.

> I found it is better to just have my data on several backup disks

Yeah, backups and RAID are not meant to protect against the same issues, so if you think one replaces the other…

> After removing raid, I completely redesigned my network to be more inline
> with the howtos and other information.

You know that RAID has nothing to do with the setup of your network, right?

-- 
  Nicolas George
Re: raid10 is killing me, and applications that aren't willing to wait for it to respond
On 12/13/23 13:20, gene heskett wrote: On 12/13/23 11:51, Pocket wrote: On 12/13/23 10:26, gene heskett wrote: Greetings all; I thought I was doing things right a year back when I built a raid10 for my /home partition. but I'm tired of fighting with it for access. Anything that wants to open a file on it, is subjected to a freeze of at least 30 seconds BEFORE the file requester is drawn on screen. Once it has done the screen draw and the path is established, read/writes then proceed at multi-gigabyte speeds just like it should, but some applications refuse to wait that long, so digiKam cannot import from my camera for example one, QIDISlicer is another that get plumb upset and declares a segfault, core dumped, but it can't write the core dump for the same reason it declared a segfault. Here is a copy/paste of the last attempt to select the "device" tab in QIDISlicer: --- Error creating proxy: Error calling StartServiceByName for org.gtk.vfs.GPhoto2VolumeMonitor: Timeout was reached (g-io-error-quark, 24) ** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register URI scheme wxfs more than once ** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register URI scheme memory more than once (qidi-slicer:389574): Gtk-CRITICAL **: 04:55:47.084: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar [2023-12-13 05:10:27.325222] [0x7f77e6ffd6c0] [error] Socket created. Multicast: 255.255.255.255. Interface: 192.168.71.3 Unhandled unknown exception; terminating the application. Segmentation fault (core dumped) - This where it was attempting to open the cache buffers if needed to remember what moonraker, a web server driver which is part of the klipper install on the printer, addressed at 192.168.71.110: with an odd, high numbered port above 10,000. 
I've been here several times with this problem without any constructive responses other than strace, which of course does NOT work for network stuff, and would if my past history with it is any indication, generate several terabytes of output, but it fails for the same reason, no place to put its output because I assume, it can't write to the raid10 in a timely manner. So one more time: Why can't I use my software raid10 on 4 1T SSD's ? Cheers, Gene Heskett. I gave up using raid many years ago and I used the extra drives as backups. So why did you give up? Must have been a reason. Many reasons No real benefit (companies excepted), and issues like you have been posting. If the RAID controller bites the bullet you are usually toast unless you have another RAID controller (same manufacturer and type) as a spare. I have zero luck replacing one companies raid controller with another and ditto on raid built into the motherboard. I really don't need any help losing my data/files as I do a good job of that all by myself ;) I found it is better to just have my data on several backup disks, that way if one fails I get another disk and copy all the data to the newly purchased disk. After removing raid, I completely redesigned my network to be more inline with the howtos and other information. I have little to nothing on the client system I use daily, everything is on networks systems and they have certain things they do. I have a "git" server that has all my setup/custom/building scripts and all my programming and solidworks projects. I have DELPHI build apps going back to about 1995. It all backed up to a backup server(master and slave) and also a 4TB offline external hard drive. I have not "lost" any information since. I also found that DHCP and NetworkManager is your friend. Maybe you should review your network setup as you seem to have a lot is issues with it? Wrote a script to rsync /home to the backup drives. Cheers, Gene Heskett. -- It's not easy to be me
Re: raid10 is killing me, and applications that aren't willing towait for it to respond
On 12/13/23 11:51, Pocket wrote:
> On 12/13/23 10:26, gene heskett wrote:
> > Greetings all;
> >
> > I thought I was doing things right a year back when I built a raid10 for my /home partition, but I'm tired of fighting with it for access. Anything that wants to open a file on it is subjected to a freeze of at least 30 seconds BEFORE the file requester is drawn on screen. Once it has done the screen draw and the path is established, reads and writes then proceed at multi-gigabyte speeds just like they should, but some applications refuse to wait that long. digiKam cannot import from my camera, for one example; QIDISlicer is another that gets plumb upset and declares a segfault, core dumped, but it can't write the core dump for the same reason it declared a segfault.
> >
> > Here is a copy/paste of the last attempt to select the "device" tab in QIDISlicer:
> >
> > ---
> > Error creating proxy: Error calling StartServiceByName for org.gtk.vfs.GPhoto2VolumeMonitor: Timeout was reached (g-io-error-quark, 24)
> > ** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register URI scheme wxfs more than once
> > ** (qidi-slicer:389574): CRITICAL **: 04:55:46.975: Cannot register URI scheme memory more than once
> > (qidi-slicer:389574): Gtk-CRITICAL **: 04:55:47.084: gtk_box_gadget_distribute: assertion 'size >= 0' failed in GtkScrollbar
> > [2023-12-13 05:10:27.325222] [0x7f77e6ffd6c0] [error] Socket created. Multicast: 255.255.255.255. Interface: 192.168.71.3
> > Unhandled unknown exception; terminating the application.
> > Segmentation fault (core dumped)
> > -
> >
> > This is where it was attempting to open the cache buffers if needed to remember what moonraker, a web server driver which is part of the klipper install on the printer, addressed at 192.168.71.110: with an odd, high numbered port above 10,000.
> >
> > I've been here several times with this problem without any constructive responses other than strace, which of course does NOT work for network stuff, and which, if my past history with it is any indication, would generate several terabytes of output. But it fails for the same reason: no place to put its output because, I assume, it can't write to the raid10 in a timely manner.
> >
> > So one more time: Why can't I use my software raid10 on 4 1T SSD's ?
> >
> > Cheers, Gene Heskett.
>
> I gave up using raid many years ago and I used the extra drives as backups.

So why did you give up? Must have been a reason.

> Wrote a script to rsync /home to the backup drives.

Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order."
 -Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
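One way to check whether the array itself is slow to respond, without strace and without needing anywhere to put a log, is to time a bare directory listing and a small synced write on the filesystem in question. This is only a minimal sketch, assuming Python 3; the function names are made up for illustration and this is not something anyone in the thread ran.

```python
import os
import tempfile
import time


def time_directory_listing(path):
    """Return (seconds, entry_count) for one listing of path."""
    start = time.monotonic()
    with os.scandir(path) as entries:
        count = sum(1 for _ in entries)
    return time.monotonic() - start, count


def time_small_write(directory, size=4096):
    """Return seconds taken to create, fsync and remove a small file in directory."""
    start = time.monotonic()
    fd, name = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"\0" * size)
        os.fsync(fd)  # force the data to the device, not just the page cache
    finally:
        os.close(fd)
        os.unlink(name)
    return time.monotonic() - start
```

Calling both with a directory under /home, for instance `os.path.expanduser("~")`, gives two numbers. If they come back in milliseconds while the file requester still takes 30 seconds to appear, the stall is more likely in whatever the dialog calls before touching the disk than in the raid10.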
Re: raid10 is killing me, and applications that aren't willing towait for it to respond
On 12/13/23 10:41, Tom Furie wrote:
> gene heskett writes:
>
> > I thought I was doing things right a year back when I built a raid10 for my /home partition, but I'm tired of fighting with it for access. Anything that wants to open a file on it is subjected to a freeze of at least 30 seconds BEFORE the file requester is drawn on screen. Once it has done the screen draw and the path is established,
>
> Where is the raid10 located and how is it interfaced to the device you're accessing it from? That delay, along with other things you mentioned, suggests (but this is only a guess without other relevant information) a DNS timeout.

It is a separate 6 port sata controller because the mobo is out of ports. There is no obvious lag during bios post or grub booting it.

/etc/fstab:
gene@coyote:/etc/lvm/profile$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# systemd generates mount units based on this file, see systemd.mount(5).
# Please run 'systemctl daemon-reload' after making changes here.
#
#
# / was on /dev/sda1 during installation
UUID=f295334b-fdcb-4428-bed3-cb9e9e129be6 /               ext4    errors=remount-ro 0 1
# /tmp was on /dev/sda3 during installation
UUID=518cb65d-21f0-493f-8bb5-a5f435796991 /tmp            ext4    defaults          0 2
# swap was on /dev/sda2 during installation
UUID=422b50db-9913-4ed3-92c3-dc18be72cc61 none            swap    sw                0 0
/dev/sr0                                  /media/cdrom0   udf,iso9660 user,noauto   0 0
UUID=bc6135de-0578-4e3b-b2c0-5c4687abd9bd /home           ext4    errors=remount-ro 0 2
UUID=d24c3a99-9f40-4b71-92d4-916804553cb5 none            swap    sw                0 0
-

From df:
gene@coyote:/etc/lvm/profile$ sudo df
[sudo] password for gene:
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev             16328024         0   16328024   0% /dev
tmpfs             3272676      1876    3270800   1% /run
/dev/sda1       863983352  18784928  801236776   3% /
tmpfs            16363376        12   16363364   1% /dev/shm
tmpfs                5120         8       5112   1% /run/lock
/dev/sda3        47749868    222628   45069232   1% /tmp
/dev/md0p1     1796382580 330887008 1374170596  20% /home
tmpfs             3272672     45868    3226804   2% /run/user/1000
-

A dhcpd has been installed, but is limited to issuing a single fixed address to a 3d printer plugged into my network, as the printer doesn't seem to know what to do with a hosts file that runs the rest of my home network.

Anything else you want, just ask.

> Cheers, Tom .

Thank you Tom.

Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order."
 -Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
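Tom's DNS-timeout guess can be checked by timing name lookups directly. What follows is a rough sketch only, assuming Python 3; `time_lookup` and `slow_names` are illustrative names, and the default resolver is the standard library's `socket.getaddrinfo`, which goes through the system's normal name resolution.

```python
import socket
import time


def time_lookup(name, resolver=socket.getaddrinfo):
    """Return (seconds, outcome) for one name lookup.

    outcome is the sorted list of resolved addresses, or the error text
    if the lookup failed. resolver can be swapped out for testing.
    """
    start = time.monotonic()
    try:
        results = resolver(name, None)
        # Each result is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string.
        outcome = sorted({info[4][0] for info in results})
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        outcome = f"lookup failed: {exc}"
    return time.monotonic() - start, outcome


def slow_names(names, threshold=1.0, resolver=socket.getaddrinfo):
    """Return the (name, seconds) pairs whose lookup took longer than threshold seconds."""
    slow = []
    for name in names:
        seconds, _ = time_lookup(name, resolver)
        if seconds > threshold:
            slow.append((name, seconds))
    return slow
```

Feeding `slow_names` the machine's own hostname (`socket.gethostname()`), `localhost`, and the host names listed in the hosts file should show whether any lookup takes seconds rather than milliseconds. An empty list would count against the DNS theory; a name that takes many seconds would point at name resolution rather than the disks.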