Re: Make ZFS use the physical sector size when computing initial ashift
On Jul 10, 2013, at 11:21 AM, Xin Li delp...@delphij.net wrote:

> On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
> > The attached patch causes ZFS to base the minimum transfer size for a new vdev on the GEOM provider's stripesize (physical sector size) rather than sectorsize (logical sector size), provided that stripesize is a power of two larger than sectorsize and smaller than or equal to VDEV_PAD_SIZE. This should eliminate the need for ivoras@'s gnop trick when creating ZFS pools on Advanced Format drives.
>
> I think there are multiple versions of this (I also have one[1]) but the concern is that if one creates a pool with ashift=9, and now ashift=12, the pool gets unimportable. So there needs to be a way to disable this behavior.
>
> Another thing (not really related to the automatic detection) is that we need a way to manually override this setting from the command line when creating the pool; this is under active discussion on the Illumos mailing list right now.
>
> [1] https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26
>
> Cheers,
> --
> Xin LI delp...@delphij.net https://www.delphij.net/
> FreeBSD - The Power to Serve! Live free or die
> ___ freebsd...@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to freebsd-fs-unsubscr...@freebsd.org

I'm sure lots of folks have some solution to this. Here is an old version of what we use at Spectra:

http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff

The above patch is missing some cleanup that was motivated by my discussions with George Wilson about this change in April. I'll dig that up later tonight. Even if you don't read the full diff, please read the included checkin comment since it explains the motivation behind this particular solution.

This is on my list of things to upstream in the next week or so after I add logic to the userspace tools to report whether or not the TLVs in a pool are using an optimal allocation size. This is only possible if you actually make ZFS fully aware of logical, physical, and the configured allocation size. All of the other patches I've seen just treat physical as logical.

--
Justin
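The selection rule in Dag-Erling's patch summary can be sketched in plain C. This is only an illustration of the stated conditions, not the patch itself: `sketch_initial_ashift` and `SKETCH_VDEV_PAD_SIZE` are made-up names, and the 8 KiB cap is an assumed stand-in for VDEV_PAD_SIZE, whose value the thread does not give.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cap; the thread names VDEV_PAD_SIZE but not its value. */
#define SKETCH_VDEV_PAD_SIZE (8u * 1024u)

static int
sketch_is_pow2(uint64_t v)
{
	return (v != 0 && (v & (v - 1)) == 0);
}

/* log2 of a power of two. */
static unsigned
sketch_shift_of(uint64_t v)
{
	unsigned s = 0;

	while (v > 1) {
		v >>= 1;
		s++;
	}
	return (s);
}

/*
 * Pick the initial ashift for a new vdev: use the physical sector size
 * (GEOM stripesize) only when it is a power of two, larger than the
 * logical sector size, and no larger than the cap. Otherwise fall back
 * to the logical sector size (GEOM sectorsize).
 */
unsigned
sketch_initial_ashift(uint64_t sectorsize, uint64_t stripesize)
{
	assert(sketch_is_pow2(sectorsize));

	if (sketch_is_pow2(stripesize) && stripesize > sectorsize &&
	    stripesize <= SKETCH_VDEV_PAD_SIZE)
		return (sketch_shift_of(stripesize));
	return (sketch_shift_of(sectorsize));
}
```

For a 512-byte logical / 4096-byte physical drive this yields 12; a provider that reports no stripesize, or an oversized one, stays at the logical value.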
Re: Make ZFS use the physical sector size when computing initial ashift
On Jul 10, 2013, at 1:06 PM, Steven Hartland kill...@multiplay.co.uk wrote:

> - Original Message - From: Justin T. Gibbs
> > I'm sure lots of folks have some solution to this. Here is an old version of what we use at Spectra:
> >
> > http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
> >
> > The above patch is missing some cleanup that was motivated by my discussions with George Wilson about this change in April. I'll dig that up later tonight. Even if you don't read the full diff, please read the included checkin comment since it explains the motivation behind this particular solution.
> >
> > This is on my list of things to upstream in the next week or so after I add logic to the userspace tools to report whether or not the TLVs in a pool are using an optimal allocation size. This is only possible if you actually make ZFS fully aware of logical, physical, and the configured allocation size. All of the other patches I've seen just treat physical as logical.
>
> Reading through your patch it seems that your logical_ashift equates to the current ashift values, which for geom devices is based off sectorsize, and your physical_ashift is based on stripesize.
>
> This is almost identical to the approach I used: adding a desired ashift, which equates to your physical_ashift, alongside the standard ashift, i.e. the required aka logical_ashift value :)

Yes, the approaches are similar. Our current version records the logical access size in the vdev structure too, which might relate to the issue below.

> One issue I did spot in your patch is that you currently expose zfs_max_auto_ashift as a sysctl but don't clamp its value, which would cause problems should a user configure values > 13.

I would expect the zio pipeline to simply insert an ashift aligned thunking buffer for these operations, but I haven't tried going past an ashift of 13 in my tests. If it is an issue, it seems the restriction should be based on logical access size, not optimal access size.
-- Justin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
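The clamping Steven asks for could look roughly like the sketch below. It is written as ordinary C rather than as a sysctl handler so it can be compiled anywhere. The function names and the bounds (9 and 13) are assumptions taken from the numbers mentioned in this thread, not from either patch, and the way the three values are combined is one plausible reading of the discussion.

```c
#include <assert.h>

#define SKETCH_MIN_ASHIFT 9	/* 512-byte sectors; assumed lower bound */
#define SKETCH_MAX_ASHIFT 13	/* upper bound discussed in the thread; assumed */

/* Bound a user-supplied tunable to the supported range. */
int
sketch_clamp_max_auto_ashift(int requested)
{
	if (requested < SKETCH_MIN_ASHIFT)
		return (SKETCH_MIN_ASHIFT);
	if (requested > SKETCH_MAX_ASHIFT)
		return (SKETCH_MAX_ASHIFT);
	return (requested);
}

/*
 * Choose the allocation ashift: prefer the physical (optimal) value,
 * never exceed the clamped tunable, never go below the logical
 * (required) value.
 */
int
sketch_pick_ashift(int logical_ashift, int physical_ashift,
    int max_auto_ashift)
{
	int cap = sketch_clamp_max_auto_ashift(max_auto_ashift);
	int ashift;

	assert(logical_ashift >= SKETCH_MIN_ASHIFT);

	ashift = physical_ashift > cap ? cap : physical_ashift;
	if (ashift < logical_ashift)
		ashift = logical_ashift;
	return (ashift);
}
```

Keeping the logical value as a hard floor reflects Justin's point that any restriction belongs on the logical access size, while the tunable only limits how far the optimal size is honored.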
Re: Make ZFS use the physical sector size when computing initial ashift
On Jul 10, 2013, at 1:42 PM, Steven Hartland kill...@multiplay.co.uk wrote:

> - Original Message - From: Justin T. Gibbs
> > On Jul 10, 2013, at 1:06 PM, Steven Hartland wrote:
> > > - Original Message - From: Justin T. Gibbs
> > > > I'm sure lots of folks have some solution to this. Here is an old version of what we use at Spectra:
> > > >
> > > > http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
> > > >
> > > > The above patch is missing some cleanup that was motivated by my discussions with George Wilson about this change in April. I'll dig that up later tonight. Even if you don't read the full diff, please read the included checkin comment since it explains the motivation behind this particular solution.
> > > >
> > > > This is on my list of things to upstream in the next week or so after I add logic to the userspace tools to report whether or not the TLVs in a pool are using an optimal allocation size. This is only possible if you actually make ZFS fully aware of logical, physical, and the configured allocation size. All of the other patches I've seen just treat physical as logical.
> > >
> > > Reading through your patch it seems that your logical_ashift equates to the current ashift values, which for geom devices is based off sectorsize, and your physical_ashift is based on stripesize.
> > >
> > > This is almost identical to the approach I used: adding a desired ashift, which equates to your physical_ashift, alongside the standard ashift, i.e. the required aka logical_ashift value :)
> >
> > Yes, the approaches are similar. Our current version records the logical access size in the vdev structure too, which might relate to the issue below.
> >
> > > One issue I did spot in your patch is that you currently expose zfs_max_auto_ashift as a sysctl but don't clamp its value, which would cause problems should a user configure values > 13.
> >
> > I would expect the zio pipeline to simply insert an ashift aligned thunking buffer for these operations, but I haven't tried going past an ashift of 13 in my tests. If it is an issue, it seems the restriction should be based on logical access size, not optimal access size.
>
> Yes, with your methodology you'll only see the issue if zfs_max_auto_ashift and physical_ashift are both > 13, but this can be the case, for example, on a RAID controller with a large stripesize.

I'm not sure I follow. logical_ashift is available in our latest code, as is the physical_ashift. But even without the logical_ashift, why doesn't the zio pipeline properly thunk zio_phys_read() access based on the configured ashift?

--
Justin
Re: Adaptec 2903b SCSI card
> It appears that there is no support for the Adaptec 2903b SCSI card, but of course I could be wrong. I would like to get this card to work, so if anyone could point me to a painfully obvious url or some documentation on how to get it to work that I have clearly overlooked, I would be forever grateful.
>
> If not, I would be willing to code the driver myself, depending on the size of the job, as I'd have maybe 2 days to do it (starting in a week or two). If this is the case, could anyone point me to any documentation on an API for implementing SCSI device drivers? I see that people.freebsd.org/~gibbs is somewhat of a place to start as far as studying existing SCSI drivers, but if there's anything else, I'd be happy to know and perhaps even code the thing.

Your best bet at determining the interface for the card is to look at the Linux driver.

--
Justin

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bus_dmamap_load change request.
> > I changed the code locally a while back to allocate the table during the allocation of the dma tag. This seems to work, but I just haven't gotten back to it to think through all of the consequences. Whatever the solution, we don't want to devise a solution where most drivers have allocated 2X their required S/G list size.
>
> Most other O/Ses seem to pay this 100% allocation overcost for the dmamap abstraction. This is indeed a pity for the 95% of systems currently using FreeBSD (IA32) that do not actually require such overallocation.

There is only 1 tag to several maps. If you allocate one extra S/G list per-tag, then your overallocation is 1/nmaps. The current scheme using the stack gives 0 overallocation, but isn't too nice for other reasons.

I also don't completely understand your claim that overallocation is not required for IA32. Is this because you feel the driver should do the mapping loop and store the results into its own structure, tailored to its dma engine? It is hard to find a mapping abstraction that avoids a conversion.

> What about supplying several dmamap (create/load/...) methods, for example:
>
> 1) A one shot load method for a reasonable known map size that does not overallocate if not required by arch. (the one we have)
> 2) A one shot method that may overallocate depending on arch and/or map size (may use method #1 when possible).
> 3) A multi-shot method that may not overallocate if not needed for arch.
>
> By multishot, I mean the callback may be called several times with partial map + some more flag.

The question for all of these methods is, where is the mapping list stored? That would help me understand your proposal.

--
Justin
Re: Patch #3 (TCP / Linux / Performance)
> > Does the FreeBSD tcp stack do zero copy (page flip the data to userspace)? In the localhost case, it seems like there are two copies to/from userspace there.
>
> It has the ability to do it via sendfile() and a few other mechanisms, but not as a normal part of typical read()/write().

Ahh, but there are patches floating around that do support zero-copy. Just ask Ken Merry and Drew Gallatin. I don't think they've been integrated due to lack of testing time, but they've existed for 2 or so years now.

--
Justin
Re: ahc(4) reports PCI parity error interrupt
On Fri, Nov 02, 2001 at 12:21:22AM +0100, Wilko Bulte wrote:

> I forgot to mention that I had the same config running fine on the same Abit MB using 4.3-stable.

In 4.3-stable, we ignored the parity errors due to a logic bug.

--
Justin
Re: ahc(4) reports PCI parity error interrupt
> It appears things are more complicated than that. I have swapped the Adaptec 29160 for a 2940UW. I am now running continuous buildworlds, a 'dd' of the SCSI disk and fxtv in parallel. This appears to work like a charm until now. I'll let it run and see what develops.

I bet you the aic7892 on the 29160 does larger bursts than the 7880 on the 2940UW. This isn't quite an apples to apples comparison.

--
Justin
Re: Where to put new bus_dmamap_load_mbuf() code
> > > My understanding is that you need a dmamap for every buffer that you want to map into bus space.
> >
> > You need one dmamap for each independently manageable mapping. A single mapping may result in a long list of segments, regardless of whether you have a single KVA buffer or multiple KVA buffers that might contribute to the mapping.
>
> Yes yes, I understand that. But that's only if you want to map a buffer that's larger than PAGE_SIZE bytes, like, say, a 64K buffer being sent to a disk controller. What I want to make sure everyone understands here is that I'm not typically dealing with buffers this large: instead I have lots of small buffers that are smaller than PAGE_SIZE bytes. A single mbuf alone is only 256 bytes, of which only a fraction is used for data. An mbuf cluster buffer is usually only 2048 bytes. Transmitted packets are typically fragmented across 2 or 3 mbufs: the first mbuf contains the header, and the other two contain data. (Or the first one contains part of the header, the second one contains additional header data, and the third contains data -- whatever.) At most I will have 1500 bytes of data to send, which is less than PAGE_SIZE, and that 1500 bytes will be fragmented across a bunch of smaller buffers that are also smaller than PAGE_SIZE. Therefore I will not have one dmamap with multiple segments: I will have a bunch of dmamaps with one segment each.

The fact that the data is less than a page in size matters little to the bus dma concept. In other words, how is this packet presented to the hardware? Does it care that all of the component pieces are < PAGE_SIZE in length? Probably not. It just wants the list of address/length pairs that compose that packet, and there is no reason that each chunk needs to have its own, and potentially expensive, dmamap.

> > Creating a dmamap, depending on the architecture, could be expensive. You really want to create them in advance (or pool them), with at most one dmamap per concurrent transaction you support in your driver.
>
> The only problem here is that I can't really predict how many transactions will be going at one time. I will have at least RX_DMA_RING maps (one for each mbuf in the RX DMA ring), and some fraction of TX_DMA_RING maps. I could have the TX DMA ring completely filled with packets waiting to be DMA'ed and transmitted, or I may have only one entry in the ring currently in use. So I guess I have to allocate RX_DMA_RING + TX_DMA_RING dmamaps in order to be safe.

Yes, or allocate them in chunks so that the total amount is only as large as the greatest demand your driver has ever seen.

> > With the added complications of deferring the mapping if we're out of space, issuing the callback, etc.
>
> Why can't I just call bus_dmamap_load() multiple times, once for each mbuf in the mbuf list?

Due to the cost of the dmamaps, the cost of which is platform and bus-dma implementation dependent - e.g. could be a 1-1 mapping to a hardware resource. Consider the case of having a full TX and RX ring in your driver. Instead of #TX*#RX dmamaps, you will now have three or more times that number.

There is also the issue of coalescing the discontiguous chunks if there are too many chunks for your driver to handle. Bus dma is supposed to handle that for you (the x86 implementation doesn't yet, but it should) but it can't if it doesn't understand the segment limit per transaction. You've hidden that from bus dma by using a map per segment.

> (Note: for the record, an mbuf list usually contains one packet fragmented across multiple mbufs. An mbuf chain contains several mbuf lists, linked together via the m_nextpkt pointer in the header of the first mbuf in each list. By the time we get to the device driver, we always have mbuf lists only.)

Okay, so I haven't written a network driver yet, but you got the idea, right? 8-)

> > Chances are you are going to use the map again soon, so destroying it on every transaction is a waste.
>
> Ok, I spent some more time on this. I updated the code at:
>
> http://www.freebsd.org/~wpaul/busdma

I'll take a look.

> The changes are:
> ...
> - Added routines to allocate a chunk of maps in a singly linked list, from which the other routines can grab them as needed.

Are these hung off the dma tag or something? dmamaps may hold settings that are peculiar to the device that allocated them, so they cannot be shared with other clients of bus_dmamap_load_mbuf.

--
Justin
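The "allocate them in chunks" suggestion can be illustrated with a small free-list pool that only grows when it runs dry, so its size tracks the peak demand the driver has seen. This is a userland sketch under stated assumptions: `struct sketch_map` stands in for a bus_dmamap_t, `malloc()` stands in for bus_dmamap_create(), and none of the `sketch_*` names come from Bill's code or from the busdma API.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_map {			/* stand-in for a bus_dmamap_t */
	struct sketch_map *next;
};

struct sketch_map_pool {
	struct sketch_map *free;	/* singly linked free list */
	unsigned chunk;			/* maps created per growth step */
	unsigned total;			/* maps created so far */
};

void
sketch_pool_init(struct sketch_map_pool *p, unsigned chunk)
{
	assert(chunk > 0);
	p->free = NULL;
	p->chunk = chunk;
	p->total = 0;
}

/* Add up to one chunk of maps to the free list. */
static void
sketch_pool_grow(struct sketch_map_pool *p)
{
	for (unsigned i = 0; i < p->chunk; i++) {
		/* A driver would call bus_dmamap_create() here. */
		struct sketch_map *m = malloc(sizeof(*m));

		if (m == NULL)
			return;
		m->next = p->free;
		p->free = m;
		p->total++;
	}
}

/* Take a map for one transaction; grows the pool only when it is empty. */
struct sketch_map *
sketch_pool_get(struct sketch_map_pool *p)
{
	struct sketch_map *m;

	if (p->free == NULL)
		sketch_pool_grow(p);
	if (p->free == NULL)
		return (NULL);
	m = p->free;
	p->free = m->next;
	m->next = NULL;
	return (m);
}

/* Return a map once its transaction has completed. */
void
sketch_pool_put(struct sketch_map_pool *p, struct sketch_map *m)
{
	m->next = p->free;
	p->free = m;
}
```

Because maps may carry device-specific settings, a pool like this would be private to one driver instance (or one dma tag) rather than shared between clients, which is the concern raised at the end of the message above.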
Re: Where to put new bus_dmamap_load_mbuf() code
> > The fact that the data is less than a page in size matters little to the bus dma concept. In other words, how is this packet presented to the hardware? Does it care that all of the component pieces are < PAGE_SIZE in length? Probably not. It just wants the list of address/length pairs that compose that packet, and there is no reason that each chunk needs to have its own, and potentially expensive, dmamap.
>
> Maybe, but bus_dmamap_load() only lets you map one buffer at a time. I want to map a bunch of little buffers, and the API doesn't let me do that. And I don't want to change the API, because that would mean modifying busdma_machdep.c on each platform, which is a hell that I would rather avoid.

bus_dmamap_load() is only one part of the API. bus_dmamap_load_mbuf and bus_dmamap_load_uio are also part of the API. They just don't happen to be implemented yet. 8-) Perhaps there should be an MD primitive that knows how to append to a mapping? This would allow you to write an MI loop that does exactly what you want.

> > there are too many chunks for your driver to handle. Bus dma is supposed to handle that for you (the x86 implementation doesn't yet, but it should) but it can't if it doesn't understand the segment limit per transaction. You've hidden that from bus dma by using a map per segment.
>
> Ok, a slightly different question: what happens if I call bus_dmamap_load() more than once with different buffers but with the same dmamap?

The behavior is undefined.

> > > - Added routines to allocate a chunk of maps in a singly linked list, from which the other routines can grab them as needed.
> >
> > Are these hung off the dma tag or something? dmamaps may hold settings that are peculiar to the device that allocated them, so they cannot be shared with other clients of bus_dmamap_load_mbuf.
>
> It's a separate list. The driver is responsible for allocating the head of the list, then it hands it to bus_dmamap_list_alloc() along with the required dma tag. bus_dmamap_list_alloc() then calls bus_dmamap_create() to populate the list. The driver doesn't have to manipulate the list itself, until time comes to destroy it.

Okay, but does this mean that bus_dmamap_load_mbuf no longer takes a dmamap? Drivers may want to allocate/manage the dmamaps in a different way.

--
Justin
Re: Where to put new bus_dmamap_load_mbuf() code
> Correction. This sample:
>
>	if (bus_dma_tag_create(pci->parent_dmat, PAGE_SIZE, lim,
>	    BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, len, 1,
>	    BUS_SPACE_MAXSIZE_32BIT, 0, &pci->cntrol_dmat) != 0) {
>		isp_prt(isp, ISP_LOGERR,
>		    "cannot create a dma tag for control spaces");
>		free(isp->isp_xflist, M_DEVBUF);
>		free(pci->dmaps, M_DEVBUF);
>		return (1);
>	}

You'll need to change the number of segments to match the max supported by the card (or the max you will ever need).

This example made me realize that the bounce code doesn't deal with multiple segments being copied into a single page (i.e. tracking and using remaining free space in a page already allocated for bouncing for a single map). I'll have to break loose some time to fix that.

--
Justin
Re: Where to put new bus_dmamap_load_mbuf() code
> Ever hear the phrase "you get what you pay for"? The API isn't all that clear, and we don't have a man page or document that describes in detail how to use it properly. Rather than whining about that, I decided to tinker with it and Use The Source, Luke (tm). This is the result.

Fair enough.

> My understanding is that you need a dmamap for every buffer that you want to map into bus space.

You need one dmamap for each independently manageable mapping. A single mapping may result in a long list of segments, regardless of whether you have a single KVA buffer or multiple KVA buffers that might contribute to the mapping.

> Each mbuf has a single data buffer associated with it (either the data area in the mbuf itself, or external storage). We're not allowed to make assumptions about where these buffers are. Also, a single ethernet frame can be fragmented across multiple mbufs in a list.
>
> So unless I'm mistaken, for each mbuf in an mbuf list, what we have to do is this:
>
> - create a bus_dmamap_t for the data area in the mbuf using bus_dmamap_create()

Creating a dmamap, depending on the architecture, could be expensive. You really want to create them in advance (or pool them), with at most one dmamap per concurrent transaction you support in your driver.

> - do the physical to bus mapping with bus_dmamap_load()

bus_dmamap_load() only understands how to map a single buffer. You will have to pull pieces of bus_dmamap_load into a new function (or create inlines for common bits) to do this correctly. The algorithm goes something like this:

	foreach mbuf in the mbuf chain to load
		/*
		 * Parse this contiguous piece of KVA into
		 * its bus space regions.
		 */
		foreach bus space discontiguous region
			if (too_many_segs)
				return (error);
			Add new S/G element

With the added complications of deferring the mapping if we're out of space, issuing the callback, etc.

> - call bus_dmamap_sync() as needed (might handle copying if bounce buffers are required)
> - insert mysterious DMA operation here
> - do post-DMA sync as needed (again, might require bounce copying)
> - call bus_dmamap_unload() to un-do the bus mapping (which might free bounce buffers if some were allocated by bus_dmamap_load())
> - destroy the bus_dmamap_t

Chances are you are going to use the map again soon, so destroying it on every transaction is a waste.

--
Justin
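The loop Justin outlines above (walk the buffers, split each at the points where bus space may become discontiguous, stop if the segment limit is exceeded) can be sketched in userland C. Everything here is hypothetical scaffolding: `struct sketch_buf` stands in for an mbuf's data area, `sketch_vtobus()` is an identity placeholder for the real KVA-to-bus translation, and deferral, callbacks, and bounce buffers are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

struct sketch_buf {			/* stand-in for one mbuf's data area */
	uintptr_t addr;			/* virtual address of the data */
	size_t len;
	struct sketch_buf *next;	/* next buffer of the same packet */
};

struct sketch_seg {			/* one scatter/gather element */
	uintptr_t addr;			/* bus address */
	size_t len;
};

/* Placeholder translation; a real one would consult the pmap or IOMMU. */
static uintptr_t
sketch_vtobus(uintptr_t va)
{
	return (va);
}

/*
 * Build one S/G list for a whole chain of buffers using a single map.
 * Returns the number of segments used, or -1 if the chain needs more
 * than maxsegs segments.
 */
int
sketch_load_chain(const struct sketch_buf *chain, struct sketch_seg *segs,
    int maxsegs)
{
	int nsegs = 0;

	assert(segs != NULL && maxsegs > 0);

	for (const struct sketch_buf *b = chain; b != NULL; b = b->next) {
		uintptr_t va = b->addr;
		size_t left = b->len;

		while (left > 0) {
			/* Only the rest of the current page is known contiguous. */
			size_t chunk = SKETCH_PAGE_SIZE -
			    (va & (SKETCH_PAGE_SIZE - 1));
			uintptr_t ba = sketch_vtobus(va);

			if (chunk > left)
				chunk = left;
			if (nsegs > 0 &&
			    segs[nsegs - 1].addr + segs[nsegs - 1].len == ba) {
				/* Adjacent in bus space: extend the last segment. */
				segs[nsegs - 1].len += chunk;
			} else {
				if (nsegs == maxsegs)
					return (-1);	/* too many segments */
				segs[nsegs].addr = ba;
				segs[nsegs].len = chunk;
				nsegs++;
			}
			va += chunk;
			left -= chunk;
		}
	}
	return (nsegs);
}
```

A header mbuf plus two data buffers therefore produce a three-entry list under one map, and the segment limit is visible to the loading code instead of being hidden behind a map per buffer.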
What makes it FreeBSD...
> It's reasonable to want to control what gets called FreeBSD.

Certainly. But I think it has to go beyond the installer. We should define an environment that third party applications can depend on being available in any installation that claims to be FreeBSD. Without this, you have the same environment that Linux does, where third party apps are only qualified on distributions U and X and have no hope of working on distributions Y and Z.

--
Justin
Re: Re. The Foundation [was Re: FreeBSD Mall now BSDCentral]
> > > The Foundation has yet to approach Wind River about the Trademark, so I cannot speculate on their disposition.
> >
> > What are your plans to change this situation?
>
> > The Foundation isn't planning to do anything about the trademark or in regards to any of its other proposed activities until sufficient donations have arrived. I believe that the financial statement released with our announcement makes it clear why this must be the case.
>
> How much do you think it will cost to transfer the trademark?

We're seeking legal counsel on this issue now. We should have a better estimate shortly.

--
Justin
Re: Re. The Foundation [was Re: FreeBSD Mall now BSDCentral]
> > The Foundation has yet to approach Wind River about the Trademark, so I cannot speculate on their disposition.
>
> What are your plans to change this situation?

The Foundation isn't planning to do anything about the trademark or in regards to any of its other proposed activities until sufficient donations have arrived. I believe that the financial statement released with our announcement makes it clear why this must be the case.

--
Justin
Re: Re. The Foundation [was Re: FreeBSD Mall now BSDCentral]
> Hackers, CC [EMAIL PROTECTED]
>
> Wind River might be cautious & slow when considering donating the FreeBSD trademark to a FreeBSD Foundation controlled by a board of just 3 directors, that even FreeBSD people are just getting to know of, have yet to evaluate as `The Foundation'.

I believe that all of the members of the Foundation's current board are well known to the FreeBSD community. That said, the community will always be able to vote their level of confidence based on whether they financially support the Foundation. As was mentioned in the FAQ, the Foundation will not engage in any further activities until sufficient funds arrive. (To those of you who have already donated - Thanks for the support!)

> Wind River might find it easier & quicker to decide to donate the trademark to the foundation, if the foundation comprised not just 3 self appointed directors, but perhaps included at least 3 non executive directors on their board.

The current BOD has only one self-appointed director, me. This came as a natural consequence of spear-heading the formation of the Foundation. The other two directors I convinced to join^W^W^Wchose because of their proven track record as responsible individuals who have worked to further the FreeBSD cause.

The FreeBSD Foundation is an administrative body whose role is to efficiently collect and disburse funds to better the project. Core has never proven itself adept at administration. For this reason, the Foundation is a completely separate entity explicitly structured to handle these somewhat tedious and boring activities.

> Adding 3 non-execs might increase the assurance of others too, not just WR. BOD could invite core to nominate the first few that BOD appoint. The non execs could retire by chronological rotation, as real directors do.

The Foundation's current BOD is running with the bare minimum number of directors and officers as required by the IRS. The BOD has always planned to add to its membership (5 to 7 directors in total), but in these early days of operation the Foundation is simply not mature enough to warrant the extra bureaucracy of additional directors.

As for selection of future directors, the assumption has always been that the pool of candidates would encompass those from the business, academic and charity circles with expertise that would benefit the operation of the Foundation. The directors must support the administrative nature of the Foundation, and the skill set of future directors will reflect this. Having directors from outside the FreeBSD community will broaden the perspective and enhance the operation of the Foundation. Due care to avoid any type of conflict of interest issues will be exercised at all times.

> As I recall FreeBSD Inc (ie jkh & dg, a team of 2) once held the trademark,

I could be wrong, but I don't believe that this was ever the case.

> The sooner BOD appoint a handful of non-execs, some nominated by core, the sooner it'll be easy to encourage Wind River to donate the trademark, before who knows what might happen at, to, or within Wind River ?

The Foundation has yet to approach Wind River about the Trademark, so I cannot speculate on their disposition.

--
Justin
Secretary/Treasurer
The FreeBSD Foundation
Re: IBM ACP modem driver
> > A couple of months ago IBM released the source to their linux modem driver for the thinkpad 600 & 600E:
>
> I was looking to port this stuff to -current, so I'd be interested in seeing your patches.

It would be nice if we could simply rewrite the kernel portion of the code to avoid the GPL license on the original IBM code.

--
Justin
Re: call for testers: port aggregation netgraph module
> Each link is checked once every second to see if the link is still up. An attempt to send a packet over a dead link will cause the packet to be shifted over to the next link in the bundle.

Any chance this can be done through an async event rather than by polling?

--
Justin
Re: call for testers: port aggregation netgraph module
> > > Each link is checked once every second to see if the link is still up. An attempt to send a packet over a dead link will cause the packet to be shifted over to the next link in the bundle.
> >
> > Any chance this can be done through an async event rather than by polling?
>
> If there was, I would have done it.

Perhaps it would be best to create an interface that allows async notification but to provide a default implementation of the interface that polls? This would allow hardware that has a mechanism to detect the state change to override the default method while all other cards "just work" without modification by polling.

--
Justin
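One way to read the suggestion above is an ops table with an optional asynchronous hook and a polling fallback. The sketch below is plain C with invented names (`sketch_link`, `sketch_link_ops`, and so on); it is not netgraph code, and the fake driver at the bottom exists only to show how a card without link-change interrupts would plug in.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_link;
typedef void (*sketch_link_cb)(struct sketch_link *, int up);

struct sketch_link_ops {
	/* Required: report whether the link is currently up. */
	int (*is_up)(struct sketch_link *);
	/*
	 * Optional: arrange for the hardware to call link->notify on a
	 * state change. NULL (or a non-zero return) selects polling.
	 */
	int (*enable_async)(struct sketch_link *);
};

struct sketch_link {
	const struct sketch_link_ops *ops;
	sketch_link_cb notify;		/* consumer's state-change callback */
	int last_state;
	int polled;			/* 1 when the default poller is in use */
	void *hw;			/* driver-private state */
};

void
sketch_link_attach(struct sketch_link *l, const struct sketch_link_ops *ops,
    sketch_link_cb cb)
{
	assert(ops != NULL && ops->is_up != NULL);
	l->ops = ops;
	l->notify = cb;
	l->last_state = ops->is_up(l);
	l->polled = (ops->enable_async == NULL || ops->enable_async(l) != 0);
}

/* Default implementation: called from a periodic (e.g. 1 Hz) timer. */
void
sketch_link_tick(struct sketch_link *l)
{
	int now;

	if (!l->polled)
		return;		/* hardware reports changes by itself */
	now = l->ops->is_up(l);
	if (now != l->last_state) {
		l->last_state = now;
		if (l->notify != NULL)
			l->notify(l, now);
	}
}

/* Example driver without async support: state lives in an int. */
int
sketch_fake_is_up(struct sketch_link *l)
{
	return (*(int *)l->hw);
}

const struct sketch_link_ops sketch_fake_ops = { sketch_fake_is_up, NULL };

int sketch_events;		/* counts notifications in the example */

void
sketch_count_cb(struct sketch_link *l, int up)
{
	(void)l;
	(void)up;
	sketch_events++;
}
```

The consumer sees the same callback either way; only cards that can signal a change themselves need to supply `enable_async`, and every other card keeps working through the poller.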
Re: aic7xxx driver SCBs
> On Fri, 9 Mar 2001, Mike Smith wrote:
> > Joe; it looks like you have some funny ideas about something that's not actually very relevant. I assume that you have already gone and bought Monster Cable(tm) SCSI cables, and that you have the special oxygen-free-copper SCSI controller PCBs, because none of this is going to mean anything unless you have.
>
> Pardon and sorry, Mike. It is rather your reply that looks funny to me.

Okay. Everyone take a big breath. My rule of thumb for email is, if you can't reply kindly to someone's question, you shouldn't reply at all. Sometimes I fall shy of that mark too, but I think it's a worthy goal for everyone participating on our lists. So, Mike, try to play nice. Okay?

As to the original question...

SCSI Control Blocks (SCBs) are the controller resources used by the aic7xxx driver to deliver commands. You need to have one SCB for every outstanding transaction the card handles. All cards supported by either Linux or FreeBSD allocate a total of 253 SCBs. This number is *not* configurable on either OS (assuming you are using my driver).

One thing that confuses people is that the number of SCBs that can be concurrently stored on chip varies from chip to chip and sometimes with how the chip is integrated on the MB or HBA. The more hardware SCBs, the better the performance, but due to an SCB paging scheme, the total number of outstanding transactions all controllers support is 253. If you are curious, the driver probe message indicates the number of hardware SCBs and the total number of transactions, like so:

    aic7899: Wide Channel A, SCSI Id=7, 32/255 SCBs
                         hardware SCBs ==^ ^== total number

2 of the 255 SCBs are reserved for various things, hence the actual number of transactions being 253, not 255. If only one number is printed, the hardware and total are equal.

The Linux driver allows you to specify the number of transactions that can be queued to any given device on the bus. The default, again if you are using my driver, is to allow up to 253, the maximum the controller supports. The driver will automatically throttle this number based on the capabilities of the individual device. In the FreeBSD environment, this throttling occurs in the SCSI layer, not the controller.

So, how many transactions should you allow? To some extent that depends on your workload. For non-sequential I/O (in my opinion the most common scenario), the more tags the merrier. This is why both the Linux and FreeBSD environments default to the maximum number of tags. Some devices, due to limited processing power on the drive or poor firmware engineering, do show reduced sequential performance when put under a high tag load. If you know you are only going to do sequential I/O *and* you have the time to experiment with the results, your application might benefit from reducing the number of tags allowed.

If you want to change these parameters under FreeBSD, read the camcontrol manpage about the "tags" option. Under Linux, as you already know, play with the settings via "make *config".

--
Justin
Re: disk problem
I did the vinum stuff again, but now I'm getting all kinds of errors. BTW, the kernel is from this morning (4.2, cvsup'ed this morning, local time); I noticed some fixes to the aic7xxx.

You have a bad cable or terminator. The 7899 runs quite a bit faster than the 2940U, so the problem may not have shown itself with your previous configuration.

-- Justin
Re: bus_alloc_resource and RF_SHARABLE
: Just so I'm completely clear on this though, the intent is that multiple
: bus_alloc_resource calls for a single BAR within a single driver are
: explicitly prohibited, right? So if I want to map ONLY the first byte
: and the last byte of, say, a 16MB PCI BAR, I have to map the whole
: thing, use the same resource handle for everything, and give up any
: potential address space/vm protection afforded by having the middle
: unmapped, right?

Right now BARs can be mapped only once. If you have a physical device that is serviced by a bunch of sub-devices, you'll need to cope by providing that functionality in a "bridge" driver. I'm working on this for my NetBSD puc driver port.

Actually, the problem is not that you can only "bus_allocate" a BAR once, but rather that we map that area into KVA auto-magically. We should instead allow the user to perform their own mapping(s) into KVA. You don't need a bridge driver to get this effect.

-- Justin
Re: Reg: Adaptec AIC-7892 on board SCSI controller ..
Justin, Thanks for your prompt response, but I did see the 3.3 release notes before attempting the install. They do say that "Adaptec AIC7850, AIC7860, AIC7880, AIC789x, on-board SCSI controllers." are supported. Btw, the release notes for 3.4 say the same. Can anyone throw more light on this? Thanks for your time.

When those release notes were written, only the aic7895 existed in the aic789X range. That chip is supported by 3.3.

-- Justin
Re: Reg: Adaptec AIC-7892 on board SCSI controller ..
I bought 3.3 CDs from Walnut Creek and use BSD at home, but that machine has an IDE disk. This is my first attempt at installing one with SCSI. Upgrading to 4.x is not an option.

FreeBSD 3.3 does not include support for the 7892. IIRC, 3.4 and all later releases support the 7892.

-- Justin
Re: KVM switch vs. FreeBSD psm driver
I've got a Belkin OmniView Pro 8-Port KVM switch which thinks it's much smarter than it really is.

I've had so many problems with this product that I dumped it for an Apex Outlook. I couldn't be happier. Since I donated the Belkin to another group, I've heard that they were able to send it in to be fixed. There was some manufacturing defect on early boxes that accounted for part of my problem. Since the signal quality through the switch was so poor anyway, I don't regret going to the Apex.

-- Justin
FreeBSD Foundation: Examples of FreeBSD as teaching aid/research plat
As some of you may know, I'm working on a 501(c)3 (tax exempt/non-profit) determination for the FreeBSD Foundation. The IRS seems to be a little confused about the nature of FreeBSD, and we're currently working on a response to an initial determination from the IRS that was not favorable. One thing that would help us to explain the nature of FreeBSD and how it is used by the public is to enumerate some specific examples of how FreeBSD is used as either a teaching aid or a research platform by educational institutions. If possible, please include a contact name, email, or phone number so we can ask additional questions if necessary.

Thanks in advance for your help!

Justin
Re: SCSI disk naming problem
In article [EMAIL PROTECTED] you wrote: The current FreeBSD SCSI disk naming mechanism is a problem when using more than one disk in the chain during a disk failure. The problem is that the name is not tied to the disk's SCSI ID. E.g., if one disk is present in the chain, regardless of its SCSI ID, it is always named "da0"; ... Is there a problem with a fixed disk naming mechanism?

'Path based names' do not deal with systems that have multiple paths to the same device. For example, if I have two host adapters talking on the same bus for redundancy, which name do I give to the devices on the bus?

-- Justin
Re: any docs on how to use bus_dma_tag_create e.a. ?
    /*lowaddr*/BUS_SPACE_MAXADDR_32BIT,
    /*highaddr*/BUS_SPACE_MAXADDR,

low and high address of the region that the DMA engine cannot access.

Meaning e.g. the 16Mbyte barrier that ISA DMA has? For PCI this would be a 4GB range(?).

The range could be much larger than 4GB. Remember this is a range the device *cannot* access, not a range it can access. So, the beginning of the range for an ISA device would be BUS_SPACE_MAXADDR_24BIT and the high address would be BUS_SPACE_MAXADDR. Depending on the platform or configuration of the machine, the high address could be larger than a 32bit quantity.

    /*maxsize*/MAXBSIZE,

Maximum DMA transfer size.

    /*nsegments*/AHC_NSEG,

Maximum number of discontinuities in the mapped region.

Eh.. ?

    /*maxsegsz*/AHC_MAXTRANSFER_SIZE,

Maximum size of a segment. maxsize = nsegments * maxsegsz.

Eh.. ?

Many DMA engines have S/G capability and so can perform a single DMA that spans multiple segments of "bus space contiguous" data. By setting these parameters, the bus_dmamap_load function can determine how best to map your transfer into bus space and will return to you an array of segments to program into your DMA hardware.

You should use the new API if possible.

That is what I'm planning to do. The amount of sample code in the various drivers is rather limited as most drivers use the old code.

It seems that it's mostly confined to the SCSI code, but hopefully that will change over time.

So I hope you don't mind me asking some more questions,

Not a problem.

-- Justin
Re: any docs on how to use bus_dma_tag_create e.a. ?
bus_dma related stuff is only required if the device has a DMA engine you wish to use. To access the shared memory on the card (e.g. map it into the kernel's virtual address space), you will need to use the resource manager and bus space.

Eh, sorry, I was confused. It has *both* shared memory and a DMA engine.

Do you by chance have an example (maybe in -current somewhere) of the shared memory stuff? I found some DMA stuff in ahc_pci.c:

    /* Allocate a dmatag for our SCB DMA maps */
    /* XXX Should be a child of the PCI bus dma tag */
    error = bus_dma_tag_create(/*parent*/NULL,

A parent tag would indicate the restrictions of any parent bridge between the device you are talking to and CPU memory. We haven't modified the new bus code yet to pass through this information, so just leave it NULL for now.

    /*alignment*/1,

Any alignment constraints on the target memory region of a DMA, specified in bytes. If the allocation must be 32bit aligned, you would specify 4.

    /*boundary*/0,

Any boundary constraints on the target memory region of a DMA. For instance, if the DMA cannot cross a 64K boundary, you would set this to 64K.

    /*lowaddr*/BUS_SPACE_MAXADDR_32BIT,
    /*highaddr*/BUS_SPACE_MAXADDR,

low and high address of the region that the DMA engine cannot access.

    /*filter*/NULL, /*filterarg*/NULL,

If the device's DMA constraints cannot be specified with a single region, you must specify a region that encompasses all such regions and specify a filter function to provide a finer level of control.

    /*maxsize*/MAXBSIZE,

Maximum DMA transfer size.

    /*nsegments*/AHC_NSEG,

Maximum number of discontinuities in the mapped region.

    /*maxsegsz*/AHC_MAXTRANSFER_SIZE,

Maximum size of a segment. maxsize = nsegments * maxsegsz.

    /*flags*/BUS_DMA_ALLOCNOW

Allocate all necessary resources to handle a single mapping for this tag at the time the tag is created.

Most (?) drivers seem to use the older framework (can I distinguish those by COMPAT_PCI_DRIVER() ?).

You should use the new API if possible.

-- Justin
Re: any docs on how to use bus_dma_tag_create e.a. ?
In article [EMAIL PROTECTED] you wrote: I'm currently trying to hack a driver together for a PCI card that uses shared memory to communicate with the host. If I'm not completely off track, I need to use (under newbus/-current) bus_dma_tag_create, bus_dma_alloc etc. to get access to the card's shared memory.

bus_dma related stuff is only required if the device has a DMA engine you wish to use. To access the shared memory on the card (e.g. map it into the kernel's virtual address space), you will need to use the resource manager and bus space.

-- Justin
Re: CAM: delaying new commands during reset
This stuff should really go to the SCSI list. I read that list much more frequently than this one.

The Iomega USB Zip drive is a bit slow when resetting (reset of the USB part of the drive). It takes 1s or more to reset. The reset is initiated because for example an illegal command was received (sync cache for example).

The hard coded reset delay in there is quite crude. You should just call xpt_freeze_devq() for that device and then release the queue from a timeout handler. In general, the peripheral drivers will wait until after a bus settle delay anyway, but the only way to ensure this delay is to freeze the queue.

The problem is that the reset is initiated and the command that failed is xpt_done()'d with an error.

All ccbs that have an error status set should cause the device queue to be frozen and the CAM_DEV_QFRZN flag should be set in the cam status field of the CCB. If you don't freeze the queue, the peripheral driver cannot perform error recovery in a consistent way. It also looks like all umass I/O is blocking/polling. Since this can occur from a SWI, this is pretty bad to do. Is there no alternative? It also appears that this driver has a very limited error code vocabulary. Is that because the transport or device gives little information about errors?

What would be the proper approach to make the ccb delay until the reset has finished? Return a CAM_REQUEUE_REQ instead of CAM_SCSI_BUSY? Or store the ccb and process it when the reset is done?

CAM_REQUEUE_REQ is for commands that have been queued in the SIM but have never been sent to a device. The error model goes something like this:

All ccbs with non-zero status should cause the device queue to be frozen and the CAM_DEV_QFRZN flag set in the status word.

When an error occurs:

  Return all queued CCBs that match the device(s) affected by the error with CAM_REQUEUE_REQ status.

  Return any 'invalidated' commands (commands that were on the device but have been thrown out in response to this error) with the correct error status (CAM_BUS_RESET, CAM_BDR_SENT, etc.)

  Return any commands that have completed with an error with the appropriate error code (CAM_UNCOR_PARITY, CAM_SCSI_STATUS_ERROR, CAM_DATA_RUN_ERR, etc.)

If you require a specific amount of recovery time, freeze the device or simq and schedule a timeout to release the queue.

An underrun is not an error. If a ccb is returned with no error codes set, the residual will be examined, so you must set the residual on all completed commands.

-- Justin
Re: DVD-ram
What do you think about some MO (Magneto Optical disk) and PD drives? 3.5" 650MB and 1.3GB MO drives should handle 512 bytes/sector (128MB, 230MB, 540MB) and 2048 bytes/sector media (640MB, 1.3GB).

The da driver should handle media with 2048-byte sectors right now. If it doesn't for some reason, that is a bug and it should be fixed.

Some PD drives use 2 LUNs. One of them is used for CD drive mode and another is for PD drive.

I don't see any problem with the 2 LUNs. CAM probes all luns and has no restrictions on the device types allowed on those luns. Lun 0 could be handled by the cd driver and lun 1 could be handled by the da driver.

How do you treat write protection? DVD-RAM type II media can be removed from the cartridge and read like DVD-ROM media by some of the latest DVD-ROM drives, for example Panasonic's. But the stripped DVD-RAM media is treated as read-only media by the DVD-RAM drive.

The cd driver will have to understand that some media can be written to and some cannot. The main reason to use the cd driver for this is that DVD is under the MMC command set that the cd driver is supposed to support. The fact that it doesn't support all of those commands right now is a bug, but that doesn't mean that a new driver type is needed.

In addition, there are many buggy MO drives (e.g. cache problems) and media (e.g. media formatted for Windows). They require some extra error handling.

So you need quirk entries. The CD driver already has a quirk facility that could be expanded to handle these bugs. Again, I don't see a compelling reason for writing a new driver over expanding the functionality in the old.

We were very happy to use DVD-RAM/MO/PD drives on FreeBSD-2.2.X, because the pre-CAM SCSI system had the od-driver. We could not use these devices on FreeBSD-3.X without the new od-driver.

How did the da driver fail?

-- Justin