Hi,

After reading some of the SCO vs Linux threads and articles, I'd like to propose a theory that I haven't seen suggested before. It may sound strange and I may be on shaky ground, but give it a chance.
What I have seen of the debate is mostly (probably rightfully) just SCO bashing. Some people assume the code was copied either directly from SCO into Linux or through some complicated path of intermediate points. Or it was the reverse: SCO took some open source code, used it, and it propagated in.

I wonder if it is possible for people to work totally independently on code that accomplishes the same function in the same language (i.e. C) and end up with fairly large stretches of code that are exactly the same. You may be thinking that this is ridiculous. If two pages in two different books were identical then yes, it might be a copyright violation. But this might not be true if it were only a couple of sentences. For example:

"It was a dark and stormy night. It was one of those nights where Homer Simpson, PI, knew that nothing good could possibly walk into his office. But then she walked in. She looked good and she had the kind of expression on her face that told you she knew it. And Homer certainly knew it ......."

Well, I am sure many of those sentences have been used before. And yes, I took Homer Simpson from the Simpsons. It was intended as a joke. But the point is that the length of the matching passage is important, and how long it has to be depends on the number of possible permutations. In terms of books and English, the number of permutations can be relatively high (it won't be 26^n though, obviously). But we still don't necessarily consider a single matching sentence to be a copyright violation. It has to be longer.

It could be argued that the number of permutations of a computer program that is written in a given language to accomplish the same task may be relatively low. Here are some things to consider that argue for similarities between programs written in the same language to perform the same function.

1. Most C coders use K&R as a guide. K&R defines and lays out some standards for writing code.
There are other standards that may be taught in school or learned in certain work environments (DoD, govt, IBM, or homegrown standards).

2. They are implementing the same standard. System V Unix is an open standard. Networking and other standards are open. They define data structures, methods, and functions that are to be used. This data is incorporated into any program that implements the standards.

3. C does not have a large vocabulary like English. In a computer program, there are very few things you can do. Most of them are: set/declare variables (and in traditional C you must declare all the variables at the beginning of a block), branch (an if test), loop (for, while, or whatever), or call a function. Contrast this with the 50-100 thousand words in English.

4. Variable names. Point 1 relates to this. Many variable names will be the same. Doesn't everyone use 'i', 'j', or 'k' to go through a loop? Using current and next for pointers in a loop is common. The standard will sometimes define the nature of the data structure you are using. For example, let's say you had a structure called sk_buff. Why wouldn't you call the variable skb, or cur_skb, or something like that?

5. Algorithmic similarities. In kernel development, there is very strong pressure to get the exact best algorithm for any task. This is because the kernel is very central to the efficiency of the computer. In programming applications, efficiency is typically measured by O(f(n)) notation. That is, if something is O(n) it takes time proportional to n. The constants are often ignored, and this can be a good approximation. In kernel programming, it is often known that the best algorithm is O(n) (i.e. time = kn, where k is constant). The kernel programmers strive to use their tricks and insight to minimize k. It may be that there is only one best way to do this and it is easy for any intelligent person to see.
Therefore, I would argue that much of the similarity in some of these programs is algorithmic and is not a copyright violation. It's just that two intelligent people will often think alike.

6. Function names. Many function names may be defined by the standard. Other function names are just based on the implementation. Many developers use getXX() or something similar. The function names are often the best attempt by developers (at least open source ones) to make the code understandable and follow from the algorithm. If the algorithm is similar, often the function names will be too. Also, function names often come from the language of the domain being modeled/programmed. If it's networking, networking terms are used by everyone.

7. File names. Similar argument to function names. They arise from standards, the domain being programmed, and the algorithm.

8. Include file ordering. This may be suspicious, but there can be reasons for it. Sometimes include files have a precedence order. That is, on some compilers if you change the order it won't compile. Maybe limits.h or init.h needs to be first since it defines macros or variables used by other .h files. This may only apply to older compilers, but the code should run on all compilers with minimal adjustment. And this may give rise to some ordering of the include files.

9. Release of program resources. This can also be suspicious, but there are some rules. If many resources need to be released, what I do is release them in LIFO order. That is, the last resource I allocated is the first one released. This is just in case there are dependencies in the way in which the resources were allocated.

10. Order of variable declarations. This one may be a little harder to deal with. But many times, there aren't that many variables declared and there may be a standard way of declaring them (ints before pointers maybe). Usually if a function gets too complicated it is broken down anyway, reducing the number of variables in it.
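To make points 4 and 9 concrete, here is a minimal sketch. Every name in it (struct node, list_push, list_length, list_free, with_two_buffers) is made up for illustration and comes from neither code base. It shows a cur/next list walk and a LIFO release of two buffers, written the way I would expect many C programmers to write them independently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type, invented for this example. */
struct node {
    int value;
    struct node *next;
};

/* Push a value on the front of the list.
 * Returns the new head, or NULL if the allocation fails. */
struct node *list_push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));

    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = head;
    return n;
}

/* Count the nodes with the usual cur-pointer walk. */
size_t list_length(const struct node *head)
{
    const struct node *cur;
    size_t i = 0;

    for (cur = head; cur != NULL; cur = cur->next)
        i++;
    return i;
}

/* Free every node; next has to be saved before cur is freed. */
void list_free(struct node *head)
{
    struct node *cur = head;
    struct node *next;

    while (cur != NULL) {
        next = cur->next;
        free(cur);
        cur = next;
    }
}

/* Allocate two buffers and release them in LIFO order:
 * b was allocated last, so it is freed first.
 * Returns 0 on success, -1 if either allocation fails. */
int with_two_buffers(size_t len)
{
    int ret = -1;
    char *a;
    char *b;

    assert(len > 0);

    a = malloc(len);
    if (a == NULL)
        goto out;
    b = malloc(len);
    if (b == NULL)
        goto free_a;

    ret = 0;    /* the real work would go here */

    free(b);
free_a:
    free(a);
out:
    return ret;
}
```

There is very little freedom in any of these functions. The loop shapes, the cur/next names, and the reverse-order cleanup are close to forced once the data structure is fixed, which is the kind of similarity I am talking about.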
None of these points alone is sufficient to eliminate the accusation of copyright violations. But consider them all together. Then consider what code was compared. There are thousands of C files (about 5000 or so for Linux; I don't know about SCO). They also could have taken the entire CVS vault for both systems and compared all versions against all versions. That's a lot of code in which to expect no similarities.

It's also interesting to note that if my explanation is correct, it would also explain similarities of SCO code to Linux code. You might not want to tell SCO that if they are sued :-)

This post turned out longer than I thought it would be. I just started thinking about factors that could occur by chance that would induce similarities, and the list just kept getting longer. So make your own judgements, and post to the list if you think I made any major errors, or if you agree for that matter.

Later,
Bill Gooding

_______________________________________________
TriLUG mailing list
http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ:
http://www.trilug.org/faq/TriLUG-faq.html
