Definitely no offence taken, felt or inferred in any way.

It's a subject that draws some interesting viewpoints which are worth re-visiting every now and again. It's often worth putting up an argument for others to knock down and see what alternatives there are. Learning from that sort of thing is often enlightening - somebody else's viewpoint or idea strikes a chord and a problem that might not even have been seen before is resolved.

Regards,
Alastair.

----- Original Message ----- From: "Wills, Steve" <[EMAIL PROTECTED]>
To: "RBASE-L Mailing List" <[email protected]>
Sent: Wednesday, April 02, 2008 9:13 PM
Subject: [RBASE-L] - Re: How do I make NULL truly NULL? - A Word of Caution With NULLs in Research & Analysis ...


Actually Alastair, it was something you said that touched on the concept of Missing Data Values.

Way back when I first started my database in R:Base one of _my_ first rules
was don't have any nulls under any circumstances whatsoever!
...
N/A for text, -1, 0 or 9999 for integers has served me well over all these
years.

It seems that you have developed your own domain of Missing Data Values, even if you don't explicitly call them that. I'm sure there are others, but your (good) thinking is part of what prompted me to wax so verbose yesterday ... but let no person, including myself, try to blame you for my long message. ;-)

It's just that when I saw that you had already arrived at something that I never really considered until I worked in a university research group, I thought I would attempt to elaborate on what I believe you were sharing with Rachael.

Hope you didn't take anything I wrote as being any sort of correction for you.

Also, I agree with you about the presence of too many NULLs extant in the data being a potential indicator to re-visit the design/data model, but I'm not gonna' go into that at all as I think yesterday's missive was enuf' from me for now ... :-)

Sincerely,
Steve in Memphis


-----Original Message-----
From: [email protected] [mailto:[EMAIL PROTECTED] On Behalf Of Alastair Burr
Sent: Wednesday, April 02, 2008 1:33 PM
To: RBASE-L Mailing List
Subject: [RBASE-L] - Re: How do I make NULL truly NULL? - A Word of Caution With NULLs in Research & Analysis ...

I'm not sure which particular phrase of mine caught your eye, Steve, but
just in case there was any misunderstanding:

I was NOT saying that I use anything other than -0- to represent a null.

What I was saying is that I always have something to represent a positive
entry.
In a text field that will usually be 'N/A' or '(unknown)'.
In a number (integer/numeric/double) it will be a number that cannot be part
of the expected results.
Unknown time is 0:00:00 - my time values are timings (length) rather than
clock time.
Unknown date is 01/01/1900 - dates haven't reached that far back.

With, I think, only one exception I never have to change my null setting
from -0- to anything else because there is always data.

Instead of a where clause "where xyz is NOT NULL" I would use "where xyz <
9999" or "where xyz <> 'N/A' "

Why no nulls? Because I've seen too many errors if something can be left
"empty". Because I've had people say in the days when computers weren't as
reliable as now "the machine's lost the data" or "what idiot didn't fill
that in?"

Yes, R:Base handles nulls just fine and always has - I'd just rather not
have any if it can be avoided.

As a very quick example I have a menu that is used in various places:

Excellent (returns an integer value of 5)
Good (returns an integer value of 4)
Average (returns an integer value of 3)
Low (returns an integer value of 2)
Poor (returns an integer value of 1)

The column will not accept nulls but will accept those integer values along
with:
(Unknown) (returns an integer value of 9)
which does not appear on the displayed menu/combobox

I choose to enter dummy rows and then edit the default data however my forms
will not allow a change _to_ a value of 9 for (Unknown) - the value is 9 to
start with but it has to be updated on row edit.
A report would show the text values including '(Unknown)' if there were any.

As ever, it works for me but that doesn't mean that I'm right or it's the
only way.
Not-null or, indeed, more than zero regards,
Alastair.


----- Original Message ----- From: "Wills, Steve" <[EMAIL PROTECTED]>
To: "RBASE-L Mailing List" <[email protected]>
Sent: Tuesday, April 01, 2008 5:20 PM
Subject: [RBASE-L] - Re: How do I make NULL truly NULL? - A Word of Caution
With NULLs in Research & Analysis ...


********************************************
*** Don't Get Burned By NULL-Conclusions ***
********************************************
Now, most of y'all probably already know something about this - hopefully,
gained by observation of the bitter experience of others - but, given a
couple of phrases I saw in one each of Rachael's and Alastair's messages, I
would like to caution that one must be EXTREMELY cautious about NULL's
versus 0's [zeroes] in the area of research and/or analysis.  In simple
terms, a NULL value should (almost?) ALWAYS be excluded from any sort of
numeric calculation or the results might cost someone their job, literally
... well, unless their superiors and/or clients never bother to check the
data.

RBase, TTBOMK, has always handled NULL-related calculations appropriately by
default.  In a GROUP BY, for example, NULLs will be GROUPed, which is
appropriate.  However, if NULLs are present in a column on which a
calculation is being performed, such as COUNT(), SUM(), AVG(), etc, they are
EXCLUDED from the calculation.

(I think I even recall from way back in RB System V[?] two variants of
either SELECT ... and/or COMPUTE ..., one of which included NULLs and one of
which did not.  The utility, at the time, seems to have been related to
knowing how many records were actually in the table versus how many of them
had values.  At this point in time, my memory may fail me on this, but I'm
sure that someone else can elaborate further, if they feel it helpful and/or
have too much time on their hands ... but I digress.)

*******************************************
*** NULL <> AnyThing [including 0/Zero] ***
*******************************************
Simon and Garfunkel sang, "I'd rather be a hammer than a nail."  Well, had
they been db folk, they might well have said, "I'd rather be a zero than a
NULL."  Again, most especially from a research/analytical perspective, NULL
and 0 [zero] are two DISTINCTLY DIFFERENT things.

Let's say that you are a marine biologist/environmental engineer and you are
collecting samples in sample tubes of the floodwaters after Hurricane
Katrina (a good friend of mine was the Principal Investigator leading this
research effort).  The purpose was to assess the water for the presence of
chemicals, their amounts, toxicity levels, etc.  Now, let's say that 1000
samples of 100ml each were collected.  Then, let's say that 100 of the
sample tubes could not be properly analyzed for various, albeit valid,
reasons and let's also say that 200 samples showed absolutely no presence of
some dangerous chemical, which we'll call "X".

Those 100 un-assayable samples are, of course, NULLs in the data.  Samples
were collected, but they produced no data.  As such, they cannot be included
in any calculations or conclusions about the nature of the post-Katrina
floodwaters as they tell us nothing about it.  Conversely, the 200 samples
indicating an absence of chemical "X" are NOT NULLs.  What they tell us is
that there is a PRESENCE of 0/zero quantity of chemical "X" in these
samples.  We don't need to discuss a geographical grid-array for relative
differences, water flow, potential sources, or anything more sophisticated
to illustrate from these samples that NULL and 0/zero are different.  The
100 samples that told us nothing remain "nothings", i.e. NULLs in the data.
Those 200 samples which told us that there was 0 chemical "X" will have a 0
in the rows for observations of chemical "X" for each of those samples.

This should seem a pretty basic understanding.  However, distinguishing
between what is a NULL and what is a 0/zero can be a bit more challenging as
the difference may be subtle, but certainly discrete.

So, let's also use an example from sports, say baseball.  Let's think about
pitching records.  The last Pittsburgh Pirate - your home team, RBTI - to
garner the Cy Young Award was Doug Drabek, in 1990.  That season, he pitched
22 Victories and 6 Losses.  (We will avoid ERA, although it's a great metric
for pitchers, as it fails to illustrate my point.)  Let's calculate a simple
winning percentage of W%=((Games_Won/(Games_Won + Games_Lost)) * 100).
Drabek's W% for 1990 was 78.57=((22/(22 + 6)) * 100).

However, his team played 162 regular games that season.

It would probably be unfair and would certainly be inaccurate to factor into
the calculation of his W% those games in which he did not pitch.  Still, he
was on the roster for each of those games.  However, were we to unload
TeamRoster_Game data for all 162 games, an un-normalizedby but not atypical
extract, and, in so doing, treated Drabek's "NULL-appearances" as 0/Zeroes.
Without some degree of expertise, both in the subject matter and the nature
of the extract data, his W% could appear as "whacky" as a wild-pitch from a
knuckleballer.  You see, it all depends on what is desired of the data and
how the data is handled or treated.


******************************************************
*** MISSING DATA VALUES, "NULLs, with personality" ***
******************************************************
Now, if NULLs are conceptually troublesome, one may and should, according to
good research practices, create a set of "Missing Data Values" to use rather
than just a NULL.  In fact, this topic appears to be an area of great
interest among researchers/statisticians, including how to impute those
values ... but that's WAY BEYOND my expertise.

(Razzak, as I recall, you do||did possess that background, so feel free to
chime in, in the case that I haven't managed yet to induce an extreme
soporific state among the audience.)

If you're still awake here, Missing Data Values are, typically, ultimately
cast||stored as some sort of numeric value, just like valid data collected
from samples.  They also have some sort of descriptive text label associated
with them.  These then, even though they are NOT observational or sample
data, become valid data points, i.e. part of the solution domain, rather
than unexplained NULLs.

Possible MISSING_DATA_VALUE records:

SubjectMatterArea  MissingDataValue  ValueLabel
-----------------  ----------------  
----------------------------------------------
Biology                         -11  Sample Contaminated
Biology                         -33  Sample Lost - Trap Failed
Biology                         -34  Sample Lost - Trap Thawed
Biology                         -35  Sample Lost - Trap Missing
Biology                        -999  Instrument Failure
Survey Research                 -11  Respondent Refused To Answer
(Telephone)
Survey Research                 -12  Respondent Failed To Answer (Paper)
Survey Research                 -22  Respondent Does Not Know
Survey Research                 -33  Response Not Intelligible
(Writing/Recording)
Survey Research                -888  Did Not Ask - Respondent Broke Off
Interview
Survey Research                -999  Did Not Ask - Programmatic SKIP


Obviously, these are not NULLs, although they do represent what could have
been NULLs.  Of course, within any research topic, they cannot intersect
with any "Existing Values" in the solution domain.

So, if we're looking at a population count in a trap that is an integer
ranging from 0..N, we would have to use some negative integer value for any
Missing Data Values we define.  Then, we can exclude these Missing Data
Values in our calculations by :

 "SELECT ... FROM DATA_POINTS WHERE DataValue >= 0 ...".

Of course, if the domain of valid data values includes negative numbers, we
must devise other values for Missing Data and, if this is the case, probably
utilize the ValueTypeCode :

 "SELECT ... FROM DATA_POINTS WHERE ValueTypeCode <> 0 ..."
 -- Assume that ValueTypeCode is a column in DATA_POINTS and 0 means
'MissingDataValue'


All this effort is really to help the researcher and/or research
administrator, primarily in two general areas.  One is to help the
researcher do a little "CYA", as NULLs, especially a lot of NULLs - I know,
this fundamentally illogical, as a bunch of nothin' is still a bunch of
nothin' - among researchers, can automatically engender doubt about the
research among peers/reviewers, whether it's about process or outcome.  The
other benefit, in the area of research management, is to aid in devising
improvements in the process, such as sample improvement, better data
collection, improved instrumentation, etc.

Hopefully, if you made it this far, the potential impact of mishandling
NULLs is apparent.  If not, try to think about some examples in your data
and outputs that might be adversely impacted from a lack of awareness about
NULLs and how to treat them.  Despite the best efforts at modeling a
database structure, predicated and integrated with good business (data)
anlaysis, NULLs can and do exist in the real world and can therefore exist
in the virtual world.  However, they can also be meaningfully "morphed" by
employing Missing Data Values, as the appropriate for the business or
subject matter context.

This can be a very important aspect of research (data) and, you might have
guessed, something with which I've had some degree of experience, vis-à-vis
the data-side.  I have seen NULLs treated as 0's/zeroes and, amazingly,
0's/zeroes treated as NULLs.  There are - fortunately, not often -
researchers who are sometimes careless in their understanding and/or
treatment of their own data.  Believe me, research-support level dba better
have his/her ducks in a row when he/she makes the decision to point out such
an error, even when the objective is to protect from disrepute the research
organization, its work-product, and even that erring researcher.

I will add that, IMNSHO, this was another good deed that did not go
unpunished ...

Well, let us calculate the value of my usual $0.02 by dividing it with the
sum of my word-count ... well, thanks for listening and I hope this somehow
helped someone.


NULL-ified,
Steve in Memphis






From: [email protected] [mailto:[EMAIL PROTECTED] On Behalf Of Alastair
Burr
Sent: Monday, March 31, 2008 6:55 PM
To: RBASE-L Mailing List
Subject: [RBASE-L] - Re: How do I make NULL truly NULL?

Way back when I first started my database in R:Base one of _my_ first rules
was don't have any nulls under any circumstances whatsoever!

It's been a good rule but, of course, I don't keep to it 100%

N/A for text, -1, 0 or 9999 for integers has served me well over all these
years. In my experience, if you find you need a lot of nulls then you
probably need a separate table. It's easy to convert your favourite
representation into a meaningful comment in forms and reports: (IFEQ(Field,
9999, '(Unknown)', (CTXT(Field)) )).

A bit more thinking might be needed for exporting but it's easy enough to
get R:base to do what you want or to convert in the receiving program -
although R:Base is (usually) easier.

To really stick my head above the parapet I have never understood how
anything can be null - if a value is not known then show that it is not
known. To not know whether it is known or not known seems to me to be a
failure -but take note of my second paragraph above!

Regards,
Alastair.



----- Original Message ----- From: Dennis McGrath
To: RBASE-L Mailing List
Sent: Monday, March 31, 2008 7:10 PM
Subject: [RBASE-L] - Re: How do I make NULL truly NULL?

Matt,

Have you tried exporting as ASCII when NULL = ' ' (a space)
Try it and look at the generated file.
There will be no spaces between the commas for a NULL value.
This is exactly what you need.

Here are my rules for the use of the NULL setting:
ALWAYS have NULL='-0-' EXCEPT:
When exporting data as ASCII, CSV, etc.
When editing with a form
When printing a report

If you do not have NULL='-0-' and you edit tables manually, you will never
get a null entered in text fields.
Try as you might, the value will always be a space.

The exact opposite is true in forms. If null='-0-' not only will your forms
look ugly, but you won't know the difference between NULL and '-0-'

When Unloading data and structure from RBASE, ALWAYS set NULL='-0-'

RBASE has worked like this as long as I can remember.

Dennis McGrath

________________________________________
From: [email protected] [mailto:[EMAIL PROTECTED] On Behalf Of Matthew
Burke
Sent: Monday, March 31, 2008 12:46 PM
To: RBASE-L Mailing List
Subject: [RBASE-L] - Re: How do I make NULL truly NULL?

      Well here's my problem with how it seems to work. I understand the
need to visually separate null from a blank space within R:Base. So sure,
when I open a table within R:Base show me some kind of value, -0- will do
just fine. However when I export data from my table to say... and basic csv
text file. That data should not be represented by a -0-, but nothing at all,
in my opinion. That way if I want to use the data in another database, or as
in this case a third party wants to use it in another database, they can do
so without the other database having to understand that -0- = null. I
thought about setting NULL = ' ' but even that wouldn't work, because even
then the third party database would still have to know that null = ' '. I'm
certainly no database guru and I really appreciate all the help I receive
from you guys and gals. My only other database experience lies with MySQL
and it's always been my understanding that null = nothing at all.

      So I guess ultimately I don't really need to know why R:Base uses
text representation for null, but I need to know if there is a solution I
can use to make null = nothing at all or empty. Basically if 'empty' was or
possibly is a valid command, the logic I want to accomplish is R> set null
empty. Is this possible?

To further explain what I want, here's an example.

Let's say we have a table with one row containing three fields of data; a,
b, and c.
Field b is null in this case. Something like this.

a
b
c
1
text
-0-
text

As my R:Base is setup now, if I were to export this data to a text file
(csv) I get:
"text","-0- ","text"

The end result text file (csv) I want to achieve is:
"text","","text"

Thanks again for all your help with my novice problems/questions.

Matthew Burke
Pioneer Bank - IT Department
Computer Systems Technician
505-627-4415

This email contains PRIVILEGED AND CONFIDENTIAL information intended only
for the use of the addressee(s) named above. If you are not the intended
recipient of this email, or the employee or agent responsible for delivering
it to the intended recipient, you are hereby notified that any dissemination
or copying of this email is strictly prohibited. If you have received this
email in error, please notify us by replying to this email and delete this
email from your records. Thank you for your cooperation.

________________________________________
No virus found in this incoming message.
Checked by AVG.
Version: 7.5.519 / Virus Database: 269.22.1/1350 - Release Date: 30/03/2008
12:32




--
No virus found in this incoming message.
Checked by AVG.
Version: 7.5.519 / Virus Database: 269.22.3/1354 - Release Date: 01/04/2008
05:38




--
No virus found in this incoming message.
Checked by AVG.
Version: 7.5.519 / Virus Database: 269.22.4/1355 - Release Date: 01/04/2008 17:37


Reply via email to