Re: [Discuss] question about r-inflammation lesson 1

2016-12-15 Thread Tyler Smith
Hi Lukas,

I stand corrected!

I have had issues with inconsistent (among functions) type coercion
before. Some of these issues have been resolved over time, and I assumed
this was another case of that. However, with some trivial testing, I
find that's not the case. I found the following situation on R 3.3.2:


- `min()` and `max()` call primitive (i.e., C) code, and work as
  expected on data frames (and data frame rows, which are actually
  data frames)
- `rowMeans()` explicitly converts data frames with `as.matrix()`, and
  so works as expected
- `sd()` explicitly converts data frames to numeric with `as.double()`,
  and works as expected on data frame rows
- `mean()` does *not* do any coercion, and returns `NA` with a warning
  on data frames (and rows)


That means the message in the lesson is basically sound: some R
functions treat data frame rows as vectors and some don't, and
there's no a priori way to know which is which, or why!
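
A minimal sketch of this kind of check, for anyone who wants to repeat it. The data frame `dat` here is a made-up, all-numeric stand-in (not the lesson's inflammation file), and the sketch assumes plain base R data frames rather than tibbles; behaviour on other R versions may differ.

```r
# Hypothetical all-numeric data frame standing in for the inflammation data.
dat <- data.frame(day1 = c(1, 2), day2 = c(3, 4), day3 = c(5, 9))

# Slicing a row keeps the data frame class; slicing a column gives a vector.
patient_1 <- dat[1, ]
is.data.frame(patient_1)
is.numeric(dat[, 1])

# These functions accept the one-row data frame directly.
row_max  <- max(patient_1)
row_min  <- min(patient_1)
row_mean <- unname(rowMeans(patient_1))
row_sd   <- sd(patient_1)

# mean() does no coercion: it warns and returns NA.
bad_mean <- suppressWarnings(mean(patient_1))

# The explicit conversion from the lesson callout makes mean() work.
good_mean <- mean(as.numeric(patient_1))
```

Comparing `bad_mean` with `good_mean` shows the one case in the list where the `as.numeric()` fix is actually needed.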


With that in mind, I'll think about ways to improve the original callout
to clarify this, if I can.


Best,

Tyler

--
plantarum.ca

On Thu, Dec 15, 2016, at 07:59 AM, Lukas Weber wrote:
> Hi Tyler,
>
> Thanks for your comment. I added this passage in a pull request about
> a year ago, after we had some problems at a workshop.
>
> I don't remember all the details, but we definitely had problems on
> multiple machines. I think it may have been Windows computers only. We
> were using the current version of R at the time.
>
> There are some more details in this pull request (closed):
> https://github.com/swcarpentry/r-novice-inflammation/pull/177
>
> We included this passage simply to provide an easy fix (convert using
> "as.numeric()") for anyone else who has the same problem. I agree it's
> best not to introduce any unnecessary concepts too early -- hence we
> put it in a box and tried to keep it as simple and short as possible;
> while still including it directly in the course materials in case
> other instructors have the same problem. I remember it took us a few
> minutes to find a solution during the workshop, since it wasn't
> immediately clear what was causing the problem.
>
> I tried the example again just now on my Mac, and it worked fine,
> without the fix. As you point out, the sliced row of the data frame
> should actually be automatically coerced when you use max(). Sliced
> columns are already numeric vectors, so no coercion is required there.
>
> Re-working the whole lesson to remove this edge case would be
> difficult, since we would like to keep it consistent with the Python
> materials, especially using the same inflammation data set. Maybe
> someone else also has some views here?
>
> Best regards,
> Lukas
>
> On Wed, Dec 14, 2016 at 4:09 AM, Tyler Smith
> <ty...@plantarum.ca> wrote:
>> Hi,
>>
>> I've been working through lesson one in the r-inflammation lesson. It
>> includes the following passage:
>>
>> > ## Forcing Conversion
>> >
>> > The code above may give you an error in some R installations,
>> > since R does not automatically convert a sliced row of a
>> > `data.frame` to a vector.
>> > (Confusingly, sliced columns are automatically converted.)
>> > If this happens, you can use the `as.numeric` command to convert
>> > the row of data to a numeric vector:
>> >
>> > `patient_1 <- as.numeric(dat[1, ])`
>>
>> The example data is entirely numeric, with no missing values, and no
>> non-numeric columns. In that case, type coercion should work as you
>> expect. If it doesn't, I would be very surprised if the results depend
>> on a particular R *installation*. It may be the case that older R
>> *versions* did different things. But I'm not sure about that. Can
>> someone confirm which R versions require the explicit conversion of
>> data to numeric in this example?
>>
>> coercion in R does have some ugly corner cases. If this is in fact one
>> of them, I think it would be a good idea to rework the example so that
>> it doesn't require this kind of fix.
>>
>> Incidentally, columns always work because a column by definition is
>> composed of a single vector (which therefore has a single type). Rows
>> can include data from different columns, and thus may have different
>> types that need to be coerced into the lowest common denominator
>> before we can use them. This isn't really confusing when you
>> understand how a dataframe is constructed, but it's perhaps an issue
>> that we don't need to throw at students in lesson 1.

[Discuss] question about r-inflammation lesson 1

2016-12-13 Thread Tyler Smith
Hi,

I've been working through lesson one in the r-inflammation lesson.  It
includes the following passage:

> ## Forcing Conversion
>
> The code above may give you an error in some R installations,
> since R does not automatically convert a sliced row of a `data.frame`
> to a vector.
> (Confusingly, sliced columns are automatically converted.)
> If this happens, you can use the `as.numeric` command to convert the
> row of data to a numeric vector:
>
> `patient_1 <- as.numeric(dat[1, ])`

The example data is entirely numeric, with no missing values, and no
non-numeric columns. In that case, type coercion should work as you
expect. If it doesn't, I would be very surprised if the results depend
on a particular R *installation*. It may be the case that older R
*versions* did different things.  But I'm not sure about that. Can
someone confirm which R versions require the explicit conversion of data
to numeric in this example?

Coercion in R does have some ugly corner cases. If this is in fact one
of them, I think it would be a good idea to rework the example so that
it doesn't require this kind of fix.

Incidentally, columns always work because a column by definition is
composed of a single vector (which therefore has a single type). Rows
can include data from different columns, and thus may have different
types that need to be coerced to a single common type (the most general
one present, e.g. character if any column is character) before we can
use them. This isn't really confusing when you understand how a data
frame is constructed, but it's perhaps an issue that we don't need to
throw at students in lesson 1.
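
To make the column/row difference concrete, here is a small sketch. The data frames `mixed` and `nums` are invented for illustration (they are not part of the lesson), and `unlist()` is just one way to flatten a row; it assumes base R data frames.

```r
# Hypothetical data frame with one integer and one character column.
mixed <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)

# A column is a single vector, so it has exactly one type.
col_type <- typeof(mixed[, "id"])

# A row spans both columns; flattening it forces one common type.
row_vec  <- unlist(mixed[1, ])
row_type <- typeof(row_vec)

# With all-numeric columns, as in the lesson data, the common type
# stays numeric, so nothing is lost by the conversion.
nums <- data.frame(x = c(1.5, 2), y = c(3, 4))
num_row_type <- typeof(unlist(nums[1, ]))
```

Here the column keeps its integer type, while the flattened row of `mixed` ends up as character because that is the only type able to hold both values.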

Best,

Tyler

-- 
plantarum.ca
___
Discuss mailing list
Discuss@lists.software-carpentry.org
http://lists.software-carpentry.org/listinfo/discuss

[Discuss] Submitting a pull request to SWC lessons: which files to push

2016-12-13 Thread Tyler Smith
Hi folks,

I have finally put together a small pull request to complete my
instructor training. I'm trying to keep it small and discrete - I've
added a handful of svg diagrams, and changed the source of a single Rmd
file. However, in rebuilding the lesson, all the associated png files
are regenerated, and the md files derived from the Rmd sources are
rebuilt as well.

None of these files are excluded by the project gitignore, so my changes
to a single file have resulted in modifications to nearly 40 files in
the repository. Should I be pushing all of that to my pull request, or
should I be adding png and md to gitignore? What's the best way to send
you guys my contribution?

Best,

Tyler

-- 
plantarum.ca

Re: [Discuss] pulling along those behind

2015-10-29 Thread Tyler Smith
On Thu, Oct 29, 2015, at 02:28 PM, Karin Lagesen wrote:
> 
> ...is there some statistics on this? I think that errors would be a lot 
> less scary if we could show how much of their time even those that code 
> for a living spend on debugging their code.
> 

Poking around on programmers.stackexchange, 50% shows up as a common
estimate of how much 'programming' effort is actually directed at
debugging. That actually seems low to me: I can generate bugs way faster
than I can find them!

http://programmers.stackexchange.com/a/91764/60644



[Discuss] December Instructor training

2015-10-27 Thread Tyler Smith
Hi,

I'm very interested in attending the December instructor training, but
don't have a group to apply with. I will actually be in Toronto on
December 9 and 10, so it would be easiest to join a group there if
anyone has space for one more. If not, I could try and arrange my travel
such that I could join an Ottawa group if there is one. I can help or
coordinate the application, and will be able to host at least one
workshop next spring here in Ottawa.

Please let me know if anyone is interested in joining up!

Best,

Tyler

-- 
plantarum.ca
