Re: [discuss] SQL Database version control tool recommendations?

2018-08-29 Thread Byron Smith
No recommendation here, but I just came across this tool for
including/diffing sqlite binaries in git repos, which is clearly relevant.

(It came up in an HN discussion for another related application.)
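
For a small SQLite database, one diff-friendly alternative to committing the binary file is to commit a text dump whose row order is deterministic. The following is only a minimal sketch, not the tool mentioned above: it assumes Python's standard sqlite3 module, and the function name stable_dump and the sample table are hypothetical. Sorting the rows before writing them keeps a change in record order from showing up as a spurious difference between two dumps.

```python
import sqlite3


def stable_dump(conn):
    """Return a text dump of every user table with rows in a deterministic order."""
    lines = []
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        lines.append(f"-- table: {table}")
        rows = conn.execute(f'SELECT * FROM "{table}"').fetchall()
        # Sort on the repr of each row so mixed types and NULLs compare safely.
        for row in sorted(rows, key=repr):
            lines.append(repr(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data, inserted out of order on purpose.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER, label TEXT)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", [(2, "b"), (1, "a")])
    print(stable_dump(conn), end="")
```

Two databases holding the same rows in a different physical order produce identical text with this approach, so a line-based diff only shows real changes. It does not capture the schema, indexes, or triggers; those would need to be dumped separately.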

On Fri, Aug 10, 2018 at 1:28 PM Carl Boettiger via discuss <
discuss@lists.carpentries.org> wrote:

> Hi Tiffany!, list,
>
> I think dumps are reasonable for regular backups, but not a good choice
> for creating more long-term archives; so I guess it depends a bit on the
> goal.  Changes in versions, options, encoding, and database engines can
> make it difficult to import SQL dumps accurately.  I think text-file
> formats (csv) are still the best long-term archive option -- they are easy
> to version and compress and ubiquitous, but far from a perfect option -- in
> particular, a round-trip db -> csv -> db may not preserve data types
> (boolean / int / char etc) accurately. Storing this as 'metadata' can help
> but is somewhat manual. I'm not convinced that we have a good performant,
> compressible, cross-platform, widely established file-based exchange format
> available at this time (cue comments about json, hdf5, or parquet).
>
> A somewhat separate issue is whether such files need a git-like tool to
> manage versions.  IMHO the goal is really to preserve each dump in a way
> that doesn't risk accidental overwriting of a previous version and captures
> some basic metadata (timestamp). A file-naming convention can provide
> that, so git may not be necessary (given both the potentially large size
> of data dumps and the often compelling case to compress these files in a
> binary format).
>
> I'm really no expert in any of this though, so sharing this as much to
> learn where it goes wrong rather than as solid advice!
>
> Cheers,
>
> Carl
>
> On Fri, Aug 10, 2018 at 12:08 PM Bennet Fauber wrote:
>
>> Tiffany,
>>
>> You might experiment with some smallish databases.  The order of
>> records may well change significantly from dump to dump, making the
>> apparent differences and the actual differences between any two dumps
>> appear much larger than they really are.
>>
>> Good luck!
>>
>>
>> On Fri, Aug 10, 2018 at 12:49 PM Tiffany A. Timbers via discuss
>> wrote:
>> >
>> > Thanks all for your input - very helpful! Dav - happy for you to
>> > question the general strategy. As I said, I know very little about this.
>> > In my case it's a smallish, simple SQLite database with ~ 8 tables. So
>> > dumping/transaction logs/etc might work well and easily. But if there's a
>> > better and different strategy for checkpointing SQLite databases, I'd love
>> > to learn.
>> >
>> > Thanks!
>> > Tiffany
>>
>> --
>> The Carpentries: discuss
>> Permalink:
>> https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Mb0ae3c22005b6cfbf5866889
>> Delivery options:
>> https://carpentries.topicbox.com/groups/discuss/subscription
>>
> --
>
> http://carlboettiger.info
>
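
Carl's two points above, keeping column types alongside a CSV export and using a file-naming convention so that an earlier dump is never overwritten, can be sketched as follows. This is an illustration under stated assumptions, not a recommended archive format: it assumes SQLite and Python's standard library, and the function name export_table, the file-name pattern, and the .types.json sidecar file are all hypothetical.

```python
import csv
import json
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def export_table(conn, table, out_dir, now=None):
    """Write one table to a timestamped CSV plus a JSON file of declared column types."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    out_dir = Path(out_dir)
    csv_path = out_dir / f"{table}_{stamp}.csv"
    types_path = out_dir / f"{table}_{stamp}.types.json"

    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    names = [col[1] for col in columns]
    declared = {col[1]: col[2] for col in columns}

    # Mode "x" fails if the file exists, so an earlier dump is never overwritten.
    with open(csv_path, "x", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(names)
        writer.writerows(conn.execute(f'SELECT * FROM "{table}"'))
    with open(types_path, "x", encoding="utf-8") as handle:
        json.dump(declared, handle, indent=2, sort_keys=True)
    return csv_path, types_path
```

The CSV alone would not distinguish a BOOLEAN column from an INTEGER one, which is the round-trip problem described above; the sidecar file records the declared types so a later import can restore them. NULL values are still written as empty strings here, so that distinction would need its own convention.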

--
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-M5d95bbf9e8825d3a09fe7cb8
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription


Re: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like Notebooks"

2018-08-29 Thread Cameron Macdonell
Hi everyone,


This is a great discussion.  Feel free to continue it.


Would someone be interested in writing a blog post to summarize Joel Grus' 
opinions and the discussion we've had here?


Cam

Discuss List Moderator


From: Hao Ye 
Sent: Wednesday, August 29, 2018 7:36:58 AM
To: discuss
Subject: Re: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like 
Notebooks"

I agree with what Simon wrote about hidden state, and I strongly feel that it 
is a lesson/concept that we should emphasize more for workshop attendees, 
especially those that don't have a substantial amount of experience with 
programming.

The notion that there even *is* such a thing as a hidden state will be new to a 
lot of folks - think how many come from experience working with Excel where you 
can see the data and the inputs and outputs of all calculations simultaneously.

In my experience, teaching REPL first conveys the notion of a hidden state 
slightly better, because you hit enter after every command, and typos often 
produce feedback immediately that something went wrong. That's a very different 
mode of operation than typing code into a Jupyter notebook cell, R markdown 
code chunk, or script file, where you can build up the code over time, 
including bugs and errors, and nothing happens until you try and execute.

I think there's good discussion to be had about workflows involving shell, 
IDEs, notebooks, etc., but not all our workshop attendees are at the stage of 
receiving that information in a useful context yet.

Best,
--
Hao Ye
hao...@weecology.org


On Wed, Aug 29, 2018 at 5:06 AM, Waldman, Simon <sm...@hw.ac.uk> wrote:
FWIW, when helping in SWC workshops, I’ve often found students getting confused 
in python notebooks due to hidden state.

The hidden state issues of notebooks are, however, no different to how many of 
us work in IDEs with interpreted languages (RStudio, MATLAB),


On Wed, Aug 29, 2018 at 9:25 AM, Bennet Fauber <ben...@umich.edu> wrote:
Carol,

I don't think anyone is saying, "Tell people not to use notebooks."
The questions are about whether they improve the learning experience
for beginners.  There is also the question of whether use of the GUI
somehow defeats the purpose of the shell lesson by contradicting what
is often said there; namely, the command line is a powerful tool, you
should use it.

One respondent said they review ways to run python -- python, ipython,
jupyter -- then go on to use whatever in their workshop.  That goes
some way toward giving the participants choices.  It may not
counteract the message that is still explicit or implicit in the shell
lesson.

Perhaps the shell lesson should be modified so that the shell is
treated as a data management tool, and notebooks and RStudio are
treated as development environments?  Then the dissonance between
advocating the shell in one lesson and abandoning it in another would
be lessened?

Perhaps all it would take would be a couple of examples of running a
notebook from the command line and telling it to start from scratch
and run all cells.  If a notebook can be run in the same way from a
prompt that a .py file can be, then maybe showing that capability
solves a whole bunch of problems.

The out-of-phase evaluation of things that is possible in notebooks
can also lead to irreproducible results, which is not, I think, in
keeping with the goals of the Carpentries.

Some people will want to keep notebooks, and others will want to
forgo them; there should be a place for both approaches, and the one
that best fits the goal of the particular offering and the expected
audience should be chosen.  I think it would not be a service to
future learners if only one way were available for all circumstances.

Perhaps it would help to consider a full two-day workshop as a bundle,
and pick the lesson components that lead to the most coherent and
clear presentation of the most important points to the targeted
audience?  That would lessen the dissonance between command-line for
shell/git and GUI for R/Python, maybe?  Should there be an option to
do GUI-only workshops, no shell and a GUI for Git?  Similarly, a
command-line only option.  I think that might be worth considering.

On Tue, Aug 28, 2018 at 6:31 PM Carol Willing
<willi...@willingconsulting.com> wrote:
>
> Hi all,
>
> There's positive discussion that has been started by Joel's talk. While I 
> liked his talk and there are some good points re: improving support for 
> software engineering best practices in Jupyter and JupyterLab notebooks, I'm 
> a bit concerned about the direction that this conversation is going.
>
> While all are entitled to their personal opinions and the Carpentries will 
> use notebooks when and if needed, I believe that the Carpentries would be 
> doing its students a disservice by warning people not to use the notebooks or 
> conda.
>
> The notebooks are a popular and effective tool for 

Re: [discuss] Carpentry lessons material in multiple (spoken) languages

2018-08-29 Thread Maxime Boissonneault

My two cents from a bilingual organization in Canada:

From our experience at Compute Canada (we produce most of our written 
content in both French and English), having *separate* content is a 
recipe for one of them to become outdated. What we found works well is 
MediaWiki with the Language Extension bundle. I wrote a 
software-carpentry-style lesson about OpenACC programming using that here:

https://docs.computecanada.ca/wiki/OpenACC_Tutorial

What makes things easier this way is that
1) each page is split into multiple translation units 
automatically by the extensions
2) the extensions provide us with a quick way to spot outdated or new 
translation units, which we can then translate:

https://docs.computecanada.ca/mediawiki/index.php?title=Special%3ALanguageStats=D=fr=1


I don't know that it is possible to do something nearly as convenient 
with just gh-pages...
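
To illustrate the idea of translation units and of spotting outdated ones, here is a minimal sketch. It is not how the MediaWiki extensions work internally; it only assumes that a page can be split into paragraphs and that each translation is stored against a fingerprint of the source paragraph it was made from. The names split_units, fingerprint and outdated_units are hypothetical.

```python
import hashlib


def split_units(text):
    """Split a page into translation units, here simply blank-line-separated paragraphs."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def fingerprint(unit):
    """Short, stable fingerprint of a source unit."""
    return hashlib.sha256(unit.encode("utf-8")).hexdigest()[:12]


def outdated_units(source_text, translated):
    """Return source units that have no translation recorded for their current text.

    `translated` maps the fingerprint of a source unit (as it was when it was
    translated) to the translated text.
    """
    return [u for u in split_units(source_text) if fingerprint(u) not in translated]
```

When a source paragraph is edited or added, its fingerprint no longer matches any stored translation, so it shows up in the list of units that need attention, while untouched paragraphs keep their existing translation.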


Regards,

Maxime

On 2018-08-28 6:30 PM, Rémi Rampin wrote:
2018-08-28 05:11 EDT, "DVD PS":


So, over the last months, I've been working to get something that
eases the work of the translators and provides all the languages
the same visibility - i.e. one page for all!


Hi David,

I wonder if this is the right approach. There is no real benefit in 
having everything be in the same repository, so long as you have links 
pointing to the other languages. In fact, I am not sure if 
git-novice-es/01-basics/ is worse than 
git-novice/es/01-basics/ (or the current 
git-novice/es/_episodes/01-basics/).


I think having separate repos (or branches) for different languages 
would make it much easier to keep the translation up-to-date, since 
you can use Git merges in the usual way to see what has changed in 
the upstream (e.g. the English lesson) and update your translation.


I see that you are not far from this since you are in fact using a 
submodule, however:


  * submodules are error prone (especially in sub-directories), it is
likely that contributors or even maintainers will get this wrong
  * the build process gets more complicated (do contributors to the
translation have to clone the outer project instead of the
translation?)
  * the submodules for each translation will need to be updated in the
outer repo for changes in the translation to appear on the site
  * you cannot have the translation be a branch that you merge from
the English upstream (infinite recursion)

What do you think?

--
Rémi




--
-
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Chair - Calcul Québec Research Support Coordination Committee
Team lead - Research Support National Team, Compute Canada
Software Carpentry instructor
Ph.D. in physics


--
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Tdb042c4bc0ecf365-M0ae9399ed69dc884844c20a8
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription


Re: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like Notebooks"

2018-08-29 Thread Hao Ye
I agree with what Simon wrote about hidden state, and I strongly feel that
it is a lesson/concept that we should emphasize more for workshop
attendees, especially those that don't have a substantial amount of
experience with programming.

The notion that there even *is* such a thing as a hidden state will be new
to a lot of folks - think how many come from experience working with Excel
where you can see the data and the inputs and outputs of all calculations
simultaneously.

In my experience, teaching REPL first conveys the notion of a hidden state
slightly better, because you hit enter after every command, and typos often
produce feedback immediately that something went wrong. That's a very
different mode of operation than typing code into a Jupyter notebook cell,
R markdown code chunk, or script file, where you can build up the code over
time, including bugs and errors, and nothing happens until you try and
execute.
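
To make the hidden-state point concrete, here is a minimal sketch that imitates a notebook kernel with plain Python. It is an illustration only, not how Jupyter is implemented: each "cell" is a string of code, all cells share one namespace, and the hypothetical helper run_cells executes them in whatever order it is given.

```python
def run_cells(cells, order):
    """Execute code strings in the given order in one shared namespace, like a notebook kernel."""
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace


cells = [
    "total = 0",            # cell 0
    "total = total + 10",   # cell 1
    "report = total",       # cell 2
]

# Top to bottom, as a fresh "restart and run all" would do it.
clean = run_cells(cells, [0, 1, 2])

# What can happen interactively: cell 1 is executed twice before cell 2.
interactive = run_cells(cells, [0, 1, 1, 2])

print(clean["report"], interactive["report"])  # prints: 10 20
```

The code on screen is identical in both runs; only the execution history differs, and nothing in the cells themselves shows that history. That is the state a learner coming from a spreadsheet has no reason to expect.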

I think there's good discussion to be had about workflows involving shell,
IDEs, notebooks, etc., but not all our workshop attendees are at the stage
of receiving that information in a useful context yet.

Best,
--
Hao Ye
hao...@weecology.org


On Wed, Aug 29, 2018 at 5:06 AM, Waldman, Simon wrote:

> FWIW, when helping in SWC workshops, I’ve often found students getting
> confused in python notebooks due to hidden state.
>
> The hidden state issues of notebooks are, however, no different to how
> many of us work in IDEs with interpreted languages (RStudio, MATLAB),



On Wed, Aug 29, 2018 at 9:25 AM, Bennet Fauber wrote:

> Carol,
>
> I don't think anyone is saying, "Tell people not to use notebooks."
> The questions are about whether they improve the learning experience
> for beginners.  There is also the question of whether use of the GUI
> somehow defeats the purpose of the shell lesson by contradicting what
> is often said there; namely, the command line is a powerful tool, you
> should use it.
>
> One respondent said they review ways to run python -- python, ipython,
> jupyter -- then go on to use whatever in their workshop.  That goes
> some way toward giving the participants choices.  It may not
> counteract the message that is still explicit or implicit in the shell
> lesson.
>
> Perhaps the shell lesson should be modified so that the shell is
> treated as a data management tool, and notebooks and RStudio are
> treated as development environments?  Then the dissonance between
> advocating the shell in one lesson and abandoning it in another would
> be lessened?
>
> Perhaps all it would take would be a couple of examples of running a
> notebook from the command line and telling it to start from scratch
> and run all cells.  If a notebook can be run in the same way from a
> prompt that a .py file can be, then maybe showing that capability
> solves a whole bunch of problems.
>
> The out-of-phase evaluation of things that is possible in notebooks
> can also lead to irreproducible results, which is not, I think, in
> keeping with the goals of the Carpentries.
>
> Some people will want to keep notebooks, and others will want to
> forgo them; there should be a place for both approaches, and the one
> that best fits the goal of the particular offering and the expected
> audience should be chosen.  I think it would not be a service to
> future learners if only one way were available for all circumstances.
>
> Perhaps it would help to consider a full two-day workshop as a bundle,
> and pick the lesson components that lead to the most coherent and
> clear presentation of the most important points to the targeted
> audience?  That would lessen the dissonance between command-line for
> shell/git and GUI for R/Python, maybe?  Should there be an option to
> do GUI-only workshops, no shell and a GUI for Git?  Similarly, a
> command-line only option.  I think that might be worth considering.
>
> On Tue, Aug 28, 2018 at 6:31 PM Carol Willing wrote:
> >
> > Hi all,
> >
> > There's positive discussion that has been started by Joel's talk. While
> I liked his talk and there are some good points re: improving support for
> software engineering best practices in Jupyter and JupyterLab notebooks,
> I'm a bit concerned about the direction that this conversation is going.
> >
> > While all are entitled to their personal opinions and the Carpentries
> will use notebooks when and if needed, I believe that the Carpentries would
> be doing its students a disservice by warning people not to use the
> notebooks or conda.
> >
> > The notebooks are a popular and effective tool for scientists and data
> scientists to have in their toolbox. Project Jupyter won the ACM Software
> System Award recently, and the ACM stated "These tools, which include
> IPython, the Jupyter Notebook and JupyterHub, have become a de facto
> standard for data analysis in research, education, journalism and
> industry." https://awards.acm.org/software-system
> >
> > While it's great for folks to have different personal perspectives, 

[discuss] Re: [Carpentries en Latinoamerica] Carpentry lessons material in multiple (spoken) languages

2018-08-29 Thread Rayna Harris via discuss
Wow, David! The ability to toggle back and forth between the English and
Spanish versions of a lesson by clicking on the little globe icon at the
top and selecting the language from a drop-down menu is amazing!

Will this tool that you built also be able to support multiple languages in
the future?

On Tue, Aug 28, 2018 at 4:11 AM, DVD PS wrote:

> Hello everyone!!
>
> TLDR; see the English/Spanish git-lesson:
> https://swcarpentry-i18n.github.io/git-novice/
> - There are some identified problems, but if you find a new one, please
> create an issue.
>
> The translation of the lessons is something where I think we can have
> much more impact than what we have already achieved. Last year some people
> from the Latin American community made a great effort and fully translated a
> few lessons into Spanish.
> However, the way translations are maintained is not optimal, and at the
> same time they don't get the same visibility as the English lessons.
>
> So, over the last months, I've been working to get something that eases
> the work of the translators and provides all the languages the same
> visibility - i.e. one page for all!
>
> I have managed to do that as follows:
>
> 1.- Create a Jekyll theme for the Carpentries. Instead of making every
> lesson merge with upstream (and solve conflicts), we can use a
> repository for the theme and keep the lessons clean with only the material.
> Discussed in styles#229, which details some advantages.
>
> 2.- Replace the English text in that theme with variables/tokens, and include
> them as assets for each translated language. There's a PR to the
> themed style (carpentry-theme#4).
>
> 3.- Clean the git-lesson of all that's not needed, and create a
> submodule with the automatically generated material from the translation
> tool. The original lesson material gets converted into a po file using a tool
> I've adapted for that purpose. Such a file can be used
> with translation tools (e.g., poedit, weblate) and the result gets
> converted into markdown. The idea is that this process gets done
> automatically in Travis once the translations have been accepted by the
> translation team. I'll keep working on the automated process during the
> following weeks and show it once it's completed.
>
> An example of the final result is in
> https://swcarpentry-i18n.github.io/git-novice/ - There are still a few issues,
> and there are more little things I keep finding that need to be improved.
> But I wanted to let you all know that this is almost there, and by using
> themes I think it will not only help the translators but also the maintenance of
> the normal lessons. If you want to give me a hand, don't hesitate for a
> second! I will really appreciate it.
>
> Cheers,
> David
>
> --
> You received this message because you are subscribed to the "Latinoamerica"
> group in Google Groups.
> To unsubscribe from this group and stop receiving its messages,
> send an email to latinoamerica+unsubscr...@carpentries.org.
> To post to this group, send an email to
> latinoamer...@carpentries.org.
> To view this conversation on the web, visit
> https://groups.google.com/a/carpentries.org/d/msgid/latinoamerica/CAD0QkqXHee-_NYtg6m-3noUyja60vAe2N0L2%3DFz5C1BQiMskXQ%40mail.gmail.com
>

--
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Tdb042c4bc0ecf365-Me6ad9a2c50c543503d3f2d89
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription


Re: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like Notebooks"

2018-08-29 Thread Bennet Fauber
Carol,

I don't think anyone is saying, "Tell people not to use notebooks."
The questions are about whether they improve the learning experience
for beginners.  There is also the question of whether use of the GUI
somehow defeats the purpose of the shell lesson by contradicting what
is often said there; namely, the command line is a powerful tool, you
should use it.

One respondent said they review ways to run python -- python, ipython,
jupyter -- then go on to use whatever in their workshop.  That goes
some way toward giving the participants choices.  It may not
counteract the message that is still explicit or implicit in the shell
lesson.

Perhaps the shell lesson should be modified so that the shell is
treated as a data management tool, and notebooks and RStudio are
treated as development environments?  Then the dissonance between
advocating the shell in one lesson and abandoning it in another would
be lessened?

Perhaps all it would take would be a couple of examples of running a
notebook from the command line and telling it to start from scratch
and run all cells.  If a notebook can be run in the same way from a
prompt that a .py file can be, then maybe showing that capability
solves a whole bunch of problems.
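
As a rough sketch of what "start from scratch and run all cells" means, the following reads a notebook file and executes its code cells top to bottom in a fresh namespace. It assumes the version 4 notebook JSON layout (a top-level "cells" list whose entries have "cell_type" and "source"), uses only Python's standard library, and the function name run_notebook is hypothetical. It is not how Jupyter executes notebooks: there is no kernel, outputs are not captured, and cells that use IPython magics would fail.

```python
import json


def run_notebook(path):
    """Run every code cell of an .ipynb file top to bottom in a fresh namespace."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    namespace = {"__name__": "__main__"}
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook JSON stores source either as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        exec(compile(source, str(path), "exec"), namespace)
    return namespace
```

In practice the same effect is available from a prompt with Jupyter's own tooling, for example `jupyter nbconvert --to notebook --execute analysis.ipynb`, where analysis.ipynb is a placeholder file name. Either way, a notebook that only works when its cells are run in some other order will fail, which is the reproducibility check being asked for.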

The out-of-phase evaluation of things that is possible in notebooks
can also lead to irreproducible results, which is not, I think, in
keeping with the goals of the Carpentries.

Some people will want to keep notebooks, and others will want to
forgo them; there should be a place for both approaches, and the one
that best fits the goal of the particular offering and the expected
audience should be chosen.  I think it would not be a service to
future learners if only one way were available for all circumstances.

Perhaps it would help to consider a full two-day workshop as a bundle,
and pick the lesson components that lead to the most coherent and
clear presentation of the most important points to the targeted
audience?  That would lessen the dissonance between command-line for
shell/git and GUI for R/Python, maybe?  Should there be an option to
do GUI-only workshops, no shell and a GUI for Git?  Similarly, a
command-line only option.  I think that might be worth considering.

On Tue, Aug 28, 2018 at 6:31 PM Carol Willing wrote:
>
> Hi all,
>
> There's positive discussion that has been started by Joel's talk. While I 
> liked his talk and there are some good points re: improving support for 
> software engineering best practices in Jupyter and JupyterLab notebooks, I'm 
> a bit concerned about the direction that this conversation is going.
>
> While all are entitled to their personal opinions and the Carpentries will 
> use notebooks when and if needed, I believe that the Carpentries would be 
> doing its students a disservice by warning people not to use the notebooks or 
> conda.
>
> The notebooks are a popular and effective tool for scientists and data 
> scientists to have in their toolbox. Project Jupyter won the ACM Software 
> System Award recently, and the ACM stated "These tools, which include 
> IPython, the Jupyter Notebook and JupyterHub, have become a de facto standard 
> for data analysis in research, education, journalism and industry." 
> https://awards.acm.org/software-system
>
> While it's great for folks to have different personal perspectives, I want to 
> make sure that the Carpentries and its lessons do not recommend that the 
> Jupyter Notebooks, IPython, and JupyterHub should be avoided by scientists 
> and data scientists.
>
> Thanks,
>
> Carol Willing
>
>
> > On 28 Aug 2018, at 11:38, Maxime Boissonneault wrote:
> >
> > These kinds of things are rather hard to track over time, because everything
> > is a moving target (conda and other package managers constantly get
> > updated, and package versions change too), but here are a few more
> > details:
> >
> > - The 10x performance difference was with a user's code, which I
> > unfortunately can't share (nor do I still have a copy of it). It was about
> > numpy, which may or may not have changed since MKL can now be shipped with
> > Anaconda.
> >
> > - FFTW, 2x performance gain: these slides compare the FFTW provided by Conda
> > (and by other package managers) with one built on an avx2 cluster;
> > the performance gain is 2x (see slides 28 and 29):
> > https://archive.fosdem.org/2018/schedule/event/installing_software_for_scientists/attachments/slides/2437/export/events/attachments/installing_software_for_scientists/slides/2437/20180204_installing_software_for_scientists.pdf
> >
> >
> > - TensorFlow, 7x gain for the CPU version, slide 28 of this talk:
> > https://archive.fosdem.org/2018/schedule/event/how_to_make_package_managers_cry/attachments/slides/2297/export/events/attachments/how_to_make_package_managers_cry/slides/2297/how_to_make_package_managers_cry.pdf
> >
> >   This one was not comparing Conda itself, but manylinux python wheels 
> > provided by the Tensorflow team, 

Re: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like Notebooks"

2018-08-29 Thread JACKSON Michael
Hi folks,

Variants on this theme regularly recur over the years: whether the tools used 
in SWC are a means to an end or the end in themselves, and whether the focus 
should be on a suite of tools used from the command-line, or not.

Maybe from this, instead of a one-size-fits-all-so-no-one-is-happy lesson, the 
Python lesson could be forked into two variants:

* One on Jupyter notebooks, as a way to wean researchers off of Excel and into 
a more programmatic way of doing things.
* One on good programming practice, using Python command-line and text editor. 
This lesson would be in the spirit of moving researchers onto command-line 
based tools.

Hosts could then decide which might be best for their audiences for a specific 
workshop.

cheers,
mike


From: Maxime Boissonneault 
Sent: 29 August 2018 13:54
To: discuss; Carol Willing
Cc: Titus Brown
Subject: Re: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like 
Notebooks"

Hi Carol,
I don't think this is where the subthread about Conda is heading.
Jupyter notebooks are orthogonal to Anaconda. You can definitely have
Jupyter without Conda. From a teaching perspective, both Conda and
Jupyter notebooks do a fine job. But just as it would be beneficial to
warn users about notebook caveats (hidden states and such), it would
also be good to do the same for conda caveats (performance).

Cheers,

Maxime




On 2018-08-28 6:29 PM, Carol Willing wrote:
> Hi all,
>
> There's positive discussion that has been started by Joel's talk. While I 
> liked his talk and there are some good points re: improving support for 
> software engineering best practices in Jupyter and JupyterLab notebooks, I'm 
> a bit concerned about the direction that this conversation is going.
>
> While all are entitled to their personal opinions and the Carpentries will 
> use notebooks when and if needed, I believe that the Carpentries would be 
> doing its students a disservice by warning people not to use the notebooks or 
> conda.
>
> The notebooks are a popular and effective tool for scientists and data 
> scientists to have in their toolbox. Project Jupyter won the ACM Software 
> System Award recently, and the ACM stated "These tools, which include 
> IPython, the Jupyter Notebook and JupyterHub, have become a de facto standard 
> for data analysis in research, education, journalism and industry." 
> https://awards.acm.org/software-system
>
> While it's great for folks to have different personal perspectives, I want to 
> make sure that the Carpentries and its lessons do not recommend that the 
> Jupyter Notebooks, IPython, and JupyterHub should be avoided by scientists 
> and data scientists.
>
> Thanks,
>
> Carol Willing
>
>
>> On 28 Aug 2018, at 11:38, Maxime Boissonneault wrote:
>>
>> These kinds of things are rather hard to track over time, because everything
>> is a moving target (conda and other package managers constantly get updated,
>> and package versions change too), but here are a few more details:
>>
>> - The 10x performance difference was with a user's code, which I unfortunately
>> can't share (nor do I still have a copy of it). It was about numpy, which
>> may or may not have changed since MKL can now be shipped with Anaconda.
>>
>> - FFTW, 2x performance gain: these slides compare the FFTW provided by Conda
>> (and by other package managers) with one built on an avx2 cluster;
>> the performance gain is 2x (see slides 28 and 29):
>> https://archive.fosdem.org/2018/schedule/event/installing_software_for_scientists/attachments/slides/2437/export/events/attachments/installing_software_for_scientists/slides/2437/20180204_installing_software_for_scientists.pdf
>>
>>
>> - TensorFlow, 7x gain for the CPU version, slide 28 of this talk:
>> https://archive.fosdem.org/2018/schedule/event/how_to_make_package_managers_cry/attachments/slides/2297/export/events/attachments/how_to_make_package_managers_cry/slides/2297/how_to_make_package_managers_cry.pdf
>>
>>   This one was not comparing Conda itself, but manylinux python wheels
>> provided by the TensorFlow team, but no doubt Conda has the same issue if
>> they build for generic architectures.
>>
>>
>>
>> Basically, any package that is compiled in a portable manner, such as what 
>> Conda and manylinux wheels do, will have some degree of speedup if compiled 
>> for the target architecture instead. This is typically achieved by the team 
>> of analysts who manage a cluster.
>>
>> Cheers,
>>
>> Maxime
>>
>>
>> On 2018-08-28 2:20 PM, Ashwin Srinath wrote:
>>> I'm very interested to see these examples? We use and advocate the use
>>> of conda environments and I'm happy to be convinced otherwise.
>>>
>>> Thanks,
>>> Ashwin
>>>
>>> On Tue, Aug 28, 2018 at 2:17 PM, Maxime Boissonneault
>>>  wrote:
 Regarding performance, we have example of code using Anaconda-provided
 packages that run 10 times slower than the same code using locally built
 

Re: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like Notebooks"

2018-08-29 Thread April Wright via discuss
We keep an introduction to the notebook in the 00 lesson of the Data
Carpentry Python materials (Spyder is also covered, courtesy of Katrin Tirol).
There’s also a more comprehensive intro to notebooks, contributed by Maxim
Belkin, in the extras for that repo. I know other workshops (like SWC
Python) do sometimes use those two resources to get up and running in the
more novice-oriented workshops. Always happy for use and input!

Regardless of what toolset is being used, I agree with Dave that learners
need to be taught how to use it effectively.



On Wed, Aug 29, 2018 at 7:57 AM David Nicholson via discuss <
discuss@lists.carpentries.org> wrote:

> +1 for what Juan said.
>
> I think most of the cognitive load of notebooks can be addressed by giving
> people a crash course in Jupyter, and by narrating what you do, just like
> SWC suggests that instructors narrate what they do at the command line or
> in a REPL, e.g. "so I'm going to type print parentheses hello close
> parentheses in this cell and then execute it by hitting control enter", etc.
>
> I've seen Jupyter-heavy tutorials for example at SciPy that give these
> sorts of quickie intros to notebooks.
> I can't find an example but here's something similar I've done:
> https://github.com/NickleDave/EWIN-coding-bootcamp/blob/master/Python/bootcamp%20day%201%20%2B%20Python%20preliminaries.ipynb
> Seems like a good opportunity to explain that the most common use cases
> are presenting results/methods, teaching, and scratch coding, **not**
> writing production code / large code bases.
> Maybe that will help prevent people getting the wrong impression (and then
> giving a talk about it)   
>
> David Nicholson, Ph.D.
> nickledave.github.io
> https://github.com/NickleDave
> Prinz lab, Emory
> University, Atlanta, GA, USA
>
> On Wed, Aug 29, 2018 at 8:54 AM, Maxime Boissonneault <
> maxime.boissonnea...@calculquebec.ca> wrote:
>
>> Hi Carol,
>> I don't think this is where the subthread about Conda is heading. Jupyter
>> notebooks are orthogonal to Anaconda. You can definitely have Jupyter without
>> Conda. From a teaching perspective, both Conda and Jupyter notebooks do a
>> fine job. But just as it would be beneficial to warn users about notebook
>> caveats (hidden states and such), it would also be good to do the same for
>> conda caveats (performance).
>>
>> Cheers,
>>
>> Maxime
>>
>>
>>
>>
>>
>> On 2018-08-28 6:29 PM, Carol Willing wrote:
>>
>>> Hi all,
>>>
>>> There's positive discussion that has been started by Joel's talk. While
>>> I liked his talk and there are some good points re: improving support for
>>> software engineering best practices in Jupyter and JupyterLab notebooks,
>>> I'm a bit concerned about the direction that this conversation is going.
>>>
>>> While all are entitled to their personal opinions and the Carpentries
>>> will use notebooks when and if needed, I believe that the Carpentries would
>>> be doing its students a disservice by warning people not to use the
>>> notebooks or conda.
>>>
>>> The notebooks are a popular and effective tool for scientists and data
>>> scientists to have in their toolbox. Project Jupyter won the ACM Software
>>> System Award recently, and the ACM stated "These tools, which include
>>> IPython, the Jupyter Notebook and JupyterHub, have become a de facto
>>> standard for data analysis in research, education, journalism and
>>> industry." https://awards.acm.org/software-system
>>>
>>> While it's great for folks to have different personal perspectives, I
>>> want to make sure that the Carpentries and its lessons do not recommend
>>> that the Jupyter Notebooks, IPython, and JupyterHub should be avoided by
>>> scientists and data scientists.
>>>
>>> Thanks,
>>>
>>> Carol Willing
>>>
>>>
>>> On 28 Aug 2018, at 11:38, Maxime Boissonneault <
>>>> maxime.boissonnea...@calculquebec.ca> wrote:
>>>>
>>>> These kinds of things are rather hard to track in time, because
>>>> everything is a moving target (conda and other package managers constantly
>>>> get updated, but also version of packages changes), but here is a bit more
>>>> details :
>>>>
>>>> - The 10x performance difference was with a user code, which I
>>>> unfortunately can't share (nor do I still have a copy of it). It was about
>>>> numpy, which may or may not have changed since MKL can now be shipped with
>>>> Anaconda.
>>>>
>>>> - FFTW, 2x performance gain : These slides compare between
>>>> Conda-provided (and those provided by other package managers) FFTW, and one
>>>> which was built on an avx2 cluster, the performance gain is 2x (see slides
>>>> 28 and 29 :
>>>>
>>>> https://archive.fosdem.org/2018/schedule/event/installing_software_for_scientists/attachments/slides/2437/export/events/attachments/installing_software_for_scientists/slides/2437/20180204_installing_software_for_scientists.pdf
>>>>
>>>>
>>>> - Tensorflow, 7x gain for CPU version, slide 28 of this talk :
 

Re: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like Notebooks"

2018-08-29 Thread David Nicholson via discuss
 +1 for what Juan said.

I think most of the cognitive load of notebooks can be addressed by giving
people a crash course in Jupyter, and by narrating what you do, just like
SWC suggests that instructors narrate what they do at the command line or
in a REPL, e.g. "so I'm going to type print parentheses hello close
parentheses in this cell and then execute it by hitting control enter", etc.

I've seen Jupyter-heavy tutorials for example at SciPy that give these
sorts of quickie intros to notebooks.
I can't find an example but here's something similar I've done:
https://github.com/NickleDave/EWIN-coding-bootcamp/blob/master/Python/bootcamp%20day%201%20%2B%20Python%20preliminaries.ipynb
Seems like a good opportunity to explain that the most common use cases are
presenting results/methods, teaching, and scratch coding, **not** writing
production code / large code bases.
Maybe that will help prevent people getting the wrong impression (and then
giving a talk about it)   

David Nicholson, Ph.D.
nickledave.github.io
https://github.com/NickleDave
Prinz lab, Emory University,
Atlanta, GA, USA

On Wed, Aug 29, 2018 at 8:54 AM, Maxime Boissonneault <
maxime.boissonnea...@calculquebec.ca> wrote:

> Hi Carol,
> I don't think this is where the subthread about Conda is heading. Jupyter
> notebooks are orthogonal to Anaconda. You can definitely have Jupyter without
> Conda. From a teaching perspective, both Conda and Jupyter notebooks do a
> fine job. But just as it would be beneficial to warn users about notebook
> caveats (hidden states and such), it would also be good to do the same for
> conda caveats (performance).
>
> Cheers,
>
> Maxime
>
>
>
>
>
> On 2018-08-28 6:29 PM, Carol Willing wrote:
>
>> Hi all,
>>
>> There's positive discussion that has been started by Joel's talk. While I
>> liked his talk and there are some good points re: improving support for
>> software engineering best practices in Jupyter and JupyterLab notebooks,
>> I'm a bit concerned about the direction that this conversation is going.
>>
>> While all are entitled to their personal opinions and the Carpentries
>> will use notebooks when and if needed, I believe that the Carpentries would
>> be doing its students a disservice by warning people not to use the
>> notebooks or conda.
>>
>> The notebooks are a popular and effective tool for scientists and data
>> scientists to have in their toolbox. Project Jupyter won the ACM Software
>> System Award recently, and the ACM stated "These tools, which include
>> IPython, the Jupyter Notebook and JupyterHub, have become a de facto
>> standard for data analysis in research, education, journalism and
>> industry." https://awards.acm.org/software-system
>>
>> While it's great for folks to have different personal perspectives, I
>> want to make sure that the Carpentries and its lessons do not recommend
>> that the Jupyter Notebooks, IPython, and JupyterHub should be avoided by
>> scientists and data scientists.
>>
>> Thanks,
>>
>> Carol Willing
>>
>>
>> On 28 Aug 2018, at 11:38, Maxime Boissonneault <
>>> maxime.boissonnea...@calculquebec.ca> wrote:
>>>
>>> These kinds of things are rather hard to track in time, because
>>> everything is a moving target (conda and other package managers constantly
>>> get updated, but also version of packages changes), but here is a bit more
>>> details :
>>>
>>> - The 10x performance difference was with a user code, which I
>>> unfortunately can't share (nor do I still have a copy of it). It was about
>>> numpy, which may or may not have changed since MKL can now be shipped with
>>> Anaconda.
>>>
>>> - FFTW, 2x performance gain : These slides compare between
>>> Conda-provided (and those provided by other package managers) FFTW, and one
>>> which was built on an avx2 cluster, the performance gain is 2x (see slides
>>> 28 and 29 :
>>> https://archive.fosdem.org/2018/schedule/event/installing_software_for_scientists/attachments/slides/2437/export/events/attachments/installing_software_for_scientists/slides/2437/20180204_installing_software_for_scientists.pdf
>>>
>>>
>>> - Tensorflow, 7x gain for CPU version, slide 28 of this talk :
>>> https://archive.fosdem.org/2018/schedule/event/how_to_make_package_managers_cry/attachments/slides/2297/export/events/attachments/how_to_make_package_managers_cry/slides/2297/how_to_make_package_managers_cry.pdf
>>>
>>> This one was not comparing Conda itself, but manylinux python wheels
>>> provided by the Tensorflow team, but no doubt Conda has the same issue if
>>> they build for generic architectures.
>>>
>>>
>>>
>>> Basically, any package that is compiled in a portable manner, such as
>>> what Conda and manylinux wheels do, will have some degree of speedup if
>>> compiled for the target architecture instead. This is typically achieved by
>>> the team of analysts who manage a cluster.
>>>
>>> Cheers,
>>>
>>> Maxime
>>>
>>>
>>> On 2018-08-28 2:20 PM, Ashwin 

Re: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like Notebooks"

2018-08-29 Thread Maxime Boissonneault

Hi Carol,
I don't think this is where the subthread about Conda is heading. 
Jupyter notebooks are orthogonal to Anaconda. You can definitely have 
Jupyter without Conda. From a teaching perspective, both Conda and 
Jupyter notebooks do a fine job. But just as it would be beneficial to 
warn users about notebook caveats (hidden states and such), it would 
also be good to do the same for conda caveats (performance).


Cheers,

Maxime




On 2018-08-28 6:29 PM, Carol Willing wrote:

Hi all,

There's positive discussion that has been started by Joel's talk. While I liked 
his talk and there are some good points re: improving support for software 
engineering best practices in Jupyter and JupyterLab notebooks, I'm a bit 
concerned about the direction that this conversation is going.

While all are entitled to their personal opinions and the Carpentries will use 
notebooks when and if needed, I believe that the Carpentries would be doing its 
students a disservice by warning people not to use the notebooks or conda.

The notebooks are a popular and effective tool for scientists and data scientists to have 
in their toolbox. Project Jupyter won the ACM Software System Award recently, and the ACM 
stated "These tools, which include IPython, the Jupyter Notebook and JupyterHub, 
have become a de facto standard for data analysis in research, education, journalism and 
industry." https://awards.acm.org/software-system

While it's great for folks to have different personal perspectives, I want to 
make sure that the Carpentries and its lessons do not recommend that the 
Jupyter Notebooks, IPython, and JupyterHub should be avoided by scientists and 
data scientists.

Thanks,

Carol Willing



On 28 Aug 2018, at 11:38, Maxime Boissonneault wrote:

These kinds of things are rather hard to track over time, because everything is a 
moving target (conda and other package managers are constantly updated, and 
package versions change too), but here are a few more details:

- The 10x performance difference was with a user's code, which I unfortunately 
can't share (nor do I still have a copy of it). It involved numpy, and the result 
may or may not have changed now that MKL can be shipped with Anaconda.

- FFTW, 2x performance gain: these slides compare the FFTW provided by Conda 
(and by other package managers) with one built on an avx2 cluster. The 
performance gain is 2x; see slides 28 and 29:
https://archive.fosdem.org/2018/schedule/event/installing_software_for_scientists/attachments/slides/2437/export/events/attachments/installing_software_for_scientists/slides/2437/20180204_installing_software_for_scientists.pdf


- Tensorflow, 7x gain for CPU version, slide 28 of this talk : 
https://archive.fosdem.org/2018/schedule/event/how_to_make_package_managers_cry/attachments/slides/2297/export/events/attachments/how_to_make_package_managers_cry/slides/2297/how_to_make_package_managers_cry.pdf

   This one was not comparing Conda itself, but manylinux python wheels 
provided by the Tensorflow team, but no doubt Conda has the same issue if they 
build for generic architectures.



Basically, any package that is compiled in a portable manner, such as what 
Conda and manylinux wheels do, will have some degree of speedup if compiled for 
the target architecture instead. This is typically achieved by the team of 
analysts who manage a cluster.
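
One way to check a claim like this on a given machine is to run the same small benchmark in both environments (for example a conda environment and the cluster-provided Python) and compare the timings. The following is only a minimal sketch, not the benchmark behind the numbers quoted above: the helper name best_time, the 300x300 matrix size and the pure-Python fallback are illustrative assumptions. np.show_config() is NumPy's own call for listing the BLAS/LAPACK libraries a build links against.

```python
import time


def best_time(func, repeats=5):
    """Return the fastest wall-clock time (in seconds) of `repeats` calls to func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


try:
    import numpy as np
except ImportError:
    np = None  # NumPy is optional in this sketch; fall back to pure Python.

if np is not None:
    # Lists the BLAS/LAPACK libraries this NumPy build is linked against.
    np.show_config()
    rng = np.random.default_rng(0)
    a = rng.random((300, 300))  # illustrative size, not from the thread

    def workload():
        return a @ a
else:
    def workload():
        return sum(i * i for i in range(100_000))

elapsed = best_time(workload)
print(f"best of 5 runs: {elapsed:.6f} s")
```

Running the identical script under each installation and comparing the printed times gives a rough, machine-specific comparison; a single matrix product is of course not representative of a real user code.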

Cheers,

Maxime


On 2018-08-28 2:20 PM, Ashwin Srinath wrote:

I'm very interested to see these examples? We use and advocate the use
of conda environments and I'm happy to be convinced otherwise.

Thanks,
Ashwin

On Tue, Aug 28, 2018 at 2:17 PM, Maxime Boissonneault wrote:

Regarding performance, we have an example of code using Anaconda-provided
packages that runs 10 times slower than the same code using locally built
packages, optimized for the cluster architectures. That's not *a bit*
slower, that's a lot slower.

Regarding "cheating on your partner", that analogy is not mine, but the
point he is trying to make is that Anaconda basically replaces any
cluster-provided versions, which HPC center people are working hard to optimize.
Recent versions of Anaconda are even worse, by packaging things like
compilers and linkers, creating conflicts with cluster-provided system
libraries and tools, and creating a lot of debugging problems for users and
support people alike.

Regards,

Maxime


On 2018-08-28 12:48 PM, Rémi Rampin wrote:

2018-08-28 12:27 EDT, Maxime Boissonneault:

As a side-discussion, I think we should also be wary of using Anaconda,
and tell users not to use it in a cluster environment. For reasons, see
here :
https://twitter.com/mboisso/status/1034476890353020928

Hi Maxime,

All I see in this thread is that "it's like cheating on your partner" (!!!)
and it's "generically optimized software" that might be a bit slower than
locally-built libs (interesting concern when using Python, an interpreted
scripting language (and on the slow side too)).

Could you 

Re: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like Notebooks"

2018-08-29 Thread Juan Nunez-Iglesias via discuss
On Wed, Aug 29, 2018, at 7:06 PM, Waldman, Simon wrote:
> Use notebooks, but take care to explain that python != the notebook,
> similarly to explaining that git != github.
I usually do a quick demo of several ways of running Python: Python
REPL, IPython REPL, python my_script.py, notebook. I use the analogy
that a Python program is like a videotape (yes I'm dating myself with
the young'uns), the various ways of running it are like various tape
players, each with different features etc, and the Python language is
like the VHS format, as opposed to Betamax or others.
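
A demo along these lines could use a file as small as the sketch below. The file name my_script.py comes from the message above; the greet function and its argument are made up for illustration.

```python
# my_script.py: the "videotape". The players differ, the program does not.

def greet(name):
    """Return a greeting; behaves the same however the code is run."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # Runs for `python my_script.py`, and also when this code is typed or
    # pasted directly into a REPL or notebook cell. It does not run when
    # the file is imported with `from my_script import greet`.
    print(greet("world"))
```

From the Python or IPython REPL, or from a notebook cell, the same function is available with `from my_script import greet`, assuming the file is in the current working directory.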
I haven't systematically examined the results but anecdotally, I get
lots of nods, and people are then comfortable to pick their favourite
system. For example, I have seen people copying and pasting code from
notebooks to an IPython prompt because they like that model better.
Regarding the rest of the chatter on this thread, I think Chris Holdgraf
said it best[1] on Twitter: "I think it's useful to think of the
notebook as a communication tool that can be used for coding, rather
than the other way around". And, in a conversation[2] with Gaël
Varoquaux and Tal Yarkoni: "To your point about 'talking to your
manager', I think this can be generalized to 'talking to people who are
not developing code with you', which is probably the majority of people
in data science. I don't use notebooks for software development, but I
use them for most communications."
In short: I was nodding (and laughing) in agreement with all of Joel's
slides, but the talk misunderstands the purpose of notebooks.
Juan.

Links:

  1. https://twitter.com/choldgraf/status/1033972634730594304
  2. https://twitter.com/choldgraf/status/105522230497281

--
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/T1505f74d7f6e32f8-M4fae7a8e02914a2f57f8eba0
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription


RE: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like Notebooks"

2018-08-29 Thread Waldman, Simon
FWIW, when helping in SWC workshops, I’ve often found students getting confused 
in python notebooks due to hidden state.

The hidden state issues of notebooks are, however, no different to how many of 
us work in IDEs with interpreted languages (RStudio, MATLAB), where we run bits 
of code at a time while experimenting or debugging. It’s a useful approach. My 
suggestion is that we should aim to,

  1.  Use notebooks, but take care to explain that python != the notebook, 
similarly to explaining that git != github.
  2.  Make sure people understand how they work – that you’re running a bit of 
code at a time, that editing something that you’ve already run doesn’t change 
what’s in memory, and so forth. It’s a gotcha, but what we teach is full of 
gotchas, and instructors and helpers will need to take a bit of time with those 
who don’t immediately see what’s going on.
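
For helpers who want something concrete to point at, the gotcha in item 2 can be imitated in plain Python. This is only a rough sketch that assumes a notebook kernel behaves like one shared namespace in which each cell is executed; run_cell, threshold and limit are hypothetical names, and a real Jupyter kernel does considerably more than this.

```python
# Simulate a notebook kernel: every "cell" runs in one shared namespace.
kernel_namespace = {}


def run_cell(source):
    """Execute one cell's source in the shared namespace, as a kernel would."""
    exec(source, kernel_namespace)


# The learner runs two cells in order.
run_cell("threshold = 10")          # cell 1
run_cell("limit = threshold * 2")   # cell 2

# The learner edits cell 1 on screen to read `threshold = 50` and re-runs
# only that cell. Cell 2 is not re-run, so `limit` keeps its old value.
run_cell("threshold = 50")
stale_limit = kernel_namespace["limit"]
print("after editing cell 1:", kernel_namespace["threshold"], stale_limit)

# "Restart and run all": clear memory and run the cells as they now appear.
kernel_namespace.clear()
for cell_source in ["threshold = 50", "limit = threshold * 2"]:
    run_cell(cell_source)
fresh_limit = kernel_namespace["limit"]
print("after restart and run all:", kernel_namespace["threshold"], fresh_limit)
```

The two printed lines disagree about `limit` even though the text of the cells is identical in both cases, which is the situation learners tend to get stuck in.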

From: April Wright via discuss 
Sent: 29 August 2018 00:43
To: discuss@lists.carpentries.org
Subject: Re: [discuss] Slide of Joel Grus' JupyterCon Talk "I Don't Like 
Notebooks"

Hi all-

I agree with what Christina said. Someone upthread asked if the notebook was 
meant to compete with MATLAB. But with novices, our competition isn't MATLAB - 
it's Excel. You can open Excel, subset data, and plot it. Most of the learners 
I work with have experience doing that. They know those little moments of 
wonder and excitement of plotting data for the first time, and having it tell a 
cool story. My job is to convince low programming knowledge/awareness audiences 
that reproducible computational analyses aren't here to steal the joy from 
working with data, but to enable deeper and more exciting ways to interact with 
data. Jupyter Notebooks, as Christina & Adam noted, are great for that. The 
output looks nice, and provides immediate visual feedback. The interface is 
much less abstract, and is more familiar to learners I work with.

Qualitatively, I definitely notice that the conversations students have in 
workshops/class are very different when teaching with the notebook than 
without. No matter what I did teaching with a text editor and interpreter, for 
novices, switching always seems like too much. The pace at which the 
interpreter fills up, copy + paste when it works, copy + paste when you have 
typos - all that stuff has always seemed to be a little too much for someone 
who is just opening the interpreter for the first time. But when the notebook 
is used, the content rather than the content delivery seems to be where the 
discussion goes. You have to structure your lessons to promote discussion, but 
there's no technology that can remove the burden on the instructor to use it 
well.

Lastly, I don't know another technology that is doing as much for accessibility 
as Jupyter. All my undergraduates work more than 20 hours weekly. Some are 
renting computers from the school, and need to renew those rentals, and might 
not get the same computer after renewal. If there's a serious hurricane on the 
coast, my reservist students can get called up on deployment. It's hard to 
express the value of things like JupyterHub and Binder for in-browser click 
execution for this population. Maybe there's an in-browser click execute 
terminal emulator apart from Jupyter, and I don't know about it. But it strikes 
me that if we're serious about meeting students where they are, then we're 
serious about this particular technology.

I was pretty skeptical about notebooks for a long time, but I'm basically all 
in now for novice training.

--a


On Tue, Aug 28, 2018 at 10:59 PM Christina Koch via discuss 
<discuss@lists.carpentries.org> wrote:
Hi all,

I was envisioning using a text editor for teaching Python, and keep coming back 
to the idea that I (and my learners) want to be creating a record in a file of 
some kind (script or notebook) but we also want to be able to run bits of that 
file, not the whole thing at once (as it will grow over the course of the 
lesson).  I'd shy away from a simple editor + command line combination for an 
entire lesson, as I'd end up creating a lot of noise as I keep re-running the 
script. For R, developing a script in Rstudio allows you to run pieces at a 
time.  Is Spyder a Python equivalent that would allow me to add to my ("notes") 
script without executing the whole thing as I add pieces to it?
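
For what it's worth, Spyder does treat a comment line starting with `# %%` as the start of a code cell, and each cell can be run on its own from the editor. A "notes" script for a lesson might then look like the sketch below; the temperature values and variable names are invented for illustration.

```python
# %% Load some data (in Spyder, a "# %%" comment starts a new cell)
temperatures = [21.5, 19.0, 23.2, 22.1]

# %% Summarise it; this cell can be re-run on its own
mean_temperature = sum(temperatures) / len(temperatures)
print(f"mean: {mean_temperature:.2f}")

# %% Added later in the lesson, without re-running the cells above
warm_days = [t for t in temperatures if t > mean_temperature]
print(f"days above the mean: {len(warm_days)}")
```

The file stays an ordinary Python script, so it still runs top to bottom with a plain `python` command, and the cell markers are just comments to any other tool.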

I'll second Adam's comment about "prettiness" -- esp. if you're doing anything 
with tables, I think the notebook interface is a lot less jarring, especially 
to novice programmers.

Christina

On Tue, Aug 28, 2018 at 11:28 AM Brian Stucky 
<stuc...@flmnh.ufl.edu> wrote:
I agree both with Joel's broader criticisms of notebooks and Kevin's 
SWC-specific comments.  As with Kevin, I have mostly been keeping this to 
myself, so I am happy to see this discussion.  Regarding SWC specifically, I 
have also thought it odd that the early parts of a workshop spend considerable 
effort trying to convince learners of the