Re: [Wikidata-l] Accelerating software innovation with Wikidata and improved Wikicode

Michael Hale Mon, 08 Jul 2013 12:31:37 -0700

Just a quick add-on to Jane and Paul about the scope of data in Wikidata. I 
think it is inevitable that Wikidata will start holding excess data that isn't 
being used in Wikipedia. Take the climate boxes on many city pages, which 
show things like the average high and low per month for the last 5 years. 
If we make a Lua module and template that generates those tables each time a 
new statement is added to specific properties in Wikidata then over time 
Wikidata will have a lot of historical weather data that isn't currently being 
displayed in Wikipedia. I think that's a good thing. No one deletes stuff these 
days. They just let the databases grow because storage is so cheap.
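
A minimal sketch of that idea, in Python rather than Lua so it can be run 
outside MediaWiki. The record layout (year, month, high, low) and the function 
name are assumptions made for illustration, not Wikidata's actual data model. 
It only shows how the displayed table can be derived from a growing store 
where older records are kept but not shown.

```python
from collections import defaultdict
from statistics import mean


def climate_table(observations, last_n_years=5):
    """Average high/low per month over the most recent `last_n_years`.

    `observations` is a hypothetical list of (year, month, high, low)
    tuples. Older records stay in the list; they are just not displayed.
    """
    if not observations:
        return {}
    newest = max(year for year, _, _, _ in observations)
    cutoff = newest - last_n_years + 1
    highs, lows = defaultdict(list), defaultdict(list)
    for year, month, high, low in observations:
        if year >= cutoff:
            highs[month].append(high)
            lows[month].append(low)
    # One (average high, average low) pair per month, in calendar order.
    return {m: (mean(highs[m]), mean(lows[m])) for m in sorted(highs)}
```

Each new statement just extends the list, and the table is recomputed from 
whatever falls inside the window.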

From: [email protected]
To: [email protected]
Date: Mon, 8 Jul 2013 15:13:21 -0400
Subject: Re: [Wikidata-l] Accelerating software innovation with Wikidata and 
improved Wikicode

There are lots of code snippets scattered around the internet, but most of them 
can't be wired together in a simple flowchart manner. If you look at object 
libraries that are designed specifically for that purpose, like Modelica, you 
can do all sorts of neat engineering tasks like simulate the thermodynamics and 
power usage of a new refrigerator design. Then if your company is designing a 
new insulation material you would make a new "block" with the experimentally 
determined properties of your material to include in the programmatic flowchart 
to quickly calibrate other aspects of the refrigerator's design. To my 
understanding, Modelica is as big and good as it gets for code libraries that 
represent physically accurate objects. Often, the visual representation of 
those objects needs to be handled separately. As far as general purpose, 
standard programming libraries go, Mathematica is the best one I've found for 
quickly prototyping new functionality. A typical "web mashup" app or site will 
combine functionality and/or data from 3 to 6 APIs. Mobile apps will typically 
use the phone's functionality, an extra library for better graphics support, a 
proprietary library or two made by the company, and a couple of web APIs. A 
similar story for desktop media-editing programs, business software, and 
high-end games except the libraries are often larger. But there aren't many 
software libraries that I would describe as huge. And there are even fewer that 
manage to scale the usefulness of the library equally with the size it occupies 
on disk.
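
To make the "block" idea concrete, here is a rough sketch in plain Python. It 
is not Modelica and does not use any Modelica API. The class names, the 
steady-state conduction formula Q = k * A * dT / d, and the idea of a 
compressor described only by a coefficient of performance are simplifying 
assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class InsulationPanel:
    """A 'block' holding measured material properties (hypothetical values)."""
    conductivity_w_per_m_k: float
    thickness_m: float
    area_m2: float

    def heat_leak_w(self, t_outside_c, t_inside_c):
        # Steady-state conduction through a flat slab: Q = k * A * dT / d
        delta_t = t_outside_c - t_inside_c
        return self.conductivity_w_per_m_k * self.area_m2 * delta_t / self.thickness_m


@dataclass
class Compressor:
    """A 'block' that turns a heat load into an electrical power draw."""
    cop: float  # coefficient of performance: heat removed per unit of work

    def electrical_power_w(self, heat_load_w):
        return heat_load_w / self.cop


def refrigerator_power_w(panel, compressor, t_outside_c, t_inside_c):
    # Wire the blocks together: the panel's heat leak feeds the compressor.
    return compressor.electrical_power_w(panel.heat_leak_w(t_outside_c, t_inside_c))
```

Swapping in a new insulation material then means constructing a different 
InsulationPanel and leaving the rest of the pipeline untouched.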

Platform fragmentation (increase in number and popularity of smart phones and 
tablets) has proven to be a tremendous challenge for continuing to improve 
libraries. I now just have 15 different ways to draw a circle on different 
screens. The attempts to provide virtual machines with write-once run-anywhere 
functionality (Java and .NET) have failed, often due to customer lock-in 
reasons as much as platform fragmentation. Flash isn't designed to grow much 
beyond its current scope. The web standards can only progress as quickly as the 
least common denominator of functionality provided by other means, which is 
better than nothing I suppose. Mathematica has continued to improve its 
library (that's essentially what Wolfram sells), but it doesn't try to cover a 
lot of platforms. It also isn't open source and doesn't attempt to make the 
entire encyclopedia interactive and programmable. Open source attempts like the Boost 
C++ library don't seem to grow very quickly. But I think using Wikipedia 
articles as a scaffold for a massive open source, object-oriented library might 
be what is needed.

I have a few approaches I use to decide what code to write next. They can be 
arranged from most useful as an exercise to stay sharp in the long term to most 
immediately useful for a specific project. Sometimes I just write code in a 
vacuum. Like, I will just choose a simple task like making a 2D ball bounce 
around some stairs interactively and I will just spend a few hours writing it 
and rewriting it to be more efficient and easier to expand. It always gives me 
a greater appreciation for the types of details that can be specified to a 
computer (and hence the scope of the computational universe, or space of all 
computer programs). Like with the ball bouncing example you can get lost 
defining interesting options for the ball and the ground or in the geometry 
logic for calculating the intersections (like if the ball doesn't deform or if 
the stairs have certain constraints on their shape there are optimizations you 
can make). At the end of the exercise I still just have a ball bouncing down 
some stairs, but my mind feels like it has been on a journey. Sometimes I try 
to write code that I think a group of people would find useful. I will browse 
the articles in the "Areas of computer science" category by popularity and 
start writing the first things I see that aren't already in the libraries I 
use. So I'll expand Mathematica's FindClusters function to support 
density-based methods or I'll expand the RandomSample function to support files 
that are too large to fit in memory with a reservoir sampling algorithm. 
Finally, I write 
code for specific projects. I'm trying to genetically engineer turf grass that 
doesn't need to be cut, so I need to automate some of the work I do for GenBank 
imports and sequence comparisons. For all of those, if there was an organized 
place to put my code afterwards so it would fit into a larger useful library I 
would totally be willing to do a little bit of gluing work to help fit it all 
together.
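
For the reservoir sampling case mentioned above, a minimal sketch in Python 
(not Mathematica, and not the code I actually wrote) looks like this. It uses 
the classic "Algorithm R"; the function name and the optional rng parameter 
are my own choices for illustration.

```python
import random


def reservoir_sample(items, k, rng=random):
    """Pick k items uniformly at random from an iterable of unknown length.

    Only k items are held in memory at any time, so `items` can be a file
    object or any other stream that is too large to load at once.
    """
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Passing an open file object as items samples k lines in a single pass. If the 
stream has fewer than k items, all of them are returned.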

> Date: Mon, 8 Jul 2013 19:13:54 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: [Wikidata-l] Accelerating software innovation with Wikidata and 
> improved Wikicode
> 
> I am all for a "dictionary of code snippets", but as with all
> dictionaries, you need a way to group them, either by alphabetical
> order or "birth date". It sounds like you have an idea how to group
> those code samples, so why don't you share it? I would love to build
> my own "pipeline" from a series of algorithms that someone else
> published for me to reuse. I am also for more sharing of datacentric
> programs, but where would the data be stored? Wikidata is for data
> that can be used by Wikipedia, not by other projects, though maybe
> someday we will find the need to put actual weather measurements in
> Wikidata for some oddball Wikisource project to do with the history of
> global warming or something like that.
> 
> I just don't quite see how your idea would translate in the
> Wiki(p/m)edia world into a project that could be indexed.
> 
> But then I never felt the need for "high-fidelity simulations of
> virtual worlds" either.
> 
> 2013/7/6, Michael Hale <[email protected]>:
> > I have been pondering this for some time, and I would like some feedback. I
> > figure there are many programmers on this list, but I think others might
> > find it interesting as well.
> > Are you satisfied with our progress in increasing software sophistication as
> > compared to, say, increasing the size of datacenters? Personally, I think
> > there is still too much "reinventing the wheel" going on, and the best way
> > to get to software that is complex enough to do things like high-fidelity
> > simulations of virtual worlds is to essentially crowd-source the translation
> > of Wikipedia into code. The existing structure of the Wikipedia articles
> > would serve as a scaffold for a large, consistently designed, open-source
> > software library. Then, whether I was making software for weather prediction
> > and I needed code to slowly simulate physically accurate clouds or I was
> > making a game and I needed code to quickly draw stylized clouds I could just
> > go to the article for clouds, click on C++ (or whatever programming language
> > is appropriate) and then find some useful chunks of code. Every article
> > could link to useful algorithms, data structures, and interface designs that
> > are relevant to the subject of the article. You could also find data-centric
> > programs, such as a JavaScript weather statistics browser and
> > visualizer that accesses Wikidata. The big advantage would be that
> > constraining the design of the library to the structure of Wikipedia would
> > handle the encapsulation and modularity aspects of the software engineering
> > so that the components could improve independently. Creating a simulation or
> > visualization where you zoom in from a whole cloud to see its constituent
> > microscopic particles is certainly doable right now, but it would be a lot
> > easier with a function library like this.
> > If you look at the existing Wikicode and Rosetta Code the code samples are
> > small and isolated. They will show, for example, how to open a file in 10
> > different languages. However, the search engines already do a great job of
> > helping us find those types of code samples across blog posts of people who
> > have had to do that specific task before. A problem that I run into
> > frequently that the search engines don't help me solve is this: if I read a
> > nanoelectronics paper and want to simulate the physical system it
> > describes, I often have to go to the websites of several different
> > professors and do a fair bit of manual work to assemble their different
> > programs into a pipeline, and then the result of my hacking is not easy to
> > expand to new scenarios. We've made enough progress on Wikipedia that I can
> > often just click on a couple of articles to get an understanding of the
> > paper, but if I want to experiment with the ideas in a software context I
> > have to do a lot of scavenging and gluing.
> > I'm not yet convinced that this could work. Maybe Wikipedia works so well
> > because the internet reached a point where there was so much redundant
> > knowledge listed in many places that there was immense social and economic
> > pressure to utilize knowledgeable people to summarize it in a free
> > encyclopedia. Maybe the total amount of software that has been written is
> > still too small, there are still too few programmers, and it's still too
> > difficult compared to writing natural languages for the crowdsourcing
> > dynamics to work. There have been a lot of successful open-source software
> > projects of course, but most of them are focused on creating software for a
> > specific task instead of library components that cover all of the knowledge
> > in the encyclopedia.
> 
> _______________________________________________
> Wikidata-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l


