I was asking Philippe, but I hope to see you at ESUG as well!
> On 16 May 2017 at 19:02, Oleksandr Zaytsev <[email protected]> wrote:
>
> I would love to, but to go to Lille from my country I would need a visa,
> which is not that easy to acquire.
> So maybe I will come to PharoDays 2018.
> And I will definitely try to come to the ESUG Conference in September.
>
> Oleks
>
>> On Tue, May 16, 2017 at 7:26 PM, <[email protected]> wrote:
>>
>>> On 11 May 2017 at 11:43, "[email protected]" <[email protected]> wrote:
>>>
>>> ---------- Forwarded message ----------
>>> From: "[email protected]" <[email protected]>
>>> Date: 11 May 2017 10:54
>>> Subject: Re: 11/05/17 - Tabular Data Structures for Data Analysis -
>>> Oleksandr Zaytsev
>>> To: "Nick Papoylias" <[email protected]>
>>> Cc:
>>>
>>>> On Thu, May 11, 2017 at 10:20 AM, Nick Papoylias <[email protected]>
>>>> wrote:
>>>>
>>>>> On Thu, May 11, 2017 at 5:24 AM, Oleksandr Zaytsev
>>>>> <[email protected]> wrote:
>>>>>
>>>>> A. Work done
>>>>> - Downloaded the threaded VM as suggested by Esteban Lorenzano to make
>>>>>   Iceberg work. And it does! I have successfully pushed my NeuralNetwork
>>>>>   code to GitHub: https://github.com/olekscode/MLNeuralNetwork
>>>>> - Joined the PolyMath organization on GitHub
>>>>> - Created a repository for the TabularDataset project
>>>>>   https://github.com/PolyMathOrg/TabularDataset as a part of the PolyMath
>>>>>   organization on GitHub
>>>>> - Fixed PolyMath issue #25 and made a PR
>>>>> - Read an article from the Wolfram Mathematica documentation regarding
>>>>>   Dataset. It was one of the reading suggestions sent to me by Nick
>>>>>   Papoylias
>>>>>
>>>>> B. Next steps
>>>>> - Fix more PolyMath issues, using Iceberg. I have to get used to it by
>>>>>   the time the coding phase starts
>>>>> - Read the rest of Nick Papoylias's suggestions
>>>>>
>>>>> C. Help needed
>>>>> The Dataset in Wolfram, as well as Pandas in Python, has a very advanced
>>>>> indexing system.
>>>>> Smalltalk has its own special conventions for indexing,
>>>>> so I think that it would be great if I got familiar with them. Could you
>>>>> suggest some reading on this topic (what are the indexing conventions
>>>>> in Smalltalk?).
>>>>> For example, in Wolfram, I can write dataset[[-1]] to extract the last
>>>>> row. But in Pharo indexes cannot be negative. In Pharo I would say
>>>>> dataset last. But how about dataset[[-5]]?
>>>>
>>>> This would be a good exercise for you ;) In Pharo you can easily add
>>>> negative indexing yourself.
>>>>
>>>> Hint: You know the index of the last element, since this is the size of
>>>> the collection, so... ;)
>>>
>>> No need for changes, this exists already.
>>>
>>> Use atWrap: index put: value and atWrap: with negative indexes.
>>> 'hello' atWrap: -2
>>>
>>> There is a specific version for Array using a primitive.
>>> #[ 10 20 30 40 ] atWrap: -1
>>>
>>> atWrap: 0 gives you the last item.
>>> atWrap: -1 gives 30
>>>
>>> This is different from 0-based index languages.
>>>
>>> The interesting thing about atWrap: is that it uses modulo internally, so
>>> you do not need to care about that.
>>>
>>> ($/ split: 'abc/def/ghi/jkl') atWrap: -1
>>> --> 'ghi'
>>>
>>> The Matrix class has a bunch of things API-wise, but the class is highly
>>> inefficient, doing copies all the time etc. It would be nice to have some
>>> kind of futures/copy-on-write style things in there.
>>>
>>> I miss cbind and rbind. These are useful. I have some half-baked, super
>>> inefficient implementations of these things for Matrix.
>>>
>>> https://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html
>>>
>>> The ability to name columns is also nice to have.
>>>
>>> In R one does:
>>>
>>> df <- data.frame(c(1,2,3))
>>> df <- cbind(df, c(4,5,6))
>>> df <- cbind(df, c(7,8,9))
>>> names(df) <- c("C1", "C2", "C3")
>>>
>>> The names can be retrieved with:
>>>
>>> names(df)
>>>
>>> A Smalltalkish style would be welcome.
>>
>> Interesting! Are you coming to PharoDays?
>> We can talk about that if we find time.
>>
>>> Maybe looking at the Voyage queries can be helpful.
>>>
>>> Phil
>>>
>>>> Try adding an extension method to OrderedCollection or
>>>> SequenceableCollection.
>>>>
>>>> If the Pharo by Example chapter or the MOOC is not enough, read the source
>>>> itself in the core to see how basic methods are implemented (it is less
>>>> scary than it sounds).
>>>>
>>>> You can also try Chapters 9, 10, 11 of the Blue Book (some API changes may
>>>> apply):
>>>>
>>>> http://sdmeta.gforge.inria.fr/FreeBooks/BlueBook/Bluebook.pdf
>>>>
>>>>> Or what is the best way of implementing this index: dataset[["name"]]
>>>>> (extracts a named row), dataset[[1]] (extracts the first row)? Should I
>>>>> create two separate messages: dataset rowNamed: 'name' and dataset rowAt:
>>>>> 1?
>>>
>>> rowNamed:
>>> rowAt:
>>>
>>> Yes, looks like it.
>>>
>>> But if we want to model things like R data frames, for example, this has to
>>> be seen as a vectorized operation, so that you can use row slices, column
>>> slices, and logical indexes.
>>>
>>> Check this out:
>>>
>>> http://www.r-tutor.com/r-introduction/data-frame/data-frame-row-slice
>>> https://www.r-bloggers.com/working-with-data-frames/
>>>
>>>> The internal representation of your data structure can be anything at the
>>>> moment, as long as you encapsulate it.
>>>>
>>>> (i.e. it can be nested OrderedCollections with metadata mapping column
>>>> names to indexes, or a dictionary of collections, etc.)
>>>>
>>>> If you don't expose it to the user (i.e. return it from the public API, or
>>>> expect knowledge of it in argument passing),
>>>> we can easily change it later. So first make it work, and we optimize
>>>> later ;)
>>>>
>>>> For your case it will be a little bit trickier because you also have the
>>>> notions of a) rows and b) columns, which
>>>> are exposed to the user. So you would need to create abstractions for
>>>> these too.
>>>>
>>>> Cheers,
>>>>
>>>> Nick
>>>>>
>>>>> If someone else is having problems with Iceberg on Linux, try downloading
>>>>> the threaded VM:
>>>>> wget -O- get.pharo.org/vmT60 | bash
>>>>> And use the SSH (not HTTPS) remote URL.
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups
>>>>> "Pharo Google Summer of Code" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>>> email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/pharo-gsoc/CAEp0Uzu-8fw3dA6ezVoj-QptvLcB8cWPHvZ1tfLg1Ce8qkTqfQ%40mail.gmail.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
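
For anyone reading the thread later, the two indexing ideas discussed above (Nick's hint about using the collection size, and the modulo wrap-around behind atWrap:) can be tried in a Playground. This is only a rough sketch: wrappedAt and fromEnd are hypothetical helper blocks made up for illustration, not part of Pharo, and the modulo formula is an assumption chosen to reproduce the results quoted in the thread (atWrap: 0 is the last item, atWrap: -1 gives 30).

```smalltalk
"Playground sketch; wrappedAt and fromEnd are hypothetical helper blocks."
| items wrappedAt fromEnd |
items := #(10 20 30 40).

"Map any integer onto 1..size with modulo, then do a normal 1-based lookup."
wrappedAt := [ :collection :index |
	collection at: (index - 1 \\ collection size) + 1 ].

"Wolfram-style counting from the end: 1 is the last element, 2 the one before it."
fromEnd := [ :collection :k |
	collection at: collection size - k + 1 ].

wrappedAt value: items value: 0.    "--> 40, same as items atWrap: 0"
wrappedAt value: items value: -1.   "--> 30, same as items atWrap: -1"
fromEnd value: items value: 1.      "--> 40, same as items last"
fromEnd value: items value: 2.      "--> 30"
```

One difference worth keeping in mind: the wrap-around version never fails on a non-empty collection, whereas fromEnd signals an out-of-bounds error when k is larger than the size, which may be the safer behaviour for something like dataset[[-5]] on a short dataset.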
