Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-08-31 Thread Ben Goertzel
Thinking on this slightly more... So we know smaller models in any reasonable language including CoDD (https://arxiv.org/abs/2004.05268) will tend to lead to more accurate models, thus also to better agreement of models learned across different subsamples (and thus to smaller decision tree

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-08-30 Thread James Bowery
It _is_ obvious that even _deploying_ Big Language Model inference is expensive. Sure, running a given inference is nothing compared to the model induction, but if you're, say, Google, and you want to deploy your latest-greatest BLM to a sizable fraction of Earth's population, it might pay to at

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-08-30 Thread Ben Goertzel
It's not obvious that cheap/simple/straightforward methods of knowledge distillation would actually perform the needed compactification effectively, though... It may be that performing this compactification of the huge models is a much harder learning problem than actually learning the huge

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-08-30 Thread James Bowery
My point is that when one obtains a good model, one _is_ finding a compact model but its compactness is obscured by the sloppy definition of the word "parameter". A big step toward saving civilization from going down the rat hole would be for these Big Language Model guys to run knowledge

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-08-30 Thread Ben Goertzel
I.e. taking a big model and then averaging its behavior across multiple subsamples, one is effectively getting a "big ensemble of big models" that has the same information content as a more compact model (which one doesn't actually have explicitly on hand) It may seem a perverse way to do things,

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-08-30 Thread Ben Goertzel
I think the point is that A -- model compactness B -- consistency of model results across data subsamples [regardless of model size] are under broad conditions basically equivalent, and since we have learning algorithms for finding overparameterized models that are consistent across data
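The equivalence sketched above (compactness A vs. cross-subsample consistency B) can be seen in a toy experiment. Everything in this sketch is an illustrative assumption, not from the thread: a cubic signal with Gaussian noise, polynomial models, and prediction standard deviation across random subsamples as the "consistency" measure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a cubic signal plus noise (illustrative choice, not from
# the thread -- just a minimal setting for compactness vs. stability).
x = np.linspace(-1, 1, 200)
y = x**3 - 0.5 * x + rng.normal(0, 0.1, x.size)

def subsample_disagreement(degree, n_subsamples=30, frac=0.6):
    """Mean std-dev of predictions across models fit on random subsamples."""
    preds = []
    for _ in range(n_subsamples):
        idx = rng.choice(x.size, int(frac * x.size), replace=False)
        coeffs = np.polyfit(x[idx], y[idx], degree)
        preds.append(np.polyval(coeffs, x))
    return float(np.std(preds, axis=0).mean())

compact = subsample_disagreement(3)    # compact model: the true degree
bloated = subsample_disagreement(25)   # heavily overparameterized
print(compact, bloated)
```

For this kind of setup the compact model's predictions should vary far less across subsamples than the overparameterized one's, which is the A-implies-B direction of the equivalence.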

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-08-30 Thread James Bowery
Not having read his paper, I can state from my own experience going back to the late 1980s doing multi-source neural image segmentation that overparameterized models are what you need during initial training. An appropriate learning algorithm will naturally reduce the complexity of the model

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-08-30 Thread Ben Goertzel
James, have you seen Poggio's attempt to argue that these overparametrized models are actually OK in terms of learning theory? https://dspace.mit.edu/handle/1721.1/124343 The basic argument seems to be: -- In a space of overparametrized models, stability under subsampling (e.g. leave-one-out

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-04 Thread Rob Freeman
On Sat, Jul 4, 2020 at 2:04 PM Ben Goertzel wrote: > ... I believe we discussed some time ago what sort of chaotic dynamical > model I think would be most interesting to explore in a language > learning context, and my thoughts were a little different than what > you're doing, but I haven't had

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-04 Thread Ben Goertzel
I think that BOTH chaotic dynamical networks AND grammar rule learning/utilization are part of how human minds handle language ... IMO they both work together... Obviously, language learning related work has been a very small fraction of what i've been doing w/ myself for the last N years...

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread immortal . discoveries
Grounded images? Formal text? GPT-2 works on text, GPT-2 works on images. It's about the patterns. The patterns! -- Artificial General Intelligence List: AGI Permalink: https://agi.topicbox.com/groups/agi/T100f708e32ae7327-M8686257819f9e11a4d885639

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread Ben Goertzel
Similarly btw, the attractors in a dynamical system only capture a portion of the dynamics -- they, like emergent symbolic-dynamics grammars, are also a layer of abstraction that ignores a lot of the complexity that exists in the trajectories... On Fri, Jul 3, 2020 at 6:46 PM Ben Goertzel wrote:

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread Ben Goertzel
> So you found grammars which adequately summarize a symbolic dynamics for > Cisco networks, and are still happy with the idea such generalizations will > capture all the important behaviour? You don't think there are some > behaviours of Cisco networks which are only explained at the network

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread Rob Freeman
On Sat, Jul 4, 2020 at 3:28 AM Ben Goertzel wrote: > We have indeed found some simple grammars emerging from the attractor > structure of the dynamics of computer networks, with the grammatical > forms correlating with network anomalies. Currently are wondering if > looking at data from more

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread James Bowery
On Fri, Jul 3, 2020 at 6:53 PM Ben Goertzel wrote: > > > > Is it true that selecting the smallest executable archive of a training > dataset corresponds to the model that is most-likely to out-predict other > models? > > > > Right? Isn't that the experimental program in a nutshell? > > Well for

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread Ben Goertzel
> > Is it true that selecting the smallest executable archive of a training > dataset corresponds to the model that is most-likely to out-predict other > models? > > Right? Isn't that the experimental program in a nutshell? Well for nontrivial datasets finding the smallest compressing program
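The "smallest executable archive" idea in this exchange can be illustrated with a crude two-part code, using zlib as a (very weak) stand-in for algorithmic complexity. The signal, the noise level, the quantization step, and the 32-byte allowance for the model description are all assumptions of this sketch, not anything computed in the thread:

```python
import zlib
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(2000)
y = np.sin(0.05 * x) + rng.normal(0, 0.01, x.size)

def compressed_size(arr, step=0.02):
    """Bytes zlib needs for the signal quantized to a fixed step."""
    q = np.round(arr / step).astype(np.int16)
    return len(zlib.compress(q.tobytes(), 9))

# Two-part "executable archive": a short model description plus the
# compressed residuals it leaves behind.  32 bytes is a generous
# assumed allowance for encoding "sin(0.05 * x)".
model_bytes = 32
residual = y - np.sin(0.05 * x)
with_model = model_bytes + compressed_size(residual)
without_model = compressed_size(y)
print(with_model, without_model)
```

The archive that includes the model should come out smaller, which is the sense in which minimizing total archive size selects the model; as the message notes, actually finding the smallest such program in general is where the hardness lives.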

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread James Bowery
On Fri, Jul 3, 2020 at 4:17 PM Ben Goertzel wrote: > ...So before setting up a practical causal modeling competition w/o out of > sample data, premised on this not-quite-applicable math, I would like to > have more practical understanding of how these AIT based methods of causal > inference pan

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread Ben Goertzel
Hmmm... yeah I understand that if you look at algorithmic Markov conditions, and causal dags defined by algorithmic information rather than Shannon/statistical information, you can infer causality without needing out-of-sample testing https://arxiv.org/abs/0804.3678 However as you presumably
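The building block of the algorithmic-Markov-condition approach cited above is conditional algorithmic complexity, which practitioners approximate with real compressors as C(y|x) ≈ C(x·y) − C(x). A minimal sketch of that approximation, with the caveat that zlib is far too weak for genuine causal inference and the "cause"/"effect" pair here is a deliberately trivial assumption (a verbatim copy):

```python
import hashlib
import zlib

def pseudo_random(n, seed=b"agi"):
    """Deterministic incompressible-looking bytes via a SHA-256 chain."""
    out, block = b"", seed
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:n]

def C(b):
    """Crude stand-in for algorithmic complexity: zlib-compressed size."""
    return len(zlib.compress(b, 9))

def C_cond(a, b):
    """Approximate C(a | b) as the extra cost of a once b has been seen."""
    return C(b + a) - C(b)

x = pseudo_random(4096)
y = x  # an "effect" that is a verbatim copy of its "cause"

# C(y) is large (y looks random), but C(y|x) is tiny: once x is in the
# compressor's window, y is just a back-reference.
print(C(y), C_cond(y, x))
```

Methods in the spirit of arXiv:0804.3678 compare C(x) + C(y|x) against C(y) + C(x|y) to pick a causal direction, which is exactly where the practical concerns raised in this message bite.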

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread James Bowery
Everything you say applies to _any_ approach to climate science except for one, AIT, about which you are obviously and egregiously mistaken in this respect: The whole point of AIT is the idea that you don't _need_ out-of-sample data to do causal inference. Indeed, this is the reason I am so

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread immortal . discoveries
Hey Ben, did you know it takes less information if you line up houses and group stores near stores, restaurants near restaurants, and group time events near the same times? This stacking and grouping is syntactics and semantics. For syntactics, you don't store the same word twice, you just update

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread Ben Goertzel
Hmm... I am not sure that mixing up AI with climate-change is the cleanest, simplest approach ... there are so many weird issues with data related to climate. And importantly, we don't really have a good idea of how effective a model can be found based on available data, even in principle --

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread Ben Goertzel
We have indeed found some simple grammars emerging from the attractor structure of the dynamics of computer networks, with the grammatical forms correlating with network anomalies. Currently are wondering if looking at data from more complex computer networks will yield more interesting and

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread Rob Freeman
Ben, How did the network, symbolic dynamics, work you planned last year work out? Specifically you said (July 17, 2019): "...applying grammar induction to languages derived from nonlinear dynamics of complex systems via symbolic dynamics, is not exactly about artificial languages, it's about a

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread James Bowery
Huawei, the corporate sponsor, isn't really interested in AI as evidenced by their business _and_ by the very tight resource requirements consistent only with embedded systems compressing realtime data streams. I'd be pleasantly shocked if such tight resource constraints were to advance AI to any

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread immortal . discoveries
What we *want to work on* to improve our current AI (or anything, ex. cars) is going to be something that is most used or needed frequently or is agnostic/universal. Which is a pattern. So everything in life is about Compression. (We can compress a text file if we find patterns (syntactic and

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread Matt Mahoney
In addition to the Hutter prize, there is the global data compression contest. https://globalcompetition.compression.ru/ On Thu, Jul 2, 2020, 3:19 PM James Bowery wrote: > Just spitballing here: > > Occam's Razor Models of Climate Change > > The competition classes would be defined in terms of

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread James Bowery
Just spitballing here: Occam's Razor Models of Climate Change The competition classes would be defined in terms of the scale of the datasets available as well as the computer resources. On Thu, Jul 2, 2020 at 2:06 PM Ben Goertzel wrote: > X-Prize tends to be driven by corporate donors who

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread Ben Goertzel
X-Prize tends to be driven by corporate donors who fund prizes, so if you/we could convince someone to fund a prize purse for "a series of AI competitions based on resource constraint classes." then X-Prize Foundation would likely strongly consider it... I do know the X-Prize folks fairly well

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread Ben Goertzel
> Can we agree that, regardless of the frontier search heuristics, it would > benefit AI, both general and narrow, to wave about the garlic of "Resource > Constraint" providing "competition classes" _within_ which metrics (of > whatever justification) are fairly compared? Yeah, while I feel

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread James Bowery
On Thu, Jul 2, 2020 at 10:36 AM Ben Goertzel wrote: > For sure "better accuracy with fewer parameters" is a way, way better > rough guide to model selection than "more parameters makes your model > more impressive" > > But obviously the OpenAI researchers know this well, and advertising > the

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread Ben Goertzel
For sure "better accuracy with fewer parameters" is a way, way better rough guide to model selection than "more parameters makes your model more impressive" But obviously the OpenAI researchers know this well, and advertising the big size of their models is more of a business-marketing thing

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread James Bowery
On Thu, Jul 2, 2020 at 12:50 AM Ben Goertzel wrote: > ...I.e. I believe morphisms like the ones alluded to in > https://arxiv.org/abs/1703.04368 , https://arxiv.org/abs/1703.04361 > are more likely to work out well in the context of such grammars, than > in the context of whatever ends up being

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread Ben Goertzel
On Thu, Jul 2, 2020 at 6:24 AM John Rose wrote: > > On Wednesday, July 01, 2020, at 9:02 PM, Ben Goertzel wrote: > > Basically what these NNs are doing is finding very large volumes of > simple/shallow data patterns and combining them together. Whereas there are > in fact deeper and more

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread John Rose
On Wednesday, July 01, 2020, at 9:02 PM, Ben Goertzel wrote: > Basically what these NNs are doing is finding very large volumes of simple/shallow data patterns and combining them together. Whereas there are in fact deeper and more abstract patterns present in language data, the transformer NNs

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread immortal . discoveries
I must sleep but I think I just made a big discovery above.

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread immortal . discoveries
To be clear, I mean, hey, who would work on things that aren't going to "kill 9 birds with 1 stone". GPT-2/ my code does a lot for being simple code. We don't code every response it says! So, patterns, compression. Get it? :) Same same.

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread immortal . discoveries
I do have something to add to that Ben. Yes while compression is an incredible way to improve prediction/AGI, I did find some things that are odd. So you know how AGI needs to do Not, And, Nor, If, Else, segment words, edit Papers, find similar words, change task, and so much more? Well some of

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread Ben Goertzel
> When we're talking about such practicalities, it behooves us to do better > than pull a lightswitch-brain maneuver and say that "the whole beautiful > mathematical house of cards falls apart" and that's that! Rationality, in > fact, demands of us taking RATIOs of rather than dealing in

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread James Bowery
On Wed, Jul 1, 2020 at 8:03 PM Ben Goertzel wrote: > *** > But to address what I think is your strongest, albeit not directly > stated point, if I may put words in _your_ mouth (forgive me if I'm > wrong): > > "Algorithmic Information Theory's theorem about the quality of a model > based on the

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread Ben Goertzel
*** But to address what I think is your strongest, albeit not directly stated point, if I may put words in _your_ mouth (forgive me if I'm wrong): "Algorithmic Information Theory's theorem about the quality of a model based on the smallest executable archive of the data, while strong in that

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread James Bowery
Not to obsess about DSM-IV diagnoses, but one man's "obsession" is another man's principles and, after all, if you abandon principles when it is inconvenient, why pretend to have principles at all? This includes, of course, the Minimum Description Length Principle, as some call it. But you make

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread Ben Goertzel
IMO obsessing on compression ratios is not useful in this context... I am pretty confident someone will come up with a 10x smaller version of GPT3 via a laborious distillation process -- but this doesn't necessarily imply this distilled version will be smarter -- in fact it may make exactly or

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
Note, Italy and France can both have been said about equally recently, but if you say France 30 times and Italy 32 times, "Italian" becomes more likely to be predicted, wrongly.

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
All because it was activated more recently!!

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
GPT-2: "*I grew up in France. I never visit Italy. I speak fluent *Italian"

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
It's a spectrum: it's more accurate than otherwise, just not the best.

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread stefan.reich.maker.of.eye via AGI
Excerpt from Gary Marcus's article if anyone hasn't read it: Here's the problem: upon careful inspection, it becomes apparent the system has no idea what it is talking about: it cannot follow a simple sequence of events nor reliably

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread stefan.reich.maker.of.eye via AGI
> their biggest problem is that they don't capture semantics, as Gary Marcus's in-depth critique of GPT2 highlighted So good to read this.

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
And how about fixed code size too.

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
Even if you keep the dataset the same size, the net can "appear" smarter if it stores all the answers! Looking human-like, but it didn't come up with the answers itself. So testing how smart the AI is requires a fixed dataset size and *maybe even a fixed net size*! Or, if you want, a variable net size, but

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
Correct. The gold standard evaluation is Lossless Compression, then Perplexity. To measure/compare progress, we need to keep the dataset the same size and include the net size (as it could just store the whole dataset), then we can see which has better intelligence / can utilize limited data better.

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread James Bowery
That's my point: If benchmarks were in terms of code bits + parameter bits + correction bits (ie: executable archive size), _any_ progress in *semantics* would show up in a major way. Perhaps that's why people like bad benchmarks? It doesn't take as much intelligence to throw money and

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
It got so big and less useful because it is now using less general, long, or rare strings!

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
*Point in focus: GPT-3 is only barely noticeably better than the GPT-2 small model.* *GPT-3: 175,000,000,000 parameters* *GPT-2 (small model): 117,000,000 parameters* *GPT-2's largest model was 1.5B and barely noticeably better than the small model as well. All those digits after 117 million were

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
The thing about AI is you don't need to scale it to test if it's smarter; you can easily see it in small-scale trained models! My and others' algorithms all show this is exactly true. Scale is for when you want to sell it; it's when you're done making the code smarter and just want to feed it

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread Alan Grimes via AGI
immortal.discover...@gmail.com wrote: > Making a network bigger (wider and/or deeper) should lead to adding to > the net less general systems. If it sees enough data, it will learn > very rare words, and longer memories that are less shared by other > memories. > > A small net can store/ model

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread Ben Goertzel
Distillation of large NN models into smaller ones is a major sub-industry in the NN research field, and of commercial value due to the need to run models on phones or embedded devices. So, no doubt we will see distilled GPT3 type models before long... Note though that while GPT3 is huge, it's
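The core mechanics of the distillation sub-industry mentioned here is training a student on the teacher's softened output distribution. A minimal sketch under stated assumptions: the "teacher" is just a fixed random linear classifier, and the student is another linear map of the same shape (a real distillation student would be a smaller network, and the temperature 2.0 is an arbitrary choice):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical "teacher": a fixed random linear classifier, 10 classes.
X = rng.normal(size=(512, 20))
W_teacher = rng.normal(size=(20, 10))
soft_targets = softmax(X @ W_teacher, T=2.0)  # softened teacher outputs

# Student trained by gradient descent on cross-entropy against the
# teacher's soft distribution -- the core loop of distillation.
W_student = np.zeros((20, 10))
for _ in range(500):
    p = softmax(X @ W_student, T=2.0)
    W_student -= 0.5 * (X.T @ (p - soft_targets)) / X.shape[0]

agreement = float(
    (softmax(X @ W_student).argmax(1) == soft_targets.argmax(1)).mean()
)
print(agreement)
```

The soft targets carry more information per example than hard labels (relative class probabilities), which is a large part of why distilled students can track teachers well above what their size might suggest.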

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread stefan.reich.maker.of.eye via AGI
They probably scare away competitors this way too. Who can afford a machine of that size?

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
Making a network bigger (wider and/or deeper) should lead to adding less general systems to the net. If it sees enough data, it will learn very rare words, and longer memories that are less shared by other memories. A small net can store/model TONS. Just look at DNA size; it makes a human. You

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread James Bowery
On Wed, Jul 1, 2020 at 11:44 AM Matt Mahoney wrote: > I estimate the complexity of the global economy to be 10^17 bits. Google's > language model has a factor of 10^5 to go. > But that misses my main point, which is that if Google were simply to run standard neural network compression

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread immortal . discoveries
Haha! 600B! Pride! But scaling more doesn't give you as much, as easily; it's a curve. If you give a "wow impression" rating to the models of GPT-2 (100% being true AGI), it'd be something like: GPT-2 small model - 40%, medium - 46%, large - 48%, XL - 49%. So adding data has a limit and won't

Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-01 Thread Matt Mahoney
I estimate the complexity of the global economy to be 10^17 bits. Google's language model has a factor of 10^5 to go. 10^17 = 10^9 (Landauer's estimate of human long term memory) x 10^10 (population of Earth) x 10^-2 (fraction of knowledge unique to each person, estimated as the cost of replacing
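The arithmetic behind this estimate is straightforward to reproduce; every figure below is the poster's own assumption (Landauer's memory estimate, the population order of magnitude, the uniqueness fraction), not a measurement:

```python
# Reproducing the back-of-envelope estimate from the post above.
landauer_bits_per_person = 1e9  # human long-term memory (Landauer)
population = 1e10               # order of Earth's population
unique_fraction = 1e-2          # knowledge unique to each person

economy_bits = landauer_bits_per_person * population * unique_fraction
print(f"{economy_bits:.0e}")  # 1e+17
```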