Replying to Abu.

On Wed, Jan 8, 2025 at 12:34 PM Abu Naser <[email protected]> wrote:
>
> Good to hear from you.
> I have done some googling about the LLM, I have found many people are using
> LLM for analysing genomic data.

I'd be amazed if there weren't. Pharma is a $1.6 trillion business in the US alone.
https://www.statista.com/topics/1764/global-pharmaceutical-industry/
If some of that money *wasn't* going into LLMs, I would conclude that I had died and been reanimated in a crappy universe simulation.

> (https://github.com/MAGICS-LAB/DNABERT_2?tab=readme-ov-file that can easily
> be used via https://huggingface.co/docs/transformers/en/index)
> Their approach is usual, 1st train a model and then use it to predict. In our
> case, where do we get the knowledge to store on atomspace?

That's a great question. (If I understand you correctly.) I assume you already know how to get, or already have access to, oodles and poodles of genomic data. There are open, public databases of genomic data, in all shapes and sizes. No doubt there's even more that's proprietary, say, the 23andMe dataset.

I think the issue is "how do I hook up an LLM to the AtomSpace?" and the short answer is "I don't know". Well, I do know, but I am unhappy with all the ways I know how. So I've recently, and with some urgency, started to think about "what is the *best* way to hook up LLMs to the AtomSpace?" and I don't have an answer to that, yet. Might take a while.

> I can certainly to do some reading on their work and figure out how they do
> it.

Yes, please! If you can then explain it to me, in email, that would be excellent. If you can't explain it, then some paper references...

> Do you have the pattern matching tool set in github?

Yes. https://github.com/opencog/learn

Terminology: in comp-sci, "pattern matching" usually refers to a very simple kind of matching, called "regular expressions" (regex), with theory developed in the 1960s and a standard part of Unix by the 1980s; see e.g. "perl regex". Besides regex, many programming languages have a similar but different idea: Scheme has "hygienic macros", as do other functional languages. Python does not; JavaScript does not.
I think some of the latest and weirdest C++ standards track is trying to go that way. C++ templates are kind-of pattern-matcher-like-ish, but they're simple, and 30-35 years old, now.

In Atomese, I made the mistake of calling its graph rewriting system "pattern matching". Bad mistake, because it makes people think of the above rather simple systems. In fact, Atomese has 2 or 3 or 4 distinct systems that, uhh, "process patterns".

At the bottom end, it's the "query engine", which is a sophisticated and fast graph rewrite engine. Tutorials here:
https://github.com/opencog/atomspace/tree/master/examples/pattern-matcher
You might find these to be .. mind-bendingly complicated. A theory paper is here:
https://github.com/opencog/atomspace/raw/master/opencog/sheaf/docs/ram-cpu.pdf

At the mid-range, there's a rule system and a unifier. The unifier works. The rule system needs to be torched and rewritten.

At the "high-end", there's https://github.com/opencog/learn
In many ways, it kind-of-ish resembles transformers. Except that it works with structures, rather than linear strings of data. And that kind-of changes everything. It gets kind-of-ish similar results, but since it's also kind-of-ish completely different (because instead of working with strings, it works with trees) it's ... well, it's a weird-ass half-finished prototype. I love/hate it because I know why it's great and why it's utterly mis-designed. It's a steep hill to climb.

> I am a command line person. I would not mind even if it is a bit messy. I am
> a biologist by training but
> professionally I don't do biology. It would be fun for me to do some biology
> on the sideline of my profession.

Ah! Well, let's start small. Look at and plan what is doable and interesting and fun.

> My shortcoming is that I am not a good coder.

Heh. I'm a *very good coder*, and so when I say "this shit is difficult", trust me. This shit is difficult. (Yes, that's an "appeal to authority", but .. hey.)

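P.S. To make the terminology point above concrete, here is a toy sketch in plain Python of the difference between regex-style matching on a string and variable-binding queries over a graph. It is NOT Atomese and NOT how the AtomSpace query engine is implemented; the graph, the gene and protein names, and the helpers `match_clause` and `query` are all made up for illustration.

```python
import re

# 1. Regex: matches a pattern against a flat string of characters.
text = "geneA expresses protX"
regex_hit = re.fullmatch(r"(\w+) expresses (\w+)", text).groups()

# 2. Toy graph query: the data is a set of labeled edges, and the pattern
#    is several clauses that share variables (names starting with "$").
#    All names here are hypothetical toy data.
GRAPH = [
    ("expresses", "geneA", "protX"),
    ("expresses", "geneB", "protY"),
    ("interacts", "protX", "protY"),
]


def match_clause(clause, edge, bindings):
    """Unify one clause with one edge; return extended bindings or None."""
    new = dict(bindings)
    for term, value in zip(clause, edge):
        if term.startswith("$"):
            # A variable must bind to the same value everywhere it appears.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(clauses, graph, bindings=None):
    """Yield every set of variable bindings that satisfies all clauses."""
    bindings = bindings or {}
    if not clauses:
        yield bindings
        return
    first, rest = clauses[0], clauses[1:]
    for edge in graph:
        extended = match_clause(first, edge, bindings)
        if extended is not None:
            yield from query(rest, graph, extended)


# "Find two genes whose proteins interact": three clauses, joined by the
# shared variables $p1 and $p2.
CLAUSES = [
    ("expresses", "$g1", "$p1"),
    ("interacts", "$p1", "$p2"),
    ("expresses", "$g2", "$p2"),
]
RESULTS = list(query(CLAUSES, GRAPH))

if __name__ == "__main__":
    print(regex_hit)
    print(RESULTS)
```

The regex only ever sees one line of characters. The graph query has to satisfy several clauses at once, and the shared variables are what stitch them together. The real query engine does far more than this (including rewriting the graph with what it finds), so treat the sketch as a mental picture only.
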
--linas
