Replying to Abu.

On Wed, Jan 8, 2025 at 12:34 PM Abu Naser <[email protected]> wrote:
>
> Good to hear from you.
> I have done some googling about LLMs, and I have found that many people are
> using LLMs for analysing genomic data.

I'd be amazed if there weren't. Pharma is a $1.6 trillion business
worldwide.
https://www.statista.com/topics/1764/global-pharmaceutical-industry/
If some of that money *wasn't* going into LLMs, I would conclude that
I had died and been reanimated in a crappy universe simulation.

> (https://github.com/MAGICS-LAB/DNABERT_2?tab=readme-ov-file, which can easily
> be used via https://huggingface.co/docs/transformers/en/index)
> Their approach is the usual one: first train a model, then use it to predict.
> In our case, where do we get the knowledge to store in the AtomSpace?

That's a great question. (If I understand you correctly) I assume you
already know how to get, or already have access to, oodles and poodles
of genomic data. There are open, public databases of genomic data, in
all shapes and sizes. No doubt there's even more that's proprietary,
say, the 23andMe dataset.

I think the issue is "how do I hook up an LLM to the AtomSpace?" and
the short answer is "I don't know". Well, I do know, but I am unhappy
with all the ways I know how. So I've recently, and with some urgency,
started to think about "what is the *best* way to hook up LLMs to the
AtomSpace?" and I don't have an answer to that yet. Might take a
while.

> I can certainly do some reading on their work and figure out how they do
> it.

Yes, please! If you can then explain it to me, in email, that would
be excellent. If you can't explain it, then some paper references
would do.

> Do you have the pattern matching tool set in github?

Yes. https://github.com/opencog/learn

Terminology: in comp-sci, "pattern matching" usually refers to a very
simple kind of matching called "regular expressions" (regex), with the
theory developed in the 1960s and a standard part of Unix by the 1980s;
see, e.g., "perl regex".
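
For a concrete picture of what regex-style matching does, here is a toy
Python sketch. It is an illustrative assumption, not anything from
OpenCog, and not a real gene finder (forward strand only, no overlapping
frames). The point is that the pattern only ever sees a flat, linear
string of characters.

```python
import re

# Toy "open reading frame" pattern (illustrative assumption only):
# ATG, then whole codons matched lazily, up to the first in-frame
# stop codon (TAA, TAG or TGA). Forward strand only.
ORF = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")


def find_orfs(seq: str) -> list[str]:
    """Return non-overlapping toy ORF matches, scanning left to right."""
    return ORF.findall(seq.upper())


print(find_orfs("ccATGAAATTTTAGgg"))  # ['ATGAAATTTTAG']
```

The regex has no notion of structure beyond "this letter follows that
letter"; anything tree- or graph-shaped has to be flattened first.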

Besides regex, many programming languages have a similar but different
idea: Scheme has "hygienic macros", as do other functional languages.
Python does not; JavaScript does not. I think some of the latest and
weirdest C++ standards proposals are trying to go that way. C++
templates are kind-of pattern-matcher-like-ish, but they're simple, and
30-35 years old now.

In Atomese, I made the mistake of calling its graph rewriting system
"pattern matching". Bad mistake, because it makes people think of the
rather simple systems above. In fact, Atomese has 2 or 3 or 4 distinct
systems that, uhh, "process patterns".

At the bottom end, it's the "query engine", which is a sophisticated
and fast graph rewrite engine. Tutorials are here:
https://github.com/opencog/atomspace/tree/master/examples/pattern-matcher
You might find these to be... mind-bendingly complicated. A theory
paper is here:
https://github.com/opencog/atomspace/raw/master/opencog/sheaf/docs/ram-cpu.pdf

At the mid-range, there's a rule system and a unifier. The unifier
works. The rule system needs to be torched and rewritten.

At the "high end", there's https://github.com/opencog/learn
In many ways, it kind-of-ish resembles transformers, except that it
works with structures rather than linear strings of data. And that
kind-of changes everything. It gets kind-of-ish similar results, but
since it's also kind-of-ish completely different (because instead of
working with strings, it works with trees), it's... well, it's a
weird-ass, half-finished prototype. I love/hate it because I know why
it's great and why it's utterly mis-designed. It's a steep hill to
climb.
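
To make the strings-versus-trees distinction a bit more concrete, here
is a toy Python sketch of the general idea behind query-plus-rewrite: a
pattern containing variables is matched against tree-shaped data, each
variable must bind consistently everywhere it appears, and the bindings
are then substituted into an output template. This is NOT the AtomSpace
query engine or its API. The tuple encoding, the "$x" variable
convention, the function names match/find_all/rewrite, and the gene
"facts" are all hypothetical, made up for illustration.

```python
from typing import Iterator, Optional, Union

# Hypothetical encoding (not Atomese): a tree is a tuple (label, child, ...),
# a leaf is a plain string, and a string starting with "$" is a variable.
Tree = Union[str, tuple]
Bindings = dict[str, Tree]


def is_var(x: Tree) -> bool:
    return isinstance(x, str) and x.startswith("$")


def match(pattern: Tree, tree: Tree,
          b: Optional[Bindings] = None) -> Optional[Bindings]:
    """Return variable bindings if `pattern` matches `tree`, else None."""
    b = {} if b is None else b
    if is_var(pattern):
        if pattern in b:  # already bound: must be the same subtree again
            return b if b[pattern] == tree else None
        return {**b, pattern: tree}
    if isinstance(pattern, str) or isinstance(tree, str):
        return b if pattern == tree else None
    if len(pattern) != len(tree):
        return None
    for p, t in zip(pattern, tree):
        b = match(p, t, b)
        if b is None:
            return None
    return b


def find_all(pattern: Tree, tree: Tree) -> Iterator[Bindings]:
    """Yield bindings for every subtree of `tree` that `pattern` matches."""
    b = match(pattern, tree)
    if b is not None:
        yield b
    if isinstance(tree, tuple):
        for child in tree[1:]:  # skip the label at index 0
            yield from find_all(pattern, child)


def rewrite(template: Tree, b: Bindings) -> Tree:
    """Substitute bound variables into `template` to build a new tree."""
    if is_var(template):
        return b[template]
    if isinstance(template, tuple):
        return tuple(rewrite(t, b) for t in template)
    return template


# Made-up toy data: "A", "B", "C" are placeholder gene names.
facts = ("facts",
         ("interacts", ("gene", "A"), ("gene", "B")),
         ("interacts", ("gene", "B"), ("gene", "C")),
         ("expressed", ("gene", "A")))

query = ("interacts", ("gene", "$x"), ("gene", "$y"))
template = ("interacts", ("gene", "$y"), ("gene", "$x"))  # swap the pair

for bindings in find_all(query, facts):
    print(rewrite(template, bindings))
```

Running it prints the two swapped pairs, ('interacts', ('gene', 'B'),
('gene', 'A')) and ('interacts', ('gene', 'C'), ('gene', 'B')). A
pattern that reuses one variable, such as ("interacts", ("gene", "$x"),
("gene", "$x")), only matches self-interactions (none here). That kind
of consistency constraint over structure is something a plain regex
over strings does not give you.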

> I am a command line person. I would not mind even if it is a bit messy. I am 
> a biologist by training but
> professionally I don't do biology. It would be fun for me to do some biology 
> on the sideline of my profession.

Ah! Well, let's start small. Look at and plan what is doable and
interesting and fun.

> My shortcoming is that I am not a good coder.

Heh. I'm a *very good coder*, and so when I say "this shit is
difficult", trust me. This shit is difficult.

(yes, that's an "appeal to authority", but .. hey.)

--linas

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To view this discussion visit 
https://groups.google.com/d/msgid/opencog/CAHrUA37Be-ak%3DvBrc7%2B4QXB6zYWOfGCB1BuSkxb0VFfh6N%2BNKw%40mail.gmail.com.