Responding to Griffith

Thank you very much for your email and suggestions. My immediate plan is to work with viral genomes, which are much simpler and smaller, and I guess it is possible to put those genomes in MongoDB. At the moment I am still in the dark about what to do and how to implement some ideas using atomspace, so I am doing some reading on it. I will keep you posted about my progress and ask for help if I may.

Responding to Linas

I am planning to read about atomspace and to run some of the examples that came with the package. Python would be the easier choice for me. While trying to compile atomspace with the Python bindings, I got the following error:

    [ 97%] Built target utilities_cython
    make[2]: *** No rule to make target '../opencog/persist/api/cython/../../storage/storage_types.pyx', needed by 'opencog/persist/api/cython/storage.cpp'.  Stop.

Please let me know of any potential solutions for this error.

Kind regards,
Abu

On Sat, 11 Jan 2025 at 05:26, Linas Vepstas <[email protected]> wrote:

> Replying to Abu.
>
> On Wed, Jan 8, 2025 at 12:34 PM Abu Naser <[email protected]> wrote:
> >
> > Good to hear from you.
> > I have done some googling about LLMs, and I have found that many people are
> > using LLMs for analysing genomic data.
>
> I'd be amazed if there weren't. Pharma is a $1.6 trillion-dollar
> business in the US alone.
> https://www.statista.com/topics/1764/global-pharmaceutical-industry/
> If some of that money *wasn't* going into LLMs, I would conclude that
> I had died and been reanimated in a crappy universe simulation.
>
> > (https://github.com/MAGICS-LAB/DNABERT_2?tab=readme-ov-file that can
> > easily be used via https://huggingface.co/docs/transformers/en/index)
> > Their approach is the usual one: first train a model, then use it to predict.
> > In our case, where do we get the knowledge to store in the atomspace?
>
> That's a great question. (If I understand you correctly) I assume you
> already know how to get, and have access to, oodles and poodles of genomic
> data. There are open, public databases of genomic data, in all shapes
> and sizes. No doubt there's even more that's proprietary, say, the
> 23andMe dataset.
>
> I think the issue is "how do I hook up an LLM to the AtomSpace?" and
> the short answer is "I don't know". Well, I do know, but I am unhappy
> with all the ways I know how.
> So I've recently, and with some urgency,
> started to think about "what is the *best* way to hook up LLMs to the
> atomspace?" and I don't have an answer to that, yet. Might take a
> while.
>
> > I can certainly do some reading on their work and figure out how they
> > do it.
>
> Yes, please! If you can then explain it to me, in email, that would
> be excellent. If you can't explain it, then some paper references...
>
> > Do you have the pattern matching tool set in github?
>
> Yes. https://github.com/opencog/learn
>
> Terminology: in comp-sci, "pattern matching" usually refers to a very
> simple kind of matching, called "regular expressions" (regex), with
> theory developed in the 1960s and a standard part of Unix by the 1980s;
> see e.g. "perl regex".
>
> Besides regex, many programming languages have a similar but different
> idea: scheme has "hygienic macros", as do other functional languages.
> Python does not; JavaScript does not. I think some of the latest and
> weirdest C++ standards track is trying to go that way. C++ templates
> are kind-of pattern-matcher-like-ish, but they're simple, and 30-35
> years old, now.
>
> In Atomese, I made the mistake of calling its graph rewriting system
> "pattern matching". Bad mistake, because it makes people think of the
> above rather simple systems. In fact, Atomese has 2 or 3 or 4 distinct
> systems that, uhh, "process patterns".
>
> At the bottom end, it's the "query engine", which is a sophisticated
> and fast graph rewrite engine. Tutorials here:
> https://github.com/opencog/atomspace/tree/master/examples/pattern-matcher
> You might find these to be ... mind-bendingly complicated. A theory
> paper is here:
> https://github.com/opencog/atomspace/raw/master/opencog/sheaf/docs/ram-cpu.pdf
>
> At the mid-range, there's a rule system and a unifier. The unifier
> works. The rule system needs to be torched and rewritten.
>
> At the "high-end", there's https://github.com/opencog/learn In many
> ways, it kind-of-ish resembles transformers. Except that it works with
> structures, rather than linear strings of data. And that kind-of
> changes everything. It gets kind-of-ish similar results, but since it's
> also kind-of-ish completely different (because instead of working with
> strings, it works with trees) it's ... well, it's a weird-ass
> half-finished prototype. I love/hate it because I know why it's great
> and why it's utterly mis-designed. It's a steep hill to climb.
>
> > I am a command line person. I would not mind even if it is a bit messy.
> > I am a biologist by training but
> > professionally I don't do biology. It would be fun for me to do some
> > biology on the sideline of my profession.
>
> Ah! Well, let's start small. Look at and plan what is doable and
> interesting and fun.
>
> > My shortcoming is that I am not a good coder.
>
> Heh. I'm a *very good coder*, and so when I say "this shit is
> difficult", trust me. This shit is difficult.
>
> (yes, that's an "appeal to authority", but .. hey.)
>
> --linas
>
> --
> You received this message because you are subscribed to the Google Groups
> "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion visit
> https://groups.google.com/d/msgid/opencog/CAHrUA37Be-ak%3DvBrc7%2B4QXB6zYWOfGCB1BuSkxb0VFfh6N%2BNKw%40mail.gmail.com
> .
