GitHub user usbrandon added a comment to the discussion: Further LLM Support
Hi @vdwals (Dennis), I am happy to see you and Tristan contributing more innovation and thought in this area. I am very interested in your work and in furthering development here. I agree with Tristan that any action/step should be as friendly as possible to the popular APIs & Ollama. I will check out your GitHub repo and try what you've created in the latest Hop.

As far as embeddings go, I am very interested in their semantic-search side. Right now it seems like there are two separate tasks: creating embeddings, and then using one as a search query to retrieve some top-n set of closest matches, e.g. the responses with the minimum cosine distance. Since it may not just be a matter of putting a single approach to work, what could we do to help people have a fuller exploratory cycle so they can understand what is happening? For example, we can use t-SNE to reduce the dimensionality of the embeddings and put them on a 2D or 3D plot: https://www.datacamp.com/tutorial/introduction-t-sne

As someone new to this in an applied sense, I had to create small standalone examples in my own repo to break the concepts down and understand how to apply them in a system. Hop, generally, also breaks activities down into single actions that form composable, reusable blocks.

One area where I am trying to apply these in the real world is mastering data. Say you have rows from several systems representing a person or a product: how can we automatically steward those rows to a reasonable conclusion that they are similar enough to be called "a person" or "a product"? That way we can generate durable master keys for them, and the data becomes joinable across different systems. On that theme, and referring back to embeddings, I am curious how knowledge graphs can help. We know the embeddings themselves live in high-dimensional space, and as far as I know a graph may represent an ontology, but I am not sure how others currently encode the semantic meaning that exists in the shape of the graph into an embedding that models will understand.

My examples are super simple, but perhaps they will inspire some ideas. Over the next week I will probably start adding more and more ideas for the data quality approaches. My first inclination there was to use the function-calling abilities of the LLM to provide guarantees about the layout of the outputs. That way we could pose questions to the model, get back a predictable JSON structure, and convert it to a tabular form Hop understands: https://github.com/usbrandon/gptplayground (A few rough sketches of these ideas follow at the end of this post.)

What kinds of things are you trying to solve with these technologies? I will take a deeper look at your work this week and reply back more concretely on the plugins.

Again, thank you for putting so much thought and effort into these things. We are shaping that experience for the many users who will encounter Hop, so the thought and time are worth it.

Warmly,
Brandon

GitHub link: https://github.com/apache/hop/discussions/4732#discussioncomment-11717758
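
P.S. To make some of the above concrete, here is a rough sketch of the top-n cosine-similarity search, with made-up vectors standing in for real embeddings (it only assumes numpy; in practice the vectors would come from an embedding model such as OpenAI's or one served by Ollama):

```python
import numpy as np

def top_n_matches(query_vec, embeddings, n=3):
    """Indices and scores of the n rows of `embeddings` closest to `query_vec`."""
    # Normalize so a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    m = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = m @ q                     # cosine similarity of every row vs. the query
    order = np.argsort(-sims)[:n]    # best matches first
    return order, sims[order]

rng = np.random.default_rng(42)
corpus = rng.normal(size=(5, 4))               # toy "documents" in 4 dimensions
query = corpus[2] + 0.05 * rng.normal(size=4)  # a near-duplicate of row 2

idx, scores = top_n_matches(query, corpus)
print(idx, scores)                             # row 2 should rank first
```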
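
And a sketch of the t-SNE exploration idea, projecting toy stand-in embeddings down to 2D along the lines of the tutorial linked above (assumes scikit-learn and matplotlib):

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Stand-in for real embeddings: two loose clusters in 128 dimensions.
emb = np.vstack([rng.normal(0, 1, (50, 128)), rng.normal(3, 1, (50, 128))])

# perplexity must be smaller than the sample count; 30 is a common default.
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(emb)

plt.scatter(xy[:, 0], xy[:, 1], c=[0] * 50 + [1] * 50, cmap="coolwarm", s=12)
plt.title("t-SNE projection of toy embeddings")
plt.show()   # the two clusters should separate visibly in 2D
```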
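
Finally, a sketch of the function-calling idea for a guaranteed output layout, here applied to the record-matching question from the mastering example. The function name, schema, and records are made up for illustration, and it assumes the v1 `openai` Python SDK:

```python
import json
from openai import OpenAI

# A made-up schema: forcing the model to call this "function" guarantees the
# reply is JSON with exactly these fields, which maps cleanly to a Hop row.
schema = {
    "name": "report_match",
    "description": "Judge whether two customer records describe the same person.",
    "parameters": {
        "type": "object",
        "properties": {
            "same_person": {"type": "boolean"},
            "confidence": {"type": "number"},
            "reason": {"type": "string"},
        },
        "required": ["same_person", "confidence", "reason"],
    },
}

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "Record A: Jon Smith, 123 Oak St. Record B: Jonathan Smith, 123 Oak Street."}],
    tools=[{"type": "function", "function": schema}],
    tool_choice={"type": "function", "function": {"name": "report_match"}},
)

# The forced tool call returns its arguments as a JSON string matching the schema.
row = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)
print(row)   # e.g. {'same_person': True, 'confidence': 0.93, 'reason': '...'}
```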
