GitHub user usbrandon added a comment to the discussion: Further LLM Support

Hi @vdwals (Dennis),

I am happy to see you and Tristan contributing more innovation and thought in 
this area.  I am super interested in your work and in furthering developments 
here.  I agree with Tristan that any action/step should be made as friendly as 
possible to use with the popular APIs and Ollama.  I will check out your GitHub 
repo and try what you've created in the latest Hop.

As far as embeddings go, I am very interested in the semantic search side of 
creating and using embeddings.  Right now it seems like there are two separate 
tasks: creating embeddings, and then applying one as a search query to get some 
kind of top-n list of closest matches, e.g. the smallest cosine distance among 
the responses.  Since it may not just be a matter of putting a single approach 
to work, what could we do to help people have a fuller exploratory cycle so 
they can understand what is happening?  For example, the way we can use t-SNE 
to reduce the dimensionality of the embeddings to put them on a 2D or 3D plot: 
https://www.datacamp.com/tutorial/introduction-t-sne
As someone new to this in an applied sense, I had to create singular examples 
in my own repo to break the concepts down and understand how to apply them in 
a system.  Hop, generally, also breaks activities down into singular actions 
that make composable and reusable blocks.
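To make those two tasks concrete, here is a minimal sketch of how I picture them side by side.  It is only an illustration: the model name, the sample corpus, and the t-SNE parameters are all placeholders, and it uses the sentence-transformers and scikit-learn packages rather than anything Hop-specific.

```python
# Minimal sketch: embed a corpus, search it by cosine similarity, then
# project the embeddings to 2D for exploration. All data is made up.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.manifold import TSNE

corpus = [
    "Invoice totals do not match the purchase order",
    "Customer address updated after relocation",
    "Product weight recorded in kilograms",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
corpus_vecs = model.encode(corpus, normalize_embeddings=True)

# Task 1: top-n search. With normalized vectors, cosine similarity is just
# a dot product, so the closest matches are the largest dot products.
query_vec = model.encode(["order and invoice amounts disagree"],
                         normalize_embeddings=True)[0]
scores = corpus_vecs @ query_vec
for i in np.argsort(-scores)[:2]:
    print(f"{scores[i]:.3f}  {corpus[i]}")

# Task 2: exploration. Reduce the high-dimensional embeddings to 2D with
# t-SNE so they can go on a scatter plot (perplexity must be < n_samples).
xy = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(corpus_vecs)
print(xy)
```

Something like that, broken into singular pieces, feels like it would map well onto Hop's composable-action model.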

One area where I am trying to apply these in the real world is mastering data.  
Say you have rows from several systems representing a person or a product: how 
can we automatically steward those to come to a reasonable conclusion that they 
are similar enough to be called 'a person' or 'a product'?  That way we can 
generate durable master keys for them, and the data becomes joinable across 
different systems.
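Here is a rough sketch of what I mean, reusing the same kind of embedding model as above.  Everything in it is hypothetical (the records, the threshold, the key scheme); real mastering would add blocking, stewarding rules, and threshold tuning against known matches.

```python
# Hypothetical sketch: match rows from different systems by embedding
# similarity and assign a shared, durable master key to matching rows.
import uuid
from sentence_transformers import SentenceTransformer

records = [
    {"source": "crm",  "text": "Jon Smith, 42 Elm St, Austin TX"},
    {"source": "erp",  "text": "Jonathan Smith, 42 Elm Street, Austin, Texas"},
    {"source": "shop", "text": "Maria Garcia, 9 Oak Ave, Dallas TX"},
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
vecs = model.encode([r["text"] for r in records], normalize_embeddings=True)

THRESHOLD = 0.85  # assumed cutoff; would need tuning against stewarded examples
master_keys = []
for i in range(len(records)):
    key = None
    for j in range(i):
        # Normalized vectors, so the dot product is the cosine similarity.
        if float(vecs[i] @ vecs[j]) >= THRESHOLD:
            key = master_keys[j]
            break
    master_keys.append(key or str(uuid.uuid4()))  # durable master key

for r, key in zip(records, master_keys):
    print(key, r["source"], r["text"])
```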

To do things like that, but also referring back to embeddings, I am curious how 
knowledge graphs can help that along.  We know the embeddings themselves live 
in high-dimensional space, and as far as I currently know a graph may represent 
an ontology, but I am not sure at present how others encode the semantic 
meaning that exists in the shape of the graph into an embedding that models 
will understand.
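One family of answers I have come across is DeepWalk/node2vec: sample random walks over the graph and train a word2vec model on them, so the shape of the graph ends up encoded in the vectors.  A toy DeepWalk-style sketch, with an invented ontology and arbitrary parameters, just to show the idea:

```python
# Toy sketch of graph-to-embedding via random walks (DeepWalk-style).
import random
import networkx as nx
from gensim.models import Word2Vec

G = nx.Graph()
G.add_edges_from([
    ("person", "customer"), ("person", "employee"),
    ("product", "sku"), ("customer", "order"), ("order", "sku"),
])

def random_walk(g, start, length=8):
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(g.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return walk

walks = [random_walk(G, n) for n in G.nodes for _ in range(20)]

# Treat each walk as a "sentence"; nodes that occur in similar graph
# neighborhoods end up with similar vectors.
model = Word2Vec(walks, vector_size=32, window=3, min_count=1, sg=1, epochs=10)
print(model.wv.most_similar("customer", topn=3))
```

I would be curious whether something along those lines is what others are doing in practice.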

My examples are super simple, but perhaps they will inspire some ideas.  Over 
this next week I will probably start populating more and more ideas for the 
data quality approaches.  My first inclination there was to use the function 
calling abilities of the LLM to provide guarantees about the layout of the 
outputs.  That way we could bring questions into the model, get a predictable 
JSON output structure, and convert it to a tabular form Hop understands.
https://github.com/usbrandon/gptplayground
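Roughly what I have in mind, using the official openai Python client as one example (the model name, tool schema, and data quality scenario are just placeholders I made up):

```python
# Sketch: force the model's answer through a JSON schema via function
# calling, then flatten the guaranteed-layout JSON into a row.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "report_data_quality",
        "description": "Return a data quality finding for a column.",
        "parameters": {
            "type": "object",
            "properties": {
                "column": {"type": "string"},
                "issue": {"type": "string"},
                "severity": {"type": "string", "enum": ["low", "medium", "high"]},
            },
            "required": ["column", "issue", "severity"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user",
               "content": "Assess a phone_number column where 12% of values are blank."}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "report_data_quality"}},
)

# The arguments come back as a JSON string with a predictable layout,
# which is straightforward to turn into a row a Hop transform can consume.
args = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)
print([args["column"], args["issue"], args["severity"]])
```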

What kinds of things are you trying to solve using these technologies?

I will take a deeper look at your work this week and reply back more concretely 
on the plugins.  Again, thank you for putting so much thought and effort into 
these things.  We are shaping the future of that experience for the many users 
who will encounter Hop, so the thought and time are worth it.

Warmly,

Brandon

GitHub link: 
https://github.com/apache/hop/discussions/4732#discussioncomment-11717758
