GitHub user vdwals added a comment to the discussion: Further LLM Support
Hi all,

First of all, thank you for your detailed and expressive response.

As background to my plugin: we had an actual request from a customer that I wanted to address using Hop to demonstrate its benefits. The request was to identify ingredients in recipes and check for allergies, among other things. Since the same ingredient can have multiple names, and the free sources available compile them in various languages, the most practical solution was to use semantic search with LangChain and Ollama as a local model, primarily due to GDPR concerns, as usual. To achieve this, I started by cloning the Stream Lookup transform and tackled the task holistically.

Once the initial implementation was successful, I began exploring additional scenarios where LLM features could benefit my customers. These included:

* Similarity search
* Structured data extraction from text (e.g., table extraction): I had to scrape several websites and extract items such as service names and ports from different pages. I discovered that LLMs are great at this.
* Classification: classifying texts based on a predefined set of classes from a database, another stream, or constants.

With these use cases in mind, and after satisfying my customer's requirements, I began restructuring the plugin to make it more reusable and flexible. This included extracting metadata into specific, expandable types. However, I encountered challenges, particularly since there was no native support for vector stores. Integrating Neo4j as a vector store took more effort than anticipated, as I wanted to avoid duplicating metadata and implementation. Additionally, making the metadata types easily extensible required considerable effort.

I was also thinking about separating embedding creation, storage, and retrieval to better handle data scaling and support asynchronous processing for embedding and lookups. This is an area I’d love to focus on further.
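
To make the idea of separating embedding creation, storage, and retrieval a bit more concrete, here is a minimal Python sketch. It is not the plugin's code and does not use LangChain, Ollama, or Neo4j: `embed`, `InMemoryVectorStore`, `index_aliases`, and `lookup` are hypothetical names, and the character-trigram "embedding" is only a toy stand-in that captures spelling similarity, whereas a real embedding model would also match names across languages. The alias table is made-up illustration data.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Embedding creation (toy stand-in): counts of character trigrams."""
    padded = f"  {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse trigram-count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class InMemoryVectorStore:
    """Storage: keeps (vector, payload) pairs; a real store could replace it."""
    entries: list = field(default_factory=list)

    def add(self, vector: Counter, payload: str) -> None:
        self.entries.append((vector, payload))

    def nearest(self, vector: Counter, k: int = 1) -> list:
        scored = [(cosine(vector, v), payload) for v, payload in self.entries]
        return sorted(scored, key=lambda hit: hit[0], reverse=True)[:k]


def index_aliases(store: InMemoryVectorStore, aliases: dict) -> None:
    """Indexing step: embed every known alias once and store its canonical name."""
    for alias, canonical in aliases.items():
        store.add(embed(alias), canonical)


def lookup(store: InMemoryVectorStore, query: str, min_score: float = 0.5):
    """Retrieval step: return the canonical name of the closest alias, or None."""
    hits = store.nearest(embed(query), k=1)
    if hits and hits[0][0] >= min_score:
        return hits[0][1]
    return None


# Made-up alias table: several names (and languages) per canonical ingredient.
aliases = {
    "peanut": "peanut", "groundnut": "peanut", "Erdnuss": "peanut",
    "hazelnut": "hazelnut", "Haselnuss": "hazelnut",
    "milk": "milk", "Milch": "milk",
}
store = InMemoryVectorStore()
index_aliases(store, aliases)

for ingredient in ["peanuts", "hazelnuts", "quinoa"]:
    print(ingredient, "->", lookup(store, ingredient))
```

Because indexing and lookup only share the store, the indexing step could run once (or asynchronously) over a large alias source, while lookups run per row in a pipeline. Swapping the toy `embed` for a real model and the in-memory list for a vector database would not change the shape of `lookup`.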
GitHub link: https://github.com/apache/hop/discussions/4732#discussioncomment-11720859
