GitHub user vdwals added a comment to the discussion: Further LLM Support

Hi all,

First of all, thank you for your detailed and thoughtful response.

As background to my plugin: we had a real request from a customer that I wanted
to address with Hop to demonstrate its benefits. Among other things, the request
was to identify ingredients in recipes and check them for allergens. Since the
same ingredient can have multiple names, and the freely available sources list
them in various languages, the most practical solution was semantic search with
LangChain and a local model served by Ollama, primarily due to GDPR concerns, as
usual.
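
To illustrate why semantic search handles synonyms and translations where an
exact-match lookup fails, here is a minimal Python sketch of a nearest-neighbour
lookup by cosine similarity. It is not the plugin's code: the three-dimensional
vectors are hypothetical toy values standing in for real embeddings (which a
model served by Ollama would produce), and `semantic_lookup` and its `threshold`
parameter are illustrative names chosen here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def semantic_lookup(query_vec, catalog, threshold=0.8):
    """Return (name, score) of the closest catalog entry, or None if below threshold."""
    best_name, best_score = None, -1.0
    for name, vec in catalog.items():
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None  # nothing is semantically close enough to count as a match
    return best_name, best_score

# Hypothetical toy embeddings; real ones would come from an embedding model.
catalog = {
    "peanut": [0.9, 0.1, 0.0],
    "wheat flour": [0.1, 0.9, 0.1],
    "milk": [0.0, 0.2, 0.95],
}
query = [0.88, 0.15, 0.02]  # pretend embedding of "Erdnuss" (German for peanut)
print(semantic_lookup(query, catalog))  # closest entry is "peanut"
```

The threshold is what keeps unrelated ingredients from being forced onto the
nearest catalog entry; its value would have to be tuned for the actual model.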

To achieve this, I started by cloning the Stream Lookup transform and tackled
the task as a whole. Once the initial implementation worked, I began exploring
additional scenarios where LLM features could benefit my customers. These
included:

* Similarity search
* Structured data extraction from text (e.g., table extraction): here I had to
scrape several websites and extract details such as service names and ports
from different pages. I found that LLMs are great at this.
* Classification: classifying texts against a predefined set of classes taken
from a database, another stream, or constants.
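
For the classification case, one common pattern is to list the allowed classes
in the prompt and then map the model's free-text answer back onto that set, so
that nothing outside the predefined classes leaks downstream. The Python sketch
below assumes that approach; it is not necessarily how the plugin does it.
`build_prompt`, `classify`, and `fake_llm` are hypothetical names, and
`fake_llm` is a stand-in for a real call to a local model.

```python
from typing import Callable, Optional, Sequence

def build_prompt(text: str, classes: Sequence[str]) -> str:
    """Build a prompt asking the model to answer with exactly one allowed label."""
    options = "\n".join(f"- {c}" for c in classes)
    return (
        "Classify the following text into exactly one of these classes:\n"
        f"{options}\n"
        "Answer with the class name only.\n\n"
        f"Text: {text}"
    )

def classify(text: str, classes: Sequence[str],
             llm: Callable[[str], str]) -> Optional[str]:
    """Ask the model, then map its answer back onto the predefined classes."""
    answer = llm(build_prompt(text, classes)).strip().strip(".").lower()
    for c in classes:
        if c.lower() == answer:
            return c
    return None  # the model answered outside the predefined classes

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a local model; a real one would be called here."""
    return "Dairy." if "cheese" in prompt.lower() else "unsure"

classes = ["Dairy", "Nuts", "Gluten"]  # could equally come from a database or stream
print(classify("Grated cheese on top", classes, fake_llm))
print(classify("Fresh basil", classes, fake_llm))
```

Returning `None` for an out-of-set answer leaves the decision (reject, retry, or
route to manual review) to the surrounding pipeline.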

With these use cases in mind, and after meeting my customer's requirements, I
began restructuring the plugin to make it more reusable and flexible. This
included extracting metadata into specific, extensible types. However, I
encountered challenges, particularly because vector stores lacked native
support. Integrating Neo4j as a vector store took more effort than anticipated,
since I wanted to avoid duplicating metadata and implementation code.
Additionally, making the metadata types easily extensible required considerable
effort.

I have also been thinking about separating embedding creation, storage, and
retrieval, to handle larger data volumes better and to support asynchronous
processing for embeddings and lookups. This is an area I'd love to focus on
further.
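
A rough Python sketch of what that separation could look like, under several
assumptions: `embed` is a hypothetical letter-frequency stand-in for a real
embedding model, `InMemoryVectorStore` is a hypothetical placeholder for a store
such as Neo4j, and asynchronous embedding is done with an `asyncio.Queue` plus
`asyncio.to_thread` (Python 3.9+) to offload the blocking model call. Each stage
only talks to the next through a narrow interface, so it could be scaled or
replaced independently.

```python
import asyncio
import math
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: letter counts over a-z."""
    counts = Counter(c for c in text.lower() if "a" <= c <= "z")
    return [counts.get(chr(ord("a") + i), 0) for i in range(26)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

class InMemoryVectorStore:
    """Storage stage: keeps (key, vector) pairs; a real setup might use Neo4j."""
    def __init__(self):
        self._rows = {}

    def __len__(self):
        return len(self._rows)

    def upsert(self, key, vector):
        self._rows[key] = vector

    def search(self, vector, k=1):
        """Retrieval stage: return the keys of the k most similar vectors."""
        scored = sorted(((cosine(vector, v), key) for key, v in self._rows.items()),
                        reverse=True)
        return [key for _, key in scored[:k]]

async def embedding_worker(inbox, store):
    """Embedding stage: consume texts from the queue and write vectors to the store."""
    while True:
        key, text = await inbox.get()
        try:
            vector = await asyncio.to_thread(embed, text)  # offload blocking call
            store.upsert(key, vector)
        finally:
            inbox.task_done()

async def index(texts, store, workers=4):
    """Feed all texts through a pool of concurrent embedding workers."""
    inbox = asyncio.Queue()
    for item in texts.items():
        inbox.put_nowait(item)
    tasks = [asyncio.create_task(embedding_worker(inbox, store)) for _ in range(workers)]
    await inbox.join()  # wait until every queued text has been embedded and stored
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)

store = InMemoryVectorStore()
asyncio.run(index({"row-1": "peanut", "row-2": "milk", "row-3": "wheat flour"}, store))
print(store.search(embed("peanuts")))  # ['row-1']
```

Because indexing and lookup share only the store, a batch job could fill the
store ahead of time while a separate pipeline performs lookups against it.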

GitHub link: 
https://github.com/apache/hop/discussions/4732#discussioncomment-11720859
