Re: LLM based data pre-processing

Holden Karau Fri, 03 Jan 2025 09:03:48 -0800

So I've been working on similar LLM pre-processing of data, and I would say
one of the questions worth answering is: do you want/need your models to be
co-located? If you're running on-prem in a GPU-rich env there are a lot of
benefits, but even with a custom model, if you're using 3rd-party inference
or even just trying to keep your GPUs warm in general, the co-location may
not be as important.


On Fri, Jan 3, 2025 at 9:01 AM Russell Jurney <[email protected]>
wrote:

> Thanks! The first link is old, here is a more recent one:
>
> 1)
> https://python.langchain.com/docs/integrations/providers/spark/#spark-sql-individual-tools
>
> Russell
>
> On Fri, Jan 3, 2025 at 8:50 AM Gurunandan <[email protected]>
> wrote:
>
>> Hi Mayur,
>> Please evaluate LangChain's Spark DataFrame Agent for your use case.
>>
>> documentation:
>> 1) https://python.langchain.com/v0.1/docs/integrations/toolkits/spark/
>> 2) https://python.langchain.com/docs/integrations/tools/spark_sql/
>>
>> regards,
>> Guru
>>
>> On Fri, Jan 3, 2025 at 6:38 PM Mayur Dattatray Bhosale <[email protected]>
>> wrote:
>> >
>> > Hi team,
>> >
>> > We are planning to use Spark for pre-processing the ML training data,
>> > given the data is 500+ TB.
>> >
>> > One of the steps in the data pre-processing requires us to use an LLM
>> > (our own deployment of the model). I wanted to understand what the right
>> > way to architect this is. These are the options that I can think of:
>> >
>> > - Split this into multiple applications at the LLM use case step. Use a
>> >   workflow manager to feed the output of application 1 to the LLM and
>> >   feed the output of the LLM to application 2.
>> > - Split this into multiple stages by writing the orchestration code for
>> >   feeding the output of the pre-LLM processing stages to an externally
>> >   hosted LLM and vice versa.
>> >
>> > I wanted to know if there is an easier way to do this within Spark, or
>> > whether there are any plans to make such functionality a first-class
>> > citizen of Spark in the future. Also, please suggest any better
>> > alternatives.
>> >
>> > Thanks,
>> > Mayur

-- 
Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
<https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her
