I don't mind building something, but I don't know where to start.
What are some keywords to look for, or some articles to start from?
I'm asking here exactly because neither I nor the two data scientists who
now ostensibly work for me seem to be able to figure out where to start on
it.
(Obviously
Well, it *is* a "thing". We're doing something very similar on our project,
classifying patient types. It's just that there's no standard/generic/singular way to do
it. I get the feeling you're looking for some sort of black box process you can blindly
apply. And that's not a thing. But
Yeah, there are two different efforts I'm trying to play with
simultaneously in that area... in addition to the 4 or 5 efforts in
unrelated areas
We ARE trying to do a relatively clean attrition-prediction model, and that
will likely be something like what you were suggesting at the end.
One tangential solution I've seen work well enough in synthetic health data is to treat
the longitudinal data as a sequence in the same way the LLMs treat text. Rather than
focus on the 2nd problem EricC mentioned (clustering based on *similarity*), focus more
on the 1st ("around 10 different
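A minimal sketch of what "treat the longitudinal data as a sequence the way LLMs treat text" could look like: encode each yearly event as a token, then model next-event transitions. Bigram counts stand in here for a real sequence model, and the event names and careers are invented illustrative data, not anything from an actual dataset.

```python
# Sketch: careers as token sequences, with simple bigram transition counts
# as a stand-in for a proper sequence model. All data below is made up.
from collections import Counter, defaultdict

careers = [
    ["hired", "promotion", "left", "rehired"],
    ["hired", "promotion", "promotion", "left"],
    ["hired", "left"],
]

# Build an event vocabulary, just as a tokenizer would for text.
vocab = sorted({event for career in careers for event in career})

# Count bigram transitions: how often each event follows another.
transitions = defaultdict(Counter)
for career in careers:
    for cur, nxt in zip(career, career[1:]):
        transitions[cur][nxt] += 1

def most_likely_next(event):
    """Return the most frequent follow-up event seen in the data."""
    return transitions[event].most_common(1)[0][0]
```

Once careers are tokenized like this, anything built for text sequences — from Markov chains up to transformer models — can in principle be pointed at them.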
Interesting problem.
Eric, as you said earlier, K-means requires a way to measure the distance
between objects -- so that those with smaller distances can be grouped
together. A problem is that there are a number of features, which may not
be correlated. For example, there is an income
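One concrete version of the scale problem raised above: without normalization, a feature like income (tens of thousands) swamps a feature like years of tenure (single digits) in any Euclidean distance, so K-means effectively clusters on income alone. A hedged sketch with made-up numbers, standardizing each feature before measuring distance:

```python
# Sketch: z-score each feature so income and tenure contribute comparably
# to the distance K-means minimizes. The people below are invented.
import math

people = [
    {"income": 40000, "tenure": 2},
    {"income": 42000, "tenure": 9},
    {"income": 90000, "tenure": 3},
]

def zscores(values):
    """Standardize a feature to mean 0, stdev 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

incomes = zscores([p["income"] for p in people])
tenures = zscores([p["tenure"] for p in people])

def dist(i, j):
    """Euclidean distance between two people in standardized feature space."""
    return math.hypot(incomes[i] - incomes[j], tenures[i] - tenures[j])
```

This only addresses scale, not correlation between features; handling correlated features would take something like a Mahalanobis distance or PCA first.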
To my uneducated eye, this seemed like one of Jon's problems.

Sent from my Dumb Phone

On Jan 7, 2023, at 6:23 AM, Frank Wimberly wrote:

This answer seems reasonable to me. I worked on Project Talent during 1967
which had some similar goals and data.
From what I can tell, "one-hot encoding" is just another term for dummy
coding the data, i.e., making it a bunch of 1/0 columns. H2O seems more
promising, but seems to require a backbone of quantitative data that you
can substitute (based on something akin to a regression) for the
categorical
One way to handle categorical input data for machine learning is to convert
it using one-hot encoding - it's not difficult but a bit cumbersome.
Fortunately there are other options. H2O is a machine learning library
available in both Python and R that does this conversion "under the hood".
I
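To make the one-hot/dummy-coding equivalence concrete, here is a minimal library-free illustration: each categorical value becomes its own 1/0 indicator column. (This is the manual version of the conversion H2O reportedly does "under the hood"; the column values below are invented examples, not anyone's actual data.)

```python
# Sketch: one-hot (dummy) coding by hand — one 1/0 column per category.
def one_hot(values):
    """Map a list of categorical values to rows of 1/0 indicator columns."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Example: a single categorical column of career events.
cols, rows = one_hot(["hired", "promoted", "hired", "left"])
```

The "cumbersome" part the post mentions is doing this across many columns and keeping the category-to-column mapping consistent between training and new data, which is what libraries automate.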
That's somewhat helpful. Having looked up several of these algorithms (I'm
still checking a few), it seems like they all take as input some sort of
distance measure between the items (analogous to the distance between
their coordinates on a cartesian graph), and then do some sort of
distance-minimization
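The pattern described above — plug in a pairwise distance, then minimize distances within groups — can be sketched in a few lines. This is just the assignment step of a k-medoids-style grouping on invented 2-D points, not any particular library's algorithm:

```python
# Sketch: the "distance in, grouping out" pattern. Each point joins the
# group of its nearest medoid, minimizing within-group distance.
def assign(points, medoids, dist):
    """Assign each point to its nearest medoid under the given distance."""
    groups = {m: [] for m in medoids}
    for p in points:
        nearest = min(medoids, key=lambda m: dist(p, m))
        groups[nearest].append(p)
    return groups

points = [(0, 0), (1, 0), (10, 10), (9, 9)]
manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
groups = assign(points, [(0, 0), (10, 10)], manhattan)
```

The point is that `dist` is pluggable: swap in Euclidean, edit distance on event sequences, or anything else, and the same minimization machinery applies — which is why defining a sensible distance is the hard part for this kind of data.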
This answer seems reasonable to me. I worked on Project Talent during 1967
which had some similar goals and data. See
https://en.m.wikipedia.org/wiki/Project_Talent
Our data was for thousands of high school students and our software was all
written in Fortran.
---
Frank C. Wimberly
140 Calle
I asked https://chat.openai.com/chat and here is the conversation:
*Pieter Steenekamp*
can you suggest a solution for the following problem "I'm hoping someone
here could help out. Let's imagine I had some data where each row was a
person's career. We could list major events every year. For
Greetings all,
I'm hoping someone here could help out. Let's imagine I had some data where
each row was a person's career. We could list major events every year.
For example: 2004 they were hired, 2007 they get a promotion, 2010 they
leave for a different company, 2012 they come back at a
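One possible shape for the data as described — one row per person, major events keyed by year — might look like the following sketch. The person labels are placeholders; only the 2004/2007/2010/2012 events come from the post itself.

```python
# Sketch: one record per person, events keyed by year, as described above.
careers = {
    "person_1": {2004: "hired", 2007: "promotion",
                 2010: "left", 2012: "came back"},
    "person_2": {2005: "hired", 2009: "left"},  # invented second row
}

def events_in_order(person):
    """Return a person's events as a chronological sequence."""
    return [event for year, event in sorted(careers[person].items())]
```

Getting the data into an explicit sequence like this is the usual first step before any of the clustering or sequence-modeling approaches discussed in the thread can be tried.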