Hi
The idea was to do a simple text classification. I managed to get a quick &
dirty query that fulfills my needs :
text_file.csv :
A ;Get faster insights without the overhead
B ;Leverage your existing SQL skillsets
Query :
SELECT columns[0] id,
       FLATTEN(CONVERT_FROM('["' || REGEXP_REPLACE(columns[1], ' ', '","') || '"]', 'JSON')) text
FROM dfs.tmp.`text_file.csv`
Result :
+-----+------------+
| id  | text       |
+-----+------------+
| A   | Get        |
| A   | faster     |
| A   | insights   |
| A   | without    |
| A   | the        |
| A   | overhead   |
| B   | Leverage   |
| B   | your       |
| B   | existing   |
| B   | SQL        |
| B   | skillsets  |
+-----+------------+
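The trick the query relies on can be sketched outside Drill as well. Here is a hedged Python sketch of the same transformation (the `vectorize` helper and sample `rows` dict are illustrative, not part of the original thread): each space is replaced with `","`, the result is wrapped in `["` and `"]` to form a JSON array string, the array is parsed, and the words are flattened against their row id.

```python
import json

def vectorize(text):
    # Mimic the Drill expression: REGEXP_REPLACE(text, ' ', '","')
    # wrapped in '["' ... '"]', then parsed as JSON by CONVERT_FROM.
    as_json = '["' + text.replace(' ', '","') + '"]'
    return json.loads(as_json)

# Sample rows matching text_file.csv above.
rows = {
    "A": "Get faster insights without the overhead",
    "B": "Leverage your existing SQL skillsets",
}

# FLATTEN: one (id, word) pair per word.
flattened = [(row_id, word)
             for row_id, text in rows.items()
             for word in vectorize(text)]
```

Note that, like the SQL version, this breaks if the text itself contains `"` or other characters that are significant in JSON; it is a quick & dirty tokenizer, not a robust one.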
Boris
On Thursday, November 19, 2015 at 4:37 AM, Ted Dunning <[email protected]> wrote
:
Boris,
I think I know what you mean by text vectorization, but I am uncertain how
you are trying to do it with SQL.
Can you give a sample query?
What result did you expect?
On Wed, Nov 18, 2015 at 12:43 AM, Boris Chmiel <
[email protected]> wrote:
> hello users,
> I'm trying to vectorize a text field of a CSV file and planning to use the
> FLATTEN function to make an analysis of used words. I tried a combination
> of regex / CONVERT_FROM to transform the text into a Parquet array, without
> success. Has anyone already achieved that?
> thx
> Boris