Hi,
The idea was to do simple text classification. I managed to get a quick &
dirty query that fulfills my needs:

text_file.csv:
A ;Get faster insights without the overhead
B ;Leverage your existing SQL skillsets
Query:
SELECT columns[0] id, FLATTEN(CONVERT_FROM('["' || REGEXP_REPLACE(columns[1], ' ', '","') || '"]', 'JSON')) text FROM dfs.tmp.`text_file.csv`
Result:
+-----+------------+
| id  |    text    |
+-----+------------+
| A   | Get        |
| A   | faster     |
| A   | insights   |
| A   | without    |
| A   | the        |
| A   | overhead   |
| B   | Leverage   |
| B   | your       |
| B   | existing   |
| B   | SQL        |
| B   | skillsets  |
+-----+------------+
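For anyone without a Drill instance handy, here is a minimal Python sketch of what the query does: read the `;`-delimited file, split each text field on spaces (the same whitespace tokenization the REGEXP_REPLACE trick performs), and emit one (id, word) row per token, as FLATTEN does. The file content is inlined as a string for the example.

```python
import csv
import io

# Inlined sample matching text_file.csv from the message above.
raw = (
    "A ;Get faster insights without the overhead\n"
    "B ;Leverage your existing SQL skillsets\n"
)

rows = []
for record in csv.reader(io.StringIO(raw), delimiter=";"):
    row_id = record[0].strip()
    # Equivalent of REGEXP_REPLACE + CONVERT_FROM + FLATTEN:
    # split the text on single spaces and emit one row per word.
    for word in record[1].split(" "):
        rows.append((row_id, word))

for row in rows:
    print(row)
```

This yields the same eleven (id, word) pairs as the Drill result table, ready for word-count style analysis.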
Boris 


    On Thursday, 19 November 2015 at 4:37 AM, Ted Dunning <[email protected]> wrote:

 Boris,

I think I know what you mean by text vectorization, but I am uncertain how
you are trying to do it with SQL.

Can you give a sample query?

What result did you expect?

On Wed, Nov 18, 2015 at 12:43 AM, Boris Chmiel <
[email protected]> wrote:

> Hello users,
> I'm trying to vectorize a text field of a CSV file and planning to use the
> FLATTEN function to make an analysis of used words. I tried a combination
> of regex / CONVERT_FROM to transform the text into a Parquet array without
> success. Has anyone already achieved that?
> Thanks,
> Boris


  
