Skip,

In case you are interested in a very small part
of NLP implemented in J, you may take a look at my latest
article in the Journal of J:

http://www.journalofj.com/index.php/vol-5-no-1-august-2017

It's about word vectorization, and an implementation of
the term-frequency inverse-document-frequency (TF-IDF) model
to calculate the cosine similarity between two documents.
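
As a small taste of what the paper covers, here is a minimal J sketch
of TF-IDF weighting and cosine similarity. The verb names are my own
for this note, not the ones used in the article:

```j
NB. Minimal sketch of TF-IDF weighting and cosine similarity.
NB. M is a documents-by-terms matrix of raw word counts.
tf    =: % +/"1                   NB. counts scaled by row (document) totals
idf   =: [: ^. # % [: +/ 0 < ]   NB. log of (ndocs % docs containing each term)
tfidf =: tf *"1 idf               NB. weight every document row by the idf vector
cos   =: +/@:* % %:@:(+/@:*:@[ * +/@:*:@])   NB. cosine of two numeric vectors

M =: 3 4 $ 2 1 0 1  0 3 1 0  1 0 2 2   NB. toy corpus: 3 docs, 4 terms
(0 { tfidf M) cos (2 { tfidf M)        NB. similarity of documents 0 and 2
```

Note that idf as written assumes every term occurs in at least one
document; a real implementation would guard against empty columns.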

Best,

Martin


> Date: Wed, 8 Nov 2017 02:41:05 -0600
> From: "'Skip Cave' via Programming" <[email protected]>
> To: "[email protected]" <[email protected]>
> Subject: [Jprogramming] Natural Language Processing
> Message-ID:
>   <caj8lg_fs1jfrnlr70nab1cq5c3u6yaf6iwervcxzuxyuhhs...@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> Natural Language Processing is one of the hottest fields in
> programming today. Recent machine learning and neural network advances have
> made significant improvements in all aspects of NLP. Speech Recognition,
> Speech Synthesis, Knowledge Extraction, and Natural Language Understanding
> have all improved dramatically, just within the last few years.
> 
> Conversational AI devices like Amazon's Echo (Alexa) and Google Home are
> showing up in homes everywhere. Conversational software applications such
> as Google's Assistant (Android), Microsoft's Cortana (Windows), and Apple's
> Siri (iOS) are on every phone and PC.
> 
> There are lots of open-source NLP toolkits available to help one build
> these conversational apps. They are written in various languages:
> 
> - Natural Language Toolkit (Python) - http://www.nltk.org/ and
>   https://github.com/nltk
> - The Stanford NLP Group (Java) - https://nlp.stanford.edu/software/ and
>   https://stanfordnlp.github.io/CoreNLP/
> - Apache OpenNLP (Java) - http://opennlp.apache.org/
> - CRAN NLP (R) - https://cran.r-project.org/web/packages/NLP/index.html
> 
> 
> Two of the newest algorithms used for extracting meaning from text are
> word2vec & doc2vec (doc2vec is also called Paragraph Vectors). Both of
> these algorithms use a technique called "word embeddings" to encode
> individual words. This is particularly interesting because the algorithms
> are able to extract significant information from unstructured text by
> simply analyzing word sequences and the probabilistic relationships between
> neighboring words in any text.
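
The neighboring-word statistics described here fall out naturally in an
array language. A hypothetical J sketch (the verb name is mine), which
collects skip-gram style (word, context) pairs with a window of one word
on either side of each word index:

```j
NB. Hypothetical sketch: skip-gram style (word, context) pairs,
NB. window of one word on either side, from a list of word indices.
pairs =: (2 ]\ ]) , [: |."1 2 ]\ ]

pairs 0 1 2 1 3   NB. every ordered pair of adjacent word indices
```

A training pass over such pairs is what word2vec uses to learn its
embeddings; widening the window is a matter of changing the infix size.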
> 
> NLP processes are by nature highly parallel, array-oriented
> processes, dealing with strings and arrays of words. Word2vec and
> doc2vec are typical in this regard. Both of these algorithms encode
> words (word2vec) and sentences (doc2vec) in a multi-dimensional space
> (usually 100-200 dimensions), where machine learning techniques then
> cause similar words and concepts to gravitate into clusters.
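
To illustrate what those clusters buy you, here is a hypothetical J
sketch (again my own names) that ranks the rows of an embedding matrix
by cosine similarity to a query vector, i.e. finds a word's nearest
neighbours in the embedding space:

```j
NB. Hypothetical sketch: nearest neighbours in an embedding matrix E,
NB. ranked by cosine similarity to a query vector.
cos  =: +/@:* % %:@:(+/@:*:@[ * +/@:*:@])   NB. cosine of two vectors
near =: [: \: cos"1    NB. x near E: row indices of E, most similar first

E =: 4 3 $ 1 0 0  0.9 0.1 0  0 1 0  0 0 1   NB. toy 3-dimensional embeddings
(0 { E) near E                              NB. rows closest to row 0 first
```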
> 
> Here's an overview of how Word2vec extracts meaning from text:
> The amazing power of word vectors | the morning paper
> <https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/>
> 
> Python seems to be the most popular language for coding these
> algorithms, though it is not particularly noted for its array-handling
> properties. It would seem that array-oriented languages such as J
> would be better suited to implementing word2vec and doc2vec.
> 
> Here are implementations of both algorithms in Python:
> 
> - Word2vec (Python) - https://radimrehurek.com/gensim/models/word2vec.html
>   and https://github.com/danielfrg/word2vec
> - Doc2vec (Paragraph Vectors) (Python) - https://github.com/jhlau/doc2vec
> 
> 
> How hard would it be to implement these two algorithms in J? I don't know
> Python, so I can't judge the complexity.
> 
> Skip

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
