Justin Terry's voice is ugly on https://www.youtube.com/watch?v=QEtnJdDP1gQ. He 
cannot speak. I doubt that he knows what he is talking about either. Sorry. Bo.
 

    On Wednesday, 8 November 2017 at 18:09, J. Patrick Harrington 
<[email protected]> wrote:
 

 Hi Skip,
    ML (Machine Learning) is a really hot topic, and we all need to know
something about it since it will increasingly affect our lives.
    I've been sitting in on a course here at U of Maryland called
"Machine Learning for Physicists". I can't follow it at depth, but it
is an introduction to the terminology & state of the art. The lecturer
explains why Python is the language of choice in this area. Basically
because there are open libraries -- here is a youtube video of the
lecture by Justin Terry on why Python:
                  https://www.youtube.com/watch?v=QEtnJdDP1gQ
and there are further lectures you can find. Here is the course site
    https://sites.google.com/view/umdphysicsml/home
A book introducing these concepts/tools I've started to look at is
"Hands-On Machine Learning with Scikit-Learn & TensorFlow" by Geron.
    The really disturbing thing about this field is (a) it is producing
really impressive results, but (b) you can't really analyze what a neural
net is doing.
    I'm not at the point where I can even think about how some of these
techniques might be implemented in J. I'm glad you have introduced the
topic and have given some links to explore.

    Patrick
    ("I, for one, welcome our new ML overlords")


On Wed, 8 Nov 2017, 'Skip Cave' via Programming wrote:
> Natural Language Processing is one of the hottest fields in
> programming today. Recent machine learning and neural network advances have
> made significant improvements in all aspects of NLP. Speech Recognition,
> Speech Synthesis, Knowledge Extraction, and Natural Language Understanding
> have all improved dramatically within just the last few years.
>
> Conversational AI devices like Amazon's Echo (Alexa) and Google Home are
> showing up in homes everywhere. Conversational software applications such
> as Google's Assistant (Android), Microsoft's Cortana (Windows), and Apple's
> Siri (iOS) are on every phone and PC.
>
> There are lots of open-source NLP toolkits available to help one build
> these conversational apps. They are written in various languages:
>
>  - Natural Language Toolkit (Python) - http://www.nltk.org/ and
>    https://github.com/nltk
>  - The Stanford NLP Group (Java) - https://nlp.stanford.edu/software/ and
>    https://stanfordnlp.github.io/CoreNLP/
>  - Apache OpenNLP (Java) - http://opennlp.apache.org/
>  - CRAN NLP (R) - https://cran.r-project.org/web/packages/NLP/index.html
>
>
> Two of the newest algorithms used for extracting meaning from text are
> word2vec & doc2vec (doc2vec is also called Paragraph Vectors). Both of
> these algorithms use a technique called "word embeddings" to encode
> individual words. This is particularly interesting because the algorithms
> are able to extract significant information from unstructured text by
> simply analyzing word sequences and the probabilistic relationships between
> neighboring words in any text.
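
A side note (not from the thread): the "simply analyzing word sequences" step can be made concrete. Below is a minimal, hypothetical sketch in plain Python (the function name is invented) of how skip-gram-style training pairs are derived from raw text -- a window slides over the token sequence and emits (center word, neighbor word) pairs, which are the only input the model ever sees:

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs from a token list.

    For each position, every other word within `window` positions
    becomes a context word -- this is all the 'structure' word2vec uses.
    """
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield (center, tokens[j])

text = "the quick brown fox jumps over the lazy dog".split()
pairs = list(skipgram_pairs(text, window=1))
print(pairs[:4])
# → [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick')]
```

Since these pairs are the model's entire view of the corpus, neighborhood statistics really are all the "meaning" the algorithm can extract.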
>
> NLP processes are by nature highly parallel, array-oriented processes,
> dealing with strings and arrays of words. Word2vec and doc2vec are typical
> in this regard. These algorithms encode words (word2vec) or sentences
> (doc2vec) as points in a multi-dimensional space (usually 100-200
> dimensions), where training causes similar words and concepts to
> gravitate into clusters in that space.
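
To illustrate what "gravitate into clusters" means in practice: closeness in the embedding space is usually measured by cosine similarity between word vectors. A toy sketch follows -- the 3-d vectors are invented purely for illustration (real models use the 100-200 dimensions mentioned above):

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product of u and v divided by their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Invented toy vectors: 'king' and 'queen' point in similar directions,
# 'apple' does not.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "apple": [0.1, 0.2, 0.9],
}
print(cosine(vecs["king"], vecs["queen"]))  # close to 1.0 (similar)
print(cosine(vecs["king"], vecs["apple"]))  # much smaller (dissimilar)
```

"Clusters" are simply groups of words whose pairwise cosine similarity is high.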
>
> Here's an overview of how Word2vec extracts meaning from text:
> The amazing power of word vectors | the morning paper
> <https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/>
>
> Python seems to be the most popular language for coding these algorithms,
> though it is not particularly noted for its array-handling properties. It
> would seem that array-oriented languages such as J would be better suited
> to implementing word2vec and doc2vec.
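
To make the array-orientation claim concrete, here is a hedged sketch (NumPy; all names and sizes are invented, not from any of the linked implementations) of the inner loop of skip-gram word2vec with negative sampling. One training step reduces to a handful of matrix-vector operations -- exactly the kind of code an array language like J expresses naturally:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 8                      # toy vocabulary size, embedding dimension
W = rng.normal(0, 0.1, (V, d))    # "input" word vectors (one row per word)
C = rng.normal(0, 0.1, (V, d))    # "output" (context) word vectors

def train_pair(center, context, negatives, lr=0.05):
    """One SGD step: pull (center, context) together, push negatives apart.

    Duplicate negative indices would only be updated once here; a sketch,
    not a tuned implementation.
    """
    idx = np.concatenate(([context], negatives))    # 1 positive + k negatives
    labels = np.zeros(len(idx))
    labels[0] = 1.0
    w = W[center].copy()                            # cache pre-update vector
    scores = 1.0 / (1.0 + np.exp(-C[idx] @ w))      # sigmoid of dot products
    grad = scores - labels                          # error per sampled word
    W[center] -= lr * grad @ C[idx]                 # update center vector
    C[idx] -= lr * np.outer(grad, w)                # update sampled vectors

train_pair(center=3, context=7, negatives=rng.integers(0, V, 5))
```

The whole step is two matrix products and an outer product over small arrays; whether J's implicit rank machinery makes this shorter or clearer than the NumPy version is exactly Skip's question.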
>
> Here are implementations of both algorithms in Python:
>
>  - Word2vec (Python) - https://radimrehurek.com/gensim/models/word2vec.html
>    and https://github.com/danielfrg/word2vec
>  - Doc2Vec (Paragraph Vectors) (Python) - https://github.com/jhlau/doc2vec
>
>
> How hard would it be to implement these two algorithms in J? I don't know
> Python, so I can't judge the complexity.
>
> Skip
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm