Hello Pavel and Joel,

I forked the repository and cloned it on my machine. I'm using PyCharm on a Mac, and while looking at text.py I'm getting an unresolved reference for "xrange" at line 28:

from ..externals.six.moves import xrange

PyCharm also says the file 'six.py' is too large to analyze, so I'm not sure if this error is somehow related to that. I decided to try to build the code as a sanity check, but I can't find any reliable instructions on how to do that. Naively, I opened Terminal, cd'd to the directory above the "scikit-learn" folder (where I had cloned my fork), and tried to run:

$ python3 setup.py install

which didn't work. I got this error:
ImportError: No module named 'sklearn'
Can someone point me in the right direction? And how can the code try to import sklearn if it doesn't exist yet? Note that I haven't installed the release version of scikit-learn using pip or any other tool, but I should be able to bootstrap it from the source code, right?

Here's the full error output in case it helps. Forgive me if it's a silly mistake, but I haven't found any reliable guidelines online.
  File "setup.py", line 84, in <module>
    from numpy.distutils.core import setup
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/distutils/core.py", line 26, in <module>
    from numpy.distutils.command import config, config_compiler, \
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/distutils/command/build_ext.py", line 18, in <module>
    from numpy.distutils.system_info import combine_paths
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/distutils/system_info.py", line 232, in <module>
    triplet = str(p.communicate()[0].decode().strip())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 791, in communicate
    stdout = _eintr_retry_call(self.stdout.read)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt
Basils-MacBook-Pro:sklearn basilbeirouti$ python3 setup.py install
non-existing path in '__check_build': '_check_build.c'
Appending sklearn.__check_build configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.__check_build')
Appending sklearn._build_utils configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn._build_utils')
Appending sklearn.covariance configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.covariance')
Appending sklearn.covariance/tests configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.covariance/tests')
Appending sklearn.cross_decomposition configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.cross_decomposition')
Appending sklearn.cross_decomposition/tests configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.cross_decomposition/tests')
Appending sklearn.feature_selection configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.feature_selection')
Appending sklearn.feature_selection/tests configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.feature_selection/tests')
Appending sklearn.gaussian_process configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.gaussian_process')
Appending sklearn.gaussian_process/tests configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.gaussian_process/tests')
Appending sklearn.mixture configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.mixture')
Appending sklearn.mixture/tests configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.mixture/tests')
Appending sklearn.model_selection configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.model_selection')
Appending sklearn.model_selection/tests configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.model_selection/tests')
Appending sklearn.neural_network configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.neural_network')
Appending sklearn.neural_network/tests configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.neural_network/tests')
Appending sklearn.preprocessing configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.preprocessing')
Appending sklearn.preprocessing/tests configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.preprocessing/tests')
Appending sklearn.semi_supervised configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.semi_supervised')
Appending sklearn.semi_supervised/tests configuration to sklearn
Ignoring attempt to set 'name' (from 'sklearn' to 'sklearn.semi_supervised/tests')
Warning: Assuming default configuration (./_build_utils/{setup__build_utils,setup}.py was not found)
Warning: Assuming default configuration (./covariance/{setup_covariance,setup}.py was not found)
Warning: Assuming default configuration (./covariance/tests/setup_covariance/{setup_covariance/tests,setup}.py was not found)
Warning: Assuming default configuration (./cross_decomposition/{setup_cross_decomposition,setup}.py was not found)
Warning: Assuming default configuration (./cross_decomposition/tests/setup_cross_decomposition/{setup_cross_decomposition/tests,setup}.py was not found)
Warning: Assuming default configuration (./feature_selection/{setup_feature_selection,setup}.py was not found)
Warning: Assuming default configuration (./feature_selection/tests/setup_feature_selection/{setup_feature_selection/tests,setup}.py was not found)
Warning: Assuming default configuration (./gaussian_process/{setup_gaussian_process,setup}.py was not found)
Warning: Assuming default configuration (./gaussian_process/tests/setup_gaussian_process/{setup_gaussian_process/tests,setup}.py was not found)
Warning: Assuming default configuration (./mixture/{setup_mixture,setup}.py was not found)
Warning: Assuming default configuration (./mixture/tests/setup_mixture/{setup_mixture/tests,setup}.py was not found)
Warning: Assuming default configuration (./model_selection/{setup_model_selection,setup}.py was not found)
Warning: Assuming default configuration (./model_selection/tests/setup_model_selection/{setup_model_selection/tests,setup}.py was not found)
Warning: Assuming default configuration (./neural_network/{setup_neural_network,setup}.py was not found)
Warning: Assuming default configuration (./neural_network/tests/setup_neural_network/{setup_neural_network/tests,setup}.py was not found)
Warning: Assuming default configuration (./preprocessing/{setup_preprocessing,setup}.py was not found)
Warning: Assuming default configuration (./preprocessing/tests/setup_preprocessing/{setup_preprocessing/tests,setup}.py was not found)
Warning: Assuming default configuration (./semi_supervised/{setup_semi_supervised,setup}.py was not found)
Warning: Assuming default configuration (./semi_supervised/tests/setup_semi_supervised/{setup_semi_supervised/tests,setup}.py was not found)
Traceback (most recent call last):
  File "setup.py", line 85, in <module>
    setup(**configuration(top_path='').todict())
  File "setup.py", line 44, in configuration
    config.add_subpackage('cluster')
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/distutils/misc_util.py", line 1003, in add_subpackage
    caller_level = 2)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/distutils/misc_util.py", line 972, in get_subpackage
    caller_level = caller_level + 1)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/distutils/misc_util.py", line 884, in _get_configuration_from_setup_py
    ('.py', 'U', 1))
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/imp.py", line 234, in load_module
    return load_source(name, filename, file)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 693, in _load
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 662, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "./cluster/setup.py", line 8, in <module>
    from sklearn._build_utils import get_blas_info
ImportError: No module named 'sklearn'
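In case it helps anyone diagnose this, here is a minimal, self-contained toy (my own example, not scikit-learn code) showing how this kind of ImportError can come purely from where the script is run: a sibling package like sklearn only resolves when the repository root is on sys.path, which is why running setup.py from the directory above the checkout fails.

```python
# Toy reproduction (not scikit-learn code): a package import that works
# only when its parent directory is on sys.path, mirroring how
# cluster/setup.py's "from sklearn._build_utils import ..." needs the
# repository root to be the current directory.
import os
import sys
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "mypkg"))
open(os.path.join(root, "mypkg", "__init__.py"), "w").close()

sys.path.insert(0, root)        # like running setup.py from the repo root
import mypkg                    # resolves fine
print("import ok")

# Now simulate running from the wrong directory: the parent of the
# checkout is not on sys.path, so the package cannot be found.
sys.path.remove(root)
del sys.modules["mypkg"]
try:
    import mypkg  # noqa: F811
except ImportError as exc:
    print("failed as expected:", exc)
```

So my working theory is that the cure is simply to cd into the cloned scikit-learn folder itself before building, but I'd appreciate confirmation.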
On Tue, Jun 14, 2016 at 11:41 AM, <[email protected]> wrote:
Send scikit-learn mailing list submissions to
    [email protected]
To subscribe or unsubscribe via the World Wide Web, visit
    https://mail.python.org/mailman/listinfo/scikit-learn
or, via email, send a message with subject or body 'help' to
    [email protected]
You can reach the person managing the list at
    [email protected]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of scikit-learn digest..."
Today's Topics:
1. Re: Adding BM25 relevance function (Pavel Soriano)
2. Re: The culture of commit squashing (Andreas Mueller)
3. Re: The culture of commit squashing (Tom DLT)
----------------------------------------------------------------------
Message: 1
Date: Tue, 14 Jun 2016 16:11:10 +0000
From: Pavel Soriano <[email protected]>
To: Scikit-learn user and developer mailing list <[email protected]>
Subject: Re: [scikit-learn] Adding BM25 relevance function
Message-ID: <can0wwk93r2aw9no65cgicw5hqg7-ofyvzamjqpxpegtxmsq...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hey,

Good thing that you are trying to finish this.

Well, I looked into my old notes, and the Delta tf-idf comes from the "Delta TFIDF: An Improved Feature Space for Sentiment Analysis" <http://ebiquity.umbc.edu/_file_directory_/papers/446.pdf> paper. I guess it is not very popular, and apparently it has a drawback: it does not take into account the number of times a word occurs in each document while calculating the distribution amongst classes. At least that is what I wrote in my notes...

As for the delta idf... If it helps, I can look into my old code, because I do not remember what I was talking about. I guess it has to do somehow with the paper cited before.
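From memory, the idea boils down to something like the toy sketch below. This is my reconstruction, not the paper's reference implementation: the exact smoothing and sign convention are my assumptions.

```python
import math
from collections import Counter

def delta_tfidf(doc, pos_docs, neg_docs):
    """Toy Delta TF-IDF (my reconstruction): weight each term in `doc`
    by term frequency times the difference in class-wise IDFs."""
    def idf(term, docs):
        # Document frequency only -- this is the drawback noted above:
        # how *often* a term occurs inside each class document is ignored.
        df = sum(term in d for d in docs)
        return math.log(len(docs) / (df + 1))   # +1 smoothing is my choice
    tf = Counter(doc)
    return {t: f * (idf(t, neg_docs) - idf(t, pos_docs))
            for t, f in tf.items()}

pos = ["good good film".split(), "good plot".split()]
neg = ["bad film".split(), "bad acting".split()]
weights = delta_tfidf("good film".split(), pos, neg)
```

On this toy data, a term frequent in positive documents ("good") gets a positive weight, while a class-neutral term ("film") gets a weight near zero, which matches how I remember the feature space behaving.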
Cheers,
Pavel Soriano
On Tue, Jun 14, 2016 at 5:49 PM Basil Beirouti <[email protected]> wrote:
> Hi Joel,
>
> Thanks for your response and for digging up that archived thread, it gives me a lot of clarity.
>
> I see your point about BM25, but I think in most cases where TFIDF makes sense, BM25 makes sense as well, though it could be "overkill".
>
> Consider that TFIDF does not produce normalized results either <http://scikit-learn.org/stable/auto_examples/text/document_clustering.html#example-text-document-clustering-py>; if BM25 requires dimensionality reduction (e.g. using LSA), so too would TFIDF. The term-document matrix is the same size no matter which weighting scheme is used. The only difference is that BM25 produces better results when the corpus is large enough that the term frequency in a document, and the document frequency in the corpus, can vary considerably across a broad range of values. Maybe you could even say TFIDF and BM25 are the same equation, except BM25 has a few additional hyperparameters (b and k).
>
> So is the advantage that BM25 provides for large, diverse corpora significant, or is it marginal? Perhaps you can point me to some more examples where TFIDF is used (in a supervised setting, preferably) and I can plug in BM25 in place of TFIDF and see how it compares. Here are some I found:
>
> http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
> *(supervised)*
>
> http://scikit-learn.org/stable/auto_examples/text/document_clustering.html#example-text-document-clustering-py
> *(unsupervised)*
>
> Thank you!
> Basil
>
> PS: By the way, I'm not familiar with the delta-idf transform that Pavel mentions in the archive you linked; I'll have to delve deeper into that. I agree with the response to Pavel that he should be putting it in a separate class, not adding on to the TFIDF. I think it would take me about 6-8 weeks to adapt my code to the fit-transform model and submit a pull request.
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>
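By the way, the "same equation plus hyperparameters (b and k)" point quoted above can be made concrete with a toy Okapi-style sketch. This is my own code and my own parameter choices (the usual k1/b names), not anything in scikit-learn, so treat the exact variant as an assumption:

```python
import math
from collections import Counter

def bm25_matrix(docs, k1=1.5, b=0.75):
    """Toy Okapi-style BM25 weights for a list of tokenized documents."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()
    for d in docs:
        df.update(set(d))
    # Smoothed IDF; the +1 inside the log keeps weights non-negative.
    idf = {t: math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1) for t in df}
    rows = []
    for d in docs:
        tf = Counter(d)
        row = {}
        for t, f in tf.items():
            # BM25 saturates term frequency; with b = 0 and k1 -> infinity
            # this term tends to plain f, recovering tf * idf -- the
            # "same equation plus hyperparameters" observation above.
            row[t] = idf[t] * f * (k1 + 1) / (
                f + k1 * (1 - b + b * len(d) / avgdl))
        rows.append(row)
    return rows

docs = ["the cat sat".split(), "the dog sat on the mat".split()]
weights = bm25_matrix(docs)
```

On this tiny corpus, the rare term "cat" outweighs the ubiquitous "the", as you would expect from the IDF component they share with TFIDF.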
--
Pavel SORIANO
PhD Student
ERIC Laboratory
Université de Lyon
------------------------------
Message: 2
Date: Tue, 14 Jun 2016 12:13:29 -0400
From: Andreas Mueller <[email protected]>
To: Scikit-learn user and developer mailing list <[email protected]>
Subject: Re: [scikit-learn] The culture of commit squashing
Message-ID: <[email protected]>
Content-Type: text/plain; charset="windows-1252"; Format="flowed"
I'm +1 for using the button when appropriate.

I think it should be up to the merging person to make a call whether a squash is a better logical unit than all the commits. I would set a soft limit at around 5 commits; if your PR has more than 5 separate big logical units, it's probably too big.

The button is enabled in the settings but I can't see it. Am I being stupid?
On 06/14/2016 06:58 AM, Joel Nothman wrote:
> Sounds good to me. Thank goodness someone reads the documentation!
>
> On 14 June 2016 at 19:51, Alexandre Gramfort <[email protected]> wrote:
>
> > We could stop squashing during development, and use the new Squash-and-Merge button on GitHub.
> > What do you think?
>
> +1
>
> the reason I see for squashing during dev is to avoid killing the browser when reviewing. It really rarely happens though.
>
> A
------------------------------
Message: 3
Date: Tue, 14 Jun 2016 18:40:39 +0200
From: Tom DLT <[email protected]>
To: Scikit-learn user and developer mailing list <[email protected]>
Subject: Re: [scikit-learn] The culture of commit squashing
Message-ID: <CAGKmC=sRMbwo1Pjm=ph3r6oqsmvzuzdbmjvj09yjwkk0+yq...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
@Andreas
It's a bit hidden: you need to click on "Merge pull request", then do *not* click on "Confirm merge", but on the small arrow to the right, and select
"Squash and merge".
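For anyone who prefers the command line, the button is roughly equivalent to `git merge --squash`. Here is a sketch in a throwaway repo; the exact flags and the default branch name are from memory, so treat the details as assumptions rather than a recipe:

```shell
# Rough CLI equivalent of GitHub's "Squash and merge" button,
# demonstrated in a disposable repository.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email "[email protected]" && git config user.name "demo"
git commit -q --allow-empty -m "initial"
base=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on git version
git checkout -q -b feature
for i in 1 2 3; do
  echo "$i" >> work.txt && git add work.txt && git commit -q -m "wip $i"
done
git checkout -q "$base"
git merge --squash -q feature             # stage the combined diff, no commit yet
git commit -q -m "feature as one logical unit"
git log --oneline                         # base branch: initial + one squashed commit
```

Capturing the branch name with `git rev-parse --abbrev-ref HEAD` avoids assuming whether `git init` created master or main.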
------------------------------
End of scikit-learn Digest, Vol 3, Issue 27
*******************************************