Hi there, I am trying to apply and test several dimensionality reduction
methods on the 20 Newsgroups data. However, I get errors that I do not
understand from all of them except RandomizedPCA. Would you please help me
get a better understanding of the issue?
My code generally looks like this:
----
# Preprocessing
categories = ['comp.graphics', 'misc.forsale', 'talk.politics.guns']
data_set = load_files('20Newsgroup/raw', categories=categories,
                      shuffle=True, random_state=42)
X = Vectorizer(max_features=10000).fit_transform(data_set.data)
y = data_set.target
# Test various methods to reduce dimension
X_r = manifold.Isomap(n_neighbors=15, out_dim=2,
eigen_solver='arpack').fit(X).transform(X)
X_r = LDA(n_components=2).fit(X, y).transform(X)
X_r = PCA(n_components=2).fit(X, y).transform(X)
X_r = SparsePCA(n_components=2).fit(X, y).transform(X)
X_r = RandomizedPCA(n_components=2).fit(X, y).transform(X)
----
For LDA, the error is "X must be a 2D array".
For Isomap and PCA, it is "ValueError: setting an array element with a
sequence".
For SparsePCA, the error is "n_features = X.shape[1] IndexError: tuple index
out of range", which is strange because printing X.shape[1] in my own code
works without error.
I tried to resolve these errors by reading the class reference, but could
not figure them out.
According to the reference, the output from Vectorizer should be "*vectors:
array, [n_samples, n_features]*", which should be valid input for the methods
I tried.
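One possibility I have not ruled out (an assumption on my part, not something
the reference confirms) is that the vectorizer returns a scipy.sparse matrix
rather than a dense NumPy array. The sketch below uses a small hand-built
csr_matrix as a hypothetical stand-in for X; it only shows how such an object
behaves when code treats it as a plain array, and how it could be converted
explicitly.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical stand-in for the vectorizer output: a small sparse
# term-count matrix with 3 documents and 4 features.
X = sp.csr_matrix(np.array([[1, 0, 0, 2],
                            [0, 3, 0, 0],
                            [0, 0, 4, 0]], dtype=float))

# The sparse object is not a numpy.ndarray, but it reports a 2D shape itself.
print(type(X), sp.issparse(X))
print(X.shape)

# np.asarray does not densify a sparse matrix; it wraps the whole object
# in a 0-d object array, so the wrapped array has an empty shape tuple.
wrapped = np.asarray(X)
print(wrapped.ndim, wrapped.shape)

# Explicit conversion to a dense 2D array (only viable if it fits in memory).
X_dense = X.toarray()
print(type(X_dense), X_dense.shape)
```

If X does turn out to be sparse, this would be consistent with X.shape[1]
existing in my script while failing inside an estimator that first converts
its input with NumPy.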
Any help would be appreciated.