Hi,
load_svmlight_file fails for me when I load a large svmlight file.
The traceback is as follows:
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-212-299804ae137e> in <module>()
----> 1 training_data4, ytrain = load_svmlight_file('../data.txt')

/usr/local/lib/python2.7/dist-packages/sklearn/datasets/svmlight_format.pyc in load_svmlight_file(f, n_features, dtype, multilabel, zero_based, query_id)
    127     """
    128     return tuple(load_svmlight_files([f], n_features, dtype, multilabel,
--> 129                                      zero_based, query_id))
    130
    131

/usr/local/lib/python2.7/dist-packages/sklearn/datasets/svmlight_format.pyc in load_svmlight_files(files, n_features, dtype, multilabel, zero_based, query_id)
    240     """
    241     r = [_open_and_load(f, dtype, multilabel, bool(zero_based), bool(query_id))
--> 242          for f in files]
    243
    244     if (zero_based is False

/usr/local/lib/python2.7/dist-packages/sklearn/datasets/svmlight_format.pyc in _open_and_load(f, dtype, multilabel, zero_based, query_id)
    155     with closing(_gen_open(f)) as f:
    156         actual_dtype, data, ind, indptr, labels, query = \
--> 157             _load_svmlight_file(f, dtype, multilabel, zero_based, query_id)
    158
    159     # convert from array.array, give data the right dtype

/usr/local/lib/python2.7/dist-packages/sklearn/datasets/_svmlight_format.so in sklearn.datasets._svmlight_format._load_svmlight_file (sklearn/datasets/_svmlight_format.c:2431)()

OverflowError: value too large to convert to int

A sample line from my dataset is:
1 1:1682 2:53 5:15 13:12 17:14 50:31 65:49 66:16 256:33 257:56
1025:12 3073:13 4097:15 12545:31 15105:11 16385:47 16449:22
16641:18 24577:53 25601:13 26113:11 26368:51 26689:15 27649:16
33857:15 35585:14 49217:13 56385:19 58433:19 59393:23 60481:14
61505:19 65281:49 65536:15 65537:40 196609:11 262145:14 1048577:22
2097153:30 4194305:41 4194317:11 4194525:14 4206953:14 4206981:16
4207077:14 4207085:15 4207089:19 4210689:20 4259841:13 4269426:13
4325340:12 6291457:54 6356993:19 6553601:19 6619137:26 6832193:14
6849537:18 6881281:27 7077889:13 7208961:26 7274497:21 7471105:21
7536641:17 7602177:20 8667201:14 8699905:12 12599361:13 12648258:27
14429953:14 14434369:18 14477313:15 14958657:19 14968833:15
15204417:15 15264769:19 15400961:12 15482945:14 15527937:14
15745089:19 15787009:14 16711681:21 16777217:50 67108865:20
268435457:20 271800666:11 628179456:14 637468865:22 754991616:12
788587625:12 788589809:14 788590821:13 822140137:15 822143213:19
837562113:12 837829889:14 1073741825:56 1073741889:20 1073758209:47
1073758258:80 1073758308:11 1073798192:14 1077013552:12 1077014598:14
1077948208:28 1077948720:14 1077948722:62 1090535936:49 1092972866:14
1097953602:13 1107280065:19 1107286857:12 1157654661:13 1346454597:15
1610612833:49 1610612839:49 1694507009:12 1694528001:12 1744830657:12
1749041202:14 1753481217:18 1900150721:14 1912602625:19 2218803250:13
2300602469:12 2334523137:22 2341152001:20 2344651558:26 2366649178:11
3225436208:12 3230344704:26 3233873804:18 3237953906:15 3237972876:19
3435973837:12 3678994143:12 3690987757:14 3694067949:14 3695198209:18
3706191873:15 3825205381:14 3829415986:19 3832020993:15 3892314113:12
3892314353:15 3895263298:12 3895394545:13 3907780609:19 3909091073:16
3909091308:13 3909091328:13 3959423205:19 3962503397:12 3963633714:14
3975151617:14 4026531945:14 4030742578:15 4031053929:14 4041474049:14
4278190081:16 4278239488:25 4278255361:11 4278255616:13 4282478962:13
4287335206:18 4290822400:18 4292561151:12 4294901761:21 4294902016:18
4294967041:15 4294967296:54
I do not understand why this fails, given that maxint for Python
is 9223372036854775807.
Is there a workaround for this, or is it simply not possible to load
the file at all?
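
One possible explanation, which I have not confirmed in the sklearn
source: the error comes from the compiled _svmlight_format extension, so
the "int" in the message may be a C int (typically 32-bit signed, maximum
2147483647) rather than a Python int. The sketch below only checks which
feature indices in a line exceed that limit. The helper names
parse_indices and indices_over_int32 are my own and not part of sklearn,
and the 32-bit limit is an assumption.

```python
# Assumption: the compiled loader converts feature indices to a signed
# 32-bit C int. This is a guess based on the error message only.
INT32_MAX = 2**31 - 1  # 2147483647


def parse_indices(line):
    """Return the integer feature indices found in one svmlight line."""
    tokens = line.split()
    # tokens[0] is the label; the rest are "index:value" pairs.
    # An optional "qid:N" token is skipped because it is not a feature.
    return [int(tok.split(":", 1)[0])
            for tok in tokens[1:]
            if not tok.startswith("qid:")]


def indices_over_int32(line):
    """Return the feature indices that do not fit in a signed 32-bit int."""
    return [i for i in parse_indices(line) if i > INT32_MAX]


# A shortened version of the sample line above.
sample = "1 1:1682 2:53 2097153:30 1912602625:19 2218803250:13 4294967296:54"
too_big = indices_over_int32(sample)
print(too_big)
```

If this list is non-empty for the lines in the file, the indices
themselves, not the file size, would be what triggers the overflow.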
Thank you,
Regards,
Abhishek
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general