Hi,

the load_svmlight_file function fails for me when I load a large svmlight file.
The traceback is:

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-212-299804ae137e> in <module>()
----> 1 training_data4, ytrain = load_svmlight_file('../data.txt')

/usr/local/lib/python2.7/dist-packages/sklearn/datasets/svmlight_format.pyc in load_svmlight_file(f, n_features, dtype, multilabel, zero_based, query_id)
    127     """
    128     return tuple(load_svmlight_files([f], n_features, dtype, multilabel,
--> 129                                      zero_based, query_id))
    130 
    131 

/usr/local/lib/python2.7/dist-packages/sklearn/datasets/svmlight_format.pyc in load_svmlight_files(files, n_features, dtype, multilabel, zero_based, query_id)
    240     """
    241     r = [_open_and_load(f, dtype, multilabel, bool(zero_based), bool(query_id))
--> 242          for f in files]
    243 
    244     if (zero_based is False

/usr/local/lib/python2.7/dist-packages/sklearn/datasets/svmlight_format.pyc in _open_and_load(f, dtype, multilabel, zero_based, query_id)
    155     with closing(_gen_open(f)) as f:
    156         actual_dtype, data, ind, indptr, labels, query = \
--> 157             _load_svmlight_file(f, dtype, multilabel, zero_based, query_id)
    158 
    159     # convert from array.array, give data the right dtype

/usr/local/lib/python2.7/dist-packages/sklearn/datasets/_svmlight_format.so in sklearn.datasets._svmlight_format._load_svmlight_file (sklearn/datasets/_svmlight_format.c:2431)()

OverflowError: value too large to convert to int

A sample line from my dataset is:

1 1:1682  2:53  5:15  13:12  17:14  50:31  65:49  66:16  256:33  257:56
 1025:12  3073:13  4097:15  12545:31  15105:11  16385:47  16449:22
 16641:18  24577:53  25601:13  26113:11  26368:51  26689:15  27649:16
 33857:15  35585:14  49217:13  56385:19  58433:19  59393:23  60481:14
 61505:19  65281:49  65536:15  65537:40  196609:11  262145:14  1048577:22
 2097153:30  4194305:41  4194317:11  4194525:14  4206953:14  4206981:16
 4207077:14  4207085:15  4207089:19  4210689:20  4259841:13  4269426:13
 4325340:12  6291457:54  6356993:19  6553601:19  6619137:26  6832193:14
 6849537:18  6881281:27  7077889:13  7208961:26  7274497:21  7471105:21
 7536641:17  7602177:20  8667201:14  8699905:12  12599361:13  12648258:27
 14429953:14  14434369:18  14477313:15  14958657:19  14968833:15
 15204417:15  15264769:19  15400961:12  15482945:14  15527937:14
 15745089:19  15787009:14  16711681:21  16777217:50  67108865:20
 268435457:20  271800666:11  628179456:14  637468865:22  754991616:12
 788587625:12  788589809:14  788590821:13  822140137:15  822143213:19
 837562113:12  837829889:14  1073741825:56  1073741889:20  1073758209:47
 1073758258:80  1073758308:11  1073798192:14  1077013552:12  1077014598:14
 1077948208:28  1077948720:14  1077948722:62  1090535936:49  1092972866:14
 1097953602:13  1107280065:19  1107286857:12  1157654661:13  1346454597:15
 1610612833:49  1610612839:49  1694507009:12  1694528001:12  1744830657:12
 1749041202:14  1753481217:18  1900150721:14  1912602625:19  2218803250:13
 2300602469:12  2334523137:22  2341152001:20  2344651558:26  2366649178:11
 3225436208:12  3230344704:26  3233873804:18  3237953906:15  3237972876:19
 3435973837:12  3678994143:12  3690987757:14  3694067949:14  3695198209:18
 3706191873:15  3825205381:14  3829415986:19  3832020993:15  3892314113:12
 3892314353:15  3895263298:12  3895394545:13  3907780609:19  3909091073:16
 3909091308:13  3909091328:13  3959423205:19  3962503397:12  3963633714:14
 3975151617:14  4026531945:14  4030742578:15  4031053929:14  4041474049:14
 4278190081:16  4278239488:25  4278255361:11  4278255616:13  4282478962:13
 4287335206:18  4290822400:18  4292561151:12  4294901761:21  4294902016:18
 4294967041:15  4294967296:54

I don't understand why it fails, since sys.maxint on my Python is
9223372036854775807.
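
My only guess is that the compiled _svmlight_format extension stores feature indices as a C int (the "value too large to convert to int" message hints at this, but I haven't checked the source). If that is right, the limit that matters is the signed 32-bit maximum, 2**31 - 1 = 2147483647, not sys.maxint. This small check parses a line by hand (it does not use scikit-learn; oversized_indices is my own hypothetical helper) and lists the indices that exceed that bound:

```python
# Assumption (not verified in scikit-learn's source): the compiled loader
# stores each feature index in a signed 32-bit C int, so the bound that
# matters is 2**31 - 1 rather than Python's sys.maxint.
INT32_MAX = 2**31 - 1  # 2147483647


def oversized_indices(svmlight_line, limit=INT32_MAX):
    """Return the feature indices in one svmlight line that exceed `limit`."""
    body = svmlight_line.split("#", 1)[0]  # drop a trailing comment, if any
    tokens = body.split()
    indices = []
    for tok in tokens[1:]:  # tokens[0] is the label
        if tok.startswith("qid:"):
            continue  # query ids are not feature indices
        indices.append(int(tok.split(":", 1)[0]))
    return [i for i in indices if i > limit]


# A short excerpt of the sample line above
sample = "1 1:1682  2:53  1912602625:19  2218803250:13  4294967296:54"
print(oversized_indices(sample))  # prints [2218803250, 4294967296]
```

In the full sample line, every index from 2218803250 upward is past that bound, so if the guess is right the loader would overflow on this line.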

Is there a workaround for this, or is it simply not possible to load
such a file?
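
One workaround I'm considering (untested, and assuming the problem really is the size of the indices rather than their number): presumably only a small fraction of the possible indices actually occur, so they could be renumbered into a compact range 1..k before loading. The sketch below is plain Python; remap_svmlight_lines and the demo lines are my own hypothetical names and data, not part of scikit-learn. It assumes whitespace-separated index:value pairs, an optional qid: token right after the label, and optional # comments.

```python
# Hypothetical helper (not part of scikit-learn): renumber the sparse, very
# large feature indices of an svmlight file into a compact range 1..k.
def remap_svmlight_lines(lines, mapping):
    """Yield svmlight lines whose feature indices are renumbered 1..k.

    `mapping` (original index -> new index) is updated in place, so the same
    dict can be reused for a second file (e.g. a test set) to keep features
    aligned across files.
    """
    for line in lines:
        body = line.split("#", 1)[0].strip()  # drop comments and newline
        if not body:
            continue  # skip blank or comment-only lines
        tokens = body.split()
        head = [tokens[0]]  # the label is kept as-is
        pairs = []
        for tok in tokens[1:]:
            if tok.startswith("qid:"):
                head.append(tok)  # keep the query id, unchanged, after the label
                continue
            idx_str, value = tok.split(":", 1)
            idx = int(idx_str)
            if idx not in mapping:
                mapping[idx] = len(mapping) + 1  # next free 1-based index
            pairs.append((mapping[idx], value))
        # New ids are assigned in first-seen order, so re-sort each line
        # to keep its indices ascending.
        pairs.sort()
        yield " ".join(head + ["%d:%s" % (i, v) for i, v in pairs])


mapping = {}
demo = ["1 5:0.5 4294967296:2\n", "0 4294967296:1 7:3  # a comment\n", "\n"]
for new_line in remap_svmlight_lines(demo, mapping):
    print(new_line)  # "1 1:0.5 2:2", then "0 2:1 3:3"
```

To use it, I'd iterate over open('../data.txt') with a shared mapping dict, write each yielded line plus a newline to a new file, and pass that file to load_svmlight_file. The catch is that the feature ids change, so mapping would have to be kept and reused for any other file that must share the same feature space.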

Thank you,

Regards,
Abhishek
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
