Ah - how have you configured your machine for Spark? Inside a Docker
container?

The .numRows call will actually need to run through the entire file in
sequence (it calls rdd.count() under the hood). Ten minutes sounds a little
long, but not unreasonable on a single machine.
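For what it's worth, here is a quick plain-Scala sketch of the difference (no Spark involved -- the dataset stand-in and its size are made up for illustration): take(5) only materializes a handful of elements, while a count has to walk the whole dataset sequentially, which is exactly what rdd.count() does.

```scala
// Plain-Scala sketch (no Spark needed) of why x.take(5) returns almost
// immediately while x.numRows does not.

// Stand-in for a large, lazily read dataset (e.g. lines of a big file).
// A fresh iterator per call, since an Iterator can only be consumed once.
def rows: Iterator[String] = Iterator.range(0, 1000000).map(i => "row " + i)

// take(5) pulls just the first five elements -- cheap at any size.
val firstFive = rows.take(5).toList

// size must consume every element -- a full sequential pass, like count().
val total = rows.size

println(firstFive)
println(total)
```

On a real cluster the count is at least parallelized across partitions; on a single quad-core machine reading one 40 GB file, it is essentially a full sequential scan.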


On Thu, Nov 21, 2013 at 9:24 AM, sudhir vaidya <[email protected]> wrote:

> Hey Evan,
>
> I do get the output when I load the file. I also see output when I do
> the "x.take(5)" command.
>
> But x.numRows takes a long time to execute. I waited for about 10 minutes
> and then had to do a Ctrl + C. My take on it is: since the file is around
> 40 GB and I am running it on a quad-core machine (not a very high-end
> machine, and a single machine rather than a cluster), maybe it just takes
> a lot more time. I am not sure, though.
>
> Regards,
> Sudhir
>
>
> On Thu, Nov 21, 2013 at 11:18 AM, Evan R. Sparks <[email protected]> wrote:
>
>> What happens when you do:
>> val x = mc.loadFile("/enwiki_txt")
>>
>> and then
>> x.numRows
>> or
>> x.take(5)
>>
>> Do you see output there?
>>
>>
>>
>> On Wed, Nov 20, 2013 at 4:41 PM, sudhir vaidya <[email protected]> wrote:
>>
>>> I am a beginner and have started to go through the MLbase exercises.
>>>
>>> But I get a java.lang.IndexOutOfBoundsException when I run the first
>>> command of step 2.1 here:
>>>
>>> http://ampcamp.berkeley.edu/3/exercises/mli-document-categorization.html
>>>
>>> All I am doing is copying the command and pasting it into the Spark
>>> shell interface.
>>>
>>> I tried splitting the command up by loading the data set first and
>>> filtering afterwards, but that didn't work.
>>>
>>> I also tried changing the value of "r(0)" to "r(1)" in that step, but I
>>> still get the same error.
>>>
>>> Any help is really appreciated.
>>>
>>> -Sudhir
>>>
>>
>
