Re: ml models distribution

2016-07-22 Thread Sergio Fernández
Hi Sean,

On Fri, Jul 22, 2016 at 12:52 PM, Sean Owen wrote:
>
> If you mean, how do you distribute a new model in your application,
> then there's no magic to it. Just reference the new model in the
> functions you're executing in your driver.
>
> If you implemented some other manual way of deploying model info, just
> do that again. There's no special thing to know.
>

Well, because some models are huge, we typically package the logic
(pipeline/application) and the models separately. Normally we use a shared
store (e.g., HDFS) or a coordinated distribution of the models. But I wanted
to know whether Spark has any infrastructure that specifically addresses
this need.

Thanks.

Cheers,

P.S.: Sorry Jacek, by "ml" I meant "Machine Learning". I thought it was a
fairly widespread acronym. Sorry for any confusion.


-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernan...@redlink.co
w: http://redlink.co


ml models distribution

2016-07-22 Thread Sergio Fernández
Hi,

I have one question:

How are ML models distributed across all nodes of a Spark cluster?

I'm thinking about scenarios where the pipeline implementation does not
necessarily need to change, but the models have been upgraded.

Thanks in advance.

Best regards,



Re: processing 50 gb data using just one machine

2016-06-15 Thread Sergio Fernández
In theory, yes. Common sense says that:

volume / resources = time

So more volume on the same processing resources would just take more time.
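
As a back-of-envelope illustration of that relation, here is a minimal
Python sketch. The throughput figure (50 MB/s per core) is purely an assumed
number for illustration, not a measurement, and `estimate_seconds` is a
hypothetical helper. Real jobs rarely scale perfectly linearly with cores,
so treat the result as a rough lower bound at best.

```python
def estimate_seconds(volume_gb, cores, mb_per_sec_per_core):
    """Estimate processing time as volume / resources.

    mb_per_sec_per_core is an assumed per-core throughput, not a measured one.
    """
    if cores <= 0 or mb_per_sec_per_core <= 0:
        raise ValueError("cores and throughput must be positive")
    total_mb = volume_gb * 1024
    return total_mb / (cores * mb_per_sec_per_core)


# 50 GB on 4 cores, assuming 50 MB/s per core (hypothetical figure)
print(estimate_seconds(50, 4, 50))   # 256.0
# Doubling the volume on the same resources doubles the time
print(estimate_seconds(100, 4, 50))  # 512.0
```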

On Jun 15, 2016 6:43 PM, "spR" wrote:

> I have 16 GB of RAM and an i7.
>
> Will this config be able to handle the processing without my IPython
> notebook dying?
>
> Local mode is meant for testing purposes, but I do not have a cluster at
> my disposal. So can I make this work with the configuration that I have?
> Thank you.
> On Jun 15, 2016 9:40 AM, "Deepak Goel" wrote:
>
>> What do you mean by "EFFICIENTLY"?
>>
>> Hey
>>
>> Namaskara~Nalama~Guten Tag~Bonjour
>>
>>
>>--
>> Keigu
>>
>> Deepak
>> 73500 12833
>> www.simtree.net, dee...@simtree.net
>> deic...@gmail.com
>>
>> LinkedIn: www.linkedin.com/in/deicool
>> Skype: thumsupdeicool
>> Google talk: deicool
>> Blog: http://loveandfearless.wordpress.com
>> Facebook: http://www.facebook.com/deicool
>>
>> "Contribute to the world, environment and more :
>> http://www.gridrepublic.org
>> "
>>
>> On Wed, Jun 15, 2016 at 9:33 PM, spR  wrote:
>>
>>> Hi,
>>>
>>> Can I use Spark in local mode with 4 cores to process 50 GB of data
>>> efficiently?
>>>
>>> Thank you
>>>
>>> misha
>>>
>>
>>


Re: ImportError: No module named numpy

2016-06-02 Thread Sergio Fernández
On Thu, Jun 2, 2016 at 9:59 AM, Bhupendra Mishra  wrote:
>
> And I have already exported the environment variable in spark-env.sh as
> follows, but the error is still there: ImportError: No module named numpy
>
> export PYSPARK_PYTHON=/usr/bin/python
>

According to the documentation at
http://spark.apache.org/docs/latest/configuration.html#environment-variables
the PYSPARK_PYTHON environment variable points to the Python interpreter
binary.

If you check the programming guide
https://spark.apache.org/docs/0.9.0/python-programming-guide.html#installing-and-configuring-pyspark
it says you need to add your custom path to PYTHONPATH (the script
automatically adds the bin/pyspark package there).

So on Linux you would typically need to add the following (assuming numpy
is installed there):

export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7/dist-packages
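
To check which interpreter and search path are actually in use, a small
diagnostic like the following can help. It is only a sketch using the
standard library: `module_report` is a hypothetical helper name, it is
written for Python 3 (`importlib.util.find_spec` is not available on
Python 2.7), and it should be run with the same interpreter that
PYSPARK_PYTHON points to.

```python
import importlib.util
import sys


def module_report(name):
    """Report whether top-level module `name` is importable by this interpreter."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
        "search_path": list(sys.path),
    }


if __name__ == "__main__":
    report = module_report("numpy")
    print("interpreter:", report["interpreter"])
    print("numpy found:", report["found"], report["location"])
    # Directories added through PYTHONPATH should show up in this list
    for entry in report["search_path"]:
        print("  ", entry)
```

If "numpy found" is False here but True in your regular shell, the two are
most likely using different interpreters or different PYTHONPATH values.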

Hope that helps.




> On Thu, Jun 2, 2016 at 12:04 AM, Julio Antonio Soto de Vicente <
> ju...@esbet.es> wrote:
>
>> Try adding to spark-env.sh (renaming if you still have it with .template
>> at the end):
>>
>> PYSPARK_PYTHON=/path/to/your/bin/python
>>
>> Where your bin/python is your actual Python environment with Numpy
>> installed.
>>
>>
>> On Jun 1, 2016, at 20:16, Bhupendra Mishra
>> wrote:
>>
>> I have numpy installed, but where should I set up PYTHONPATH?
>>
>>
>> On Wed, Jun 1, 2016 at 11:39 PM, Sergio Fernández 
>> wrote:
>>
>>> sudo pip install numpy
>>>
>>> On Wed, Jun 1, 2016 at 5:56 PM, Bhupendra Mishra <
>>> bhupendra.mis...@gmail.com> wrote:
>>>
>>>> Thanks .
>>>> How can this be resolved?
>>>>
>>>> On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau 
>>>> wrote:
>>>>
>>>>> Generally this means numpy isn't installed on the system or your
>>>>> PYTHONPATH has somehow gotten pointed somewhere odd.
>>>>>
>>>>> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra <
>>>>> bhupendra.mis...@gmail.com> wrote:
>>>>>
>>>>>> Could anyone please help me with the following error?
>>>>>>
>>>>>>  File
>>>>>> "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
>>>>>> line 25, in <module>
>>>>>>
>>>>>> ImportError: No module named numpy
>>>>>>
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Cell : 425-233-8271
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>




Re: ImportError: No module named numpy

2016-06-01 Thread Sergio Fernández
sudo pip install numpy

On Wed, Jun 1, 2016 at 5:56 PM, Bhupendra Mishra wrote:

> Thanks .
> How can this be resolved?
>
> On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau wrote:
>
>> Generally this means numpy isn't installed on the system or your
>> PYTHONPATH has somehow gotten pointed somewhere odd.
>>
>> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra <
>> bhupendra.mis...@gmail.com> wrote:
>>
>>> Could anyone please help me with the following error?
>>>
>>>  File
>>> "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
>>> line 25, in <module>
>>>
>>> ImportError: No module named numpy
>>>
>>>
>>> Thanks in advance!
>>>
>>>
>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>
>

