Re: Removing MAHOUT_LOCAL option

Mihai Dascalu Mon, 21 Mar 2016 13:01:22 -0700

We still have legacy code that uses the local Hadoop instance for a Stochastic 
SVD directly in a Java desktop application. But if the desire is to eliminate 
it, we’ve been inclined for a while to migrate everything to Spark.


Sorry, I’m old school and use MR, plus I’m new to Spark :) Is there an easy way 
to migrate your Spark example into the Java source code so that we do not 
disrupt the overall flow?


Have a great evening!
Mihai

> On 21 Mar 2016, at 19:31, Dmitriy Lyubimov <[email protected]> wrote:
> 
> my 1 cent (since it is less than 2) is MAHOUT_LOCAL is part of MR legacy
> packaging. as long as MR is still here (and I would say it needs to be
> still here, unless it falls in complete disrepair and totally out of sync
> with even dated mapreduce apis), MAHOUT_LOCAL needs to stay. As soon as MR
> goes, it goes too.
> 
> maybe we just simply need a separate mahout script for non-legacy things,
> or factor out legacy related shell things into another script (something
> like mahout-mr.sh instead of mahout.sh)
> 
> On Mon, Mar 21, 2016 at 8:45 AM, Suneel Marthi <[email protected]> wrote:
> 
>> Some background on this issue:
>> 
>> 1.  Now that we support Spark and H2O as back ends since 0.10.0, with Flink
>> coming soon in 0.12.0, it's been bloating the size of our release artifacts
>> when pushing releases to Apache mirrors. Hence we were looking at pruning
>> some of the components that have not been used or have been long marked
>> deprecated and are not being worked on.
>> 
>> 2.  Since Mahout 0.7 release in June 2012, the project has diverged from
>> the MiA book even for legacy MapReduce.  Not sure if that's indeed helping
>> onboard new users.
>> 
>> 3.  Seems like the consensus so far based on the user responses is to
>> retain the MAHOUT_LOCAL option, thanks all for your responses.
>> 
>> 
>> On Mon, Mar 21, 2016 at 11:38 AM, scott cote <[email protected]> wrote:
>> 
>>> one more comment - I understand that it only works for the legacy code.
>>> Kill it when the legacy code is no longer deprecated, but gone ….
>>> 
>>> Otherwise - you will shut out people who buy the older mahout books (such
>>> as MIA) which are still good reads, even though the tech is dated.
>>> 
>>> SCott
>>> 
>>>> On Mar 21, 2016, at 2:24 AM, David Starina <[email protected]> wrote:
>>>> 
>>>> Anyhow, I'm +1 for removing MAHOUT_LOCAL, but I believe the deprecated
>>>> MapReduce-based code still makes sense if it is running well on Ignite.
>>>> 
>>>> On Mon, Mar 21, 2016 at 8:20 AM, David Starina <[email protected]> wrote:
>>>> 
>>>>> Has anyone tried to run the deprecated MapReduce code on Ignite? Is the
>>>>> performance improvement good enough to reconsider leaving those algorithms
>>>>> in Mahout?
>>>>> 
>>>>> On Mon, Mar 21, 2016 at 12:45 AM, Andrew Musselman <[email protected]> wrote:
>>>>> 
>>>>>> Yes I agree; will leave the question open a couple days.
>>>>>> 
>>>>>> On Sunday, March 20, 2016, Pat Ferrel <[email protected]> wrote:
>>>>>> 
>>>>>>> Maybe a better user question is: How many people are still using the
>>>>>>> deprecated Hadoop code?
>>>>>>> 
>>>>>>> If the number is small +1 for removal.
>>>>>>> 
>>>>>>> On Mar 20, 2016, at 11:04 AM, Andrew Musselman <[email protected]> wrote:
>>>>>>> 
>>>>>>> To clarify, the MAHOUT_LOCAL option only works for legacy Hadoop
>>>>>>> MapReduce-based jobs which officially became deprecated in 0.10.0.
>>>>>>> 
>>>>>>> On Sun, Mar 20, 2016 at 10:25 AM, Andrew Musselman <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Yes as I understand it.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sunday, March 20, 2016, Pat Ferrel <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Are we just talking about Hadoop MapReduce? I thought it was ignored when
>>>>>>>>> using Spark.
>>>>>>>>> 
>>>>>>>>> On Mar 20, 2016, at 8:20 AM, alok tanna <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> -1 MAHOUT_LOCAL is very useful for quick POCs.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Alok Tanna
>>>>>>>>> Sent from my iPhone
>>>>>>>>> 
>>>>>>>>>> On Mar 20, 2016, at 5:01 AM, Mihai Dascalu <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> -1 I still use it for fast deployment and it’s really helpful for small
>>>>>>>>>> local processing
>>>>>>>>>> 
>>>>>>>>>> Have a great weekend!
>>>>>>>>>> Mihai
>>>>>>>>>> 
>>>>>>>>>>> On 20 Mar 2016, at 06:13, Suneel Marthi <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> +1 to remove this
>>>>>>>>>>> 
>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>> 
>>>>>>>>>>>> On Mar 20, 2016, at 12:01 AM, Andrew Musselman <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> We're discussing removing the MAHOUT_LOCAL option in order to trim artifact
>>>>>>>>>>>> sizes.
>>>>>>>>>>>> 
>>>>>>>>>>>> If you think keeping the option to use MAHOUT_LOCAL for testing with the
>>>>>>>>>>>> single-node mode of Hadoop is important, please let us know. It can be handy
>>>>>>>>>>>> for trying things out, but it would be nice to ditch the effort required to
>>>>>>>>>>>> maintain it.
>>>>>>>>>>>> 
>>>>>>>>>>>> See https://issues.apache.org/jira/browse/MAHOUT-1705 for more context.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks!
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>>
