Yes, the user is responsible for using the correct model for a given piece
of testing (or unlabeled) data.


2013/12/2 unmesha sreeveni <[email protected]>

> To make it more general, it's better to separate them, since there might
> be multiple batches of training (or to-be-labeled) data, and you only need
> to train the model once (if your data is stable).
>
> OK, I will go for the second one.
> So if we keep them separate, the two jobs will have no connection with
> each other, and we have to say which test data belongs to which training
> data. We then load the corresponding playtennnis_tree.txt for that training
> data and predict the test data (so the result file should be named in a
> way that makes the training run it came from clear from its file name).
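One way to picture the naming idea above is a small convention check, sketched here in Java. This is only an illustration under assumptions: the class and method names are hypothetical, and the `<id>_train` / `<id>_test` / `<id>_tree.txt` naming scheme is an assumed convention, not something the thread prescribes.

```java
/** Hypothetical helper that pairs a test set with the model trained on the matching training set. */
class ModelRegistrySketch {

    /** Derives the model file name from a data set ID, e.g. "M1" -> "M1_tree.txt" (assumed convention). */
    static String modelFileFor(String dataSetId) {
        return dataSetId + "_tree.txt";
    }

    /** Extracts the ID prefix from names such as "M1_train" or "M1_test". */
    static String idOf(String dataSetName) {
        int cut = dataSetName.lastIndexOf('_');
        if (cut <= 0) {
            throw new IllegalArgumentException("Expected <id>_<role>, got: " + dataSetName);
        }
        return dataSetName.substring(0, cut);
    }

    /** Returns the model file if it matches the test set, otherwise fails with an error. */
    static String resolveModel(String testSetName, String modelFile) {
        String expected = modelFileFor(idOf(testSetName));
        if (!expected.equals(modelFile)) {
            throw new IllegalStateException(
                    testSetName + " expects " + expected + " but got " + modelFile);
        }
        return modelFile;
    }
}
```

With this sketch, `resolveModel("M1_test", "M1_tree.txt")` returns the file name, while pairing `M1_test` with `M2_tree.txt` throws an `IllegalStateException`. The check only compares names, so choosing the right model remains the user's responsibility.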
>
>
> On Mon, Dec 2, 2013 at 10:29 AM, Yexi Jiang <[email protected]> wrote:
>
>> Actually, training and testing (or prediction) do not have to be done in
>> one shot. If you need to run them consecutively in your particular
>> scenario, you can do it the way you described.
>>
>> To make it more general, it's better to separate them, since there might
>> be multiple batches of training (or to-be-labeled) data, and you only need
>> to train the model once (if your data is stable).
>>
>>
>> 2013/12/1 unmesha sreeveni <[email protected]>
>>
>>> 1. I just thought of building the model in a project named, say, DT, and
>>> when a huge input comes, running another MR job, test.java, within DT.
>>> If we are not chaining jobs, we need to create separate projects,
>>> DT_build and DT_test, right?
>>> Or is there no need for a separate project?
>>>
>>> 2. M1_train - dataset for training.
>>>
>>> M1_test - test data, or data for prediction.
>>> 1. Will the input for prediction be a single record, or a set of
>>> records given at once?
>>> 2. We also need to ensure in our program that M1_test is used only
>>> with M1_train. We should check that too, right? If M1_test is given
>>> to M2_train, it should show an error, shouldn't it?
>>>
>>> Is anything wrong in my reasoning?
>>> Are you able to guess what I am trying to accomplish?
>>> I am confused about whether I need to create only one project that
>>> includes both training and testing, or two projects.
>>>
>>>
>>> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <[email protected]> wrote:
>>>
>>>> What is your motivation for chaining jobs?
>>>>
>>>>
>>>> 2013/12/1 unmesha sreeveni <[email protected]>
>>>>
>>>>> Thanks Yexi, that is a very nice explanation, given in a simple way
>>>>> that is really understandable for beginners. Thanks a lot.
>>>>> I can go for chaining jobs, right?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <[email protected]> wrote:
>>>>>
>>>>>> In my opinion:
>>>>>>
>>>>>> 1. Build the decision tree model with the training data.
>>>>>> 2. Store it somewhere.
>>>>>> 3. When the unlabeled data is available:
>>>>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>>>>> it: load the model at the setup stage and use the model to label the
>>>>>> records one by one in the map stage. There is no need for a reducer.
>>>>>>    3.2 If the unlabeled data is small, it is trivial.
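The steps above can be sketched in plain Java without any Hadoop dependency. Everything here is an assumption made for illustration: the `Node` class, the hard-coded toy tree, and the comma-separated record layout are hypothetical. In a real job the class would presumably extend `org.apache.hadoop.mapreduce.Mapper`, read the stored model inside `setup`, and the driver would call `job.setNumReduceTasks(0)` to make the job map-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for a map-only labeling job. */
class LabelingMapperSketch {

    /** Hypothetical tree node: a leaf carries a label, an inner node tests one column. */
    static final class Node {
        final String label;               // non-null only for leaves
        final int column;                 // index of the attribute tested at this node
        final Map<String, Node> children; // attribute value -> subtree

        Node(String label) {
            this.label = label;
            this.column = -1;
            this.children = Map.of();
        }

        Node(int column, Map<String, Node> children) {
            this.label = null;
            this.column = column;
            this.children = children;
        }
    }

    private Node model;

    /** Mirrors the setup stage: runs once per task, before any record is seen. */
    void setup() {
        // Assumption: a real job would load the stored model from HDFS here.
        // A made-up toy tree keeps the sketch runnable.
        model = new Node(0, Map.of(
                "sunny", new Node(1, Map.of("high", new Node("no"), "normal", new Node("yes"))),
                "overcast", new Node("yes"),
                "rain", new Node("yes")));
    }

    /** Mirrors the map stage: labels one comma-separated record and emits it. */
    void map(String record, List<String> output) {
        String[] fields = record.split(",");
        Node node = model;
        while (node.label == null) {
            Node next = node.column < fields.length
                    ? node.children.get(fields[node.column].trim())
                    : null;
            if (next == null) {           // unseen value or missing column
                output.add(record + ",unknown");
                return;
            }
            node = next;
        }
        output.add(record + "," + node.label);
    }

    /** Drives setup() and map() the way a framework would for one input split. */
    static List<String> run(List<String> records) {
        LabelingMapperSketch mapper = new LabelingMapperSketch();
        mapper.setup();
        List<String> output = new ArrayList<>();
        for (String record : records) {
            mapper.map(record, output);
        }
        return output;
    }
}
```

Each record is labeled independently of all the others, so nothing has to be aggregated afterwards, which is why no reducer is needed.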
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/12/1 unmesha sreeveni <[email protected]>
>>>>>>
>>>>>>> Thanks Yexi,
>>>>>>>
>>>>>>> But how can it be accomplished?
>>>>>>> The input to the decision tree MR job will be a set of data. But when
>>>>>>> predicting, the input will be a single line of data without a class
>>>>>>> label, right?
>>>>>>> So what changes will be needed in the MR job? Should we design it
>>>>>>> like this?
>>>>>>> 1. When a set of data comes in, build the decision tree.
>>>>>>> 2. Else, if a single line of data comes in, check the output of the
>>>>>>> decision tree (the tree generated by the MR job) and predict the
>>>>>>> class label.
>>>>>>>
>>>>>>> -------
>>>>>>>
>>>>>>> M1_train - dataset for training.
>>>>>>> M1_test - test data, or data for prediction.
>>>>>>> 1. Will the input for prediction be a single record, or a set of
>>>>>>> records given at once?
>>>>>>> 2. We also need to ensure in our program that M1_test is used only
>>>>>>> with M1_train. We should check that too, right? If M1_test is given
>>>>>>> to M2_train, it should show an error, shouldn't it?
>>>>>>>
>>>>>>> Please tell me if my thoughts are wrong.
>>>>>>>
>>>>>>> On 11/30/13, Yexi Jiang <[email protected]> wrote:
>>>>>>> > I watched the video in it, but I cannot access its source code due
>>>>>>> > to a permission issue.
>>>>>>> > In my opinion, once the decision tree model is built, the model is
>>>>>>> > small enough to be loaded into memory and can be used directly,
>>>>>>> > without another MR job for prediction. The prediction can be
>>>>>>> > conducted in a streaming way.
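A minimal Java sketch of what predicting in a streaming way could look like, assuming one comma-separated record per line. The `toyTree` method is a hypothetical hand-written stand-in for a decision tree that has been loaded into memory; it is not taken from the thread or from Mahout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.util.function.Function;

/** Labels records one line at a time with an in-memory model. */
class StreamingPredictorSketch {

    /**
     * Reads unlabeled records line by line and writes "record,label" lines.
     * Only the current line is held in memory, so the input can be long.
     * Returns the number of records labeled.
     */
    static int labelStream(BufferedReader in, PrintWriter out,
                           Function<String[], String> model) {
        int count = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;             // skip blank lines
                }
                out.println(line + "," + model.apply(line.split(",")));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        out.flush();
        return count;
    }

    /** Hypothetical in-memory model standing in for a loaded decision tree. */
    static String toyTree(String[] fields) {
        if (fields[0].equals("sunny")) {
            return fields.length > 1 && fields[1].equals("high") ? "no" : "yes";
        }
        return "yes";
    }
}
```

In practice the reader might wrap `System.in` or a file and the writer might wrap `System.out`, for example `labelStream(reader, writer, StreamingPredictorSketch::toyTree)`.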
>>>>>>> >
>>>>>>> >
>>>>>>> > 2013/11/30 unmesha sreeveni <[email protected]>
>>>>>>> >
>>>>>>> >> I have gone through a MapReduce implementation of C4.5 at
>>>>>>> >>
>>>>>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>>>> >>
>>>>>>> >> Here a decision tree is built. So my question is:
>>>>>>> >> can we also include the prediction along with that?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <[email protected]> wrote:
>>>>>>> >>
>>>>>>> >>> You are welcome :)
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> 2013/11/25 unmesha sreeveni <[email protected]>
>>>>>>> >>>
>>>>>>> >>>> OK. Thanks, Yexi.
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <[email protected]> wrote:
>>>>>>> >>>>
>>>>>>> >>>>> As far as I know, there is no ID3 implementation in Mahout
>>>>>>> >>>>> currently, but you can use the decision forest instead:
>>>>>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> 2013/11/25 unmesha sreeveni <[email protected]>
>>>>>>> >>>>>
>>>>>>> >>>>>> Is that ID3 classification?
>>>>>>> >>>>>> Does it include prediction as well?
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <[email protected]> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>>> You can find it directly at https://github.com/apache/mahout,
>>>>>>> >>>>>>> or you can check it out from svn by following
>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <[email protected]>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>> I want to go through the decision tree implementation in
>>>>>>> >>>>>>>> Mahout. I referred to Apache Mahout <http://mahout.apache.org/>:
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>>>>> >>>>>>>> encouraged to begin using version 0.6. Highlights include:
>>>>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>>>>> >>>>>>>> regression problems
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Where can I find its source code and documentation?
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Should I download Mahout?
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> --
>>>>>>> >>>>>>>> *Thanks & Regards*
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Unmesha Sreeveni U.B
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> *Junior Developer*
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> --
>>>>>>> >>>>>>> ------
>>>>>>> >>>>>>> Yexi Jiang,
>>>>>>> >>>>>>> ECS 251,  [email protected]
>>>>>>> >>>>>>> School of Computer and Information Science,
>>>>>>> >>>>>>> Florida International University
>>>>>>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>


-- 
------
Yexi Jiang,
ECS 251,  [email protected]
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/
