Yes, the user is responsible for using the correct model for a given piece of testing (or unlabeled) data.
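
As a rough illustration of the naming idea discussed in this thread, here is a minimal sketch in plain Python (not Hadoop code). The `_test` and `_tree.txt` suffixes, the `model_dir` argument, and the function name `model_path_for` are assumptions made up for this example. It only checks the naming convention; it cannot prove that the data really match the model.

```python
import os


def model_path_for(test_name, model_dir):
    """Map a test-set name such as 'M1_test' to its model file 'M1_tree.txt'.

    The suffixes are a hypothetical convention, not something any tool enforces.
    """
    if not test_name.endswith("_test"):
        raise ValueError(f"unexpected test-set name: {test_name!r}")
    prefix = test_name[: -len("_test")]
    path = os.path.join(model_dir, prefix + "_tree.txt")
    # Fail early if no model was trained (or stored) under the matching name.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no trained model for {test_name!r}: {path}")
    return path
```

With this convention, asking for the model of `M2_test` when only `M1_tree.txt` exists raises an error instead of silently using the wrong tree.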

2013/12/2 unmesha sreeveni <[email protected]>

> To make it more general, it's better to separate them. Since there might
> be multiple batches of testing (or to-be-labeled) data, and you only need to
> train the model once (if your data is stable).
>
> OK, I will go for the second one.
> So if we keep them separate, they will not have any connection with each
> other, and we should tell which test data belongs to which train data:
> load the corresponding playtennnis_tree.txt (so the result file should
> be named in a manner that the training result can be recognized by its
> file name) for the train data and predict the test data.
>
>
> On Mon, Dec 2, 2013 at 10:29 AM, Yexi Jiang <[email protected]> wrote:
>
>> Actually, training and testing (or prediction) do not have to be
>> done in one shot. If you need to do them consecutively in your particular
>> scenario, you can do it as you said.
>>
>> To make it more general, it's better to separate them. Since there might
>> be multiple batches of testing (or to-be-labeled) data, and you only need to
>> train the model once (if your data is stable).
>>
>>
>> 2013/12/1 unmesha sreeveni <[email protected]>
>>
>>> 1. I just thought of building a model using a project named, say, DT, and
>>> when a huge input comes, running another MR job, test.java, within DT.
>>> If we are not chaining jobs, we need to create separate projects, right:
>>> DT_build and DT_test? Or is there no need for a separate project?
>>>
>>> 2. M1_train - dataset for training.
>>>    M1_test - test data or prediction.
>>>    1. Will it be one record as input for prediction, or a set of data given
>>>       as input at once?
>>>    2. We also need to ensure in our program that M1_test belongs to M1_train
>>>       only. We should check that also, right? If M1_test is given to
>>>       M2_train, it should show an error, shouldn't it?
>>>
>>> Is anything wrong in my inference?
>>> Are you able to guess what I am trying to accomplish?
>>> I am confused whether I need to create only one project that includes
>>> train and test, or two projects.
>>>
>>>
>>> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <[email protected]> wrote:
>>>
>>>> What is your motivation for using chained jobs?
>>>>
>>>>
>>>> 2013/12/1 unmesha sreeveni <[email protected]>
>>>>
>>>>> Thanks Yexi, a very nice explanation, thanks a lot.
>>>>> Explained in a very simple way which is really understandable for
>>>>> beginners.
>>>>> I can go for chaining jobs, right?
>>>>>
>>>>>
>>>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <[email protected]> wrote:
>>>>>
>>>>>> In my opinion:
>>>>>>
>>>>>> 1. Build the decision tree model with the training data.
>>>>>> 2. Store it somewhere.
>>>>>> 3. When the unlabeled data is available:
>>>>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>>>>>        it: load the model in the setup stage and use the model to label
>>>>>>        the data one by one in the map stage. There is no need for a
>>>>>>        reducer.
>>>>>>    3.2 If the unlabeled data is small, it is trivial.
>>>>>>
>>>>>>
>>>>>> 2013/12/1 unmesha sreeveni <[email protected]>
>>>>>>
>>>>>>> Thanks Yexi,
>>>>>>>
>>>>>>> But how can it be accomplished?
>>>>>>> The input to the decision tree MR job will be a set of data. But when
>>>>>>> predicting, it will be one line of data without a class label,
>>>>>>> right?
>>>>>>> So what changes will there be in the MR job? Should we design it like this:
>>>>>>> 1. When a set of data comes in, build the decision tree.
>>>>>>> 2. Else, if one line of data comes in, check the output of the decision
>>>>>>>    tree (the decision tree generated by the MR job) and predict the
>>>>>>>    class label.
>>>>>>>
>>>>>>> -------
>>>>>>>
>>>>>>> M1_train - dataset for training.
>>>>>>> M1_test - test data or prediction.
>>>>>>> 1. Will it be one record as input for prediction, or a set of data given
>>>>>>>    as input at once?
>>>>>>> 2. We also need to ensure in our program that M1_test belongs to M1_train
>>>>>>>    only. We should check that also, right? If M1_test is given to
>>>>>>>    M2_train, it should show an error, shouldn't it?
>>>>>>>
>>>>>>> Please tell me if my thoughts are wrong.
>>>>>>>
>>>>>>> On 11/30/13, Yexi Jiang <[email protected]> wrote:
>>>>>>> > I watched the video in it but I cannot access its source code due to
>>>>>>> > a permission issue.
>>>>>>> > In my opinion, once the decision tree model is built, the model is small
>>>>>>> > enough to be loaded into memory and can be used directly without another
>>>>>>> > MR job for prediction. The prediction can be conducted in a
>>>>>>> > streaming way.
>>>>>>> >
>>>>>>> >
>>>>>>> > 2013/11/30 unmesha sreeveni <[email protected]>
>>>>>>> >
>>>>>>> >> I have gone through a MapReduce implementation of C4.5 in
>>>>>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>>>> >>
>>>>>>> >> Here a decision tree is built. So my doubt is:
>>>>>>> >> can we also include the prediction along with that?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <[email protected]> wrote:
>>>>>>> >>
>>>>>>> >>> You are welcome :)
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> 2013/11/25 unmesha sreeveni <[email protected]>
>>>>>>> >>>
>>>>>>> >>>> OK. Thanks, Yexi.
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <[email protected]> wrote:
>>>>>>> >>>>
>>>>>>> >>>>> As far as I know, there is no ID3 implementation in Mahout currently,
>>>>>>> >>>>> but you can use the decision forest instead.
>>>>>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> 2013/11/25 unmesha sreeveni <[email protected]>
>>>>>>> >>>>>
>>>>>>> >>>>>> Is that ID3 classification?
>>>>>>> >>>>>> Does it include prediction also?
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <[email protected]> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>>> You can find it directly at https://github.com/apache/mahout, or you
>>>>>>> >>>>>>> can check it out from svn by following
>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <[email protected]>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>> I want to go through the decision tree implementation in Mahout.
>>>>>>> >>>>>>>> I referred to Apache Mahout <http://mahout.apache.org/>:
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged
>>>>>>> >>>>>>>> to begin using version 0.6. Highlights include:
>>>>>>> >>>>>>>> Improved Decision Tree performance and added support for regression
>>>>>>> >>>>>>>> problems
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Where can I find its source code and documentation?
>>>>>>> >>>>>>>> Should I download Mahout?
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> --
>>>>>>> >>>>>>>> *Thanks & Regards*
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Unmesha Sreeveni U.B
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> *Junior Developer*
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> --
>>>>>>> >>>>>>> ------
>>>>>>> >>>>>>> Yexi Jiang,
>>>>>>> >>>>>>> ECS 251, [email protected]
>>>>>>> >>>>>>> School of Computer and Information Science,
>>>>>>> >>>>>>> Florida International University
>>>>>>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/

--
------
Yexi Jiang,
ECS 251, [email protected]
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

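
The workflow suggested in the thread (build the model, store it, load it once in the setup stage, then label records one by one in the map stage with no reducer) can be sketched roughly as follows. This is plain Python that only mirrors the setup/map structure; it does not use the Hadoop API. The JSON layout of the stored tree, the example tree itself, and the names `load_model`, `predict`, and `map_records` are all assumptions for illustration, since the actual format of playtennnis_tree.txt is not shown in the thread.

```python
import json

# Hypothetical stored model: inner nodes are dicts, leaves are class labels.
EXAMPLE_TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "wind",
                 "branches": {"strong": "no", "weak": "yes"}},
    },
}


def load_model(path):
    """Setup stage: read the stored tree once and keep it in memory."""
    with open(path) as f:
        return json.load(f)


def predict(tree, record):
    """Walk from the root until a leaf (anything that is not a dict)."""
    node = tree
    while isinstance(node, dict):
        value = record[node["attribute"]]
        # Unseen attribute values fall back to an optional "default" label,
        # or to None when the node defines no default.
        node = node["branches"].get(value, node.get("default"))
    return node


def map_records(tree, lines, attributes):
    """Map stage: label each unlabeled record independently; no reducer."""
    for line in lines:
        line = line.strip()
        record = dict(zip(attributes, line.split(",")))
        yield line, predict(tree, record)
```

Because each record is labeled independently of the others, the same `predict` function serves both cases from the thread: a large batch handled by many mappers, or a single line handled in a streaming fashion.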