Hi,

Thanks for your reply.
I have obtained the table of contents, metadata, title, author, etc. for
the books.
Could you please tell me the next steps?
I have read in the Mahout in Action book that a few tools are available for
vectorization, e.g. Lucene analyzers and Mahout vector encoders.
Could you please tell me which one is a good choice and how to use it?
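For intuition: the Mahout vector encoders described in Mahout in Action are
based on feature hashing — each token is hashed into a bucket of a
fixed-size vector, so no vocabulary dictionary has to be kept. Below is a
toy plain-Java stand-in for that idea; HashedEncoder and encode() are
made-up names for this sketch, not Mahout classes:

```java
import java.util.Arrays;

// Toy stand-in for the hashed-encoding idea behind Mahout's vector
// encoders (e.g. StaticWordValueEncoder in Mahout in Action).
// Not a Mahout API -- just the core trick in plain Java.
public class HashedEncoder {

    // Hash each token into one of `dim` buckets and accumulate a
    // count-style weight there. The vector size is fixed up front.
    static double[] encode(String[] tokens, int dim) {
        double[] v = new double[dim];
        for (String t : tokens) {
            int h = t.toLowerCase().hashCode();   // deterministic in Java
            int idx = Math.floorMod(h, dim);      // bucket for this token
            v[idx] += 1.0;                        // term-count weight
        }
        return v;
    }

    public static void main(String[] args) {
        double[] v = encode("the quick brown fox".split(" "), 16);
        System.out.println(Arrays.toString(v));
    }
}
```

Because String.hashCode is specified by the Java language, the same token
always lands in the same bucket, which is what makes the encoding usable at
classification time without storing a dictionary.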


Thanks,
Suresh


On 16 January 2014 14:49, Saeed Iqbal KhattaK
<[email protected]> wrote:

> Dear Suresh,
>
> I am also working on classification of books.
>
> First of all, I collect the metadata of my e-books. After collecting the
> metadata, I start the second stage: pre-processing each e-book. In
> pre-processing, I collect information about the *book title, chapter
> titles, sections, subsections, paragraphs, sub-paragraphs and bold fonts*,
> etc., and remove all other formatting; that gives me the result.
>
>
>
>
> On Thu, Jan 16, 2014 at 2:09 PM, Ted Dunning <[email protected]>
> wrote:
>
> > You generally want to do linguistic pre-processing (finding phrases,
> > synonymizing certain forms such as abbreviations, tokenizing, dropping
> > stop words, removing boilerplate, removing tables) before doing
> > vectorization. Altogether, these form pre-processing.
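The pre-processing steps Ted lists can be sketched in plain Java. This toy
tokenizer (the class name, method name, and the stop-word list are all
illustrative, not from any Mahout or Lucene API) lowercases the text,
splits on non-letters, and drops stop words:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of linguistic pre-processing: lowercase, tokenize,
// drop stop words. A real pipeline (e.g. a Lucene analyzer) would also
// handle phrases, abbreviations, boilerplate, and tables.
public class Preprocess {

    // Illustrative stop-word list; a real one would be much longer.
    static final Set<String> STOP = new HashSet<>(
        Arrays.asList("a", "an", "the", "of", "and", "to", "in", "is"));

    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP.contains(t)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("The Quick Brown Fox, and the lazy dog."));
    }
}
```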
> >
> > To classify books, you need to recognize that many books are about many
> > topics. You may want to segment your books down to the chapter, section,
> > or even paragraph level.
> >
> >
> >
> > On Wed, Jan 15, 2014 at 10:25 PM, Suresh M <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > Can you please tell me what pre-processing means here? Is it
> > > vectorization (as explained in the Mahout in Action book)?
> > > Can it be done using Java and the Mahout API?
> > > And when you say "model", is it a class?
> > >
> > >
> > >
> > >
> > > On 16 January 2014 11:38, KK R <[email protected]> wrote:
> > >
> > > > Hi Suresh,
> > > >
> > > > Apache Mahout has several classification algorithms which you can
> > > > use to do the classification.
> > > >
> > > > Step 1: Your data may require some pre-processing. If so, it can be
> > > > done using Hadoop / Hive / Mahout utilities.
> > > >
> > > > Step 2: Run a classification algorithm on your training data and
> > > > build your model using Mahout's classification algorithms.
> > > >
> > > > Step 3: When the actual data comes in, it is classified with the
> > > > help of the trained model. This can be done sequentially in Java, or
> > > > MapReduce can be used if the data is huge and scalability is a
> > > > requirement.
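The train/classify loop in Steps 2 and 3 can be illustrated with a toy
multinomial Naive Bayes in plain Java. This is a stand-in for, not a copy
of, Mahout's classifiers; every class and method name here is made up for
the sketch, and it assumes a uniform prior over labels:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy multinomial Naive Bayes: train() builds the model (Step 2),
// classify() applies it to new documents (Step 3).
public class TinyNB {
    Map<String, Map<String, Integer>> counts = new HashMap<>(); // label -> word -> count
    Map<String, Integer> totals = new HashMap<>();              // label -> total words
    Set<String> vocab = new HashSet<>();

    // Step 2: accumulate word counts per label from training documents.
    void train(String label, String[] words) {
        Map<String, Integer> c = counts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : words) {
            c.merge(w, 1, Integer::sum);
            totals.merge(label, 1, Integer::sum);
            vocab.add(w);
        }
    }

    // Step 3: pick the label with the highest log-likelihood.
    String classify(String[] words) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : counts.keySet()) {
            double score = 0.0; // log space; uniform prior over labels
            for (String w : words) {
                int cw = counts.get(label).getOrDefault(w, 0);
                // Laplace smoothing so unseen words don't zero out the score
                score += Math.log((cw + 1.0) / (totals.get(label) + vocab.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyNB nb = new TinyNB();
        nb.train("cooking", "recipe flour oven bake sugar".split(" "));
        nb.train("programming", "java code compiler class method".split(" "));
        System.out.println(nb.classify("bake a recipe".split(" "))); // prints cooking
    }
}
```

For large corpora the same counting step maps naturally onto MapReduce
(count words per label in parallel, then sum), which is why Hadoop helps
when scalability becomes a requirement.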
> > > >
> > > > Thanks,
> > > > Kirubakumaresh
> > > > http://www.linkedin.com/pub/kirubakumaresh-rajendran/66/411/305
> > > >
> > > >
> > > > On Thu, Jan 16, 2014 at 11:28 AM, Suresh M <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > > Our application will be receiving books from different users.
> > > > > We have to classify them accordingly.
> > > > > Could someone please tell me how to do that using Apache Mahout
> > > > > and Java? Is Hadoop necessary for that?
> > > > >
> > > > >
> > > > > --
> > > > > Thanks & Regards
> > > > > Suresh
> > > > >
> > > >
> > >
> >
>
>
>
> --
> *Saeed Iqbal KhattaK*
> Lecturer (FoIT)  -- University of Central Punjab, Lahore
> Tel: +92-42-35880007 - (ext 194)
> MS CS, FAST-NUCES, Peshawar
> BS IT (Hons), Punjab University College of Information Technology (PUCIT),
> University Of The Punjab, Lahore.
> http://saeedkhattak.wordpress.com
> Cell No # +92-333-9533493
>
