RE: Regarding Internal Links

2018-03-07 Thread Yossi Tamari
d as well, but using the format options, or as an attachment. 5. Click "Create" at the bottom of the dialog, and you're done! > -Original Message- > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in> > Sent: 07 March 2018 12:51 > To: user@nutch.apache.org > Subje

RE: Regarding Internal Links

2018-03-07 Thread Yash Thenuan Thenuan
- > From: Sebastian Nagel <wastl.na...@googlemail.com> > Sent: 07 March 2018 12:36 > To: user@nutch.apache.org > Subject: Re: Regarding Internal Links > > Hi, > > that needs to be fixed. It's because there is no CrawlDb entry for the partial > documents. May also

RE: Regarding Internal Links

2018-03-07 Thread Yossi Tamari
2:36 > To: user@nutch.apache.org > Subject: Re: Regarding Internal Links > > Hi, > > that needs to be fixed. It's because there is no CrawlDb entry for the partial > documents. May also happen after NUTCH-2456. Could you open a Jira issue > to address the problem? Thanks! > >

Re: Regarding Internal Links

2018-03-07 Thread Sebastian Nagel
> constructor that does not require it if you don't care about the metadata. >> This should all be easier to understand if you look at what the HTML >> Parser does with each of these fields. >> >>> -Original Message- >>> From: Yash Thenuan Thenuan <rit2014.

Re: Regarding Internal Links

2018-03-07 Thread Yash Thenuan Thenuan
> > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in> > > Sent: 06 March 2018 20:17 > > To: user@nutch.apache.org > > Subject: RE: Regarding Internal Links > > > > I am able to get parsetext data structure. > > But having trouble with parseData as

RE: Regarding Internal Links

2018-03-06 Thread Yossi Tamari
<rit2014...@iiita.ac.in> > Sent: 06 March 2018 20:17 > To: user@nutch.apache.org > Subject: RE: Regarding Internal Links > > I am able to get parsetext data structure. > But having trouble with parseData as its constructor is asking for > parsestatus, > outlinks, contentmeta and

RE: Regarding Internal Links

2018-03-06 Thread Yossi Tamari
<rit2014...@iiita.ac.in> > Sent: 06 March 2018 14:45 > To: user@nutch.apache.org > Subject: RE: Regarding Internal Links > > > I am able to get the content corresponding to each Internal link by > > writing a parse filter plugin. Now I am not getting how to proceed > >

RE: Regarding Internal Links

2018-03-06 Thread Yash Thenuan Thenuan
> I am able to get the content corresponding to each internal link by > writing a parse filter plugin. Now I am not sure how to proceed > further. How can I parse them as separate documents, and what should > my ParseResult filter return?

RE: Regarding Internal Links

2018-03-05 Thread Yossi Tamari
around this. > -Original Message- > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in> > Sent: 05 March 2018 13:59 > To: user@nutch.apache.org > Subject: Re: Regarding Internal Links > > Please help me out regarding this. > It's urgent. > > On 5 M

Re: Regarding Internal Links

2018-03-05 Thread Yash Thenuan Thenuan
Please help me out regarding this. It's urgent. On 5 Mar 2018 15:41, "Yash Thenuan Thenuan" wrote: > How can I achieve this in Nutch 1.x? > > On 1 Mar 2018 22:30, "Sebastian Nagel" wrote: > >> Hi, >> >> Yes, that's possible but only for Nutch

Re: Regarding Internal Links

2018-03-05 Thread Yash Thenuan Thenuan
How can I achieve this in Nutch 1.x? On 1 Mar 2018 22:30, "Sebastian Nagel" wrote: > Hi, > > Yes, that's possible but only for Nutch 1.x: > a ParseResult [1] may contain multiple ParseData objects > each accessible by a separate URL. > This feature is not available

Re: Regarding Internal Links

2018-03-01 Thread Sebastian Nagel
Hi, Yes, that's possible but only for Nutch 1.x: a ParseResult [1] may contain multiple ParseData objects each accessible by a separate URL. This feature is not available for 2.x [2]. It's used by the feed parser plugin to add a single entry for every feed item. Afaik, that's not supported out

Regarding Internal Links

2018-02-28 Thread Yash Thenuan Thenuan
Hi there, For example we have a URL https://wiki.apache.org/nutch/NutchTutorial#Table_of_Contents where #Table_of_Contents is an internal link. I want to separate the contents of the page on the basis of internal links. Is this possible in Nutch? I want to index the contents of each internal link
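
For readers following the thread, the idea in the original question (one page split into several documents, each addressed by its own fragment URL) can be sketched outside Nutch. This is only an illustration in Python using the standard library, not the Nutch plugin API. The names FragmentSplitter and split_by_fragment are hypothetical, the sample page content is made up, and treating every element with an id (or a legacy <a name>) as a section boundary is an assumption about the page structure.

```python
from html.parser import HTMLParser


class FragmentSplitter(HTMLParser):
    """Collects page text into sections keyed by the most recent anchor id."""

    def __init__(self):
        super().__init__()
        self.sections = {}   # fragment id -> list of text chunks
        self.current = None  # section being filled; None before the first anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: any element with an id, or <a name="...">, starts a section.
        anchor = attrs.get("id") or (attrs.get("name") if tag == "a" else None)
        if anchor:
            self.current = anchor
            self.sections.setdefault(anchor, [])

    def handle_data(self, data):
        text = data.strip()
        # Text that appears before the first anchor is ignored in this sketch.
        if text and self.current is not None:
            self.sections[self.current].append(text)


def split_by_fragment(base_url, html):
    """Return a dict mapping base_url#fragment to that section's text."""
    splitter = FragmentSplitter()
    splitter.feed(html)
    splitter.close()
    return {
        f"{base_url}#{frag}": " ".join(chunks)
        for frag, chunks in splitter.sections.items()
    }


# Made-up sample page, for illustration only.
page = (
    '<h1 id="Table_of_Contents">Table of Contents</h1><p>Intro text.</p>'
    '<h2 id="Requirements">Requirements</h2><p>Java and Ant.</p>'
)
docs = split_by_fragment("https://wiki.apache.org/nutch/NutchTutorial", page)
for url, text in docs.items():
    print(url, "->", text)
```

Each key of the resulting dict plays the role of the separate URL mentioned in the replies: in Nutch 1.x, a parse filter would presumably add one entry per such URL to the ParseResult, but that part depends on the Nutch API and is not shown here.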