Re: Modifying Nutch Ivy & Maven settings [WAS] Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-15 Thread Markus Jelsma
I manually added 0.21 and proceeded with porting. I now at least have both APIs and MapFileOutputFormat. On Thursday 15 December 2011 13:59:09 Lewis John Mcgibbney wrote: > Hi Markus, > > I thought I would branch off from your thread here as I see this as a different problem (albeit substantial

Re: Modifying Nutch Ivy & Maven settings [WAS] Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-15 Thread Lewis John Mcgibbney
>> Do you mean repository.apache.org as opposed to http://repo1.maven.org/maven2 ? > Yes. It's a bit outdated in the settings. Doesn't matter, I cannot seem to tell Ivy to load the POMs for a specific dependency from a specific repo. Ivy confuses me ;) ivy/ivysettings.xml must be the correct fi

Modifying Nutch Ivy & Maven settings [WAS] Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-15 Thread Lewis John Mcgibbney
Hi Markus, I thought I would branch off from your thread here as I see this as a different problem (albeit substantially more minor in nature). The question we're trying to address here is: > Does anyone know how I can modify Ivy to use Apache's Maven repo for the Hadoop dependencies? It keeps tr

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-15 Thread Markus Jelsma
I've already ported all our custom jobs (they use SequenceFiles) and I ported the DomainStatistics tool (NUTCH-1221), but none of the jobs using MapFileOutputFormat can be ported on 0.20.x. It is indeed different in a consistent way, but it is tedious (as you said earlier). I want to work on porting

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-15 Thread Andrzej Bialecki
On 15/12/2011 13:13, Markus Jelsma wrote: hmm, I don't see how I can use the old mapred MapFileOutputFormat API with the new Job API. job.setOutputFormatClass(MapFileOutputFormat.class) expects the mapreduce.lib.output.MapFileOutputFormat class and won't accept the old API. setOutputFormatClass(j

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-15 Thread Markus Jelsma
I've looked into it again. This is not going to work well if we stay on 0.20.x. Holding on to 0.20.x means doing a partial migration now and another one just before upgrading to 0.22+. This is a _lot_ of extra work! I strongly prefer an intermediate upgrade to 0.21, where both APIs are present. Doe
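Since the choice here hinges on which Hadoop line is on the classpath, a job driver could gate the new-API code path on the version string. The sketch below is hypothetical (class and method names are made up); it encodes only the claim made in this thread, that the new-API MapFileOutputFormat appears from 0.21 onwards, and it deliberately rejects version strings outside the 0.x line instead of guessing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of Nutch or Hadoop.
final class HadoopApiVersions {
    // Captures the leading "major.minor" of strings like "0.20.205.0" or "0.22.0-SNAPSHOT".
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    private HadoopApiVersions() {}

    /**
     * Returns true if, per this thread, the new-API
     * org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat is expected (0.21+).
     * Only 0.x version strings are handled; anything else is rejected.
     */
    static boolean hasNewApiMapFileOutputFormat(String version) {
        Matcher m = MAJOR_MINOR.matcher(version);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognised version: " + version);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        if (major != 0) {
            throw new IllegalArgumentException("Only 0.x versions are handled: " + version);
        }
        return minor >= 21;
    }
}
```

In a real job the version string would presumably come from Hadoop's org.apache.hadoop.util.VersionInfo.getVersion().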

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-15 Thread Markus Jelsma
hmm, I don't see how I can use the old mapred MapFileOutputFormat API with the new Job API. job.setOutputFormatClass(MapFileOutputFormat.class) expects the mapreduce.lib.output.MapFileOutputFormat class and won't accept the old API. setOutputFormatClass(java.lang.Class) in org.apache.hadoop.mapre
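The compile error follows from the class hierarchy: in the new API, org.apache.hadoop.mapreduce.OutputFormat is an abstract class, while the old org.apache.hadoop.mapred.OutputFormat is an interface, so an old-API MapFileOutputFormat can never satisfy the bound on Job.setOutputFormatClass. The self-contained model below uses hypothetical stand-in types (no Hadoop classes) to mirror that relationship; the bound on the model's setter is what rejects the old-API variant at compile time, and the isAssignableFrom check shows the same thing at runtime.

```java
// Stand-in for the old-API org.apache.hadoop.mapred.OutputFormat (an interface).
interface OldApiOutputFormat {}

// Stand-in for the new-API org.apache.hadoop.mapreduce.OutputFormat (an abstract class).
abstract class NewApiOutputFormat {}

// Stand-in for org.apache.hadoop.mapred.MapFileOutputFormat (old API).
class OldMapFileOutputFormat implements OldApiOutputFormat {}

// Stand-in for org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat (new API, 0.21+).
class NewMapFileOutputFormat extends NewApiOutputFormat {}

// Stand-in for org.apache.hadoop.mapreduce.Job; only models the output format setter.
final class JobModel {
    private Class<? extends NewApiOutputFormat> outputFormatClass;

    // Same shape of bound as the new-API setter: only NewApiOutputFormat subclasses compile.
    // job.setOutputFormatClass(OldMapFileOutputFormat.class) would be a compile error here.
    void setOutputFormatClass(Class<? extends NewApiOutputFormat> cls) {
        this.outputFormatClass = cls;
    }

    Class<? extends NewApiOutputFormat> getOutputFormatClass() {
        return outputFormatClass;
    }

    // Runtime equivalent of the compile-time bound check.
    static boolean acceptsAsOutputFormat(Class<?> cls) {
        return NewApiOutputFormat.class.isAssignableFrom(cls);
    }
}
```

This is why the old-API class cannot simply be passed to the new Job API on 0.20.x; it needs a new-API counterpart.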

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-14 Thread Andrzej Bialecki
On 14/12/2011 19:14, Markus Jelsma wrote: Yes, the goal is to upgrade to 0.22 or higher. The problem is that 0.22 doesn't have the old mapred API, so we can only upgrade to 0.22 if all jobs are ported. I thought the entire mapred package was deprecated, but it seems that class is not deprecated. It

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-14 Thread Markus Jelsma
Yes, the goal is to upgrade to 0.22 or higher. The problem is that 0.22 doesn't have the old mapred API, so we can only upgrade to 0.22 if all jobs are ported. I thought the entire mapred package was deprecated, but it seems that class is not deprecated. It feels a bit strange though, this still me
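Since, per the message above, moving to 0.22 requires every job to be ported first, it helps to track which sources still depend on the old API. A crude, hypothetical scanner can flag import lines from org.apache.hadoop.mapred; the trailing dot in the pattern keeps org.apache.hadoop.mapreduce imports from matching. It only inspects imports, so fully qualified references inside code bodies would be missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical porting aid: not part of Nutch or Hadoop.
final class OldApiImportScanner {
    // "mapred\." (with the dot) matches the old API package and its subpackages,
    // but not "mapreduce". Handles static and wildcard imports.
    private static final Pattern OLD_API_IMPORT = Pattern.compile(
            "^\\s*import\\s+(?:static\\s+)?(org\\.apache\\.hadoop\\.mapred\\.[\\w.*]+)\\s*;",
            Pattern.MULTILINE);

    private OldApiImportScanner() {}

    /** Returns the old-API imports found in one Java source file, in order of appearance. */
    static List<String> oldApiImports(String javaSource) {
        List<String> found = new ArrayList<>();
        Matcher m = OLD_API_IMPORT.matcher(javaSource);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

A job whose source yields an empty list has at least no old-API imports left.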

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-14 Thread Andrzej Bialecki
On 14/12/2011 18:30, Markus Jelsma wrote: proper link: http://hadoop.apache.org/common/docs/r0.20.205.0/api/org/apache/hadoop/mapreduce/lib/output/package-summary.html I thought the goal was to upgrade to 0.22, where this class is present. In 0.20.205 org.apache.hadoop.mapred.MapFileOutputFor

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-14 Thread Markus Jelsma
proper link: http://hadoop.apache.org/common/docs/r0.20.205.0/api/org/apache/hadoop/mapreduce/lib/output/package-summary.html > Hi, > > I get class not found exceptions. When browsing the Java API docs of various versions, I see it missing from mapreduce.lib.output until 0.21. > > Missing in 0.20.X

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-14 Thread Markus Jelsma
Hi, I get class not found exceptions. When browsing the Java API docs of various versions, I see it missing from mapreduce.lib.output until 0.21. Missing in 0.20.X http://hadoop.apache.org/common/docs/r0.20.205.0/api/index.html Back again in 0.21+ http://hadoop.apache.org/mapreduce/docs/current/api/org
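One quick way to see which MapFileOutputFormat variants a given Hadoop jar actually provides is to probe the classpath at runtime instead of browsing the API docs. The helper below is a hypothetical sketch; it calls Class.forName without initialising the class and reports false on ClassNotFoundException or a linkage error. Per the docs linked above, on 0.20.x only the old-API name should resolve.

```java
// Hypothetical diagnostic helper: not part of Nutch or Hadoop.
final class MapFileOutputFormatProbe {
    static final String NEW_API = "org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat";
    static final String OLD_API = "org.apache.hadoop.mapred.MapFileOutputFormat";

    private MapFileOutputFormatProbe() {}

    /** True if the named class can be loaded from this class's loader (without initialising it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, MapFileOutputFormatProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or present but with unresolvable dependencies.
            return false;
        }
    }
}
```

For example, isOnClasspath(MapFileOutputFormatProbe.NEW_API) returning false against a 0.20.x jar would match the class not found exceptions described here.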

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-14 Thread Andrzej Bialecki
On 14/12/2011 16:01, Markus Jelsma wrote: This is highly annoying, MapFileOutputFormat is not present in the MapReduce API until 0.21! AFAIK that's not the case ... there are both an old API and a new API implementation (the old one is deprecated). The new API is in org.apache.hadoop.mapreduce
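The old and new APIs mirror each other package by package, which is what makes the port mechanical but tedious. The table below is an illustrative, partial mapping (not an official or exhaustive list) from a few old-API classes to their new-API counterparts; per this thread, the MapFileOutputFormat entry only exists from 0.21 onwards.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical porting cheat sheet: not part of Nutch or Hadoop.
final class MapredToMapreduce {
    // Old-API (org.apache.hadoop.mapred) class -> new-API (org.apache.hadoop.mapreduce) class.
    private static final Map<String, String> COUNTERPARTS;

    static {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("org.apache.hadoop.mapred.Mapper", "org.apache.hadoop.mapreduce.Mapper");
        m.put("org.apache.hadoop.mapred.Reducer", "org.apache.hadoop.mapreduce.Reducer");
        m.put("org.apache.hadoop.mapred.FileInputFormat",
                "org.apache.hadoop.mapreduce.lib.input.FileInputFormat");
        m.put("org.apache.hadoop.mapred.SequenceFileInputFormat",
                "org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat");
        m.put("org.apache.hadoop.mapred.SequenceFileOutputFormat",
                "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat");
        // Per this thread, only present in the new API from 0.21 onwards.
        m.put("org.apache.hadoop.mapred.MapFileOutputFormat",
                "org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat");
        COUNTERPARTS = Collections.unmodifiableMap(m);
    }

    private MapredToMapreduce() {}

    /** Returns the new-API class name for an old-API one, if this partial table knows it. */
    static Optional<String> newApiCounterpart(String oldApiClass) {
        return Optional.ofNullable(COUNTERPARTS.get(oldApiClass));
    }
}
```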

Re: [jira] [Created] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API

2011-12-14 Thread Markus Jelsma
This is highly annoying, MapFileOutputFormat is not present in the MapReduce API until 0.21! Any hints? Use the one from the old API? Something else? On Wednesday 14 December 2011 15:35:30 Markus Jelsma (Created) (JIRA) wrote: > Migrate CrawlDBScanner to MapReduce API