Re: Lucene query support in Nutch
Hi Björn,

yes, the error you point out will indeed happen. A possible workaround would be:

    public Hits search(String queryString, int numHits,
                       String dedupField, String sortField, boolean reverse)
        throws IOException {

      org.apache.lucene.queryParser.QueryParser parser =
          new org.apache.lucene.queryParser.QueryParser("content",
              new org.apache.lucene.analysis.standard.StandardAnalyzer());

      org.apache.lucene.search.Query luceneQuery = null;
      try {
        luceneQuery = parser.parse(queryString);
      } catch (Exception ex) {
        // Do not swallow the failure: a null query must not reach the BooleanQuery below.
        throw new IOException("Cannot parse query '" + queryString + "': " + ex);
      }

      // optimize() only accepts a BooleanQuery, so wrap the parsed query in one.
      org.apache.lucene.search.BooleanQuery boolQuery =
          new org.apache.lucene.search.BooleanQuery();
      boolQuery.add(luceneQuery, org.apache.lucene.search.BooleanClause.Occur.MUST);

      return translateHits(
          optimizer.optimize(boolQuery, luceneSearcher, numHits, sortField, reverse),
          dedupField, sortField);
    }

Please note that I'm not sure this will work as it should: right now, it just compiles. I still need to modify the NutchBean class so it can pass on the raw query, as Ravi says.

Regards,
Cristina

On 10/5/06, Björn Wilmsmann <[EMAIL PROTECTED]> wrote:
> Hi everybody,
>
> On 05/10/2006 05:44 Ravi Chintakunta wrote:
> > public Hits search(String queryString, int numHits,
> >     String dedupField, String sortField, boolean reverse)
> >     throws IOException {
> >
> >   org.apache.lucene.queryParser.QueryParser parser =
> >       new org.apache.lucene.queryParser.QueryParser("content",
> >           new org.apache.lucene.analysis.standard.StandardAnalyzer());
> >
> >   org.apache.lucene.search.Query luceneQuery = parser.parse(queryString);
> >
> >   return translateHits(
> >       optimizer.optimize(luceneQuery, luceneSearcher, numHits, sortField, reverse),
> >       dedupField, sortField);
> > }
>
> This seems to be a good approach. I have not yet tried it out in detail;
> however, the optimize() method in LuceneQueryOptimizer only takes a
> BooleanQuery as an argument, so the line 'return translateHits...' would
> cause a compile error, wouldn't it?
>
> --
> Best regards,
> Björn Wilmsmann
Re: Lucene query support in Nutch
Hi everybody,

On 05/10/2006 05:44 Ravi Chintakunta wrote:
> public Hits search(String queryString, int numHits,
>     String dedupField, String sortField, boolean reverse)
>     throws IOException {
>
>   org.apache.lucene.queryParser.QueryParser parser =
>       new org.apache.lucene.queryParser.QueryParser("content",
>           new org.apache.lucene.analysis.standard.StandardAnalyzer());
>
>   org.apache.lucene.search.Query luceneQuery = parser.parse(queryString);
>
>   return translateHits(
>       optimizer.optimize(luceneQuery, luceneSearcher, numHits, sortField, reverse),
>       dedupField, sortField);
> }

This seems to be a good approach. I have not yet tried it out in detail; however, the optimize() method in LuceneQueryOptimizer only takes a BooleanQuery as an argument, so the line 'return translateHits...' would cause a compile error, wouldn't it?

--
Best regards,
Björn Wilmsmann
Re: backup/failover NameNode
Hello Mohan,

I posted this question not long ago, and a Hadoop developer replied that this feature is not implemented yet.

On 10/4/06, Mohan Lal <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> The JobTracker handles all its DataNodes and TaskTrackers. If the
> JobTracker fails, how can we handle failover of the NameNode? Is there
> any possible solution for a backup/failover NameNode? If so, how?
> Please help me.
>
> Thanks & Regards,
> Mohan Lal
> --
> View this message in context: http://www.nabble.com/backup-failover-NameNode-tf2382399.html#a6639742
> Sent from the Nutch - User mailing list archive at Nabble.com.
Re: focussed crawling
I haven't used it myself, but after a crawl, at least in Nutch 0.8, you can run "./bin/nutch mergedb outputdb your_db -filter". With the -filter option you generate a new db from which the links you want to remove are filtered out. You can then use that db for a recrawl.

Hope it helps.
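
To make the filtering step more concrete, here is a minimal Java sketch of how a rule-based URL filter could decide which links survive. It assumes that -filter applies the URL filters configured for the crawl and that these are ordered +/- regular-expression rules where the first match wins; that is an assumption on my side, not something stated in this thread. The class UrlFilterSketch, its rules and the example.com URLs are made up for illustration and are not Nutch code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative stand-in for a rule-based URL filter; hypothetical, not Nutch code. */
final class UrlFilterSketch {

  /** One rule: keep ('+') or drop ('-') every URL the regular expression matches. */
  private static final class Rule {
    final boolean keep;
    final Pattern pattern;

    Rule(boolean keep, Pattern pattern) {
      this.keep = keep;
      this.pattern = pattern;
    }
  }

  private final List<Rule> rules = new ArrayList<Rule>();

  /** Parses rule lines such as "-\\.png$" or "+."; blank lines and '#' comments are skipped. */
  UrlFilterSketch(String... lines) {
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      char sign = trimmed.charAt(0);
      if (sign != '+' && sign != '-') {
        throw new IllegalArgumentException("Rule must start with + or -: " + line);
      }
      rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
    }
  }

  /** The first matching rule decides; a URL that matches no rule is dropped. */
  boolean accept(String url) {
    for (Rule rule : rules) {
      if (rule.pattern.matcher(url).find()) {
        return rule.keep;
      }
    }
    return false;
  }

  /** Returns only the URLs that pass the rules, in their original order. */
  List<String> filter(List<String> urls) {
    List<String> kept = new ArrayList<String>();
    for (String url : urls) {
      if (accept(url)) {
        kept.add(url);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    UrlFilterSketch filter = new UrlFilterSketch(
        "# drop images and one unwanted host, keep everything else",
        "-\\.(gif|jpg|png)$",
        "-^http://spam\\.example\\.com/",
        "+.");
    List<String> urls = Arrays.asList(
        "http://www.example.com/index.html",
        "http://spam.example.com/offer.html",
        "http://www.example.com/logo.png");
    for (String url : filter.filter(urls)) {
      System.out.println(url);
    }
  }
}
```

Rule order matters in a sketch like this: the catch-all "+." has to come last, otherwise it would accept URLs before the more specific "-" rules are ever consulted.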
RE: 0.7.2 Compile Problems
Hi,

just to let you know: I rebuilt Ant and it now compiles fine! I'll never trust yum again!

G

From: Gary Bone [mailto:[EMAIL PROTECTED]
Sent: 04 October 2006 21:57
To: nutch-user@lucene.apache.org
Subject: 0.7.2 Compile Problems

Hi,

I'm having problems compiling Nutch after adding the edits to get stemming functioning. I'm getting the attached error and wondered if anyone had any ideas? I'm running Fedora Core 5 with Sun Java 1.5.06 and Ant 1.6.5, and I'm using the "ant package" command.

Cheers
G

generate-locale:
     [echo] Generating docs for locale=ca
    [mkdir] Created dir: /apps/nutch-production/build/docs/ca/include
     [xslt] DEPRECATED - xalan processor is deprecated. Use trax instead.
     [xslt] DEPRECATED - xslp processor is deprecated. Use trax instead.
     [xslt] java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.XslpLiaison
     [xslt]     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
     [xslt]     at java.security.AccessController.doPrivileged(Native Method)
     [xslt]     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
     [xslt]     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
     [xslt]     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)
     [xslt]     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
     [xslt]     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
     [xslt]     at java.lang.Class.forName0(Native Method)
     [xslt]     at java.lang.Class.forName(Class.java:164)
     [xslt]     at org.apache.tools.ant.taskdefs.XSLTProcess.loadClass(XSLTProcess.java:419)
     [xslt]     at org.apache.tools.ant.taskdefs.XSLTProcess.resolveProcessor(XSLTProcess.java:397)
     [xslt]     at org.apache.tools.ant.taskdefs.XSLTProcess.getLiaison(XSLTProcess.java:619)
     [xslt]     at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:212)
     [xslt]     at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:275)
     [xslt]     at org.apache.tools.ant.Task.perform(Task.java:364)
     [xslt]     at org.apache.tools.ant.Target.execute(Target.java:341)
     [xslt]     at org.apache.tools.ant.Target.performTasks(Target.java:369)
     [xslt]     at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1216)
     [xslt]     at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:37)
     [xslt]     at org.apache.tools.ant.Project.executeTargets(Project.java:1068)
     [xslt]     at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:382)
     [xslt]     at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:107)
     [xslt]     at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:275)
     [xslt]     at org.apache.tools.ant.Task.perform(Task.java:364)
     [xslt]     at org.apache.tools.ant.Target.execute(Target.java:341)
     [xslt]     at org.apache.tools.ant.Target.performTasks(Target.java:369)
     [xslt]     at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1216)
     [xslt]     at org.apache.tools.ant.Project.executeTarget(Project.java:1185)
     [xslt]     at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:40)
     [xslt]     at org.apache.tools.ant.Project.executeTargets(Project.java:1068)
     [xslt]     at org.apache.tools.ant.Main.runBuild(Main.java:668)
     [xslt]     at org.apache.tools.ant.Main.startAnt(Main.java:187)
     [xslt]     at org.apache.tools.ant.launch.Launcher.run(Launcher.java:246)
     [xslt]     at org.apache.tools.ant.launch.Launcher.main(Launcher.java:67)
     [xslt] java.lang.ClassNotFoundException: org.apache.t
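
The trace above shows Ant failing inside Class.forName() while looking up an XSLT liaison class. When chasing this kind of problem, a small probe that reports whether a class is visible on a given classpath can help narrow it down. The following is only a sketch: ClasspathProbe is a hypothetical helper, the XslpLiaison name is taken from the log, and the TraXLiaison name is my assumption about the class behind the "trax" processor, so verify it against your Ant installation.

```java
/** Hypothetical helper: reports whether classes can be loaded from the current classpath. */
final class ClasspathProbe {

  /** Returns true if the named class can be found; the class is not initialized. */
  static boolean isLoadable(String className) {
    try {
      Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException e) {
      // The class itself is not on the classpath.
      return false;
    } catch (LinkageError e) {
      // The class file was found, but something it depends on could not be linked.
      return false;
    }
  }

  public static void main(String[] args) {
    // Class names may be passed as arguments; the defaults below are assumptions (see lead-in).
    String[] names = args.length > 0 ? args : new String[] {
        "org.apache.tools.ant.taskdefs.optional.XslpLiaison",
        "org.apache.tools.ant.taskdefs.optional.TraXLiaison"
    };
    for (String name : names) {
      System.out.println((isLoadable(name) ? "found    " : "MISSING  ") + name);
    }
  }
}
```

Run it with the same jars on the classpath that the Ant installation in question uses; comparing the output for a packaged Ant and a self-built one shows which optional classes each actually ships.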
DFS Shadow Server
Hi,

I have a number of questions regarding DFS.

1. Can anyone explain how to set up DFS shadow servers?
2. How does the switch happen when one goes down?
3. If the master goes down, do all the nodes have to be restarted?
4. Does the master node maintain state that needs to be synced between both master nodes?

Experts, please help.

Thanks,
Sunil
Re: Problem parsing some MS Excel documents (Office 2003)
Any suggestions, or should I maybe post this on the Nutch-dev list too?

To me it seems a bit strange that MSBaseParser.java allows for the possibility that the properties object is null and can then later throw an NPE at the call:

    title = properties.getProperty(DublinCore.TITLE);

Comments?

Thanks,
Trym

tryma wrote:
>
> Hi,
>
> I initially thought there was an issue with POI so I posted my initial
> question on the POI-user list.
> Actually, now I see this is happening in the Nutch classes for the MS
> parse plugin, not POI, so I'm giving this list a go.
>
> Here's a trace I get when I catch any exception occurring as I attempt to
> call the MSExcelParser's getParse(Content). It seems I get an NPE in
> MSBaseParser.getParse().
>
> [#|2006-10-04T09:13:15.102+0200|WARNING|sun-appserver-ee9.1|javax.enterprise.system.stream.err|_ThreadID=16;_ThreadName=httpWorkerThread-8080-1;_RequestID=0b18e2ae-0f79-4241-9e29-a322c8ae2bc6;|
> java.lang.NullPointerException
>     at org.apache.nutch.parse.ms.MSBaseParser.getParse(MSBaseParser.java:94)
>     at org.apache.nutch.parse.msexcel.MSExcelParser.getParse(MSExcelParser.java:40)
>     at .DocumentParser.parseDocument(DocumentParser.java:154)
>     ...
>
> Looking at the source (MSBaseParser.java) at this line, it goes:
>
> SNIP
>       extractor.extract(new ByteArrayInputStream(raw));
>       text = extractor.getText();
>       properties = extractor.getProperties();
>       outlinks = OutlinkExtractor.getOutlinks(text, content.getUrl(), getConf());
>
>     } catch (Exception e) {
>       return new ParseStatus(ParseStatus.FAILED,
>                              "Can't be handled as micrsosoft document. " + e)
>                              .getEmptyParse(this.conf);
>     }
>
>     // collect meta data
>     Metadata metadata = new Metadata();
>     title = properties.getProperty(DublinCore.TITLE);   <== This is line 94 as indicated in the trace
>     properties.remove(DublinCore.TITLE);
> SNIP
>
> So I can only gather that my properties object is null. As seen above in
> the snippet from the MSBaseParser source, properties is initially null but
> assigned a value from the ExcelExtractor (properties =
> extractor.getProperties();) which I assume is becoming null.
>
> Any ideas how I can get around this or if I'm not setting some required
> properties?
>
> Btw, I've noticed a spelling mistake in the ParseStatus that is returned
> in the above lines of code; "Micrsosoft"
>
> Thanks,
> Trym
>

--
View this message in context: http://www.nabble.com/Problem-parsing-some-MS-Excel-documents-%28Office-2003%29-tf2380851.html#a6654362
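
One possible way around the NPE, sketched here under the assumption that a missing properties object can simply be treated as "no metadata", is to guard the lookup before reading the title. This is not a patch for MSBaseParser: java.util.Properties stands in for whatever the extractor returns, and TitleGuardSketch, takeTitle and TITLE_KEY (a placeholder for DublinCore.TITLE, whose actual value I have not checked) are hypothetical names.

```java
import java.util.Properties;

/** Hypothetical illustration of a null guard around the title lookup; not Nutch code. */
final class TitleGuardSketch {

  /** Placeholder for DublinCore.TITLE; the real constant's value is an assumption here. */
  static final String TITLE_KEY = "title";

  /**
   * Returns the title and removes it from the properties, mirroring the two
   * statements around line 94. Returns "" when properties is null or has no title.
   */
  static String takeTitle(Properties properties) {
    if (properties == null) {
      return "";
    }
    String title = properties.getProperty(TITLE_KEY);
    properties.remove(TITLE_KEY);
    return title == null ? "" : title;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(TITLE_KEY, "Quarterly figures");

    System.out.println("with properties: '" + takeTitle(props) + "'");
    System.out.println("null properties: '" + takeTitle(null) + "'");
  }
}
```

Whether an empty title is the right fallback, or whether the parser should instead return a failed ParseStatus when the extractor yields no properties, is a design decision for the plugin maintainers.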