Re: Lucene query support in Nutch

2006-10-05 Thread Cristina Belderrain

Hi Björn,

yes, the compile error you point out will indeed occur. A possible
workaround would be:

   public Hits search(String queryString, int numHits,
       String dedupField, String sortField, boolean reverse)
       throws IOException {

     org.apache.lucene.queryParser.QueryParser parser =
         new org.apache.lucene.queryParser.QueryParser("content",
             new org.apache.lucene.analysis.standard.StandardAnalyzer());

     org.apache.lucene.search.Query luceneQuery;
     try {
       luceneQuery = parser.parse(queryString);
     } catch (org.apache.lucene.queryParser.ParseException ex) {
       // Surface parse errors instead of continuing with a null query
       throw new IOException("Cannot parse query '" + queryString + "': "
           + ex.getMessage());
     }

     // optimize() only accepts a BooleanQuery, so wrap the parsed query
     org.apache.lucene.search.BooleanQuery boolQuery =
         new org.apache.lucene.search.BooleanQuery();
     boolQuery.add(luceneQuery,
         org.apache.lucene.search.BooleanClause.Occur.MUST);

     return translateHits(
         optimizer.optimize(boolQuery, luceneSearcher, numHits,
             sortField, reverse),
         dedupField, sortField);
   }

Please note that I'm not sure this will work as it should: right
now, it just compiles. I still need to modify the NutchBean class so
it can pass on the raw query, as Ravi says.
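
The type issue behind the workaround can be shown in isolation. The sketch below is only a model: StubQuery, StubBooleanQuery and WrapSketch are hypothetical stand-ins, not the real Lucene or Nutch classes. It assumes an optimize() method that accepts only the boolean subtype, and shows that wrapping a general query as a single required clause satisfies that signature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a general parsed query (not the Lucene class).
class StubQuery {
    final String text;

    StubQuery(String text) {
        this.text = text;
    }
}

// Hypothetical stand-in for a boolean query holding required clauses.
class StubBooleanQuery extends StubQuery {
    final List<StubQuery> mustClauses = new ArrayList<>();

    StubBooleanQuery() {
        super("");
    }

    void addMust(StubQuery clause) {
        mustClauses.add(clause);
    }
}

class WrapSketch {
    // Models an optimizer that accepts only the boolean subtype.
    // It returns the number of required clauses it was given.
    static int optimize(StubBooleanQuery query) {
        return query.mustClauses.size();
    }

    // Wraps any parsed query as the single required clause of a boolean query.
    static StubBooleanQuery wrap(StubQuery parsed) {
        if (parsed == null) {
            throw new IllegalArgumentException("query could not be parsed");
        }
        StubBooleanQuery wrapped = new StubBooleanQuery();
        wrapped.addMust(parsed);
        return wrapped;
    }

    public static void main(String[] args) {
        StubQuery parsed = new StubQuery("title:nutch");
        // optimize(parsed) would not compile: StubQuery is not a StubBooleanQuery.
        System.out.println(optimize(wrap(parsed)));
    }
}
```

The null check in wrap() mirrors the concern about a failed parse: rejecting the missing query early is easier to diagnose than a failure deeper in the search path.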

Regards,

Cristina


On 10/5/06, Björn Wilmsmann <[EMAIL PROTECTED]> wrote:



Hi everybody,


On 05/10/2006 05:44 Ravi Chintakunta wrote:

> public Hits search(String queryString, int numHits,
>     String dedupField, String sortField, boolean reverse)
>     throws IOException {
>
>   org.apache.lucene.queryParser.QueryParser parser =
>       new org.apache.lucene.queryParser.QueryParser("content",
>           new org.apache.lucene.analysis.standard.StandardAnalyzer());
>
>   org.apache.lucene.search.Query luceneQuery =
>       parser.parse(queryString);
>
>   return translateHits(
>       optimizer.optimize(luceneQuery, luceneSearcher, numHits,
>           sortField, reverse),
>       dedupField, sortField);
> }

This seems to be a good approach. I have not tried it out in detail
yet; however, the optimize() method in LuceneQueryOptimizer only takes
a BooleanQuery as an argument, so the line 'return translateHits...'
would cause a compile error, wouldn't it?


--
Best regards,
Björn Wilmsmann




Re: Lucene query support in Nutch

2006-10-05 Thread Björn Wilmsmann


Hi everybody,


On 05/10/2006 05:44 Ravi Chintakunta wrote:


> public Hits search(String queryString, int numHits,
>     String dedupField, String sortField, boolean reverse)
>     throws IOException {
>
>   org.apache.lucene.queryParser.QueryParser parser =
>       new org.apache.lucene.queryParser.QueryParser("content",
>           new org.apache.lucene.analysis.standard.StandardAnalyzer());
>
>   org.apache.lucene.search.Query luceneQuery =
>       parser.parse(queryString);
>
>   return translateHits(
>       optimizer.optimize(luceneQuery, luceneSearcher, numHits,
>           sortField, reverse),
>       dedupField, sortField);
> }


This seems to be a good approach. I have not tried it out in detail
yet; however, the optimize() method in LuceneQueryOptimizer only takes
a BooleanQuery as an argument, so the line 'return translateHits...'
would cause a compile error, wouldn't it?



--
Best regards,
Björn Wilmsmann




Re: backup/failover NameNode

2006-10-05 Thread Albert Chern

Hello Mohan,

I posted this question not long ago and a Hadoop developer replied
that this feature is not implemented yet.

On 10/4/06, Mohan Lal <[EMAIL PROTECTED]> wrote:


Hi all,

The JobTracker handles all of its datanodes and tasktrackers. If the
JobTracker fails, how can we fail over the NameNode?
Is there any possible solution for a backup/failover NameNode?

If so, please help me.

Thanks & Regards
Mohan Lal
--
View this message in context: 
http://www.nabble.com/backup-failover-NameNode-tf2382399.html#a6639742
Sent from the Nutch - User mailing list archive at Nabble.com.




Re: focussed crawling

2006-10-05 Thread Alvaro Cabrerizo

Although I haven't used it myself: after a crawl, at least in Nutch 0.8,
you can run "./bin/nutch mergedb outputdb your_db -filter". With the
-filter option you can generate a new db in which the links you want to
remove are filtered out, and then use it for a recrawl.
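
What filtering a db amounts to can be sketched on its own. The code below is a stand-alone model, not Nutch code: UrlRuleFilter is a hypothetical class, and the "+regex" / "-regex" rule syntax with first-match-wins semantics is an assumption modelled on regex-style URL filter rules. Which URLs the -filter option actually drops depends on the URL filters configured in your installation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical first-match-wins URL filter: "+regex" keeps, "-regex" drops.
class UrlRuleFilter {
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Adds one rule line; the first character must be '+' or '-'.
    UrlRuleFilter add(String line) {
        if (line.isEmpty() || (line.charAt(0) != '+' && line.charAt(0) != '-')) {
            throw new IllegalArgumentException("rule must start with + or -: " + line);
        }
        rules.add(new Rule(line.charAt(0) == '+', Pattern.compile(line.substring(1))));
        return this;
    }

    // The first rule whose pattern is found in the URL decides.
    // URLs that match no rule are dropped.
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    // Returns only the URLs that survive the rules, in their original order.
    List<String> filter(List<String> urls) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (accepts(url)) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        UrlRuleFilter filter = new UrlRuleFilter()
            .add("-\\.(gif|jpg|png)$")
            .add("+^http://([a-z0-9]*\\.)*example\\.org/");
        List<String> urls = new ArrayList<>();
        urls.add("http://www.example.org/page.html");
        urls.add("http://www.example.org/logo.gif");
        urls.add("http://other.test/page.html");
        System.out.println(filter.filter(urls));
    }
}
```

For focused crawling, the rules would keep only the hosts or paths of interest, so the recrawl starts from a db that no longer contains the unwanted links.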

Hope it helps.


RE: 0.7.2 Compile Problems

2006-10-05 Thread Gary Bone

Hi, just to let you know: I rebuilt Ant and it now compiles
fine! I'll never trust yum again!

G

From: Gary Bone [mailto:[EMAIL PROTECTED]
Sent: 04 October 2006 21:57
To: nutch-user@lucene.apache.org
Subject: 0.7.2 Compile Problems

Hi,

I'm having problems compiling Nutch after adding the edits to get stemming
working. I'm getting the attached error and wondered if anyone had any
ideas?

I'm running Fedora Core 5 with Sun Java 1.5.06 and Ant 1.6.5, and I'm using
the "ant package" command.

Cheers

G

CAUTION - This message may contain privileged and confidential information
intended only for the use of the addressee named above. If you are not the
intended recipient of this message you are hereby notified that any use,
dissemination, distribution or reproduction of this message is prohibited. If
you have received this message in error please notify SPG Media Group Plc
immediately via email at [EMAIL PROTECTED].
Any views expressed in this message are those of the individual sender and may
not necessarily reflect the views of SPG Media Group PLC 

This email has been scanned by SPG's Email Security System.








generate-locale:
 [echo] Generating docs for locale=ca
[mkdir] Created dir: /apps/nutch-production/build/docs/ca/include
 [xslt] DEPRECATED - xalan processor is deprecated. Use trax instead.
 [xslt] DEPRECATED - xslp processor is deprecated. Use trax instead.
 [xslt] java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.XslpLiaison
 [xslt] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
 [xslt] at java.security.AccessController.doPrivileged(Native Method)
 [xslt] at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 [xslt] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)
 [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
 [xslt] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
 [xslt] at java.lang.Class.forName0(Native Method)
 [xslt] at java.lang.Class.forName(Class.java:164)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.loadClass(XSLTProcess.java:419)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.resolveProcessor(XSLTProcess.java:397)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.getLiaison(XSLTProcess.java:619)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:212)
 [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:275)
 [xslt] at org.apache.tools.ant.Task.perform(Task.java:364)
 [xslt] at org.apache.tools.ant.Target.execute(Target.java:341)
 [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:369)
 [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1216)
 [xslt] at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:37)
 [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1068)
 [xslt] at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:382)
 [xslt] at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:107)
 [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:275)
 [xslt] at org.apache.tools.ant.Task.perform(Task.java:364)
 [xslt] at org.apache.tools.ant.Target.execute(Target.java:341)
 [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:369)
 [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1216)
 [xslt] at org.apache.tools.ant.Project.executeTarget(Project.java:1185)
 [xslt] at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:40)
 [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1068)
 [xslt] at org.apache.tools.ant.Main.runBuild(Main.java:668)
 [xslt] at org.apache.tools.ant.Main.startAnt(Main.java:187)
 [xslt] at org.apache.tools.ant.launch.Launcher.run(Launcher.java:246)
 [xslt] at org.apache.tools.ant.launch.Launcher.main(Launcher.java:67)
 [xslt] java.lang.ClassNotFoundException: org.apache.t

DFS Shadow Server

2006-10-05 Thread Sunil Kumar PK

Hi,

I have a number of questions regarding DFS.

1. Can anyone explain how to set up DFS shadow servers?
2. How does the switch happen when one goes down?
3. If the master goes down, do all the nodes have to be restarted?
4. Does the master node maintain state that needs to be synced between both
master nodes?

Experts, please help.

Thanks,
Sunil


Re: Problem parsing some MS Excel documents (Office 2003)

2006-10-05 Thread tryma

Any suggestions, or should I maybe post this on the Nutch-dev list too?

To me it seems a bit strange that MSBaseParser.java leaves open the
possibility that the properties object is null, which can then
cause an NPE at the call:

title = properties.getProperty(DublinCore.TITLE);

Comments?
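
One way to avoid the NPE is to check for null before reading the property. The sketch below is a stand-alone illustration using java.util.Properties, not a patch to MSBaseParser: NullSafeMetadata, extractTitle and TITLE_KEY are hypothetical names, and TITLE_KEY merely stands in for DublinCore.TITLE. Returning an empty title for a document without properties is an assumption about the desired behaviour.

```java
import java.util.Properties;

class NullSafeMetadata {
    // Hypothetical stand-in for the DublinCore.TITLE key.
    static final String TITLE_KEY = "title";

    // Reads and removes the title, tolerating a missing properties object.
    static String extractTitle(Properties properties) {
        if (properties == null) {
            // The extractor produced no properties: fall back to an empty title.
            return "";
        }
        String title = properties.getProperty(TITLE_KEY, "");
        properties.remove(TITLE_KEY);
        return title;
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty(TITLE_KEY, "Quarterly report");
        System.out.println(extractTitle(properties));
        System.out.println(extractTitle(null).isEmpty());
    }
}
```

Whether a null properties object should yield an empty title or a failed parse status is a design choice; the guard only ensures the failure is handled deliberately.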


Thanks,
Trym


tryma wrote:
> 
> Hi,
> 
> I initially thought there was an issue with POI so I posted my initial
> question on the POI-user list.
> Actually, now I see this is happening in the Nutch classes for the MS
> parse plugin, not POI, so I'm giving this list a go.
> 
> Here's a trace I get when I catch any exception occurring as I attempt to
> call the MSExcelParser's getParse(Content). It seems I get an NPE in
> MSBaseParser.getParse().
> 
> [#|2006-10-04T09:13:15.102+0200|WARNING|sun-appserver-ee9.1|javax.enterprise.system.stream.err|_ThreadID=16;_ThreadName=httpWorkerThread-8080-1;_RequestID=0b18e2ae-0f79-4241-9e29-a322c8ae2bc6;|
> java.lang.NullPointerException
>   at org.apache.nutch.parse.ms.MSBaseParser.getParse(MSBaseParser.java:94)
>   at org.apache.nutch.parse.msexcel.MSExcelParser.getParse(MSExcelParser.java:40)
>   at .DocumentParser.parseDocument(DocumentParser.java:154)
> ...
> 
> Looking at the source (MSBaseParser.java) at this line, it goes:
> 
> SNIP
>       extractor.extract(new ByteArrayInputStream(raw));
>       text = extractor.getText();
>       properties = extractor.getProperties();
>       outlinks = OutlinkExtractor.getOutlinks(text, content.getUrl(),
>           getConf());
>
>     } catch (Exception e) {
>       return new ParseStatus(ParseStatus.FAILED,
>           "Can't be handled as micrsosoft document. " + e)
>           .getEmptyParse(this.conf);
>     }
>
>     // collect meta data
>     Metadata metadata = new Metadata();
>     title = properties.getProperty(DublinCore.TITLE);  // <== line 94 in the trace
>     properties.remove(DublinCore.TITLE);
> SNIP
> 
> So I can only gather that my properties object is null. As seen above in
> the snippet from the MSBaseParser source, properties is initially null but
> is assigned a value from the ExcelExtractor (properties =
> extractor.getProperties();), which I assume is returning null.
> 
> Any ideas how I can get around this or if I'm not setting some required
> properties?
> 
> Btw, I've noticed a spelling mistake in the ParseStatus that is returned
> in the above lines of code; "Micrsosoft"
> 
> 
> Thanks,
> Trym
> 

-- 
View this message in context: 
http://www.nabble.com/Problem-parsing-some-MS-Excel-documents-%28Office-2003%29-tf2380851.html#a6654362
Sent from the Nutch - User mailing list archive at Nabble.com.