Re: Need Help: Lucene with PHP/Java Bridge

2010-10-24 Thread dian puma
Ahaaa, you're right. I was using lucene-core-3.0.1.jar, and I got the error
because the IndexWriter constructor cannot be called that way in that version.

When I use the older version, it works.
Thanks a lot.
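
One way to confirm which constructor signatures the loaded Lucene jar really
offers is to list them by reflection. The following is a minimal sketch using
only the standard java.lang.reflect API; the class name CtorList and the
method signatures() are made up for illustration and are not part of Lucene or
the PHP/Java Bridge. It has to run with the same classpath as the bridge to be
meaningful.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the public constructors of a class by name.
class CtorList {

    // Returns one string per public constructor, or an empty list if the
    // class cannot be found on the current classpath.
    static List<String> signatures(String className) {
        List<String> out = new ArrayList<>();
        try {
            for (Constructor<?> c : Class.forName(className).getConstructors()) {
                out.add(c.toString());
            }
        } catch (ClassNotFoundException e) {
            // Class is not visible to this class loader; report nothing.
        }
        return out;
    }

    public static void main(String[] args) {
        // Default target is the class from the error message; it is only
        // found if a Lucene jar is on the classpath.
        String name = args.length > 0 ? args[0] : "org.apache.lucene.index.IndexWriter";
        for (String s : signatures(name)) {
            System.out.println(s);
        }
    }
}
```

If no listed signature takes a java.io.File or String as its first parameter,
the PHP call that passes a path string has no constructor to match, which is
consistent with the explanation quoted below.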

On Sun, Oct 24, 2010 at 6:52 AM, Uwe Schindler  wrote:

> Are you sure that you use the same Lucene version? If you use latest
> (3.0.x)
> now, then your IndexWriter ctor cannot work, because you have to call
> FSDirectory.open() in java code first. Directly passing a native
> java.io.File to IW is no longer possible. So maybe it simply does not find
> the correct ctor?
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: dian puma [mailto:dianp...@gmail.com]
> > Sent: Saturday, October 23, 2010 6:01 PM
> > To: java-user@lucene.apache.org
> > Subject: Need Help: Lucene with PHP/Java Bridge
> >
> > Dear All,
> >
> > Currently, I'm using PHP/Java Bridge to have Lucene in my PHP web
> > application, and also using the java extension for PHP.
> >
> > FYI, I set up Lucene on my PC several months ago, and the code below
> > worked well.
> >
> > But today I tried to set up Lucene on another PC, and I get an error
> > message:
> >
> > ==
> > indexing ... Exception occured: [[o:Exception]:"java.lang.Exception:
> > CreateInstance failed: new
> > org.apache.lucene.index.IndexWriter((o:Directory)[o:String],
> > (o:Analyzer)[c:StandardAnalyzer],
> > (o:IndexWriter$MaxFieldLength)[o:Boolean]). Cause:
> > java.lang.IllegalArgumentException:
> > java.lang.classcastexcept...@18a992f Responsible VM:
> > 1.6.0...@http://java.sun.com/"; at: #-8
> > sun.reflect.GeneratedConstructorAccessor2.newInstance(Unknown Source)
> > #-7
> >
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstru
> > ctorAccessorImpl.java:27)
> > #-6 java.lang.reflect.Constructor.newInstance(Constructor.java:513) #0
> > Java.inc(161): java_ThrowExceptionProxyFactory->getProxy(6, false) #1
> > Java.inc(314): java_Arg->getResult(false) #2 Java.inc(317):
> > java_Client->getWrappedResult(false) #3 Java.inc(481):
> > java_Client->getInternalResult() #4 Java.inc(703):
> > java_Client->createObject('org.apache.luce...', Array, true) #5
> > Java.inc(702): java_create(Array, true) #6 Java.inc(834):
> > java_create(Array, true) #7
> > /usr/share/pear/lucene/org_apache_lucene_index_IndexWriter.php(29):
> > Java->Java(Array) #8 /var/www/html/DLL/lucene/lucene_search.php(14):
> > org_apache_lucene_index_IndexWriter->__construct('/tmp/idxHSQALS',
> > Object(org_apache_lucene_analysis_standard_StandardAnalyzer), true) #9
> > {main}] ==
> >
> > Here is the code:
> > <?php
> > require_once('rt/java_io_File.php');
> > require_once('rt/java_lang_System.php');
> > require_once('rt/java_util_LinkedList.php');
> > require_once('lucene/All.php');
> >
> > try {
> >   echo "indexing ... ";
> >   /* Create an index */
> >   $cwd=getcwd();
> >   /* create the index files in the tmp dir */
> >   $tmp = create_index_dir();
> >   $analyzer = new org_apache_lucene_analysis_standard_StandardAnalyzer();
> >   $writer = new org_apache_lucene_index_IndexWriter($tmp, $analyzer, true);
> >   $file = new java_io_File($cwd);
> >   $files = $file->listFiles();
> >   if(is_null($files)) {
> >     $user = java_lang_System()->getProperty("user.name");
> >     echo("$cwd does not exist or is not readable.\n");
> >     echo("The directory must be readable by the user $user and it must not\n");
> >     echo("be protected by a SEL rule.\n");
> >     exit(1);
> >   }
> >   foreach($files as $f) {
> >     $doc = new org_apache_lucene_document_Document();
> >     $doc->add(new org_apache_lucene_document_Field(
> >       "name",
> >       $f->getName(),
> >       org_apache_lucene_document_Field__Store()->YES,
> >       org_apache_lucene_document_Field__Index()->UN_TOKENIZED));
> >     $writer->addDocument($doc);
> >   }
> >   $writer->optimize();
> >   $writer->close();
> >   echo "done\n";
> >
> >   echo "searching... ";
> >   /* Search */
> >   $searcher = new org_apache_lucene_search_IndexSearcher($tmp);
> >   $phrase = new org_apache_lucene_search_MatchAllDocsQuery();
> >   $hits = $searcher->search($phrase);
> >
> >   /* Print result */
> >   $iter = $hits->iterator();
> >   $n = $hits->length();
> >   echo "done\n";
> >   echo "Hits: $n\n";
> >
> >   /* Instead of retrieving the values one-by-one, we store them into a
> >* LinkedList on the server side and then retrieve the list in one
> >* query:
> >*/
> >   $resultList = new java_util_LinkedList();
> >
> > // create an XML document from the
> > // following PHP code, ...
> >   java_lang_System::javaBeginDocument();
> >   while($n--) {
> > $next = $iter->next();
> > $name = $next->get("name");
> > $resultList->add($name);
> >   }
> > //  ... execute the XML document on
> > //  the server side, ...
> >   java_lang

Use of hyphens in StandardAnalyzer

2010-10-24 Thread Martin O'Shea
Hello

I have a StandardAnalyzer working which retrieves words and frequencies from
a single document using a TermVectorMapper which is populating a HashMap.

But if I use the following text as a field in my document, i.e.

addDoc(w, "lucene Lawton-Browne Lucene");

The word frequencies returned in the HashMap are:

browne 1
lucene 2
lawton 1

The problem is the words 'lawton' and 'browne'. If this is an actual
'double-barreled' name, can Lucene recognise it as 'Lawton-Browne' where the
name is actually a single word?

I've tried combinations of:

addDoc(w, "lucene \"Lawton-Browne\" Lucene");

And single quotes but without success.

Thanks

Martin O'Shea.



RE: Use of hyphens in StandardAnalyzer

2010-10-24 Thread Steven A Rowe
Hi Martin,

StandardTokenizer and -Analyzer have been changed, as of future version 3.1 
(the next release) to support the Unicode segmentation rules in UAX#29.  My 
(untested) guess is that your hyphenated word will be kept as a single token if 
you set the version to 3.1 or higher in the constructor.

Steve




RE: Use of hyphens in StandardAnalyzer

2010-10-24 Thread Martin O'Shea
A good suggestion. But I'm using Lucene 3.0.2, and the constructor for a
StandardAnalyzer has Version.LUCENE_30 as its highest value.





-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



FW: Use of hyphens in StandardAnalyzer

2010-10-24 Thread Martin O'Shea
A good suggestion. But I'm using Lucene 3.0.2, and the constructor for a
StandardAnalyzer has Version.LUCENE_30 as its highest value. Do you know when
3.1 is due?




Re: FW: Use of hyphens in StandardAnalyzer

2010-10-24 Thread Ahmet Arslan
How about replacing "-" with some arbitrary character sequence using a
MappingCharFilter before the tokenizer, and then restoring the '-' with a
PatternReplaceFilter after the tokenizer?

Maybe you can just drop the '-' with the char filter, so that Lawton-Browne
becomes LawtonBrowne.
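
The protect-then-restore idea can be sketched in plain Java without any
Lucene classes. This is only an illustration of the technique, not the Lucene
API: the regex split below is a crude stand-in for a tokenizer, and the class
HyphenSketch, its methods, and the placeholder string "xxhyphenxx" are all
made up for this example. In a real analyzer the first step would be done by
the char filter and the last step by the token filter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of "replace '-' before tokenizing, restore it after".
class HyphenSketch {

    // Assumed placeholder: lowercase letters only, so the splitter below keeps
    // it inside a token and lowercasing does not change it.
    static final String PLACEHOLDER = "xxhyphenxx";

    static List<String> tokenize(String text) {
        // Step 1: protect hyphens that sit between two letters.
        String guarded = text.replaceAll("(?<=\\p{L})-(?=\\p{L})", PLACEHOLDER);
        List<String> tokens = new ArrayList<>();
        // Step 2: stand-in tokenizer: split on anything that is not a letter or digit.
        for (String raw : guarded.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // Step 3: lowercase, then put the hyphen back.
            tokens.add(raw.toLowerCase(Locale.ROOT).replace(PLACEHOLDER, "-"));
        }
        return tokens;
    }

    // Counts tokens in order of first appearance.
    static Map<String, Integer> frequencies(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {lucene=2, lawton-browne=1}
        System.out.println(frequencies("lucene Lawton-Browne Lucene"));
    }
}
```

The placeholder must be something that cannot occur in real text and that the
tokenizer treats as part of a word; the same transformation has to be applied
at query time, or searches for the hyphenated name will not match.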





RE: Use of hyphens in StandardAnalyzer

2010-10-24 Thread Steven A Rowe
Sorry, releases are not scheduled.

There is a general feeling that a 3.1 release could happen fairly soon, though.

Currently, there is a push to improve test coverage and fix bugs that shake out 
as a result.

As another measure of how close the release is, you can check here to see how 
many issues remain targeting the 3.1 release - once these go to zero, a release 
is likely imminent:

Lucene open/reopened fix for 3.1: 

 
Solr open/reopened fix for 3.1: 


My estimate of when a release will occur: sometime in the next two or three 
months.

The 3.X branch (where the 3.1 release will be cut from) is quite stable - you 
should consider using it even pre-release.

Steve




Re: Need Help: Lucene with PHP/Java Bridge

2010-10-24 Thread dian puma
Hi.
I still have a problem with it.

My code works when I run it from the command line, e.g. "php srcLucene.php",
but it does not work when run through the web browser; I still get an error
like this:

indexing ... Exception occured: [[o:Exception]:"java.lang.Exception:
CreateInstance failed: new
org.apache.lucene.index.IndexWriter((o:Directory)[o:String],
(o:Analyzer)[c:StandardAnalyzer],
(o:IndexWriter$MaxFieldLength)[o:Boolean]). Cause:
java.lang.IllegalArgumentException:
java.lang.classcastexcept...@506411 Responsible VM:
1.6.0...@http://java.sun.com/"; at: #-8
sun.reflect.GeneratedConstructorAccessor1.newInstance(Unknown Source)
#-7 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
#-6 java.lang.reflect.Constructor.newInstance(Constructor.java:513) #0
Java.inc(161): java_ThrowExceptionProxyFactory->getProxy(6, false) #1
Java.inc(314): java_Arg->getResult(false) #2 Java.inc(317):
java_Client->getWrappedResult(false) #3 Java.inc(481):
java_Client->getInternalResult() #4 Java.inc(703):
java_Client->createObject('org.apache.luce...', Array, true) #5
Java.inc(834): java_create(Array, true) #6
/var/www/html/DLL/luceneweb/lucenetes/srcLucene.php(22):
Java->Java('org.apache.luce...', '/tmp/idxOIP8mb', Object(Java),
true) #7 {main}]
==

I'm using
Linux CentOS 5.2,
Jdk 1.6.0_20,
PHP 5.1.6,
Apache Web Server 2.2

Any suggestions?

Thanks
--
Dian





RE: Need Help: Lucene with PHP/Java Bridge

2010-10-24 Thread Uwe Schindler
In general this happens if you have two different Lucene versions in your
classpath. Try to find out if there may be another 3.0.1 version somewhere
in your classpath. I have no idea how to configure that in your
webserver's/php.ini config, but it looks like that is the problem.
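
A quick way to look for a second copy of a class is to ask the class loader
for every location that provides it. The sketch below uses only the standard
ClassLoader.getResources call; the class name ClasspathCheck and the method
locate() are made up for illustration. It only tells you something useful if
it runs inside the same JVM and class loader that the web server's bridge
uses, which is an assumption about your setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper: finds every classpath location that provides a class.
class ClasspathCheck {

    // Returns all URLs from which the given class loader could load className.
    static List<URL> locate(String className, ClassLoader loader) {
        String resource = className.replace('.', '/') + ".class";
        List<URL> hits = new ArrayList<>();
        try {
            Enumeration<URL> urls = loader.getResources(resource);
            while (urls.hasMoreElements()) {
                hits.add(urls.nextElement());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Default target is the class from the error message.
        String name = args.length > 0 ? args[0] : "org.apache.lucene.index.IndexWriter";
        List<URL> hits = locate(name, ClassLoader.getSystemClassLoader());
        for (URL hit : hits) {
            System.out.println(hit);
        }
        if (hits.size() > 1) {
            System.out.println("More than one copy of " + name + " is visible.");
        }
    }
}
```

More than one printed URL would suggest that two jars supply the same class,
which matches the situation described above where the command line and the
web server see different Lucene versions.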

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

