SpanNearQuery distance issue
Hello All, I have an issue with the distance measure of SpanNearQuery in Lucene. Say I have the following two documents:

DocID: 6, content: "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1001 1002 1003 1004 1005 1006 1007 1008 1009 1100"
DocID: 7, content: "a b c d e a b c f g h i j k l m l k j z z z"

a) If my span query is "3n(a,e)", it matches doc 7.
b) But "3n(1,5)" does not match doc 6.
c) And "4n(1,5)" does match doc 6.

I have no clue why a) works but not b). I tried to debug the code but couldn't figure it out. Any help?
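A possible explanation, assuming the surround parser's "Nn" operator translates to an unordered SpanNearQuery with slop N-1 (an assumption about the parser internals, not verified here): "1" and "5" sit at positions 0 and 4 of doc 6 with three terms between them, so slop 2 ("3n") misses and slop 3 ("4n") matches. A minimal sketch of the raw equivalents:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    SpanQuery[] oneAndFive = {
        new SpanTermQuery(new Term("content", "1")),
        new SpanTermQuery(new Term("content", "5"))
    };
    // "3n(1,5)" under the assumed mapping: slop 2, unordered -> no match,
    // since three positions ("2 3 4") lie between "1" and "5"
    SpanNearQuery threeN = new SpanNearQuery(oneAndFive, 2, false);
    // "4n(1,5)": slop 3, unordered -> matches doc 6
    SpanNearQuery fourN = new SpanNearQuery(oneAndFive, 3, false);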
Re: SpanNearQuery distance issue
Shoot me. Thanks, I did not notice that the doc has ".. e a .." in the content. Thanks again for the reply :)
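That is the key: "n" is an unordered operator, so the adjacent ".. e a .." at positions 4-5 of doc 7 satisfies "3n(a,e)" even though the query names a before e. A sketch under the same assumed slop mapping as above:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    // inOrder = false lets the reversed, adjacent pair "e a" match
    SpanNearQuery q = new SpanNearQuery(
        new SpanQuery[] {
            new SpanTermQuery(new Term("content", "a")),
            new SpanTermQuery(new Term("content", "e"))
        },
        2,       // "3n" -> slop 2 under the assumed mapping
        false);  // unordered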
StandardTokenizer generation from JFlex grammar
Hello, I'm trying to regenerate the standard tokenizer from its JFlex specification (StandardTokenizerImpl.jflex), but I'm running into errors. (I would like to create my own JFlex file based on the standard tokenizer, which is why I'm first trying to regenerate it to get the hang of things.) I'm using JFlex 1.4.3 and I ran into the following error:

Error in file "" (line 64): Syntax error.
HangulEx = (!(!\p{Script:Hangul}|!\p{WB:ALetter})) ({Format} | {Extend})*

I also tried installing an Eclipse plugin from http://cup-lex-eclipse.sourceforge.net/, which I thought would provide options similar to JavaCC (http://eclipse-javacc.sourceforge.net/) for generating the classes within Eclipse, but had no luck. Any help would be much appreciated. Regards, Phani.
RE: StandardTokenizer generation from JFlex grammar
Thanks Steve for the pointers. I'll look into it.
To get Term Offsets of a term per document
Hello, Is there a way to get the term offsets of a given term per document without enabling term vectors? Is it correct that the Lucene index stores positions but not offsets by default? Thanks, Phani.
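As far as I know that is correct: the default postings store positions for tokenized fields but no offsets. Since Lucene 4.0 you can opt offsets into the postings at index time and then read them back per term per document without term vectors. A sketch (Lucene 4.x API; the field "content" and term "apple" are made-up examples, and reader is an already-open IndexReader):

    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.index.DocsAndPositionsEnum;
    import org.apache.lucene.index.FieldInfo.IndexOptions;
    import org.apache.lucene.index.MultiFields;
    import org.apache.lucene.search.DocIdSetIterator;
    import org.apache.lucene.util.BytesRef;

    // Index time: ask for offsets in the postings (the default stores
    // positions only)
    FieldType ft = new FieldType();
    ft.setIndexed(true);
    ft.setTokenized(true);
    ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);

    // Search time: walk the postings of one term and read its offsets
    DocsAndPositionsEnum dpe = MultiFields.getTermPositionsEnum(
        reader, MultiFields.getLiveDocs(reader), "content", new BytesRef("apple"));
    if (dpe != null) {
      while (dpe.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        for (int i = 0; i < dpe.freq(); i++) {
          dpe.nextPosition();
          System.out.println("doc " + dpe.docID() + " offsets "
              + dpe.startOffset() + "-" + dpe.endOffset());
        }
      }
    }

Note that startOffset()/endOffset() return -1 when offsets were not indexed.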
WeightedSpanTermsExtractor
Hi, I have multiple fields (name and name2 - content copied below). When I extract the weighted span terms based on a query (the query targets a specific field), why am I not getting the positions properly out of the WeightedSpanTerm across multiple fields? Is it because the query is specific to one field and not the others?

query: "Running Apple" (phrase query)

document content:
name: Running Apple 60 GB iPod with Video Playback Black - Apple
name2: Sample Running Apple 60 GB iPod with Video Playback Black - Apple

I'm getting the positions as 0,1 and 3,4, which I don't understand; it should be 0,1 and 1,2 for the fields name and name2 respectively. Am I doing something wrong in expecting that behavior? Help or pointers would be appreciated. Thanks.
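One thing worth trying (a sketch, not a confirmed diagnosis): WeightedSpanTermExtractor has an overload that takes a field name, so extracting each field separately with its own token stream should keep positions relative to that field. nameTokenStream and name2TokenStream below are hypothetical per-field streams:

    import java.util.Map;
    import org.apache.lucene.search.highlight.WeightedSpanTerm;
    import org.apache.lucene.search.highlight.WeightedSpanTermExtractor;

    WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor();
    // The fieldName argument restricts extraction to query terms on that field
    Map<String, WeightedSpanTerm> nameTerms =
        extractor.getWeightedSpanTerms(query, nameTokenStream, "name");
    Map<String, WeightedSpanTerm> name2Terms =
        extractor.getWeightedSpanTerms(query, name2TokenStream, "name2");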
Token Stream with Offsets (Token Sources class)
Hi, I have the following snippet where I'm trying to extract weighted span terms from a query (I do have term vectors enabled on the fields):

    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.highlight.*;
    import org.apache.lucene.store.FSDirectory;

    File path = new File("");  // index path left blank in the original post
    FSDirectory directory = FSDirectory.open(path);
    IndexReader indexReader = DirectoryReader.open(directory);

    Map<String, WeightedSpanTerm> allWeightedSpanTerms = new HashMap<String, WeightedSpanTerm>();
    WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor();

    // q is the phrase query "Running Apple" on the "name" field (built elsewhere);
    // rebuild a token stream with offsets for doc 0's "name" field from its term vector
    TokenStream tokenStream = TokenSources.getTokenStreamWithOffsets(indexReader, 0, "name");
    allWeightedSpanTerms.putAll(extractor.getWeightedSpanTerms(q, tokenStream));

In the end, the map allWeightedSpanTerms contains no weighted span terms at all. When I debugged the code, I found that while building the TermContext the statement fields.terms(field); returns null, which I don't understand.

My query is "Running Apple" (a phrase query), and my doc content is:

name: Running Apple 60 GB iPod with Video Playback Black - Apple

Please let me know what I'm doing wrong. Thanks.
Re: Token Stream with Offsets (Token Sources class)
I apologize; I did not know where exactly I needed to post this - I'll remove the others. As for indexing, I'm using the Solr example docs script to post the documents, and then using the code above to get the token stream from that index. I have the following doc, ipod_video_1.xml:

    <add>
      <doc>
        <field name="id">MA147LL/A</field>
        <field name="name">Running Apple 60 GB iPod with Video Playback Black - Apple</field>
        <field name="name2">Sample Running Apple 60 GB iPod with Video Playback Black - Apple</field>
      </doc>
    </add>

And I indexed it using the post.sh script in example docs via "sh post.sh ipod_video_1.xml". Thanks. -- Phani
Re: Token Stream with Offsets (Token Sources class)
Well, I found the issue: maxDocCharsToAnalyze is 0 by default in WeightedSpanTermExtractor. It works fine if I change it there, or if I use QueryScorer, which has a default limit of 51200. Thanks. -- Phani
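For anyone who hits the same wall, a minimal sketch of the QueryScorer route (doc id 0, field "name", and query q are from the earlier posts; this assumes the Lucene 4.x highlighter API):

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.search.highlight.QueryScorer;
    import org.apache.lucene.search.highlight.TokenSources;
    import org.apache.lucene.search.highlight.WeightedSpanTerm;

    // QueryScorer wraps WeightedSpanTermExtractor and applies a non-zero
    // maxDocCharsToAnalyze default, so extraction is not silently truncated to 0
    QueryScorer scorer = new QueryScorer(q, "name");
    TokenStream ts = TokenSources.getTokenStreamWithOffsets(indexReader, 0, "name");
    ts = scorer.init(ts);  // extracts the weighted span terms internally
    WeightedSpanTerm running = scorer.getWeightedSpanTerm("running");  // lowercased term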