SpanNearQuery distance issue

2012-09-19 Thread vempap
Hello All,

I have an issue with the distance (slop) measure of SpanNearQuery in
Lucene. Say I have the following two documents:

DocID: 6, content: "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1001
1002 1003 1004 1005 1006 1007 1008 1009 1100",
DocID: 7, content: "a b c d e a b c f g h i j k l m l k j z z z"

If my span query is:
a) "3n(a,e)" - it matches doc 7
But if it is:
b) "3n(1,5)" - it does not match doc 6
And if the query is:
c) "4n(1,5)" - it matches doc 6

I have no clue why a) matches but b) does not. I tried to debug the code,
but couldn't figure it out.
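
For reference, here is roughly how I'm building these queries (a minimal
sketch; my "Nn(x,y)" notation above maps to an unordered SpanNearQuery with
slop N on the "content" field):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// b) "3n(1,5)": "1" and "5" within slop 3 of each other, in any order
SpanQuery one  = new SpanTermQuery(new Term("content", "1"));
SpanQuery five = new SpanTermQuery(new Term("content", "5"));
SpanNearQuery near = new SpanNearQuery(
    new SpanQuery[] { one, five }, // clauses
    3,      // slop: max positions allowed between the spans
    false); // inOrder = false, so the clauses may match in any order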

Any help?






Re: SpanNearQuery distance issue

2012-09-19 Thread vempap
Shoot me. Thanks - I did not notice that the doc has ".. e a .." in the
content, so the unordered query can match on that adjacent pair. Thanks
again for the reply :)






StandardTokenizer generation from JFlex grammar

2012-10-04 Thread vempap
Hello,

I'm trying to regenerate the standard tokenizer from the JFlex
specification (StandardTokenizerImpl.jflex), but I'm not able to do so due
to some errors. (Ultimately I'd like to write my own JFlex file based on the
standard tokenizer, which is why I'm first trying to regenerate the stock
one to get the hang of things.)

I'm using jflex 1.4.3 and I ran into the following error:

Error in file "" (line 64):
Syntax error.
HangulEx = (!(!\p{Script:Hangul}|!\p{WB:ALetter})) ({Format} | {Extend})*


Also, I tried installing the Eclipse plugin from
http://cup-lex-eclipse.sourceforge.net/, which I thought would provide
options similar to JavaCC's (http://eclipse-javacc.sourceforge.net/) for
generating the classes within Eclipse - but had no luck.

Any help would be appreciated.

Regards,
Phani.






RE: StandardTokenizer generation from JFlex grammar

2012-10-04 Thread vempap
Thanks, Steve, for the pointers. I'll look into it.






To get Term Offsets of a term per document

2013-02-20 Thread vempap
Hello,

Is there a way to get the term offsets of a given term per document without
enabling term vectors?

Is it correct that a Lucene index stores positions by default, but not
offsets?
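
For what it's worth, here is a sketch of the kind of lookup I'm after. It is
written against a newer Lucene (5+) API, where offsets can be written into
the postings themselves (no term vectors) by indexing the field with
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS; in the 4.x of this
thread the same idea lives under FieldInfo.IndexOptions and
DocsAndPositionsEnum. "reader" is an open IndexReader, and the field and
term names are illustrative:

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

// Walk the postings of one term and print its position and offsets
// for every document it appears in.
for (LeafReaderContext ctx : reader.leaves()) {
  Terms terms = ctx.reader().terms("content");
  if (terms == null) continue;
  TermsEnum te = terms.iterator();
  if (!te.seekExact(new BytesRef("apple"))) continue;
  PostingsEnum pe = te.postings(null, PostingsEnum.OFFSETS);
  while (pe.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
    for (int i = 0; i < pe.freq(); i++) {
      int pos = pe.nextPosition();
      // startOffset()/endOffset() return -1 unless offsets were indexed
      System.out.println("doc=" + pe.docID() + " pos=" + pos
          + " offset=[" + pe.startOffset() + "," + pe.endOffset() + ")");
    }
  }
}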

Thanks,
Phani.






WeightedSpanTermsExtractor

2013-04-05 Thread vempap
Hi,

I have multiple fields (name and name2; content copied below). When I
extract the weighted span terms based on a query (the query targets a
specific field), why am I not getting the positions properly out of the
WeightedSpanTerm across multiple fields?

Is it because the query is specific to one field and not the others?

query: "Running Apple" (phrase query)
document content:
name: Running Apple 60 GB iPod with Video Playback Black - Apple
name2: Sample Running Apple 60 GB iPod with Video Playback Black - Apple

I'm getting the positions as 0,1 & 3,4, which I don't understand, as they
should be 0,1 & 1,2 for the fields name & name2 respectively.

Am I wrong to expect that behavior?
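
For reference, the extraction I'm doing looks roughly like this (a sketch;
"query" is the phrase query, "nameStream"/"name2Stream" are token streams
rebuilt for each field, and the three-argument getWeightedSpanTerms overload
restricts extraction to one field at a time):

import java.util.Map;

import org.apache.lucene.search.highlight.WeightedSpanTerm;
import org.apache.lucene.search.highlight.WeightedSpanTermExtractor;

WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor();

// Extract per field, so positions are computed against that field's
// token stream only.
Map<String, WeightedSpanTerm> nameTerms =
    extractor.getWeightedSpanTerms(query, nameStream, "name");
Map<String, WeightedSpanTerm> name2Terms =
    extractor.getWeightedSpanTerms(query, name2Stream, "name2");

// checkPosition(p) reports whether position p falls inside one of the
// spans recorded for a term, e.g. positions 0 and 1 in "name":
for (WeightedSpanTerm wst : nameTerms.values()) {
  System.out.println(wst.getTerm() + " @0? " + wst.checkPosition(0)
      + " @1? " + wst.checkPosition(1));
}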

Help or pointers would be appreciated.

Thanks.






Token Stream with Offsets (Token Sources class)

2013-04-07 Thread vempap
Hi,

I have the following code snippet where I'm trying to extract the weighted
span terms for a query (I do have term vectors enabled on the fields):
import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.highlight.TokenSources;
import org.apache.lucene.search.highlight.WeightedSpanTerm;
import org.apache.lucene.search.highlight.WeightedSpanTermExtractor;
import org.apache.lucene.store.FSDirectory;

File path = new File(""); // index directory (path elided)
FSDirectory directory = FSDirectory.open(path);
IndexReader indexReader = DirectoryReader.open(directory);

Map<String, WeightedSpanTerm> allWeightedSpanTerms =
    new HashMap<String, WeightedSpanTerm>();

WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor();

// Rebuild a token stream from the term vector of doc 0, field "name"
TokenStream tokenStream =
    TokenSources.getTokenStreamWithOffsets(indexReader, 0, "name");

// q is the phrase query described below
allWeightedSpanTerms.putAll(extractor.getWeightedSpanTerms(q, tokenStream));

In the end, if I look at the map "allWeightedSpanTerms", I don't have any
weighted span terms, and when I debugged the code I found that, while it is
trying to build the TermContext, the statement "fields.terms(field);" is
returning null, which I don't understand.

My query is "Running Apple" (a phrase query), and my doc content is:
name: Running Apple 60 GB iPod with Video Playback Black - Apple

Please let me know what I'm doing wrong.

Thanks.







Re: Token Stream with Offsets (Token Sources class)

2013-04-08 Thread vempap
I apologize. I did not know where exactly I needed to post this - I'll remove
the others.

As for indexing, I'm using the Solr example-docs script to post the documents
and then using the code above to get the token stream from that index.

I have the following doc:

ipod_video_1.xml:

<add>
  <doc>
    <field name="id">MA147LL/A</field>
    <field name="name">Running Apple 60 GB iPod with Video Playback Black - Apple</field>
    <field name="name2">Sample Running Apple 60 GB iPod with Video Playback Black - Apple</field>
  </doc>
</add>

And I indexed it using the "post.sh" script in example docs via "sh post.sh
ipod_video_1.xml"

Thanks.



--
Phani



Re: Token Stream with Offsets (Token Sources class)

2013-04-09 Thread vempap
Well, I found the issue - it is because maxDocCharsToAnalyze is 0 in
WeightedSpanTermExtractor by default. It works fine if I change it there, or
if I use the QueryScorer, which has a default limit of 51200.
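
For anyone hitting the same thing, the QueryScorer route looks roughly like
this (a sketch; "q", "tokenStream" and "text" are the query, token stream
and field text from my earlier snippet, and 51200 is Highlighter's
DEFAULT_MAX_CHARS_TO_ANALYZE):

import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

QueryScorer scorer = new QueryScorer(q, "name");
Highlighter highlighter = new Highlighter(scorer);

// The highlighter pushes this limit down into the scorer's
// WeightedSpanTermExtractor before any terms are extracted.
highlighter.setMaxDocCharsToAnalyze(51200);

String[] fragments = highlighter.getBestFragments(tokenStream, text, 3);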

Thanks.



--
Phani