Re: Language Detection for Analysis?

2009-08-09 Thread Lucas F. A. Teixeira
Google Translate just released (last week) its language API with translation
and LANGUAGE DETECTION.
:)

It's very simple to use: you can query it with some text to determine which
language it is in.

Here is a simple example using groovy, but all you need is the url to
query: http://groovyconsole.appspot.com/view.groovy?id=16
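
If calling an external service is not an option, a rough first pass can be
done locally. The sketch below is not the Google API and not part of Lucene:
it is a hypothetical helper (the name ScriptGuesser is made up here) that
counts letters per Unicode script using the JDK's Character.UnicodeScript.
Note that Arabic, Farsi, and Urdu all share the Arabic script, so this only
narrows the choice of analyzer; telling those three apart would still need
a real language detector.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene or any Google API.
class ScriptGuesser {

    // Returns the Unicode script covering the most letters in the text,
    // or UnicodeScript.UNKNOWN if the text contains no letters at all.
    static UnicodeScript dominantScript(String text) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> counts.merge(UnicodeScript.of(cp), 1, Integer::sum));

        UnicodeScript best = UnicodeScript.UNKNOWN;
        int bestCount = 0;
        for (Map.Entry<UnicodeScript, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Latin-script sample, then an Arabic-script sample (escaped code points).
        System.out.println(dominantScript("search engine"));
        System.out.println(dominantScript("\u0645\u0631\u062D\u0628\u0627"));
    }
}
```

The result could then be used to route a block of text to a script-specific
analyzer before indexing.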


[]s,

Lucas Frare Teixeira .·.
- lucas...@gmail.com
- blog.lucastex.com
- twitter.com/lucastex


On Thu, Aug 6, 2009 at 4:46 PM, Bradford Stephens <
bradfordsteph...@gmail.com> wrote:

> Hey there,
>
> We're trying to add foreign language support into our new search
> engine -- languages like Arabic, Farsi, and Urdu (that don't work with
> standard analyzers). But our data source doesn't tell us which
> languages we're actually collecting -- we just get blocks of text. Has
> anyone here worked on language detection so we can figure out what
> analyzers to use? Are there commercial solutions?
>
> Much appreciated!
>
> --
> http://www.roadtofailure.com -- The Fringes of Scalability, Social
> Media, and Computer Science
>


Re: score from spans

2009-08-09 Thread Eran Sevi
Thanks for the answer.

I tried to further understand the weight and score mechanism when running a
span query search.
I noticed that indeed the SpanScorer and SpanWeight are being called and
some score is returned but it seems to me that these basic implementations
are more appropriate for the basic SpanTermQuery.
For the other types of span queries, the inner queries' scores and weights
are not taken into account. For example, if I run a simple SpanOrQuery and
boost one of its child SpanTermQuery clauses, the boost is ignored.

It seems to me that some recursive calculation is required in order to take
into account all the weights and scores of the span's sub queries.
I'm trying to come up with a correct implementation for SpanOrQuery,
SpanNearQuery, and SpanNotQuery based on calculations similar to those of
BooleanQuery.
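
To make the idea concrete, here is a minimal sketch of the kind of recursive
calculation I have in mind. These are hypothetical model classes (SpanNode,
TermNode, OrNode are made-up names), NOT Lucene's SpanQuery, SpanWeight, or
SpanScorer, and the raw term score is just a stand-in number. Summing the
children is an assumption that only loosely follows how a disjunction
combines its clauses; normalization, coord, and slop factors are left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model classes for illustration; these are not Lucene classes.
abstract class SpanNode {
    final float boost;

    SpanNode(float boost) {
        this.boost = boost;
    }

    // Score contribution of this node for a single document.
    abstract float score();
}

// Leaf node: a single term with a precomputed raw score (0 if it did not match).
class TermNode extends SpanNode {
    final float rawScore;

    TermNode(float rawScore, float boost) {
        super(boost);
        this.rawScore = rawScore;
    }

    @Override
    float score() {
        return boost * rawScore;
    }
}

// Inner node: sums its children's scores (each already carrying its own
// boost), then applies this node's boost on top.
class OrNode extends SpanNode {
    final List<SpanNode> children;

    OrNode(float boost, SpanNode... children) {
        super(boost);
        this.children = Arrays.asList(children);
    }

    @Override
    float score() {
        float sum = 0f;
        for (SpanNode child : children) {
            sum += child.score();
        }
        return boost * sum;
    }
}
```

With this model, new OrNode(1f, new TermNode(0.5f, 2f), new TermNode(0.25f, 1f))
lets the boosted child count twice as much as it would unboosted, which is
the behavior I am missing from SpanOrQuery.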

Do you have a better idea of how to achieve the correct scoring? The score
calculations are quite complex for each type of span query, so any help is
appreciated.

Thanks, Eran.

On Tue, Aug 4, 2009 at 8:51 PM, Grant Ingersoll  wrote:

> A SpanQuery is a Query, so if you do a search for it, you will get scores.
>  However, the mechanism is a bit complicated, b/c actually getting the Spans
> is separate from doing the query.  I agree there could be tighter
> integration.  However, what you could do is use Spans.skipTo to move to the
> document you are examining in the search results.
>
> -Grant
>
>
> On Aug 2, 2009, at 11:30 AM, Eran Sevi wrote:
>
>> Hi,
>>
>> How can I get the score of a span that is the result of
>> SpanQuery.getSpans()? The score can be the same for each document, but if
>> it's unique per span, even better.
>>
>> I tried looking for a way to expose this functionality through the Spans
>> class but it looks too complicated.
>> I'm not even sure that by default some score calculation is even performed
>> when using span queries.
>>
>> I've noticed that some calculations are made using payloads and
>> BoostingTermQuery but the score result is used internally and can't be
>> accessed from the Spans results.
>> I don't want to re-run the query again using a HitCollector and since the
>> reader is passed to getSpans, I think it should be possible to do what I
>> want.
>>
>> Any help on the correct way to expose the span score will be appreciated.
>>
>> Thanks,
>> Eran.
>>
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Handling synonyms using Lucene

2009-08-09 Thread mitu2009

Just wanted to add this to my original question:
FYI, the synonyms in my application are totally custom and not from an English
dictionary, i.e. "Global Leader in Finance" could also mean "Top Investment
Bank" or "Fortune 500 Finance company", etc.
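
To illustrate what I mean by custom synonyms, here is a rough sketch of
query-time expansion. SynonymExpander is a made-up class, not a Lucene API;
it only builds a query string in Lucene's query syntax (quoted phrases joined
with OR) that could then be handed to a query parser. It assumes the analyzer
lowercases text, and it does not escape quotes or other special characters
inside phrases.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene.
class SynonymExpander {
    // Maps each lowercased phrase to the full group of equivalent phrases.
    private final Map<String, List<String>> groups = new LinkedHashMap<>();

    // Registers a group of phrases that should all be treated as equivalent.
    void addGroup(String... phrases) {
        List<String> group = new ArrayList<>();
        for (String phrase : phrases) {
            group.add(phrase.toLowerCase(Locale.ROOT));
        }
        for (String phrase : group) {
            groups.put(phrase, group);
        }
    }

    // Expands a phrase into an OR of quoted phrases; a phrase with no
    // registered synonyms is returned as a single quoted phrase.
    String expand(String phrase) {
        List<String> group = groups.get(phrase.toLowerCase(Locale.ROOT));
        if (group == null) {
            return "\"" + phrase + "\"";
        }
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < group.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append('"').append(group.get(i)).append('"');
        }
        return sb.append(')').toString();
    }
}
```

Each operand of a query like a OR b OR c NOT d would be expanded this way
before parsing, so the index itself would not need a separate synonyms field.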


Anshum-2 wrote:
> 
> Hi Mitu,
> Though your approach would work, I'd suggest you build a custom analyzer
> instead. Perhaps that'd be a better approach.
> 
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
> 
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw
> 
> 
> On Sat, Aug 8, 2009 at 11:14 AM, mitu2009  wrote:
> 
>>
>> Hi,
>>
>> What is the best way to handle synonyms (phrases) using Lucene, especially
>> when I need to execute queries like: a OR b OR c NOT d
>>
>> How about adding a new field called "synonyms" to each document while
>> indexing? This field's value would have a list of all synonyms. It would
>> be added to a document only when that document has any of the synonyms.
>>
>> I would then execute an "OR" search query which would look for the search
>> keyword in this field along with other fields.
>>
>> Can this approach work well for any kind of query?
>>
>> Please suggest.
>>
>> Thanks.
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Handling-synonyms-using-Lucene-tp24875308p24875308.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Handling-synonyms-using-Lucene-tp24875308p24888495.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

