Re: Get last updated/committed document

2007-11-26 Thread Thorsten Scherler
On Sat, 2007-11-24 at 00:17 +1100, climbingrose wrote: Assuming that you have the timestamp field defined: q=*:*&sort=timestamp desc Thanks. salu2 On Nov 23, 2007 10:43 PM, Thorsten Scherler [EMAIL PROTECTED] wrote: Hi all, I need to ask solr to return me the id of the last committed
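
A minimal sketch of how the suggested query could be sent as a URL, assuming the schema defines a `timestamp` field as the reply says. The base URL `http://localhost:8983/solr`, the helper name `last_document_url`, and the extra `rows=1` and `fl=id` parameters are assumptions added for illustration; they are not from the thread.

```python
from urllib.parse import urlencode


def last_document_url(base_url="http://localhost:8983/solr", field="timestamp"):
    """Build a select URL asking for the id of the newest document by `field`.

    base_url and the rows/fl parameters are illustrative assumptions.
    """
    params = {
        "q": "*:*",               # match every document
        "sort": f"{field} desc",  # newest first
        "rows": 1,                # only the top hit is needed
        "fl": "id",               # return just the id field
    }
    # urlencode escapes the space and the special characters in the values
    return f"{base_url}/select?{urlencode(params)}"


print(last_document_url())
```

Note that this returns the document with the largest timestamp value, which matches "last committed" only if the timestamp is set at index time.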

LSA Implementation

2007-11-26 Thread Eswar K
All, Is there any plan to implement Latent Semantic Analysis as part of Solr anytime in the near future? Regards, Eswar

CJK Analyzers for Solr

2007-11-26 Thread Eswar K
Hi, Does Solr come with Language analyzers for CJK? If not, can you please direct me to some good CJK analyzers? Regards, Eswar

Re: Opensearch XSLT

2007-11-26 Thread Ed Summers
On Oct 12, 2007 10:13 AM, Walter Underwood [EMAIL PROTECTED] wrote: OpenSearch was a pretty poor design and is dead now, so I wouldn't expect any new implementations. Google's GData (based on Atom) reuses the few useful OpenSearch elements needed for things like number of hits. Solr's Atom

Re: LSA Implementation

2007-11-26 Thread Grant Ingersoll
LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is patented, so it is not likely to happen unless the authors donate the patent to the ASF. -Grant On Nov 26, 2007, at 8:23 AM, Eswar K wrote: All, Is there any plan to implement Latent Semantic Analysis as part of Solr

Re: LSA Implementation

2007-11-26 Thread Jack
Interesting. Patents are valid for 20 years so it expires next year? :) PLSA does not seem to have been patented, at least not mentioned in http://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis On Nov 26, 2007 6:58 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: LSA

Re: LSA Implementation

2007-11-26 Thread Eswar K
I was just searching for info on LSA and came across the Semantic Indexing project under a GNU license... which of course is still under development, in C++ though. - Eswar On Nov 26, 2007 9:56 PM, Jack [EMAIL PROTECTED] wrote: Interesting. Patents are valid for 20 years so it expires next year? :)

Re: LSA Implementation

2007-11-26 Thread Brian Whitman
On Nov 26, 2007 6:58 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is patented, so it is not likely to happen unless the authors donate the patent to the ASF. -Grant There are many ways to catch a bird... LSA reduces to SVD on the

Re: CJK Analyzers for Solr

2007-11-26 Thread Chris Hostetter
: Does Solr come with Language analyzers for CJK? If not, can you please : direct me to some good CJK analyzers? Lucene has a CJKTokenizer and CJKAnalyzer in the contrib/analyzers jar. they can be used in Solr. both have been included in Solr for a while now, so you can specify CJKAnalyzer

Re: CJK Analyzers for Solr

2007-11-26 Thread Eswar K
Hoss, Thanks a lot. Will look into it. Regards, Eswar On Nov 26, 2007 11:55 PM, Chris Hostetter [EMAIL PROTECTED] wrote: : Does Solr come with Language analyzers for CJK? If not, can you please : direct me to some good CJK analyzers? Lucene has a CJKTokenizer and CJKAnalyzer in the

Re: LSA Implementation

2007-11-26 Thread Renaud Delbru
LDA (Latent Dirichlet Allocation) is a similar technique that extends pLSI. You can find some implementation in C++ and Java on the Web. Grant Ingersoll wrote: Interesting. I am not a lawyer, but my understanding has always been that this is not something we could do. The question has come

Re: Document update based on ID

2007-11-26 Thread evgeniy . strokin
So, you think there is a big chance that this will be in version 1.3? BTW: when do you think 1.3 will be released? At least a rough estimate. - Original Message From: Ryan McKinley [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, November 21, 2007 10:56:48 AM Subject: Re:

Re: Grouping multiValued fields

2007-11-26 Thread Chris Hostetter
This thread is pretty much on point with your question. It starts out with some simpler suggestions that may work for you, and then evolves into a discussion of some much more complicated approaches that (as far as I know) no one has ever actually implemented...

Re: LSA Implementation

2007-11-26 Thread Chris Hostetter
: A more interesting solr related question is where a very heavy process like : SVD would operate. You'd want to run the 'training' half of it separate from a : indexing or querying. It'd almost be like an optimize. Is there any hook right : now to give Solr a command like <updateModels/> and map it

RE: CJK Analyzers for Solr

2007-11-26 Thread Norskog, Lance
I notice this is in the future tense. Is the CJKTokenizer available yet? From what I can see, the CJK code should be a Filter instead anyway. Also, the ChineseFilter and CJKTokenizer do two different things. CJKTokenizer turns C1C2C3C4 into 'C1C2 C2C3 C3C4'. ChineseFilter (from 2001) turns C1C2
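
The bigram behaviour described above ('C1C2C3C4' becoming 'C1C2 C2C3 C3C4') can be sketched in a few lines. This is only an illustration of overlapping two-character tokens over a run of CJK characters; it is not the Lucene CJKTokenizer itself, which also has to deal with mixed Latin text and other details. The function name `cjk_bigrams` is made up for this example.

```python
def cjk_bigrams(text):
    """Split a run of characters into overlapping two-character tokens."""
    if len(text) < 2:
        # A single character (or empty input) cannot form a bigram
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


# Four characters yield three overlapping pairs
print(" ".join(cjk_bigrams("中文分词")))
```

Because every character except the first and last appears in two tokens, a bigram index holds roughly one token per character, with a much larger vocabulary than single-character indexing.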

RE: Performance problems for OR-queries

2007-11-26 Thread Norskog, Lance
https://issues.apache.org/jira/browse/lucene-997 is a patch to limit the time used for a query. Google clearly estimates the total # of results, and over-estimates. Lance -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Thursday, November 22, 2007 1:37 PM To:

RE: CJK Analyzers for Solr

2007-11-26 Thread Chris Hostetter
: I notice this is in the future tense. Is the CJKTokenizer available yet? CJKTokenizer and CJKAnalyzer are both available in Solr 1.2, but no TokenizerFactory was provided for CJKTokenizer in 1.2, so it wasn't possible to use out of the box without writing a 3 line java plugin. that 3 line

Re: Opensearch XSLT

2007-11-26 Thread Otis Gospodnetic
Ed, Wunder might be right. As far as I know, only A9 was pushing OpenSearch. Now that A9 is not *really* around much, I think nobody is pushing it. I don't know of anyone pushing GData either, other than Google, but Google is doing rather (too?) well these days. Otis -- Sematext --

Re: CJK Analyzers for Solr

2007-11-26 Thread Otis Gospodnetic
Eswar, We've used the NGram stuff that exists in Lucene's contrib/analyzers instead of CJK. Doesn't that allow you to do everything that the Chinese and CJK analyzers do? It's been a few months since I've looked at Chinese and CJK Analyzers, so I could be off. Otis -- Sematext --

Re: Opensearch XSLT

2007-11-26 Thread Walter Underwood
GData is using a few elements from OpenSearch, but those would be hard to get wrong: start index, results per page, total number of results. I'd be happier if Google had joined the Atom WG instead and worked on the Feed Paging and Archiving standard (http://tools.ietf.org/html/rfc5005), but that
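
For readers who have not seen them, the three paging elements mentioned above are `totalResults`, `startIndex`, and `itemsPerPage`. Below is a small sketch of reading them from an Atom feed with the standard library. The sample feed and its numbers are invented for illustration, and the OpenSearch 1.1 namespace is assumed; a given feed may declare a different OpenSearch namespace version.

```python
import xml.etree.ElementTree as ET

# OpenSearch 1.1 namespace (assumed here; check what the feed actually declares)
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Invented sample feed, for illustration only
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Example results</title>
  <opensearch:totalResults>4230</opensearch:totalResults>
  <opensearch:startIndex>21</opensearch:startIndex>
  <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
</feed>"""


def paging_info(feed_xml):
    """Return the OpenSearch paging values of a feed as integers (None if absent)."""
    root = ET.fromstring(feed_xml)

    def read(name):
        node = root.find(f"{{{OPENSEARCH_NS}}}{name}")
        return int(node.text) if node is not None and node.text else None

    return {
        "total_results": read("totalResults"),
        "start_index": read("startIndex"),
        "items_per_page": read("itemsPerPage"),
    }


print(paging_info(SAMPLE_FEED))
```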

DirectUpdateHandler and DirectUpdateHandler2

2007-11-26 Thread Norskog, Lance
Hi- We have a situation where we are submitting the same document several times, and have not handled this the right way yet. So, DirectUpdateHandler2 overwrites the existing record. If we used DirectUpdateHandler, we could use the feature where we tell it to not overwrite existing records.

Re: Opensearch XSLT

2007-11-26 Thread Koji Sekiguchi
Doesn't Microsoft push OpenSearch? http://www.microsoft.com/presspass/press/2007/nov07/11-06SearchServer08ExpressPR.mspx Koji Otis Gospodnetic wrote: Ed, Wunder might be right. As far as I know, only A9 was pushing OpenSearch. Now that A9 is not *really* around much, I think nobody is

Re: CJK Analyzers for Solr

2007-11-26 Thread zx zhang
Lance, The following is an instance of a schema fieldtype using solr1.2 and the CJK package. And it works. As you said, CJK does parse a cjk string in a bi-gram way, just like turning 'C1C2C3C4' into 'C1C2 C2C3 C3C4'. More to the point, it is worthwhile to mention that the index expand beyond tolerance to

Re: CJK Analyzers for Solr

2007-11-26 Thread James liu
I don't think NGram is a good method for Chinese. CJKAnalyzer of Lucene is 2-Gram. Eswar K: if it is a Chinese analyzer, I recommend hylanda (www.hylanda.com); it is the best Chinese analyzer and it is not free. If you want a free Chinese analyzer, maybe you can try je-analyzer. it have some problem

Re: CJK Analyzers for Solr

2007-11-26 Thread James liu
If your analyzer is standard, you can try use tokenize. (You can find the answer from the analyzer source code and schema.xml.) On Nov 27, 2007 9:39 AM, zx zhang [EMAIL PROTECTED] wrote: lance, The following is an instance schema fieldtype using solr1.2 and CJK package. And it works. As you said, CJK

Re: CJK Analyzers for Solr

2007-11-26 Thread Eswar K
What is the performance of these CJK analyzers (the one in Lucene and hylanda)? We would potentially be indexing millions of documents. James, We will have a look at hylanda too. What about Japanese and Korean analyzers, any recommendations? - Eswar On Nov 27, 2007 7:21 AM, James liu [EMAIL

Re: LSA Implementation

2007-11-26 Thread Eswar K
We essentially are looking at having an implementation for doing search which can return documents having conceptually similar words without necessarily having the original word searched for. - Eswar On Nov 27, 2007 12:06 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: Interesting. I am not a

Re: Opensearch XSLT

2007-11-26 Thread Bill Fowler
According to the guy in their booth, they support federated searches on engines that support OpenSearch (meaning you can use their federation tool to search content indexed by search engines that have an OpenSearch interface -- e.g., A9) but SearchServer '08 does NOT have an OpenSearch interface

Re: CJK Analyzers for Solr

2007-11-26 Thread James liu
I do not use the HYLANDA analyzer. I use je-analyzer and have indexed at least 18m docs. I'm sorry, I only use a Chinese analyzer. On Nov 27, 2007 10:01 AM, Eswar K [EMAIL PROTECTED] wrote: What is the performance of these CJK analyzers (one in lucene and hylanda )? We would potentially be indexing

Re: LSA Implementation

2007-11-26 Thread Marvin Humphrey
On Nov 26, 2007, at 6:06 PM, Eswar K wrote: We essentially are looking at having an implementation for doing search which can return documents having conceptually similar words without necessarily having the original word searched for. Very challenging. Say someone searches for LSA and

Re: CJK Analyzers for Solr

2007-11-26 Thread Eswar K
thanks james... How much time does it take to index 18m docs? - Eswar On Nov 27, 2007 7:43 AM, James liu [EMAIL PROTECTED] wrote: i not use HYLANDA analyzer. i use je-analyzer and indexing at least 18m docs. i m sorry i only use chinese analyzer. On Nov 27, 2007 10:01 AM, Eswar K

Re: Opensearch XSLT

2007-11-26 Thread Ed Summers
On Nov 26, 2007 5:35 PM, Walter Underwood [EMAIL PROTECTED] wrote: GData is really pretty useful. OpenSearch was just sloppy. Some element names were capitalized, some weren't. A bunch of stuff specific to A9's UI was mixed in. They insisted on using RSS in addition to Atom for a new

Re: LSA Implementation

2007-11-26 Thread Eswar K
In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. this algo should consider documents that have many words in common to be semantically close, and ones with few words
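
The "many words in common means semantically close" intuition can be illustrated with plain cosine similarity over raw term counts. To be clear, this sketch is not LSA: it skips the dimensionality-reduction step entirely and only shows the word-overlap baseline that the description starts from. The function names and sample sentences are invented for the example.

```python
import math
from collections import Counter


def term_counts(text):
    """Count lowercase whitespace-separated terms in a document."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = term_counts("solr search index query")
doc2 = term_counts("lucene search index analyzer")
doc3 = term_counts("climbing rope harness")

# doc1 and doc2 share two terms; doc1 and doc3 share none
print(cosine_similarity(doc1, doc2), cosine_similarity(doc1, doc3))
```

A pure overlap measure like this scores documents with no shared words as completely unrelated, which is exactly the limitation that motivates techniques such as LSA.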

RE: LSA Implementation

2007-11-26 Thread Norskog, Lance
The WordNet project at Princeton (USA) is a large database of synonyms. If you're only working in English this might be useful instead of running your own analyses. http://en.wikipedia.org/wiki/WordNet http://wordnet.princeton.edu/ Lance -Original Message- From: Eswar K [mailto:[EMAIL

Re: LSA Implementation

2007-11-26 Thread Eswar K
The languages also include CJK :) among others. - Eswar On Nov 27, 2007 8:16 AM, Norskog, Lance [EMAIL PROTECTED] wrote: The WordNet project at Princeton (USA) is a large database of synonyms. If you're only working in English this might be useful instead of running your own analyses.

Re: LSA Implementation

2007-11-26 Thread Marvin Humphrey
On Nov 26, 2007, at 6:34 PM, Eswar K wrote: Although the algorithm doesn't understand anything about what the words *mean*, the patterns it notices can make it seem astonishingly intelligent. When you search such an index, the search engine looks at similarity values it has calculated

Re: Opensearch XSLT

2007-11-26 Thread Walter Underwood
FUD is pretty strong language. I'll provide some context for my opinions. The year before OpenSearch came out, I'd designed and implemented a SOAP distributed search protocol to go across Ultraseek and Verity K2, so I was pretty familiar with heterogeneous search protocols, especially those that

Re: Opensearch XSLT

2007-11-26 Thread Ed Summers
Thanks for the additional context and the pointers to STARTS. I realize solr-user is hardly a venue for discussing the details of OpenSearch so I'll refrain from commenting any further. I apologize for the harshness of my FUD comment. //Ed