Solr/Lucene Faceted Search Too Many Unique Values?

2014-01-22 Thread Bing Hua
Hi, I am going to evaluate some Lucene/Solr capabilities for handling faceted queries, in particular with a single facet field that contains a large number (say, up to 1 million) of distinct values. Does anyone have experience with how Lucene performs in this scenario? E.g. Doc1 has tags A B C
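A plain-Java sketch of what a facet count computes, assuming each document is just a list of tags (as in "Doc1 has tags A B C"): one counter per distinct value, then the top values by count. This is not how Lucene/Solr implements faceting internally. It only illustrates why the number of distinct values matters, since the count map here holds one entry per distinct value (up to 1 million in the scenario above). The limit argument plays a role similar to Solr's facet.limit parameter.

```java
import java.util.*;
import java.util.stream.Collectors;

public class FacetCountSketch {
    // Counts, for each distinct tag, how many documents contain it.
    static Map<String, Integer> countFacets(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> tags : docs) {
            // A document contributes at most once per distinct value.
            for (String tag : new HashSet<>(tags)) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the 'limit' most frequent values; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int limit) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("A", "B", "C"),
                List.of("A", "C"),
                List.of("C", "D"));
        Map<String, Integer> counts = countFacets(docs);
        System.out.println(topN(counts, 2)); // [C=3, A=2]
    }
}
```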

How to define a lowercase fieldtype without tokenizer

2013-02-14 Thread Bing Hua
Hi, I don't want the field to be tokenized, because Solr doesn't support sorting on a tokenized field. To do case-insensitive sorting, I need to copy a field to a lowercased but untokenized field. How do I define this? I tried the following, but it says I need to specify a tokenizer or a class for

Re: How to define a lowercase fieldtype without tokenizer

2013-02-14 Thread Bing Hua
Works perfectly. Thank you. I didn't know before that this tokenizer does nothing :) -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-define-a-lowercase-fieldtype-without-tokenizer-tp4040500p4040507.html Sent from the Solr - User mailing list archive at Nabble.com.
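The tokenizer that "does nothing" here is presumably solr.KeywordTokenizerFactory, which emits the whole field value as a single token; paired with solr.LowerCaseFilterFactory it yields a lowercased, untokenized, sortable field. The JDK-only sketch below mimics that analysis chain (it does not use Lucene) and contrasts it with whitespace tokenization, which produces several terms per value and therefore no single term to sort on.

```java
import java.util.*;

public class LowercaseSortKeySketch {
    // Mimics a keyword tokenizer followed by a lowercase filter:
    // the whole value becomes exactly one token, which is then lowercased.
    static List<String> analyze(String value) {
        return List.of(value.toLowerCase(Locale.ROOT));
    }

    // Whitespace tokenization, for contrast: one value yields many tokens.
    static List<String> whitespaceAnalyze(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Sorts the original values by their single lowercased token.
    static List<String> sortCaseInsensitive(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing((String v) -> analyze(v).get(0)));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Admin Guide"));           // [admin guide]
        System.out.println(whitespaceAnalyze("Admin Guide")); // [admin, guide]
        System.out.println(sortCaseInsensitive(List.of("banana", "Apple", "cherry"))); // [Apple, banana, cherry]
    }
}
```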

Search match all tokens in Query Text

2013-01-31 Thread Bing Hua
Hello, I have a field "text" of type text_general here: <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/>

Re: Search match all tokens in Query Text

2013-01-31 Thread Bing Hua
Thanks for the quick reply. It seems you are suggesting adding the AND operator explicitly. I don't think that solves my problem. I found <solrQueryParser defaultOperator="AND"/> somewhere, and it works.
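Besides the schema-wide <solrQueryParser defaultOperator="AND"/>, the standard query parser also accepts a per-request q.op parameter. The sketch below only builds such a request URL with the JDK; the base URL and the field name text are assumptions, and nothing is sent to a server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class AllTermsQuerySketch {
    // Builds a /select URL in which every term of userText must match,
    // by setting q.op=AND for this request only (no schema change).
    // userText is NOT escaped for query syntax in this sketch.
    static String buildSelectUrl(String baseUrl, String field, String userText) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", field + ":(" + userText + ")");
        params.put("q.op", "AND");
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URL-encode names and values (spaces become '+').
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        // Hypothetical Solr base URL and field name.
        System.out.println(buildSelectUrl("http://localhost:8983/solr", "text", "quick brown fox"));
        // http://localhost:8983/solr/select?q=text%3A%28quick+brown+fox%29&q.op=AND
    }
}
```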

SolrCell takes InputStream

2012-12-04 Thread Bing Hua
Hi, while using ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract"); the two ways of adding a file are up.addFile(File) and up.addContentStream(ContentStream). However, my raw files are stored on remote storage devices. I am able to get an InputStream object for the

Re: Send plain text file to solr for indexing

2012-08-31 Thread Bing Hua
So in order to use SolrCell I'd have to add a number of dependent libraries, which is one of the things I'm trying to avoid. Second, SolrCell still parses the plain text files, and I don't want it to change my exported files in any way. Any ideas? Bing

Re: Send plain text file to solr for indexing

2012-08-31 Thread Bing Hua
Thanks, Mr. Yagami. I'll look into that. Jack, the latter two options both require reading the entire text file into memory, right? Bing

Send plain text file to solr for indexing

2012-08-30 Thread Bing Hua
Hello, I used to use SolrCell, which has built-in Tika support to handle both extraction and indexing of raw documents. Now I have another text extraction provider that converts raw documents to plain text (.txt) files, so I want Solr to bypass the extraction phase. Is there a way I can send the
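One SolrCell-free approach, sketched under assumptions: send the already-extracted text as an ordinary field value in a JSON add command, streaming it so the client never holds the whole file in memory (the concern raised in the reply above). The field names id and text are hypothetical, and the sketch only writes to a java.io.Writer. In practice that Writer could wrap an HttpURLConnection output stream with setChunkedStreamingMode enabled, posted with Content-Type: application/json to an update handler that accepts JSON (check solrconfig.xml for the exact path).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class StreamingJsonAddSketch {
    // Writes [{"id": ..., "text": ...}], copying the text from 'in' in
    // small chunks. Field names "id" and "text" are assumptions.
    static void writeAddCommand(String id, Reader in, Writer out) throws IOException {
        out.write("[{\"id\":");
        writeJsonString(new StringReader(id), out);
        out.write(",\"text\":");
        writeJsonString(in, out);
        out.write("}]");
        out.flush();
    }

    // Streams characters as a JSON string literal, escaping as it goes.
    static void writeJsonString(Reader in, Writer out) throws IOException {
        out.write('"');
        char[] buf = new char[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                switch (c) {
                    case '"':  out.write("\\\""); break;
                    case '\\': out.write("\\\\"); break;
                    case '\n': out.write("\\n"); break;
                    case '\r': out.write("\\r"); break;
                    case '\t': out.write("\\t"); break;
                    default:
                        if (c < 0x20) {
                            // Other control characters use a 4-digit hex escape.
                            out.write(String.format("\\u%04x", (int) c));
                        } else {
                            out.write(c);
                        }
                }
            }
        }
        out.write('"');
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        writeAddCommand("doc1", new StringReader("line \"one\"\nline two"), sw);
        System.out.println(sw); // [{"id":"doc1","text":"line \"one\"\nline two"}]
    }
}
```

Solr presumably still materializes the full field value on the server side; streaming only avoids buffering the file on the client.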

Re: Getting Suggestions without Search Results

2012-08-14 Thread Bing Hua
Great comments. Thanks to you all. Bing

Re: Indexing thousands file on solr

2012-08-14 Thread Bing Hua
You could write a client using SolrJ that loops through all files in that folder. Something like: ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract"); up.addFile(new File(fileLocation), null); ModifiableSolrParams p = new ModifiableSolrParams(); p.add("literal.id", str); ...
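The SolrJ calls quoted in the post need the SolrJ jar, so the sketch below covers only the "loop through all files in that folder" part with the JDK: it lists the regular files in one directory (non-recursively) and pairs each with an id. Using the file name as the literal.id value is an assumption; any unique key would do.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class FolderIndexPlanSketch {
    // Maps id -> file for every regular file directly inside 'dir',
    // sorted by name. Subdirectories are skipped.
    static Map<String, Path> planIds(Path dir) throws IOException {
        Map<String, Path> plan = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (Files.isRegularFile(p)) {
                    plan.put(p.getFileName().toString(), p);
                }
            }
        }
        return plan;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("docs");
        Files.writeString(dir.resolve("b.pdf"), "x");
        Files.writeString(dir.resolve("a.doc"), "y");
        Files.createDirectory(dir.resolve("sub"));
        for (Map.Entry<String, Path> e : planIds(dir).entrySet()) {
            // Each file would go into its own update request here,
            // with e.getKey() sent as the literal.id parameter.
            System.out.println(e.getKey());
        }
    }
}
```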

Re: Are there any comparisons of Elastic Search specifically with SOLR 4?

2012-08-14 Thread Bing Hua
Most existing comparisons pit ES against Solr 3.x or earlier. Since Solr 4 added cloud concepts similar to ES's, there are far fewer differences. In my opinion, Solr is more heavyweight and was not designed to maximize elasticity. It's not hard to decide which way to go as

Solr4.0 Partially update document

2012-08-13 Thread Bing Hua
Hi, several days ago I came across some SolrJ test code for partially updating document field values. Sadly, I forgot where that was. In Solr 4.0, /update can take a document id and fields as hashmaps, like {"id": "doc1", "field1": {"set": "new_value"}}. I'm just trying to figure out what the SolrJ client
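For reference, the Solr 4.x atomic-update body described above has the form [{"id":"doc1","field1":{"set":"new_value"}}], where the operation can be set, add, or inc. The JDK-only sketch below assembles that JSON by hand (string values only, minimal escaping); it is not SolrJ. In SolrJ, as far as I know, the equivalent is to pass a Map such as {"set": value} as the field value on a SolrInputDocument, which is what the SolrExampleTests file linked in the follow-up exercises.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class AtomicUpdateJsonSketch {
    // Builds an atomic-update body: field -> {operation -> value}.
    // Only string values are supported in this sketch.
    static String atomicUpdate(String id, Map<String, Map<String, String>> ops) {
        StringBuilder sb = new StringBuilder("[{\"id\":").append(quote(id));
        for (Map.Entry<String, Map<String, String>> field : ops.entrySet()) {
            sb.append(',').append(quote(field.getKey())).append(":{");
            StringJoiner inner = new StringJoiner(",");
            for (Map.Entry<String, String> op : field.getValue().entrySet()) {
                inner.add(quote(op.getKey()) + ":" + quote(op.getValue()));
            }
            sb.append(inner).append('}');
        }
        return sb.append("}]").toString();
    }

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> ops = new LinkedHashMap<>();
        ops.put("field1", Map.of("set", "new_value"));
        System.out.println(atomicUpdate("doc1", ops));
        // [{"id":"doc1","field1":{"set":"new_value"}}]
    }
}
```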

Re: Solr4.0 Partially update document

2012-08-13 Thread Bing Hua
Got it at https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/test/org/apache/solr/client/solrj/SolrExampleTests.java Problem solved.

Getting Suggestions without Search Results

2012-08-13 Thread Bing Hua
Hi, I have a spellcheck component that provides auto-complete suggestions. It is part of the last-components of my /select search handler, so apart from the normal search results I also get a list of suggestions. Now I want to split things up. Is there a way to get only the suggestions for a query

Re: Tlog vs. buffer + softcommit.

2012-08-10 Thread Bing Hua
Thanks for the information. It definitely helps a lot. UpdateLog has numDeletesToKeep = 1000 and numRecordsToKeep = 100, so this is probably what you're referring to. However, when I was indexing, the total size of the TLogs kept increasing. It doesn't sound like the case where

Re: Tlog vs. buffer + softcommit.

2012-08-10 Thread Bing Hua
I remember setting the 15-second autocommit and still seeing the TLogs grow without bound. But it sounds like, in theory, they shouldn't if I index at a constant rate. I'll probably try it again sometime. As for PeerSync, I think SolrCloud now uses push replication rather than pull. Hmm, it makes sense to

Re: Multiple Embedded Servers Pointing to single solrhome/index

2012-08-09 Thread Bing Hua
Makes sense. Thank you.

Re: Does Solr support 'Value Search'?

2012-08-09 Thread Bing Hua
Thanks, Kuli and Mikhail. Using either TermsComponent or the Suggester I could get some suggested terms, but it's still unclear to me how to get the respective field names. To get them with TermsComponent, I'd need to run a terms query against every possible field. Similarly for using

Re: Multiple Embedded Servers Pointing to single solrhome/index

2012-08-09 Thread Bing Hua
I agree. We chose embedded servers to minimize the maintenance cost of HTTP Solr servers. One more concern: even if only one node does the indexing, the other nodes need to reopen their index readers periodically to catch up with new changes, right? Is there a Solr request that does this? Thanks, Bing

Multiple SpellCheckComponents

2012-08-09 Thread Bing Hua
Hello, the background is that I want to use both the Suggest and SpellCheck features in a single query so that both sets of alternatives are returned at once. Right now I can specify only one of them, using spellcheck.dictionary at query time. <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <lst

SpellCheckComponent Collation query

2012-08-09 Thread Bing Hua
Hello, from the spellcheck component I'm able to get the collation query and its number of hits. Is it possible to have Solr execute the collated query automatically and return the document search results without resending it from the client side? Thanks, Bing

Tlog vs. buffer + softcommit.

2012-08-09 Thread Bing Hua
Hello, I'm a bit confused about the purpose of transaction logs (update logs) in Solr. My understanding is that when an update request comes in, the new item is first put into the RAM buffer as well as the T-Log. After a soft commit, the new item becomes searchable but is not yet hard-committed to stable storage.

Re: Does Solr support 'Value Search'?

2012-08-08 Thread Bing Hua
Thanks for the response, but wait... is it related to my question about searching for field values? I wasn't asking how to use wildcards.

Re: Does Solr support 'Value Search'?

2012-08-08 Thread Bing Hua
I don't quite understand, but let me explain the problem I have. The response would contain only fields, each with a list of the field values that match the query. Essentially, it queries for field values rather than documents. The underlying use case is that, when typing in a quick search box, the drill-down menu

Re: Multiple Embedded Servers Pointing to single solrhome/index

2012-08-07 Thread Bing Hua
Thanks, Lance. The use case is a cluster of nodes, each running the same application with an EmbeddedSolrServer, all pointing to the same index on NFS. Every application instance is equal, meaning that any of them may index and/or search. This way, after every commit the

Does Solr support 'Value Search'?

2012-08-07 Thread Bing Hua
Hi folks, just wondering: is there a query handler that simply takes a query string and searches all (or some) fields for field values? E.g. q=*admin* The response might look like author: [admin, system_admin, sub_admin] last_modifier: [admin, system_admin, sub_admin] doctitle: [AdminGuide,
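A JDK-only sketch of the response shape asked for above, assuming the distinct values of each field are already available in memory (for example, collected per field with TermsComponent, which reads one field per request via terms.fl, as noted in the thread). It groups the values that contain the query string by field, case-insensitively, roughly like q=*admin*. The field names and values are made up for illustration.

```java
import java.util.*;

public class ValueSearchSketch {
    // For each field, returns the distinct values containing 'needle'
    // (case-insensitive), sorted; fields without a match are omitted.
    static Map<String, List<String>> searchValues(Map<String, Set<String>> valuesByField, String needle) {
        String n = needle.toLowerCase(Locale.ROOT);
        Map<String, List<String>> hits = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : valuesByField.entrySet()) {
            List<String> matches = new ArrayList<>();
            for (String v : e.getValue()) {
                if (v.toLowerCase(Locale.ROOT).contains(n)) {
                    matches.add(v);
                }
            }
            if (!matches.isEmpty()) {
                Collections.sort(matches);
                hits.put(e.getKey(), matches);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up values standing in for per-field term lists.
        Map<String, Set<String>> values = Map.of(
                "author", Set.of("admin", "system_admin", "sub_admin", "bob"),
                "doctitle", Set.of("AdminGuide", "ReleaseNotes"),
                "status", Set.of("open", "closed"));
        System.out.println(searchValues(values, "admin"));
        // {author=[admin, sub_admin, system_admin], doctitle=[AdminGuide]}
    }
}
```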

Solr index storage strategy on FileSystem

2012-08-07 Thread Bing Hua
Hi folks, with StandardDirectoryFactory the index is stored under data/index in the form of .frq, .tim, .tip and a few other files. As the index grows, more files are generated, and sometimes a few of them are merged. It seems there is some kind of separation and merging strategy at work. My

Multiple Embedded Servers Pointing to single solrhome/index

2012-08-06 Thread Bing Hua
Hi, I'm trying to use two embedded Solr servers pointing to the same solrhome/index. So that's something like System.setProperty("solr.solr.home", SomeSolrDir); CoreContainer.Initializer initializer = new CoreContainer.Initializer(); CoreContainer coreContainer =