Re: UIMA DictionaryAnnotator partOfSpeech

2011-09-29 Thread Tommaso Teofili
I think one problem is that the featurePath is not set correctly. Note that you are assuming PoS tags are written somewhere in some annotation feature, so this means you should have set up the UIMA pipeline to also include, for example, the HMM Tagger [1], which adds (by default) the posTag feature to

Re: autosuggest combination of data from documents and popular queries

2011-09-29 Thread abhayd
Hi Hoss, This helps. The only thing I am not sure about is the use of TermsComponent. As I understand it, TermsComponent only allows sorting on count|index, so I'm not sure how popularity could be used for sorting or boosting. Any thoughts on using TermsComponent with popularity? If this is possible then I don't
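
For reference, the two orderings TermsComponent offers (terms.sort=count and terms.sort=index) can be modeled as below. This is a hypothetical sketch over an in-memory map of term to document frequency, not Solr code; any other ordering, such as by query popularity, would presumably have to be applied outside TermsComponent, for example on the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TermSortSketch {
    // Models terms.sort: "count" = highest document frequency first,
    // anything else = "index" order (lexicographic order of the term text).
    public static List<String> sortTerms(Map<String, Integer> docFreqs, String sort) {
        List<String> terms = new ArrayList<>(docFreqs.keySet());
        if ("count".equals(sort)) {
            terms.sort((a, b) -> {
                int byCount = Integer.compare(docFreqs.get(b), docFreqs.get(a));
                // Break ties by term text so the output is deterministic.
                return byCount != 0 ? byCount : a.compareTo(b);
            });
        } else {
            terms.sort(String::compareTo);
        }
        return terms;
    }
}
```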

32-bit to 64-bit

2011-09-29 Thread - -
Hi, I indexed my data on my 32-bit computer. Do I need to re-index if I upload my data to a 64-bit server, or would copying the data directories suffice? Thank you.

About solr distributed search

2011-09-29 Thread Pengkai Qin
Hi all, I'm currently researching Solr distributed search, and it is said that distributed search becomes reasonable once there are more than one million documents. So I want to know: does anyone have test results (such as time cost) comparing a single index with distributed search on more than one million documents? I

Re: SOLR Index Speed

2011-09-29 Thread Lord Khan Han
Hi, The no-op run completed in 20 minutes. The only commented-out line was solr.addBean(doc). We've tried SUSS as a drop-in replacement for CommonsHttpSolrServer, but its behavior was weird. We have seen 10Ks of seconds for updates and it continues for a very long time after sending to solr is

Re: basic solr cloud questions

2011-09-29 Thread Darren Govoni
That was kinda my point. The new cloud implementation is not about replication, nor should it be. But rather about horizontal scalability where nodes manage different parts of a unified index. One of the design goals of the new cloud implementation is for this to happen more or less

Re: Query failing because of omitTermFreqAndPositions

2011-09-29 Thread Michael McCandless
Once a given field has omitted positions in the past, even for just one document, it sticks and that field will forever omit positions. Try creating a new index, never omitting positions from that field? Mike McCandless http://blog.mikemccandless.com On Thu, Sep 29, 2011 at 1:14 AM, Isan Fulia
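
A simplified model of why the setting sticks: when field metadata from two segments is combined (for example during a merge), the result omits positions if either side did, because positions that were never written cannot be recreated. The class and method names below are hypothetical illustrations, not Lucene's API.

```java
public class OmitPositionsSketch {
    // Hypothetical per-segment metadata for one field.
    public static final class FieldMeta {
        public final String name;
        public final boolean omitPositions;

        public FieldMeta(String name, boolean omitPositions) {
            this.name = name;
            this.omitPositions = omitPositions;
        }
    }

    // Combining two segments: once any segment omitted positions, the merged field
    // must omit them too, so the flag can only go from false to true.
    public static FieldMeta merge(FieldMeta a, FieldMeta b) {
        return new FieldMeta(a.name, a.omitPositions || b.omitPositions);
    }
}
```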

Re: SolrCloud: is there a programmatic way to create an ensemble

2011-09-29 Thread Yury Kats
Nope On 9/29/2011 12:17 AM, Pulkit Singhal wrote: Did you find out about this? 2011/8/2 Yury Kats yuryk...@yahoo.com: I have multiple SolrCloud instances, each running its own Zookeeper (Solr launched with -DzkRun). I would like to create an ensemble out of them. I know about -DzkHost

Re: basic solr cloud questions

2011-09-29 Thread Yury Kats
On 9/29/2011 7:22 AM, Darren Govoni wrote: That was kinda my point. The new cloud implementation is not about replication, nor should it be. But rather about horizontal scalability where nodes manage different parts of a unified index. It's about many things. You stated one, but there are

RE: 32-bit to 64-bit

2011-09-29 Thread Jaeger, Jay - DOT
Are you changing just the host OS or the JVM, or both, from 32 bit to 64 bit? If it is just the OS, the answer is definitely no, you don't need to do anything more than copy. If the answer is the JVM, I *think* the answer is still no, but others more authoritative than I may wish to respond.

Errors in requesthandler statistics

2011-09-29 Thread roySolr
Hello, I was looking at my Solr statistics and I see a count of 23 under errors in the requesthandler section. How can I see which requests returned these errors? Can I log this somewhere? Thanks Roy

Re: Solr stopword problem in Query

2011-09-29 Thread Erick Erickson
I think your problem is that you've set omitTermFreqAndPositions=true. It's not really clear from the Wiki page, but the tricky little phrase is "Queries that rely on position that are issued on a field with this option will silently fail to find documents." And phrase queries rely on position
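
Phrase matching needs positions: the phrase "a b" matches only when b occurs exactly one position after a. A minimal model of that check, with omitted positions represented as empty lists (a hypothetical data layout, not Lucene's postings API):

```java
import java.util.List;
import java.util.Map;

public class PhraseMatchSketch {
    // positions maps each term to the token positions where it occurs in one document.
    // The two-term phrase "first second" matches only if some occurrence of `second`
    // sits exactly one position after an occurrence of `first`.
    public static boolean phraseMatches(Map<String, List<Integer>> positions, String first, String second) {
        List<Integer> firstPos = positions.get(first);
        List<Integer> secondPos = positions.get(second);
        if (firstPos == null || secondPos == null || firstPos.isEmpty() || secondPos.isEmpty()) {
            // With positions omitted there is nothing to compare, so the phrase cannot match.
            return false;
        }
        for (int p : firstPos) {
            if (secondPos.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }
}
```

The term-only check (both words present) would still succeed on such a field; it is the adjacency test that has no data to work with.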

RE: About solr distributed search

2011-09-29 Thread Jaeger, Jay - DOT
I am no expert, but here is my take and our situation. Firstly, are you asking what the minimum number of documents is before it makes *any* sense at all to use a distributed search, or are you asking what the maximum number of documents is before a distributed search is essentially required?

Re: SolrCloud: is there a programmatic way to create an ensemble

2011-09-29 Thread Mark Miller
That's normally what you want to do: set up a separate quorum for production. On Sep 29, 2011, at 1:36 AM, Jamie Johnson wrote: I'm not a solrcloud guru, but why not start your zookeeper quorum separately? I also believe that you can specify a zoo.cfg file which will create a zk quorum from
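
A separate ZooKeeper ensemble is usually sized with an odd number of servers, because it stays available only while a strict majority of its servers is up. The arithmetic, as a small sketch (class and method names are illustrative):

```java
public class QuorumSketch {
    // Smallest strict majority of an ensemble of the given size.
    public static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many servers can fail while a majority is still running.
    public static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }
}
```

With these numbers a 4-server ensemble tolerates no more failures than a 3-server one (one each), while 5 servers tolerate two.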

Re: DIH when using XML Files questions

2011-09-29 Thread Erick Erickson
Specific replies below, but what I'd seriously consider is writing my own filesystem-aware hook that pushes documents to known Solr servers rather than using DIH to pull them. You could use the code from FileSystemEntityProcessor as a base and go from there. The FileSystemEntityProcessor isn't

RE: Errors in requesthandler statistics

2011-09-29 Thread Jaeger, Jay - DOT
I am not an expert, but based on my experience, the information you are looking for should indeed be in your logs. There are at least three logs you might look at: an HTTP request log, the Solr log, and logging by the application server / JVM. Some information is available at
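
Failed requests in the Solr log typically show up as lines starting with "SEVERE:" (java.util.logging's default text format, as in the OC4J trace elsewhere in this list). A minimal sketch for pulling those lines out; the log file location depends on your container and is not assumed here:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class SevereLogScan {
    // Keeps only lines that java.util.logging's text format marks as SEVERE,
    // e.g. "SEVERE: org.apache.solr.common.SolrException: ...".
    public static List<String> severeLines(List<String> lines) {
        return lines.stream()
                .filter(line -> line.startsWith("SEVERE:"))
                .collect(Collectors.toList());
    }

    // Convenience wrapper that reads a whole log file (path supplied by the caller).
    public static List<String> severeLinesInFile(Path logFile) throws IOException {
        return severeLines(Files.readAllLines(logFile));
    }
}
```

The line just before each match usually holds the timestamp and logger, which helps correlate it with an entry in the HTTP request log.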

Re: How to reserve ids?

2011-09-29 Thread Erick Erickson
Hmmm, if treating them as stopwords, wouldn't you have to list all the possible variants? E.g. mystuff.msn.com yourstuff.msn.com etc? Is that sufficient or do you want *.msn.com (which isn't legal in a stopword file as far as I know)? Best Erick On Tue, Sep 27, 2011 at 11:39 PM, Otis
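
If a stopword file can't express *.msn.com, the same effect needs a suffix test rather than an exact-match list. A hypothetical sketch of that test on plain token lists (not an existing Solr filter; a real version would be a custom TokenFilter):

```java
import java.util.List;
import java.util.stream.Collectors;

public class DomainSuffixFilterSketch {
    // Drops any token equal to the domain itself or ending in "." + domain,
    // so "mystuff.msn.com" is removed but "notmsn.com" is kept.
    public static List<String> dropDomain(List<String> tokens, String domain) {
        String suffix = "." + domain;
        return tokens.stream()
                .filter(t -> !(t.equals(domain) || t.endsWith(suffix)))
                .collect(Collectors.toList());
    }
}
```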

Indexing geohash in solrj - Multivalued spatial search

2011-09-29 Thread Alessandro Benedetti
Hi all, I have already read the topics in the mailing list regarding spatial search, but I haven't found an answer ... I have to index a multivalued field of type geohash via SolrJ. Right now I build a string with the lat and lon comma-separated (like 54.569468,67.58494) and index it in
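
As far as I know, Solr's geohash field type takes the same "lat,lon" string and encodes it internally. To show what that encoding is, here is the standard geohash algorithm as a self-contained sketch (the textbook method, not the code in Solr or in the spatial patch):

```java
public class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Alternately bisects the longitude and latitude ranges, emitting 1 when the value
    // falls in the upper half, and packs every 5 bits into one base-32 character.
    public static String encode(double lat, double lon, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true; // a geohash starts with a longitude bit
        int bits = 0;
        int ch = 0;
        while (hash.length() < precision) {
            double[] range = lonBit ? lonRange : latRange;
            double value = lonBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            if (value >= mid) {
                ch = (ch << 1) | 1;
                range[0] = mid;
            } else {
                ch = ch << 1;
                range[1] = mid;
            }
            lonBit = !lonBit;
            if (++bits == 5) {
                hash.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    // Parses the "lat,lon" form used in the post, e.g. "54.569468,67.58494".
    public static String encodeLatLon(String latLon, int precision) {
        String[] parts = latLon.split(",");
        return encode(Double.parseDouble(parts[0].trim()), Double.parseDouble(parts[1].trim()), precision);
    }
}
```

Nearby points share a prefix, which is what makes prefix-based spatial filtering possible; a multivalued field would simply hold one such value per location.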

RE: Errors in requesthandler statistics

2011-09-29 Thread roySolr
Hi, Thanks for your answer. I have some logging by jetty. Every request looks like this: <record> <date>2011-09-29T12:28:47</date> <millis>1317292127479</millis> <sequence>18470</sequence> <logger>org.apache.solr.core.SolrCore</logger> <level>INFO</level> <class>org.apache.solr.core.SolrCore</class>

RE: Errors in requesthandler statistics

2011-09-29 Thread Jaeger, Jay - DOT
If you are asking how to tell which of 94000 records failed in a SINGLE HTTP update request, I have no idea, but I suspect that you cannot necessarily tell. It might help if you copied and pasted what you find in the solr log for the failure (see my previous response for how to figure out where

Re: Upgrading from 3.1 to 3.4

2011-09-29 Thread Erick Erickson
They should be outlined in CHANGES.txt if there are any. But usually changes to minor versions don't require any special steps... Best Erick On Wed, Sep 28, 2011 at 4:14 AM, Rohit ro...@in-rev.com wrote: I have been using Solr 3.1 and am planning to update to Solr 3.4; what are the steps to be

Re: Distributed search has problems with some field names

2011-09-29 Thread Erick Erickson
I know I've seen other anomalies with odd characters in field names. In general, it's much safer to use only letters, numbers, and underscores. In fact, I even prefer lowercase letters. Since you're pretty sure those work, why not just use them? Best Erick On Wed, Sep 28, 2011 at 6:59 AM, Luis
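
The conservative naming rule above (letters, digits and underscores, preferably lowercase) is easy to enforce before indexing. A minimal sketch of the stricter lowercase variant; the class name is illustrative:

```java
import java.util.regex.Pattern;

public class FieldNameCheck {
    // Lowercase letters, digits and underscores only.
    private static final Pattern SAFE = Pattern.compile("[a-z0-9_]+");

    public static boolean isSafeFieldName(String name) {
        return SAFE.matcher(name).matches();
    }
}
```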

Solr on OC4J

2011-09-29 Thread Raja Ghulam Rasool
Hi all, I have installed Solr on OC4J, but when I try to access the admin page it throws a 'StackOverflowError': Sep 28, 2011 3:35:25 PM org.apache.solr.common.SolrException log SEVERE: java.lang.StackOverflowError. Is there something I am doing wrong? Is there any tweak or config that I need to change?

Re: Solr on OC4J

2011-09-29 Thread Raja Ghulam Rasool
Just to explain a bit more, the OC4J standalone version is 10.1.3.5.0 and the Solr version is 3.4.0. Any help will be greatly appreciated, guys :) On Thu, Sep 29, 2011 at 6:15 PM, Raja Ghulam Rasool the.r...@gmail.com wrote: Hi all, I have installed solr on oc4j. but when i try to access the admin

Re: synonym filtering at index time

2011-09-29 Thread Erick Erickson
Biggest red flag is KeywordTokenizerFactory. You don't say whether your input is multi-word or not, but that tokenizer does NOT break up input, so even the input "my watche" would not trigger a synonym substitution. Try something like WhitespaceTokenizer. Second red flag. Changing your analysis
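
The difference can be modeled in a few lines: a keyword-style tokenizer emits the whole input as one token, so a per-token synonym lookup never sees "watche" on its own. A hypothetical sketch (the synonym mapping "watche" to "watch" is made up for illustration; these are not the Solr factory classes):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TokenizerSynonymSketch {
    // KeywordTokenizer-like behavior: the entire input becomes one token.
    public static List<String> keywordTokenize(String input) {
        return Collections.singletonList(input);
    }

    // WhitespaceTokenizer-like behavior: split on runs of whitespace.
    public static List<String> whitespaceTokenize(String input) {
        return Arrays.asList(input.trim().split("\\s+"));
    }

    // Single-token synonym substitution applied to each token independently.
    public static List<String> applySynonyms(List<String> tokens, Map<String, String> synonyms) {
        return tokens.stream()
                .map(t -> synonyms.getOrDefault(t, t))
                .collect(Collectors.toList());
    }
}
```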

Re: Indexing geohash in solrj - Multivalued spatial search

2011-09-29 Thread Smiley, David W.
Hi Alessandro. I can't think of any good reason anyone would use the geohash field type that is a part of Solr today. If you are shocked I would say that, keep in mind the work I've done with geohashes is an extension of what's in Solr, it's not what's in Solr today. Recently I ported

Re: basic solr cloud questions

2011-09-29 Thread Darren Govoni
Agree. Thanks also for clarifying. It helps. On 09/29/2011 08:50 AM, Yury Kats wrote: On 9/29/2011 7:22 AM, Darren Govoni wrote: That was kinda my point. The new cloud implementation is not about replication, nor should it be. But rather about horizontal scalability where nodes manage

Re: basic solr cloud questions

2011-09-29 Thread Sami Siren
2011/9/29 Yury Kats yuryk...@yahoo.com: True, but there is a big gap between goals and current state. Right now, there is distributed search, but not distributed indexing or auto-sharding, or auto-replication. So if you want to use the SolrCloud now (as many of us do), you need to do a number of
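
Without distributed indexing, the client has to decide which shard receives each document. One common approach is to hash the unique key; a hypothetical sketch (the shard URLs are placeholders, and this is not SolrCloud's own routing):

```java
import java.util.List;

public class ShardRouterSketch {
    // Picks a shard from the document's unique key so the same id always lands on
    // the same shard, which later updates and deletes rely on.
    public static String shardFor(String docId, List<String> shardUrls) {
        int bucket = Math.floorMod(docId.hashCode(), shardUrls.size());
        return shardUrls.get(bucket);
    }
}
```

A simple modulo scheme like this reassigns most ids if the shard count changes, so it suits a fixed number of shards.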

Re: Distributed search has problems with some field names

2011-09-29 Thread Luis Neves
Hi, On 09/29/2011 03:10 PM, Erick Erickson wrote: I know I've seen other anomalies with odd characters in field names. In general, it's much safer to use only letters, numbers, and underscores. In fact, I even prefer lowercase letters. Since you're pretty sure those work, why not just use

Re: Errors in requesthandler statistics

2011-09-29 Thread Shawn Heisey
On 9/29/2011 7:42 AM, roySolr wrote: I have some logging by jetty. Every request looks like this: <record> <date>2011-09-29T12:28:47</date> <millis>1317292127479</millis> <sequence>18470</sequence> <logger>org.apache.solr.core.SolrCore</logger> <level>INFO</level>

Query with plus sign failing

2011-09-29 Thread Shawn Heisey
The following query is failing: ((Google +)) This is ultimately reduced to 'google' by my analysis chain, but the following is in my log (3.2.0, but 3.4.0 also fails): SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse '( (Google +))':

PDF indexing

2011-09-29 Thread Jón Helgi Jónsson
Good day, I'm checking whether Solr would work for indexing PDFs. My requirements are: 1) I must know which page has what contents. 2) Right-to-left search support, such as Hebrew. This has been the trickiest to achieve. I would also prefer to know the position of the searched contents on the page

Re: Query with plus sign failing

2011-09-29 Thread Erik Hatcher
Just a fact of life with the Lucene query parser. You'll need to escape the + with a backslash for this to work. Erik On Sep 29, 2011, at 12:31 , Shawn Heisey wrote: The following query is failing: ((Google +)) This is ultimately reduced to 'google' by my analysis chain, but
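
A minimal escaping helper, assuming the special-character set of the Lucene 3.x query parser (Lucene's QueryParser.escape and SolrJ's ClientUtils.escapeQueryChars already provide this; the sketch just shows the idea):

```java
public class QueryEscapeSketch {
    // Characters with special meaning to the Lucene/Solr standard query parser.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefixes every special character with a backslash.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Escaping the user text before wrapping it in parentheses turns the failing query into ((Google \+)), which the parser accepts.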

Re: Trouble configuring multicore / accessing admin page

2011-09-29 Thread Joshua Miller
On Sep 28, 2011, at 2:16 PM, Joshua Miller wrote: On Sep 28, 2011, at 2:11 PM, Jaeger, Jay - DOT wrote: <cores adminPath="/admij/cores"> Was that a cut and paste? If so, the /admij/cores is presumably incorrect, and ought to be /admin/cores No, that was a typo -- the config file

Solr integration with Hbase

2011-09-29 Thread Stuti Awasthi
Hi all, I am a newbie to Solr. I have my application on HBase and Hadoop and I want to provide search functionality using Solr. I read http://wiki.apache.org/solr/DataImportHandler and learned that there is support for SQL databases. My question is: Is Solr also good for NoSQL like

Re: About solr distributed search

2011-09-29 Thread Gregor Kaczor
Hi Pengkai, my experience is based on http://www.findfiles.net/, which holds 700 million documents, each about 2 KB in size. A single index containing that kind of data should hold fewer than 80 million documents. In case you have complex queries with lots of facets, sorting, function queries then even 50
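
Turning a per-index ceiling into a shard count is a ceiling division. A small sketch using the figures above (700 million documents, at most 80 million per index); the limit itself is a rule of thumb from this message, not a Solr constant:

```java
public class ShardCountSketch {
    // Minimum number of shards so that no single index exceeds maxDocsPerShard.
    public static long minShards(long totalDocs, long maxDocsPerShard) {
        return (totalDocs + maxDocsPerShard - 1) / maxDocsPerShard; // ceiling division
    }
}
```

With 700 million documents and an 80 million limit this gives 9 shards, while one million documents fits comfortably in a single index.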

Automate startup/shutdown of SolrCloud Shards

2011-09-29 Thread Jamie Johnson
I am trying to automate the startup/shutdown of SolrCloud shards and have noticed that there is a bit of a timing issue where if the server which is to bootstrap ZK with the configs does not complete its process (i.e. there is no data at the Conf yet) the other servers will fail to start. An
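
One way to script around the race is to have the startup tool wait until the bootstrap node's configuration is visible before launching the other servers. A generic polling sketch; the readiness check (for example, "the config node exists in ZooKeeper") is a hypothetical callback you would supply:

```java
import java.util.function.BooleanSupplier;

public class WaitForConfigSketch {
    // Polls `ready` until it returns true or the timeout elapses.
    public static boolean waitUntil(BooleanSupplier ready, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (ready.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return ready.getAsBoolean(); // one final check at the deadline
    }
}
```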

Re: Solr integration with Hbase

2011-09-29 Thread pulkitsinghal
Try lilyproject.com I think they do exactly what you are asking for. Sent from my iPhone On Sep 29, 2011, at 6:27 AM, Stuti Awasthi stutiawas...@hcl.com wrote: Hi all, I am newbee in Solr. I have my application on Hbase and Hadoop and I want to provide search functionality using Solr. I

Re: Solr integration with Hbase

2011-09-29 Thread Haspadar
http://www.lilyproject.org 2011/9/29 pulkitsing...@gmail.com Try lilyproject.com I think they do exactly what you are asking for. Sent from my iPhone On Sep 29, 2011, at 6:27 AM, Stuti Awasthi stutiawas...@hcl.com wrote: Hi all, I am newbee in Solr. I have my application on Hbase and

Re: Indexing geohash in solrj - Multivalued spatial search

2011-09-29 Thread Alessandro Benedetti
Sorry David, I probably misunderstood your reply; what do you mean? I'm using LucidWorks Enterprise 1.8 and, as far as I know, it includes the geohash patch. I have to index a multivalued location field and I have to run location queries on it! So I figured I would use the geohash type ... Any hint about

Re: Indexing geohash in solrj - Multivalued spatial search

2011-09-29 Thread Smiley, David W.
On Sep 29, 2011, at 5:10 PM, Alessandro Benedetti wrote: Sorry David, probably I misunderstood your reply, what do you mean? I'm using Lucid Work Enterprise 1.8, and, as I know , it includes geohashes patch. Solr 3.x, trunk, and I suspect Lucid Works Enterprise 2.0 (doubtful 1.8) supports

removing dynamic fields

2011-09-29 Thread zarni aung
Hi, I've been experimenting with Solr dynamic fields. Here is what I've gathered based on my research. For instance, I have a setup where I am catching undefined custom fields this way. I am using (trie) types by the way. <dynamicField name="int*" type="tint" indexed="true" stored="true"
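
How a field name is matched against patterns like int* can be sketched as follows: a dynamicField pattern has a single * at the start or the end, and when several patterns match, Solr's documented rule is that the longer pattern wins (the earlier declaration on a tie). This is an illustrative model, not Solr's implementation:

```java
import java.util.List;
import java.util.Optional;

public class DynamicFieldSketch {
    // "int*" matches by prefix, "*_s" matches by suffix; anything else must match exactly.
    public static boolean matches(String pattern, String fieldName) {
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        return pattern.equals(fieldName);
    }

    // Among matching patterns (given in schema order), keep the longest; ties keep the first.
    public static Optional<String> resolve(List<String> patternsInSchemaOrder, String fieldName) {
        String best = null;
        for (String p : patternsInSchemaOrder) {
            if (matches(p, fieldName) && (best == null || p.length() > best.length())) {
                best = p;
            }
        }
        return Optional.ofNullable(best);
    }
}
```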

Re: Getting facet counts for 10,000 most relevant hits

2011-09-29 Thread Lan
I implemented a similar feature for a categorization suggestion service. I did the faceting in the client code, which is not exactly the best performing but it worked very well. It would be nice to have the Solr server do the faceting for performance. Burton-West, Tom wrote: If relevance
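
Client-side faceting over only the most relevant hits boils down to counting facet values across the first N documents of the ranked result. A hypothetical sketch, assuming each hit has already been reduced to its list of category values:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClientFacetSketch {
    // Counts facet values over the top-N hits (in rank order) that the client fetched,
    // instead of over the whole result set as server-side faceting would.
    public static Map<String, Integer> facetTopN(List<List<String>> hitCategories, int topN) {
        Map<String, Integer> counts = new HashMap<>();
        int limit = Math.min(topN, hitCategories.size());
        for (int i = 0; i < limit; i++) {
            for (String value : hitCategories.get(i)) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The cost is fetching N documents' facet fields over the wire, which is why doing it inside Solr would perform better.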

dismax with AND/OR combination

2011-09-29 Thread abhayd
Hi, I'm using Solr from trunk (4.0). Also, dismax is set as the default qt with <str name="qf">text^2.5 features^1.1 displayName^15.0 mfg^4.0 description^3.0</str>. My query is: q=+ab sx+OR+(mfg:abc+OR+sx)+OR+(displayName:abc+OR+sx)&qt=dismax It is not working as per my

Re: dismax with AND/OR combination

2011-09-29 Thread Erick Erickson
Well, you have to tell us what you expected and what you're seeing. Including the output with debugQuery=on and telling us what you disagree with would be the best way. You might also include your definition from your solrconfig file. You included a fragment of it, but other parts may have

split index horizontally

2011-09-29 Thread Robert Yu
Is there an efficient way to handle my case? Each document has several groups of fields; some of them are updated frequently, some infrequently. Is it possible to maintain the index per group but search over all of them as ONE index? To some extent, it is a three layer of

Re: dismax with AND/OR combination

2011-09-29 Thread yingshou guo
You can't use this kind of query syntax with the dismax query parser. Your query can be understood by the standard query parser or the edismax query parser. The qt request parameter is used by Solr to select the request handler plugin, not the query parser. Keep in mind that different query parsers can understand
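
Since qt picks the handler, the parser is chosen with the defType parameter instead. A minimal sketch that builds such a request URL; the base URL is a placeholder, and in practice you would also pass qf (or keep it in the handler defaults):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

public class SelectUrlSketch {
    // Builds a /select URL with form-encoded parameters, in map iteration order.
    public static String selectUrl(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return baseUrl + "/select?" + query;
    }
}
```

Passing q="mfg:abc OR sx" with defType=edismax yields .../select?q=mfg%3Aabc+OR+sx&defType=edismax, so the fielded clause and OR reach a parser that understands them.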

Re: dismax with AND/OR combination

2011-09-29 Thread Jason Toy
Can dismax understand that query in a translated form? On Sep 29, 2011 10:01 PM, yingshou guo guoyings...@gmail.com wrote: you cann't use this kind of query syntax against dismax query parser. your query can by understood by standard query parser or edismax query parser. qt request parameter is

Re: dismax with AND/OR combination

2011-09-29 Thread yingshou guo
I don't understand what you mean by a translated form. The only special symbols the dismax query parser understands are +, - and "" (double quotes), i.e. mandatory, prohibitory and phrase semantics, something like: term1 term2 +term3 -term4. The dismax parser will treat any other operators as part of the query string. I guess when you

Re: autosuggest combination of data from documents and popular queries

2011-09-29 Thread abhayd
Anyone? How can I sort with TermsComponent?

About solr distributed search

2011-09-29 Thread 秦鹏凯
Hi all, I'm currently researching Solr distributed search, and it is said that distributed search becomes reasonable once there are more than one million documents. So I want to know: does anyone have test results (such as time cost) comparing a single index with distributed search on more than one million documents?

Re: About solr distributed search

2011-09-29 Thread Jerry Li
Hi, I suggest you set up an environment and test it yourself; 1M documents is no problem at all. 2011/9/30 秦鹏凯 qinpeng...@yahoo.cn: Hi all, Now I'm doing research on solr distributed search, and it is said documents more than one million is reasonable to use distributed search. So I want to know, does anyone have the test result(Such as time cost)