Re: Benchmarking Solr

2013-05-28 Thread Otis Gospodnetic
Hi Benson, We typically use https://github.com/sematext/ActionGenerator As a matter of fact, we are using it right now to test one of our search clusters... Otis -- Solr ElasticSearch Support http://sematext.com/ On Sun, May 26, 2013 at 10:38 AM, Benson Margulies bimargul...@gmail.com

RE: sourceId of JMX

2013-05-28 Thread 菅沼 嘉一
Thank you, Shalin. -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Tuesday, May 28, 2013 2:22 PM To: solr-user@lucene.apache.org Subject: Re: sourceId of JMX Suganuma, No, there shouldn't be any side effects. On Tue, May 28, 2013 at 7:13 AM, 菅沼 嘉一

delta-import tweaking?

2013-05-28 Thread Kristian Rink
Folks; playing with Solr and an existing (legacy) RDBMS structure which we can't change much, I am trying to figure out how best to make Solr's full/delta import work for me. A few thoughts: (a) The usual tutorials outline something like WHERE LASTMODIFIED > '${dih.last_index_time}' in order to

Re: multiple cache for same field

2013-05-28 Thread J Mohamed Zahoor
It does not seem to be memory footprint also ? looks too high for my index. ./zahoor On 20-May-2013, at 10:55 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Most definitely not the number of unique elements in each segment. My 32 document sample index (built from the default

HyperLogLog for Solr

2013-05-28 Thread J Mohamed Zahoor
Hi, Has anyone tried using HLL for finding the unique values of a field in Solr? I am planning to use it for facet counts on certain fields to reduce the memory footprint. ./Zahoor
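
For readers unfamiliar with the idea behind the question, here is a minimal sketch of a HyperLogLog estimator. It is not Solr code and not any particular library's API: the class name, the choice of SHA-1 as the hash, and the precision p=12 are all assumptions made for illustration. It only shows why the memory footprint stays fixed (one small register per bucket) no matter how many distinct values are added.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog cardinality estimator (not Solr code)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers (fixed memory)
        self.registers = [0] * self.m
        # Bias-correction constant; this formula is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")         # 64-bit hash value
        idx = x >> (64 - self.p)                      # first p bits pick the register
        rest = x & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        # Rank = number of leading zeros in the remaining bits, plus one.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


hll = HyperLogLog()
for i in range(10000):
    hll.add("term-%d" % i)
    hll.add("term-%d" % i)   # duplicates do not change the estimate
print(hll.count())
```

With p=12 the expected relative error is roughly 1.04 / sqrt(4096), about 1.6%, which is the trade-off against exact per-value counting.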

Re: Indexing message module

2013-05-28 Thread Arkadi Colson
Is it ok to just change the multivalue attribute to true and reindex the message module data? There are also other modules indexed on the same schema with multivalued = false. Will it become a problem? BR, Arkadi On 05/27/2013 09:33 AM, Gora Mohanty wrote: On 27 May 2013 12:58, Arkadi Colson

Re: FieldCache insanity with field used as facet and group

2013-05-28 Thread Elodie Sannier
I've created https://issues.apache.org/jira/browse/SOLR-4866 Elodie Le 07.05.2013 18:19, Chris Hostetter a écrit : : I am using the Lucene FieldCache with SolrCloud and I have insane instances : with messages like: FWIW: I'm the one that named the result of these sanity checks

Strange behavior on text field with number-text content

2013-05-28 Thread Michał Matulka
Hello, I've got the following problem. I have a "text" type in my schema and a field "name" of that type. That field contains data; there is, for example, a record that has "300letters" as its name. Now the field type definition: <fieldType name="text" class="solr.TextField"></fieldType> And, of course, field

Re: Indexing message module

2013-05-28 Thread Upayavira
Switching from single to multivalued shouldn't cause your index to break (but your app might not like it). Do you have a deduplication issue, or does each message have a unique ID? You might be able to use the DedupUpdateProcessorFactory to prevent updates to an existing message getting into the

Re: Solr faceted search UI

2013-05-28 Thread Fergus McDowall
Hi Richa Solrstrap is probably the best way to go if you just want to get up a PoC as fast as possible. Solrstrap requires no installation of middleware, you just add in the address of your solr server and open the file in your browser. Regards Fergus On Wed, Apr 24, 2013 at 5:23 PM, richa

Re: Solr faceted search UI

2013-05-28 Thread Fergus McDowall
You also get some smooth UI stuff for free F On Tue, May 28, 2013 at 10:58 AM, Fergus McDowall fergusmcdow...@gmail.com wrote: Hi Richa Solrstrap is probably the best way to go if you just want to get up a PoC as fast as possible. Solrstrap requires no installation of middleware, you just

Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery

2013-05-28 Thread AlexeyK
The cluster state problem reported above is not an issue - it was caused by our own code. Speaking about the update log - I have noticed a strange behavior concerning the replay. The replay is *supposed* to be done for a predefined number of log entries, but actually it is always done for the

Paging with all Hits

2013-05-28 Thread Andreas Niekler
Hello, I indexed some monographs with Solr. Within each document I have a multi-valued field where I store the paragraphs. When I search for a specific term within the monographs I get the whole monograph as a result object. The single hits can be accessed via the highlight component. The

What exactly happens to extant documents when the schema changes?

2013-05-28 Thread Dotan Cohen
When adding or removing a text field to/from the schema and then restarting Solr, what exactly happens to extant documents? Is the schema only consulted when Solr writes a document, therefore extant documents are unaffected? Considering that Solr supports dynamic fields, my experimentation with

Re: What exactly happens to extant documents when the schema changes?

2013-05-28 Thread Upayavira
On Tue, May 28, 2013, at 10:21 AM, Dotan Cohen wrote: When adding or removing a text field to/from the schema and then restarting Solr, what exactly happens to extant documents? Is the schema only consulted when Solr writes a document, therefore extant documents are unaffected?

Re: Strange behavior on text field with number-text content

2013-05-28 Thread Alexandre Rafalovitch
What does analyzer screen say in the Web AdminUI when you try to do that? Also, what are the tokens stored in the field (also in Web AdminUI). I think it is very strange to have TextField without a tokenizer chain. Maybe you get a standard one assigned by default, but I don't know what the

Re: Core admin action CREATE fails to persist some settings in solr.xml with Solr 4.3

2013-05-28 Thread Erick Erickson
Hmmm, that's the second time somebody's had that problem. It's assigned to me now anyway, thanks for creating it! Erick On Mon, May 27, 2013 at 10:11 AM, André Widhani andre.widh...@digicol.de wrote: I created SOLR-4862 ... I found no way to assign the ticket to somebody though (I guess it is

Associate item with more than one location

2013-05-28 Thread Spadez
I currently have an item which gets imported into Solr; let's call it a book entry. It has a single location associated with it, as a coordinate and a location name, but I am now finding out that a single entry may actually need to be associated with more than one location, for example New York

Re: Strange behavior on text field with number-text content

2013-05-28 Thread Erick Erickson
Hmmm, with 4.x I get much different behavior than you're describing, what version of Solr are you using? Besides Alex's comments, try adding debug=query to the url and see what comes out from the query parser. A quick glance at the code shows that DefaultAnalyzer is used, which doesn't do any

Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery

2013-05-28 Thread Shalin Shekhar Mangar
This sounds like a bug. I'll open an issue. Thanks! On Tue, May 28, 2013 at 2:29 PM, AlexeyK lex.kudi...@gmail.com wrote: The cluster state problem reported above is not an issue - it was caused by our own code. Speaking about the update log - I have noticed a strange behavior concerning

Re: Strange behavior on text field with number-text content

2013-05-28 Thread Michał Matulka
Thanks for your responses, I must admit that after hours of trying I made some mistakes. So the most problematic phrase will now be: "4nSolution Inc." which cannot be found using query: name:4nSolution or even name:4nSolution Inc.

Re: What exactly happens to extant documents when the schema changes?

2013-05-28 Thread Jack Krupansky
The technical answer: Undefined and not guaranteed. Sure, you can experiment and see what the effects happen to be in any given release, and maybe they don't tend to change (too much) between most releases, but there is no guarantee that any given change schema but keep existing data without

Re: Paging with all Hits

2013-05-28 Thread Jack Krupansky
Dynamic and multi-valued fields are both powerful but dangerous features. Yes, they offer wonderful capabilities - if used in moderation, but expecting that they are "get out of jail free" / "go past go as many times as you want" cards to ignore the limits of Solr and do anything you want is a

Wiki pages for Solr releases

2013-05-28 Thread Jan Høydahl
Hi, I have added the missing WIKI pages for https://wiki.apache.org/solr/Solr4.1 https://wiki.apache.org/solr/Solr4.2 https://wiki.apache.org/solr/Solr4.3 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

Re: Strange behavior on text field with number-text content

2013-05-28 Thread Алексей Цой
solr-user-unsubscribe solr-user-unsubscr...@lucene.apache.org 2013/5/28 Michał Matulka michal.matu...@gowork.pl Thanks for your responses, I must admit that after hours of trying I made some mistakes. So the most problematic phrase will now be: 4nSolution Inc. which cannot be found using

Disable all caches in solr

2013-05-28 Thread yriveiro
Hi, How I can disable all caches that solr use? Regards /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Disable-all-caches-in-solr-tp4066517.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Paging with all Hits

2013-05-28 Thread Alexandre Rafalovitch
counter-rant I feel that the strength of Jack's rant is somewhat unprovoked by the original question. I also feel that the rant itself is worth being printed and framed :-) But more than anything else, I feel that supposedly-known limitations of Solr/Lucene are not actually exposed all that

Re: Disable all caches in solr

2013-05-28 Thread Shalin Shekhar Mangar
Edit the solrconfig.xml and remove/comment filterCache, documentCache, queryResultCache. Note that some caches such as FieldCache (created for sorting/faceting on demand) cannot be disabled. On Tue, May 28, 2013 at 8:10 PM, yriveiro yago.rive...@gmail.com wrote: Hi, How I can disable all

Re: Disable all caches in solr

2013-05-28 Thread Yago Riveiro
Indeed, I commented out all the cache entries in solrconfig, but SolrMeter still shows a cache of the field cache type. Now I know why. Thanks Shalin, -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, May 28, 2013 at 3:53 PM, Shalin Shekhar Mangar wrote: Edit the

Re: Paging with all Hits

2013-05-28 Thread Jack Krupansky
:) -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Tuesday, May 28, 2013 10:41 AM To: solr-user@lucene.apache.org Subject: Re: Paging with all Hits counter-rant I feel that the strength of Jack's rant is somewhat unprovoked by the original question. I also

Solr Composite Unique key from existing fields in schema

2013-05-28 Thread Rishi Easwaran
Hi All, Historically we have used a single field in our schema as a uniqueKey. <field name="docid" type="string" indexed="true" stored="true" multiValued="false" required="true"/> <field name="userid" type="string" indexed="true" stored="true" multiValued="false" required="true"/>

Re: Solr Composite Unique key from existing fields in schema

2013-05-28 Thread Jan Høydahl
The cleanest is to do this from the outside. Alternatively, it will perhaps work to populate your uniqueKey in a custom UpdateProcessor. You can try. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 28. mai 2013 kl. 17:12 skrev Rishi Easwaran rishi.easwa...@aol.com:

Re: Not able to search Spanish word with ascent in solr

2013-05-28 Thread jignesh
Hello Jack, Thanks for your reply. I have tried to add the contents below to Solr, as you suggested - <add> <doc> <field name="id">doc-1</field> <field name="name">Hola Mañana en le Café, habla el Académie française!</field> </doc> </add> -- BUT I am getting the error below

Re: delta-import tweaking?

2013-05-28 Thread Shawn Heisey
On 5/28/2013 12:31 AM, Kristian Rink wrote: (a) The usual tutorials outline something like WHERE LASTMODIFIED > '${dih.last_index_time}' [snip] (b) I see that last_index_time returns a particularly fixed format. In our database, with a modestly more complex SELECT, we also could figure out
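
To make the comparison in (a) and the fixed format in (b) concrete, here is a small sketch. It assumes the timestamp is written as "yyyy-MM-dd HH:mm:ss" (an assumption about the default format, not something stated in the thread); the function names are hypothetical, and LASTMODIFIED is simply the column name quoted in the question.

```python
from datetime import datetime

# Assumed timestamp layout, i.e. "yyyy-MM-dd HH:mm:ss" in Java notation.
DIH_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_last_index_time(text):
    """Parse a timestamp string in the assumed fixed format."""
    return datetime.strptime(text, DIH_TIME_FORMAT)


def delta_where_clause(column, last_index_time):
    """Build the 'changed since the last import' condition for a delta query."""
    stamp = last_index_time.strftime(DIH_TIME_FORMAT)
    return "WHERE %s > '%s'" % (column, stamp)


last_run = parse_last_index_time("2013-05-28 00:31:00")
print(delta_where_clause("LASTMODIFIED", last_run))
```

If the database stores modification times in another representation, the conversion would belong on the SQL side of the comparison instead.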

Re: Distributed query: strange behavior.

2013-05-28 Thread Valery Giner
Eric, Thank you for the explanation. My problem was that allowing the docs with the same unique ids to be present in the multiple shards in a normal situation, makes it impossible to estimate the number of shards needed for an index with a really large number of docs. Thanks, Val On

Re: Not able to search Spanish word with ascent in solr

2013-05-28 Thread jignesh
Hello Steve Thanks for your reply I don't want to upgrade solr 4 so your suggestion will be as below --- you should instead convert these HTML character entities yourself to the characters they represent (e.g. &amp;eacute; -> é) before sending the docs to Solr. ---

Re: Associate item with more than one location

2013-05-28 Thread Smiley, David W.
Absolutely. Use location_rpt in the example schema. Do *not* use LatLonType, which doesn't support multiValued data. ~ David Smiley On 5/28/13 8:02 AM, Spadez james_will...@hotmail.com wrote: currently have an item which gets imported into solr, lets call it a book entry. Well that has a

Re: Solr Composite Unique key from existing fields in schema

2013-05-28 Thread Jack Krupansky
You can do this by combining the builtin update processors. Add this to your solrconfig: <updateRequestProcessorChain name="composite-id"> <processor class="solr.CloneFieldUpdateProcessorFactory"> <str name="source">docid_s</str> <str name="source">userid_s</str> <str name="dest">id</str> </processor>

Re: Not able to search Spanish word with ascent in solr

2013-05-28 Thread Jack Krupansky
I copied those accented words directly from web pages in Google Chrome on a Windows PC, but then copied them to a text file as well, so their encoding is dubious. You will have to make sure to use accented characters for UTF-8 in your environment. And... make sure that you are using an editor

Re: Note on The Book

2013-05-28 Thread Alexandre Rafalovitch
Jack, It is worth considering something like https://leanpub.com/ . That way people can pre-pay for the result and enjoy (however 'draft'-y) results earlier. In terms of reference vs narrative, my strong desire would have been for the narrative part. The problem always seems to be around

Query syntax error: Cannot parse ....

2013-05-28 Thread yriveiro
Hi, When I try to run this query, http://localhost:8983/solr/coreA/select?q=source_id:(7D1FFB# OR 7D1FFB) city:ES, I get the error below: <response> <lst name="responseHeader"> <int name="status">400</int> <int name="QTime">1</int> </lst> <lst name="error"> <str name="msg"> org.apache.solr.search.SyntaxError: Cannot parse

Re: Solr Composite Unique key from existing fields in schema

2013-05-28 Thread Rishi Easwaran
Thanks Jack, looks like that will do the trick for me. I will try it out. -Original Message- From: Jack Krupansky j...@basetechnology.com To: solr-user solr-user@lucene.apache.org Sent: Tue, May 28, 2013 12:07 pm Subject: Re: Solr Composite Unique key from existing fields in

RE: Note on The Book

2013-05-28 Thread Swati Swoboda
I'd definitely prefer the spiral bound as well. E-books are great and your draft version seems very reasonably priced (aka I would definitely get it). Really looking forward to this. Is there a separate mailing list / etc. for the book for those who would like to receive updates on the status

Re: Query syntax error: Cannot parse ....

2013-05-28 Thread gpssolr2020
Hi, Try to pass URL encode value(%23) for # . Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Query-syntax-error-Cannot-parse-tp4066560p4066566.html Sent from the Solr - User mailing list archive at Nabble.com.
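
A quick sketch of that suggestion, using Python's standard library to percent-encode the query value from the original question. An unencoded "#" starts the URL fragment, so everything after it never reaches Solr; encoded as %23 it is sent as part of the q parameter. The host and core name are the ones from the question.

```python
from urllib.parse import quote

raw_query = "source_id:(7D1FFB# OR 7D1FFB) city:ES"

# safe="" percent-encodes every reserved character, including '#' and ':'.
encoded_query = quote(raw_query, safe="")

url = "http://localhost:8983/solr/coreA/select?q=" + encoded_query
print(url)
```

Note that encoding only gets the "#" to the server; whether the query parser then accepts it inside a term is a separate matter.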

Re: Solr Composite Unique key from existing fields in schema

2013-05-28 Thread Rishi Easwaran
Jack, Not sure if this is the correct behaviour. I set up the updateRequestProcessorChain as mentioned below, but it looks like the composite ID that is generated is based on the input order. For example: If my input comes in as <field name="docid">1</field> <field name="userid">12345</field> I get the following

RE: Keeping a rolling window of indexes around solr

2013-05-28 Thread Saikat Kanjilal
At first glance unless I missed something hourglass will definitely not work for our use-case which just involves real time inserts of new log data and no appends at all. However I would like to examine the guts of hourglass to see if we can customize it for our use-case. From:

Why do FQs make my spelling suggestions so slow?

2013-05-28 Thread Andy Lester
I'm working on using spellcheck for giving suggestions, and collations are giving me good results, but they turn out to be very slow if my original query has any FQs in it. We can do 100 maxCollationTries in no time at all, but if there are FQs in the query, things get very slow. As

Re: Solr Composite Unique key from existing fields in schema

2013-05-28 Thread Jack Krupansky
The order in the ID should be purely dependent on the order of the field names in the processor configuration: <str name="source">docid_s</str> <str name="source">userid_s</str> -- Jack Krupansky -Original Message- From: Rishi Easwaran Sent: Tuesday, May 28, 2013 2:54 PM To:

Re: Nested Facets and distributed shard system.

2013-05-28 Thread vibhoreng04
Hi Erick and Markus, Any Idea on this ? can we resolve this by group by queries? -- View this message in context: http://lucene.472066.n3.nabble.com/Nested-Facets-and-distributed-shard-system-tp4065847p4066583.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Composite Unique key from existing fields in schema

2013-05-28 Thread Rishi Easwaran
I thought the same, but that doesn't seem to be the case. -Original Message- From: Jack Krupansky j...@basetechnology.com To: solr-user solr-user@lucene.apache.org Sent: Tue, May 28, 2013 3:32 pm Subject: Re: Solr Composite Unique key from existing fields in schema The order

SOLR 4.3.0 - How to make fq optional?

2013-05-28 Thread bbarani
I am using the SOLR geospatial capabilities for filtering the results based on the particular radius (something like below).. I have added the below fq query in solrconfig and passing the latitude and longitude information dynamically..

Re: Solr Composite Unique key from existing fields in schema

2013-05-28 Thread Jack Krupansky
The TL;DR response: Try this: <updateRequestProcessorChain name="composite-id"> <processor class="solr.CloneFieldUpdateProcessorFactory"> <str name="source">userid_s</str> <str name="dest">id</str> </processor> <processor class="solr.CloneFieldUpdateProcessorFactory"> <str name="source">docid_s</str> <str

RE: Why do FQs make my spelling suggestions so slow?

2013-05-28 Thread Dyer, James
Andy, What are the QTimes for the 0fq,1fq,2fq,4fq 4fq cases with spellcheck entirely turned off? Is it about (or a little more than) half the total when maxCollationTries=1 ? Also, with the varying # of fq's, how many collation tries does it take to get 10 collations? Possibly, a better

Re: SOLR 4.3.0 - How to make fq optional?

2013-05-28 Thread David Smiley (@MITRE.org)
Your client needs to know to submit the proper filter query conditionally. It's not really a spatial issue, and I disagree with the idea to make bbox (and all other query parsers for that matter) do nothing if not given an expected input. ~ David bbarani wrote I am using the SOLR geospatial
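
A minimal sketch of the client-side approach described above: the filter query is only added when a location was actually supplied. The function name, the "location" field name and the sample coordinates are hypothetical; {!bbox sfield=... pt=... d=...} is the spatial filter syntax, with d given in kilometres.

```python
from urllib.parse import urlencode


def build_params(q, point=None, radius_km=None, sfield="location"):
    """Return a query string; the spatial fq is included only if a point is given."""
    params = [("q", q)]
    if point is not None and radius_km is not None:
        # Hypothetical field name; pt is "lat,lon", d is the distance in km.
        fq = "{!bbox sfield=%s pt=%s d=%s}" % (sfield, point, radius_km)
        params.append(("fq", fq))
    return urlencode(params)


print(build_params("*:*"))
print(build_params("*:*", point="45.15,-93.85", radius_km=5))
```

Keeping the decision in the client means the query parser never sees an empty or malformed spatial filter in the first place.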

Re: Note on The Book

2013-05-28 Thread Jack Krupansky
We'll have a blog for the book. We hope to have a first raw/rough/partial/draft published as an e-book in maybe 10 days to 2 weeks. As soon as we get that process under control, we'll start the blog. I'll keep your email on file and keep you posted. -- Jack Krupansky -Original

Re: SOLR 4.3.0 - How to make fq optional?

2013-05-28 Thread bbarani
David, I felt like there should be a flag with which we can either throw the error message or do nothing in case of bad inputs.. -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-4-3-0-How-to-make-fq-optional-tp4066592p4066610.html Sent from the Solr - User mailing list

Re: Why do FQs make my spelling suggestions so slow?

2013-05-28 Thread Andy Lester
Thanks for looking at this. What are the QTimes for the 0fq,1fq,2fq,4fq 4fq cases with spellcheck entirely turned off? Is it about (or a little more than) half the total when maxCollationTries=1 ? With spellcheck off I get 8ms for 4fq query. Also, with the varying # of fq's, how many

Re: SOLR 4.3.0 - How to make fq optional?

2013-05-28 Thread Erik Hatcher
I imagine the new switch query parser could help here somehow. Erik On May 28, 2013, at 16:43, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: Your client needs to know to submit the proper filter query conditionally. It's not really a spatial issue, and I disagree with the idea to

Re: Nested Facets and distributed shard system.

2013-05-28 Thread Jason Hellman
You have mentioned Pivot Facets, but have you looked at the Path Hierarchy Tokenizer Factory: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory This matches your use case, as best as I understand it. Jason On May 28, 2013, at 12:47 PM, vibhoreng04

Re: SOLR 4.3.0 - How to make fq optional?

2013-05-28 Thread bbarani
Erik, I am trying to enable / disable a part of fq based on a particular value passed from the query. For Ex: If I have the value for the keyword where in the query then I would like to enable this fq, else just ignore it.. select?where=New york,NY Enable only when where has some value. (I

Re: Solr Composite Unique key from existing fields in schema

2013-05-28 Thread Rishi Easwaran
Thanks Jack, That fixed it and guarantees the order. As far as I can tell SOLR cloud 4.2.1 needs a uniquekey defined in its schema, or I get an exception. SolrCore Initialization Failures * testCloud2_shard1_replica1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:

Re: SOLR 4.3.0 - How to make fq optional?

2013-05-28 Thread Chris Hostetter
: David, I felt like there should be a flag with which we can either throw the : error message or do nothing in case of bad inputs.. As erik alluded to in his response, you should be able to configure an appended fq using the switch QParserPlugin to get something like what you are

Re: split document or not

2013-05-28 Thread Hard_Club
Thanks, Alexandre. But I need to know in which paragraph the request matched. I need it because paragraphs are bound to some extra data that I need to output on the result page. So I need to know the paragraph IDs. How can I bind such an attribute to a multivalued field? -- View this message in context:

Re: split document or not

2013-05-28 Thread Jason Hellman
You may wish to explore the concept of using the Result Grouping (Field Collapsing) feature in which your paragraphs are individual documents that share a field to group them by (the ID of the document/book/article/whatever). http://wiki.apache.org/solr/FieldCollapsing This will net you
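
As a rough sketch of what such a request could look like, the snippet below builds the grouping parameters for a query where each paragraph is its own document. The field names "text" and "book_id" are hypothetical; group, group.field and group.limit are the standard result grouping parameters.

```python
from urllib.parse import urlencode


def grouped_query_params(term, group_field="book_id", per_group=3):
    """Build request parameters that collapse paragraph hits by parent document."""
    return urlencode({
        "q": "text:%s" % term,        # hypothetical paragraph text field
        "group": "true",              # turn result grouping on
        "group.field": group_field,   # hypothetical field holding the parent ID
        "group.limit": per_group,     # paragraphs returned per parent
    })


print(grouped_query_params("solr"))
```

Because every paragraph is a separate document here, any extra per-paragraph data can live in ordinary fields on that document.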

Re: SOLR 4.3.0 - How to make fq optional?

2013-05-28 Thread bbarani
Hoss, you read my mind. Thanks a lot for your awesome explanation! You rock!!! -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-4-3-0-How-to-make-fq-optional-tp4066592p4066630.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: [blog post] Automatically Acquiring Synonym Knowledge from Wikipedia

2013-05-28 Thread Koji Sekiguchi
Hi Rajesh, Thanks! I'm planning to open an NLP tool kit for Lucene, and the tool kit will include the following synonym library. koji (13/05/28 14:12), Rajesh Nikam wrote: Hello Koji, This is seems pretty useful post on how to create synonyms file. Thanks a lot for sharing this ! Have you

Re: Solr Composite Unique key from existing fields in schema

2013-05-28 Thread Jack Krupansky
Great. And I did verify that the field order cannot be guaranteed by a single CloneFieldUpdateProcessorFactory with multiple field names - the underlying code iterates over the input values, checks the field selector for membership and then immediately adds to the output, so changing the input
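
The difference described above can be modelled in a few lines. This is a toy model, not the actual Solr implementation: one function walks the input fields and copies whatever matches the selector (so input order wins), the other applies one source field at a time in configuration order. The field names and values are the ones used earlier in the thread; the function names are made up.

```python
def clone_single_processor(doc_fields, sources):
    """Model of one processor with several sources: iterate the INPUT fields."""
    selected = set(sources)
    return [value for name, value in doc_fields if name in selected]


def clone_chained_processors(doc_fields, sources):
    """Model of one processor per source field: iterate the CONFIG order."""
    lookup = dict(doc_fields)
    return [lookup[name] for name in sources if name in lookup]


# The document arrives with docid first, but the config lists userid first.
doc = [("docid_s", "1"), ("userid_s", "12345")]
configured = ["userid_s", "docid_s"]

print(clone_single_processor(doc, configured))    # follows the input order
print(clone_chained_processors(doc, configured))  # follows the config order
```

This is why splitting the configuration into one processor per source field gives a stable composite value regardless of how the client orders its fields.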

Re: Keeping a rolling window of indexes around solr

2013-05-28 Thread Chris Hostetter
: This is kind of the approach used by elastic search , if I'm not using : solrcloud will I be able to use shard aliasing, also with this approach : how would replication work, is it even needed? you haven't said much about the volume of data you expect to deal with, nor have you really

Re: Solr/Lucene Analayzer That Writes To File

2013-05-28 Thread Chris Hostetter
: I want to use Solr for an academical research. One step of my purpose is I : want to store tokens in a file (I will store it at a database later) and I you could absolutely write a Java program which accesses the analyzers directly and does whatever you want with the results of analysing a

Re: SOLR 4.3.0 - How to make fq optional?

2013-05-28 Thread Chris Hostetter
: As erik alluded to in his response, you should be able to configure an : appended fq using the switch QParserPlugin to get something like what you are : describing, by taking advantage of the default behavior. I've updated the javadocs with 2 additional examples inspired by this thread..

Re: Restaurant availability from database

2013-05-28 Thread Chris Hostetter
: I've created a custom ValueSourceParser and ValueSource that retrieve the : availability information from a MySQL database. An example query is as : follows. : : http://localhost:8983/solr/collection1/select?q=restaurant_id:*&fl=*,available:availability(2013-05-23, : 2, 1700, 2359) : : This

Re: Keeping a rolling window of indexes around solr

2013-05-28 Thread Saikat Kanjilal
Volume of data: 1 log insert every 30 seconds; queries are done sporadically and asynchronously at a much lower frequency, every few days. Also the majority of the requests are indeed going to be within a slice of time (typically hours or at most a few days). Type of queries: Keyword or

RE: OPENNLP current patch compiling problem for 4.x branch

2013-05-28 Thread Patrick Mi
Thanks Steve, that worked for branch_4x -Original Message- From: Steve Rowe [mailto:sar...@gmail.com] Sent: Friday, 24 May 2013 3:19 a.m. To: solr-user@lucene.apache.org Subject: Re: OPENNLP current patch compiling problem for 4.x branch Hi Patrick, I think you should check out and

Not so concurrent concurrency

2013-05-28 Thread Benson Margulies
I can't quite apply SolrMeter to my problem, so I did something of my own. The brains of the operation are the function here. This feeds a ConcurrentUpdateSolrServer about 95 documents, each about 10mb, and 'threads' is six. Yet Solr just barely uses more than one core. private long

Re: Solr/Lucene Analayzer That Writes To File

2013-05-28 Thread Roman Chyla
You can store them and then use different analyzer chains on it (stored, doesn't need to be indexed) I'd probably use the collector pattern se.search(new MatchAllDocsQuery(), new Collector() { private AtomicReader reader; private int i = 0; @Override public boolean

How apache solr stores indexes

2013-05-28 Thread Kamal Palei
Dear All, I have a basic question about how data is stored in Apache Solr indexes. Say I have a thousand registered users on my site. Let's say I want to store the skills of each user as a multivalued string index. Say user 1 has skill set - Java, MySql, PHP user 2 has skill set - C++, MySql, PHP user 3 has

Re: How apache solr stores indexes

2013-05-28 Thread Alexandre Rafalovitch
And you need to know this why? If you are really trying to understand how this all works under the covers, you need to look at Lucene's inverted index as a start. Start here: http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description

Re: How apache solr stores indexes

2013-05-28 Thread Shashi Kant
Better still start here: http://en.wikipedia.org/wiki/Inverted_index http://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html And there are several books on search engines and related algorithms. On Tue, May 28, 2013 at 10:41 PM, Alexandre Rafalovitch
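
For a first intuition before reading those references, here is a toy inverted index built from the two complete skill sets given in the original question. It is only a sketch of the concept (term -> set of document IDs); Lucene's real on-disk format is far more elaborate, and the function name is made up.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


# Skill sets of user 1 and user 2 as given in the question.
users = {
    1: ["Java", "MySql", "PHP"],
    2: ["C++", "MySql", "PHP"],
}

index = build_inverted_index(users)
print(sorted(index["MySql"]))   # users that have the MySql skill
print(sorted(index["Java"]))    # users that have the Java skill
```

Each distinct term is stored once and points at the documents that contain it, which is why looking up "who has skill X" does not require scanning every user.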

Re: How apache solr stores indexes

2013-05-28 Thread Kamal Palei
Thanks Alex. My dilemma is whether to store the skill sets in the Solr index as string tokens or as integers. To give a little background - as of today, I assign each skill a unique id (an auto-increment field in a MySQL table), and then store them against the user id in a separate table. That's

Re: How apache solr stores indexes

2013-05-28 Thread Alexandre Rafalovitch
Store them as a string token in multivalued fields. Solr/Lucene will do the necessary mapping and lookups. That's what you are paying it for. :-) That way you can easily facet and so on. You may need to change some parts of your architecture later, but you seem to be over-thinking it too early in

Re: How apache solr stores indexes

2013-05-28 Thread Jack Krupansky
As a general rule with Solr, do a proof of concept implementation with the simplest sensible approach and only start piling on complexity if performance or capacity become problematic. If the data is naturally a string, use a string. If it is naturally a number, use a number. Use whatever the

Re: How apache solr stores indexes

2013-05-28 Thread Kamal Palei
Thanks a lot for all your input. I will go ahead and store as strings. Best Regards Kamal On Wed, May 29, 2013 at 9:00 AM, Jack Krupansky j...@basetechnology.com wrote: As a general rule with Solr, do a proof of concept implementation with the simplest sensible approach and only start piling

Choosing specific fields for suggestions in SpellCheckerComponent

2013-05-28 Thread Wilson Passos
Hi everyone, I've been searching for how to configure the SpellCheckerComponent in Solr 4.0 to support suggestion queries based on a subset of the configured fields in schema.xml. Let's say the spell checking is configured to use these 4 fields: <field name="field1" type="text_general"/> <field

Re: [blog post] Automatically Acquiring Synonym Knowledge from Wikipedia

2013-05-28 Thread Rajesh Nikam
Hi Koji, Great news! I am looking forward to this OpenNLP toolkit. Thanks a lot! Rajesh On Wed, May 29, 2013 at 4:12 AM, Koji Sekiguchi k...@r.email.ne.jp wrote: Hi Rajesh, Thanks! I'm planning to open an NLP tool kit for Lucene, and the tool kit will include the following synonym

OPENNLP problems

2013-05-28 Thread Patrick Mi
Hi there, Checked out branch_4x and applied the latest patch LUCENE-2899-current.patch however I ran into 2 problems Followed the wiki page instruction and set up a field with this type aiming to keep nouns and verbs and do a facet on the field == fieldType name=text_opennlp_nvf