question on word parsing control

2012-05-01 Thread kenf_nc
I have a field that is defined using what I believe is fairly standard text fieldType. I have documents with the words 'evaluate', 'evaluating', 'evaluation' in them. When I search on the whole word, obviously it works, if I search on 'eval' it finds nothing. However for some reason if I search on

Re: optional nested queries

2011-07-01 Thread kenf_nc
I don't use dismax, but do something similar with a regular query. I have a field defined in my schema.xml called 'dummy' (not sure why it's called that actually) but it defaults to 1 on every document indexed. So say I want to give a score bump to documents that have an image, I can do queries

Re: How to optimize solr indexes

2011-07-01 Thread kenf_nc
I believe that is not a setting, it's not telling you that you have 'optimize turned on' it's a state, your index is currently optimized. If you index a new document or delete an existing document, and don't issue an optimize command, then your index should be optimize=false.

Compound word search not what I expected

2011-06-07 Thread kenf_nc
I have a field defined as: <field name="content" type="text" indexed="true" stored="false" termVectors="true" multiValued="true" /> where text is unmodified from the schema.xml example that came with Solr 1.4.1. I have documents with some compound words indexed, words like Sandstone. And in several cases

Re: Compound word search not what I expected

2011-06-07 Thread kenf_nc
I tried setting catenateWords=1 on the Query analyzer and that didn't do anything. I think what I need is to set my Index Analyzer to have preserveOriginal=1 and then re-index everything. That will be a pain, so I'll do a small test to make sure first. I'm really surprised preserveOriginal=1 isn't

Re: sorting on date field in facet query

2011-05-19 Thread kenf_nc
This is more a speculation than direction, I don't currently use Field Collapsing but my take on it is that it returns the number of docs collapsed. So instead of faceting could you do a search returning DocID, collapsing on DocID sorting on date, then the count of collapsed docs *should* match

Re: Anyone having these Replication issues as well?

2011-05-18 Thread kenf_nc
Thanks Markus, for your patience with getting the response in as well as the comments. This is my Dev environment, I'm actually going to be setting up a new master-slave configuration in a different environment today. I'll see if it's environment specific or not. One thing I didn't mention, wasn't

Re: Anyone familiar with Solandra or Lucandra?

2011-05-17 Thread kenf_nc
But I can query Cassandra directly for the documents if I wanted/needed to? And, when I need to re-index, I could read from Cassandra, index into Solr, which will write back to Cassandra overwriting the existing document(s)? Basically the steps would be, index documents into Solr which would

Re: Anyone familiar with Solandra or Lucandra?

2011-05-17 Thread kenf_nc
Ah. I see. That reduces its usefulness to me some. The multi-master aspect is still a big draw of course. But I was hoping this also added an integrated persistence layer to Solr as well.

Anyone having these Replication issues as well?

2011-05-17 Thread kenf_nc
Is it just me or is Replication a POS? (Solr 1.4.1, Tomcat 6.0.32) 1) I had set my pollInterval to 60 seconds but it appears to fire constantly so I set it to 5 minutes and I see in the Tomcat logs where it fires the replication check anywhere from 2 minutes to 4 1/2 minutes, but never anything

RE: Schema Design Question

2011-05-15 Thread kenf_nc
Create a separate document for each book-bookshelf combination: doc 1 = book 1, shelf 1; doc 2 = book 1, shelf 3; doc 3 = book 2, shelf 1; etc. Then your queries are q=book_id to get all bookshelves a given book is on, or q=shelf_id to get all books on a given bookshelf. Biggest problem people face
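
The one-document-per-combination layout described above can be sketched in Python as follows. This is only an illustration: the field names (id, book_id, shelf_id) and the helper functions are hypothetical, since the thread does not show the real schema.

```python
# Hypothetical field names (id, book_id, shelf_id); the real schema is not shown in the thread.
def make_docs(placements):
    """Build one flat document per (book, shelf) pair."""
    return [
        {"id": f"{book}-{shelf}", "book_id": book, "shelf_id": shelf}
        for book, shelf in placements
    ]


def shelves_for_book(book_id):
    """Query string matching every document (shelf placement) for one book."""
    return f"book_id:{book_id}"


def books_on_shelf(shelf_id):
    """Query string matching every document (book) placed on one shelf."""
    return f"shelf_id:{shelf_id}"


# The three example documents from the post: book 1 on shelves 1 and 3, book 2 on shelf 1.
docs = make_docs([(1, 1), (1, 3), (2, 1)])
```

Each document stays flat, so both lookups are plain single-field queries and no join is needed.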

Re: Anyone familiar with Solandra or Lucendra?

2011-05-12 Thread kenf_nc
I modified the subject to include Lucendra, in case anyone has heard of it by that name. -- View this message in context: http://lucene.472066.n3.nabble.com/Anyone-familiar-with-Solandra-or-Lucendra-tp2927357p2933051.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: how to do offline adding/updating index

2011-05-11 Thread kenf_nc
My understanding is that the Master has done all the indexing, that replication is a series of file copies to a temp directory, then a move and commit. The slave only gets hit with the effects of a commit, so whatever warming queries are in place, and the caches get reset. Doing too many commits

Anyone familiar with Solandra?

2011-05-11 Thread kenf_nc
The recent Amazon outage exposed a weakness in our architecture. We could really use a Master-Master redundancy. We already have Master to multiple Slaves. I've looked at the various options of converting a Slave into a Master, of having a Repeater (hybrid master/slave) become the Master etc. But,

Re: how to do offline adding/updating index

2011-05-10 Thread kenf_nc
Master/slave replication does this out of the box, easily. Just set the slave to update on Optimize only. Then you can update the master as much as you want. When you are ready to update the slave (the search instance), just optimize the master. On the slave's next cycle check it will refresh

Replication question

2011-05-06 Thread kenf_nc
I have Replication set up with <str name="pollInterval">00:00:60</str> I assumed that meant it would poll the master for updates once a minute. But my logs make it look like it is trying to sync up almost constantly. Below is an example of my log from just 1 minute in time. Am I reading this wrong?

Result order when score is the same

2011-04-13 Thread kenf_nc
I'm using version 1.4.1. It appears that when several documents in a result set have the same score, the secondary sort is by 'indexed_at' ascending. Can this be altered in the config xml files? If I wanted the secondary sort to be indexed_at descending for example, or by a different field, say

Re: Indexing Question for large dataset

2011-04-13 Thread kenf_nc
Indexing isn't a problem, it's just disk space and space is cheap. But, if you do facets on all those price columns, that gets put into RAM which isn't as cheap or plentiful. Your cache buffers may get overloaded a lot and performance will suffer. 2000 price columns seems like a lot, could the

RE: Indexing Question for large dataset

2011-04-13 Thread kenf_nc
Is NAME a product name? Why would it be multivalue? And why would it appear on more than one document? Is each 'document' a package of products? And the pricing tiers are on the package, not individual pieces? So sounds like you could, potentially, have a PriceListX column for each user. As your

Re: Result order when score is the same

2011-04-13 Thread kenf_nc
Is sort order when 'score' is the same a Lucene thing? Should I ask on the Lucene forum?

Re: Result order when score is the same

2011-04-13 Thread kenf_nc
Au contraire, I have almost 4 million documents, representing businesses in the US. And having the score be the same is a very common occurrence. It is quite clear from testing that if score is the same, then it sorts on indexed_at ascending. It seems silly to make me add a sort on every query,

Re: Result order when score is the same

2011-04-13 Thread kenf_nc
Is a new DocID generated every time a doc with the same UniqueID is added to the index? If so, then docID must be incremental and would look like indexed_at ascending. What I see (and why it's a problem for me) is the following. A search brings back the first 5 documents in a result set of say 60.

Re: Index Design Question

2011-02-17 Thread kenf_nc
Some options to reduce performance implications are: replication... index your documents in one Solr instance, and query in a different one. That way the users of the query side will not be as adversely impacted by frequent changes. You have better control over when change occurs. separate

Re: Any contribs available for Range field type?

2011-02-15 Thread kenf_nc
I've tried several times to get an active account on solr-...@lucene.apache.org and the mailing list won't send me a confirmation email, and therefore won't let me post because I'm not confirmed. Could I get someone that is a member of Solr-Dev to post either my original request in this thread,

Any contribs available for Range field type?

2011-02-11 Thread kenf_nc
I have a huge need for a new field type. It would be a Poly field, similar to Point or Payload. It would take 2 data elements and a search would return a hit if the search term fell within the range of the elements. For example let's say I have a document representing an Employment record. I may

Re: Any contribs available for Range field type?

2011-02-11 Thread kenf_nc
True. And that's my temporary solution. But it's ugly code, even uglier queries. I may have several such fields in a single query. A PolyField solution would be so much more elegant and useful. I'm actually shocked more people don't need/want something like it.

List of indexed or stored fields

2011-01-25 Thread kenf_nc
I use a lot of dynamic fields, so looking at my schema isn't a good way to see all the field names that may be indexed across all documents. Is there a way to query solr for that information? All field names that are indexed, or stored? Possibly a count by field name? Is there any other metadata

Re: List of indexed or stored fields

2011-01-25 Thread kenf_nc
That's exactly what I wanted, thanks. Any idea what <long name="version">1294513299077</long> refers to under the index section? I have 2 cores on one Tomcat instance, and 1 on a second instance (different server) and all 3 have different numbers for version, so I don't think it's the version of

Re: Single value vs multi value setting in tokenized field

2011-01-20 Thread kenf_nc
Thanks guys. I read (and actually digested this time) how multivalued fields work and now realize my question came from a 'structured language/dbms' background. The multivalued field is stored basically as a single value with extra spacing between terms (the positionIncrementGap previously

Re: solrconfig.xml settings question

2011-01-20 Thread kenf_nc
Is that it? Of all the strange, esoteric, little understood configuration settings available in solrconfig.xml, the only thing that affects Index Speed vs Query Speed is turning on/off the Query Cache and RamBufferSize? And for the latter, why wouldn't RamBufferSize be the same for both...that

Re: Single value vs multi value setting in tokenized field

2011-01-17 Thread kenf_nc
No, I have both, a single field (for free form text search), and individual fields (for directed search). I already duplicate the data and that's not a problem, disk space is cheap. What I wanted to know was whether it is best to make the single field multiValued=true or not. That is, should my

solrconfig.xml settings question

2011-01-17 Thread kenf_nc
In the Wiki and the book by Smiley and Pugh, and in the comments inside the solrconfig.xml file itself, it always talks about the various settings in the context of a blended use solr index. By that I mean, it assumes you are indexing and querying from the same solr instance. However, if I have a

Single value vs multi value setting in tokenized field

2011-01-16 Thread kenf_nc
I have to support both general searches (free form text) and directed searches (field:val field2:val). To do the general search I have a field defined as: <field name="content" type="text" indexed="true" stored="false" termVectors="true" multiValued="true" /> and several copyField commands like: copyField

Re: Query : FAQ? Forum?

2011-01-14 Thread kenf_nc
Solr Wiki: http://wiki.apache.org/solr/FrontPage Solr FAQ: http://wiki.apache.org/solr/FAQ A good book on Solr: http://www.amazon.com/Solr-1-4-Enterprise-Search-Server/dp/1847195881/ref=sr_1_1?ie=UTF8&qid=1295018231&sr=8-1 And this forum you posted to

Re: Question on deleting all rows for an index

2011-01-13 Thread kenf_nc
If this is a one-time cleanup, not something you need to do programmatically, you could delete the index directory ( solrDir/data/index ). In my case I have to stop Tomcat, delete .\index and restart Tomcat. It is very fast and starts me out with a fresh, empty, index. Noticed you are multi-core,

Re: basic document crud in an index

2011-01-13 Thread kenf_nc
A/ You have to update all the fields, if you leave one off, it won't be in the document anymore. I have my 'persisted' data stored outside of Solr, so on update I get the stored data, modify it and update Solr with every field (even if one changed). You could also do a Query/Modify/Update

Re: Consequences for using multivalued on all fields

2010-12-21 Thread kenf_nc
I have about 30 million documents and with the exception of the Unique ID, Type and a couple of date fields, every document is made of dynamic fields. Now, I only have maybe 1 in 5 being multi-value, but search and facet performance doesn't look appreciably different from a fixed schema solution.

Re: Solr site not accessible

2010-12-17 Thread kenf_nc
Yep, www.apache.org is down. They tick off the wikihackers too? :)

Re: Thank you!

2010-12-16 Thread kenf_nc
Hear hear! In the beginning of my journey with Solr/Lucene I couldn't have done it without this site. Smiley and Pugh's book was useful, but this forum was invaluable. I don't have as many questions now, but each new venture, Geospatial searching, replication and redundancy, performance tuning,

Re: Multi Word searches in Solr

2010-11-17 Thread kenf_nc
Multi-word queries are the bread and butter of Solr/Lucene, so I'm not sure I understand the complete issue here. For clarity, is 'abstract' the name of your default text field, or is your query q=abstract: mouse genome? If the latter, my thought was: is it possible that the query is being

Re: Link to download solr4.0 is not working?

2010-11-15 Thread kenf_nc
While we are on this subject...my company is kind of new to the whole open source as a production tool concept. I can't push anything to production that isn't labeled as 'release' or similar designation. So, 1.4.1 is what I have right now. I can play with other versions but that's about it. I'm

Re: Link to download solr4.0 is not working?

2010-11-15 Thread kenf_nc
Thanks Jan. I didn't know about 1.4.2 I'll give it a look. However, your link is something I've already seen. I understand the different Solr versions, my question was more on what is the process, and timeline, for the community to turn the current trunk into a 'release'. From that link, and

Re: Query question

2010-11-03 Thread kenf_nc
Unfortunately the default operator is set to AND and I can't change that at this time. If I do (city:Chicago^10 OR Romantic OR View) it returns way too many unwanted results. If I do (city:Chicago^10 OR (Romantic AND View)) it returns fewer unwanted results, but still a lot. iorixxx's solution

Query question

2010-11-02 Thread kenf_nc
I can't seem to find the right formula for this. I have a need to build a query where one of the fields should boost the score, but not affect the query if there isn't a match. For example, if I have documents with restaurants, name, address, cuisine, description, etc. I want to search on, say,

Re: Query question

2010-11-02 Thread kenf_nc
Jonathan, Dismax is something I've been meaning to look into, and bq does seem to fit the bill, although I'm worried about this line in the wiki :TODO: That latter part is deprecated behavior but still works. It can be problematic so avoid it. It still seems to be the closest to what I want

Re: Reverse range query

2010-10-29 Thread kenf_nc
I modified the text of this hopefully to make it clearer. I wasn't sure what I was asking was coming across well. And I'm adding this comment in a shameless attempt to boost my question back to the top for people to see. Before I write a messy work around, just wanted to check the community to

Reverse range search

2010-10-28 Thread kenf_nc
Doing a range search is straightforward. I have a fixed value in a document field, I search on [x TO y] and if the fixed value is in the range requested it gets a hit. But, what if I have data in a document where there is a min value and a max value and my query is a fixed value and I want to get
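
One common way to express this with ordinary range queries is to require that the lower bound is at or below the value and the upper bound is at or above it. The sketch below assumes the two bounds are stored in separate fields, here given the hypothetical names min_value and max_value; it is not necessarily the approach the thread settled on.

```python
def reverse_range_query(value, min_field="min_value", max_field="max_value"):
    """Build a query matching documents whose [min, max] interval contains value.

    min_field and max_field are hypothetical field names used for illustration.
    """
    # Lower bound <= value AND upper bound >= value (both range ends inclusive).
    return f"{min_field}:[* TO {value}] AND {max_field}:[{value} TO *]"


# Example: find documents whose range covers the fixed value 42.
query = reverse_range_query(42)
```

With several such ranges in one query the string grows quickly, which matches the "ugly queries" complaint elsewhere in these threads.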

Re: Stored or indexed?

2010-10-27 Thread kenf_nc
Interesting wiki link, I hadn't seen that table before. And to answer your specific question about indexed=true, stored=false, this is most often done when you are using analyzers/tokenizers on your field. This field is for search only, you would never retrieve its contents for display. It may

Re: Missing facet values for zero counts

2010-09-29 Thread kenf_nc
I don't understand why you would want to show Sweden if it isn't in the index, what will your UI do if the user selects Sweden? However, one way to handle this would be to make a second document type. Have a field called type or some such, and make the new document type be 'dummy' or 'system' or

Re: Re:The search response time is too loong

2010-09-27 Thread kenf_nc
mem usage is over 400M, do you mean Tomcat mem size? If you don't give your cache sizes enough room to grow you will choke the performance. You should adjust your Tomcat settings to let the cache grow to at least 1GB or better would be 2GB. You may also want to look into

Re: Solr Reporting

2010-09-23 Thread kenf_nc
keep in mind that the <str name="id"> paradigm isn't completely useless, the str is a data type (string), it can be int, float, double, date, and others. So to not lose any information you may want to do something like: <id type="int">123</id> <title type="str">xyz</title> Which I agree makes more sense to

Re: How can I delete the entire contents of the index?

2010-09-23 Thread kenf_nc
Quick tangent... I went to the link you provided, and the delete part makes sense. But the next tip, how to re-index after a schema change. What is the point of step 5, "Send an <optimize/> command"? Why do you need to optimize an empty index? Or is my understanding of Optimize incorrect?

Re: Searches with a period (.) in the query

2010-09-23 Thread kenf_nc
Do you have any other Analyzers or Formatters involved? I use delimiters in certain string fields all the time. Usually a colon : or slash / but should be the same for a period. I've never seen this behavior. But if you have any kind of tokenizer or formatter involved beyond fieldType

Re: Searches with a period (.) in the query

2010-09-22 Thread kenf_nc
Could it be a case-sensitivity issue? The StrField type is not analyzed, but indexed/stored verbatim. (from the schema comments). If you are looking for ab.pqr but it is in fact ab.Pqr in the solr document, it wouldn't find it.

Re: getting a list of top page-ranked webpages

2010-09-17 Thread kenf_nc
A slightly different route to take, but one that should help test/refine a semantic parser is Wikipedia. They make available their entire corpus, or any subset you define. The whole thing is like 14 terabytes, but you can get smaller sets.

Re: Can i do relavence and sorting together?

2010-09-17 Thread kenf_nc
Those are at least 3 different questions. Easiest first, sorting: add &sort=ad_post_date+desc (or asc) for sorting on date, descending or ascending. Check out how Lucene scores by default (http://www.supermind.org/blog/378/lucene-scoring-for-dummies). It might be close to what you want. The

Re: DataImportHandler with multiline SQL

2010-09-17 Thread kenf_nc
Sounds like you want the CachedSqlEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor). It lets you make one query that is cached locally and can be joined to with a separate query.

Re: Get all results from a solr query

2010-09-17 Thread kenf_nc
Chris, I agree, having the ability to make rows something like -1 to bring back everything would be convenient. However, the 2 call approach (q=blah&rows=0 followed by q=blah&rows=numFound) isn't that slow, and does give you more information up front. You can optimize your Array or List sizes in
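
The two-call approach mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `search` is a hypothetical caller-supplied function that takes a URL and returns the parsed JSON response, and the response is assumed to have the usual `response.numFound` / `response.docs` layout of Solr's JSON writer.

```python
from urllib.parse import urlencode


def build_url(base_url, query, rows):
    """Build a select URL asking for `rows` results in JSON."""
    return f"{base_url}?{urlencode({'q': query, 'rows': rows, 'wt': 'json'})}"


def fetch_all(search, base_url, query):
    """Fetch every match using two calls.

    `search` is a hypothetical helper: it takes a URL and returns the
    parsed JSON response as a dict.
    """
    # Call 1: rows=0 returns no documents, only the total hit count.
    first = search(build_url(base_url, query, 0))
    num_found = first["response"]["numFound"]
    if num_found == 0:
        return []
    # Call 2: ask for exactly numFound rows.
    second = search(build_url(base_url, query, num_found))
    return second["response"]["docs"]
```

Knowing numFound before the second call is what allows result containers to be sized up front.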

Re: Index partitioned/ Full indexing by MSSQL or MySQL

2010-09-17 Thread kenf_nc
You don't give an indication of size. How large are the documents being indexed and how many of them are there? However, my opinion would be a single index with an 'active' flag. In your queries you can use FilterQueries (fq=) to optimize on just active if you wish, or just inactive if that is

Re: Does SolrNet support indexing of Database tables and XML files

2010-09-03 Thread kenf_nc
Alok, I noticed you also posted to the SolrNet forum, and that's a better place for this question. But basically, SolrNet is a wrapper around Solr functionality. It lets you build your Solr interactions (Queries, Stats, Facets, etc) and Inserts/Deletes using .Net objects. The reading of a data

Re: solr user

2010-09-02 Thread kenf_nc
You are querying for 'branch' and trying to place it in 'skill'. Also, you have Name and Column backwards, it should be: <field column="id" name="id"/> <field column="name" name="name"/> <field column="city" name="city_t"/> <field column="skill" name="skill_t"/>

Re: High - Low field value?

2010-09-01 Thread kenf_nc
That's exactly what I want. I was just searching the wiki using the wrong terms. Thanks!

Re: solr

2010-08-31 Thread kenf_nc
We would really need to see more information, but some first things to look for are: are your field definitions in the schema.xml set to indexed=true (if you want to search it) and stored=true (if you want to see it in the return results)? is the case of the field names the same in schema.xml

Re: Problem related to Sorting in Solr1.4

2010-08-27 Thread kenf_nc
The 'text' fieldType is not suitable for sorting. You need to use the copyField directive in your schema and at indexing time copy the data to your TITLE and UPDBY fields, and you need to create 2 new fields: <field name="TITLE_sort" type="string" indexed="true" stored="true" /> <field name="UPDBY_sort"

Re: Private data within SOLR Schema

2010-08-27 Thread kenf_nc
my feeling is that private fields in a public document will be the hardest nut to crack, unless you have an intermediary layer that users call instead of hitting your solr instance directly. If you front it with a web service you could handle various authorization scenarios a little easier.

Re: Schema Definition Question

2010-08-12 Thread kenf_nc
One way I've handled this, and it works only for some types of data, is to put the searchable part of the sub-doc in a search field (indexed=true) and put an xml or json representation of the sub-doc in a stored only field. Then if the main doc is hit via search I can grab the xml or json,

Re: Solr Doc Lucene Doc !?

2010-08-12 Thread kenf_nc
Are you just trying to learn the tiny details of how Solr and DIH work? Is this just an intellectual curiosity? Or are you having some specific problem that you are trying to solve? If you have a problem, could you describe the symptoms of the problem? I am using Solr, DIH, and several other

Re: Delta-import with solrj client

2010-08-11 Thread kenf_nc
Short answer is no, there isn't a way. Solr doesn't have the concept of 'Update' to an indexed document. You need to add the full document (all 'columns') each time any one field changes. If doing that in your DataImportHandler logic is difficult you may need to write a separate Update Service

Re: Data Import Handler Query

2010-08-11 Thread kenf_nc
It may not be the data config. Do you have the fields in the schema.xml that the image data is going to set to be multiValued=true? Although, I would think the last image would be stored, not the first, but haven't really tested this. -- View this message in context:

Re: Facet Fields - ID vs. Display Value

2010-08-10 Thread kenf_nc
If your concern is performance, faceting integers versus faceting strings, I believe Lucene makes the differences negligible. Given that choice I'd go with string. Now..if you need to keep an association between id and string, you may want to facet a combined field id:string or some other

Re: delete Problem..

2010-08-10 Thread kenf_nc
I'd try 2 things. First do a query q=EMAIL_HEADER_FROM:test.de and make sure some documents are found. If nothing is found, there is nothing to delete. Second, how are you testing to see if the document is deleted? The physical data isn't removed from the index until you Optimize I believe.

Re: DIH and multivariable fields problems

2010-08-10 Thread kenf_nc
Glad I could help. I also would think it was a very common issue. Personally my schema is almost all dynamic fields. I have unique_id, content, last_update_date and maybe one other field specifically defined, the rest are all dynamic. This lets me accept an almost endless variety of document

Re: SOLR QUERY

2010-08-06 Thread kenf_nc
In your schema.xml there is a field called <defaultSearchField>content</defaultSearchField>; it may be something other than 'content'. This field is the one searched if you don't specify one in the query. You can explicitly put something there with an add or you can have a copyField directive in

Re: Best solution to avoiding multiple query requests

2010-08-04 Thread kenf_nc
Not sure the processing would be any faster than just querying again, but, in your original result set the first doc that has a field value that matches a top 10 facet, will be the number 1 item if you fq on that facet value. So you don't need to query it again. You would only need to query those

Re: Customize order field list ???

2010-07-30 Thread kenf_nc
I believe they come back alphabetically sorted (not sure if this is language specific or not), so a quick way might be to change the name from createdate to zz_createdate or something like that. Generally with XML you should not be worried about order however. It's usually a sign of a design

Nabble problems?

2010-07-29 Thread kenf_nc
The Nabble.com page for Solr - User seems to be broken. I haven't seen an update on it since early this morning. However I'm still getting email notifications so people are seeing and responding to posts. I'm just curious, are you just using email and responding to solr-u...@lucene.apache.org? Or

Re: Indexing Problem: Where's my data?

2010-07-27 Thread kenf_nc
for STRING_VALUE, I assume there is a property in the 'select *' results called string_value? if so I'm not sure why it wouldn't work. If not, then that's why, it doesn't have anything to put there. For ATTRIBUTE_NAME, is it possibly a case issue? you called it 'Attribute_Name' in your query,

Re: Solr Doc Lucene Doc !?

2010-07-26 Thread kenf_nc
DataImportHandler (DIH) is an add-on to Solr. It lets you import documents from a number of sources in a flexible way. The only connection DIH has to Lucene is that Solr uses Lucene as the index engine. When you work with Solr you naturally talk about Solr Documents, if you were working with

Re: nested query and number of matched records

2010-07-21 Thread kenf_nc
parallel calls. simultaneously query for type:short rows=10 and type:extensive rows=1 and merge your results. This would also let you separate your short docs from your extensive docs into different solr instances if you wished...depending on your document architecture this could speed up one
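
The parallel-call idea above can be sketched with a thread pool. This is an illustration only: `search` is a hypothetical caller-supplied function that takes a dict of request parameters and returns a list of documents, and the merge step here simply appends the extensive document after the short ones.

```python
from concurrent.futures import ThreadPoolExecutor


def search_mixed(search, short_rows=10, extensive_rows=1):
    """Run both queries at the same time and merge the results.

    `search` is a hypothetical helper: it takes a dict of request
    parameters and returns a list of documents.
    """
    requests = [
        {"q": "type:short", "rows": short_rows},
        {"q": "type:extensive", "rows": extensive_rows},
    ]
    with ThreadPoolExecutor(max_workers=len(requests)) as pool:
        # map() yields results in request order, whichever call finishes first.
        short_docs, extensive_docs = pool.map(search, requests)
    return list(short_docs) + list(extensive_docs)
```

Because each request carries its own parameters, the two calls could just as easily be pointed at different Solr instances.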

Re: nested query and number of matched records

2010-07-21 Thread kenf_nc
That just gives a count of documents by type. The use-case, I believe, is to return from a search, 10 documents of type 'short' and 1 document of type 'extensive'.

Re: how to change the default path of Solr Tomcat

2010-07-21 Thread kenf_nc
Your environment may be different, but this is how I did it (Apache Tomcat on Windows 2008): go to \program files\apache...\Tomcat\conf\catalina\localhost, rename solr.xml to search.xml, and recycle the Tomcat service.

Re: Finding distinct unique IDs in documents returned by fq -- Urgent Help Req

2010-07-19 Thread kenf_nc
Oh, okay. Got it now. Unfortunately I don't believe Solr supplies a total count of matching facet values. One way to do this, although performance may suffer, is to set your limit to -1 and just get back everything, that will give you the count. You may want to set mincount to 1 so you aren't
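
The workaround described above (no facet limit, minimum count of 1, then count what comes back) might look like this in Python. It is a sketch that assumes the JSON response writer with its default flat facet layout, where each facet field is a list of alternating values and counts; the field name author_id in the usage example is hypothetical.

```python
def distinct_value_params(query, field):
    """Request parameters that return every facet value with at least one hit."""
    return {
        "q": query,
        "rows": 0,            # only the facet section is needed
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,    # no cap on the number of facet values returned
        "facet.mincount": 1,  # skip values with zero matching documents
    }


def count_distinct(response, field):
    """Count distinct facet values in a parsed JSON response.

    Assumes the flat layout [value, count, value, count, ...].
    """
    flat = response["facet_counts"]["facet_fields"][field]
    return len(flat) // 2


# Usage with a hand-written response; author_id is a hypothetical field name.
sample = {"facet_counts": {"facet_fields": {"author_id": ["a", 3, "b", 1]}}}
distinct = count_distinct(sample, "author_id")
```

As the post warns, returning every facet value can be expensive on fields with many distinct values.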

Re: indexing best practices

2010-07-18 Thread kenf_nc
No one has done performance analysis? Or has a link to anywhere where it's been done? Basically, the fastest way to get documents into Solr. So many options available, what's the fastest: 1) file import (xml, csv) vs DIH vs POSTing 2) number of concurrent clients 1 vs 10 vs 100 ...is there a

Re: Finding distinct unique IDs in documents returned by fq -- Urgent Help Req

2010-07-16 Thread kenf_nc
It may just be a mis-wording, but if you do distinct on 'unique' IDs, the count should be the same as response.numFound. But if you didn't mean 'unique', just count of some field in the results, Rebecca is correct, facets should do the job. Something like:

Re: Fwd: send to list

2010-07-16 Thread kenf_nc
If at all possible I like to do any processing work up front and not deal with extravagant queries. If your grid definitions don't change, or don't change often, just assign a cell number to each 100 square grid. Then in a pre-processing step assign the appropriate cell number to your document
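
The pre-processing step described above could be sketched like this. Everything specific here is an assumption for illustration: planar, non-negative x/y coordinates, a cell size of 100 units, and a fixed number of cells per row used to turn a (row, column) pair into a single number.

```python
import math


def cell_number(x, y, cell_size=100.0, cells_per_row=1000):
    """Map a point to the number of the grid cell that contains it.

    Assumes non-negative planar coordinates; cell_size and cells_per_row
    are hypothetical values, not taken from the thread.
    """
    col = math.floor(x / cell_size)
    row = math.floor(y / cell_size)
    # Number the cells row by row, left to right.
    return row * cells_per_row + col


# At index time, store the result in a field on the document; at query time,
# compute the cell for the search point and match on that field.
example_cell = cell_number(250, 120)
```

The cost of the arithmetic is paid once per document at index time, so the query itself stays a simple term match.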

indexing best practices

2010-07-16 Thread kenf_nc
I was curious if anyone has done work on finding what an optimal (or max) number of client processes are for indexing. That is, if I have the ability to spin up N number of processes that construct a POST to add/update a Solr document, is there a point at which the number of clients posting

Re: Tag generation

2010-07-16 Thread kenf_nc
Thanks for all the suggestions! I'm absorbing them as quickly as I can.

Re: Query help

2010-07-15 Thread kenf_nc
Your example though doesn't show different ContentType, it shows a different sort order. That would be difficult to achieve in one call. Sounds like your best bet is asynchronous (multi-threaded) calls if your architecture will allow for it.

Tag generation

2010-07-15 Thread kenf_nc
A colleague mentioned that he knew of services where you pass some content and it spits out some suggested Tags or Keywords that would be best suited to associate with that content. Does anyone know if there is a contrib to Solr or Lucene that does something like this? Or a third party tool that

Re: Strange the when search with dismax

2010-07-14 Thread kenf_nc
Sounds like you want the 'text' fieldType (or equivalent) and are using 'string' or 'lowercase'. Those must match all exactly (well, case insensitively in the case of 'lowercase'). The TextType field types (like 'text') do tokenizations so matches will occur under many more conditions.

Re: MultiValue dynamicField and copyField

2010-07-14 Thread kenf_nc
Yep, my schema does this all day long.

Re: Query: URl too long

2010-07-12 Thread kenf_nc
Frederico, You should also pose your question on the SolrNet forum (http://groups.google.com/group/solrnet?hl=en). Switching from GET to POST isn't a Solr issue, but a SolrNet issue.