Managed schema used with Cloudera MapreduceIndexerTool and morphlines?

2017-03-17 Thread Jay Hill
I've got a very difficult project to tackle. I've been tasked with using schemaless mode to index json files that we receive. The structure of the json files will always be very different as we're receiving files from different customers totally unrelated to one another. We are attempting to build

Re: Very long running replication.

2014-02-27 Thread Jay Hill
Bumping this. I'm seeing the error mentioned earlier in the thread - Unable to download segment filename completely. Downloaded 0!=size often in my logs. I'm dealing with a situation where maxDoc count is growing at a faster rate than numDocs and is now almost twice as large. I'm not optimizing

Loading custom update request handler on startup

2012-07-09 Thread Jay Hill
I'm writing a custom update request handler that will poll a hot directory for Solr xml files and index anything it finds there. The custom class implements Runnable, and when the run method is called the loop starts to do the polling. How can I tell Solr to load this class on startup to fire off

Re: Loading custom update request handler on startup

2012-07-09 Thread Jay Hill
=com.bestbuy.search.foundation.solr.DynamicIndexerEventListener / Then in the newSearcher() method I start up the thread for my polling UpdateRequestHandler. This seems to work, but if anyone has a better (or more tested) approach please let us know. -Jay On Mon, Jul 9, 2012 at 2:33 PM, Jay Hill jayallenh

Re: TermsComponent show only terms that matched query?

2012-02-27 Thread Jay Hill
On Fri, Feb 24, 2012 at 3:31 PM, Jay Hill jayallenh...@gmail.com wrote: I have a situation where I want to show the term counts as is done in the TermsComponent, but *only* for terms that are *matched* in a query, so I get something returned like this (pseudo code): q=title:(golf swing) doc

TermsComponent show only terms that matched query?

2012-02-24 Thread Jay Hill
I have a situation where I want to show the term counts as is done in the TermsComponent, but *only* for terms that are *matched* in a query, so I get something returned like this (pseudo code): q=title:(golf swing) doc title: golf legends show how to improve your golf swing on the golf course

Complex query, need filtering after query not before

2012-01-27 Thread Jay Hill
I have a project where we need to search 1B docs and still have results in under 700ms. The problem is, we are using geofiltering and that is happening *before* the queries, so we have to geofilter on the 1B docs to restrict our set of docs first, and then do the query on a name field. But it seems that

Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
I'm on a project where we have 1B docs sharded across 20 servers. We're not in production yet and we're doing load tests now. We're sending load to hit 100qps per server. As the load increases we're seeing query times sporadically increasing to 10 seconds, 20 seconds, etc. at times. What we're

Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
We're on the trunk: 4.0-2011-10-26_08-46-59 1189079 - hudson - 2011-10-26 08:51:47 Client timeouts are set to 4 seconds. Thanks, -Jay On Thu, Jan 26, 2012 at 1:40 PM, Mark Miller markrmil...@gmail.com wrote: On Jan 26, 2012, at 1:28 PM, Jay Hill wrote: I've tried setting the following

Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
if a response wasn't received w/in the timeAllowed, and if partialResults is true, then that shard would not be waited on for results. is that correct? thanks, -jay On Thu, Jan 26, 2012 at 2:23 PM, Jay Hill jayallenh...@gmail.com wrote: We're on the trunk: 4.0-2011-10-26_08-46-59 1189079 - hudson

/no_coord in dismax scoring explain

2012-01-06 Thread Jay Hill
What does /no_coord mean in the dismax scoring output? I've looked through the wiki mail archives, lucidfind, and can't find any reference. -- ¡jah!

Re: facet search and UnInverted multi-valued field?

2011-05-03 Thread Jay Hill
UnInvertedField is similar to Lucene's FieldCache, except, while the FieldCache cannot work with multivalued fields, UnInvertedField is designed for that very purpose. So since your f_dcperson field is multivalued, by default you use UnInvertedField. You're not doing anything wrong, that's default

Scaling Search with Big Data/Hadoop and Solr now available at Lucene Revolution

2011-04-25 Thread Jay Hill
I've worked with a lot of different Solr implementations, and one area that is emerging more and more is using Solr in combination with other big data solutions. My company, Lucid Imagination, has added a two-day course to our upcoming Lucene Revolution conference, Scaling Search with Big Data and

Re: Multiple Tags and Facets

2011-04-21 Thread Jay Hill
I don't think I understand what you're trying to do. Are you trying to preserve all facets after a user clicks on a facet, and thereby triggers a filter query, which excludes the other facets? If that's the case, you can use local parameters to tag the filter queries so they are not used for the

Re: Understanding the DisMax tie parameter

2011-04-15 Thread Jay Hill
Looks good, thanks Tom. -Jay On Fri, Apr 15, 2011 at 8:55 AM, Burton-West, Tom tburt...@umich.edu wrote: Thanks everyone. I updated the wiki. If you have a chance please take a look and check to make sure I got it right on the wiki.

Re: Understanding the DisMax tie parameter

2011-04-14 Thread Jay Hill
Dismax works by first selecting the highest scoring sub-query of all the sub-queries that were run. If I want to search on three fields, manu, name and features, I can configure dismax like this: requestHandler name=search_dismax class=solr.SearchHandler lst name=defaults str

Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-13 Thread Jay Hill
As Hoss mentioned earlier in the thread, you can use the statistics page from the admin console to view the current number of segments. But if you want to know by looking at the files, each segment will have a unique prefix, such as _u. There will be one unique prefix for every segment in the
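
The file-based check described above can be sketched in a few lines of Python. This is only an illustration, assuming that every per-segment file name begins with an underscore followed by a lowercase alphanumeric segment name (such as _u), and that files like segments_2 or segments.gen do not belong to any one segment. The function name segment_prefixes is made up for this example.

```python
import re

# Assumption: a per-segment file looks like "_u.fdt" or "_u_1.del", i.e. an
# underscore, a lowercase alphanumeric segment name, then "." or "_".
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)[._]")


def segment_prefixes(filenames):
    """Return the set of unique segment prefixes found in a directory listing."""
    prefixes = set()
    for name in filenames:
        match = SEGMENT_FILE.match(name)
        if match:
            prefixes.add(match.group(1))
    return prefixes


listing = ["_u.fdt", "_u.fdx", "_u_1.del", "_v.cfs", "segments_2", "segments.gen"]
print(len(segment_prefixes(listing)))  # number of segments in this listing
```

In practice the listing would come from something like os.listdir() on the index directory; the statistics page in the admin console remains the simpler way to get the count.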

Re: phrase, inidividual term, prefix, fuzzy and stemming search

2011-02-04 Thread Jay Hill
You mentioned that dismax does not support wildcards, but edismax does. Not sure if dismax would have solved your other problems, or whether you just had to shift gears because of the wildcard issue, but you might want to have a look at edismax. -Jay http://www.lucidimagination.com On Mon, Jan

Re: WordDelimiterFilterFactory

2011-02-04 Thread Jay Hill
You can always try something like this out in the analysis.jsp page, accessible from the Solr Admin home. Check out that page and see how it allows you to enter text to represent what was indexed, and text for a query. You can then see if there are matches. Very handy to see how the various

Re: Tuning Solr

2010-10-05 Thread Jay Hill
Removing those components is not likely to impact performance very much, if at all. I would focus on other areas when tuning performance, such as looking at memory usage and configuration, query design, etc. But there isn't any harm in removing them either. Why not do some load tests with the

Creating new Solr cores using relative paths

2010-08-17 Thread Jay Hill
I'm having trouble getting the core CREATE command to work with relative paths in the solr.xml configuration. I'm working with a layout like this: /opt/solr [this is solr.solr.home: $SOLR_HOME] /opt/solr/solr.xml /opt/solr/core0/ [this is the template core] /opt/solr/core0/conf/schema.xml [etc.]

Re: OutOfMemoryErrors

2010-08-17 Thread Jay Hill
A merge factor of 100 is very high and out of the norm. Try starting with a value of 10. I've never seen a running system with a value anywhere near this high. Also, what is your setting for ramBufferSizeMB? -Jay On Tue, Aug 17, 2010 at 10:46 AM, rajini maski rajinima...@gmail.com wrote: yeah

SolrJ: Setting multiple parameters

2010-06-20 Thread Jay Hill
Working with SolrJ I'm doing a query using the StatsComponent, and the stats.facet parameter. I'm not able to set multiple fields for the stats.facet parameter using SolrJ. Here is the query I'm trying to create:

Anyone using Solr spatial from trunk?

2010-06-07 Thread Jay Hill
I was wondering about the production readiness of the new-in-trunk spatial functionality. Is anyone using this in a production environment? -Jay

Re: Index-time vs. search-time boosting performance

2010-06-04 Thread Jay Hill
I've done a lot of recency boosting to documents, and I'm wondering why you would want to do that at index time. If you are continuously indexing new documents, what was recent when it was indexed becomes, over time less recent. Are you unsatisfied with your current performance with the boost

Auto-suggest internal terms

2010-06-02 Thread Jay Hill
I've got a situation where I'm looking to build an auto-suggest where any term entered will lead to suggestions. For example, if I type wine I want to see suggestions like this: french *wine* classes *wine* book discounts burgundy *wine* etc. I've tried some tricks with shingles, but the only

Re: field length normalization

2010-03-11 Thread Jay Hill
The fieldNorm is computed like this: fieldNorm = lengthNorm * documentBoost * documentFieldBoosts and the lengthNorm is: lengthNorm = 1/(numTermsInField)**.5 [note that the value is encoded as a single byte, so there is some precision loss] So the values are not pre-set for the lengthNorm, but
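
As a rough illustration of the two formulas quoted above, here is a small Python sketch. It only reproduces the arithmetic; it does not model the single-byte encoding mentioned in the message, so real index values will be less precise. The function and parameter names are chosen for this example and are not Lucene API names.

```python
import math


def length_norm(num_terms_in_field):
    """lengthNorm = 1 / sqrt(numTermsInField)."""
    return 1.0 / math.sqrt(num_terms_in_field)


def field_norm(num_terms_in_field, document_boost=1.0, field_boosts=()):
    """fieldNorm = lengthNorm * documentBoost * product of the field boosts."""
    norm = length_norm(num_terms_in_field) * document_boost
    for boost in field_boosts:
        norm *= boost
    return norm


# A 4-term field with no boosts, then a 16-term field with no boosts.
print(field_norm(4), field_norm(16))
```

Shorter fields get a larger norm, which is why a match in a short field scores higher than the same match in a long one when norms are enabled.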

Re: Question about fieldNorms

2010-03-07 Thread Jay Hill
Yes, if omitNorms=true, then no lengthNorm calculation will be done, and the fieldNorm value will be 1.0, and lengths of the field in question will not be a factor in the score. To see an example of this you can do a quick test. Add two text fields, and on one omitNorms: field name=foo

Re: Free Webinar: Mastering Solr 1.4 with Yonik Seeley

2010-02-26 Thread Jay Hill
Yes, it will be recorded and available to view after the presentation. -Jay On Thu, Feb 25, 2010 at 2:19 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Yonk, can you please advise whether this event will be recorded and available for later download? (It starts 5am our time

Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-22 Thread Jay Hill
Looks like multi-threaded support was added to the DIH recently: http://issues.apache.org/jira/browse/SOLR-1352 -Jay On Fri, Feb 19, 2010 at 6:27 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Glen may be referring to LuSql indexing with multiple threads? Does/can DIH do that, too?

Re: score computation for dismax handler

2010-02-22 Thread Jay Hill
Set the tie parameter to 1.0. This param is set between 0.0 (pure disjunction maximum) and 1.0 (pure disjunction sum): http://wiki.apache.org/solr/DisMaxRequestHandler#tie_.28Tie_breaker.29 -Jay On Thu, Feb 18, 2010 at 4:24 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: Hi ,
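
To make the two extremes concrete, here is a minimal Python sketch of how a disjunction-max score combines sub-query scores with a tie value: the best sub-score plus tie times the sum of the remaining sub-scores. The function name and the sample scores are invented for this example; it is not Solr code.

```python
def dismax_score(sub_scores, tie=0.0):
    """Best sub-query score plus tie times the sum of the other sub-scores."""
    best = max(sub_scores)
    return best + tie * (sum(sub_scores) - best)


scores = [0.5, 0.25, 0.25]  # hypothetical per-field scores for one document
print(dismax_score(scores, tie=0.0))  # pure disjunction maximum
print(dismax_score(scores, tie=1.0))  # pure disjunction sum
```

With tie=0.0 only the highest-scoring field counts; with tie=1.0 every matching field contributes fully, which is the behavior suggested in the reply above.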

Re: optimize is taking too much time

2010-02-22 Thread Jay Hill
With a mergeFactor set to anything > 1 you would never have only one segment - unless you optimized. So Lucene will never naturally merge all the segments into one. Unless, I suppose, the mergeFactor was set to 1, but I've never tested that. It's hard to picture how that would work. If I

Re: optimize is taking too much time

2010-02-22 Thread Jay Hill
Thanks for clearing that up guys, I misspoke slightly. It's just that, in a running system, it's probably very rare that there is only a single segment for any meaningful length of time. Unless that merge-down-to-one occurs right when indexing stops there will almost always be a new (small)

Solr Analysis Webinar Jan 28, 2010

2010-01-20 Thread Jay Hill
My colleague at Lucid Imagination, Tom Hill, will be presenting a free webinar focused on analysis in Lucene/Solr. If you're interested, please sign up and join us. Here is the official notice: We'd like to invite you to a free webinar our company is offering next Thursday, 28 January, at 2PM

Re: solr blocking on commit

2010-01-19 Thread Jay Hill
A couple of follow up questions: - What type of garbage collector is in use? - How often are you optimizing the index? - In solrconfig.xml what is the setting for mainIndexramBufferSizeMB? - Right before and after you see this pause, check the output of http://host:port/solr/admin/system,

Re: Solr 1.4 - stats page slow

2010-01-08 Thread Jay Hill
It's definitely still an issue. I've seen this with at least four different Solr implementations. It clearly seems to be a problem when there is a large field cache. It would be bad enough if the stats.jsp was just slow to load (usually takes 1 to 2 minutes), but when monitoring memory usage with

Re: Solr 1.4 - stats page slow

2010-01-08 Thread Jay Hill
Actually my cases were all with customers I work with, not just one case. A common practice is to monitor cache stats to tune the caches properly. Also, noting the warmup times for new IndexSearchers, etc. I've worked with people that have excessive auto-warm count values which is causing

Re: Indexing the latests MS Office documents

2010-01-05 Thread Jay Hill
The version of Tika in the 1.4 release definitely parses the most current Office formats (.docx, .pptx, etc.) and they index as expected. -Jay On Mon, Jan 4, 2010 at 6:02 PM, Peter Wolanin peter.wola...@acquia.com wrote: You must have been searching old documentation - I think tika 0.3+ has

Re: Solr 1.4 - stats page slow

2009-12-24 Thread Jay Hill
I've noticed this as well, usually when working with a large field cache. I haven't done in-depth analysis of this yet, but it seems like when the stats page is trying to pull data from a large field cache it takes quite a long time. Are you doing a lot of sorting? If so, what are the field types

Re: Solr 1.4 - stats page slow

2009-12-24 Thread Jay Hill
Also, what is your heap size and the amount of RAM on the machine? I've also noticed that, when watching memory usage through JConsole or YourKit while loading the stats page, the memory usage spikes dramatically - are you seeing this as well? -Jay On Thu, Dec 24, 2009 at 9:12 AM, Jay Hill

Sort fields all look Strings in field cache, no matter schema type

2009-12-19 Thread Jay Hill
I'm on a project where I'm trying to determine the size of the field cache. We're seeing lots of memory problems, and I suspect that the field cache is extremely large, but I'm trying to get exact counts on what's in the field cache. One thing that struck me as odd in the output of the stats.jsp

Re: Sort fields all look Strings in field cache, no matter schema type

2009-12-19 Thread Jay Hill
, Dec 19, 2009 at 11:37 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Dec 19, 2009 at 2:25 PM, Jay Hill jayallenh...@gmail.com wrote: One thing that struck me as odd in the output of the stats.jsp page is that the field cache always shows a String type for a field, even

Re: Sort fields all look Strings in field cache, no matter schema type

2009-12-19 Thread Jay Hill
Oh, forgot to add (just to keep the thread complete), the field is being used for a sort, so it was able to use TrieDoubleField. Thanks again, -Jay On Sat, Dec 19, 2009 at 12:21 PM, Jay Hill jayallenh...@gmail.com wrote: This field is of class type solr.SortableDoubleField. I'm actually

Re: nested queries

2009-11-19 Thread Jay Hill
I don't think your queries are actually nested queries. Nested queries key off of the magic field name _query_. You're right however that there is very little in the way of documentation of examples of nested queries. If you haven't seen this blog about them yet you might find this a helpful

Re: Wildcards at the Beginning of a Search.

2009-11-16 Thread Jay Hill
There is a text_rev field type in the example schema.xml file in the official release of 1.4. It uses the ReversedWildcardFilterFactory to reverse a field. You can do a copyField from the field you want to use for leading wildcard searches to a field using the text_rev field, and then do a regular

Replication admin page auto-reload

2009-11-16 Thread Jay Hill
The replication admin page on slaves used to have an auto-reload set to reload every few seconds. In the official 1.4 release this doesn't seem to be working, but it does in a nightly build from early June. Was this changed on purpose or is this a bug? I looked through CHANGES.txt to see if

Re: Sending file to Solr via HTTP POST

2009-11-05 Thread Jay Hill
Here is a brief example of how to use SolrJ with the ExtractingRequestHandler: ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract"); req.addFile(fileToIndex); req.setParam("literal.id", getId(fileToIndex));

Re: specify multiple files in lst for DataImportHandler

2009-11-05 Thread Jay Hill
You can set up multiple request handlers each with their own configuration file. For example, in addition to the config you listed you could add something like this: requestHandler name=/dataimport-two class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str

Re: CPU utilization and query time high on Solr slave when snapshot install

2009-11-02 Thread Jay Hill
So assuming you set up a few sample sort queries to run in the firstSearcher config, and had very low query volume during that ten minutes so that there were no evictions before a new Searcher was loaded, would those queries run by the firstSearcher be passed along to the cache for the next

Re: solr web ui

2009-10-30 Thread Jay Hill
Have a look at the VelocityResponseWriter ( http://wiki.apache.org/solr/VelocityResponseWriter). It's in the contrib area, but the wiki has instructions on how to move it into your core Solr. Solr uses response writers to return results. The default is XML but responses can be returned in JSON,

Re: Facets - ORing attribute values

2009-10-29 Thread Jay Hill
1.4 has a good chance of being released next week. There was a hope that it might make it this week, but another bug in Lucene 2.9.1 was found, pushing things back just a little bit longer. -Jay http://www.lucidimagination.com On Thu, Oct 29, 2009 at 11:43 AM, beaviebugeater

Re: DIH: Setting rows= on full-import has no effect

2009-10-09 Thread Jay Hill
://issues.apache.org/jira/browse/SOLR-1501 On Fri, Oct 9, 2009 at 6:10 AM, Jay Hill jayallenh...@gmail.com wrote: In the past setting rows=n with the full-import command has stopped the DIH importing at the number I passed in, but now this doesn't seem to be working. Here is the command I'm using

Re: concatenating tokens

2009-10-09 Thread Jay Hill
Use copyField to copy to a field with a field type like this: fieldType name=special class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter

Re: Dynamic Data Import from multiple identical tables

2009-10-09 Thread Jay Hill
You could use separate DIH config files for each of your three tables. This might be overkill, but it would keep them separate. The DIH is not limited to one request handler setup, so you could create a unique handler for each case with a unique name: requestHandler name=/indexer/table1

Re: java -Dsolr.solr.home=core -jar start.jar not working for me

2009-10-09 Thread Jay Hill
Shouldn't that be: java -Dsolr.solr.home=multicore -jar start.jar and then hit url: http://localhost:8983/solr/core0/admin/ or http://localhost:8983/solr/core1/admin/ -Jay http://www.lucidimagination.com On Fri, Oct 9, 2009 at 1:17 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I

Re: java -Dsolr.solr.home=core -jar start.jar not working for me

2009-10-09 Thread Jay Hill
: Started SocketConnector @ 0.0.0.0:8983 And http://localhost:8983/solr/admin yields a 404 error. On Fri, Oct 9, 2009 at 1:27 PM, Jay Hill jayallenh...@gmail.com wrote: Shouldn't that be: java -Dsolr.solr.home=multicore -jar start.jar and then hit url: http://localhost:8983/solr/core0/admin

DIH: Setting rows= on full-import has no effect

2009-10-08 Thread Jay Hill
In the past setting rows=n with the full-import command has stopped the DIH importing at the number I passed in, but now this doesn't seem to be working. Here is the command I'm using: curl 'http://localhost:8983/solr/indexer/mediawiki?command=full-import&rows=100' But when 100 docs are imported

Re: TermsComponent or auto-suggest with filter

2009-10-07 Thread Jay Hill
approaches are to use either the TermsComponent (new in Solr 1.4) or faceting. On Wed, Oct 7, 2009 at 1:51 AM, Jay Hill jayallenh...@gmail.com wrote: Have a look at a blog I posted on how to use EdgeNGrams to build an auto-suggest tool: http://www.lucidimagination.com/blog/2009/09/08/auto

Re: ISOLatin1AccentFilter before or after Snowball?

2009-10-07 Thread Jay Hill
Correct me if I'm wrong, but wasn't the ISOLatin1AccentFilterFactory deprecated in favor of: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> in 1.4? -Jay http://www.lucidimagination.com On Wed, Oct 7, 2009 at 1:44 AM, Shalin Shekhar Mangar

Re: TermsComponent or auto-suggest with filter

2009-10-06 Thread Jay Hill
Have a look at a blog I posted on how to use EdgeNGrams to build an auto-suggest tool: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ You could easily add filter queries to this approach. For example, the query used in the blog could add

Batching requests using SolrCell with SolrJ

2009-09-19 Thread Jay Hill
When working with SolrJ I have typically batched a Collection of SolrInputDocument objects before sending them to the Solr server. I'm working with the latest nightly build and using the ExtractingRequestHandler to index documents, and everything is working fine. Except I haven't been able to

Any way to encrypt/decrypt stored fields?

2009-09-16 Thread Jay Hill
For security reasons (say I'm indexing very sensitive data, medical records for example) is there a way to encrypt data that is stored in Solr? Some businesses I've encountered have such needs and this is a barrier to them adopting Solr to replace other legacy systems. Would it require a

Re: Is it possible to query for everything ?

2009-09-14 Thread Jay Hill
Use: ?q=*:* -Jay http://www.lucidimagination.com On Mon, Sep 14, 2009 at 4:18 PM, Jonathan Vanasco jvana...@2xlp.com wrote: I'm using Solr for search and faceted browsing Is it possible to have solr search for 'everything' , at least as far as q is concerned ? The request handlers I've

Re: Is it possible to query for everything ?

2009-09-14 Thread Jay Hill
With dismax you can use q.alt when the q param is missing: q.alt=*:* should work. -Jay On Mon, Sep 14, 2009 at 5:38 PM, Jonathan Vanasco jvana...@2xlp.com wrote: Thanks Jay Matt I tried *:* on my app, and it didn't work I tried it on the solr admin, and it did I checked the solr config

Re: KStem download

2009-09-14 Thread Jay Hill
The two jar files are all you should need, and the configuration is correct. However I noticed that you are on Solr 1.3. I haven't tested the Lucid KStemmer on a non-Lucid-certified distribution of 1.3. I have tested it on recent versions of 1.4 and it works fine (just tested with the most recent

Re: Highlighting in SolrJ?

2009-09-12 Thread Jay Hill
Will do Shalin. -Jay http://www.lucidimagination.com On Fri, Sep 11, 2009 at 9:23 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Jay, it would be great if you can add this example to the Solrj wiki: http://wiki.apache.org/solr/Solrj On Fri, Sep 11, 2009 at 5:15 AM, Jay Hill

Re: Highlighting in SolrJ?

2009-09-11 Thread Jay Hill
field as a snippet. On Thu, Sep 10, 2009 at 7:45 PM, Jay Hill jayallenh...@gmail.com wrote: Set up the query like this to highlight a field named content: SolrQuery query = new SolrQuery(); query.setQuery("foo"); query.setHighlight(true).setHighlightSnippets(1); //set other params

Re: Highlighting in SolrJ?

2009-09-11 Thread Jay Hill
high lighted, even if the search term only occurs in the first line of a 300 page field. I'm not sure if mergeContinuous will do that, or if it will miss everything after the last line that contains the search term. On Fri, Sep 11, 2009 at 10:42 AM, Jay Hill jayallenh...@gmail.com wrote: It's

Re: standard requestHandler components

2009-09-11 Thread Jay Hill
RequestHandlers are configured in solrconfig.xml. If no components are explicitly declared in the request handler config then the defaults are used. They are: - QueryComponent - FacetComponent - MoreLikeThisComponent - HighlightComponent - StatsComponent - DebugComponent If you wanted to have a

Re: Pagination with solr json data

2009-09-10 Thread Jay Hill
All you have to do is use the start and rows parameters to get the results you want. For example, the query for the first page of results might look like this, ?q=solr&start=0&rows=10 (other params omitted). So you'll start at the beginning (0) and get 10 results. The next page would be
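
The start/rows arithmetic can be sketched as a small Python helper. This is an illustration only: the helper name page_query and the 1-based page numbering are assumptions made for this example, and only the q, start and rows parameters from the message are used.

```python
from urllib.parse import urlencode


def page_query(q, page, rows=10):
    """Build the query string for a 1-based page number."""
    start = (page - 1) * rows  # page 1 starts at offset 0
    return "?" + urlencode({"q": q, "start": start, "rows": rows})


print(page_query("solr", 1))  # first page
print(page_query("solr", 2))  # second page starts at offset 10
```

The same helper works for any page size; only the rows argument changes.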

Re: TermsComponent

2009-09-10 Thread Jay Hill
If you need an alternative to using the TermsComponent for auto-suggest, have a look at this blog on using EdgeNGrams instead of the TermsComponent. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ -Jay http://www.lucidimagination.com On Wed,

Re: Highlighting in SolrJ?

2009-09-10 Thread Jay Hill
Set up the query like this to highlight a field named content: SolrQuery query = new SolrQuery(); query.setQuery("foo"); query.setHighlight(true).setHighlightSnippets(1); //set other params as needed query.setParam("hl.fl", "content"); QueryResponse queryResponse

Re: Sort a Multivalue field

2009-09-09 Thread Jay Hill
Unfortunately you can't sort on a multi-valued field. In order to sort on a field it must be indexed but not multi-valued. Have a look at the FieldOptions wiki page for a good description of what values to set for different use cases: http://wiki.apache.org/solr/FieldOptionsByUseCase -Jay

Re: Field names with whitespaces

2009-08-31 Thread Jay Hill
This seems to work: ?q=field\ name:something Probably not a good idea to have field names with whitespace though. -Jay 2009/8/28 Marcin Kuptel marcinkup...@gmail.com Hi, Is there a way to query solr about fields which names contain whitespaces? Indexing such data does not cause any

Re: MoreLikeThis: How to get quality terms from html from content stream?

2009-08-09 Thread Jay Hill
wrote: On Aug 7, 2009, at 5:23pm, Jay Hill wrote: I'm using the MoreLikeThisHandler with a content stream to get documents from my index that match content from an html page like this: http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi ?f=/c/a/2009/08/06

MoreLikeThis: How to get quality terms from html from content stream?

2009-08-07 Thread Jay Hill
I'm using the MoreLikeThisHandler with a content stream to get documents from my index that match content from an html page like this: http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTL&mlt.fl=body&rows=4&debugQuery=true But, not

Re: DIH: Any way to make update on db table?

2009-08-04 Thread Jay Hill
affected and not a resultSet. DIH expects one and hence the exception. Cheers Avlesh On Tue, Aug 4, 2009 at 1:49 AM, Jay Hill jayallenh...@gmail.com wrote: Is it possible for the DataImportHandler to update records in the table it is querying? For example, say I have

DIH: Any way to make update on db table?

2009-08-03 Thread Jay Hill
Is it possible for the DataImportHandler to update records in the table it is querying? For example, say I have a query like this in my entity: query=select field1, field2, from someTable where hasBeenIndexed=false Is there a way I can mark each record processed by updating the hasBeenIndexed

Re: How can i get lucene index format version information?

2009-07-30 Thread Jay Hill
Check the system request handler: http://localhost:8983/solr/admin/system Should look something like this: <lst name="lucene"> <str name="solr-spec-version">1.3.0.2009.07.28.10.39.42</str> <str name="solr-impl-version">1.4-dev 797693M - jayhill - 2009-07-28 10:39:42</str> <str name="lucene-spec-version">2.9-dev</str>

FieldCollapsing: Two response elements returned?

2009-07-27 Thread Jay Hill
I'm doing some testing with field collapsing, and early results look good. One thing seems odd to me however. I would expect to get back one block of results, but I get two - the first one contains the collapsed results, the second one contains the full non-collapsed results: result name=response

DIH: On import (full or delta) commit=false seems to not take effect

2009-07-15 Thread Jay Hill
I am trying to run full and delta imports with the commit=false option, but it doesn't seem to take effect - after the import a commit always happens no matter what params I send. I've looked at the source and unless I'm missing something it doesn't seem to process the commit param. Here's the

Re: spellcheck with misspelled words in index

2009-07-15 Thread Jay Hill
We had the same thing to deal with recently, and a great solution was posted to the list. Create a stopwords filter on the field you're using for your spell checking, and then populate a custom stopwords file with known misspelled words: fieldType name=textSpell class=solr.TextField

Re: DIH: On import (full or delta) commit=false seems to not take effect

2009-07-15 Thread Jay Hill
My bad, I had a configuration setting overriding this value. Sorry for the mistake. -Jay On Wed, Jul 15, 2009 at 12:07 PM, Jay Hill jayallenh...@gmail.com wrote: I am trying to run full and delta imports with the commit=false option, but it doesn't seem to take effect - after the import

Re: DIH: On import (full or delta) commit=false seems to not take effect

2009-07-15 Thread Jay Hill
Actually, my good after all. The parameter does not take effect. If commit=false is passed in a commit still happens. Will open a JIRA and supply a patch shortly. -Jay On Wed, Jul 15, 2009 at 5:50 PM, Jay Hill jayallenh...@gmail.com wrote: My bad, I had a configuration setting overriding

Spell checking: Is there a way to exclude words known to be wrong?

2009-07-13 Thread Jay Hill
We're building a spell index from a field in our main index with the following configuration: searchComponent name=spellcheck class=solr.SpellCheckComponent str name=queryAnalyzerFieldTypetextSpell/str lst name=spellchecker str name=namedefault/str str name=fieldspell/str

Re: Creating DataSource for DIH to Oracle Database

2009-07-09 Thread Jay Hill
Francis, your question is a little vague. Are you looking for the configuration for connecting the DIH to a JNDI datasource set up in Weblogic? <dataSource name="dsDb" jndiName="java:comp/env/jdbc/myWeblogicDatasource" type="JdbcDataSource" user=""/> -Jay On Mon, Jul 6,

Re: about defaultSearchField

2009-07-08 Thread Jay Hill
Just to be sure: You mentioned that you adjusted schema.xml - did you re-index after making your changes? -Jay On Wed, Jul 8, 2009 at 7:07 AM, Yang Lin beckl...@gmail.com wrote: Thanks for your reply. But it works not. Yang 2009/7/8 Yao Ge yao...@gmail.com Try with fl=* or fl=*,score

Re: Indexing rich documents from websites using ExtractingRequestHandler

2009-07-08 Thread Jay Hill
I haven't tried this myself, but it sounds like what you're looking for is enabling remote streaming: http://wiki.apache.org/solr/ContentStream#head-7179a128a2fdd5dde6b1af553ed41735402aadbf As the link above shows you should be able to enable remote streaming like this: requestParsers

Re: Indexing XML

2009-07-07 Thread Jay Hill
Mathieu, have a look at Solr's DataImportHandler. It provides a configuration-based approach to index different types of datasources including relational databases and XML files. In particular have a look at the XpathEntityProcessor (

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
Thanks Noble, I gave those examples a try. If I use <field column="body" xpath="/book/body/chapter/p" /> I only get the text from the last p element, not from all elements. If I use <field column="body" xpath="/book/body/chapter" flatten="true"/> or <field column="body" xpath="/book/body/chapter/" flatten="true"/> I

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
It is not multivalued. The intention is to get all text under the body element into one body field in the index that is not multivalued. Essentially everything within the body element minus the markup. Thanks, -Jay On Thu, Jul 2, 2009 at 8:55 AM, Fergus McMenemie fer...@twig.me.uk wrote:

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
I'm on the trunk, built on July 2: 1.4-dev 789506 Thanks, -Jay On Thu, Jul 2, 2009 at 11:33 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Thu, Jul 2, 2009 at 11:38 PM, Mark Miller markrmil...@gmail.com wrote: Shalin Shekhar Mangar wrote: It selects all matching nodes.

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
Thanks Fergus, setting the field to multivalued did work: <field column="body" xpath="/book/body/chapter/p" flatten="true"/> gets all the p elements as multivalue fields in the body field. The only thing is, the body field is used by some other content sources, so I have to look at the implications

DIH: Distributing docs to more than one Solr instance

2009-07-01 Thread Jay Hill
I'm using the DIH to index records from a relational database. No problems, everything works great. But now, due to the size of index (70GB w/ 25M+ docs) I need to shard and want the DIH to distribute documents evenly between two shards. Current approach is to modify the sql query in the config

DIH: Limited xpath syntax unable to parse all xml elements

2009-07-01 Thread Jay Hill
I'm using the XPathEntityProcessor to parse an xml structure that looks like this: <book> <author>Joe Smith</author> <title>World Atlas</title> <body> <chapter> <p>Content I want is here</p> <p>More content I want is here.</p> <p>Still more content here.</p>

PlainTextEntityProcessor not putting any text into a field in index

2009-06-18 Thread Jay Hill
I'm having some trouble getting the PlainTextEntityProcessor to populate a field in an index. I'm using the TemplateTransformer to fill 2 fields, and have a timestamp field in schema.xml, and these fields make it into the index. Only the plainText data is missing. Here is my configuration:

Re: query issue /special character and case

2009-06-08 Thread Jay Hill
Regarding being able to search SCHOLKOPF (o with no umlaut) and match SCHÖLKOPF (with umlaut) try using the ISOLatin1AccentFilterFactory in your analysis chain: <filter class="solr.ISOLatin1AccentFilterFactory" /> This filter removes accented chars and replaces them with non-accented

Re: Query faceting

2009-06-08 Thread Jay Hill
In order to get the values you want for the service field you will need to change the fieldType definition in schema.xml for service to use something that doesn't alter your original values. Try the string fieldType to start and look at the fieldType definition for string. I'm guessing you

Re: Highlighting and Field options

2009-06-01 Thread Jay Hill
Use the fl param to ask for only the fields you need, but also keep hl=true. Something like this: http://localhost:8080/solr/select/?q=bear&version=2.2&start=0&rows=10&indent=on&hl=true&fl=id Note that fl=id means the only field returned in the XML will be the id field. Highlights are still returned

Re: Question about field types and querying

2009-05-28 Thread Jay Hill
Try using the admin analysis tool (http://host:port/solr/admin/analysis.jsp) to see what the analysis chain is doing to your query. Enter the field name (question in your case) and the Field value (Index) customize (since that's what's in the document). For Field value (Query) enter customer.
