Re: Solr http post performance seems slow - help?

2009-09-24 Thread Lance Norskog
-- Dan A. Dickey | Senior Software Engineer Savvis 10900 Hampshire Ave. S., Bloomington, MN  55438 Office: 952.852.4803 | Fax: 952.852.4951 E-mail: dan.dic...@savvis.net -- Lance Norskog goks...@gmail.com

Re: solr caching problem

2009-09-24 Thread Lance Norskog
-- Lance Norskog goks...@gmail.com

Re: Sorting/paging problem

2009-09-24 Thread Lance Norskog
anyone else run into a problem like this? I'm using the Sept 22 nightly build. - Charlie -- Lance Norskog goks...@gmail.com

Re: Regular expression not working

2009-09-28 Thread Lance Norskog
result is Gilmore Girls If I search on Gilmore, it gives me result Gilmore Girls in the output as desired. However, if I search on string gilmore* or gilm , it does not work whereas we want it to work. Any help highly appreciated. Thanks! -- Lance Norskog goks...@gmail.com

Re: Writing optimized index to different storage?

2009-09-28 Thread Lance Norskog
-- Lance Norskog goks...@gmail.com

Re: Regular expression not working

2009-09-28 Thread Lance Norskog
attributes case insensitive while building an index... I am trying to research it... Do you have any pointers? Thanks... On Mon, Sep 28, 2009 at 2:29 PM, Lance Norskog goks...@gmail.com wrote: Wildcards don't really get processed like other queries - Gilmore* will work. On Mon, Sep 28

Re: Why isn't the DateField implementation of ISO 8601 broader?

2009-10-01 Thread Lance Norskog
My question is why isn't the DateField implementation of ISO 8601 broader so that it could include and MM as acceptable date strings? What would it take to do so? Nobody ever cared? But yes, you're right, the spurious precision is annoying. However, there is no fuzzy search for

Re: trie fields and sortMissingLast

2009-10-01 Thread Lance Norskog
show that they don't).  If not, is there any plan for adding it in? Regards, Steve -- Lance Norskog goks...@gmail.com

Re: ExtractingRequestHandler unknown field 'stream_source_info'

2009-10-01 Thread Lance Norskog
-search-server-2 gave me more information.  The LucidImagination article helps too. Now that the wiki is up again it is more obvious that I need to add: <str name="fmap.content">fulltext</str> <str name="defaultField">text</str> to my solrconfig.xml Tricia -- Lance Norskog goks...@gmail.com

Re: index size before and after commit

2009-10-01 Thread Lance Norskog
where optimization can take more than 2x?  I've heard of cases but have not observed them in my system. I seem to recall a case where it can be 3x, but I don't know that it has been observed much. -- - Mark http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com

Re: index size before and after commit

2009-10-01 Thread Lance Norskog
segments, and I have no idea how this will translate to disk space. To minimize disk space, you could run it repetitively with the number of segments decreasing to one. On Thu, Oct 1, 2009 at 11:49 AM, Lance Norskog goks...@gmail.com wrote: I've heard there is a new partial optimize feature
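The "run it repetitively with the number of segments decreasing to one" idea can be sketched as a list of optimize requests. This is only an illustration: the step size and helper names are assumptions, and the sketch builds parameters for Solr's update handler (`optimize=true` with `maxSegments`) without sending anything.

```python
def optimize_steps(current_segments, step=4):
    """Return decreasing maxSegments targets ending at 1 (step is an assumed tuning knob)."""
    targets = []
    n = current_segments
    while n > 1:
        n = max(1, n - step)
        targets.append(n)
    return targets


def optimize_params(max_segments):
    """Parameters for one partial optimize request against the update handler."""
    return {"optimize": "true", "maxSegments": str(max_segments)}


# Plan the sequence of partial optimizes for an index with 10 segments.
plan = [optimize_params(n) for n in optimize_steps(10)]
```

Each request in `plan` would be issued only after the previous one has finished, so disk usage can be checked between steps.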

Google Side-By-Side UI

2009-10-02 Thread Lance Norskog
http://googleenterprise.blogspot.com/2009/08/compare-enterprise-search-relevance.html This is really cool, and a version for Solr would help in doing relevance experiments. We don't need the select A or B feature, just seeing search result sets side-by-side would be great. -- Lance Norskog goks

Re: conditional sorting

2009-10-02 Thread Lance Norskog
by popularity. Does anyone know if there is a way to do that with a single query, or I'll have to send another query with desired sort criterion after I inspect number of hits on my client? Thx -- Lance Norskog goks...@gmail.com

Re: Specifying all except field in field list?

2009-10-02 Thread Lance Norskog
field is not needed and is likely to be really large so the queries will be much faster if it isn't returned. Thanks, Paul -- Lance Norskog goks...@gmail.com

Re: Specifying all except field in field list?

2009-10-02 Thread Lance Norskog
to get that info. Lance Norskog wrote: No, there is only list of fields, star, and score.  You can choose to index it and not store it, and then have your application fetch it from the original data store. This is a common system design pattern to avoid storing giant text blobs in the index

Re: Garbled data in response - reading from mySQL database

2009-10-03 Thread Lance Norskog
-data-in-response---reading-from-mySQL-database-tp25726655p25726976.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com

DataImportHandler problem: Feeding the XPathEntityProcessor with the FieldReaderDataSource

2009-10-05 Thread Lance Norskog
I've added a unit test for the problem down below. It feeds document field data into the XPathEntityProcessor via the FieldReaderDataSource, and the XPath EP does not emit unpacked fields. Running this under the debugger, I can see the supplied StringReader, with the XML string, being piped into

Re: Solr Timeouts

2009-10-06 Thread Lance Norskog
[]) org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run() java.lang.Thread.run() -- Lance Norskog goks...@gmail.com

Re: DataImportHandler problem: Feeding the XPathEntityProcessor with the FieldReaderDataSource

2009-10-06 Thread Lance Norskog
A side note that might help: if I change the dataField from 'db.blob' to 'blob', this DIH stack emits no documents. On 10/5/09, Lance Norskog goks...@gmail.com wrote: I've added a unit test for the problem down below. It feeds document field data into the XPathEntityProcessor via

Re: How much disk space does optimize really take

2009-10-07 Thread Lance Norskog
the same problem and it never took up more than 2x. If your index disks are really bursting at the seams, you could try creating an empty index on a separate disk and merging your large index into that index. The resulting index will be mostly optimized. Lance Norskog * in solrconfig.xml

Re: manage rights

2009-10-07 Thread Lance Norskog
this configuration has to be done? Thanks -- View this message in context: http://www.nabble.com/manage-rights-tp25784152p25784152.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com

Re: solr reporting tool adapter

2009-10-07 Thread Lance Norskog
of a reporting tool which can hook into Solr for creating such things. -- Regards, Shalin Shekhar Mangar. -- Lance Norskog goks...@gmail.com

Re: Help with denormalizing issues

2009-10-07 Thread Lance Norskog
it would be very helpful. Thanks, Eric -- Lance Norskog goks...@gmail.com

Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Lance Norskog
Wow! That's great. And it's a lot of work, especially getting it all keyboard-complete. Thank you. On 03/14/2013 01:29 AM, Chantal Ackermann wrote: Hi all, this is not a question. I just wanted to announce that I've written a blog post on how to set up Maven for packaging and automatic

Re: Flow Chart of Solr

2013-04-07 Thread Lance Norskog
Seconded. Single-stepping really is the best way to follow the logic chains and see how the data mutates. On 04/05/2013 06:36 AM, Erick Erickson wrote: Then there's my lazy method. Fire up the IDE and find a test case that looks close to something you want to understand further. Step through

Re: Spatial search question

2013-04-12 Thread Lance Norskog
Outer distance AND NOT inner distance? On 04/12/2013 09:02 AM, kfdroid wrote: We currently do a radius search from a given Lat/Long point and it works great. I have a new requirement to do a search on a larger radius from the same point, but not include the smaller radius. Kind of a donut
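One way to express "outer distance AND NOT inner distance" is a single range filter on the computed distance. A minimal sketch, assuming a spatial field named `location` (a hypothetical name) and Solr's `frange` parser over the `geodist()` function; it only builds the request parameters.

```python
from urllib.parse import urlencode


def donut_params(lat, lon, inner_km, outer_km, sfield="location"):
    """Build Solr params for docs farther than inner_km but within outer_km.

    sfield is a hypothetical field name; adjust it to the schema in use.
    """
    return {
        "q": "*:*",
        "sfield": sfield,
        "pt": f"{lat},{lon}",
        # frange keeps docs whose geodist() lies between l and u;
        # incl=false leaves out points exactly on the inner radius.
        "fq": f"{{!frange l={inner_km} u={outer_km} incl=false}}geodist()",
    }


params = donut_params(45.15, -93.85, 5, 20)
query_string = urlencode(params)
```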

Re: SolrCloud vs Solr master-slave replication

2013-04-18 Thread Lance Norskog
Run checksums on all files in both master and slave, and verify that they are the same. TCP/IP has a checksum algorithm that was state-of-the-art in 1969. On 04/18/2013 02:10 AM, Victor Ruiz wrote: Also, I forgot to say... the same error started to happen again.. the index is again corrupted
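The checksum comparison suggested above could look roughly like this. It is a sketch under the assumption that both index directories are reachable from one machine (or that a manifest is produced on each host and copied over); the function names are illustrative.

```python
import hashlib
from pathlib import Path


def checksum_manifest(index_dir):
    """Map each file name directly under index_dir to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(Path(index_dir).iterdir()):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large segment files do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.name] = digest.hexdigest()
    return manifest


def differing_files(master, slave):
    """Return names that are missing on one side or have different digests."""
    names = set(master) | set(slave)
    return sorted(n for n in names if master.get(n) != slave.get(n))
```

An empty result from `differing_files` means the two copies are byte-identical at the file level.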

Re: Why is SolrCloud doing a full copy of the index?

2013-05-04 Thread Lance Norskog
Great! Thank you very much Shawn. On 05/04/2013 10:55 AM, Shawn Heisey wrote: On 5/4/2013 11:45 AM, Shawn Heisey wrote: Advance warning: this is a long reply. I have condensed some relevant performance problem information into the following wiki page:

Re: SOLR guidance required

2013-05-13 Thread Lance Norskog
If this is for the US, remove the age range feature before you get sued. On 05/09/2013 08:41 PM, Kamal Palei wrote: Dear SOLR experts I might be asking a very silly question. As I am new to SOLR kindly guide me. I have a job site. Using SOLR to search resumes. When a HR user enters some

Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-17 Thread Lance Norskog
This is great; data like this is rare. Can you tell us any hardware or throughput numbers? On 05/17/2013 12:29 PM, Rishi Easwaran wrote: Hi All, It's Friday 3:00pm, warm sunny outside and it was a good week. Figured I'd share some good news. I work for AOL mail team and we use SOLR for our

Re: Regular expression in solr

2013-05-22 Thread Lance Norskog
If the indexed data includes positions, it should be possible to implement ^ and $ as the first and last positions. On 05/22/2013 04:08 AM, Oussama Jilal wrote: There is no ^ or $ in the solr regex since the regular expression will match tokens (not the complete indexed text). So the results

Re: OPENNLP problems

2013-05-30 Thread Lance Norskog
I will look at these problems. Thanks for trying it out! Lance Norskog On 05/28/2013 10:08 PM, Patrick Mi wrote: Hi there, Checked out branch_4x and applied the latest patch LUCENE-2899-current.patch however I ran into 2 problems Followed the wiki page instruction and set up a field

Re: Dynamic Indexing using DB and DIH

2013-06-02 Thread Lance Norskog
Let's assume that the Solr record includes the database record's timestamp field. You can make a more complex DIH stack that does a Solr query with the SolrEntityProcessor. You can do a query that gets the most recent timestamp in the index, and then use that in the DB update command. On

Re: Shard Keys and Distributed Search

2013-06-02 Thread Lance Norskog
Distributed search does the actual search twice: once to get the scores and again to fetch the documents with the top N scores. This algorithm does not play well with deep searches. On 06/02/2013 07:32 PM, Niran Fajemisin wrote: Thanks Daniel. That's exactly what I thought as well. I did try
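Why deep searches hurt can be seen in a toy model of the first phase. This is a simplified sketch, not Solr's code: each shard is a list of `(score, doc_id)` pairs, and the coordinator must collect `start + rows` candidates from every shard before it can pick one page.

```python
import heapq


def shard_top(scored_docs, n):
    """Phase 1 on one shard: return the n best (score, doc_id) pairs."""
    return heapq.nlargest(n, scored_docs)


def distributed_page(shards, start, rows):
    """Merge per-shard candidates; return the page's doc ids and the candidate count."""
    needed = start + rows  # every shard must supply this many candidates
    candidates = []
    for docs in shards:
        candidates.extend(shard_top(docs, needed))
    merged = heapq.nlargest(needed, candidates)
    page = [doc_id for _, doc_id in merged[start:]]
    transferred = sum(min(needed, len(docs)) for docs in shards)
    return page, transferred
```

The second phase would then fetch stored fields only for the ids in `page`. The `transferred` count grows with `start`, which is the cost of paging deep.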

Re: OPENNLP problems

2013-06-05 Thread Lance Norskog
Patrick- I found the problem with multiple documents. The problem was that the API for the life cycle of a Tokenizer changed, and I only noticed part of the change. You can now upload multiple documents in one post, and the OpenNLPTokenizer will process each document. You're right, the

Re: OPENNLP problems

2013-06-09 Thread Lance Norskog
, Patrick -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Thursday, 6 June 2013 5:16 p.m. To: solr-user@lucene.apache.org Subject: Re: OPENNLP problems Patrick- I found the problem with multiple documents. The problem was that the API for the life cycle of a Tokenizer

Re: OPENNLP problems

2013-06-09 Thread Lance Norskog
patch LUCENE-2899-x.patch uploaded on 6th June but still had the same problem. Regards, Patrick -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Thursday, 6 June 2013 5:16 p.m. To: solr-user@lucene.apache.org Subject: Re: OPENNLP problems Patrick- I found

Re: SOLR-4872 and LUCENE-2145 (or, how to clean up a Tokenizer)

2013-06-12 Thread Lance Norskog
In 4.x and trunk there is a close() method on Tokenizers and Filters. In currently released versions up to 4.3, there is instead a reset(stream) method which is how it resets a TokenizerFilter for a following document in the same upload. In both cases I had to track the first time the tokens are consumed,

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Lance Norskog
No, they just learned a few features and then stopped because it was good enough, and they had a thousand other things to code. As to REST- yes, it is worth having a coherent API. Solr is behind the curve here. Look at the HATEOAS paradigm. It's ornate (and a really goofy name) but it provides

Re: Best way to match umlauts

2013-06-16 Thread Lance Norskog
One small thing: German u-umlaut is often flattened as 'ue' instead of 'u'. And the same with o-umlaut, it can be 'oe' or 'o'. I don't know if Lucene has a good solution for this problem. On 06/16/2013 06:44 AM, adityab wrote: Thanks for the explanation Steve. I now see it clearly. In my case
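The two flattening conventions can be illustrated outside of Lucene. A small sketch (the mapping tables and function name are illustrative, not a Lucene API) that produces both ASCII variants of a word, so that either spelling could be indexed or queried.

```python
# Digraph convention: u-umlaut becomes "ue", o-umlaut becomes "oe", and so on.
DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}
# Plain convention: the umlaut is simply dropped.
PLAIN = {"ä": "a", "ö": "o", "ü": "u", "Ä": "A", "Ö": "O", "Ü": "U", "ß": "ss"}


def flatten_variants(word):
    """Return the set of ASCII spellings under both conventions."""
    digraph = "".join(DIGRAPHS.get(ch, ch) for ch in word)
    plain = "".join(PLAIN.get(ch, ch) for ch in word)
    return {digraph, plain}
```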

Does SolrCloud require matching configuration files?

2013-06-22 Thread Lance Norskog
Accumulo is a BigTable/Cassandra style distributed database. It is now an Apache Incubator project. In the README we find this gem: Synchronize your accumulo conf directory across the cluster. As a precaution against mis-configured systems, servers using different configuration files will not

Re: Http status 503 Error in solr cloud setup

2013-06-29 Thread Lance Norskog
I do not know what causes the error. This setup will not work. You need one or three zookeepers. SolrCloud demands that a majority of the ZK servers agree. If you have two ZKs this will not work. On 06/29/2013 05:47 AM, Sagar Chaturvedi wrote: Hi, I setup 2 solr instances on 2 different
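The majority rule behind this advice is plain arithmetic. A tiny sketch (the function names are illustrative, not a ZooKeeper API): an ensemble of two needs both servers up to form a majority, so it tolerates no failures, which is no better than a single server.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be lost while a majority still exists."""
    return ensemble_size - quorum_size(ensemble_size)


# Ensemble size -> (quorum, failures tolerated)
table = {n: (quorum_size(n), tolerated_failures(n)) for n in (1, 2, 3, 5)}
```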

Re: Varnish

2013-06-29 Thread Lance Norskog
Solr HTTP caching also supports e-tags. These are unique keys for the output of a query. If you send a query twice, and the index has not changed, the return will be the same. The e-tag is generated from the query string and the index generation number. If Varnish supports e-tags, you can keep
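The e-tag idea can be sketched in a few lines. This is a hypothetical illustration of the scheme described above (hash of query string plus index generation), not Solr's actual implementation; the function names and the choice of SHA-1 are assumptions.

```python
import hashlib


def make_etag(query_string, index_generation):
    """Hypothetical ETag: a hash of the index generation and the query string."""
    raw = f"{index_generation}:{query_string}".encode("utf-8")
    return '"' + hashlib.sha1(raw).hexdigest() + '"'


def respond(query_string, index_generation, if_none_match=None):
    """Return (status, etag): 304 when the client's cached copy is still valid."""
    etag = make_etag(query_string, index_generation)
    if if_none_match == etag:
        return 304, etag
    return 200, etag
```

A cache in front of the server can resend the stored tag in `If-None-Match`; once a commit changes the index generation, the tag no longer matches and a full response is produced.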

Re: Distributed search results in SocketException: Connection reset

2013-06-30 Thread Lance Norskog
This usually means the end server timed out. On 06/30/2013 06:31 AM, Shahar Davidson wrote: Hi all, We're getting the below exception sporadically when using distributed search. (using Solr 4.2.1) Note that 'core_3' is one of the cores mentioned in the 'shards' parameter. Any ideas anyone?

Re: getting different search results for words with same meaning in Japanese language

2013-06-30 Thread Lance Norskog
The MappingCharFilter allows you to map both characters to one character. If you do this during indexing and querying, searching with one should find the other. This is sort of like synonyms, but on a character-by-character basis. Lance On 06/18/2013 11:08 PM, Yash Sharma wrote: Hi, we have

Re: Solr limitations

2013-07-10 Thread Lance Norskog
Also, total index file size. At 200-300gb managing an index becomes a pain. Lance On 07/08/2013 07:28 AM, Jack Krupansky wrote: Other than the per-node/per-collection limit of 2 billion documents per Lucene index, most of the limits of Solr are performance-based limits - Solr can handle it,

Re: Norms

2013-07-12 Thread Lance Norskog
Norms stay in the index even if you delete all of the data. If you just changed the schema, emptied the index, and tested again, you've still got norms in there. You can examine the index with Luke to verify this. On 07/09/2013 08:57 PM, William Bell wrote: I have a field that has

Re: JVM Crashed - SOLR deployed in Tomcat

2013-07-16 Thread Lance Norskog
I don't know about jvm crashes, but it is known that the Java 6 jvm had various problems supporting Solr, including the 20-30 series. A lot of people use the final jvm release (I think 6_30). On 07/16/2013 12:25 PM, neoman wrote: Hello Everyone, We are using solrcloud with Tomcat in our

Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Lance Norskog
Are you feeding Graphite from Solr? If so, how? On 07/19/2013 01:02 AM, Neil Prosser wrote: That was overnight so I was unable to track exactly what happened (I'm going off our Graphite graphs here).

Re: adding date column to the index

2013-07-22 Thread Lance Norskog
Solr/Lucene does not automatically add when asked, the way DBMS systems do. Instead, all data for a field is added at the same time. To get the new field, you have to reload all of your data. This is also true for deleting fields. If you remove a field, that data does not go away until you

Re: Percolate feature?

2013-08-05 Thread Lance Norskog
Cool! On 08/05/2013 03:34 AM, Charlie Hull wrote: On 03/08/2013 00:50, Mark wrote: We have a set number of known terms we want to match against. In Index: term one term two term three I know how to match all terms of a user query against the index but we would like to know how/if we can

Re: Document Similarity Algorithm at Solr/Lucene

2013-08-07 Thread Lance Norskog
Block-quoting and plagiarism are two different questions. Block-quoting is simple: break the text apart into sentences or even paragraphs and make them separate documents. Make facets of the post-analysis text. Now just pull counts of facets and block quotes will be clear. Mahout has a
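The block-quote approach can be tried on plain strings before involving Solr. In this sketch a `Counter` stands in for the facet counts, and lowercasing plus whitespace collapsing stands in for the analysis chain; the sentence splitter is deliberately naive, and all names are illustrative.

```python
import re
from collections import Counter


def sentences(text):
    """Split on sentence-ending punctuation and normalize case and spacing."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.lower().split()) for p in parts if p]


def shared_sentences(documents, min_docs=2):
    """Return sentences that occur in at least min_docs different documents."""
    counts = Counter()
    for text in documents:
        # A set ensures each document counts a sentence only once.
        counts.update(set(sentences(text)))
    return {s: n for s, n in counts.items() if n >= min_docs}
```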

Re: How to SOLR file in svn repository

2013-08-22 Thread Lance Norskog
You need to: 1) crawl the SVN database 2) index the files 3) make a UI that fetches the original file when you click on a search result. Solr only has #2. If you run a subversion web browser app, you can download the developer-only version of the LucidWorks product and crawl the SVN web

Re: SOLR Prevent solr of modifying fields when update doc

2013-08-23 Thread Lance Norskog
Solr does not by default generate unique IDs. It uses what you give as your unique field, usually called 'id'. What software do you use to index data from your RSS feeds? Maybe that is creating a new 'id' field? There is no partial update, Solr (Lucene) always rewrites the complete

Re: Solr4.4 or zookeeper 3.4.5 do not support too many collections? more than 600?

2013-09-10 Thread Lance Norskog
Yes, Solr/Lucene works fine with other indexes this large. There are many indexes with hundreds of gigabytes and hundreds of millions of documents. My experience years ago was that at this scale, searching worked great, sorting facets less so, and the real problem was IT: a 200G blob of data

Re: DIH - stream file with solrEntityProcessor

2013-10-14 Thread Lance Norskog
On 10/13/2013 10:02 AM, Shawn Heisey wrote: On 10/13/2013 10:16 AM, Josh Lincoln wrote: I have a large solr response in xml format and would like to import it into a new solr collection. I'm able to use DIH with solrEntityProcessor, but only if I first truncate the file to a small subset of the

Re: DIH - stream file with solrEntityProcessor

2013-10-14 Thread Lance Norskog
the solr result format while using the xpathentityprocessor (i.e. a useSolrResultSchema option) Any other ideas? On Mon, Oct 14, 2013 at 6:24 PM, Lance Norskog goks...@gmail.com wrote: On 10/13/2013 10:02 AM, Shawn Heisey wrote: On 10/13/2013 10:16 AM, Josh Lincoln wrote: I have

Re: SOLR: Searching on OpenNLP fields is unstable

2013-10-20 Thread Lance Norskog
, it is working properly, results are stable and correct. Please help me to make solr results consistent. Thanks in Advance. -- Lance Norskog goks...@gmail.com

Re: SolrCloud unstable

2013-11-24 Thread Lance Norskog
Yes, you should use a recent Java 7. Java 6 is end-of-life and no longer supported by Oracle. Also, read up on the various garbage collectors. It is a complex topic and there are many guides online. In particular there is a problem in some Java 6 releases that causes a massive memory leak in

Re: need help on OpenNLP with Solr

2014-01-09 Thread Lance Norskog
. How can i use payloads for boosting? What are the changes required in schema.xml? Please provide me some pointers to move ahead Thanks in advance -- Lance Norskog goks...@gmail.com
