Re: need help on OpenNLP with Solr

2014-01-09 Thread Lance Norskog
. How can i use payloads for boosting? What are the changes required in schema.xml? Please provide me some pointers to move ahead Thanks in advance -- Lance Norskog goks...@gmail.com

Re: SolrCloud unstable

2013-11-24 Thread Lance Norskog
Yes, you should use a recent Java 7. Java 6 is end-of-life and no longer supported by Oracle. Also, read up on the various garbage collectors. It is a complex topic and there are many guides online. In particular there is a problem in some Java 6 releases that causes a massive memory leak in

Re: SOLR: Searching on OpenNLP fields is unstable

2013-10-20 Thread Lance Norskog
, it is working properly, results are stable and correct. Please help me to make solr results consistent. Thanks in Advance. -- Lance Norskog goks...@gmail.com

Re: DIH - stream file with solrEntityProcessor

2013-10-14 Thread Lance Norskog
On 10/13/2013 10:02 AM, Shawn Heisey wrote: On 10/13/2013 10:16 AM, Josh Lincoln wrote: I have a large solr response in xml format and would like to import it into a new solr collection. I'm able to use DIH with solrEntityProcessor, but only if I first truncate the file to a small subset of the

Re: DIH - stream file with solrEntityProcessor

2013-10-14 Thread Lance Norskog
the solr result format while using the xpathentityprocessor (i.e. a useSolrResultSchema option) Any other ideas? On Mon, Oct 14, 2013 at 6:24 PM, Lance Norskog goks...@gmail.com wrote: On 10/13/2013 10:02 AM, Shawn Heisey wrote: On 10/13/2013 10:16 AM, Josh Lincoln wrote: I have

Re: Solr4.4 or zookeeper 3.4.5 do not support too many collections? more than 600?

2013-09-10 Thread Lance Norskog
Yes, Solr/Lucene works fine with other indexes this large. There are many indexes with hundreds of gigabytes and hundreds of millions of documents. My experience years ago was that at this scale, searching worked great, sorting facets less so, and the real problem was IT: a 200G blob of data

Re: SOLR Prevent solr of modifying fields when update doc

2013-08-23 Thread Lance Norskog
Solr does not by default generate unique IDs. It uses what you give as your unique field, usually called 'id'. What software do you use to index data from your RSS feeds? Maybe that is creating a new 'id' field? There is no partial update, Solr (Lucene) always rewrites the complete

Re: How to SOLR file in svn repository

2013-08-22 Thread Lance Norskog
You need to: 1) crawl the SVN database 2) index the files 3) make a UI that fetches the original file when you click on a search result. Solr only has #2. If you run a subversion web browser app, you can download the developer-only version of the LucidWorks product and crawl the SVN web

Re: Document Similarity Algorithm at Solr/Lucene

2013-08-07 Thread Lance Norskog
Block-quoting and plagiarism are two different questions. Block-quoting is simple: break the text apart into sentences or even paragraphs and make them separate documents. Make facets of the post-analysis text. Now just pull counts of facets and block quotes will be clear. Mahout has a

Re: Percolate feature?

2013-08-05 Thread Lance Norskog
Cool! On 08/05/2013 03:34 AM, Charlie Hull wrote: On 03/08/2013 00:50, Mark wrote: We have a set number of known terms we want to match against. In Index: term one term two term three I know how to match all terms of a user query against the index but we would like to know how/if we can

Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Lance Norskog
Are you feeding Graphite from Solr? If so, how? On 07/19/2013 01:02 AM, Neil Prosser wrote: That was overnight so I was unable to track exactly what happened (I'm going off our Graphite graphs here).

Re: adding date column to the index

2013-07-22 Thread Lance Norskog
Solr/Lucene does not automatically add a column when asked, the way DBMS systems do. Instead, all data for a field is added at the same time. To get the new field, you have to reload all of your data. This is also true for deleting fields. If you remove a field, that data does not go away until you

Re: JVM Crashed - SOLR deployed in Tomcat

2013-07-16 Thread Lance Norskog
I don't know about jvm crashes, but it is known that the Java 6 jvm had various problems supporting Solr, including the 20-30 series. A lot of people use the final jvm release (I think 6_30). On 07/16/2013 12:25 PM, neoman wrote: Hello Everyone, We are using solrcloud with Tomcat in our

Re: Norms

2013-07-12 Thread Lance Norskog
Norms stay in the index even if you delete all of the data. If you just changed the schema, emptied the index, and tested again, you've still got norms in there. You can examine the index with Luke to verify this. On 07/09/2013 08:57 PM, William Bell wrote: I have a field that has

Re: Solr limitations

2013-07-10 Thread Lance Norskog
Also, total index file size. At 200-300gb managing an index becomes a pain. Lance On 07/08/2013 07:28 AM, Jack Krupansky wrote: Other than the per-node/per-collection limit of 2 billion documents per Lucene index, most of the limits of Solr are performance-based limits - Solr can handle it,

Re: Distributed search results in SocketException: Connection reset

2013-06-30 Thread Lance Norskog
This usually means the end server timed out. On 06/30/2013 06:31 AM, Shahar Davidson wrote: Hi all, We're getting the below exception sporadically when using distributed search. (using Solr 4.2.1) Note that 'core_3' is one of the cores mentioned in the 'shards' parameter. Any ideas anyone?

Re: getting different search results for words with same meaning in Japanese language

2013-06-30 Thread Lance Norskog
The MappingCharFilter allows you to map both characters to one character. If you do this during indexing and querying, searching with one should find the other. This is sort of like synonyms, but on a character-by-character basis. Lance On 06/18/2013 11:08 PM, Yash Sharma wrote: Hi, we have
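
The character-by-character mapping described in this reply can be illustrated outside Solr. This is only a sketch of the idea in Python, not Solr's MappingCharFilter; the mapping table (half-width to full-width katakana) and the function name `normalize` are hypothetical.

```python
# Hypothetical mapping table: half-width katakana -> full-width katakana.
CHAR_MAP = str.maketrans({"ｱ": "ア", "ｲ": "イ", "ｳ": "ウ"})


def normalize(text: str) -> str:
    """Apply the same character mapping at index time and at query time."""
    return text.translate(CHAR_MAP)


indexed = normalize("ｱｲｳ")    # form that would be written to the index
query = normalize("アイウ")    # form the user typed at query time
print(indexed == query)
```

Because both sides pass through the same mapping, either spelling ends up as the same term, which is why the mapping has to be applied during both indexing and querying.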

Re: Http status 503 Error in solr cloud setup

2013-06-29 Thread Lance Norskog
I do not know what causes the error. This setup will not work. You need one or three zookeepers. SolrCloud demands that a majority of the ZK servers agree. If you have two ZKs this will not work. On 06/29/2013 05:47 AM, Sagar Chaturvedi wrote: Hi, I setup 2 solr instances on 2 different
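
The majority rule mentioned in this reply is simple arithmetic, sketched here in Python. The function names are illustrative and are not part of ZooKeeper or Solr.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With two servers the majority is two, so losing either one stops the ensemble; two servers tolerate no more failures than one, while three tolerate one.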

Re: Varnish

2013-06-29 Thread Lance Norskog
Solr HTTP caching also supports e-tags. These are unique keys for the output of a query. If you send a query twice, and the index has not changed, the return will be the same. The e-tag is generated from the query string and the index generation number. If Varnish supports e-tags, you can keep
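
To make the e-tag idea concrete, here is a minimal Python sketch that derives a key from a query string and an index generation number. It is an assumption-laden illustration, not Solr's actual e-tag algorithm; the hash choice and the name `make_etag` are hypothetical.

```python
import hashlib


def make_etag(query_string: str, index_generation: int) -> str:
    """Derive a cache key that changes whenever the query or the index changes."""
    raw = f"{index_generation}:{query_string}".encode("utf-8")
    return hashlib.sha1(raw).hexdigest()


first = make_etag("q=solr&rows=10", 42)
repeat = make_etag("q=solr&rows=10", 42)
after_commit = make_etag("q=solr&rows=10", 43)
print(first == repeat, first == after_commit)
```

A cache in front of the server can keep a response for as long as the key it was stored under still matches; a new index generation produces a new key, so stale entries are never served.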

Does SolrCloud require matching configuration files?

2013-06-22 Thread Lance Norskog
Accumulo is a BigTable/Cassandra style distributed database. It is now an Apache Incubator project. In the README we find this gem: Synchronize your accumulo conf directory across the cluster. As a precaution against mis-configured systems, servers using different configuration files will not

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Lance Norskog
No, they just learned a few features and then stopped because it was good enough, and they had a thousand other things to code. As to REST- yes, it is worth having a coherent API. Solr is behind the curve here. Look at the HATEOAS paradigm. It's ornate (and a really goofy name) but it provides

Re: Best way to match umlauts

2013-06-16 Thread Lance Norskog
One small thing: German u-umlaut is often flattened as 'ue' instead of 'u'. And the same with o-umlaut, it can be 'oe' or 'o'. I don't know if Lucene has a good solution for this problem. On 06/16/2013 06:44 AM, adityab wrote: Thanks for the explanation Steve. I now see it clearly. In my case
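
One way to cope with both flattening conventions is to generate every variant of a word and index or query all of them. The Python sketch below only illustrates that idea; it is not a Lucene feature, and the tables and the name `variants` are hypothetical.

```python
# Two flattening conventions for German umlauts.
FLATTEN_TWO = {"ä": "ae", "ö": "oe", "ü": "ue"}
FLATTEN_ONE = {"ä": "a", "ö": "o", "ü": "u"}


def variants(word: str) -> set:
    """Return the lowercased word plus both flattened spellings."""
    lower = word.lower()
    two = "".join(FLATTEN_TWO.get(ch, ch) for ch in lower)
    one = "".join(FLATTEN_ONE.get(ch, ch) for ch in lower)
    return {lower, two, one}


print(sorted(variants("Müller")))
```

Treating the variants as synonyms of each other lets a search for any one spelling find documents written with the others, at the cost of some false matches.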

Re: SOLR-4872 and LUCENE-2145 (or, how to clean up a Tokenizer)

2013-06-12 Thread Lance Norskog
In 4.x and trunk there is a close() method on Tokenizers and Filters. In the releases up to 4.3, there is instead a reset(stream) method, which is how it resets a TokenizerFilter for a following document in the same upload. In both cases I had to track the first time the tokens are consumed,

Re: OPENNLP problems

2013-06-09 Thread Lance Norskog
, Patrick -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Thursday, 6 June 2013 5:16 p.m. To: solr-user@lucene.apache.org Subject: Re: OPENNLP problems Patrick- I found the problem with multiple documents. The problem was that the API for the life cycle of a Tokenizer

Re: OPENNLP problems

2013-06-09 Thread Lance Norskog
patch LUCENE-2899-x.patch uploaded on 6th June but still had the same problem. Regards, Patrick -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Thursday, 6 June 2013 5:16 p.m. To: solr-user@lucene.apache.org Subject: Re: OPENNLP problems Patrick- I found

Re: OPENNLP problems

2013-06-05 Thread Lance Norskog
Patrick- I found the problem with multiple documents. The problem was that the API for the life cycle of a Tokenizer changed, and I only noticed part of the change. You can now upload multiple documents in one post, and the OpenNLPTokenizer will process each document. You're right, the

Re: Dynamic Indexing using DB and DIH

2013-06-02 Thread Lance Norskog
Let's assume that the Solr record includes the database record's timestamp field. You can make a more complex DIH stack that does a Solr query with the SolrEntityProcessor. You can do a query that gets the most recent timestamp in the index, and then use that in the DB update command. On

Re: Shard Keys and Distributed Search

2013-06-02 Thread Lance Norskog
Distributed search does the actual search twice: once to get the scores and again to fetch the documents with the top N scores. This algorithm does not play well with deep searches. On 06/02/2013 07:32 PM, Niran Fajemisin wrote: Thanks Daniel. That's exactly what I thought as well. I did try
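
The two passes described in this reply can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Solr's implementation: the shards are in-memory dictionaries and `SHARDS` and `distributed_search` are hypothetical names.

```python
import heapq

# Hypothetical shard contents: doc id -> (score, stored document).
SHARDS = [
    {"a": (0.9, "doc a"), "b": (0.2, "doc b")},
    {"c": (0.7, "doc c"), "d": (0.5, "doc d")},
]


def distributed_search(shards, top_n):
    # Pass 1: each shard returns only (score, id, shard number) for its own top N.
    candidates = []
    for shard_no, shard in enumerate(shards):
        local = [(score, doc_id, shard_no) for doc_id, (score, _) in shard.items()]
        candidates.extend(heapq.nlargest(top_n, local))
    # Merge the per-shard lists and keep the global top N.
    winners = heapq.nlargest(top_n, candidates)
    # Pass 2: fetch stored documents only for the winners.
    return [(doc_id, shards[shard_no][doc_id][1]) for _, doc_id, shard_no in winners]


print(distributed_search(SHARDS, 2))
```

In this model every shard has to return its full top N before the merge, so asking for results deep in the list multiplies the work by the number of shards, which is why deep searches fare badly.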

Re: OPENNLP problems

2013-05-30 Thread Lance Norskog
I will look at these problems. Thanks for trying it out! Lance Norskog On 05/28/2013 10:08 PM, Patrick Mi wrote: Hi there, Checked out branch_4x and applied the latest patch LUCENE-2899-current.patch however I ran into 2 problems Followed the wiki page instruction and set up a field

Re: Regular expression in solr

2013-05-22 Thread Lance Norskog
If the indexed data includes positions, it should be possible to implement ^ and $ as the first and last positions. On 05/22/2013 04:08 AM, Oussama Jilal wrote: There is no ^ or $ in the solr regex since the regular expression will match tokens (not the complete indexed text). So the results

Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-17 Thread Lance Norskog
This is great; data like this is rare. Can you tell us any hardware or throughput numbers? On 05/17/2013 12:29 PM, Rishi Easwaran wrote: Hi All, Its Friday 3:00pm, warm sunny outside and it was a good week. Figured I'd share some good news. I work for AOL mail team and we use SOLR for our

Re: SOLR guidance required

2013-05-13 Thread Lance Norskog
If this is for the US, remove the age range feature before you get sued. On 05/09/2013 08:41 PM, Kamal Palei wrote: Dear SOLR experts I might be asking a very silly question. As I am new to SOLR kindly guide me. I have a job site. Using SOLR to search resumes. When a HR user enters some

Re: Why is SolrCloud doing a full copy of the index?

2013-05-04 Thread Lance Norskog
Great! Thank you very much Shawn. On 05/04/2013 10:55 AM, Shawn Heisey wrote: On 5/4/2013 11:45 AM, Shawn Heisey wrote: Advance warning: this is a long reply. I have condensed some relevant performance problem information into the following wiki page:

Re: SolrCloud vs Solr master-slave replication

2013-04-18 Thread Lance Norskog
Run checksums on all files in both master and slave, and verify that they are the same. TCP/IP has a checksum algorithm that was state-of-the-art in 1969. On 04/18/2013 02:10 AM, Victor Ruiz wrote: Also, I forgot to say... the same error started to happen again.. the index is again corrupted
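
A checksum comparison of two index directories could look like the Python sketch below. It assumes both directories are reachable from one machine (for example after copying or mounting); the function names are hypothetical and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def checksum_dir(index_dir):
    """Map each file name in the directory to the SHA-256 of its contents."""
    sums = {}
    for path in sorted(Path(index_dir).iterdir()):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large segment files fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            sums[path.name] = digest.hexdigest()
    return sums


def diff_checksums(master_sums, slave_sums):
    """Return names of files that are missing on one side or differ in content."""
    names = master_sums.keys() | slave_sums.keys()
    return sorted(n for n in names if master_sums.get(n) != slave_sums.get(n))
```

Usage would be `diff_checksums(checksum_dir(master_path), checksum_dir(slave_path))`, run while neither side is committing; an empty list means the copies match byte for byte.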

Re: Spatial search question

2013-04-12 Thread Lance Norskog
Outer distance AND NOT inner distance? On 04/12/2013 09:02 AM, kfdroid wrote: We currently do a radius search from a given Lat/Long point and it works great. I have a new requirement to do a search on a larger radius from the same point, but not include the smaller radius. Kind of a donut
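
The "outer AND NOT inner" logic can be checked with plain geometry. The Python sketch below uses the haversine formula with a mean Earth radius of 6371 km; it shows only the boolean logic, not Solr query syntax, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_donut(center, point, inner_km, outer_km):
    """True when the point is inside the outer radius AND NOT inside the inner one."""
    distance = haversine_km(center[0], center[1], point[0], point[1])
    return distance <= outer_km and not distance <= inner_km


print(in_donut((0.0, 0.0), (0.0, 0.1), inner_km=5, outer_km=20))
```

In Solr the same shape would be expressed as two distance filters on the same point, one of them negated; the exact syntax depends on the field type and version in use.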

Re: Flow Chart of Solr

2013-04-07 Thread Lance Norskog
Seconded. Single-stepping really is the best way to follow the logic chains and see how the data mutates. On 04/05/2013 06:36 AM, Erick Erickson wrote: Then there's my lazy method. Fire up the IDE and find a test case that looks close to something you want to understand further. Step through

Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Lance Norskog
Wow! That's great. And it's a lot of work, especially getting it all keyboard-complete. Thank you. On 03/14/2013 01:29 AM, Chantal Ackermann wrote: Hi all, this is not a question. I just wanted to announce that I've written a blog post on how to set up Maven for packaging and automatic

Re: InvalidShapeException when using SpatialRecursivePrefixTreeFieldType with custom worldBounds

2013-03-09 Thread Lance Norskog
Thank you (and Hoss)! I have found this concept elusive, and you two have nailed it. I will be able to understand it for the 5 minutes I will need to code with it. Lance On 03/09/2013 10:57 AM, David Smiley (@MITRE.org) wrote: Just finished:

Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Lance Norskog
Yes, the SolrEntityProcessor can be used for this. If you stored the original document bodies in the Solr index! You can also download the documents in Json or CSV format and re-upload those to old Solr. I don't know if CSV will work for your docs. If CSV works, you can directly upload what

Re: Poll: SolrCloud vs. Master-Slave usage

2013-02-25 Thread Lance Norskog
Do you use replication instead, or do you just have one instance? On 02/25/2013 07:55 PM, Otis Gospodnetic wrote: Hi, Quick poll to see what % of Solr users use SolrCloud vs. Master-slave setup: http://blog.sematext.com/2013/02/25/poll-solr-cloud-or-not/ I have to say I'm surprised with the

Re: Benefits of Solr over Lucene?

2013-02-12 Thread Lance Norskog
Lucene and Solr have an aggressive upgrade schedule. From 3 to 4 they got a major rewiring, and parts are orders of magnitude faster and smaller. If you code using Lucene, you will never upgrade to newer versions. (I supported Solr/Lucene customers for 3 years, and nobody ever did.) Cheers, Lance I

Re: Upgrading indexes from Solr 1.4.1 to 4.1.0

2013-02-04 Thread Lance Norskog
A side problem here is text analyzers: the analyzers have changed how they split apart text for searching, and are matched pairs. That is, the analyzer queries are created matching what the analyzer did when indexing. If you do this binary upgrade sequence, the indexed data will not match what

Re: Upgrading indexes from Solr 1.4.1 to 4.1.0

2013-02-04 Thread Lance Norskog
I don't have the source handy. I believe that SolrCloud hard-codes 'id' as the field name for defining shards. On 02/04/2013 10:19 AM, Shawn Heisey wrote: On 2/4/2013 10:58 AM, Lance Norskog wrote: A side problem here is text analyzers: the analyzers have changed how they split apart text

Re: Solr load balancer

2013-01-31 Thread Lance Norskog
It is possible to do this with IP Multicast. The query goes out on the multicast and all query servers read it. The servers wait for a random amount of time, then transmit the answer. Here's the trick: it's multicast. All of the query servers listen to each other's responses, and drop out when

Re: Indexing nouns only - UIMA vs. OpenNLP

2013-01-31 Thread Lance Norskog
Thanks, Kai! About removing non-nouns: the OpenNLP patch includes two simple TokenFilters for manipulating terms with payloads. The FilterPayloadFilter lets you keep or remove terms with given payloads. In the demo schema.xml, there is an example type that keeps only nouns and verbs. There is a

Re: Solr 4 slower than Solr 3.x?

2013-01-28 Thread Lance Norskog
For this second report, it's easy: switching from a single query server to a sharded query is going to be slower. Virtual machines add jitter to the performance and response time of the front-end vs the query shards. Distributed search does 2 round-trips for each sharded query. Add these all

Re: RSS tutorial that comes with the apache-solr not indexing

2013-01-14 Thread Lance Norskog
This example may be out of date, if the RSS feeds from Slashdot have changed. If you know XML and XPaths, try this: Find an rss feed from somewhere that works. Compare the xpaths in it vs. the xpaths in the DIH script. On 01/13/2013 07:38 PM, bibhor wrote: Hi I am trying to use the RSS

Re: Schema Field Names i18n

2013-01-14 Thread Lance Norskog
Will a field have different names in different languages? There is no facility for 'aliases' for field name. Erick is right, this sounds like you need query and update components to implement this. Also, you might try using URL-encoding for the field names. This would save my sanity. On

Re: Index data from multiple tables into Solr

2013-01-14 Thread Lance Norskog
Try all of the links under the collection name in the lower left-hand columns. There are several administration and monitoring tools you may find useful. On 01/14/2013 11:45 AM, hassancrowdc wrote: ok stats are changing, so the data is indexed. But how can i do query with this data, or how can i search

Re: DIH fails after processing roughly 10million records

2013-01-09 Thread Lance Norskog
At this scale, your indexing job is prone to break in various ways. If you want this to be reliable, it should be able to restart in the middle of an upload, rather than starting over. On 01/08/2013 10:19 PM, vijeshnair wrote: Yes Shawn, the batchSize is -1 only and I also have the

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Lance Norskog
Also, searching can be much faster if you put all of the shards on one machine, and the search distributor. That way, you search with multiple simultaneous threads inside one machine. I've seen this make searches several times faster. On 01/03/2013 06:36 AM, Jack Krupansky wrote: Ah... the

Re: Upgrading from 3.6 to 4.0

2013-01-03 Thread Lance Norskog
Please start new mail threads for new questions. This makes it much easier to research old mail threads. Old mail is often the only documentation for some problems. On 01/02/2013 10:04 AM, Benjamin, Roy wrote: Will the existing 3.6 indexes work with 4.0 binary ? Will 3.6 solrJ clients work

What is group.query?

2013-01-03 Thread Lance Norskog
What does group.query do? How is it different from q= and fq= ? Thanks.

Re: Upgrading from 3.6 to 4.0

2013-01-02 Thread Lance Norskog
Indexes will not work. I have not heard of an index upgrader. If you run your 3.6 and new 4.0 Solr at the same time, you can upload all the data with a DataImportHandler script using the SolrEntityProcessor. How large are your indexes? 4.1 indexes will not match 4.0, so you will have to

Re: Viewing the Solr MoinMoin wiki offline

2013-01-01 Thread Lance Norskog
3 problems: a- he wanted to read it locally. b- crawling the open web is imperfect. c- /browse needs to get at the files with the same URL as the uploader. a and b- Try downloading the whole thing with 'wget'. It has a 'make links point to the downloaded files' option. Wget is great. I have

Re: [DIH] Script Transformer: Is there a way to import js file?

2012-12-26 Thread Lance Norskog
Maybe you could write a Javascript snippet that downloads and runs your external file? On 12/26/2012 09:12 AM, Dyer, James wrote: I'm not very familiar with using scripting languages with Java, but having seen the DIH code for this, my guess is that all script code needs to be in the script /

Re: [ANNOUNCE] Apache Solr 3.6.2 released

2012-12-26 Thread Lance Norskog
Cool! On 12/25/2012 08:03 AM, Robert Muir wrote: 25 December 2012, Apache Solr™ 3.6.2 available The Lucene PMC and Santa Claus are pleased to announce the release of Apache Solr 3.6.2. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its

Re: Converting fq params to Filter object

2012-12-26 Thread Lance Norskog
A Solr facet query does a boolean query, caches the Lucene facet data structure, and uses it as a Lucene filter. After that until you do a full commit, using the same fq=string (you must match the string exactly) fetches the cached data structure and uses it again as a Lucene filter. Have

Re: multi field query with selective results

2012-12-23 Thread Lance Norskog
? On Sunday, December 23, 2012, Lance Norskog wrote: Please start a new thread. Thanks! On 12/22/2012 11:03 AM, J Mohamed Zahoor wrote: Hi I have a word completion requirement where i need to pick result from two indexed fields. The trick is i need to pick top 5 results from each field

Re: multi field query with selective results

2012-12-22 Thread Lance Norskog
Please start a new thread. Thanks! On 12/22/2012 11:03 AM, J Mohamed Zahoor wrote: Hi I have a word completion requirement where i need to pick result from two indexed fields. The trick is i need to pick top 5 results from each field and display as suggestions. If i set fq as field1:XXX

Re: Finding the last committed record in SOLR 4

2012-12-21 Thread Lance Norskog
The only sure way to get the last searchable document is to use a timestamp or sequence number in the document. I do not think that using a timestamp with default=NOW will give a unique timestamp, so you need your own sequence number. On 12/19/2012 10:17 PM, Joe wrote: I'm using SOLR 4 for

Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Lance Norskog
To be clear: 1) is fine. Lucene index updates are carefully sequenced so that the index is never in a bogus state. All data files are written and flushed to disk, then the segments.* files are written that match the data files. You can capture the files with a set of hard links to create a

Re: optimun precisionStep for DAY granularity in a TrieDateField

2012-12-14 Thread Lance Norskog
Do you use rounding in your dates? You can index a date rounded to the nearest minute, N minutes, hour or day. This way a range query has to look at such a small number of terms that you may not need to tune the precision step. Hunt for NOW/DAY or 5DAYS in the queries.

Re: Modeling openinghours using multipoints

2012-12-10 Thread Lance Norskog
to build, save, and query the bitmap whereas working on top of existing functionality seems to me a lot more maintainable on the user's part. ~ David From: Lance Norskog-2 [via Lucene] [ml-node+s472066n4025579...@n3.nabble.com] Sent: Sunday, December 09, 2012 6:35 PM

Re: Modeling openinghours using multipoints

2012-12-09 Thread Lance Norskog
.nabble.com/Modeling-openinghours-using-multipoints-tp4025336p4025454.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com

Re: Downloading files from the solr replication Handler

2012-11-29 Thread Lance Norskog
Maybe these are text encoding markers? - Original Message - | From: Eva Lacy e...@lacy.ie | To: solr-user@lucene.apache.org | Sent: Thursday, November 29, 2012 3:53:07 AM | Subject: Re: Downloading files from the solr replication Handler | | I tried downloading them with my browser and

Re: User context based search in apache solr

2012-11-24 Thread Lance Norskog
sagarzond- you are trying to embed a recommendation system into search. Recommendations are inherently a matrix problem, where Solr and other search engines are one-dimensional databases. What you have is a sparse user-product matrix. This book has a good explanation of recommender systems:

Re: configuring solr xml as a datasource

2012-11-24 Thread Lance Norskog
You don't need the transformers. I think the paths should be what is in the XML file. forEach="/add" And the paths need to use the syntax for name="fname" and name="number". I think this is it, but you should make sure. xpath="/add/doc/field[@name='fname']" xpath="/add/doc/field[@name='number']" Look
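
The attribute-predicate paths in this reply can be tried against a sample file before putting them into a DIH script. The Python sketch below uses the standard library's ElementTree, which supports the `[@name='...']` predicate; the sample XML and the choice to emit one row per doc element are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in Solr's XML update format.
SAMPLE = """<add>
  <doc>
    <field name="fname">Ada</field>
    <field name="number">42</field>
  </doc>
</add>"""


def extract(xml_text):
    """Return one dict per <doc>, selecting fields by their name attribute."""
    root = ET.fromstring(xml_text)  # the root element is <add>
    rows = []
    for doc in root.findall("doc"):
        rows.append({
            "fname": doc.findtext("field[@name='fname']"),
            "number": doc.findtext("field[@name='number']"),
        })
    return rows


print(extract(SAMPLE))
```

If a path returns None here, the same path is unlikely to match in the import script, which narrows the problem to the XPath rather than the rest of the configuration.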

Re: User context based search in apache solr

2012-11-24 Thread Lance Norskog
- http://sematext.com/spm/index.html | Search Analytics - http://sematext.com/search-analytics/index.html | | | | | On Sat, Nov 24, 2012 at 9:30 PM, Lance Norskog goks...@gmail.com | wrote: | | sagarzond- you are trying to embed a recommendation system into | search. | Recommendations

Re: Solr Delta Import Handler not working

2012-11-19 Thread Lance Norskog
| dataSource="null" I think this should not be here. The datasource should default to the dataSource listing. And 'rootEntity=true' should be in the XPathEntityProcessor block, because you are adding each file as one document. - Original Message - | From: Spadez

Re: Solr Delta Import Handler not working

2012-11-17 Thread Lance Norskog
I think this means the pattern did not match any files: <str name="Total Rows Fetched">0</str> The wiki example includes a '^' at the beginning of the filename pattern. This matches a complete line. http://wiki.apache.org/solr/DataImportHandler#Transformers_Example More: Add rootEntity="true". It

Re: More references for configuring Solr

2012-11-11 Thread Lance Norskog
LucidFind collects several sources of information in one searchable archive: http://find.searchhub.org/?q=sort=#%2Fp%3Asolr - Original Message - | From: Dmitry Kan dmitry@gmail.com | To: solr-user@lucene.apache.org | Sent: Sunday, November 11, 2012 2:24:21 AM | Subject: Re: More

Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-07 Thread Lance Norskog
You can debug this with the 'Analysis' page in the Solr UI. You pick 'text_general' and then give words with umlauts in the text box for indexing and queries. Lance - Original Message - | From: Daniel Brügge daniel.brue...@googlemail.com | To: solr-user@lucene.apache.org | Sent:

Re: Where to get more documents or references about sold cloud?

2012-11-06 Thread Lance Norskog
LucidFind is a searchable archive of Solr documentation and email lists: http://find.searchhub.org/?q=solrcloud - Original Message - | From: Jack Krupansky j...@basetechnology.com | To: solr-user@lucene.apache.org | Sent: Monday, November 5, 2012 4:44:46 AM | Subject: Re: Where to get

Re: Does SolrCloud supports MoreLikeThis?

2012-11-06 Thread Lance Norskog
The question you meant to ask is: Does MoreLikeThis support Distributed Search? and the answer apparently is no. This is the issue to get it working: https://issues.apache.org/jira/browse/SOLR-788 (Distributed Search is independent of SolrCloud.) If you want to make unit tests, that would

Re: After adding field to schema, the field is not being returned in results.

2012-11-02 Thread Lance Norskog
an post that and/or include it in your sample XML | file... | | Best | Erick | | | On Fri, Nov 2, 2012 at 10:02 AM, Dotan Cohen dotanco...@gmail.com | wrote: | | On Thu, Nov 1, 2012 at 9:28 PM, Lance Norskog goks...@gmail.com | wrote: | Have you uploaded data with that field populated? Solr

Re: After adding field to schema, the field is not being returned in results.

2012-11-01 Thread Lance Norskog
Have you uploaded data with that field populated? Solr is not like a relational database. It does not automatically populate a new field when you add it to the schema. If you sort on a field, a document with no data in that field comes first or last (I don't know which). - Original

Re: throttle segment merging

2012-10-28 Thread Lance Norskog
1) Do you use compound files (CFS)? This adds a lot of overhead to merging. 2) Does ES use the same merge policy code as Solr? In solrconfig.xml, here are the lines that control segment merging. You can probably set mergeFactor to 20 and cut the amount of disk I/O. <!-- Expert: Merge Policy

Re: Get metadata for query

2012-10-27 Thread Lance Norskog
understand the real question here. What is the | metadata. | | I mean, q=x&fl=* gives you all the (stored) fields for documents | matching | the query. | | What else is there? | | -- Jack Krupansky | | -Original Message- | From: Lance Norskog | Sent: Friday, October 26, 2012 9:42 PM

Re: Get metadata for query

2012-10-27 Thread Lance Norskog
. | | Erik | | | On Oct 27, 2012, at 04:09 , Lance Norskog wrote: | | Nope! Each document comes back with its own list of stored fields. | If you want to find all fields in an index, you have to fetch | every last document and OR in the fields in that document. There | is no Solr call

Re: lukeall.jar for Solr4r?

2012-10-27 Thread Lance Norskog
Aha! Andrzej has not built a 4.0 release version. You need to check out the source and compile your own. http://code.google.com/p/luke/downloads/list - Original Message - | From: Carrie Coy c...@ssww.com | To: solr-user@lucene.apache.org | Sent: Friday, October 26, 2012 7:33:45 AM |

Re: DIH throws NullPointerException when using dataimporter.functions.escapeSql with parent entities

2012-10-26 Thread Lance Norskog
/browse/SOLR-2141) which goes back to | October 2010 and is flagged as Resolved: Cannot Reproduce. | | | 2012/10/20 Lance Norskog goks...@gmail.com: | If it worked before and does not work now, I don't think you are | doing anything wrong :) | | Do you have a different version of your JDBC driver

Re: Get metadata for query

2012-10-26 Thread Lance Norskog
Ah, there's the problem- what is a fast way to fetch all fields in a collection, including dynamic fields? - Original Message - | From: Otis Gospodnetic otis.gospodne...@gmail.com | To: solr-user@lucene.apache.org | Sent: Friday, October 26, 2012 3:05:04 PM | Subject: Re: Get metadata

Re: Search and Entity structure

2012-10-26 Thread Lance Norskog
A side point: in fact, the connection between MBA and grade is not lost. The values in a multi-valued field are stored in order. You can have separate multi-valued fields with matching entries, and the values will be fetched in order and you can match them by counting. This is not database-ish,
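
Matching parallel multi-valued fields by position amounts to zipping two lists. In the Python sketch below the document is a plain dictionary standing in for a fetched Solr document; the field names `degree` and `grade` and the helper `pair_values` are hypothetical.

```python
# Hypothetical fetched document with two parallel multi-valued fields.
DOC = {"degree": ["BSc", "MBA"], "grade": ["B", "A"]}


def pair_values(document, field_a, field_b):
    """Pair the Nth value of one field with the Nth value of the other."""
    values_a, values_b = document[field_a], document[field_b]
    if len(values_a) != len(values_b):
        # The trick only works if every entry was indexed in both fields.
        raise ValueError("parallel fields have different lengths")
    return list(zip(values_a, values_b))


print(pair_values(DOC, "degree", "grade"))
```

The pairing exists only in the fetched values; a query on one field still cannot be restricted to the matching position in the other, so this helps with display rather than with search.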

Re: Solr-4.0.0 DIH not indexing xml attributes

2012-10-19 Thread Lance Norskog
Do other fields get added? Do these fields have type problems? I.e. is 'attr1' a number and you are adding a string? There is a logging EP that I think shows the data found- I don't know how to use it. Is it possible to post the whole DIH script? - Original Message - | From: Billy

Re: DIH throws NullPointerException when using dataimporter.functions.escapeSql with parent entities

2012-10-19 Thread Lance Norskog
If it worked before and does not work now, I don't think you are doing anything wrong :) Do you have a different version of your JDBC driver? Can you make a unit test with a minimal DIH script and schema? Or, scan through all of the JIRA issues against the DIH from your old Solr capture date.

Re: Flushing RAM to disk

2012-10-17 Thread Lance Norskog
There is no backed by disk RamDirectory feature. The MMapDirectory uses the operating system to do almost exactly the same thing, in a much better way. That is why it is the default. - Original Message - | From: deniz denizdurmu...@gmail.com | To: solr-user@lucene.apache.org | Sent:

Re: Flushing RAM to disk

2012-10-17 Thread Lance Norskog
I do not know how to load an index from disk into a RAMDirectory in Solr. - Original Message - | From: deniz denizdurmu...@gmail.com | To: solr-user@lucene.apache.org | Sent: Wednesday, October 17, 2012 12:15:52 AM | Subject: Re: Flushing RAM to disk | | I heard about MMapDirectory -

Re: How many documents in each Lucene segment?

2012-10-16 Thread Lance Norskog
CheckIndex prints these stats. java -cp lucene-core-WHATEVER.jar org.apache.lucene.index.CheckIndex - Original Message - | From: Shawn Heisey s...@elyograg.org | To: solr-user@lucene.apache.org | Sent: Monday, October 15, 2012 9:46:33 PM | Subject: Re: How many documents in each Lucene

Re: Solr Autocomplete

2012-10-15 Thread Lance Norskog
http://find.searchhub.org/?q=autosuggest+OR+autocomplete - Original Message - | From: Rahul Paul rahul.p...@iiitb.org | To: solr-user@lucene.apache.org | Sent: Monday, October 15, 2012 9:01:14 PM | Subject: Solr Autocomplete | | Hi, | I am using mysql for solr indexing data in solr. I

Re: Solr - db-data-config.xml general asking to entity

2012-10-14 Thread Lance Norskog
.472066.n3.nabble.com/Solr-db-data-config-xml-general-asking-to-entity-tp4013533.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com

Re: How to import a part of index from main Solr server(based on a query) to another Solr server and then do incremental import at intervals later(the updated index)?

2012-10-14 Thread Lance Norskog
this message in context: http://lucene.472066.n3.nabble.com/How-to-import-a-part-of-index-from-main-Solr-server-based-on-a-query-to-another-Solr-server-and-then-tp4013479p4013580.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com

Re: which api to use to manage solr ?

2012-10-12 Thread Lance Norskog
-api-to-use-to-manage-solr-tp4013491.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com

Re: Using

2012-10-12 Thread Lance Norskog
] :: [ivy:resolve] [ivy:resolve] [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS Can anybody point me to the source of this error or a workaround? Thanks, Tricia -- Lance Norskog goks...@gmail.com

Re: Query foreign language synonyms / words of equivalent meaning?

2012-10-10 Thread Lance Norskog
I want an update processor that runs Translation Party. http://translationparty.com/ http://downloadsquad.switched.com/2009/08/14/translation-party-achieves-hilarious-results-using-google-transl/ - Original Message - | From: SUJIT PAL sujit@comcast.net | To:

Re: Using additional dictionary with DirectSolrSpellChecker

2012-10-10 Thread Lance Norskog
Hapax legomena (terms with DF of 1) are very often typos. You can automatically build a stopword file from these. If you want to be picky, you can use only words with a very small distance from words with much larger DF. - Original Message - | From: Robert Muir rcm...@gmail.com | To:
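
Finding terms with a document frequency of 1 is a short counting exercise. The Python sketch below works on an in-memory list of strings with whitespace tokenization, both simplifying assumptions; in practice the counts would come from the index's term statistics.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # set(): count each term once per doc
    return df


def hapax_legomena(docs):
    """Return the terms that occur in exactly one document."""
    return sorted(term for term, count in document_frequency(docs).items() if count == 1)


docs = ["solr search", "solr serach", "search engine solr"]
print(hapax_legomena(docs))
```

In this toy corpus the list contains the typo `serach` but also the legitimate word `engine`, which is why the stricter variant, keeping only rare terms within a small edit distance of a much more frequent term, is worth considering.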

Re: segment number during optimize of index

2012-10-10 Thread Lance Norskog
Study index merging. This is awesome. http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Jame- opening lots of segments is not a problem. A major performance problem you will find is 'Large Pages'. This is an operating-system strategy for managing servers with 10s of

Re: Help with Velocity in SolrItas

2012-10-09 Thread Lance Norskog
Thanks, everyone. This is the problem: $sentence is a NamedList node, with a name and a value (any Java object). I want its value subnode: #foreach($sentence in $outer) $sentence = $sentence.value | | Here is the XML from a search result: | <lst name="outer"> | <lst name="sentence"> | <int

Re: Can SOLR Index UTF-16 Text

2012-10-02 Thread Lance Norskog
If it is a simple text file, does that text file start with the UTF-16 BOM marker? http://unicode.org/faq/utf_bom.html Also, do UTF-8 files work? If not, then your setup has a basic encoding problem. And, when you post such a text file (for example, with curl), use the UTF-16 charset mime-type:
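
Checking for the byte order mark takes only a few lines. This Python sketch uses the BOM constants from the standard `codecs` module; the function name `utf16_bom` is hypothetical.

```python
import codecs


def utf16_bom(data: bytes):
    """Return the UTF-16 variant indicated by a leading BOM, or None if absent."""
    # Caveat: a UTF-32 little-endian BOM also begins with the UTF-16 LE bytes.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


sample = codecs.BOM_UTF16_LE + "solr".encode("utf-16-le")
print(utf16_bom(sample), utf16_bom(b"plain ascii"))
```

To test a real file, read its first few bytes in binary mode and pass them in; a result of None means the file has no UTF-16 BOM and the receiver must be told the encoding explicitly.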

Re: Zookeeper setup for solr cloud

2012-09-30 Thread Lance Norskog
You can find Solr information with this: http://find.searchhub.org/?q=zookeeper+cluster http://find.searchhub.org/link?url=http://wiki.apache.org/solr/SolrCloud - Original Message - | From: varun srivastava varunmail...@gmail.com | To: solr-user@lucene.apache.org | Sent: Saturday,
