comma-separated multivalued field

2007-07-19 Thread Lance Lance
Hi - I'd like to make a multivalued field of comma-separated phrases. Is there a class available that I can use for this? I can see how to create N separate elements for the same field in the update XML, but is there something I can use in the type definition? Thanks, Lance

RE: searching multiple fields

2007-08-01 Thread Lance Lance
is, it is exactly the same as: +a:valueAlpha +a:valueBeta +a:valueGamma I have to use OR between the values. Is this supposed to be true? Thanks, Lance -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 01, 2007 12:48 AM To: solr-user@lucene.apa

Question on query syntax

2007-07-12 Thread Lance Lance
ollection:pile1 OR collection:pile2) When we apply De Morgan's Law, we get 0 records: text (-collection:pile1 AND -collection:pile2) This should return all records, but it returns nothing: text (-collection:pile1 OR -collection:pile2) Thanks, Lance

RE: Question on query syntax

2007-07-12 Thread Lance Lance
A simplified version of the problem: text -(collection:pile1) works, while text (-collection:pile1) finds zero records. lance _ From: Lance Lance [mailto:[EMAIL PROTECTED] Sent: Thursday, July 12, 2007 5:58 PM To: 'solr-user@lucene.apache.org' Subject: Questio
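The behaviour reported in this thread is consistent with how Lucene boolean groups treat prohibited clauses: a prohibited clause only subtracts from what the positive clauses in the same group matched, so a parenthesised group holding nothing but negative clauses matches nothing. The following is a toy set-based model of that rule, not Lucene code; the document ids and the assumption that both top-level clauses are required are made up for illustration.

```python
def boolean_group(required=(), prohibited=()):
    """Toy model of one boolean group.

    Prohibited sets only subtract from what the required sets matched;
    they never contribute documents on their own.
    """
    if not required:
        return set()  # purely negative group: nothing to subtract from
    matched = set.intersection(*(set(r) for r in required))
    for p in prohibited:
        matched -= set(p)
    return matched


# Hypothetical index: ids matching "text" and ids in collection:pile1.
all_docs = {1, 2, 3, 4, 5}
text = {1, 2, 3, 4}
pile1 = {1, 2}

# text -(collection:pile1): the negative clause sits beside a positive one.
works = boolean_group(required=[text], prohibited=[pile1])

# text (-collection:pile1): the nested group is purely negative, so it is empty.
nested = boolean_group(prohibited=[pile1])
fails = boolean_group(required=[text, nested])

# Giving the nested group a positive clause that matches every document.
nested_fixed = boolean_group(required=[all_docs], prohibited=[pile1])
fixed = boolean_group(required=[text, nested_fixed])
```

In Solr query syntax the last case is usually written with the match-all query inside the group, as in text (*:* -collection:pile1).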

RE: Question on query syntax

2007-07-12 Thread Lance Lance
Ok, here's a simpler version: _ From: Lance Lance [mailto:[EMAIL PROTECTED] Sent: Thursday, July 12, 2007 5:58 PM To: 'solr-user@lucene.apache.org' Subject: Question on query syntax Are there any known bugs in the syntax parser? We're using lucene-2.2.0 and

restrict fuzzy search to longer words

2012-01-19 Thread Lance
Hi, could you please help me with a quick question: is there a way to restrict Lucene/Solr fuzzy search to analyze only words that have more than 5 characters and to ignore shorter words (i.e. words of fewer than 6 characters)? Thanks - Lance

restrict fuzzy search to longer words

2012-01-20 Thread Lance
Hi, could you please help me with a quick question: is there a way to restrict Lucene/Solr fuzzy search to analyze only words that have more than 5 characters and to ignore shorter words (i.e. words of fewer than 6 characters)? Thanks - Lance

I need an available solr lucene consultant

2011-05-17 Thread Lance
blem solving and analytical abilities. You must have a solid grasp of English – written and verbal. Please note that I am a start-up and I am not going to be able to pay what a large established company can pay. Thank you, Lance ----- Lance

Re: I need an available solr lucene consultant

2011-05-17 Thread Lance
Thanks Markus, I did look at that list, but I'm wondering if there is anyone who is not on the list who may be interested. - Lance On May 17, 2011, at 4:09 PM, Markus Jelsma wrote: Check this out: http://wiki.apache.org/solr/Support > Hi, > >

Re: I need an available solr lucene consultant

2011-05-17 Thread Lance
Thanks Shashi - there aren't too many qualified people on those sites - I have looked. - Lance On May 17, 2011, at 4:13 PM, Shashi Kant wrote: You might be better off looking for freelancers on sites such as odesk.com, guru.com, rentacoder.com, elanc

Re: DIH - stream file with solrEntityProcessor

2013-10-14 Thread Lance Norskog
On 10/13/2013 10:02 AM, Shawn Heisey wrote: On 10/13/2013 10:16 AM, Josh Lincoln wrote: I have a large solr response in xml format and would like to import it into a new solr collection. I'm able to use DIH with solrEntityProcessor, but only if I first truncate the file to a small subset of the

Re: DIH - stream file with solrEntityProcessor

2013-10-14 Thread Lance Norskog
Can you do this data in CSV format? There is a CSV reader in the DIH. The SEP was not intended to read from files, since there are already better tools that do that. Lance On 10/14/2013 04:44 PM, Josh Lincoln wrote: Shawn, I'm able to read in a 4mb file using SEP, so I think that rule

Re: SOLR: Searching on OpenNLP fields is unstable

2013-10-20 Thread Lance Norskog
> > > > > > > > And field declared for this analyzer: > > omitNorms="true" omitPositions="true"/> > > > > Problem is here : When I search over this field Detail_Person, results are > not constant. > > > > When I search Detail_Person:brett, it return one document > > > > > > But again when I fire the same query, it return zero document. > > > > Searching is not stable on OpenNLP field, sometimes it return documents > and sometimes not but documents are there. > > And if I search on non OpenNLP fields, it is working properly, results are > stable and correct. > > Please help me to make solr results consistent. > > Thanks in Advance. > > -- Lance Norskog goks...@gmail.com

Re: SolrCloud unstable

2013-11-24 Thread Lance Norskog
on garbage collection. Lance On 11/22/2013 05:27 AM, Martin de Vries wrote: We did some more monitoring and have some new information: Before the issue happens the garbage collector's "collection count" increases a lot. The increase seems to start about an hour before the r

Re: need help on OpenNLP with Solr

2014-01-09 Thread Lance Norskog
and add the payloads. ( but iam not able to analyze it) > > My Question is: > Can i search a phrase giving high boost to NOUN then VERB ? > For example: if iam searching "sitting on blanket" , so i want to give high > boost to NOUN term first then VERB, that are tagged by OpenNLP. > How can i use payloads for boosting? > What are the changes required in schema.xml? > > Please provide me some pointers to move ahead > > Thanks in advance > -- Lance Norskog goks...@gmail.com

Re: Http status 503 Error in solr cloud setup

2013-06-29 Thread Lance Norskog
I do not know what causes the error. This setup will not work. You need one or three zookeepers. SolrCloud demands that a majority of the ZK servers agree. If you have two ZKs this will not work. On 06/29/2013 05:47 AM, Sagar Chaturvedi wrote: Hi, I setup 2 solr instances on 2 different mach
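The majority rule mentioned here can be checked with a little arithmetic. This is a minimal sketch, not ZooKeeper code; the function names are made up. It shows why a second server adds no fault tolerance over a single one.

```python
def quorum(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority still remains."""
    return ensemble_size - quorum(ensemble_size)


# ensemble size -> (servers needed for a majority, failures survived)
table = {n: (quorum(n), tolerated_failures(n)) for n in (1, 2, 3, 5)}
```

With two servers the majority is two, so losing either one stops the ensemble; three servers is the smallest size that survives a failure.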

Re: Varnish

2013-06-29 Thread Lance Norskog
s, you can keep some queries cached longer than your timeout. Lance On 06/29/2013 05:51 PM, William Bell wrote: On a large website, by putting 1 varnish in front of all 4 SOLR boxes we were able to trim 25% off the load time (TTFB) of the page. Our hit ratio was between 55 and 75%. We gave varni

Re: Distributed search results in "SocketException: Connection reset"

2013-06-30 Thread Lance Norskog
This usually means the end server timed out. On 06/30/2013 06:31 AM, Shahar Davidson wrote: Hi all, We're getting the below exception sporadically when using distributed search. (using Solr 4.2.1) Note that 'core_3' is one of the cores mentioned in the 'shards' parameter. Any ideas anyone? T

Re: getting different search results for words with same meaning in Japanese language

2013-06-30 Thread Lance Norskog
The MappingCharFilter allows you to map both characters to one character. If you do this during indexing and querying, searching with one should find the other. This is sort of like synonyms, but on a character-by-character basis. Lance On 06/18/2013 11:08 PM, Yash Sharma wrote: > Hi, >
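The idea of applying the same character mapping at index time and at query time can be sketched outside Solr. This is a toy model only: the mapping table below (small kana folded to full-size kana) is a hypothetical example, not the mapping from the original thread, and the "index" is a plain dictionary.

```python
# Hypothetical mapping: fold two small kana into their full-size forms.
CHAR_MAP = str.maketrans({"ァ": "ア", "ィ": "イ"})


def normalize(text: str) -> str:
    """Apply the character mapping; used for both indexing and querying."""
    return text.translate(CHAR_MAP)


def build_index(docs: dict) -> dict:
    """Map each normalized value to the set of document ids holding it."""
    index = {}
    for doc_id, value in docs.items():
        index.setdefault(normalize(value), set()).add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Normalize the query the same way before looking it up."""
    return index.get(normalize(query), set())


docs = {1: "ファイル", 2: "フアイル"}
index = build_index(docs)
hits_small = search(index, "ファイル")
hits_large = search(index, "フアイル")
```

Because both spellings collapse to the same indexed form, either query finds both documents.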

Re: Solr limitations

2013-07-10 Thread Lance Norskog
Also, total index file size. At 200-300gb managing an index becomes a pain. Lance On 07/08/2013 07:28 AM, Jack Krupansky wrote: Other than the per-node/per-collection limit of 2 billion documents per Lucene index, most of the limits of Solr are performance-based limits - Solr can handle it

Re: Norms

2013-07-12 Thread Lance Norskog
Norms stay in the index even if you delete all of the data. If you just changed the schema, emptied the index, and tested again, you've still got norms in there. You can examine the index with Luke to verify this. On 07/09/2013 08:57 PM, William Bell wrote: I have a field that has omitNorms=t

Re: JVM Crashed - SOLR deployed in Tomcat

2013-07-16 Thread Lance Norskog
I don't know about jvm crashes, but it is known that the Java 6 jvm had various problems supporting Solr, including the 20-30 series. A lot of people use the final jvm release (I think 6_30). On 07/16/2013 12:25 PM, neoman wrote: Hello Everyone, We are using solrcloud with Tomcat in our produc

Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Lance Norskog
Are you feeding Graphite from Solr? If so, how? On 07/19/2013 01:02 AM, Neil Prosser wrote: That was overnight so I was unable to track exactly what happened (I'm going off our Graphite graphs here).

Re: adding date column to the index

2013-07-22 Thread Lance Norskog
Solr/Lucene does not automatically add a field to existing documents when asked, the way a DBMS does. Instead, all data for a field is added at the same time. To get the new field, you have to reload all of your data. This is also true for deleting fields. If you remove a field, that data does not go away until you re-

Re: Percolate feature?

2013-08-05 Thread Lance Norskog
Cool! On 08/05/2013 03:34 AM, Charlie Hull wrote: On 03/08/2013 00:50, Mark wrote: We have a set number of known terms we want to match against. In Index: "term one" "term two" "term three" I know how to match all terms of a user query against the index but we would like to know how/if we ca

Re: Document Similarity Algorithm at Solr/Lucene

2013-08-07 Thread Lance Norskog
scalable implementation of n-gram based document similarity. It calculates distances between all documents and identifies clusters of similar documents. This is a much more general technique and may help you find "obfuscated" plagiarism. Lance On 07/23/2013 02:33 AM, Furkan KAMACI

Re: How to SOLR file in svn repository

2013-08-22 Thread Lance Norskog
viewer. This will give you #1 and #3. Lance On 08/21/2013 09:00 AM, jiunarayan wrote: I have a svn repository and svn file path. How can I SOLR search content on the svn file. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-SOLR-file-in-svn-repository-tp4085904.html

Re: SOLR Prevent solr of modifying fields when update doc

2013-08-23 Thread Lance Norskog
Solr does not by default generate unique IDs. It uses what you give as your unique field, usually called 'id'. What software do you use to index data from your RSS feeds? Maybe that is creating a new 'id' field? There is no partial update, Solr (Lucene) always rewrites the complete document.

Re: Solr4.4 or zookeeper 3.4.5 do not support too many collections? more than 600?

2013-09-10 Thread Lance Norskog
data is a pain in the neck to administer. As always, every index is different, but you should not have problems doing the merge that you describe. Lance On 09/08/2013 09:01 PM, diyun2008 wrote: Thank you Erick. It's very useful to me. I have already started to merge logs of collecti

Re: Flow Chart of Solr

2013-04-07 Thread Lance Norskog
Seconded. Single-stepping really is the best way to follow the logic chains and see how the data mutates. On 04/05/2013 06:36 AM, Erick Erickson wrote: Then there's my lazy method. Fire up the IDE and find a test case that looks close to something you want to understand further. Step through it

Re: Spatial search question

2013-04-12 Thread Lance Norskog
Outer distance AND NOT inner distance? On 04/12/2013 09:02 AM, kfdroid wrote: We currently do a radius search from a given Lat/Long point and it works great. I have a new requirement to do a search on a larger radius from the same point, but not include the smaller radius. Kind of a donut (toru
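The "outer AND NOT inner" suggestion can be illustrated with a plain distance filter. This is a client-side sketch using the haversine formula, not Solr's spatial query syntax; the function names, the Earth radius constant and the sample coordinates are assumptions for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_donut(center, point, inner_km, outer_km):
    """True when point is within the outer radius AND NOT within the inner one."""
    d = haversine_km(center[0], center[1], point[0], point[1])
    within_outer = d <= outer_km
    within_inner = d <= inner_km
    return within_outer and not within_inner


center = (0.0, 0.0)
too_close = in_donut(center, (0.0, 0.05), inner_km=10, outer_km=100)  # about 5.6 km away
in_ring = in_donut(center, (0.0, 0.5), inner_km=10, outer_km=100)     # about 55.6 km away
too_far = in_donut(center, (0.0, 2.0), inner_km=10, outer_km=100)     # about 222 km away
```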

Re: SolrCloud vs Solr master-slave replication

2013-04-18 Thread Lance Norskog
Run checksums on all files in both master and slave, and verify that they are the same. TCP/IP has a checksum algorithm that was state-of-the-art in 1969. On 04/18/2013 02:10 AM, Victor Ruiz wrote: Also, I forgot to say... the same error started to happen again.. the index is again corrupted :(
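One way to run the comparison suggested here is to hash every file under both index directories and list the names whose digests differ. This is a minimal sketch under assumptions: the directory arguments are whatever paths hold the two index copies, SHA-256 is an arbitrary choice of checksum, and the helper names are made up.

```python
import hashlib
from pathlib import Path


def checksum_tree(root):
    """Map each file's path relative to root to its SHA-256 hex digest."""
    root = Path(root)
    sums = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large segment files fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            sums[str(path.relative_to(root))] = digest.hexdigest()
    return sums


def differing_files(master_dir, slave_dir):
    """Names of files that are missing on one side or differ in content."""
    a, b = checksum_tree(master_dir), checksum_tree(slave_dir)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty result from differing_files means the two copies are byte-for-byte identical at the time of the check; the comparison is only meaningful if neither index is being written to while it runs.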

Re: Why is SolrCloud doing a full copy of the index?

2013-05-04 Thread Lance Norskog
Great! Thank you very much Shawn. On 05/04/2013 10:55 AM, Shawn Heisey wrote: On 5/4/2013 11:45 AM, Shawn Heisey wrote: Advance warning: this is a long reply. I have condensed some relevant performance problem information into the following wiki page: http://wiki.apache.org/solr/SolrPerforman

Re: SOLR guidance required

2013-05-13 Thread Lance Norskog
If this is for the US, remove the age range feature before you get sued. On 05/09/2013 08:41 PM, Kamal Palei wrote: Dear SOLR experts I might be asking a very silly question. As I am new to SOLR kindly guide me. I have a job site. Using SOLR to search resumes. When a HR user enters some keywor

Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-17 Thread Lance Norskog
This is great; data like this is rare. Can you tell us any hardware or throughput numbers? On 05/17/2013 12:29 PM, Rishi Easwaran wrote: Hi All, It's Friday 3:00pm, warm & sunny outside and it was a good week. Figured I'd share some good news. I work for AOL mail team and we use SOLR for our

Re: Regular expression in solr

2013-05-22 Thread Lance Norskog
If the indexed data includes positions, it should be possible to implement ^ and $ as the first and last positions. On 05/22/2013 04:08 AM, Oussama Jilal wrote: There is no ^ or $ in the solr regex since the regular expression will match tokens (not the complete indexed text). So the results yo

Re: OPENNLP problems

2013-05-30 Thread Lance Norskog
I will look at these problems. Thanks for trying it out! Lance Norskog On 05/28/2013 10:08 PM, Patrick Mi wrote: Hi there, Checked out branch_4x and applied the latest patch LUCENE-2899-current.patch however I ran into 2 problems Followed the wiki page instruction and set up a field with

Re: Dynamic Indexing using DB and DIH

2013-06-02 Thread Lance Norskog
Let's assume that the Solr record includes the database record's timestamp field. You can make a more complex DIH stack that does a Solr query with the SolrEntityProcessor. You can do a query that gets the most recent timestamp in the index, and then use that in the DB update command. On 06/02

Re: Shard Keys and Distributed Search

2013-06-02 Thread Lance Norskog
Distributed search does the actual search twice: once to get the scores and again to fetch the documents with the top N scores. This algorithm does not play well with "deep searches". On 06/02/2013 07:32 PM, Niran Fajemisin wrote: Thanks Daniel. That's exactly what I thought as well. I did tr
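The two passes described here, and why deep paging hurts, can be sketched with in-memory "shards". This is a toy model, not Solr's implementation: each shard is a dictionary of id to (score, stored document), and the function name and sample data are made up.

```python
import heapq


def distributed_search(shards, start, rows):
    """Return (documents for the requested page, ids shipped in phase 1)."""
    want = start + rows
    # Phase 1: every shard returns its own top (start + rows) ids and scores,
    # because any of them could end up on the requested page.
    candidates = []
    for shard_no, shard in enumerate(shards):
        top = heapq.nlargest(want, shard.items(), key=lambda kv: kv[1][0])
        candidates.extend((score, doc_id, shard_no) for doc_id, (score, _doc) in top)
    # Merge by score and keep only the requested page.
    candidates.sort(key=lambda c: c[0], reverse=True)
    page = candidates[start:start + rows]
    # Phase 2: fetch stored documents only for the winners.
    docs = [shards[shard_no][doc_id][1] for _score, doc_id, shard_no in page]
    return docs, len(candidates)


shards = [
    {"a": (0.9, "A"), "b": (0.2, "B")},
    {"c": (0.5, "C"), "d": (0.7, "D")},
]
first_page, shipped_first = distributed_search(shards, start=0, rows=1)
second_page, shipped_second = distributed_search(shards, start=1, rows=1)
```

The number of ids shipped in phase 1 grows with start + rows on every shard, which is the cost behind the remark about deep searches.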

Re: OPENNLP problems

2013-06-05 Thread Lance Norskog
, the example on the wiki is wrong. The FilterPayloadsFilter default is to remove the given payloads, and needs keepPayloads="true" to retain them. The fixed patch is up as LUCENE-2899-x.patch. Again, thanks for trying it. Lance https://issues.apache.org/jira/browse/LUCENE-2899 On 05/2

Re: OPENNLP problems

2013-06-09 Thread Lance Norskog
text_opennlp has the right behavior. text_opennlp_pos does what you describe. I'll look some more. On 06/09/2013 04:38 PM, Patrick Mi wrote: Hi Lance, I updated the src from 4.x and applied the latest patch LUCENE-2899-x.patch uploaded on 6th June but still had the same problem. Re

Re: OPENNLP problems

2013-06-09 Thread Lance Norskog
Found the problem. Please see: https://issues.apache.org/jira/browse/LUCENE-2899?focusedCommentId=13679293&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13679293 On 06/09/2013 04:38 PM, Patrick Mi wrote: Hi Lance, I updated the src from 4.x and applied the la

Re: SOLR-4872 and LUCENE-2145 (or, how to clean up a Tokenizer)

2013-06-12 Thread Lance Norskog
In 4.x and trunk there is a close() method on Tokenizers and Filters. In current releases up to 4.3, there is instead a reset(stream) method, which is how it resets a Tokenizer and Filter for a following document in the same upload. In both cases I had to track the first time the tokens are consumed, a

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Lance Norskog
No, they just learned a few features and then stopped because it was "good enough", and they had a thousand other things to code. As to REST- yes, it is worth having a coherent API. Solr is behind the curve here. Look at the HATEOAS paradigm. It's ornate (and a really goofy name) but it provide

Re: Best way to match umlauts

2013-06-16 Thread Lance Norskog
One small thing: German u-umlaut is often "flattened" as 'ue' instead of 'u'. And the same with o-umlaut, it can be 'oe' or 'o'. I don't know if Lucene has a good solution for this problem. On 06/16/2013 06:44 AM, adityab wrote: Thanks for the explanation Steve. I now see it clearly. In my cas
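One possible way to cope with both flattening conventions is to generate both spellings and index or query all of them. This is a sketch of that idea only, not a Lucene analyzer; the mapping tables are assumptions, and the a-umlaut entry is added by analogy since the message mentions only u and o.

```python
# Single-letter flattening: ü -> u, ö -> o (ä added by analogy).
FLATTEN_SHORT = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})
# Two-letter flattening: ü -> ue, ö -> oe (ä added by analogy).
FLATTEN_LONG = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def variants(word: str) -> set:
    """Return every flattened spelling of a word, lowercased."""
    lowered = word.lower()
    return {lowered.translate(FLATTEN_SHORT), lowered.translate(FLATTEN_LONG)}


mueller = variants("Müller")
haus = variants("Haus")
```

Indexing every variant of a term means a query for either spelling matches; words without umlauts produce a single variant, so they cost nothing extra.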

Does SolrCloud require matching configuration files?

2013-06-22 Thread Lance Norskog
Accumulo is a BigTable/Cassandra style distributed database. It is now an Apache Incubator project. In the README we find this gem: "Synchronize your accumulo conf directory across the cluster. As a precaution against mis-configured systems, servers using different configuration files will not

Re: Solr Geodist

2011-08-30 Thread Lance Norskog
ssage in context: > http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3297088.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: category tree navigation with the help of solr

2011-09-05 Thread Lance Norskog
ith the help of solr. >>> >>> There are following points about catgory and products to be considered, >>> 1.One product can belong to more than one categories. >>> 2.category is a hierarchical facet. >>> 3.More than one categories can share same name. >>> >>> It would be a great help if someone can suggest a way to index and query >>> data based on the above architecture. >>> >>> Thanks, >>> Priti >>> >>> > -- Lance Norskog goks...@gmail.com

Re: MMapDirectory failed to map a 23G compound index segment

2011-09-08 Thread Lance Norskog
org.apache.lucene.store.MMapDirectory$MMapIndexInput.(Unknown > >>> > Source) > >>> > at > org.apache.lucene.store.MMapDirectory$MMapIndexInput.(Unknown > >>> > Source) > >>> > at org.apache.lucene.store.MMapDirectory.openInput(Unknown Source) > >>> > at org.apache.lucene.index.SegmentReader$CoreReaders.(Unknown > >>> Source) > >>> > > >>> > at org.apache.lucene.index.SegmentReader.get(Unknown Source) > >>> > at org.apache.lucene.index.SegmentReader.get(Unknown Source) > >>> > at org.apache.lucene.index.DirectoryReader.(Unknown Source) > >>> > at org.apache.lucene.index.ReadOnlyDirectoryReader.(Unknown > >>> Source) > >>> > at org.apache.lucene.index.DirectoryReader$1.doBody(Unknown Source) > >>> > at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown > >>> > Source) > >>> > at org.apache.lucene.index.DirectoryReader.open(Unknown Source) > >>> > at org.apache.lucene.index.IndexReader.open(Unknown Source) > >>> > ... > >>> > Caused by: java.lang.OutOfMemoryError: Map failed > >>> > at sun.nio.ch.FileChannelImpl.map0(Native Method) > >>> > ... > >>> > >>> > >> > > > -- Lance Norskog goks...@gmail.com

Re: MMapDirectory failed to map a 23G compound index segment

2011-09-09 Thread Lance Norskog
I remember now: by memory-mapping one block of address space that big, the garbage collector has problems working around it. If the OOM is repeatable, you could try watching the app with jconsole and watch the memory spaces. Lance On Thu, Sep 8, 2011 at 8:58 PM, Lance Norskog wrote: > Do

Re: Generating large datasets for Solr proof-of-concept

2011-09-15 Thread Lance Norskog
http://aws.amazon.com/datasets DBPedia might be the easiest to work with: http://aws.amazon.com/datasets/2319 Amazon has a lot of these things. Infochimps.com is a marketplace for free & pay versions. Lance On Thu, Sep 15, 2011 at 6:55 PM, Pulkit Singhal wrote: > Ah missing } doh! &g

Re: ClassCastException: SmartChineseWordTokenFilterFactory to TokenizerFactory

2011-09-15 Thread Lance Norskog
r > > SEVERE: java.lang.ClassCastException: > org.apache.solr.analysis.SmartChineseWordTokenFilterFactory cannot be cast > to org.apache.solr.analysis.TokenizerFactory > > > Any thought? -- Lance Norskog goks...@gmail.com

Re: strange performance issue with many shards on one server

2011-09-28 Thread Lance Norskog
server, we INCREASED speed by > REDUCING the number of cores/threads each query was allowed to use (making > sense of our customer investment) > maybe you can get a similar effect by reducing the number of pieces your > distributed search has to merge > > my 2 eurocents > > federico > -- Lance Norskog goks...@gmail.com

Re: Bug in DIH?

2011-10-01 Thread Lance Norskog
aImporter.doFullImport(DataImporter.java:372) >at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:440) >at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:421) > -- Lance Norskog goks...@gmail.com

Re: SOLR HttpCache Qtime

2011-10-04 Thread Lance Norskog
> indeed, be from the query that filled in the HTTP cache. But what are you doing with that information that you want to "correct" it? > > That said, I have no clue how you'd attempt to do this. > > Best > Erick > > On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Han wrote: > >> Hi, >> >> Is there any way to get the correct QTime when we use HTTP caching? I think Solr is also caching the QTime, so it gives the same QTime in the response however long the query takes to finish. How can I set QTime correctly from Solr when HTTP caching is on? >> >> thanks -- Lance Norskog goks...@gmail.com

Re: Xsl for query output

2011-10-13 Thread Lance Norskog
generally easiest to use the solr/example 'java -jar start.jar' example to test out features. It is easy to break configuration linkages. Lance On Thu, Oct 13, 2011 at 12:42 PM, Jeremy Cunningham < jeremy.cunningham.h...@statefarm.com> wrote: > I am new to solr and not a web deve

Re: DIH doesn't handle bound namespaces?

2011-11-04 Thread Lance Norskog
ases are covered..." > > ...i thought there was a DIH FAQ about this, but if not there really > should be. > > > -Hoss > -- Lance Norskog goks...@gmail.com

Re: Stream still in memory after tika exception? Possible memoryleak?

2011-11-06 Thread Lance Norskog
Yes, please open a JIRA for this, with as much info as possible. Lance On Thu, Nov 3, 2011 at 9:48 AM, P Williams wrote: > Hi All, > > I'm experiencing a similar problem to the other's in the thread. > > I've recently upgraded from apache-solr-4.0-2011-06-14_08-

Re: [Profiling] How to profile/tune Solr server

2011-11-06 Thread Lance Norskog
> I am a solr newbie. I find solr documents easy to access and use, which is really good thing. While my problem is I did not find a solr home grown profiling/monitoring tool. > > I set up the server as a multi-core server, each core has approximately 2GB index. And I need to update solr and re-generate index in a real time manner (In java code, using SolrJ). Sometimes the update operation is slow. And it is expected that in a year, the index size may increase to 4GB. And I need to do something to prevent performance downgrade. > > Is there any solr official monitoring & profiling tool for this? > > Spark -- Lance Norskog goks...@gmail.com

Re: is there a way using 1.4 index at 4.0 trunk?

2011-11-30 Thread Lance Norskog
-4-index-at-4-0-trunk-tp3550430p3550430.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: UUID field changed when document is updated

2011-12-07 Thread Lance Norskog
ks for sharing. I'm not sure it > does exactly what I want though. I think it is more for checking if the two > docs are the same, which for my purposes, the url works fine for. > > I think I've sort of come to realise that generating a uuid from the url > might be the way to go. There is a chance of getting the same uuid from > different urls, but it's only 1 in 2^128, so it's basically non-existent. > > Thanks again, > Blaise -- Lance Norskog goks...@gmail.com

Re: UUID field changed when document is updated

2011-12-07 Thread Lance Norskog
http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/UniqueKey On Wed, Dec 7, 2011 at 5:04 PM, Lance Norskog wrote: > Yes, the SignatureUpdateProcessor is what you want. The 128-bit hash is > exactly what you want to use in this situation. You will never get the
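The idea discussed in this thread, deriving a stable identifier from the URL so that re-indexing the same page does not change its id, can be sketched on the client side with a name-based UUID. This is an illustration only and is not the SignatureUpdateProcessor; the function name and sample URLs are made up.

```python
import uuid


def id_for_url(url: str) -> str:
    """Derive a deterministic, name-based (SHA-1) UUID from a URL."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, url))


first = id_for_url("http://example.com/page")
again = id_for_url("http://example.com/page")
other = id_for_url("http://example.com/other")
```

The same URL always yields the same id, so an update replaces the existing document instead of creating a new one with a fresh random UUID.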

Re: Identifying common text in documents

2011-12-24 Thread Lance Norskog
sections of matching or nearly matching text > in documents. Does anyone have any experience in this area that they would be > willing to share? > Thanks, > Mike -- Lance Norskog goks...@gmail.com

Re: solr keep old docs

2011-12-28 Thread Lance Norskog
>> Tried google but I couldn't find a solution there, although many people have encountered this problem. > It can definitely be done by overriding o.a.s.update.DirectUpdateHandler2.addDoc(AddUpdateCommand), but I suggest starting by implementing your own http://wiki.apache.org/solr/UpdateRequestProcessor - search for the PK and bypass the chain call if it's found. Then, if you meet performance issues querying your PKs one by one (but only after that), you can batch your searches; there are a couple of optimization techniques for huge disjunction queries like PK:(2 OR 4 OR 5 OR 6). >> I am starting to think that I must query the index to check whether a doc to be added is already in the index, and not add it to the array, but I have so many docs that I am afraid it's not a good solution. >> Best Regards >> Alexander Aristov > -- > Sincerely yours > Mikhail Khludnev > Lucid Certified > Apache Lucene/Solr Developer > Grid Dynamics -- Lance Norskog goks...@gmail.com
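The batching idea in the quoted reply, checking many primary keys per query with a disjunction such as PK:(2 OR 4 OR 5 OR 6), could look roughly like this on the client side. This is a sketch under assumptions: the field name PK is taken from the example in the message, the helper names are made up, the keys are assumed to need no query escaping, and actually sending the query to Solr is left out.

```python
def pk_batches(pks, batch_size):
    """Yield one disjunction query string per batch of primary keys."""
    pks = list(pks)
    for i in range(0, len(pks), batch_size):
        batch = pks[i:i + batch_size]
        yield "PK:(" + " OR ".join(str(pk) for pk in batch) + ")"


def docs_to_add(docs, existing_pks):
    """Keep only documents whose key was not found in the index."""
    return [doc for doc in docs if doc["PK"] not in existing_pks]


queries = list(pk_batches([2, 4, 5, 6, 7], batch_size=4))

# Suppose the batched lookups reported that keys 4 and 6 are already indexed.
incoming = [{"PK": 2}, {"PK": 4}, {"PK": 6}, {"PK": 7}]
new_docs = docs_to_add(incoming, existing_pks={4, 6})
```

Each batch costs one round trip instead of one per document; the batch size has to stay below whatever clause limit the server is configured with.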

Re: How to run the solr dedup for the document which match 80% or match almost.

2011-12-28 Thread Lance Norskog
> > Regards, > > > Vibhor > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/How-to-run-the-solr-dedup-for-the-document-which-match-80-or-match-almost-tp3614239p3615787.html > Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com

Re: Migration from Solr 1.4 to Solr 3.5

2011-12-28 Thread Lance Norskog
, introduce better memory management and a lot more. For your production upgrade you should translate your local changes into a fresh 3.5 instance. Lance On Wed, Dec 28, 2011 at 5:23 AM, Bhavnik Gajjar wrote: > Thanks community! That helps! > > To check practically, I have now setup So

Re: Solr Distributed Search vs Hadoop

2011-12-28 Thread Lance Norskog
r seems to handle 100g-200g fine on modern hardware. Lance On Fri, Dec 23, 2011 at 1:54 AM, Nick Vincent wrote: > For data of this size you may want to look at something like Apache > Cassandra, which is made specifically to handle data at this kind of > scale across many machines. > >

Re: Filtered search for subset of ids

2012-01-06 Thread Lance Norskog
om/Filtered-search-for-subset-of-ids-tp502245p3637150.html > Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com

Re: GermanAnalyzer

2012-01-14 Thread Lance Norskog
LUCENE_23 > >> >> In Lucene I use an untweaked org.apache.lucene.analysis.de.GermanAnalyzer. >> >> What is an equivalent fieldType definition in Solr 3.5? > >     >       >     > > -- > lucidimagination.com -- Lance Norskog goks...@gmail.com

Re: Solr Cloud Indexing

2012-01-17 Thread Lance Norskog
ich service to go >> with for solr Cloud Indexing ? >> >> Any good and tried services? >> >> Regards >> Sujatha -- Lance Norskog goks...@gmail.com

Re: Trying to understand SOLR memory requirements

2012-01-17 Thread Lance Norskog
rch.suggest.fst.FSTLookup.build(FSTLookup.java:179) >>> > at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70) >>> >  at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133) >>> > at org.apache.solr.spelling.suggest.Suggester.reload(Suggester.java:153) >>> >  at >>> > >>> org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.newSearcher(SpellCheckComponent.java:675) >>> > at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1181) >>> >  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >>> > at java.util.concurrent.FutureTask.run(FutureTask.java:138) >>> >  at >>> > >>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >>> > at >>> > >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >>> >  at java.lang.Thread.run(Thread.java:662) >>> > >>> > Jan 16, 2012 4:06:15 PM org.apache.solr.core.SolrCore registerSearcher >>> > INFO: [places] Registered new searcher Searcher@34b0ede5 main >>> > >>> > >>> > >>> > Basically this means once I've run a full-import, I cannot exit the SOLR >>> > process because I receive this error no matter what when I restart the >>> > process. I've tried with different -Xmx arguments, and I'm really at a >>> loss >>> > at this point. Is there any guideline to how much RAM I need? I've got >>> 8GB >>> > on this machine, although that could be increased if necessary. However, >>> I >>> > can't understand why it would need so much memory. Could I have something >>> > configured incorrectly? I've been over the configs several times, trying >>> to >>> > get them down to the bare minimum. >>> > >>> > Thanks for any assistance! >>> > >>> > Dave >>> >>> >>> >>> -- >>> lucidimagination.com >>> > > > > -- > lucidimagination.com -- Lance Norskog goks...@gmail.com

Re: Indexing HTML files in SOLR

2010-06-16 Thread Lance Norskog
n3.nabble.com/Indexing-HTML-files-in-SOLR-tp896530p896530.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: MappingCharFilterFactory equivalent for use after tokenizer?

2010-06-18 Thread Lance Norskog
>>> Is there a token filter which do the same job as >>> MappingCharFilterFactory but after tokenizer, reading the >>> same config file? >> >> No, closest thing can be PatternReplaceFilterFactory. >> >> http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternReplaceFilterFactory.html >> >> >> > > -- Lance Norskog goks...@gmail.com

Re: federated / meta search

2010-06-18 Thread Lance Norskog
index. With two indexes from two sources, the terms in the documents will not have the same "fingerprint". Relevance scores from one shard will not match the meaning of a document's score in the other shard. There is a project to make this work in Solr, but it is not nearly finished.

Re: OOM on sorting on dynamic fields

2010-06-18 Thread Lance Norskog
before starting a new > development, we want to be sure that we are not doing anything wrong > in the solr configuration or in the index generation. > > Any help would be appreciated. > Regards, > Matteo > -- Lance Norskog goks...@gmail.com

Re: customize the search algorithm of solr

2010-06-18 Thread Lance Norskog
and still allow me to use all > the rest of the features of solr. > > > -- Lance Norskog goks...@gmail.com

Re: solr indexing takes a long time and is not reponsive to abort command

2010-06-18 Thread Lance Norskog
t to abort the process doesn’t really work. Does > anyone know what’s happening here? Thanks! > > Wen > -- Lance Norskog goks...@gmail.com

Re: SolrQuery and escaping special characters

2010-06-18 Thread Lance Norskog
id doesn't do "query parser > escaping" ... mainly because it has no way of knowing which query parser > you are using. > > > -Hoss > > -- Lance Norskog goks...@gmail.com

Re: federated / meta search

2010-06-19 Thread Lance Norskog
https://issues.apache.org/jira/browse/LUCENE-1812 On Fri, Jun 18, 2010 at 7:26 PM, Otis Gospodnetic wrote: > Lance, which project in Solr are you referring to? > > > Thanks, > > Otis > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem

Re: Indexing HTML files in SOLR

2010-06-19 Thread Lance Norskog
Ah! You need a SolrJ program that uses Tika to parse the files and upload the text. I think there is such a program already but do not know where it is. Lance On Thu, Jun 17, 2010 at 6:13 AM, seesiddharth wrote: > > Thank you so much for the reply...The link suggested by you is helpf

Re: Mr Lance : customize the search algorithm of solr

2010-06-22 Thread Lance Norskog
Solr depends on Lucene's implementation of queries and how it returns document hits. I can't help you architect these changes. On Mon, Jun 21, 2010 at 7:47 AM, sarfaraz masood wrote: > Mr Lance > > Thanks > a lot for ur reply.. I am a novice a solr / lucene. but

Re: OOM on sorting on dynamic fields

2010-06-22 Thread Lance Norskog
No, this is basic to how Lucene works. You will need larger EC2 instances. On Mon, Jun 21, 2010 at 2:08 AM, Matteo Fiandesio wrote: > Compiling solr with lucene 2.9.3 instead of 2.9.1 will solve this issue? > Regards, > Matteo > > On 19 June 2010 02:28, Lance Norskog wrote

Re: Field missing when use distributed search + dismax

2010-06-22 Thread Lance Norskog
he result only have "ID". The field "type" > disappeared. I need that "type" to know what the "ID" refer to. Why solr > "eat" my "type"? > > > Thanks. > Regards. > Scott > -- Lance Norskog goks...@gmail.com

Re: Setting up Eclipse with merged Lucene Solr source tree

2010-06-23 Thread Lance Norskog
blems with this, and git is a lifesaver for playing with patches etc. Lance On Wed, Jun 23, 2010 at 8:03 AM, Erick Erickson wrote: > Did you see this page? > http://wiki.apache.org/solr/HowToContribute > > Especially down

Re: DIH and dynamicField

2010-06-23 Thread Lance Norskog
: > > solrconfig.xml > > > data-config.xml > > > Hope this helps. > > - Robert Zotter > -- > View this message in context: > http://lucene.472066.n3.nabble.com/DIH-and-dynamicField-tp917823p918189.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

unknown handler dataimport

2010-06-28 Thread Lance Hill
Hi, I am trying to get db indexing up and running, but I am having trouble getting it working. In the solrconfig.xml file, I added data-config.xml I defined a couple of fields in schema.xml media_id is defined as the unique

Indexing a database

2010-06-29 Thread Lance Hill
How do I know if solr is actually loading my database driver properly? I added the mysql connector to the solr/lib directory, I added to the solrconfig.xml just to be sure it would find the connector. When I start the application, I see it loaded my dataImporter data config, but when I try to acce

RE: Indexing a database

2010-06-29 Thread Lance Hill
Yes, it is registered exactly as you indicated in solrconfig and when the application starts up, I can see a message indicating the data-config is loaded successfully. So although the data config is loaded successfully, I cannot seem to access the dataimport handler. Regards, L. Hill -Origin

Re: OOM on uninvert field request

2010-06-29 Thread Lance Norskog
at > org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839) >        at > org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:250) >        at > org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283) >        at > org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166) > -- Lance Norskog goks...@gmail.com

Re: REST calls

2010-06-29 Thread Lance Norskog
is used. > > Am I doing something wrong or is Solr not truly completely RESTful? > > thanks, > > > Jason > -- Lance Norskog goks...@gmail.com

Re: one to many denormalization approach

2010-06-29 Thread Lance Norskog
Solr supports multi-valued fields. You can add various skills to one field and it will store all of the values in order. You can search on any of the values. For numbers, you might want a subtype_value convention: skillYears1_9 as one of the values for the skillYears field. Lance On Mon, Jun 28
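A rough sketch of what the multi-valued approach above could look like when building the update XML. The field names `skill` and `skillYears`, the helper `build_add_doc`, and the exact `subtype_value` spelling (`java_5`) are illustrative assumptions, not taken from the original thread; both fields would have to be declared `multiValued="true"` in schema.xml.

```python
import xml.etree.ElementTree as ET


def build_add_doc(doc_id, skills, skill_years):
    """Build a Solr <add> message that repeats a field once per value.

    skills      -- list of skill names (hypothetical 'skill' field)
    skill_years -- dict of skill name -> years, encoded as 'subtype_value'
                   in a hypothetical multi-valued 'skillYears' field
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id

    # One <field> element per value; Solr keeps the values in this order.
    for skill in skills:
        ET.SubElement(doc, "field", name="skill").text = skill

    # subtype_value convention: the subtype and the number share one token.
    for skill, years in skill_years.items():
        ET.SubElement(doc, "field", name="skillYears").text = f"{skill}_{years}"

    return ET.tostring(add, encoding="unicode")
```

Searching `skillYears:java_5` would then match only documents carrying that exact pair, which is the point of folding the subtype into the value.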

Re: unknown handler dataimport

2010-06-29 Thread Lance Norskog
The 'bind error' means that you already had another Solr running. Use 'jps' to find all of the processes called 'start.jar' and kill them. Lance On Mon, Jun 28, 2010 at 2:36 PM, Lance Hill wrote: > Hi, > > > > I am trying to get db indexing up and
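For anyone scripting the `jps` check above, a minimal sketch follows. It assumes `jps` prints one line per JVM as a process id followed by the main class or JAR name, and the helper name `find_start_jar_pids` is made up for this example. It only finds the process ids; killing them is left to the operator.

```python
def find_start_jar_pids(jps_output):
    """Return the pids of JVMs whose jps name ends with 'start.jar'."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split(None, 1)  # "<pid> <main class or jar>"
        if len(parts) == 2 and parts[1].strip().endswith("start.jar"):
            pids.append(int(parts[0]))
    return pids


def running_solr_pids():
    """Run jps (must be on PATH) and filter its output. Not called on import."""
    import subprocess

    result = subprocess.run(["jps", "-l"], capture_output=True, text=True, check=True)
    return find_start_jar_pids(result.stdout)
```

If more than one pid comes back, a leftover Solr is probably still holding the port, which would explain the bind error.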

Re: Faceted search outofmemory

2010-06-29 Thread Lance Norskog
this but I could not find the answer. >> How can we know the required memory when facets are used so that I try to >> scale my server/index correctly to handle it. >> >> Thanks >> >> Olivier >> > -- Lance Norskog goks...@gmail.com

Re: Very basic questions: Indexing text - working, but slow!

2010-06-29 Thread Lance Norskog
ible >> >> until you force the SOLR reader to reopen. >> >> >> >> HTH >> >> Erick >> >> >> >> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam wrote: >> >> >> >>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: >> >>> >> >>>>> 1) I can get my docs in the index, but when I search, it >> >>>>> returns the entire document. I'd love to have it only >> >>>>> return the line (or two) around the search term. >> >>>> >> >>>> Solr can generate Google-like snippets as you describe. >> >>>> http://wiki.apache.org/solr/HighlightingParameters >> >>> >> >>> Here's how I commit my documents: >> >>> >> >>> J=0; >> >>> for i in `find . -name \*.txt`; do >> >>>      (( J++ )) >> >>>      curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" >> >>> -F "myfi...@$i"; >> >>> done; >> >>> >> >>> echo "- Committing" >> >>> curl "http://localhost:8983/solr/update/extract?commit=true" >> >>> >> >>> >> >>> Then, I try to query using >> >>> >> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing >> >>> but I only get back the document ID rather than the snippet: >> >>> >> >>> >> >>> 0.05030759 >> >>> >> >>> text/plain >> >>> >> >>> doc16 >> >>> >> >>> >> >>> I'm using the schema.xml from the "lucid imagination: Indexing text and >> >>> html files" tutorial. >> >>> >> >>> >> >>> >> >>> -Pete >> >>> >> > >> >> > -- Lance Norskog goks...@gmail.com

Re: Cache hits exposed by API

2010-06-29 Thread Lance Norskog
com/Cache-hits-exposed-by-API-tp930602p930696.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: Unbuffered Exception while setting permissions

2010-06-30 Thread Lance Norskog
e.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487) >>>> at >>>> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) >>>> at >>>> org.apache.commons.httpclient.Ht

Re: REST calls

2010-06-30 Thread Lance Norskog
how efficient and yet simple > SOLR's (and Lucene's) query and response language (incl. response > formats) is. Some things seem complex/difficult at first (like dismax or > function queries) but turn out to be simple/easy to use considering the > complexity of the problems they solve. > > Chantal > > -- Lance Norskog goks...@gmail.com

Re: Multiple Solr servers and a shared index vs master+slaves

2010-06-30 Thread Lance Norskog
store those snapshots, so we'd be pulling it over the wire only to write it > right next to the original index. If we didn't have these HA clustering > mechanisms available already, then I'm sure I'd be much more willing to look > at a Solr master+slave architecture. But since we do, it seems like I'm a > little bit hamstrung to use Solr's mechanisms anyway. So, that's my > scenario, comments welcome. :) > > -dKt > > > > -- Lance Norskog goks...@gmail.com

Re: OOM on uninvert field request

2010-06-30 Thread Lance Norskog
edField faceting, the fieldType won't matter much at > all for the space it takes up. > > The key here is that it looks like the number of unique terms in these > fields is low - you would probably do much better with > facet.method=enum (which iterates over terms rather than documents). > > -Yonik > http://www.lucidimagination.com > -- Lance Norskog goks...@gmail.com
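A small sketch of a request that uses the `facet.method=enum` suggestion quoted above. The base URL, the field name `type`, and the helper `facet_query_url` are placeholders assumed for illustration; `facet`, `facet.field`, `facet.method`, and `rows` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_fields, method="enum"):
    """Build a /select URL that facets on the given fields.

    method="enum" asks Solr to iterate over terms rather than documents,
    which tends to suit fields with few unique values.
    """
    params = [
        ("q", query),
        ("rows", "0"),  # facet counts only, no document rows
        ("facet", "true"),
        ("facet.method", method),
    ]
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"
```

Whether this avoids the OOM depends on how many unique terms the faceted fields really hold, so it is worth checking the term counts first.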

Re: REST calls

2010-06-30 Thread Lance Norskog
I've looked at the problem. It's fairly involved. It probably would take several iterations. (But not as many as field collapsing :) On Wed, Jun 30, 2010 at 2:11 PM, Yonik Seeley wrote: > On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog wrote: >>  Apparently this is not ReStFuL
