Re: Using two Solr documents to represent one logical document/file

2009-09-28 Thread Peter Ledbrook
Matt Weber-2 wrote: Check out the field collapsing patch: http://wiki.apache.org/solr/FieldCollapsing https://issues.apache.org/jira/browse/SOLR-236 That looks like just the ticket. Thanks for the quick response. Peter

Re: Showcase: Facetted Search for Wine using Solr

2009-09-28 Thread Marian Steinbach
On Sat, Sep 26, 2009 at 3:22 AM, Lance Norskog goks...@gmail.com wrote: Have you seen this? It is another Solr/Typo3 integration project. http://forge.typo3.org/projects/show/extension-solr Would you consider open-sourcing your Solr/Typo3 integration? Hi Lance! I wasn't aware of that

Question on trying to Index and XML document...

2009-09-28 Thread Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340
With a basically default install of the trunk version of Solr 1.4, when trying to index an XML file, it appears that the XML tags get stripped when indexed. If the tag names and their frequencies are important to me for search purposes, could someone tell me what my options are to not

Re: Only one usage of each socket address error

2009-09-28 Thread Steinar Asbjørnsen
I just posted to the SolrNet group since I have the exact same(?) problem. I hope I'm not being rude by posting here as well (the SolrNet group doesn't seem as active as this mailing list). The problem occurs when I'm running a self-made incremental feed of an index. My post: [snip]

q.alt matching no documents

2009-09-28 Thread Øystein F. Steimler
Hi, list! I want to add a q.alt matching no documents in my dismax handler to serve a consistent reply to a client application. Without a q.alt, a missing q from the client will cause a missing query string error. With a q.alt matching no documents I will be able to respond with an empty

Re: Only one usage of each socket address error

2009-09-28 Thread Erik Hatcher
There's nothing in that output that indicates something we can help with over in solr-user land. What is the call you're making to Solr? Did Solr log anything anomalous? Erik On Sep 28, 2009, at 4:41 AM, Steinar Asbjørnsen wrote: I just posted to the SolrNet group since I have

Re: Measuring timing with debugQuery=true

2009-09-28 Thread Yonik Seeley
On Mon, Sep 28, 2009 at 7:51 AM, Rahul R rahul.s...@gmail.com wrote: Yonik, I understand that the network can be a bottleneck but I am pretty sure that it is not. I am operating on a 100 Mbps intranet... How do I ensure that stored fields are cached by the OS? Only the Solr caches within

Re: q.alt matching no documents

2009-09-28 Thread John Wang
You can actually write a NoHitsQuery implementation; it is rather simple. If you like, I can create an issue and attach a patch. -John On Mon, Sep 28, 2009 at 5:17 AM, Øystein F. Steimler oyst...@easyconnect.no wrote: Hi, list! I want to add a q.alt matching no documents in my dismax handler

Re: Thread Blocking Radomly

2009-09-28 Thread Jeff Newburn
Further interestingness with replication on the thread blocking issue. 1 core seems to take a VERY long time to replicate. This duration is close to 5 minutes when cores 2x its size take like 100 seconds to pull down. The searcher is also taking about 4-5 minutes to warm when an almost

Re: Showcase: Facetted Search for Wine using Solr

2009-09-28 Thread Olivier Dobberkau
Marian Steinbach wrote: On Sat, Sep 26, 2009 at 3:22 AM, Lance Norskog goks...@gmail.com wrote: Have you seen this? It is another Solr/Typo3 integration project. http://forge.typo3.org/projects/show/extension-solr Would you consider open-sourcing your Solr/Typo3 integration? Hi

Re: q.alt matching no documents

2009-09-28 Thread Erik Hatcher
Note that whatever query you use will be cached in the query cache. -*:* is likely the best choice. Another alternative, if you've got dynamic fields wired in, is something like _nonexistent_field_s:dummy_value Erik On Sep 28, 2009, at 5:17 AM, Øystein F. Steimler wrote: Hi,
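As a rough illustration of the suggestion above, here is a minimal Python sketch of how a client might build a dismax request that falls back to a match-nothing q.alt when the user supplies no q. The host, port, and handler path are assumptions for illustration only, and in practice q.alt would more likely be set once as a default on the dismax handler in solrconfig.xml rather than sent with every request.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, user_query=None, fallback="-*:*"):
    """Build a dismax select URL.

    q.alt is only consulted by dismax when q is missing, so sending a
    pure-negative fallback such as -*:* should yield an empty result
    set instead of a missing-query error.
    """
    params = {"defType": "dismax", "q.alt": fallback}
    if user_query:
        # Only add q when the client actually supplied one.
        params["q"] = user_query
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance; adjust to your own deployment.
print(build_dismax_url("http://localhost:8983/solr/select"))
print(build_dismax_url("http://localhost:8983/solr/select", "riesling"))
```

Whichever fallback is chosen, it will occupy one entry in the query cache, as noted in the message above.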

Regular expression not working

2009-09-28 Thread Siddhartha Pahade
Hi guys, My search result is "Gilmore Girls". If I search on "Gilmore", it gives me the result "Gilmore Girls" in the output as desired. However, if I search on the string "gilmore*" or "gilm", it does not work, whereas we want it to work. Any help highly appreciated. Thanks!

Re: Regular expression not working

2009-09-28 Thread Avlesh Singh
Such questions are better answered on the user mailing list. You don't need to post them on the dev list. What matches an incoming query is largely a function of your field type definition and the way you analyze your field data at query time and index time. Copy-paste your field and its type

Re: Showcase: Facetted Search for Wine using Solr

2009-09-28 Thread Marian Steinbach
On Mon, Sep 28, 2009 at 4:46 PM, Olivier Dobberkau olivier.dobber...@dkd.de wrote: hi marian. our extension will be able to do see also once we have set up the indexing queue for the typo3 backend. we have a concept called typo3 extensions connectors so that you will be able to add index

Re: Writing optimized index to different storage?

2009-09-28 Thread Jason Rutherglen
Hmm... Interesting question, not that I know of. The only way one could do this would be to intercept the newly optimized files via a FileSwitchDirectory-like implementation that knows which new files are optimized and should, underneath, go to a different physical path. On Mon, Sep 28, 2009 at

Re: download pre-release nightly solr 1.4

2009-09-28 Thread michael8
markrmiller wrote: michael8 wrote: markrmiller wrote: michael8 wrote: Hi, I know Solr 1.4 is going to be released any day now pending the Lucene 2.9 release. Is there anywhere one can download a pre-release nightly build of Solr 1.4 just for getting familiar with new

Re: Parallel requests to Tomcat

2009-09-28 Thread Michael
Great news for Solr -- a third party library that I'm calling is serialized. Silly me, I made a mistake when ruling out that library as the culprit earlier. Solr itself scales just great as I add threads. JProfiler helped me find the problem. Sorry for the false alarm, and thanks for the

Re: Regular expression not working

2009-09-28 Thread Lance Norskog
Wildcards don't really get processed like other queries - Gilmore* will work. On Mon, Sep 28, 2009 at 8:30 AM, Avlesh Singh avl...@gmail.com wrote: Such questions are better answered on the user mailing list. You don't need to post them on the dev list. What matches an incoming query is

Re: Writing optimized index to different storage?

2009-09-28 Thread Lance Norskog
The optimize operation happens in place. I've been told that if you set mergeFactor=2 when indexing, it will be slower but you will always have a mostly optimized index. On Mon, Sep 28, 2009 at 10:22 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Hmm... Interesting question, not that I

Re: Limit number of docs that can be indexed (security)

2009-09-28 Thread Valdir Salgueiro
Israel, thanks for your comments. The problem with that alternative is that it works only if the search application is in our server (and in that case, of course, the user doesn't have access to any config file). But more often than not the application is installed on the customer's network, thus

Re: Regular expression not working

2009-09-28 Thread Siddhartha Pahade
Thanks for the reply. I want to make gilmore* work... Somebody told me you can make attributes case-insensitive while building an index... I am trying to research it... Do you have any pointers? Thanks... On Mon, Sep 28, 2009 at 2:29 PM, Lance Norskog goks...@gmail.com wrote: Wildcards don't

Re: Writing optimized index to different storage?

2009-09-28 Thread Otis Gospodnetic
That's right. mergeFactor=1 is an even more extreme case. However, with the new per-segment readers, having an optimized index is no longer the best index state to go for in some cases. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop,

Re: Solr and Garbage Collection

2009-09-28 Thread Jonathan Ariel
Ok... good news! Upgrading to the newest version of JVM 6 (update 6) seems to solve this ugly bug. With the upgraded JVM I could run the solr servers for more than 12 hours on the production environment with the GC mentioned in the previous e-mails. The results are really amazing. The time spent

Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller
Do you have your GC logs? Are you still seeing major collections? Where is the time spent? Hard to say without some of that info. The goal of the low pause collector is to finish collecting before the tenured space is filled - if it doesn't, a standard major collection occurs. The collector

alphanumeric queries using LuceneQParser

2009-09-28 Thread Tarun Jain
Hi, I have created an index where the fields have been indexed with omitNorms=true omitTermFreqAndPositions=true to improve indexing performance. One of the side effects of this is that some of the searches with alphanumeric words are not working correctly. Example.. Below is the debugQuery

Re: Regular expression not working

2009-09-28 Thread Lance Norskog
You would have to index Gilmore and gilmore. You could make a separate field type which does not do upper-lower case transformation. On Mon, Sep 28, 2009 at 11:49 AM, Siddhartha Pahade pahade@gmail.com wrote: Thanks for the reply. I want to make gilmore* work... somebody told me you can make
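A different way to approach the same mismatch, sketched here in Python, is to normalize the prefix on the client before sending it. This assumes the field is lowercased at index time and that wildcard terms bypass query-time analysis, as the earlier reply in this thread suggests; the field name "title" is hypothetical, and the helper does not escape other query-syntax characters.

```python
def prefix_query(field, user_input):
    """Turn raw user input into a lowercase prefix query for one field.

    Wildcard terms are assumed to skip analysis, so the lowercasing the
    analyzer would normally apply has to happen here instead.
    """
    # Drop surrounding whitespace and any trailing '*' the user typed.
    term = user_input.strip().rstrip("*").lower()
    if not term:
        raise ValueError("prefix must contain at least one character")
    return f"{field}:{term}*"


# "title" is a hypothetical field name used only for illustration.
print(prefix_query("title", "Gilmore*"))
print(prefix_query("title", "gilm"))
```

If the field is instead indexed without lowercasing, this normalization would do the wrong thing, which is why the field type definition matters here.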

Re: alphanumeric queries using LuceneQParser

2009-09-28 Thread Yonik Seeley
On Mon, Sep 28, 2009 at 3:54 PM, Tarun Jain tjai...@yahoo.com wrote: Hi, I have created an index where the fields have been indexed with omitNorms=true omitTermFreqAndPositions=true to improve indexing performance. One of the side effects of this is that some of the searches with

Re: Use cases for ReplicationHandler's backup facility?

2009-09-28 Thread Chris Harris
2009/9/24 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com: Yes, the only reason to take a backup should be for restoration/archival They should contain all the files required for the latest commit point. Ok, I think I get it now. I assumed all the files required for the latest commit point

Question on Access or viewing TermFrequency Vector via SOLR.

2009-09-28 Thread Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340
Is there a Solr query that can access or view the term frequencies for the various documents discovered, or is the only way to access this information programmatically? If so, could someone share an example and maybe a link for information on how to do this? Some sample queries? Thank you in

Re: Solr and Garbage Collection

2009-09-28 Thread Jonathan Ariel
How do you track major collections? Even better, how do you log your GC behavior with details? Right now I just log total time spent on collections, but I don't really know on which collections. Regarding application performance with the ConcMarkSweepGC, I think I didn't experience any impact for now.

Re: Question on Access or viewing TermFrequency Vector via SOLR.

2009-09-28 Thread Mark Miller
Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340 wrote: Is there a Solr query that can access or view the term frequencies for the various documents discovered, or is the only way to access this information programmatically? If so, could someone share an example and maybe a link for information on

Re: Writing optimized index to different storage?

2009-09-28 Thread Phillip Farber
Thanks to all for thinking about this question. Otis: could you say a bit more about per segment readers. This is new to me. I gather that there is a way to specify that the number of readers should correspond (or automatically correspond) to the number of segments? I suppose this gives

RE: Question on Access or viewing TermFrequency Vector via SOLR.

2009-09-28 Thread Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340
Mark, Thanks. I think this may be partially what I need. Basically, what I'm trying to figure out is the following: if someone enters a keyword, say Apple, I would like to find all the documents that have the word apple in them, and then for each document, the number of times it showed up in each
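One possible shape for such a request, sketched in Python, uses Solr's TermVectorComponent parameters tv and tv.tf to ask for per-document term frequencies alongside the search results. Several things here are assumptions rather than anything stated in the thread: the host and port, the "tvrh" handler name (borrowed from the example solrconfig.xml), the "text" field, and that this field was indexed with termVectors="true".

```python
from urllib.parse import urlencode


def term_vector_url(base_url, keyword, field="text"):
    """Build a request asking for matching docs plus term frequencies.

    The response would still need to be filtered down to the one term
    of interest, since term vectors list every term in the field.
    """
    params = [
        ("q", f"{field}:{keyword.lower()}"),  # find docs containing the keyword
        ("qt", "tvrh"),      # assumed handler with the term vector component
        ("tv", "true"),      # enable the term vector component
        ("tv.tf", "true"),   # include per-document term frequency
        ("fl", "id"),        # keep the stored-field part of the response small
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance.
print(term_vector_url("http://localhost:8983/solr/select", "Apple"))
```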

Re: Problem changing the default MergePolicy/Scheduler

2009-09-28 Thread Jibo John
On Sep 27, 2009, at 9:42 PM, Shalin Shekhar Mangar wrote: On Mon, Sep 28, 2009 at 2:59 AM, Jibo John jiboj...@mac.com wrote: Additionally, I get the same exception even if I declare the mergePolicy in the mainIndex. mainIndex mergePolicy

Re: Solr and Garbage Collection

2009-09-28 Thread Otis Gospodnetic
Jonathan, Here is the JVM argument for logging GC activity: -Xloggc:<file> log GC status to a file with time stamps Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From:

Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller
-verbose:gc [GC 325407K->83000K(776768K), 0.2300771 secs] [GC 325816K->83372K(776768K), 0.2454258 secs] [Full GC 267628K->83769K(776768K), 1.8479984 secs] Additional details with: -XX:+PrintGCDetails [GC [DefNew: 64575K->959K(64576K), 0.0457646 secs] 196016K->133633K(261184K), 0.0459067
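To answer the "where is the time spent" question from such output, a small parser can separate minor from full collections. The following Python sketch assumes the standard HotSpot -verbose:gc line format, with "->" between the before and after heap sizes; it deliberately ignores the nested -XX:+PrintGCDetails lines, and the sample text reuses the three figures quoted in the message above.

```python
import re

# Matches simple -verbose:gc lines such as:
#   [GC 325407K->83000K(776768K), 0.2300771 secs]
#   [Full GC 267628K->83769K(776768K), 1.8479984 secs]
GC_LINE = re.compile(
    r"\[(?P<kind>Full GC|GC) (?P<before>\d+)K->(?P<after>\d+)K"
    r"\((?P<total>\d+)K\), (?P<secs>[\d.]+) secs\]"
)

SAMPLE = """\
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
"""


def parse_gc_log(text):
    """Return one dict per collection event found in the log text."""
    events = []
    for m in GC_LINE.finditer(text):
        events.append({
            "full": m.group("kind") == "Full GC",
            "before_kb": int(m.group("before")),
            "after_kb": int(m.group("after")),
            "heap_kb": int(m.group("total")),
            "seconds": float(m.group("secs")),
        })
    return events


def pause_totals(events):
    """Sum pause time separately for minor and full collections."""
    minor = sum(e["seconds"] for e in events if not e["full"])
    full = sum(e["seconds"] for e in events if e["full"])
    return minor, full


events = parse_gc_log(SAMPLE)
minor_secs, full_secs = pause_totals(events)
print(f"{len(events)} collections, minor={minor_secs:.3f}s, full={full_secs:.3f}s")
```

For anything beyond a quick check, a dedicated GC log analyzer is likely a better fit than a hand-rolled regex.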

Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller
Another good option. Here is a comparison of the commands I replied with and this one: http://docs.hp.com/en/5992-5899/ch06s02.html Very similar. Otis Gospodnetic wrote: Jonathan, Here is the JVM argument for logging GC activity: -Xloggc:<file> log GC status to a file with time

FileNotFoundException in Java replication handler backups

2009-09-28 Thread Chris Harris
Thanks to Noble Paul, I think I now understand the Java replication handler's backup feature. It seems to work as expected on a toy index. When trying it out on a copy of my production index (300GB-ish), though, I'm getting FileNotFoundExceptions. These cancel the backup, and delete the

Re: FileNotFoundException in Java replication handler backups

2009-09-28 Thread Mark Miller
Looks like a bug to me. I don't see the commit point being reserved in the backup code - which means it's likely to be removed before it's done being copied. You've got to reserve it using the delete policy to keep it around for the full backup duration. I'd file a JIRA issue. -- - Mark

Re: FileNotFoundException in Java replication handler backups

2009-09-28 Thread Mark Miller
Mark Miller wrote: Looks like a bug to me. I don't see the commit point being reserved in the backup code - which means it's likely to be removed before it's done being copied. You've got to reserve it using the delete policy to keep it around for the full backup duration. I'd file a JIRA issue. You

Re: Highlighting in stemmed or n-grammed fields possible?

2009-09-28 Thread aodhol
Hi Koji et al., You say https://issues.apache.org/jira/browse/SOLR-1268 is an open issue for the ngram highlighting problem, but it seems to refer to something unrelated. Can you/anyone confirm that it is not possible to use highlighting with an ngram tokenizer/filter? Thanks, Aodh.

Re: Solr and Garbage Collection

2009-09-28 Thread Bill Au
One way to track expensive queries is to look at the query time, QTime, in the Solr log. There are a couple of tools for analyzing GC logs: http://www.tagtraum.com/gcviewer.html https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPJMETER They will give you frequency and

Re: Highlighting in stemmed or n-grammed fields possible?

2009-09-28 Thread Koji Sekiguchi
I think I need to explain that further. Lucene's FastVectorHighlighter, which SOLR-1268 points to, is a highlighter that supports n-gram fields. Please see the description for the features etc:

Re: Highlighting in stemmed or n-grammed fields possible?

2009-09-28 Thread aodhol
But it would seem that Lucene has always supported highlighting on NGram fields, as shown by the example here: https://issues.apache.org/jira/browse/LUCENE-1489 When I try to use highlighting with NGramming, none of the text is highlighted, and instead I get a long string in the highlighting