Re: Need to create dyanamic indexies base on different document workspaces

2011-04-21 Thread Chandan Tamrakar
It depends on your application design how you want to structure your index. There is a feature called Solr cores (http://wiki.apache.org/solr/CoreAdmin). You could also keep a single index and add a field to differentiate the items in it. Thanks. On Thu, Apr 21, 2011 at 10:55 AM, Gaurav Shingala
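A minimal sketch of the single-index option, assuming a hypothetical `workspace_id` field on every document and Solr at its default example URL. It only builds the request URL, restricting results to one workspace with a filter query (`fq`):

```python
from urllib.parse import urlencode

# Assumed default URL of the example Solr instance.
SOLR_SELECT = "http://localhost:8983/solr/select"


def workspace_query_url(user_query: str, workspace_id: str) -> str:
    """Build a select URL restricted to one workspace.

    "workspace_id" is a hypothetical field that every document would need
    to carry in the shared index.
    """
    params = {
        "q": user_query,
        # A filter query narrows the result set without changing scores.
        "fq": f'workspace_id:"{workspace_id}"',
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(workspace_query_url("contract", "ws42"))
# http://localhost:8983/solr/select?q=contract&fq=workspace_id%3A%22ws42%22
```

The trade-off versus one core per workspace: a single index is simpler to operate, while separate cores isolate schemas and index sizes.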

Re: old searchers not closing after optimize or replication

2011-04-21 Thread Bernd Fehling
Hi Erik, <deletionPolicy class="solr.SolrDeletionPolicy"> <str name="maxCommitsToKeep">1</str> <str name="maxOptimizedCommitsToKeep">0</str> </deletionPolicy> Because an optimization takes 44 minutes, we optimize only once a day, during the night. I will try with a smaller index on my development system. Best

PECL SOLR PHP extension, JSON output

2011-04-21 Thread roySolr
Hello, I use the PECL PHP extension for Solr. I want my output in JSON, but this is not working: $query->set('wt', 'json'); How do I solve this problem? -- View this message in context: http://lucene.472066.n3.nabble.com/PECL-SOLR-PHP-extension-JSON-output-tp2846092p2846092.html Sent from the

Re: Solr - upgrade from 1.4.1 to 3.1 - finding AbstractSolrTestCase binaries - help please?

2011-04-21 Thread lboutros
There is a test jar in Solr. I added this dependency to my pom.xml: <dependency> <groupId>org.apache.solr</groupId> <artifactId>solr-core</artifactId> <version>3.1-SNAPSHOT</version> <classifier>tests</classifier> <scope>test</scope>

Re: The issue of import data from database using Solr DIH

2011-04-21 Thread Em
Hi Kevin, I think you made OS06Y the uniqueKey, right? So, in entity 1 you specify values for it, but in entity 2 you do so as well. I am not absolutely sure about this, but: It seems like your two entities create two documents and the second will overwrite the first. Have a look at this page:
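A toy Python model of the overwrite Em suspects (not DIH itself): documents are stored in a dict keyed by the uniqueKey, so a second document with the same OS06Y replaces the first instead of merging fields. The field values are borrowed from Kevin's example later in the thread; treating each root entity as emitting its own document follows Em's hedged reading.

```python
from typing import Dict, Iterable


def index_documents(docs: Iterable[dict], unique_key: str = "OS06Y") -> Dict[object, dict]:
    """Toy model of uniqueKey semantics: a later add with the same key
    replaces the earlier document entirely; fields are not merged."""
    index: Dict[object, dict] = {}
    for doc in docs:
        index[doc[unique_key]] = dict(doc)
    return index


# Hypothetical output of two separate root entities for the same key.
from_entity1 = {"OS06Y": 123, "f1": 100, "f2": 200, "f3": 300}
from_entity2 = {"OS06Y": 123, "f4": 100, "f5": 200}

result = index_documents([from_entity1, from_entity2])
print(result[123])  # {'OS06Y': 123, 'f4': 100, 'f5': 200}
```

The fields f1 to f3 are gone because the whole document was replaced, which matches the symptom of "the second will overwrite the first".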

Re: Apache Spam Filter Blocking Messages

2011-04-21 Thread Em
This really helps on the mailing lists. If you send your mails with Thunderbird, be sure to enforce plain-text emails. If not, it will often send HTML mails. Regards, Em Marvin Humphrey wrote: On Thu, Apr 21, 2011 at 12:30:29AM -0400, Trey Grainger wrote:

Re: How to return score without using _val_

2011-04-21 Thread Em
Hi, I agree with Yonik here: I do not understand what you would like to do either. But one additional note from my side: your fqs never influence the score! Of course you can specify the same query twice, once as a filter query and once as a regular query, but I do not see the reason to do

Re: entity name issue

2011-04-21 Thread Em
Hi Tjong, it seems your XML was invalid. Try the following and compare it to your original config: <entity name="e_a" query="select myschema.table_a.aid as id, myschema.table_a.aid as a_aid from myschema.table_a where '${dataimporter.request.clean}' != 'false' and myschema.table_a.aid

RE: The issue of import data from database using Solr DIH

2011-04-21 Thread Kevin Xiang
Thanks Em. Yes, OS06Y is the uniqueKey. Table1 and Table2 are parallel in my example. In the example at http://wiki.apache.org/solr/DIHQuickStart#Index_data_from_multiple_tables_into_Solr the tables don't have a parallel relation. I want to know whether Solr can handle this case.

RE: stemming filter analyzers, any favorites?

2011-04-21 Thread Em
Hi Robert, we often ran into the same issue with stemmers. This is why we created more than one field, each with a different stemmer. It adds some overhead but worked quite well. Regarding your off-topic question: look at the debug output of your searches. Sometimes you configured your

RE: The issue of import data from database using Solr DIH

2011-04-21 Thread Em
Not sure I understood you correctly: you expect that OS06Y stores *two* different performanceIds, one from table1 and the other from table2? I think this may be a problem. If both OS06Y keys are equal, then you can use the syntax as mentioned in the wiki without any problems. You just have to

RE: Need to create dyanamic indexies base on different document workspaces

2011-04-21 Thread Gaurav Shingala
Is it possible to create a Solr core dynamically? In our case we want each workspace to have its own Solr index. Thanks From: chandan.tamra...@nepasoft.com Date: Thu, 21 Apr 2011 11:57:53 +0545 Subject: Re: Need to create dyanamic indexies base on different document workspaces To:

Re: how to abort a running optimize

2011-04-21 Thread Em
Hi Stockii, how did you configure the number of segments in solrconfig.xml? Decrease that number to speed things up automatically. Regards, Em

RE: Need to create dyanamic indexies base on different document workspaces

2011-04-21 Thread Em
Yes, have a look at the wiki page. It explains some configurations and REST API methods to create cores dynamically, and if/how they are persisted. Regards, Em Gaurav Shingala wrote: Is it possible to create a Solr core dynamically? In our case we want each workspace to have its own solr

RE: The issue of import data from database using Solr DIH

2011-04-21 Thread Kevin Xiang
I tried removing the OS06Y field from the second entity, but then importing the second entity failed. Here is an example: Table1: OS06Y=123,f1=100,f2=200,f3=300; OS06Y=456,f1=100,f2=200,f3=300; Table2: OS06Y=123,f4=100,f5=200; OS06Y=456,f4=100; OS06Y=789,f4=100; I want the result:

Re: Need to create dyanamic indexies base on different document workspaces

2011-04-21 Thread Chandan Tamrakar
Actually, you need to put a file named *solr.xml* in the solr.home directory to create Solr cores. You can do that programmatically if you want to make it dynamic based on your logic; please check the Solr CoreAdmin documentation. On Thu, Apr 21, 2011 at 2:52 PM, Gaurav Shingala
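A minimal sketch of the programmatic route, assuming the example multicore setup with the CoreAdmin handler at `/admin/cores`. The `workspace_` naming convention and the instance directory are hypothetical; as far as I know, the instance directory must already contain the core's `conf/` (solrconfig.xml, schema.xml), and created cores survive a restart only if solr.xml has `persistent="true"`.

```python
from urllib.parse import urlencode

# Assumed default CoreAdmin handler path (adminPath in solr.xml).
CORE_ADMIN = "http://localhost:8983/solr/admin/cores"


def create_core_url(workspace: str, instance_dir: str) -> str:
    """Build a CoreAdmin CREATE request for one workspace.

    The "workspace_" name prefix is a hypothetical naming convention.
    instance_dir must already contain the core's conf/ directory.
    """
    params = {
        "action": "CREATE",
        "name": f"workspace_{workspace}",
        "instanceDir": instance_dir,
    }
    return CORE_ADMIN + "?" + urlencode(params)


url = create_core_url("ws42", "workspaces/ws42")
print(url)
# http://localhost:8983/solr/admin/cores?action=CREATE&name=workspace_ws42&instanceDir=workspaces%2Fws42
# To send it: urllib.request.urlopen(url).read()
```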

RE: The issue of import data from database using Solr DIH

2011-04-21 Thread lboutros
What you want is something like a left outer join, isn't it? Something like: select table2.OS06Y, f1,f2,f3,f4,f5 from table2 left outer join table1 on table2.OS06Y = table1.OS06Y where ... Could you prepare a view in your RDBMS? That could be another solution. Ludovic. - Jouve
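To check that the join gives the rows Kevin wants, here is a sketch running Ludovic's query (with the `where ...` dropped and an `ORDER BY` added) against Kevin's sample rows. An in-memory SQLite database stands in for the real RDBMS; each resulting row would become one Solr document if the query were used as the DIH entity query.

```python
import sqlite3
from typing import List, Tuple


def left_join_rows() -> List[Tuple]:
    """Run the left outer join against Kevin's sample rows.

    Missing fields (e.g. f5 for OS06Y=456) are stored as NULL.
    """
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE table1 (OS06Y INTEGER, f1 INTEGER, f2 INTEGER, f3 INTEGER)")
        cur.execute("CREATE TABLE table2 (OS06Y INTEGER, f4 INTEGER, f5 INTEGER)")
        cur.executemany(
            "INSERT INTO table1 VALUES (?, ?, ?, ?)",
            [(123, 100, 200, 300), (456, 100, 200, 300)],
        )
        cur.executemany(
            "INSERT INTO table2 VALUES (?, ?, ?)",
            [(123, 100, 200), (456, 100, None), (789, 100, None)],
        )
        # Every table2 row survives; table1 columns are NULL where no match exists.
        return cur.execute(
            "SELECT table2.OS06Y, f1, f2, f3, f4, f5 "
            "FROM table2 LEFT OUTER JOIN table1 ON table2.OS06Y = table1.OS06Y "
            "ORDER BY table2.OS06Y"
        ).fetchall()
    finally:
        conn.close()


for row in left_join_rows():
    print(row)
# (123, 100, 200, 300, 100, 200)
# (456, 100, 200, 300, 100, None)
# (789, None, None, None, 100, None)
```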

RE: The issue of import data from database using Solr DIH

2011-04-21 Thread Em
As lboutros mentioned, if you can express it in a query, then yes, Solr can handle it. Take a step back: do not think of Solr. Write a query (one query!) that shows exactly the output you expect. Afterwards, use this query as the source for DIH. Regards, Em

Re: Need to create dyanamic indexies base on different document workspaces

2011-04-21 Thread Em
Additionally, there is a ready-made example of a multicore setup in the example directory of your Solr distribution. Regards, Em

RE: The issue of import data from database using Solr DIH

2011-04-21 Thread Kevin Xiang
Yes, it is like a left outer join. In my example, the source may be a table, a view, or a stored procedure, and I cannot change it in the database. If, for every id in table1, we need to look up the fields by id in table2, we will run into a performance issue, especially since the tables are very big.

Unable to load EntityProcessor implementation for entity:16865747177753

2011-04-21 Thread vrpar...@gmail.com
Hello, I have two data sources: the first is a SQL Server database, and the second is a file that is dynamic, meaning I want to fetch one file based on each record from the first data source. That's why I tried to use TikaEntityProcessor, but I got the following error

Re: PECL SOLR PHP extension, JSON output

2011-04-21 Thread Stefan Matheis
Give it a try: http://php.net/manual/en/solrclient.setresponsewriter.php On Thu, Apr 21, 2011 at 9:03 AM, roySolr royrutten1...@gmail.com wrote: Hello, I use the PECL php extension for SOLR. I want my output in JSON. This is not working: $query->set('wt', 'json'); How do i solve this

Re: HTMLStripCharFilterFactory, highlighting and InvalidTokenOffsetsException

2011-04-21 Thread Robert Gründler
On 20.04.11 18:51, Robert Muir wrote: Hi, there is a proposed patch uploaded to the issue. Maybe you can help by reviewing/testing it? If I succeed in compiling Solr, I can test the patch. Is this the right starting point for such an endeavour? http://wiki.apache.org/solr/HackingSolr

Re: Unable to load EntityProcessor implementation for entity:16865747177753

2011-04-21 Thread firdous_kind86
Can I see your tikaconfig.xml? Meanwhile, have a look at this bug: https://issues.apache.org/jira/browse/SOLR-2116 A similar thread also exists: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-td2839188.html

Re: PECL SOLR PHP extension, JSON output

2011-04-21 Thread roySolr
I have tried that, but it seems JSON is not supported. The documentation says: Parameters: responseWriter, one of the following: xml, phpnative

Can't determine Sort Order error when using sort by function

2011-04-21 Thread Otis Gospodnetic
Hello, I'm trying out sorting by function with the new function queries and invariably getting this error: Can't determine Sort Order: 'termfreq(name,samsung)', pos=22 Here's an example call: http://localhost:8983/solr/select/?q=*:*&sort=termfreq%28name,samsung%29 What am I doing wrong?
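Yonik's reply is truncated in this digest, so the cause here is an assumption: Solr's sort syntax is `<field or function> <asc|desc>`, and pos=22 is exactly the length of `termfreq(name,samsung)`, which suggests the parser reached the end of the value while looking for a direction. A sketch of the request with an explicit direction:

```python
from urllib.parse import urlencode


def function_sort_url(base: str = "http://localhost:8983/solr/select/") -> str:
    params = {
        "q": "*:*",
        # Assumed fix: give the function an explicit direction ("asc" or "desc").
        "sort": "termfreq(name,samsung) desc",
    }
    # urlencode also supplies the "&" between parameters.
    return base + "?" + urlencode(params)


print(function_sort_url())
# http://localhost:8983/solr/select/?q=%2A%3A%2A&sort=termfreq%28name%2Csamsung%29+desc
```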

Re: PECL SOLR PHP extension, JSON output

2011-04-21 Thread Stefan Matheis
Hm, yes, correct: there is an explicit validation of response writers in place. If you want to modify it yourself, check out the current trunk (http://svn.php.net/repository/pecl/solr/trunk/), modify solr_constants.h, define another response_writer, and add another check in solr_functions_helpers.c in
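If patching the extension is not an option, one workaround (an assumption, not something suggested in the thread) is to bypass the PECL client for these requests and call Solr over plain HTTP with `wt=json`. The sketch below is in Python rather than PHP; the URL is the example default and the sample response is hand-written in the shape of Solr's JSON output, not captured from a server.

```python
import json
from urllib.parse import urlencode


def select_url(query: str, base: str = "http://localhost:8983/solr/select") -> str:
    # wt=json selects Solr's JSON response writer on the server side.
    return base + "?" + urlencode({"q": query, "wt": "json"})


def num_found(raw_response: str) -> int:
    # In Solr's JSON output the hit count sits under response.numFound.
    return json.loads(raw_response)["response"]["numFound"]


# Hand-written sample, not real server output.
SAMPLE = '{"responseHeader": {"status": 0, "QTime": 1}, "response": {"numFound": 2, "start": 0, "docs": []}}'

print(select_url("ipod"))  # http://localhost:8983/solr/select?q=ipod&wt=json
print(num_found(SAMPLE))   # 2
```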

Re: entity name issue

2011-04-21 Thread tjtong
Hi Em, Thanks a lot! But it still does not work. Actually, the where clause in my query was '${dataimporter.request.clean}' != 'false' and myschema.table_a.aid=${dataimporter.request.aid}, which I used to pass a value to the full-import process, and it worked without the prefix myschema. on Sybase
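For reference, the values in that where clause come from the request that triggers the import: DIH exposes request parameters as `${dataimporter.request.<name>}`. A sketch of such a request, assuming the handler is registered at `/dataimport`; `aid` is the custom parameter from the query above and 42 is an illustrative value.

```python
from urllib.parse import urlencode

# Assumed path under which the DataImportHandler is registered in solrconfig.xml.
DATAIMPORT = "http://localhost:8983/solr/dataimport"


def full_import_url(aid: int, clean: bool = False) -> str:
    params = {
        "command": "full-import",
        # Visible in data-config.xml as ${dataimporter.request.clean}
        "clean": "true" if clean else "false",
        # Custom parameter, visible as ${dataimporter.request.aid}
        "aid": str(aid),
    }
    return DATAIMPORT + "?" + urlencode(params)


print(full_import_url(42))
# http://localhost:8983/solr/dataimport?command=full-import&clean=false&aid=42
```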

Re: Can't determine Sort Order error when using sort by function

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 8:30 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, I'm trying out sorting by function with the new function queries and invariably getting this error:  Can't determine Sort Order: 'termfreq(name,samsung)', pos=22 Here's an example call:

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort
OK, so I copied my index and ran Solr 3.1 against it. QTime dropped from about 40s to 17s! This is good news, but still longer than I hoped for. I tried to do the same test with 4.0, but I'm getting IndexFormatTooOldException since my index was created using 1.4.1. Is my only chance to test this

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 9:24 AM, Ofer Fort o...@tra.cx wrote: Another strange behavior is that the QTime seems pretty stable, no matter how many objects match my query. 200K and 20K both take about 17s. I would have guessed that since the time is going over all the terms of all the subset

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort
Not sure I fully understand. If facet.method=enum steps over all terms in the index for that field, then what does setting q=field:subset do? If I set q=*:*, then how do I get the frequency only on my subset? Ofer On Thu, Apr 21, 2011 at 4:40 PM, Yonik Seeley

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 9:44 AM, Ofer Fort o...@tra.cx wrote: Not sure i fully understand, If facet.method=enum steps over all terms in the index for that field, than what does setting the q=field:subset do? if i set the q=*:*, than how do i get the frequency only on my subset? It's an
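Yonik's answer is cut off here, but the general point about faceting holds: facet counts are computed only over the documents matching `q` (and any `fq`), so `q` is what restricts the term counts to the subset, whichever `facet.method` walks the terms. A sketch of such a request; the field names `category` and `text` are hypothetical.

```python
from urllib.parse import urlencode


def top_terms_query(subset_query: str, text_field: str = "text", limit: int = 100) -> str:
    """Query string asking for the most frequent terms within a subset.

    Facet counts are computed only over documents matching q (and any fq).
    """
    params = [
        ("q", subset_query),
        ("rows", "0"),                # documents themselves are not needed
        ("facet", "true"),
        ("facet.field", text_field),  # count the indexed terms of this field
        ("facet.method", "fc"),       # or "enum"
        ("facet.limit", str(limit)),  # keep the top-N terms by count
        ("facet.mincount", "1"),
    ]
    return urlencode(params)


print(top_terms_query("category:news"))
# q=category%3Anews&rows=0&facet=true&facet.field=text&facet.method=fc&facet.limit=100&facet.mincount=1
```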

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort
I see, thanks. So if I wanted to implement something that fits my needs, would going through the subset of documents and counting all the terms in each one be faster, and easier to implement? On Thu, Apr 21, 2011 at 5:36 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu,
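What "going through the subset and counting the terms" computes can be sketched in a few lines of Python. This is a toy: tokenization is naive lowercase/whitespace splitting, not Solr's analysis chain, and the sample documents are made up. Note the distinction it exposes: facet counts report how many documents contain a term, not how many times it occurs.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_terms(docs: Iterable[str], n: int, per_document: bool = True) -> List[Tuple[str, int]]:
    counts: Counter = Counter()
    for text in docs:
        # Naive tokenization; NOT Solr's analyzer.
        tokens = text.lower().split()
        # per_document=True counts each term at most once per document,
        # which is what facet counts report; False counts every occurrence.
        counts.update(set(tokens) if per_document else tokens)
    return counts.most_common(n)


subset = ["Nokia charger", "nokia nokia phone", "iPhone charger"]
counts = dict(top_terms(subset, 10))
print(counts["nokia"], counts["charger"])  # 2 2
```

The cost of this approach grows with the number of documents and their length, since every stored text has to be re-read and re-tokenized per request.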

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 10:41 AM, Ofer Fort o...@tra.cx wrote: I see, thanks. So if I would want to implement something that would fit my needs, would going through the subset of documents and counting all the terms in each one, would be faster? and easier to implement? That's not just your

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort
So if I want to use facet.method=fc, is there a way to speed it up and remove the bucket size limitation? On Thu, Apr 21, 2011 at 5:58 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Apr 21, 2011 at 10:41 AM, Ofer Fort o...@tra.cx wrote: I see, thanks. So if I would want to

Re: Apache Spam Filter Blocking Messages

2011-04-21 Thread Trey Grainger
Good to know; I'll go change those settings, then.  Thanks for the feedback. -Trey On Thu, Apr 21, 2011 at 4:42 AM, Em mailformailingli...@yahoo.de wrote: This really helps at the mailinglists. If you send your mails with Thunderbird, be sure to check that you enforce plain-text-emails. If

Re: old searchers not closing after optimize or replication

2011-04-21 Thread Trey Grainger
Hey Bernd, Check out https://issues.apache.org/jira/browse/SOLR-2469. There is a pretty bad bug in Solr 3.1 which occurs if you have <str name="replicateAfter">startup</str> set in your replication configuration in solrconfig.xml. See the thread between Yonik and myself from a few days ago titled

RE: stemming filter analyzers, any favorites?

2011-04-21 Thread Robert Petersen
Adding another field with another stemmer and searching both? Wow, never thought of doing that. I guess that doesn't really double the size of your index, though, because all the terms are almost the same, right? Let me look into that. I'll raise the other issue in a separate thread, and thanks.

Re: Multiple Tags and Facets

2011-04-21 Thread Em
Does anyone have ideas on how to use multiple tags per filter, or how to combine tags to exclude more than one filter per facet? Regards, Em

RE: stemming filter analyzers, any favorites?

2011-04-21 Thread Em
As far as I know, Lucene does not store an inverted index per field, so no, it would not double the size of the index. However, it could influence the score a little bit. For example, if both stemmers reduce "schools" to "school" and you are searching for "all schools in america", the term "school" has

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 11:15 AM, Ofer Fort o...@tra.cx wrote: So if i want to use the facet.method=fc, is there a way to speed it up? and remove the bucket size limitation? Not really - else we would have done it already ;-) We don't really have great methods for faceting on full-text fields

RE: stemming filter analyzers, any favorites?

2011-04-21 Thread Robert Petersen
Nice! Thanks! -Original Message- From: Em [mailto:mailformailingli...@yahoo.de] Sent: Thursday, April 21, 2011 9:23 AM To: solr-user@lucene.apache.org Subject: RE: stemming filter analyzers, any favorites? As far as I know Lucene does not store an inverted index per field, so no, it

Solr search based on list of terms. Order by max(score) for each term.

2011-04-21 Thread Bogdan STOICA
Hello, I am trying to query a Solr server in order to obtain the most relevant results for a list of terms. For example, I have the list of words nokia, iphone, charger. My schema contains the following data: nokia iphone nokia iphone otherwords nokia white iphone white If I run a simple query

Re: Multiple Tags and Facets

2011-04-21 Thread Jay Hill
I don't think I understand what you're trying to do. Are you trying to preserve all facets after a user clicks on a facet, and thereby triggers a filter query, which excludes the other facets? If that's the case, you can use local parameters to tag the filter queries so they are not used for the

Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Scott Bigelow
I've been using Solr for a while now, indexing 2-4 million records using the DIH to pull data from MySQL, which has been working great. For a new project, I need to index about 20M records (30 fields) and I have been running into issues with MySQL disconnects, right around 15M. I've tried several

Re: Multiple Tags and Facets

2011-04-21 Thread Em
Hi Jay, thank you for your reply. We must extend your example to reproduce what I mean. You have the following facets: project: Solr, Lucene, Nutch, Mahout; source: Documentation, Mailinglist, Wiki, Commercial Websites. What I want now is: When I click on Solr + Documentation

MoreLikeThis

2011-04-21 Thread Brian Lamb
Hi all, I have an mlt search set up on my site with over 2 million records in the index. Normally, my results look like: <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">204</int> </lst> <result name="match" numFound="41750" start="0"> <doc> <str name="title">Some

Re: Multiple Tags and Facets

2011-04-21 Thread Chris Hostetter
: I watched an online video with Chris Hostetter from Lucid Imagination. He : showed the possibility of having some Facets that exclude *all* filters while : also having some Facets that take care of some of the set filters while : ignoring other filters. FWIW: That webinar is nearly identical to

Re: Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Robert Gründler
We're indexing around 10M records from a MySQL database into a single Solr core. The DataImportHandler needs to join 3 sub-entities to denormalize the data. We ran into some trouble on the first 2 attempts, but setting batchSize="-1" on the dataSource resolved the issues. Do you need a lot

Re: HTMLStripCharFilterFactory, highlighting and InvalidTokenOffsetsException

2011-04-21 Thread Erick Erickson
Perhaps a better place to start is here: http://wiki.apache.org/solr/HowToContribute#Contributing_Code_.28Features.2C_Big_Fixes.2C_Tests.2C_etc29 That page also has information about setting up Eclipse or IntelliJ environments. But the place to start is to get the source and get to the point

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort
Well, it was worth the try ;-) But when using facet.method=fc, will reducing the subset size reduce the time and memory? That is, is it O(ndocs of the set)? Thanks On Thursday, April 21, 2011, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Apr 21, 2011 at 11:15 AM, Ofer Fort

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 6:25 PM, Ofer Fort o...@tra.cx wrote: Well, it was worth the try;-) But will using the facet.method=fc, will reducing the subset size reduce the time and memory? Meaning is it an O( ndocs of the set)? facet.method=fc builds a multi-valued fieldcache like structure

Index upgrade from 1.4.1 to 3.1 and 4.0

2011-04-21 Thread Ofer Fort
Hi all, While doing some tests, I realized that an index that was created with Solr 1.4.1 is readable by Solr 3.1, but not readable by Solr 4.0. If I plan to migrate my index to 4.0, and I prefer not to reindex it all, what would be my best course of action? Will it be possible to continue to write

Re: Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Scott Bigelow
Thanks for your response! I think the issue is that the records are being returned TOO fast from MySQL. I can dump them to CSV in about 30 minutes, but building the solr index takes hours on the system I'm using. I may just need to use a more powerful Solr instance so it doesn't leave MySQL

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort
So I'm guessing my best approach now would be to test trunk, and hope that, as 3.1 cut the query time in half, trunk will do the same. Thanks for the info Ofer On Friday, April 22, 2011, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Apr 21, 2011 at 6:25 PM, Ofer Fort o...@tra.cx wrote:

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 6:34 PM, Ofer Fort o...@tra.cx wrote: So I'm guessing my best approach now would be to test trunk, and hope that as 3.1 cut the performance in half, trunk will do the same Trunk prob won't be much better... but the bulkpostings branch possibly could be. -Yonik

Re: Multiple Tags and Facets

2011-04-21 Thread Em
Thank you Hoss. I will try the comma-separated thing out. It seems to be what I searched for. :) Regards, Em Chris Hostetter-3 wrote: : I watched an online video with Chris Hostsetter from Lucidimagination. He : showed the possibility of having some Facets that exclude *all* filter while
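Hoss's reply is truncated in this digest. Assuming the comma-separated form refers to listing several tags in the `ex` local parameter, a request for Em's project/source example might look like this; the filter values are illustrative and `project_all` is a hypothetical output label set via `key`.

```python
from urllib.parse import urlencode

# Field names and filter values follow Em's project/source example.
PARAMS = [
    ("q", "*:*"),
    ("facet", "true"),
    # Tag each filter so individual facets can ignore it.
    ("fq", "{!tag=proj}project:Solr"),
    ("fq", "{!tag=src}source:Documentation"),
    # Ignores only the project filter.
    ("facet.field", "{!ex=proj}project"),
    # Ignores only the source filter.
    ("facet.field", "{!ex=src}source"),
    # Comma-separated tags: ignores both filters; "key" renames the output.
    ("facet.field", "{!ex=proj,src key=project_all}project"),
]

query_string = urlencode(PARAMS)
print(query_string)
```

Using a list of tuples rather than a dict keeps the repeated `fq` and `facet.field` parameters.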

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort
Ok, I'll give it a try, as this is a server I am willing to risk. How is the SolrJ compatibility between bulkpostings, trunk, 3.1 and 1.4.1? On Friday, April 22, 2011, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Apr 21, 2011 at 6:34 PM, Ofer Fort o...@tra.cx wrote: So I'm guessing

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 6:50 PM, Ofer Fort o...@tra.cx wrote: Ok, I'll give it a try, as this is a server I am willing to risk. How is the competability between solrj of bulkpostings, trunk, 3.1 and 1.4.1? bulkpostings, trunk, and 3.1 should all be relatively solrj compatible. But the SolrJ

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort
Ok, thanks On Friday, April 22, 2011, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Apr 21, 2011 at 6:50 PM, Ofer Fort o...@tra.cx wrote: Ok, I'll give it a try, as this is a server I am willing to risk. How is the competability between solrj of bulkpostings, trunk, 3.1 and 1.4.1?

Re: Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Chris Hostetter
: For a new project, I need to index about 20M records (30 fields) and I : have been running into issues with MySQL disconnects, right around : 15M. I've tried several remedies I've found on blogs, changing if you can provide some concrete error/log messages and the details of how you are

term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-21 Thread Robert Petersen
So if I don't put preserveOriginal=1 in my WordDelimiterFilterFactory settings I cannot get a match between AppleTV on the indexing side and appletv on the search side. Without that setting the all lowercase version of AppleTV is in term position two due to the catenateWords=1 or the

Re: Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Scott Bigelow
Thanks for the e-mail. I probably should have provided more details, but I was more interested in making sure I was approaching the problem correctly (using DIH, with one big SELECT statement for millions of rows) instead of solving this specific problem. Here's a partial stacktrace from this

Re: Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Li
Can you post the dataconfig.xml? Probably you didn't set batchSize. Sent from my iPhone On Apr 21, 2011, at 5:09 PM, Scott Bigelow eph...@gmail.com wrote: Thanks for the e-mail. I probably should have provided more details, but I was more interested in making sure I was approaching the

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 8:06 PM, Robert Petersen rober...@buy.com wrote: So if I don't put preserveOriginal=1 in my WordDelimiterFilterFactory settings I cannot get a match between AppleTV on the indexing side and appletv on the search side. Hmmm, that shouldn't be the case. The text field

Re: How to return score without using _val_

2011-04-21 Thread Bill Bell
I know that the _val_ is the only thing influencing the score. The fq is just to limit also by those queries. What I am asking is if it is possible to just influence the score using _val_ but not in the Q parameter? Something like bq=val_:{!type=dismax qf=$qqf v=$qspec} _val_:{!type=dismax