Re: java.net.SocketException: Too many open files

2011-10-25 Thread Jonty Rhods
Hi Yonik,

thanks for reply.

Currently I have more than 50 classes, and every class creates its own
SolrServer server =  new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
The majority of the classes connect to core0, but there are also several other cores
that different classes connect to.

My scenario: I am expecting 40 to 50 hits on the server every day, and the server is
deployed on Tomcat 6.20 with a 12GB heap allocated to Catalina. The production OS is Red Hat
Linux, and I use Ubuntu on the development server.
Logically I can make a common class for connecting to the Solr server, but my
questions are:

1. If I use a common class, I will have to set the maximum number of connections on the
HttpClient. What would be an ideal setting for my current problem?
2. I am expecting a minimum of 5000 concurrent connections to the Solr server at peak time.
3. If I use a single SolrServer server =  new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
shared through a common class across all classes, will it help resolve the current problem
under the current load? I don't want my users to experience slow responses from the Solr server.
4. Other users have overcome this issue by increasing the TCP/IP (open file) limits at the
OS level, or making them unlimited. Is that the right approach?

As I have only recently shifted to the Java language, any piece of code would be much
appreciated and would make this much easier for me to understand.

thanks.

regards
Jonty
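
A minimal sketch of what such a shared connection class could look like, assuming SolrJ 3.4
and commons-httpclient 3.x; the URL and the connection limits are placeholders to be tuned
for the actual load:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SolrServerFactory {
    // one server instance per core, created once and reused by every class
    private static final CommonsHttpSolrServer CORE0;

    static {
        try {
            MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
            mgr.getParams().setDefaultMaxConnectionsPerHost(100); // placeholder limit
            mgr.getParams().setMaxTotalConnections(100);          // placeholder limit
            CORE0 = new CommonsHttpSolrServer("http://localhost:8080/solr/core0",
                                              new HttpClient(mgr));
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static SolrServer getCore0() {
        return CORE0;
    }
}

Every class would then call SolrServerFactory.getCore0() instead of constructing its own
CommonsHttpSolrServer, so only one pool of persistent connections is ever opened to core0.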


On Wed, Oct 26, 2011 at 1:37 AM, Yonik Seeley wrote:

> On Tue, Oct 25, 2011 at 4:03 PM, Jonty Rhods wrote:
> > Hi,
> >
> > I am using solrj and for connection to server I am using instance of the
> > solr server:
> >
> > SolrServer server =  new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
>
> Are you reusing the server object for all of your requests?
> By default, Solr and SolrJ use persistent connections, meaning that
> sockets are reused and new ones are not opened for every request.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> > I noticed that after few minutes it start throwing exception
> > java.net.SocketException: Too many open files.
> > It seems that it related to instance of the HttpClient. How to resolved
> the
> > instances to a certain no. Like connection pool in dbcp etc..
> >
> > I am not experienced on java so please help to resolved this problem.
> >
> >  solr version: 3.4
> >
> > regards
> > Jonty
> >
>


Re: java.net.SocketException: Too many open files

2011-10-25 Thread Bui Van Quy

Hi,

I had the same "Too many open files" problem, but it was logged by the Tomcat 
server. Please check your index directory: if there are too many index 
files, please execute the Solr optimize command. This exception is raised by the 
server's OS; you can research it further on the web.
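
A minimal SolrJ sketch of triggering that optimize, assuming the core URL from the
original post (optimize merges the index into fewer segments, which reduces the number
of files Solr holds open):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeCore {
    public static void main(String[] args) throws Exception {
        // placeholder URL; point it at the core whose index directory has many files
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
        server.optimize(); // merges segments, reducing the number of open index files
    }
}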



On 10/26/2011 3:07 AM, Yonik Seeley wrote:

On Tue, Oct 25, 2011 at 4:03 PM, Jonty Rhods  wrote:

Hi,

I am using solrj and for connection to server I am using instance of the
solr server:

SolrServer server =  new CommonsHttpSolrServer("http://localhost:8080/solr/core0");

Are you reusing the server object for all of your requests?
By default, Solr and SolrJ use persistent connections, meaning that
sockets are reused and new ones are not opened for every request.

-Yonik
http://www.lucidimagination.com



I noticed that after few minutes it start throwing exception
java.net.SocketException: Too many open files.
It seems that it related to instance of the HttpClient. How to resolved the
instances to a certain no. Like connection pool in dbcp etc..

I am not experienced on java so please help to resolved this problem.

  solr version: 3.4

regards
Jonty







Re: Error loading ICUTokenizerFactory

2011-10-25 Thread Tomek Rej
Looks like another person had the same problem as me.
The solution to the issue can be found here:
http://lucene.472066.n3.nabble.com/Solr-3-1-ICU-filters-error-loading-class-td2835323.html

Perhaps the person in charge of the documentation could add
apache-solr-analysis-extras-X.Y.jar as a requirement.

-- 

Tomek Rej | Developer

roamz
23 Foster Street
Surry Hills NSW 2010 Australia
M +61 431 829 593
E tomek.rej@roamz.com


Error loading ICUTokenizerFactory

2011-10-25 Thread Tomek Rej
Hi everyone

I'm getting an exception when trying to use the solr.ICUTokenizerFactory:
   SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.ICUTokenizerFactory'

The code in the schema.xml that isn't working is:


  


I copied the jar files found in contrib/analysis-extras/lib and
contrib/analysis-extras/lucene-lib/lucene-libs to the solr/lib directory of
my project,
which is what I assume you have to do from reading some posts I found
online. However I must be doing something wrong as I'm getting the error
even with the jar files in my solr/lib directory.

When the above didn't work I tried changing solrconfig.xml to add extra lib
directives:



When I read the output on the command line it said the class loader was able
to load the jar files but I still got the same error loading class
'solr.ICUTokenizerFactory'


Does anyone know what I'm doing wrong?
Thanks for your help.

-- 

Tomek Rej | Developer

roamz
23 Foster Street
Surry Hills NSW 2010 Australia
M +61 431 829 593
E tomek.rej@roamz.com


Re: help needed on solr-uima integration

2011-10-25 Thread Xue-Feng Yang
I configured the solr-uima integration following the resources I could find, but the 
data import results had empty data for the UIMA fields. The other fields, not from UIMA, 
were there, and there were no error messages. 


The following are the steps I took:

1) set up schema.xml with all fields, both UIMA and non-UIMA.
2) set up the lib directives, the updateRequestProcessorChain for the AE and the field maps, 
the requestHandler for update, and the DataImportHandler for config and update.processor.


Am I still missing anything?

Thanks,

Xue-Feng




From: Xue-Feng Yang 
To: "solr-user@lucene.apache.org" 
Sent: Monday, October 24, 2011 11:21:14 AM
Subject: Re: help needed on solr-uima integration


Thanks Koji. I found it. I should find the solution there.

Xue-Feng



From: Koji Sekiguchi 
To: solr-user@lucene.apache.org
Sent: Monday, October 24, 2011 7:30:01 AM
Subject: Re: help needed on solr-uima integration

(11/10/24 17:42), Xue-Feng Yang wrote:
> Hi,
>
> Where can I find test code for solr-uima component?

You should find them under:

solr/contrib/uima/src/test

koji
-- 
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/

Difficulties Installing Solr with Jetty 7.x

2011-10-25 Thread Scott Vanderbilt
Hello. I am having trouble installing Solr 3.4.0 with Jetty 7.5.3. My OS 
is OpenBSD 5.0, and JDK is 1.7.0.


I was able to successfully run the Solr example application which comes 
bundled with an earlier version of Jetty (not sure which, but I'm 
assuming pre-version 7). I would like--if at all possible--to run the 
latest version of Jetty.


After some confusion resulting from the fact that the Jetty-specific 
install docs at  are apparently 
out of sync with the newest versions of Jetty, I was able to make some 
progress by cloning the sample contexts file at 
JETTY_HOME/contexts/test.xml, the contents of which are below. Also below 
is the output when attempting to start Solr in the Jetty container. 
(Sorry about the line-wrapping)


When I start Jetty's test application, I can successfully retrieve the 
home page. However, when I attempt to start Solr, Jetty is obviously up 
and serving HTTP requests, but attempts to connect to 
 result in a 404.


Might someone be able to point out what mistake I am making? I'm sure 
it's in the java output somewhere, but I am unable to discern where. 
Alternatively, any pointers to relevant docs to help me get going would 
also be greatly appreciated.


Many thanks in advance.


=
OUTPUT
=
jetty $/usr/local/jdk-1.7.0/bin/java -Dsolr.solr.home=/var/jetty/solr 
-jar /var/jetty/start.jar

2011-10-25 16:44:50.110:INFO:oejs.Server:jetty-7.5.3.v20111011
2011-10-25 16:44:50.160:INFO:oejdp.ScanningAppProvider:Deployment 
monitor /var/jetty/webapps at interval 1
2011-10-25 16:44:50.168:INFO:oejdp.ScanningAppProvider:Deployment 
monitor /var/jetty/contexts at interval 1
2011-10-25 16:44:50.173:INFO:oejd.DeploymentManager:Deployable added: 
/var/jetty/contexts/javadoc.xml
2011-10-25 16:44:50.240:INFO:oejsh.ContextHandler:started 
o.e.j.s.h.ContextHandler{/javadoc,file:/var/jetty/javadoc/}
2011-10-25 16:44:50.241:INFO:oejd.DeploymentManager:Deployable added: 
/var/jetty/contexts/test.xml
2011-10-25 16:44:50.358:INFO:oejw.WebInfConfiguration:Extract 
jar:file:/var/jetty/webapps/test.war!/ to 
/tmp/jetty-0.0.0.0-8080-test.war-_-any-/webapp
2011-10-25 16:44:51.155:INFO:oejsh.ContextHandler:started 
o.e.j.w.WebAppContext{/,file:/tmp/jetty-0.0.0.0-8080-test.war-_-any-/webapp/},/var/jetty/webapps/test.war
2011-10-25 16:44:51.539:INFO:oejs.TransparentProxy:TransparentProxy @ 
/javadoc-proxy to http://download.eclipse.org/jetty/stable-7/apidocs
2011-10-25 16:44:51.543:INFO:oejd.DeploymentManager:Deployable added: 
/var/jetty/contexts/solr.xml
2011-10-25 16:44:51.564:INFO:oejw.WebInfConfiguration:Extract 
jar:file:/var/jetty/webapps/solr.war!/ to 
/tmp/jetty-0.0.0.0-8080-solr.war-_-any-/webapp
2011-10-25 16:44:52.850:INFO:oejsh.ContextHandler:started 
o.e.j.w.WebAppContext{/,file:/tmp/jetty-0.0.0.0-8080-solr.war-_-any-/webapp/},/var/jetty/webapps/solr.war
Oct 25, 2011 4:44:52 PM org.apache.solr.core.SolrResourceLoader 
locateSolrHome

INFO: JNDI not configured for solr (NoInitialContextEx)
Oct 25, 2011 4:44:52 PM org.apache.solr.core.SolrResourceLoader 
locateSolrHome

INFO: using system property solr.solr.home: /var/jetty/solr
Oct 25, 2011 4:44:52 PM org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to '/var/jetty/solr/'
Oct 25, 2011 4:44:53 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Oct 25, 2011 4:44:53 PM org.apache.solr.core.SolrResourceLoader 
locateSolrHome

INFO: JNDI not configured for solr (NoInitialContextEx)
Oct 25, 2011 4:44:53 PM org.apache.solr.core.SolrResourceLoader 
locateSolrHome

INFO: using system property solr.solr.home: /var/jetty/solr
Oct 25, 2011 4:44:53 PM org.apache.solr.core.CoreContainer$Initializer 
initialize

INFO: looking for solr.xml: /var/jetty/solr/solr.xml
Oct 25, 2011 4:44:53 PM org.apache.solr.core.SolrResourceLoader 
locateSolrHome

INFO: JNDI not configured for solr (NoInitialContextEx)
Oct 25, 2011 4:44:53 PM org.apache.solr.core.SolrResourceLoader 
locateSolrHome

INFO: using system property solr.solr.home: /var/jetty/solr
Oct 25, 2011 4:44:53 PM org.apache.solr.core.CoreContainer 
INFO: New CoreContainer: solrHome=/var/jetty/solr/ instance=6633803
Oct 25, 2011 4:44:53 PM org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to '/var/jetty/solr/'
Oct 25, 2011 4:44:53 PM org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to '/var/jetty/solr/./'
Oct 25, 2011 4:44:53 PM org.apache.solr.core.SolrConfig initLibs
INFO: Adding specified lib dirs to ClassLoader
Oct 25, 2011 4:44:53 PM org.apache.solr.core.SolrConfig 
INFO: Using Lucene MatchVersion: LUCENE_34
Oct 25, 2011 4:44:53 PM org.apache.solr.core.SolrConfig 
INFO: Loaded SolrConfig: solrconfig.xml
Oct 25, 2011 4:44:53 PM org.apache.solr.schema.IndexSchema readSchema
INFO: Reading Solr Schema
Oct 25, 2011 4:44:53

Re: Solr main query response & input to facet query

2011-10-25 Thread lee carroll
Take a look at facet queries. You can facet on the results of arbitrary queries, not just
on the terms in a field:

http://wiki.apache.org/solr/SimpleFacetParameters#facet.query_:_Arbitrary_Query_Faceting
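
A minimal SolrJ sketch of the idea, using the prod_id values from the example quoted below;
the URL and query strings are placeholders, and it assumes the prod_id values were obtained
from an earlier response:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetQueryExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr"); // placeholder
        SolrQuery q = new SolrQuery("name:name1");   // the main query
        q.setFacet(true);
        // facet on arbitrary queries built from values seen in the first response
        q.addFacetQuery("prod_id:200");
        q.addFacetQuery("prod_id:400");
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getFacetQuery());     // map of facet query -> count
    }
}

As far as I can tell, facet.query itself cannot feed values from the current response back
into the facets, so this still takes two round trips: one to collect the prod_id values and
one to facet on them.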



On 25 October 2011 10:56, Erik Hatcher  wrote:
> I'm not following exactly what you're looking for here, but sounds like you 
> want to facet on name... &facet=on&facet.field=name1
>
> and then to filter on a selected one, you can use fq=name:name1
>
>        Erik
>
> On Oct 24, 2011, at 20:18 , solrdude wrote:
>
>> Hi,
>> I am implementing an solr solution where I want to use some field values
>> from main query output as an input in building facet. How do I do that?
>>
>> Eg:
>> Response from main query:
>>
>> 
>> name1
>> 200
>> 
>> 
>> name1
>> 400
>> 
>>
>> I want to build facet for the query where "prod_id:200 prod_id:400". I like
>> to do all this in single query ideally. if it can't be done in one query, I
>> am ok with 2 query as well. Please help.
>>
>> Thanks
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Solr-main-query-response-input-to-facet-query-tp3449938p3449938.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: java.net.SocketException: Too many open files

2011-10-25 Thread Péter Király
One note on this. I had trouble resetting the root user's limit in
Ubuntu. Somewhere I read that Ubuntu doesn't even report the
correct limit for root. The solution to this problem is to run Solr
under another user.

Péter

2011/10/25 Markus Jelsma :
> This is on Linux? This should help:
>
> echo fs.file-max = 16384 >> /etc/sysctl.conf
>
> On some distro's like Debian it seems you also have to add these settings to
> security.conf, otherwise it may not persist between reboots or even shell
> sessions:
>
> echo "systems hard nofile 16384
> systems soft nofile 16384" >> /etc/security/limits.conf
>
>
>> Hi,
>>
>> I am using solrj and for connection to server I am using instance of the
>> solr server:
>>
>> SolrServer server =  new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
>>
>> I noticed that after few minutes it start throwing exception
>> java.net.SocketException: Too many open files.
>> It seems that it related to instance of the HttpClient. How to resolved the
>> instances to a certain no. Like connection pool in dbcp etc..
>>
>> I am not experienced on java so please help to resolved this problem.
>>
>>  solr version: 3.4
>>
>> regards
>> Jonty
>



-- 
Péter Király
eXtensible Catalog
http://eXtensibleCatalog.org
http://drupal.org/project/xc


Re: Query/Delete performance difference between straight HTTP and SolrJ

2011-10-25 Thread Shawn Heisey

On 10/20/2011 11:00 AM, Shawn Heisey wrote:
I've got two build systems for my Solr index that I wrote.  The first 
one is in Perl and uses GET/POST requests via HTTP, the second is in 
Java using SolrJ.  I've noticed a performance discrepancy when 
processing every one of my delete records, currently about 25000 of 
them.  It takes about 5 seconds in Perl and a minute or more via 
SolrJ.  In the perl system, I do a full delete like this once an 
hour.  The performance impact of doing it once an hour in the SolrJ 
version has forced me to do it only once per day.  The normal delete 
process in both cases looks for new records and deletes just those.  
It happens every two minutes in the Perl program and every minute in 
the Java program.


I've managed to make this somewhat better by using multiple threads to 
do all the deletes on the six large static indexes at once, but that 
shouldn't be required.  The Perl version doesn't do them at the same time.


I asked on the #solr IRC channel.  Only one person responded, and didn't 
really know how to help me.  He did say one thing that intrigues me:


10:27 < cedrichurst> the only difference i could see is deserializing 
the java binary object


Any thoughts from anyone else?  If deserializing is slow, is there any 
way to avoid it or speed it up?


Thanks,
Shawn
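
If the SolrJ version is sending each delete as its own request, batching them may narrow
the gap; a minimal sketch, assuming the IDs to delete have already been collected into a
list (names are placeholders):

import java.util.List;
import org.apache.solr.client.solrj.SolrServer;

public class BatchDelete {
    // ids: the ~25000 document IDs to remove, collected beforehand
    public static void deleteAll(SolrServer server, List<String> ids) throws Exception {
        server.deleteById(ids); // one update request for the whole batch
        server.commit();        // a single commit afterwards
    }
}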



Incorrect Search Results showing up

2011-10-25 Thread aronitin
Hi Group,

I've defined a type "text" in the SOLR schema as shown below. 


  






  
  






  


A multi valued field is defined to use the type defined above


I index some content such as 
- Google REST API
- Facebook REST API
- Software Architecture
- Design Documents
- Xml Web Services
- Web API design

When I issue a search query like content:"rest api"~4, the matches that I
get are
- Google REST API (which is fine)
- Facebook REST API (which is fine)
- *Web API design* (which is not fine, because the query was a phrase query
and rest and api should be within 4 words of each other)

Does anybody see the 3rd search result as a correct search result to be
returned? If yes, then what is the explanation for that result, based on the
schema defined?

In my view the 3rd result should not be returned as part of the search
results. If somebody can point out anything wrong in my schema, it will be a
great help to me.

Thanks
Nitin
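
One way to investigate is to ask Solr to explain the match; a minimal SolrJ sketch using
debugQuery, with a placeholder core URL and the field name from the question:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ExplainMatch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr"); // placeholder
        SolrQuery q = new SolrQuery("content:\"rest api\"~4");
        q.set("debugQuery", true); // return the parsed query and per-document explanations
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getDebugMap().get("parsedquery"));
        System.out.println(rsp.getDebugMap().get("explain"));
    }
}

The parsed query and the explanations should show whether the analysis chain (stemming,
word-delimiter splitting, stopword removal and the like) has altered terms or positions
enough for "Web API design" to fall inside the slop.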



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Incorrect-Search-Results-showing-up-tp3452810p3452810.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: org.apache.pdfbox.pdmodel.PDPage Error

2011-10-25 Thread Mike Sokolov

On 10/24/2011 02:35 PM, MBD wrote:

Is this really a stumper? This is my first experience with Solr and having spent 
only an hour or so with it I hit this barrier (below). I'm sure *I* am doing 
something completely wrong just hoping someone more familiar with the platform can 
help me identify & fix it.

For starters...what's "Could not initialize class ..." mean in Java exactly? 
Maybe that the class (ie code) itself doesn't exist? - so perhaps I haven't downloaded 
all the pieces of the project? Or, could it be a hint that my kit is just not configured 
correctly? Sorry, I'm not a Java expert...but would like to get this stabilized...if 
possible.

   
Yeah - that's the problem. It looks like the pdfbox jar is not installed in 
a place where Solr can find it (on its classpath).

If this is the wrong mailing list then just tell me and I'll go away...

Thanks!

On Oct 20, 2011, at 2:54 PM, MBD wrote:

   


Re: Loading data to SOLR first time ( taking too long)

2011-10-25 Thread Alain Rogister
Shishir,

I believe our main table has about half a million rows, which isn't a lot,
but it has multiple dependent tables, several levels deep. The resulting XML
files were about 1 GB in total, split into around 15 files. We could feed
these files one at a time into Solr in as little as a few seconds per file
(tens of seconds on a slow machine), much less than the database export took,
actually.

In your case, it may be the join that is slowing things down in the DIH.
Depending on your schema, you *may* be able to write the DIH query
differently, or you could create a [materialized] view and use it in the DIH
query.

Alain

On Tue, Oct 25, 2011 at 10:50 PM, Awasthi, Shishir  wrote:

> Alain,
> How many rows did you export in this fashion and what was the
> performance?
>
> We do have oracle as underlying database with data obtained from
> multiple tables. The data is only 1 level deep except for one table
> where we need to traverse hierarchy to get information.
>
> How many XML files did you feed into SOLR one at a time?
>
> Shishir
>
> -Original Message-
> From: Alain Rogister [mailto:alain.rogis...@gmail.com]
> Sent: Tuesday, October 25, 2011 4:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Loading data to SOLR first time ( taking too long)
>
> Are you loading data from multiple tables ? How many levels deep ? After
> some experimenting, I gave up on the DIH because I found it to generate
> very chatty (one row at a time) SQL against my schema, and I experienced
> concurrency bugs unless multithreading was set to false, and I wasn't
> too confident in the incremental mode against a complex schema.
>
> Here is what worked for us (with Oracle):
>
> - create materialized views; make sure that you include a
> 'lastUpdateTime'
> field in the main table. This step may be unnecessary if your source
> data does not need any pre-processing / cleaning / reorganizing.
> - write a stored procedure that exports the data in Solr's XML format;
> parameterize it with a range of primary keys of your main table so that
> you can partition the export into manageable subsets. The XML format is
> very simple, no need for complex in-the-database XML functions to
> generate it.
> - use the database scheduler to run that procedure as a set of jobs; run
> a few of them in parallel.
> - use CURL or WGET or similar to feed the XML files into the index as
> soon as they are available.
> - compress and archive the XML files; they will come handy when you need
> to provision another index instance and will save you a lot of exporting
> time.
> - make sure your stored procedure can work in incremental mode: e.g.
> export all records updated after a certain timestamp; then just push the
> resulting XML into Solr.
>
> Alain
>
> On Tue, Oct 25, 2011 at 9:56 PM, Awasthi, Shishir
> wrote:
>
> > Hi,
> >
> > I recently started working on SOLR and loaded approximately 4 million
> > records to the solr using DataImportHandler. It took 5 days to
> > complete this process.
> >
> >
> >
> > Can you please suggest how this can be improved? I would like this to
> > be done in less than 6 hrs.
> >
> >
> >
> > Thanks,
> >
> > Shishir
> >

RE: Loading data to SOLR first time ( taking too long)

2011-10-25 Thread Awasthi, Shishir
Alain,
How many rows did you export in this fashion and what was the
performance?

We do have Oracle as the underlying database, with data obtained from
multiple tables. The data is only 1 level deep except for one table
where we need to traverse a hierarchy to get information.

How many XML files did you feed into SOLR one at a time?

Shishir

-Original Message-
From: Alain Rogister [mailto:alain.rogis...@gmail.com] 
Sent: Tuesday, October 25, 2011 4:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Loading data to SOLR first time ( taking too long)

Are you loading data from multiple tables ? How many levels deep ? After
some experimenting, I gave up on the DIH because I found it to generate
very chatty (one row at a time) SQL against my schema, and I experienced
concurrency bugs unless multithreading was set to false, and I wasn't
too confident in the incremental mode against a complex schema.

Here is what worked for us (with Oracle):

- create materialized views; make sure that you include a
'lastUpdateTime'
field in the main table. This step may be unnecessary if your source
data does not need any pre-processing / cleaning / reorganizing.
- write a stored procedure that exports the data in Solr's XML format;
parameterize it with a range of primary keys of your main table so that
you can partition the export into manageable subsets. The XML format is
very simple, no need for complex in-the-database XML functions to
generate it.
- use the database scheduler to run that procedure as a set of jobs; run
a few of them in parallel.
- use CURL or WGET or similar to feed the XML files into the index as
soon as they are available.
- compress and archive the XML files; they will come handy when you need
to provision another index instance and will save you a lot of exporting
time.
- make sure your stored procedure can work in incremental mode: e.g.
export all records updated after a certain timestamp; then just push the
resulting XML into Solr.

Alain

On Tue, Oct 25, 2011 at 9:56 PM, Awasthi, Shishir
wrote:

> Hi,
>
> I recently started working on SOLR and loaded approximately 4 million 
> records to the solr using DataImportHandler. It took 5 days to 
> complete this process.
>
>
>
> Can you please suggest how this can be improved? I would like this to 
> be done in less than 6 hrs.
>
>
>
> Thanks,
>
> Shishir
>


RE: Replication issues with multiple Slaves

2011-10-25 Thread Rob Nicholls
Thanks... Yes, and no. 

The main thing is, after the replicate failed below, I checked the master
and the files that it complains about below (and several others) did
exist... which is where I'm stumped about what is causing the issue (I have
added the maxCommits setting you mention below already).

I'll retest to confirm that there is only a single commit happening in this
scenario, and it's not some weird oddity to do with Windows just being an
arrse with file and path capitalization.


-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: 25 October 2011 20:51
To: solr-user@lucene.apache.org
Cc: Jaeger, Jay - DOT
Subject: Re: Replication issues with multiple Slaves

Are you frequently adding and deleting documents and committing those
mutations? Then it might try to download a file that doesn't exist anymore.
If that is the case, try increasing:



> I noted that in these messages the left hand side is lower case 
> collection, but the right hand side is upper case Collection.  
> Assuming you did a cut/paste, could you have a core name mismatch 
> between a master and a slave somehow?
> 
> Otherwise (shudder):  could you be doing a commit while the 
> replication is in progress, causing files to shift about on it?  I'd 
> have expected (perhaps naively) solr to have some sort of lock to 
> prevent such a problem.  But if there is no internal lock, that would 
> be a serious matter (and could happen to us, too, down the road).
> 
> JRJ
> 
> -Original Message-
> From: Rob Nicholls [mailto:robst...@hotmail.com]
> Sent: Tuesday, October 25, 2011 10:32 AM
> To: solr-user@lucene.apache.org
> Subject: Replication issues with multiple Slaves
> 
> 
> Hey guys,
> 
> We have a Master (1 server) and 2 Slaves (2 servers) setup and running 
> replication across multiple cores.
> 
> However, the replication appears to behave sporadically and often 
> fails when left to replicate automatically via poll. More often than 
> not a replicate will fail after the slave has finished pulling down 
> the segment files, because it cannot find a particular file, giving errors
such as:
> 
> Oct 25, 2011 10:00:17 AM org.apache.solr.handler.SnapPuller copyAFile
> SEVERE: Unable to move index file from:
> D:\web\solr\collection\data\index.2011102510\_3u.tii to:
> D:\web\solr\Collection\data\index\_3u.tiiTrying to do a copy
> 
> SEVERE: Unable to copy index file from:
> D:\web\solr\collection\data\index.2011102510\_3s.fdt to:
> D:\web\solr\Collection\data\index\_3s.fdt java.io.FileNotFoundException:
> D:\web\solr\collection\data\index.2011102510\_3s.fdt (The system 
> cannot find the file specified) at java.io.FileInputStream.open(Native
> Method)
> at java.io.FileInputStream.(Unknown Source)
> at org.apache.solr.common.util.FileUtils.copyFile(FileUtils.java:47)
> at org.apache.solr.handler.SnapPuller.copyAFile(SnapPuller.java:585)
> at
> org.apache.solr.handler.SnapPuller.copyIndexFiles(SnapPuller.java:621) 
> at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:31
> 7)
> at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.
> java
> :267) at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159) 
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) 
> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown 
> Source) at java.util.concurrent.FutureTask.runAndReset(Unknown Source) 
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> cces
> s$101(Unknown Source) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> unPe
> riodic(Unknown Source) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> un(U
> nknown Source) at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> at java.lang.Thread.run(Unknown Source)
> 
> For these files, I checked the master, and they did indeed exist.
> 
> Both slave machines are configured the same, with the same replication 
> settings and a 60 minutes poll interval.
> 
> Is it perhaps because both slave machines are trying to pull down 
> files at the same time? (and the other has a lock on the file, thus it 
> gets skipped
> maybe?)
> 
> Note: If I manually force replication on each slave, one at a time, 
> the replication always seems to work fine.
> 
> 
> 
> Is there any obvious explanation or oddities I should be aware of that 
> may cause this?
> 
> Thanks,
> Rob



Re: Loading data to SOLR first time ( taking too long)

2011-10-25 Thread Alain Rogister
Are you loading data from multiple tables ? How many levels deep ? After
some experimenting, I gave up on the DIH because I found it to generate very
chatty (one row at a time) SQL against my schema, and I experienced
concurrency bugs unless multithreading was set to false, and I wasn't too
confident in the incremental mode against a complex schema.

Here is what worked for us (with Oracle):

- create materialized views; make sure that you include a 'lastUpdateTime'
field in the main table. This step may be unnecessary if your source data
does not need any pre-processing / cleaning / reorganizing.
- write a stored procedure that exports the data in Solr's XML format;
parameterize it with a range of primary keys of your main table so that you
can partition the export into manageable subsets. The XML format is very
simple, no need for complex in-the-database XML functions to generate it.
- use the database scheduler to run that procedure as a set of jobs; run a
few of them in parallel.
- use CURL or WGET or similar to feed the XML files into the index as soon
as they are available.
- compress and archive the XML files; they will come handy when you need to
provision another index instance and will save you a lot of exporting time.
- make sure your stored procedure can work in incremental mode: e.g. export
all records updated after a certain timestamp; then just push the resulting
XML into Solr.

Alain
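
For the step of feeding the exported XML files into the index, curl or wget is the simplest
route; if it has to happen from Java instead, a minimal SolrJ 3.x sketch could look like the
following (the file path and core URL are placeholders):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PostXmlFile {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr/core0"); // placeholder
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update");
        req.addFile(new File("/path/to/export-0001.xml")); // one exported Solr XML add file
        server.request(req); // stream the file to the update handler
        server.commit();     // commit once the batch of files has been sent
    }
}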

On Tue, Oct 25, 2011 at 9:56 PM, Awasthi, Shishir
wrote:

> Hi,
>
> I recently started working on SOLR and loaded approximately 4 million
> records to the solr using DataImportHandler. It took 5 days to complete
> this process.
>
>
>
> Can you please suggest how this can be improved? I would like this to be
> done in less than 6 hrs.
>
>
>
> Thanks,
>
> Shishir
>


RE: Loading data to SOLR first time ( taking too long)

2011-10-25 Thread Awasthi, Shishir
Ok that makes me feel better.

We have around 40 fields being loaded from multiple tables. Other than
not committing every row, are there any other settings that you make?

Are you also using DataImportHandler?

-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] 
Sent: Tuesday, October 25, 2011 4:03 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Loading data to SOLR first time ( taking too long)

My goodness.  We do 4 million in about 1/2 HOUR (7+ million in 40
minutes).

First question:  Are you somehow forcing Solr to do a commit for each
and every record?  If so, that way leads to the house of PAIN.

The thing to do next, I suppose, might be to try and figure out whether
the issue is in Solr proper, or in the database you are importing from.

What does your query against your database look like?
How many fields do you have per record (we have around 30, counting
copyField destinations)

Using a performance monitoring tool, try and find out the CPU
utilization, memory utilization, page write rates and physical disk
drive queue lengths to narrow down which of the two systems are having
the problem (assuming your database is not on the same machine as Solr!)

JRJ

-Original Message-
From: Awasthi, Shishir [mailto:shishir.awas...@baml.com]
Sent: Tuesday, October 25, 2011 2:57 PM
To: solr-user@lucene.apache.org
Subject: Loading data to SOLR first time ( taking too long)

Hi,

I recently started working on SOLR and loaded approximately 4 million
records to the solr using DataImportHandler. It took 5 days to complete
this process.

 

Can you please suggest how this can be improved? I would like this to be
done in less than 6 hrs.

 

Thanks,

Shishir



Re: Solr Replication: relative path in confFiles Element?

2011-10-25 Thread Yury Kats
On 10/25/2011 11:24 AM, Mark Schoy wrote:
> Hi,
> 
> is it possible to define a relative path in confFiles?
> 
> For example:
> 
> ../../x.xml
> 
> If yes, to which location will the file be copied at the slave?

I don't think it's possible. Replication copies confFiles from master core's
confDir to slave core's confDir.



Re: Replication issues with multiple Slaves

2011-10-25 Thread Markus Jelsma

> 1) Hmm, maybe, didn't notice that... but I'd be very confused why it works
> occasionally, and manual replication (through Solr Admin) always works ok
> in that case?
> 2) This was my initial thought, it was happening on one core (multiple
> commits while replication in progress), but I noticed it happening on
> another core (the one mentioned below) which only had 1 commit and a single
> generation (11 > 12) change to replicate.
> 
> 
> I too hoped and presumed that the Master is being Locked while replication
> is copying files... can anyone confirm this? We are using the native Lock
> type on a Windows/Tomcat server.

Replication does not lock the index from being written to.

> 
> Is anyone aware of any reason why the replication skips files, or fails to
> copy/find files other than because of presumably a commit or optimize
> re-chunking the segments and deleting them on the Master?

Slaves receive a list of files to download. Files further down the list may 
disappear before the slave gets a chance to download them. By keeping older 
commits around we were able to work around this issue.

> 
> -Original Message-
> From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
> Sent: 25 October 2011 20:48
> To: solr-user@lucene.apache.org
> Subject: RE: Replication issues with multiple Slaves
> 
> I noted that in these messages the left hand side is lower case collection,
> but the right hand side is upper case Collection.  Assuming you did a
> cut/paste, could you have a core name mismatch between a master and a slave
> somehow?
> 
> Otherwise (shudder):  could you be doing a commit while the replication is
> in progress, causing files to shift about on it?  I'd have expected
> (perhaps naively) solr to have some sort of lock to prevent such a
> problem.  But if there is no internal lock, that would be a serious matter
> (and could happen to us, too, down the road).
> 
> JRJ
> 
> -Original Message-
> From: Rob Nicholls [mailto:robst...@hotmail.com]
> Sent: Tuesday, October 25, 2011 10:32 AM
> To: solr-user@lucene.apache.org
> Subject: Replication issues with multiple Slaves
> 
> 
> Hey guys,
> 
> We have a Master (1 server) and 2 Slaves (2 servers) setup and running
> replication across multiple cores.
> 
> However, the replication appears to behave sporadically and often fails
> when left to replicate automatically via poll. More often than not a
> replicate will fail after the slave has finished pulling down the segment
> files, because it cannot find a particular file, giving errors such as:
> 
> Oct 25, 2011 10:00:17 AM org.apache.solr.handler.SnapPuller copyAFile
> SEVERE: Unable to move index file from:
> D:\web\solr\collection\data\index.2011102510\_3u.tii to:
> D:\web\solr\Collection\data\index\_3u.tiiTrying to do a copy
> 
> SEVERE: Unable to copy index file from:
> D:\web\solr\collection\data\index.2011102510\_3s.fdt to:
> D:\web\solr\Collection\data\index\_3s.fdt
> java.io.FileNotFoundException:
> D:\web\solr\collection\data\index.2011102510\_3s.fdt (The system cannot
> find the file specified)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.(Unknown Source)
> at org.apache.solr.common.util.FileUtils.copyFile(FileUtils.java:47)
> at org.apache.solr.handler.SnapPuller.copyAFile(SnapPuller.java:585)
> at
> org.apache.solr.handler.SnapPuller.copyIndexFiles(SnapPuller.java:621)
> at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:317)
> at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:
> 2 67)
> at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown
> Source) at java.util.concurrent.FutureTask.runAndReset(Unknown Source) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access
> $ 101(Unknown Source)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPer
> i odic(Unknown Source)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Un
> k nown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
> Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> 
> For these files, I checked the master, and they did indeed exist.
> 
> Both slave machines are configured the same, with the same replication
> settings and a 60 minutes poll interval.
> 
> Is it perhaps because both slave machines are trying to pull down files at
> the same time? (and the other has a lock on the file, thus it gets skipped
> maybe?)
> 
> Note: If I manually force replication on each slave, one at a time, the
> replication always seems to work fine.
> 
> 
> 
> Is there any obvious explanation or oddities I should be aware of that may
> cause this?
> 
> Thanks,
> Rob

RE: Replication issues with multiple Slaves

2011-10-25 Thread Rob Nicholls
1) Hmm, maybe, didn't notice that... but I'd be very confused why it works
occasionally, and manual replication (through Solr Admin) always works ok in
that case?
2) This was my initial thought, it was happening on one core (multiple
commits while replication in progress), but I noticed it happening on
another core (the one mentioned below) which only had 1 commit and a single
generation (11 > 12) change to replicate. 


I too hoped and presumed that the Master is being Locked while replication
is copying files... can anyone confirm this? We are using the native Lock
type on a Windows/Tomcat server.

Is anyone aware of any reason why the replication skips files, or fails to
copy/find files other than because of presumably a commit or optimize
re-chunking the segments and deleting them on the Master?

-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] 
Sent: 25 October 2011 20:48
To: solr-user@lucene.apache.org
Subject: RE: Replication issues with multiple Slaves

I noted that in these messages the left hand side is lower case collection,
but the right hand side is upper case Collection.  Assuming you did a
cut/paste, could you have a core name mismatch between a master and a slave
somehow?

Otherwise (shudder):  could you be doing a commit while the replication is
in progress, causing files to shift about on it?  I'd have expected (perhaps
naively) solr to have some sort of lock to prevent such a problem.  But if
there is no internal lock, that would be a serious matter (and could happen
to us, too, down the road).

JRJ

-Original Message-
From: Rob Nicholls [mailto:robst...@hotmail.com] 
Sent: Tuesday, October 25, 2011 10:32 AM
To: solr-user@lucene.apache.org
Subject: Replication issues with multiple Slaves


Hey guys,

We have a Master (1 server) and 2 Slaves (2 servers) setup and running
replication across multiple cores.

However, the replication appears to behave sporadically and often fails when
left to replicate automatically via poll. More often than not a replicate
will fail after the slave has finished pulling down the segment files,
because it cannot find a particular file, giving errors such as:

Oct 25, 2011 10:00:17 AM org.apache.solr.handler.SnapPuller copyAFile
SEVERE: Unable to move index file from:
D:\web\solr\collection\data\index.2011102510\_3u.tii to:
D:\web\solr\Collection\data\index\_3u.tiiTrying to do a copy

SEVERE: Unable to copy index file from:
D:\web\solr\collection\data\index.2011102510\_3s.fdt to:
D:\web\solr\Collection\data\index\_3s.fdt
java.io.FileNotFoundException:
D:\web\solr\collection\data\index.2011102510\_3s.fdt (The system cannot
find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.(Unknown Source)
at org.apache.solr.common.util.FileUtils.copyFile(FileUtils.java:47)
at org.apache.solr.handler.SnapPuller.copyAFile(SnapPuller.java:585)
at
org.apache.solr.handler.SnapPuller.copyIndexFiles(SnapPuller.java:621)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:317)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:2
67)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$
101(Unknown Source)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeri
odic(Unknown Source)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unk
nown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

For these files, I checked the master, and they did indeed exist.

Both slave machines are configured the same, with the same replication
settings and a 60 minutes poll interval.

Is it perhaps because both slave machines are trying to pull down files at
the same time? (and the other has a lock on the file, thus it gets skipped
maybe?)

Note: If I manually force replication on each slave, one at a time, the
replication always seems to work fine.



Is there any obvious explanation or oddities I should be aware of that may
cause this?

Thanks,
Rob



  



Re: java.net.SocketException: Too many open files

2011-10-25 Thread Markus Jelsma
This is on Linux? This should help:

echo fs.file-max = 16384 >> /etc/sysctl.conf

On some distros like Debian it seems you also have to add these settings to 
/etc/security/limits.conf, otherwise they may not persist between reboots or even 
shell sessions:

echo "systems hard nofile 16384
systems soft nofile 16384" >> /etc/security/limits.conf


> Hi,
> 
> I am using solrj and for connection to server I am using instance of the
> solr server:
> 
> SolrServer server =  new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
> 
> I noticed that after few minutes it start throwing exception
> java.net.SocketException: Too many open files.
> It seems that it related to instance of the HttpClient. How to resolved the
> instances to a certain no. Like connection pool in dbcp etc..
> 
> I am not experienced on java so please help to resolved this problem.
> 
>  solr version: 3.4
> 
> regards
> Jonty


Re: java.net.SocketException: Too many open files

2011-10-25 Thread Yonik Seeley
On Tue, Oct 25, 2011 at 4:03 PM, Jonty Rhods  wrote:
> Hi,
>
> I am using solrj and for connection to server I am using instance of the
> solr server:
>
> SolrServer server =  new CommonsHttpSolrServer("http://localhost:8080/solr/core0");

Are you reusing the server object for all of your requests?
By default, Solr and SolrJ use persistent connections, meaning that
sockets are reused and new ones are not opened for every request.

-Yonik
http://www.lucidimagination.com


> I noticed that after few minutes it start throwing exception
> java.net.SocketException: Too many open files.
> It seems that it related to instance of the HttpClient. How to resolved the
> instances to a certain no. Like connection pool in dbcp etc..
>
> I am not experienced on java so please help to resolved this problem.
>
>  solr version: 3.4
>
> regards
> Jonty
>


java.net.SocketException: Too many open files

2011-10-25 Thread Jonty Rhods
Hi,

I am using SolrJ, and for the connection to the server I am using an instance of the
Solr server:

SolrServer server =  new CommonsHttpSolrServer("http://localhost:8080/solr/core0");

I noticed that after a few minutes it starts throwing the exception
java.net.SocketException: Too many open files.
It seems to be related to the instances of HttpClient. How can I limit the
instances to a certain number, like a connection pool in DBCP etc.?

I am not experienced in Java, so please help me resolve this problem.

 solr version: 3.4

regards
Jonty


RE: Loading data to SOLR first time ( taking too long)

2011-10-25 Thread Jaeger, Jay - DOT
My goodness.  We do 4 million in about 1/2 HOUR (7+ million in 40 minutes).

First question:  Are you somehow forcing Solr to do a commit for each and every 
record?  If so, that way leads to the house of PAIN.
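
If per-record commits turn out to be the culprit, the usual cure is to send documents in
batches and commit rarely (or rely on autoCommit). A minimal SolrJ sketch of the batching
idea, independent of DIH, with placeholder names and batch size:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchedIndexer {
    private static final int BATCH_SIZE = 1000; // placeholder batch size
    private final List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>();
    private final SolrServer server;

    public BatchedIndexer(SolrServer server) { this.server = server; }

    public void add(SolrInputDocument doc) throws Exception {
        buffer.add(doc);
        if (buffer.size() >= BATCH_SIZE) {
            server.add(buffer); // one update request per batch, no commit here
            buffer.clear();
        }
    }

    public void finish() throws Exception {
        if (!buffer.isEmpty()) server.add(buffer);
        server.commit();        // a single commit at the end of the load
    }
}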

The thing to do next, I suppose, might be to try and figure out whether the 
issue is in Solr proper, or in the database you are importing from.

What does your query against your database look like?
How many fields do you have per record (we have around 30, counting copyField 
destinations)?

Using a performance monitoring tool, try to find out the CPU utilization, 
memory utilization, page write rates and physical disk drive queue lengths to 
narrow down which of the two systems is having the problem (assuming your 
database is not on the same machine as Solr!)

JRJ

-Original Message-
From: Awasthi, Shishir [mailto:shishir.awas...@baml.com] 
Sent: Tuesday, October 25, 2011 2:57 PM
To: solr-user@lucene.apache.org
Subject: Loading data to SOLR first time ( taking too long)

Hi,

I recently started working on SOLR and loaded approximately 4 million
records to the solr using DataImportHandler. It took 5 days to complete
this process.

 

Can you please suggest how this can be improved? I would like this to be
done in less than 6 hrs.

 

Thanks,

Shishir



Loading data to SOLR first time ( taking too long)

2011-10-25 Thread Awasthi, Shishir
Hi,

I recently started working on SOLR and loaded approximately 4 million
records to the solr using DataImportHandler. It took 5 days to complete
this process.

 

Can you please suggest how this can be improved? I would like this to be
done in less than 6 hrs.

 

Thanks,

Shishir



Re: Replication issues with multiple Slaves

2011-10-25 Thread Markus Jelsma
Are you frequently adding and deleting documents and committing those 
mutations? Then it might try to download a file that doesn't exist anymore. If 
that is the case, try increasing the replication handler's commitReserveDuration on the master:



> I noted that in these messages the left hand side is lower case collection,
> but the right hand side is upper case Collection.  Assuming you did a
> cut/paste, could you have a core name mismatch between a master and a
> slave somehow?
> 
> Otherwise (shudder):  could you be doing a commit while the replication is
> in progress, causing files to shift about on it?  I'd have expected
> (perhaps naively) solr to have some sort of lock to prevent such a
> problem.  But if there is no internal lock, that would be a serious matter
> (and could happen to us, too, down the road).
> 
> JRJ
> 
> -Original Message-
> From: Rob Nicholls [mailto:robst...@hotmail.com]
> Sent: Tuesday, October 25, 2011 10:32 AM
> To: solr-user@lucene.apache.org
> Subject: Replication issues with multiple Slaves
> 
> 
> Hey guys,
> 
> We have a Master (1 server) and 2 Slaves (2 servers) setup and running
> replication across multiple cores.
> 
> However, the replication appears to behave sporadically and often fails
> when left to replicate automatically via poll. More often than not a
> replicate will fail after the slave has finished pulling down the segment
> files, because it cannot find a particular file, giving errors such as:
> 
> Oct 25, 2011 10:00:17 AM org.apache.solr.handler.SnapPuller copyAFile
> SEVERE: Unable to move index file from:
> D:\web\solr\collection\data\index.2011102510\_3u.tii to:
> D:\web\solr\Collection\data\index\_3u.tiiTrying to do a copy
> 
> SEVERE: Unable to copy index file from:
> D:\web\solr\collection\data\index.2011102510\_3s.fdt to:
> D:\web\solr\Collection\data\index\_3s.fdt java.io.FileNotFoundException:
> D:\web\solr\collection\data\index.2011102510\_3s.fdt (The system
> cannot find the file specified) at java.io.FileInputStream.open(Native
> Method)
> at java.io.FileInputStream.<init>(Unknown Source)
> at org.apache.solr.common.util.FileUtils.copyFile(FileUtils.java:47)
> at org.apache.solr.handler.SnapPuller.copyAFile(SnapPuller.java:585)
> at
> org.apache.solr.handler.SnapPuller.copyIndexFiles(SnapPuller.java:621) at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:317)
> at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java
> :267) at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159) at
> java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source) at
> java.util.concurrent.FutureTask.runAndReset(Unknown Source) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.acces
> s$101(Unknown Source) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPe
> riodic(Unknown Source) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(U
> nknown Source) at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at
> java.lang.Thread.run(Unknown Source)
> 
> For these files, I checked the master, and they did indeed exist.
> 
> Both slave machines are configured the same, with the same replication
> settings and a 60 minutes poll interval.
> 
> Is it perhaps because both slave machines are trying to pull down files at
> the same time? (and the other has a lock on the file, thus it gets skipped
> maybe?)
> 
> Note: If I manually force replication on each slave, one at a time, the
> replication always seems to work fine.
> 
> 
> 
> Is there any obvious explanation or oddities I should be aware of that may
> cause this?
> 
> Thanks,
> Rob


RE: Replication issues with multiple Slaves

2011-10-25 Thread Jaeger, Jay - DOT
I noted that in these messages the left hand side is lower case collection, but 
the right hand side is upper case Collection.  Assuming you did a cut/paste, 
could you have a core name mismatch between a master and a slave somehow?

Otherwise (shudder):  could you be doing a commit while the replication is in 
progress, causing files to shift about on it?  I'd have expected (perhaps 
naively) solr to have some sort of lock to prevent such a problem.  But if 
there is no internal lock, that would be a serious matter (and could happen to 
us, too, down the road).

JRJ

-Original Message-
From: Rob Nicholls [mailto:robst...@hotmail.com] 
Sent: Tuesday, October 25, 2011 10:32 AM
To: solr-user@lucene.apache.org
Subject: Replication issues with multiple Slaves


Hey guys,

We have a Master (1 server) and 2 Slaves (2 servers) setup and running 
replication across multiple cores.

However, the replication appears to behave sporadically and often fails when 
left to replicate automatically via poll. More often than not a replicate will 
fail after the slave has finished pulling down the segment files, because it 
cannot find a particular file, giving errors such as:

Oct 25, 2011 10:00:17 AM org.apache.solr.handler.SnapPuller copyAFile
SEVERE: Unable to move index file from: 
D:\web\solr\collection\data\index.2011102510\_3u.tii to: 
D:\web\solr\Collection\data\index\_3u.tiiTrying to do a copy

SEVERE: Unable to copy index file from: 
D:\web\solr\collection\data\index.2011102510\_3s.fdt to: 
D:\web\solr\Collection\data\index\_3s.fdt
java.io.FileNotFoundException: 
D:\web\solr\collection\data\index.2011102510\_3s.fdt (The system cannot 
find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(Unknown Source)
at org.apache.solr.common.util.FileUtils.copyFile(FileUtils.java:47)
at org.apache.solr.handler.SnapPuller.copyAFile(SnapPuller.java:585)
at org.apache.solr.handler.SnapPuller.copyIndexFiles(SnapPuller.java:621)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:317)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:267)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown
 Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown
 Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
 Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

For these files, I checked the master, and they did indeed exist.

Both slave machines are configured the same, with the same replication settings 
and a 60 minutes poll interval.

Is it perhaps because both slave machines are trying to pull down files at the 
same time? (and the other has a lock on the file, thus it gets skipped maybe?)

Note: If I manually force replication on each slave, one at a time, the 
replication always seems to work fine.



Is there any obvious explanation or oddities I should be aware of that may 
cause this?

Thanks,
Rob



  


RE: Pointers to processing hashtags

2011-10-25 Thread Jaeger, Jay - DOT
Sounds like a possible application of solr.PatternTokenizerFactory  

http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternTokenizerFactory.html

You could use copyField to copy the entire string to a separate field (or set 
of fields) that are processed by patterns.

JRJ
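
To make the PatternTokenizerFactory suggestion concrete, here is a small stand-alone
Java sketch of one candidate pattern; the exact pattern and schema wiring are
assumptions, not a tested configuration:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashtagPatternDemo {
  public static void main(String[] args) {
    // Keep the leading '#', the alphanumeric body, and any trailing '!'/'?' as part
    // of the token, so #jane, #Jane! and #jane?!? stay distinguishable from the
    // plain word "jane".
    Pattern hashtag = Pattern.compile("#\\w+[!?]*");
    Matcher m = hashtag.matcher("talking about #jane, #Jane! and #jane?!? with jane today");
    while (m.find()) {
      System.out.println(m.group());   // prints: #jane  #Jane!  #jane?!?
    }
  }
}

Grouping the punctuation variants back together (point 3 in the original question)
could then be done with a copyField into a second field whose analysis strips the
trailing punctuation.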

-Original Message-
From: Memory Makers [mailto:memmakers...@gmail.com] 
Sent: Tuesday, October 25, 2011 9:27 AM
To: solr-user@lucene.apache.org
Subject: Pointers to processing hashtags

Greetings,

I am trying to index hashtags from Twitter -- so they are tokens that start
with a # symbol and can have any number of alphanumeric characters.

Examples:
1. #jane
2. #Jane
3. #Jane!

At a high level I'd like to be able to:
1. differentiate between say #jane and #jane!
2. differentiate between a hashtag such as #jane and a regular text token
jane
3. ask for variations on #jane -- by this I mean #jane? #jane!!! #jane!?!??
are all variations of #jane

I'd appreciate pointers to what my considerations should be when I attempt to
do the above.

Thanks,

MM.


Adding a DocSet as a filter from a custom search component

2011-10-25 Thread Marc Sturlese
Hey there,
I'm wondering if there's a cleaner way to do this:
I've written a SearchComponent that runs as a last-component. In the prepare
method I build a DocSet (SortedIntDocSet) based on whether some values from the
FieldCache of a given field satisfy certain rules (if they do, the docId is
added to the DocSet). I want to use this DocSet as a filter for the main query.
Right now I'm cloning the existing filters of the request (if there are any)
into a filter list, adding mine there, and then adding the list to the request
context:

  ... build myDocSet
  DocSet ds = rb.req.getSearcher().getDocSet(filtersCloned).andNot(myDocSet);
  rb.setFilters(null);   // you'll see why
  rb.req.getContext().put("newFilters", ds);

Then to apply the DocSet containing all filters, in the QueryCommand process
method do:

SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
if(rb.req.getContext().containsKey("newFilters")){
  cmd.setFilter((DocSet)rb.req.getContext().get("newFilters"));
}
As I've set rb.setFilters(null) I won't get exceptions and it will work.
This definitely looks nasty; I would like not to touch the QueryCommand. Any
suggestions?






RE: sort non-roman character strings last

2011-10-25 Thread Jaeger, Jay - DOT
As far as I know, in the index, a string that is zero length is still a string, 
and would not count as "missing".

The CSV importer has a way to not index empty entries, but once it is in the 
index, it is in the index -- as an empty string.

i.e.

String silly = null;

Is not the same thing as:

String silly = "";

JRJ

-Original Message-
From: themanwho [mailto:theman...@mac.com] 
Sent: Tuesday, October 25, 2011 9:22 AM
To: solr-user@lucene.apache.org
Subject: RE: sort non-roman character strings last

Jay,
Thanks, good call on the pattern.

Still, my embedded question: if a field is filtered down to a zero-length
string, does this qualify as "missing" so far as sortMissingLast is
concerned?

If not, your suggestion should work fine -- appreciated!!
Cheers,
Bill



Re: Optimization /Commit memory

2011-10-25 Thread Simon Willnauer
RAM cost during optimize / merge is generally low. Optimize is
basically a merge of all segments into one, however there are
exceptions. Lucene streams existing segments from disk and serializes
the new segment on the fly. When you optimize, or in general when you
merge segments, you need disk space for the "source" segments and the
"target" (merged) segment.

If you use CompoundFileSystem (CFS) you need additional space once
the merge is done, when your files are packed into the CFS; this is
basically the size of the "target" (merged) segment. Once the merge is
done lucene can free the diskspace unless you have an IndexReader open
that references those segments (lucene keeps track of these files and
frees diskspace once possible).

That said, I think you should use optimize very, very rarely. Usually,
if your document collection is rarely changing, optimizing once in a
while is useful and reasonable. If your collection is constantly changing
you should rely on the merge policy to balance the number of segments
for you in the background. Lucene 3.4 has a nicely improved
TieredMergePolicy that does a great job. (Previous versions are also
good - just saying.)

A commit is basically flushing the segment you have in memory
(IndexWriter memory) to disk. The compression ratio can be up to 30% of
the RAM cost or even more depending on your data. The actual commit
doesn't need a notable amount of memory.

hope this helps

simon

On Mon, Oct 24, 2011 at 7:38 PM, Jaeger, Jay - DOT
 wrote:
> I have not spent a lot of time researching it, but one would expect the 
> OS RAM requirement for optimization of an index to be minimal.
>
> My understanding is that during optimization an essentially new index is 
> built.  Once complete it switches out the indexes and will throw away the old 
> one.  (In Windows it may not throw away the old one until the next Commit).
>
> JRJ
>
> -Original Message-
> From: Sujatha Arun [mailto:suja.a...@gmail.com]
> Sent: Friday, October 21, 2011 12:10 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Optimization /Commit memory
>
> Just one more thing: when we are talking about optimization, we
> are referring to HD free space for replicating the index (2 or 3 times
> the index size). What is the role of RAM (OS) here?
>
> Regards
> Suajtha
>
> On Fri, Oct 21, 2011 at 10:12 AM, Sujatha Arun  wrote:
>
>> Thanks that helps.
>>
>> Regards
>> Sujatha
>>
>>
>> On Thu, Oct 20, 2011 at 6:23 PM, Jaeger, Jay - DOT 
>> wrote:
>>
>>> Well, since the OS RAM includes the JVM RAM, that is part of your
>>> requirement, yes?  Aside from the JVM and normal OS requirements, all you
>>> need OS RAM for is file caching.  Thus, for updates, the OS RAM is not a
>>> major factor.  For searches, you want sufficient OS RAM to cache enough of
>>> the index to get the query performance you need, and to cache queries inside
>>> the JVM if you get a lot of repeat queries (see solrconfig.xml for the
>>> various caches: we have not played with them much).  So, the amount of RAM
>>> necessary for that is very much dependent upon the size of your index, so I
>>> cannot give you a simple number.
>>>
>>> You seem to believe that you have to have sufficient memory to have the
>>> entire index in memory.  Except where extremely high performance is
>>> required, I have not found that to be the case.
>>>
>>> This is just one of those "your mileage may vary" things.  There is not a
>>> single answer or formula that fits every situation.
>>>
>>> JRJ
>>>
>>> -Original Message-
>>> From: Sujatha Arun [mailto:suja.a...@gmail.com]
>>> Sent: Wednesday, October 19, 2011 11:58 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Optimization /Commit memory
>>>
>>> Thanks  Jay ,
>>>
>>> I was trying to compute the *OS RAM requirement*  *not JVM RAM* for a 14
>>> GB
>>> Index [cumulative Index size of all Instances].And I put it thus -
>>>
>>> Requirement of Operating System RAM for an Index of  14GB is   - Index
>>> Size
>>> + 3 Times the  maximum Index Size of Individual Instance for Optimize .
>>>
>>> That is to say ,I have several Instances ,combined Index Size is 14GB
>>> .Maximum Individual Index Size is 2.5GB .so My requirement for OS RAM is
>>>  14GB +3 * 2.5 GB  ~ = 22GB.
>>>
>>> Correct?
>>>
>>> Regards
>>> Sujatha
>>>
>>>
>>>
>>> On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT >> >wrote:
>>>
>>> > Commit does not particularly spike disk or memory usage, unless you are
>>> > adding a very large number of documents between commits.  A commit can
>>> cause
>>> > a need to merge indexes, which can increase disk space temporarily.  An
>>> > optimize is *likely* to merge indexes, which will usually increase disk
>>> > space temporarily.
>>> >
>>> > How much disk space depends very much upon how big your index is in the
>>> > first place.  A 2 to 3 times factor of the sum of your peak index file
>>> size
>>> > seems safe, to me.
>>> >
>>> > Solr uses only modest amounts of memory for the JVM for this stuff.
>>> >
>>> 

Re: some basic information on Solr

2011-10-25 Thread Simon Willnauer
hey,

2011/10/24 Dan Wu :
>  Hi all,
>
> I am doing a student project on search engine research. Right now I have
> some basic questions about Slor.
>
> 1. How many types of data file Solr can support (estimate)? i.e. No. of
> file types solr can look at for indexing and searching.
basically you can use solr to index all kinds of documents as long as
you can extract the text from the document. However, Solr ships with
content extraction support that handles a large set of different
files. AFAIK it leverages apache tika (http://tika.apache.org) which
supports a very large set of document formats
(http://tika.apache.org/0.10/formats.html). Hope this helps here?!
>
> 2. How much is estimated cost of incidents per year for Solr ?

I have to admit I don't know what you are asking for. can you
elaborate on this a bit? What is an incident in this context?

simon
>
> Since the numbers could vary from different platforms, however we would like
> to know the estimate answers regarding the general cases.
>
> Thanks
>
>
>
> --
> Dan Wu (Fiona Wu)  武丹
> Master of Engineering Management Program Degree Candidate
> Duke University, North Carolina, USA
> Email: dan...@duke.edu
> Tel: 919-599-2730
>


Re: joins and filter queries affecting scoring

2011-10-25 Thread Jason Toy
Hi Yonik,

Without a Join I would normally query user docs with:
q=data_text:"test"&fq=is_active_boolean:true

When joining users with posts, I get no results:
q={!join from=self_id_i
to=user_id_i}data_text:"test"&fq=is_active_boolean:true&fq=posts_text:"hello"



I am able to use this query, but it gives me the results in an order that I
don't want (nor do I understand its order):
q={!join from=self_id_i to=user_id_i}data_text:"test" AND
is_active_boolean:true&fq=posts_text:"hello"

I want the order to be the same as I would get from my original
"q=data_text:"test"&fq=is_active_boolean:true", but with the ability to join
with the Posts docs.





On Tue, Oct 25, 2011 at 11:30 AM, Yonik Seeley
wrote:

> Can you give an example of the request (URL) you are sending to Solr?
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> On Mon, Oct 24, 2011 at 3:31 PM, Jason Toy  wrote:
> > I have 2 types of docs, users and posts.
> > I want to view all the docs that belong to certain users by joining posts
> > and users together.  I have to filter the users with a filter query of
> > "is_active_boolean:true" so that the score is not effected,but since I do
> a
> > join, I have to move the filter query to the query parameter so that I
> can
> > get the filter applied. The problem is that since the is_active_boolean
> is
> > moved to the query, the score is affected which returns back an order
> that I
> > don't want.
> >  If I leave the is_active_boolean:true in the fq parameter, I get no
> > results back.
> >
> > My question is how can I apply a filter query to users so that the score
> is
> > not affected?
> >
>



-- 
- sent from my mobile


Replication issues with multiple Slaves

2011-10-25 Thread Rob Nicholls

Hey guys,

We have a Master (1 server) and 2 Slaves (2 servers) setup and running 
replication across multiple cores.

However, the replication appears to behave sporadically and often fails when 
left to replicate automatically via poll. More often than not a replicate will 
fail after the slave has finished pulling down the segment files, because it 
cannot find a particular file, giving errors such as:

Oct 25, 2011 10:00:17 AM org.apache.solr.handler.SnapPuller copyAFile
SEVERE: Unable to move index file from: 
D:\web\solr\collection\data\index.2011102510\_3u.tii to: 
D:\web\solr\Collection\data\index\_3u.tiiTrying to do a copy

SEVERE: Unable to copy index file from: 
D:\web\solr\collection\data\index.2011102510\_3s.fdt to: 
D:\web\solr\Collection\data\index\_3s.fdt
java.io.FileNotFoundException: 
D:\web\solr\collection\data\index.2011102510\_3s.fdt (The system cannot 
find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(Unknown Source)
at org.apache.solr.common.util.FileUtils.copyFile(FileUtils.java:47)
at org.apache.solr.handler.SnapPuller.copyAFile(SnapPuller.java:585)
at org.apache.solr.handler.SnapPuller.copyIndexFiles(SnapPuller.java:621)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:317)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:267)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown
 Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown
 Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
 Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

For these files, I checked the master, and they did indeed exist.

Both slave machines are configured the same, with the same replication settings 
and a 60 minutes poll interval.

Is it perhaps because both slave machines are trying to pull down files at the 
same time? (and the other has a lock on the file, thus it gets skipped maybe?)

Note: If I manually force replication on each slave, one at a time, the 
replication always seems to work fine.



Is there any obvious explanation or oddities I should be aware of that may 
cause this?

Thanks,
Rob



  

Re: joins and filter queries affecting scoring

2011-10-25 Thread Yonik Seeley
Can you give an example of the request (URL) you are sending to Solr?

-Yonik
http://www.lucidimagination.com



On Mon, Oct 24, 2011 at 3:31 PM, Jason Toy  wrote:
> I have 2 types of docs, users and posts.
> I want to view all the docs that belong to certain users by joining posts
> and users together.  I have to filter the users with a filter query of
> "is_active_boolean:true" so that the score is not effected,but since I do a
> join, I have to move the filter query to the query parameter so that I can
> get the filter applied. The problem is that since the is_active_boolean is
> moved to the query, the score is affected which returns back an order that I
> don't want.
>  If I leave the is_active_boolean:true in the fq parameter, I get no
> results back.
>
> My question is how can I apply a filter query to users so that the score is
> not affected?
>


RE: Dismax handler - whitespace and special character behaviour

2011-10-25 Thread Demian Katz
I just sent an email to the list about DisMax interacting with 
WordDelimiterFilterFactory, and I think our problems are at least partially 
related -- I think the reason you are seeing an OR where you expect an AND is 
that you have autoGeneratePhraseQueries set to false, which changes the way 
DisMax handles the output of the WordDelimiterFilterFactory (among others).  
Unfortunately, I don't have a solution for you...  but you might want to keep 
an eye on my thread in case replies there shed any additional light.

- Demian

> -Original Message-
> From: Rohk [mailto:khor...@gmail.com]
> Sent: Tuesday, October 25, 2011 10:33 AM
> To: solr-user@lucene.apache.org
> Subject: Dismax handler - whitespace and special character behaviour
> 
> Hello,
> 
> I've got strange results when I have special characters in my query.
> 
> Here is my request :
> 
> q=histoire-
> france&start=0&rows=10&sort=score+desc&defType=dismax&qf=any^1.0&mm=100
> %
> 
> Parsed query :
> 
> +((any:histoir any:franc)) ()
> 
> I've got 17000 results because Solr is doing an OR (should be AND).
> 
> I have no problem when I'm using a whitespace instead of a special char
> :
> 
> q=histoire
> france&start=0&rows=10&sort=score+desc&defType=dismax&qf=any^1.0&mm=100
> %
> 
> +(((any:histoir) (any:franc))~2)
> ()
> 
> 2000 results for this query.
> 
> Here is my schema.xml (relevant parts) :
> 
>  positionIncrementGap="100" autoGeneratePhraseQueries="false">
>   
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1"/>
> 
>  words="stopwords_french.txt" ignoreCase="true"/>
>  words="stopwords_french.txt" enablePositionIncrements="true"/>
>  language="French" protected="protwords.txt"/>
> 
> 
>   
>   
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="0"/>
> 
>  words="stopwords_french.txt" ignoreCase="true"/>
>  words="stopwords_french.txt" enablePositionIncrements="true"/>
>  language="French" protected="protwords.txt"/>
> 
> 
>   
> 
> 
> I tried with a PatternTokenizerFactory to tokenize on whitespaces &
> special
> chars but no change...
> Even with a charFilter (PatternReplaceCharFilterFactory) to replace
> special
> characters by whitespace, it doesn't work...
> 
> First line of analysis via solr admin, with verbose output, for query =
> 'histoire-france' :
> 
> org.apache.solr.analysis.PatternReplaceCharFilterFactory {replacement=
> , pattern=([,;./\\'&-]), luceneMatchVersion=LUCENE_32}
> text: histoire france
> 
> The '-' is replaced by ' ', then tokenized by
> WhitespaceTokenizerFactory.
> However I still have different number of results for 'histoire-france'
> and
> 'histoire france'.
> 
> My current workaround is to replace all special chars by whitespaces
> before
> sending query to Solr, but it is not satisfying.
> 
> Did i miss something ?


Replication issues with multiple Slaves

2011-10-25 Thread Rob Nicholls

Hey all,

We have a Master (1 server) and 2 Slaves (2 servers) setup and running 
replication across multiple cores.

However, the replication appears to behave sporadically and often fails when 
left to replicate automatically via poll. More often than not a replicate will 
fail after the slave has finished pulling down the segment files, because it 
cannot find a particular file, giving errors such as:

Oct 25, 2011 10:00:17 AM org.apache.solr.handler.SnapPuller copyAFile
SEVERE: Unable to move index file from: 
D:\web\solr\collection\data\index.2011102510\_3u.tii to: 
D:\web\solr\Collection\data\index\_3u.tiiTrying to do a copy

SEVERE: Unable to copy index file from: 
D:\web\solr\collection\data\index.2011102510\_3s.fdt to: 
D:\web\solr\Collection\data\index\_3s.fdt
java.io.FileNotFoundException: 
D:\web\solr\collection\data\index.2011102510\_3s.fdt (The system cannot 
find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(Unknown Source)
at org.apache.solr.common.util.FileUtils.copyFile(FileUtils.java:47)
at org.apache.solr.handler.SnapPuller.copyAFile(SnapPuller.java:585)
at org.apache.solr.handler.SnapPuller.copyIndexFiles(SnapPuller.java:621)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:317)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:267)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown
 Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown
 Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
 Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

For these files, I checked the master, and they did indeed exist.

Both slave machines are configured the same, with the same replication settings 
and a 60 minutes poll interval. Using Solr 3.1

Is it perhaps because both slave machines are trying to pull down files at the 
same time? (and the other has a lock on the file, thus it gets skipped maybe?)

Note: If I manually force replication on each slave, one at a time, the 
replication always seems to work fine.




Is there any obvious explanation or oddities I should be aware of that may 
cause this?

Thanks,
Rob




  

DisMax and WordDelimiterFilterFactory

2011-10-25 Thread Demian Katz
I've seen a couple of threads related to this subject (for example, 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg33400.html), but I 
haven't found an answer that addresses the aspect of the problem that concerns 
me...

I have a field type set up like this:


  







  
  








  


The important feature here is the use of WordDelimiterFilterFactory, which 
allows a search for "WiFi" to match an indexed term of "wi fi" (for example).

The problem, of course, is that if a user accidentally introduces a case change 
in their query, the query analyzer chain breaks it into multiple words and no 
hits are found...  so a search for "exaMple" will look for "exa mple" and fail.

I've found two solutions that resolve this problem in the admin panel field 
analysis tool:


1.)  Turn on catenateWords and catenateNumbers in the query analyzer - this 
reassembles the user's broken word and allows a match.

2.)  Turn on preserveOriginal in the query analyzer - this passes through the 
user's original query, which then gets cleaned up by the ICUFoldingFilterFactory 
and allows a match.

The problem is that in my real-world application, which uses DisMax, neither of 
these solutions works.  It appears that even though (if I understand correctly) 
the WordDelimiterFilterFactory is returning ALTERNATIVE tokens, the DisMax 
handler is combining them in a way that requires all of them to match in an 
inappropriate way...  for example, here's partial debugQuery output for the 
"exaMple" search using Dismax and solution #2 above:

"parsedquery":"+DisjunctionMaxQuery((genre:\"(exampl exa) mple\"^300.0 | 
title_new:\"(exampl exa) mple\"^100.0 | topic:\"(exampl exa) mple\"^500.0 | 
series:\"(exampl exa) mple\"^50.0 | title_full_unstemmed:\"(example exa) 
mple\"^600.0 | geographic:\"(exampl exa) mple\"^300.0 | contents:\"(exampl exa) 
mple\"^10.0 | fulltext_unstemmed:\"(example exa) mple\"^10.0 | 
allfields_unstemmed:\"(example exa) mple\"^10.0 | title_alt:\"(exampl exa) 
mple\"^200.0 | series2:\"(exampl exa) mple\"^30.0 | title_short:\"(exampl exa) 
mple\"^750.0 | author:\"(example exa) mple\"^300.0 | title:\"(exampl exa) 
mple\"^500.0 | topic_unstemmed:\"(example exa) mple\"^550.0 | 
allfields:\"(exampl exa) mple\" | author_fuller:\"(example exa) mple\"^150.0 | 
title_full:\"(exampl exa) mple\"^400.0 | fulltext:\"(exampl exa) mple\")) ()",

Obviously, that is not what I want - ideally it would be something like 'exampl 
OR "ex ample"'.

I also read about the autoGeneratePhraseQueries setting, but that seems to take 
things way too far in the opposite direction - if I set that to false, then I 
get matches for any individual token; i.e. example OR ex OR ample - not good at 
all!

I have a sinking suspicion that there is not an easy solution to my problem, 
but this seems to be a fairly basic need; splitOnCaseChange is a useful feature 
to have, but it's more valuable if it serves as an ALTERNATIVE search rather 
than a necessary query munge.  Any thoughts?

thanks,
Demian


Search for the single hash "#" character never returns results

2011-10-25 Thread Daniel Bradley
When running a search such as:
  field_name:#
  field_name:"#"
  field_name:"\#"

where there is a record with the value of exactly "#", solr returns 0 rows.

The workaround we are using is a range query on the
field, such as:
  field_name:[# TO #]
which returns the correct documents.

Use case details:
We have a field that indexes a text field and calculates a "letter
group". This keeps only the first significant character from a value
(number or letter), and if it is a number it simply stores "#", as we
want all numbered items grouped together.

I'm also aware that we could fix this by using a specific number
instead of the hash character; however, I thought I'd raise this to see
if there is a wider issue. I've listed some specific details below.

Thanks for your time,

Daniel Bradley


Field definition:

  





  


Server information:
Solr Specification Version: 3.2.0
Solr Implementation Version: 3.2.0 1129474 - rmuir - 2011-05-30 23:07:15
Lucene Specification Version: 3.2.0
Lucene Implementation Version: 3.2.0 1129474 - 2011-05-30 23:08:57


Re: Bet you didn't know Lucene can...

2011-10-25 Thread Mikhail Garber
Solr as enterprise event warehouse. Multiple heterogeneous
applications and log file sweepers posting stuff to centralized Solr
index.

On Sat, Oct 22, 2011 at 2:12 AM, Grant Ingersoll  wrote:
> Hi All,
>
> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." 
> (http://na11.apachecon.com/talks/18396).  It's based on my observation, that 
> over the years, a number of us in the community have done some pretty cool 
> things using Lucene/Solr that don't fit under the core premise of full text 
> search.  I've got a fair number of ideas for the talk (easily enough for 1 
> hour), but I wanted to reach out to hear your stories of ways you've (ab)used 
> Lucene and Solr to see if we couldn't extend the conversation to a bit more 
> than the conference and also see if I can't inject more ideas beyond the ones 
> I have.  I don't need deep technical details, but just high level use case 
> and the basic insight that led you to believe Lucene/Solr could solve the 
> problem.
>
> Thanks in advance,
> Grant
>
> 
> Grant Ingersoll
> http://www.lucidimagination.com
>
>


Solr Replication: relative path in confFiles Element?

2011-10-25 Thread Mark Schoy
Hi,

is it possible to define a relative path in confFiles?

For example:

../../x.xml

If yes, to which location will the file be copied at the slave?

Thanks.


Dismax handler - whitespace and special character behaviour

2011-10-25 Thread Rohk
Hello,

I've got strange results when I have special characters in my query.

Here is my request :

q=histoire-france&start=0&rows=10&sort=score+desc&defType=dismax&qf=any^1.0&mm=100%

Parsed query :

+((any:histoir any:franc)) ()

I've got 17000 results because Solr is doing an OR (should be AND).

I have no problem when I'm using a whitespace instead of a special char :

q=histoire 
france&start=0&rows=10&sort=score+desc&defType=dismax&qf=any^1.0&mm=100%

+(((any:histoir) (any:franc))~2) ()

2000 results for this query.

Here is my schema.xml (relevant parts) :


  








  
  









  


I tried with a PatternTokenizerFactory to tokenize on whitespaces & special
chars but no change...
Even with a charFilter (PatternReplaceCharFilterFactory) to replace special
characters by whitespace, it doesn't work...

First line of analysis via solr admin, with verbose output, for query =
'histoire-france' :

org.apache.solr.analysis.PatternReplaceCharFilterFactory {replacement=
, pattern=([,;./\\'&-]), luceneMatchVersion=LUCENE_32}
text: histoire france

The '-' is replaced by ' ', then tokenized by WhitespaceTokenizerFactory.
However, I still get a different number of results for 'histoire-france' and
'histoire france'.

My current workaround is to replace all special chars by whitespace before
sending the query to Solr, but it is not satisfying.

Did I miss something?


Pointers to processing hashtags

2011-10-25 Thread Memory Makers
Greetings,

I am trying to index hashtags from Twitter -- so they are tokens that start
with a # symbol and can have any number of alphanumeric characters.

Examples:
1. #jane
2. #Jane
3. #Jane!

At a high level I'd like to be able to:
1. differentiate between say #jane and #jane!
2. differentiate between a hashtag such as #jane and a regular text token
jane
3. ask for variations on #jane -- by this I mean #jane? #jane!!! #jane!?!??
are all variations of #jane

I'd appreciate pointers to what my considerations should be when I attempt to
do the above.

Thanks,

MM.


RE: sort non-roman character strings last

2011-10-25 Thread themanwho
Jay,
Thanks, good call on the pattern.

Still, my embedded question: if a field is filtered down to a zero-length
string, does this qualify as "missing" so far as sortMissingLast is
concerned?

If not, your suggestion should work fine -- appreciated!!
Cheers,
Bill



Re: accessing the query string from inside TokenFilter

2011-10-25 Thread Simon Willnauer
On Tue, Oct 25, 2011 at 3:51 PM, Bernd Fehling
 wrote:
> Dear list,
> while writing some TokenFilter for my analyzer chain I need access to
> the query string from inside of my TokenFilter for some comparison, but the
> Filters are working with a TokenStream and get separate Tokens.
> Currently I couldn't get any access to the query string.
>
> Any idea how to get this done?
>
> Is there an Attribute for "query" or "qstr"?

I don't think there is anything like that but this could be useful. We
could add this and make it optional on the query parser? Maybe even in
lucene. can you bring this to the dev list?

simon
>
> Regards Bernd
>


accessing the query string from inside TokenFilter

2011-10-25 Thread Bernd Fehling

Dear list,
while writing some TokenFilter for my analyzer chain I need access to
the query string from inside of my TokenFilter for some comparison, but the
Filters are working with a TokenStream and get separate Tokens.
Currently I couldn't get any access to the query string.

Any idea how to get this done?

Is there an Attribute for "query" or "qstr"?

Regards Bernd


gets timeout error in full import with data import handler

2011-10-25 Thread vrpar...@gmail.com
Hello all,

 I am using the data import handler with JDBC to get data from the DB for indexing.

I have one query which takes a long time to return data; when I do a full import,
it gives me a timeout error.

Please help me to solve this problem: can I set the timeout anywhere, or is there
any other way?



Thanks,
Vishal Parekh




RE: sort non-roman character strings last

2011-10-25 Thread Jaeger, Jay - DOT
Could you replace it with something that will sort it last instead of an empty 
string?  (Say, for example, replacement="{}").  This would still give something 
that looks empty to a person, and would sort last.

BTW, it looks to me as though your pattern only requires that the input contain 
just ONE non-roman character.  For it to consist of ALL (and including at least 
one) non-roman characters, I think your pattern should be  "(^[^a-z]+$)".  

JRJ
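
A quick stand-alone check of what that pattern matches (just an illustration of
the regex itself, not a schema test; the filter's capturing group is omitted
since we only test for a match):

import java.util.regex.Pattern;

public class NonRomanPatternDemo {
  public static void main(String[] args) {
    // "^[^a-z]+$": the entire (already lowercased) value must contain no a-z characters at all.
    Pattern allNonRoman = Pattern.compile("^[^a-z]+$");
    System.out.println(allNonRoman.matcher("日本語").matches());      // true  -> value would be replaced
    System.out.println(allNonRoman.matcher("abc 日本語").matches());  // false -> left alone
    System.out.println(allNonRoman.matcher("123-456").matches());     // true  -> digits/punctuation also count
  }
}

Note that digits and punctuation also fall under "non-roman" with this pattern,
which may or may not be what is wanted.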


-Original Message-
From: themanwho [mailto:theman...@mac.com] 
Sent: Monday, October 24, 2011 3:29 PM
To: solr-user@lucene.apache.org
Subject: sort non-roman character strings last

As the subject line says, when sorting documents on a field that may contain 
only non-roman characters, I would like to sort those documents last.  My initial 
approach was a pattern replacement filter:

First of all, lowercase everything...

then...



(along with sortMissingLast=true)

The analysis servlet demonstrates the fieldtype is doing what it needs to do -- 
that is, if the field is strictly non-roman, I wind up with a null string.  
However, when I sort ascending, these pesky documents still sort at the top of 
the list.

What am I doing wrong or misunderstanding here?  Guess -- does a null string 
not qualify as a missing field?

Any suggestions?
Thanks a million for any clues.

Bill

Human wheels spin round and round
While the clock keeps the pace... -- John Mellencamp

Bill TantzenUniversity of Minnesota Libraries
612-626-9949 (U of M)612-325-1777 (cell)





Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Erik Hatcher
Well, if what you want is straightforward like this, why not just use and tweak 
the templates that come with Solr's VelocityResponseWriter?

Have a look at /browse from a recent Solr distro to see what I mean.  It's very 
easily customizable.

Prism is my tinkering to pull the (Velocity, or otherwise) templating to 
another tier, yet keeping the templates very lean and clean for this type of 
purpose, so maybe you can find some value in using Prism, though admittedly 
it's just a quick (and somewhat dirty) hack at this point.

Erik

On Oct 25, 2011, at 08:34 , Fred Zimmerman wrote:

> what about something that's a bit less discovery-oriented? for my particular
> application I am most concerned with bringing back a straightforward "top
> ten" answer set and having users look at it. I actually don't want to bother
> them with faceting, etc. at this juncture.
> 
> Fred
> 
> On Tue, Oct 25, 2011 at 7:40 AM, Erik Hatcher wrote:
> 
>> 
>> On Oct 25, 2011, at 07:24 , Robert Stewart wrote:
>> 
>>> It is really not very difficult to build a decent web front-end to SOLR
>> using one of the available client libraries
>> 
>> Or even just not using any client library at all (other than an HTTP
>> library).  I've done a bit of proof-of-concept/prototyping with a super
>> light weight (and of course Ruby!) approach with my Prism tinkering: <
>> https://github.com/lucidimagination/Prism>
>> 
>> Yes, in general it's very straightforward to build a search UI that shows
>> results, pages through them, displays facets, and allows them to be clicked
>> and filter results and so on.  Devil is always in the details, and having
>> saved searches, export, customizability, authentication, and so on makes it
>> a more involved proposition.
>> 
>> If you're in a PHP environment, there is VUFind... again pretty
>> library-centric at first, but likely flexible enough to handle any Solr
>> setup - .  For the Pythonistas, there's Kochief -
>> http://code.google.com/p/kochief/
>> 
>> Being a Rubyist myself (and founder of Blacklight), I'm not intimately
>> familiar with the other solutions but the library world has done a lot to
>> get this sort of thing off the ground in many environments.
>> 
>>   Erik
>> 
>> 



Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Memory Makers
Well https://github.com/evolvingweb/ajax-solr is fairly decent for that --
haven't used it in a while but that is a minimalist client -- however I find
it hard to customize.

MM.

On Tue, Oct 25, 2011 at 8:34 AM, Fred Zimmerman wrote:

> what about something that's a bit less discovery-oriented? for my
> particular
> application I am most concerned with bringing back a straightforward "top
> ten" answer set and having users look at it. I actually don't want to
> bother
> them with faceting, etc. at this juncture.
>
> Fred
>
> On Tue, Oct 25, 2011 at 7:40 AM, Erik Hatcher  >wrote:
>
> >
> > On Oct 25, 2011, at 07:24 , Robert Stewart wrote:
> >
> > > It is really not very difficult to build a decent web front-end to SOLR
> > using one of the available client libraries
> >
> > Or even just not using any client library at all (other than an HTTP
> > library).  I've done a bit of proof-of-concept/prototyping with a super
> > light weight (and of course Ruby!) approach with my Prism tinkering: <
> > https://github.com/lucidimagination/Prism>
> >
> > Yes, in general it's very straightforward to build a search UI that shows
> > results, pages through them, displays facets, and allows them to be
> clicked
> > and filter results and so on.  Devil is always in the details, and having
> > saved searches, export, customizability, authentication, and so on makes
> it
> > a more involved proposition.
> >
> > If you're in a PHP environment, there is VUFind... again pretty
> > library-centric at first, but likely flexible enough to handle any Solr
> > setup - .  For the Pythonistas, there's Kochief -
> > http://code.google.com/p/kochief/
> >
> > Being a Rubyist myself (and founder of Blacklight), I'm not intimately
> > familiar with the other solutions but the library world has done a lot to
> > get this sort of thing off the ground in many environments.
> >
> >Erik
> >
> >
>


RE: some basic information on Solr

2011-10-25 Thread Jaeger, Jay - DOT
I am not a developer either.  We are just using it in a project here.

-Original Message-
From: Dan Wu [mailto:wudan1...@gmail.com] 
Sent: Monday, October 24, 2011 2:16 PM
To: solr-user@lucene.apache.org
Subject: Re: some basic information on Solr

 JRJ,

We did check the solr official website but found it was really technical,
since we are not on the developer side and we just want some basic
information or numbers about its usage.

Thanks for your answer, anyway.



2011/10/24 Jaeger, Jay - DOT 

> 1.  Solr, proper, does not index "files".  An adjunct called Solr Cel can.
>  See http://wiki.apache.org/solr/ExtractingRequestHandler .  That article
> describes which kinds of files it Solr Cel can handle.
>
> 2.  I have no idea what you mean by "incidents per year".  Please explain.
>
> 3.  Even though you didn't ask:  You are apparently a student at an
> advanced level.  At your level I would guess that your professors expect
> *YOU* to read thru the material available on the Internet on Solr and figure
> it out on your own, rather than just asking others to do your work for you.
>  ;^)
>
> In particular, before asking further questions you should probably read
> thru http://wiki.apache.org/solr/FrontPage and
> http://lucene.apache.org/solr/tutorial.html .
>
> JRJ
>
> -Original Message-
> From: Dan Wu [mailto:wudan1...@gmail.com]
> Sent: Monday, October 24, 2011 12:43 PM
> To: solr-user@lucene.apache.org
> Subject: some basic information on Solr
>
>  Hi all,
>
> I am doing a student project on search engine research. Right now I have
> some basic questions about Slor.
>
> 1. How many types of data file Solr can support (estimate)? i.e. No. of
> file types solr can look at for indexing and searching.
>
> 2. How much is estimated cost of incidents per year for Solr ?
>
> Since the numbers could vary from different platforms, however we would
> like
> to know the estimate answers regarding the general cases.
>
> Thanks
>


Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Fred Zimmerman
what about something that's a bit less discovery-oriented? for my particular
application I am most concerned with bringing back a straightforward "top
ten" answer set and having users look at it. I actually don't want to bother
them with faceting, etc. at this juncture.

Fred

On Tue, Oct 25, 2011 at 7:40 AM, Erik Hatcher wrote:

>
> On Oct 25, 2011, at 07:24 , Robert Stewart wrote:
>
> > It is really not very difficult to build a decent web front-end to SOLR
> using one of the available client libraries
>
> Or even just not using any client library at all (other than an HTTP
> library).  I've done a bit of proof-of-concept/prototyping with a super
> light weight (and of course Ruby!) approach with my Prism tinkering: <
> https://github.com/lucidimagination/Prism>
>
> Yes, in general it's very straightforward to build a search UI that shows
> results, pages through them, displays facets, and allows them to be clicked
> and filter results and so on.  Devil is always in the details, and having
> saved searches, export, customizability, authentication, and so on makes it
> a more involved proposition.
>
> If you're in a PHP environment, there is VUFind... again pretty
> library-centric at first, but likely flexible enough to handle any Solr
> setup - .  For the Pythonistas, there's Kochief -
> http://code.google.com/p/kochief/
>
> Being a Rubyist myself (and founder of Blacklight), I'm not intimately
> familiar with the other solutions but the library world has done a lot to
> get this sort of thing off the ground in many environments.
>
>Erik
>
>


Re: prefix search

2011-10-25 Thread Michael Kuhlmann
I think what Radha Krishna (is this really her name?) means is different:

She wants to return only the matching token instead of the complete
field value.

Indeed, this is not possible. But you could use highlighting
(http://wiki.apache.org/solr/HighlightingParameters), and then extract
the matching part on your own. This shouldn't be too complicated.

-Kuli
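
A rough SolrJ sketch of that highlighting approach (the field name and URL are
assumptions); the highlighted fragment is then the piece to extract on the
client side:

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PrefixHighlightDemo {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
    SolrQuery q = new SolrQuery("name:t*");                  // hypothetical field
    q.setHighlight(true);
    q.addHighlightField("name");
    q.setHighlightSimplePre("[");                            // mark the matching token
    q.setHighlightSimplePost("]");
    q.set("hl.highlightMultiTerm", "true");                  // make sure prefix/wildcard matches get highlighted
    QueryResponse rsp = server.query(q);
    // Highlighting is keyed by document id, then by field name.
    for (Map.Entry<String, Map<String, List<String>>> perDoc : rsp.getHighlighting().entrySet()) {
      System.out.println(perDoc.getKey() + " -> " + perDoc.getValue().get("name"));
    }
  }
}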

On 25.10.2011 12:12, Alireza Salimi wrote:
> That's because the phrases are being tokenized and then indexed by Solr.
> You have to define a new fieldType which is not tokenized.
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.KeywordTokenizerFactory
> 
> I'm not sure if it would solve your problem
> 
> On Tue, Oct 25, 2011 at 5:46 AM, Radha Krishna Reddy <
> radhakrishn...@gmail.com> wrote:
> 
>> Hi,
>>
>> when i indexed words like 'Joe Tom' and 'Terry'.When i do prefix query like
>> q=t*,i get both 'Joe Tom' and Terry' as the results.But i want the result
>> for the complete string that start with 'T'.means i want only 'Terry' as
>> the
>> result.
>>
>> Can i do this?
>>
>> Thanks and Regards,
>> Radha Krishna.
>>
> 
> 
> 



Re: Date boosting with dismax question

2011-10-25 Thread Erik Hatcher
Also, those boosts on your qf and pf are a red flag and may be causing you 
issues.  Look at explains provided with debugQuery=true output to see how your 
field/phrase boosts are working in conjunction with your date boosting attempts.

Erik
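
For reference, the usual dismax recipe from the SolrRelevancyFAQ page linked
below, expressed here via SolrJ for brevity; the same bf parameter can go
straight into the URL or the handler defaults, and it assumes 'created' is a
trie date field:

import org.apache.solr.client.solrj.SolrQuery;

public class DateBoostDemo {
  public static void main(String[] args) {
    SolrQuery q = new SolrQuery("libya");
    q.set("defType", "dismax");
    q.set("qf", "name0^2 other^1");
    // recip(x,m,a,b) = a/(m*x+b); x = milliseconds between NOW and 'created'.
    // 3.16e-11 is roughly 1/(one year in ms), so a year-old document gets about
    // half the boost of a brand new one.
    q.set("bf", "recip(ms(NOW,created),3.16e-11,1,1)");
    System.out.println(q);   // inspect the generated parameters
  }
}

Check with debugQuery=true that the function's contribution is in the same
ballpark as the qf/pf boosts, otherwise it will be drowned out.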

On Oct 23, 2011, at 17:15 , Erick Erickson wrote:

> Define "not working". Show what you're getting and what you
> expect to find. Show your data. Note that the example given
> boosts on quite coarse dates, it *tends* to make documents
> published in a particular *year* score higher.
> 
> You might review:
> http://wiki.apache.org/solr/UsingMailingLists
> 
> Best
> Erick
> 
> On Sun, Oct 23, 2011 at 11:08 PM, Craig Stadler  
> wrote:
>> Yes I have and I cannot get it to work. Perhaps something is out of version
>> for my setup?
>> I tried for 3 hours to get every example I could find to work.
>> 
>> - Original Message - From: "Erick Erickson"
>> 
>> To: 
>> Sent: Sunday, October 23, 2011 5:07 PM
>> Subject: Re: Date boosting with dismax question
>> 
>> 
>> Have you seen this?
>> 
>> http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
>> 
>> Best
>> Erick
>> 
>> 
>> On Sat, Oct 22, 2011 at 3:26 AM, Craig Stadler 
>> wrote:
>>> 
>>> Solr Specification Version: 1.4.0
>>> Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06
>>> 12:33:40
>>> Lucene Specification Version: 2.9.1
>>> Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25
>>> 
>>> >> precisionStep="6" positionIncrementGap="0"/>
>>> 
>>> >> stored="false" omitNorms="true" required="false"
>>> omitTermFreqAndPositions="true" />
>>> 
>>> I am using 'created' as the name of the date field.
>>> 
>>> My dates are being populated as such :
>>> 1980-01-01T00:00:00Z
>>> 
>>> Search handler (solrconfig) :
>>> 
>>> 
>>> 
>>> dismax
>>> explicit
>>> 0.1
>>> name0^2 other ^1
>>> name0^2 other ^1
>>> 3
>>> 3
>>> *:*
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Query :
>>> 
>>> /solr/ftf/dismax/?q=libya
>>> &debugQuery=off
>>> &hl=true
>>> &start=
>>> &rows=10
>>> --
>>> 
>>> I am trying to factor in created to the SCORE. (boost) I have tried a
>>> million ways to do this, no success. I know the dates are populating
>>> correctly because I can sort by them. Can anyone help me implement date
>>> boosting with dismax under this scenario???
>>> 
>>> -Craig
>>> 
>> 
>> 



Re: About the indexing process

2011-10-25 Thread Martijn v Groningen
Hi Amos,

How are you currently indexing files? Are you indexing Solr input
documents or just regular files?

You can use Solr cell to index binary files:
http://wiki.apache.org/solr/ExtractingRequestHandler

Martijn
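
If the goal is to push the raw file (plus metadata) to Solr and let it do the
extraction, a minimal SolrJ sketch against /update/extract might look like the
following; the file name, id, and literal fields are assumptions, and the
extracting handler must be enabled in solrconfig.xml:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractUploadDemo {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("report.pdf"));                 // the binary file to extract and index
    req.setParam("literal.id", "doc1");                  // unique key
    req.setParam("literal.author", "Amos");              // any extra metadata as literal.* params
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    server.request(req);
  }
}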

On 25 October 2011 10:21, 刘浪  wrote:
> Hi,
>     I appreciate you can help me. When I index a file, can I transfer the 
> content of the file and metadata to Solr to construct the index, instead of 
> the file path and metadata?
>
> Thank you!
> Amos



-- 
Kind regards,

Martijn van Groningen


Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Erik Hatcher

On Oct 25, 2011, at 07:24 , Robert Stewart wrote:

> It is really not very difficult to build a decent web front-end to SOLR using 
> one of the available client libraries 

Or even just not using any client library at all (other than an HTTP library).  
I've done a bit of proof-of-concept/prototyping with a super light weight (and 
of course Ruby!) approach with my Prism tinkering: 


Yes, in general it's very straightforward to build a search UI that shows 
results, pages through them, displays facets, and allows them to be clicked and 
filter results and so on.  Devil is always in the details, and having saved 
searches, export, customizability, authentication, and so on makes it a more 
involved proposition.

If you're in a PHP environment, there is VUFind... again pretty library-centric 
at first, but likely flexible enough to handle any Solr setup - 
.  For the Pythonistas, there's Kochief - 
http://code.google.com/p/kochief/

Being a Rubyist myself (and founder of Blacklight), I'm not intimately familiar 
with the other solutions but the library world has done a lot to get this sort 
of thing off the ground in many environments.

Erik



Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Robert Stewart
It is really not very difficult to build a decent web front-end to SOLR using 
one of the available client libraries (such as solrpy for python).

I recently built a pretty full-featured search front-end to SOLR in Python (using 
the Tornado web server and templates) and it was not difficult at all to build from 
scratch - it may even have been more work to learn and customize Blacklight.  I 
also wanted to avoid adding yet another language - since lots of other back-end 
code was already in Python...



On Oct 25, 2011, at 6:03 AM, Memory Makers wrote:

> Kool -- I was hoping to avoid adding another language :-( python/java/php
> were going to be it for me -- but I guess not.
> 
> Thanks.
> 
> On Tue, Oct 25, 2011 at 6:02 AM, Erik Hatcher wrote:
> 
>> You could be up and running with Blacklight by following the quickstart
>> instructions in only a few minutes, but Ruby and RoR know-how will be needed
>> to go further with the types of customizations you mentioned.  Some things
>> will be purely in configuration sections (but still within Ruby code files)
>> and done easily, but some other customizations will require deeper
>> knowledge.
>> 
>> With only a few minutes (given the prerequisites already installed) to give
>> it a try, might as well give it a go :)  The Blacklight community is very
>> helpful too, so ask on their e-mail list for assistance, or tap into the
>> #blacklight IRC channel.
>> 
>>   Erik
>> 
>> 
>> On Oct 25, 2011, at 05:53 , Memory Makers wrote:
>> 
>>> Looks very interesting -- actually I looked at it a while back but in a
>>> different context -- for a non RoR person how much of a learning curve is
>> it
>>> to set up?
>>> 
>>> Thanks.
>>> 
>>> On Tue, Oct 25, 2011 at 5:49 AM, Erik Hatcher >> wrote:
>>> 
 Blacklight - http://projectblacklight.org/
 
 It's a full featured application fronting Solr.  It's Ruby on Rails
>> based,
 and powers many library front-ends but is becoming much more general
>> purpose
 for other domains.  See examples here:
 https://github.com/projectblacklight/blacklight/wiki/Examples
 
 Also, the forensics domain has used it as well, as mentioned in the
>> slides
 and talk I attended at Lucene Revolution earlier this year: <
 
>> http://www.lucidimagination.com/blog/2011/06/01/solr-and-law-enforcement-highly-relevant-results-can-be-a-crime/
> 
 
 Often the decision for an application layer like this is determined by
>> the
 programming language and frameworks used.  Blacklight is "opinionated"
>> (as
 any other concrete implementation would be) in this regard.  If it fits
>> your
 tastes, it's a great technology to use.
 
  Erik
 
 
 On Oct 24, 2011, at 15:56 , Memory Makers wrote:
 
> Greetings guys,
> 
> Is there a good front end application / interface for solr?
> 
> Features I'm looking for are:
> configure query interface (using non programatic features)
> configure pagination
> configure bookmarking of results
> export results of a query to a csv or other format (JSON, etc.)
> 
> Is there any demand for such an application?
> 
> Thanks.
 
 
>> 
>> 



Queries suggestion (not the suggester :P)

2011-10-25 Thread Simone Tripodi
Hi all guys,
I'm working on a search service that uses Solr as the search engine, and
the results are provided in Atom form, containing some OpenSearch tags.

What I'd like to understand is whether it is possible, via Solr, to have
in the response some suggestions for other queries in order to enrich our
OpenSearch info, i.e. a user submits `General Motors annual report` and
Solr answers with the results plus information to form a `General Motors
annual report 2005` subset or a `General Motors` superset, so the reply
can be transformed to:

   <Query role="subset" searchTerms="General Motors annual report 2005"/>
   <Query role="superset" searchTerms="General Motors"/>

So my question is: is this possible? And if yes... how? :)

Many thanks in advance, every suggestion would be really appreciated!
Have a nice day, all the best,
Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/


Re: prefix search

2011-10-25 Thread Alireza Salimi
That's because the phrases are being tokenized and then indexed by Solr.
You have to define a new fieldType which is not tokenized.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.KeywordTokenizerFactory

I'm not sure if it would solve your problem
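A minimal sketch of such a field type (the name is illustrative; the lower-case
filter is an assumption so that q=t* still matches 'Terry'):

<fieldType name="string_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- keep the whole value as a single token instead of splitting on whitespace -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With the whole value indexed as one token, t* matches only values that begin
with 't', so 'Terry' matches but 'Joe Tom' no longer matches on its second word.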

On Tue, Oct 25, 2011 at 5:46 AM, Radha Krishna Reddy <
radhakrishn...@gmail.com> wrote:

> Hi,
>
> When I indexed words like 'Joe Tom' and 'Terry' and I do a prefix query like
> q=t*, I get both 'Joe Tom' and 'Terry' as the results. But I want the result
> for the complete string that starts with 'T', meaning I want only 'Terry' as
> the result.
>
> Can i do this?
>
> Thanks and Regards,
> Radha Krishna.
>



-- 
Alireza Salimi
Java EE Developer


Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Memory Makers
Kool -- I was hoping to avoid adding another language :-( python/java/php
were going to be it for me -- but I guess not.

Thanks.

On Tue, Oct 25, 2011 at 6:02 AM, Erik Hatcher wrote:

> You could be up and running with Blacklight by following the quickstart
> instructions in only a few minutes, but Ruby and RoR know-how will be needed
> to go further with the types of customizations you mentioned.  Some things
> will be purely in configuration sections (but still within Ruby code files)
> and done easily, but some other customizations will require deeper
> knowledge.
>
> With only a few minutes (given the prerequisites already installed) to give
> it a try, might as well give it a go :)  The Blacklight community is very
> helpful too, so ask on their e-mail list for assistance, or tap into the
> #blacklight IRC channel.
>
>Erik
>
>
> On Oct 25, 2011, at 05:53 , Memory Makers wrote:
>
> > Looks very interesting -- actually I looked at it a while back but in a
> > different context -- for a non RoR person how much of a learning curve is
> it
> > to set up?
> >
> > Thanks.
> >
> > On Tue, Oct 25, 2011 at 5:49 AM, Erik Hatcher  >wrote:
> >
> >> Blacklight - http://projectblacklight.org/
> >>
> >> It's a full featured application fronting Solr.  It's Ruby on Rails
> based,
> >> and powers many library front-ends but is becoming much more general
> purpose
> >> for other domains.  See examples here:
> >> https://github.com/projectblacklight/blacklight/wiki/Examples
> >>
> >> Also, the forensics domain has used it as well, as mentioned in the
> slides
> >> and talk I attended at Lucene Revolution earlier this year: <
> >>
> http://www.lucidimagination.com/blog/2011/06/01/solr-and-law-enforcement-highly-relevant-results-can-be-a-crime/
> >>>
> >>
> >> Often the decision for an application layer like this is determined by
> the
> >> programming language and frameworks used.  Blacklight is "opinionated"
> (as
> >> any other concrete implementation would be) in this regard.  If it fits
> your
> >> tastes, it's a great technology to use.
> >>
> >>   Erik
> >>
> >>
> >> On Oct 24, 2011, at 15:56 , Memory Makers wrote:
> >>
> >>> Greetings guys,
> >>>
> >>> Is there a good front end application / interface for solr?
> >>>
> >>> Features I'm looking for are:
> >>> configure query interface (using non programatic features)
> >>> configure pagination
> >>> configure bookmarking of results
> >>> export results of a query to a csv or other format (JSON, etc.)
> >>>
> >>> Is there any demand for such an application?
> >>>
> >>> Thanks.
> >>
> >>
>
>


Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Erik Hatcher
You could be up and running with Blacklight by following the quickstart 
instructions in only a few minutes, but Ruby and RoR know-how will be needed to 
go further with the types of customizations you mentioned.  Some things will be 
purely in configuration sections (but still within Ruby code files) and done 
easily, but some other customizations will require deeper knowledge.

With only a few minutes (given the prerequisites already installed) to give it 
a try, might as well give it a go :)  The Blacklight community is very helpful 
too, so ask on their e-mail list for assistance, or tap into the #blacklight 
IRC channel.

Erik


On Oct 25, 2011, at 05:53 , Memory Makers wrote:

> Looks very interesting -- actually I looked at it a while back but in a
> different context -- for a non RoR person how much of a learning curve is it
> to set up?
> 
> Thanks.
> 
> On Tue, Oct 25, 2011 at 5:49 AM, Erik Hatcher wrote:
> 
>> Blacklight - http://projectblacklight.org/
>> 
>> It's a full featured application fronting Solr.  It's Ruby on Rails based,
>> and powers many library front-ends but is becoming much more general purpose
>> for other domains.  See examples here:
>> https://github.com/projectblacklight/blacklight/wiki/Examples
>> 
>> Also, the forensics domain has used it as well, as mentioned in the slides
>> and talk I attended at Lucene Revolution earlier this year: <
>> http://www.lucidimagination.com/blog/2011/06/01/solr-and-law-enforcement-highly-relevant-results-can-be-a-crime/
>>> 
>> 
>> Often the decision for an application layer like this is determined by the
>> programming language and frameworks used.  Blacklight is "opinionated" (as
>> any other concrete implementation would be) in this regard.  If it fits your
>> tastes, it's a great technology to use.
>> 
>>   Erik
>> 
>> 
>> On Oct 24, 2011, at 15:56 , Memory Makers wrote:
>> 
>>> Greetings guys,
>>> 
>>> Is there a good front end application / interface for solr?
>>> 
>>> Features I'm looking for are:
>>> configure query interface (using non programatic features)
>>> configure pagination
>>> configure bookmarking of results
>>> export results of a query to a csv or other format (JSON, etc.)
>>> 
>>> Is there any demand for such an application?
>>> 
>>> Thanks.
>> 
>> 



Re: Solr main query response & input to facet query

2011-10-25 Thread Erik Hatcher
I'm not following exactly what you're looking for here, but sounds like you 
want to facet on name... &facet=on&facet.field=name1

and then to filter on a selected one, you can use fq=name:name1
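Put together as one request, that could look something like the sketch below,
assuming the field holding 'name1' is simply called 'name':

/solr/select/?q=*:*&fl=name,prod_id&facet=on&facet.field=name&fq=name:name1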

Erik

On Oct 24, 2011, at 20:18 , solrdude wrote:

> Hi,
> I am implementing an solr solution where I want to use some field values
> from main query output as an input in building facet. How do I do that?
> 
> Eg: 
> Response from main query:
> <doc>
>   <str name="name">name1</str>
>   <str name="prod_id">200</str>
> </doc>
> <doc>
>   <str name="name">name1</str>
>   <str name="prod_id">400</str>
> </doc>
> 
> I want to build a facet for the query "prod_id:200 prod_id:400". I would like
> to do all this in a single query ideally; if it can't be done in one query, I
> am OK with 2 queries as well. Please help.
> 
> Thanks
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-main-query-response-input-to-facet-query-tp3449938p3449938.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Memory Makers
Looks very interesting -- actually I looked at it a while back but in a
different context -- for a non RoR person how much of a learning curve is it
to set up?

Thanks.

On Tue, Oct 25, 2011 at 5:49 AM, Erik Hatcher wrote:

> Blacklight - http://projectblacklight.org/
>
> It's a full featured application fronting Solr.  It's Ruby on Rails based,
> and powers many library front-ends but is becoming much more general purpose
> for other domains.  See examples here:
> https://github.com/projectblacklight/blacklight/wiki/Examples
>
> Also, the forensics domain has used it as well, as mentioned in the slides
> and talk I attended at Lucene Revolution earlier this year: <
> http://www.lucidimagination.com/blog/2011/06/01/solr-and-law-enforcement-highly-relevant-results-can-be-a-crime/
> >
>
> Often the decision for an application layer like this is determined by the
> programming language and frameworks used.  Blacklight is "opinionated" (as
> any other concrete implementation would be) in this regard.  If it fits your
> tastes, it's a great technology to use.
>
>Erik
>
>
> On Oct 24, 2011, at 15:56 , Memory Makers wrote:
>
> > Greetings guys,
> >
> > Is there a good front end application / interface for solr?
> >
> > Features I'm looking for are:
> >  configure query interface (using non programatic features)
> >  configure pagination
> >  configure bookmarking of results
> >  export results of a query to a csv or other format (JSON, etc.)
> >
> >  Is there any demand for such an application?
> >
> > Thanks.
>
>


Re: Is there a good web front end application / interface for solr

2011-10-25 Thread Erik Hatcher
Blacklight - http://projectblacklight.org/

It's a full featured application fronting Solr.  It's Ruby on Rails based, and 
powers many library front-ends but is becoming much more general purpose for 
other domains.  See examples here: 
https://github.com/projectblacklight/blacklight/wiki/Examples

Also, the forensics domain has used it as well, as mentioned in the slides and 
talk I attended at Lucene Revolution earlier this year: 


Often the decision for an application layer like this is determined by the 
programming language and frameworks used.  Blacklight is "opinionated" (as any 
other concrete implementation would be) in this regard.  If it fits your 
tastes, it's a great technology to use.

Erik


On Oct 24, 2011, at 15:56 , Memory Makers wrote:

> Greetings guys,
> 
> Is there a good front end application / interface for solr?
> 
> Features I'm looking for are:
>  configure query interface (using non programatic features)
>  configure pagination
>  configure bookmarking of results
>  export results of a query to a csv or other format (JSON, etc.)
> 
>  Is there any demand for such an application?
> 
> Thanks.



prefix search

2011-10-25 Thread Radha Krishna Reddy
Hi,

When I indexed words like 'Joe Tom' and 'Terry' and I do a prefix query like
q=t*, I get both 'Joe Tom' and 'Terry' as the results. But I want the result
for the complete string that starts with 'T', meaning I want only 'Terry' as
the result.

Can i do this?

Thanks and Regards,
Radha Krishna.


data import handler issue

2011-10-25 Thread Tanweer Noor
Hi,
I am having an issue fetching records from the DB; I am using the eXist database.
Please see if you can help look into this.

http://localhost:8983/solr/dataimport?command=full-import


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">1</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2011-10-25 01:58:24</str>
    <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
    <str name="Committed">2011-10-25 01:58:24</str>
    <str name="Optimized">2011-10-25 01:58:24</str>
    <str name="Total Documents Processed">0</str>
    <str name="Time taken">0:0:0.219</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
  

below is my data-config.xml file

<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <entity
        url="http://localhost:8081/exist/servlet/db/dbname/?_query=for%20$p%20in%20//Music%20return%20$p"
        processor="XPathEntityProcessor"
        forEach="/data/store"
        transformer="DateFormatTransformer">
      ...
    </entity>
  </document>
</dataConfig>

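For reference, with XPathEntityProcessor each <field> maps a Solr column to an
XPath under the forEach node; a hypothetical mapping (column names and xpaths
here are illustrative only):

<!-- illustrative only; real columns/xpaths depend on what the eXist query returns -->
<field column="id"    xpath="/data/store/id"/>
<field column="title" xpath="/data/store/title"/>

If forEach (here /data/store) does not match the structure of the XML actually
returned, the import reports Total Rows Fetched: 0 even though the URL itself
returns records.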

If I go to the URL, I can see that it is fetching records from the database, but
somehow Solr is not fetching a single row.

http://localhost:8081/exist/servlet/db/dbname/?_query=for%20$p%20in%20//Music%20return%20$p



Thanks


Re: [ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released

2011-10-25 Thread alex
Hello roySolr,


roySolr wrote:
> 
> Are you working on some changes to support earlier versions of PHP? What
> is the status?
> 

I have supplied a patch, so that it can be compiled with PHP 5.2:
https://bugs.php.net/bug.php?id=59808

I contacted Israel a while ago to integrate this into the package, but he
hasn't answered yet.

Cheers,
 Alex


--
View this message in context: 
http://lucene.472066.n3.nabble.com/ANNOUNCEMENT-PHP-Solr-Extension-1-0-1-Stable-Has-Been-Released-tp3024040p3450881.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: questions about autocommit & committing documents

2011-10-25 Thread darul
I was not sure, thank you.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/questions-about-autocommit-committing-documents-tp1582487p3450794.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: questions about autocommit & committing documents

2011-10-25 Thread Mark Miller
It's not 'mandatory', but it makes no sense to keep it. Even without 
autocommit, committing after every doc add is horribly inefficient.
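For reference, the autoCommit settings being discussed live in the
<updateHandler> section of solrconfig.xml; a minimal sketch with illustrative
thresholds:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- illustrative values: commit after 10000 pending docs or 60 seconds -->
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>

With this in place the client just adds documents and lets the server decide
when to commit.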

On Oct 25, 2011, at 9:45 AM, darul wrote:

> Well, until now I was using the SolrJ API to commit() changes (for each
> document added...), but I wonder whether for a production deployment it would
> not be a better solution to use the autoCommit feature instead.
> 
> With autoCommit parameters, is it mandatory to remove the commit() instruction
> called on CommonsHttpSolrServer?
> 
> try {
>     getServer().addBean(o);
>     getServer().commit(); // => to remove ?
>     ...
> }
> 
> I have another question: I was looking all over the threads but have not found
> any solution yet for how to get a CallbackHandler with all committed documents.
> Is there a simple way to achieve this?
> 
> Thanks again Erick.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/questions-about-autocommit-committing-documents-tp1582487p3450739.html
> Sent from the Solr - User mailing list archive at Nabble.com.

- Mark Miller
lucidimagination.com



Re: questions about autocommit & committing documents

2011-10-25 Thread darul
Well, until now I was using the SolrJ API to commit() changes (for each
document added...), but I wonder whether for a production deployment it would
not be a better solution to use the autoCommit feature instead.

With autoCommit parameters, is it mandatory to remove the commit() instruction
called on CommonsHttpSolrServer?

try {
    getServer().addBean(o);
    getServer().commit(); // => to remove ?
    ...
}

I have another question: I was looking all over the threads but have not found
any solution yet for how to get a CallbackHandler with all committed documents.
Is there a simple way to achieve this?

Thanks again Erick.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/questions-about-autocommit-committing-documents-tp1582487p3450739.html
Sent from the Solr - User mailing list archive at Nabble.com.


MoreLikeThis - To many hits

2011-10-25 Thread vraa
Hi

I'm using the MoreLikeThis functionality
( http://wiki.apache.org/solr/MoreLikeThis ), and it works almost perfectly for
my situation.

But I get too many hits, and maybe that's the whole idea of MoreLikeThis, but
I'm going to ask anyway.

My query looks like this:
/select/?q=id:11&mlt=true&mlt.match.include=true&mlt.fl=make,model,variant&mlt.mindf=1&mlt.mintf=1&fl=id,score,make,model,variant
The id is a Lamborghini. There are only 8 Lamborghinis in my database and
still I get a lot more hits.
Is it possible to make Solr return only 8 results for this query? That would
mean Solr must interpret the query so that there has to be a hit on all of the
"mlt.fl" fields. If not, then remove the last of the "mlt.fl" fields (variant)
and try again; if there are still no hits, then remove "model", and so forth.

Does it make sense?
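For what it's worth, the number of similar documents returned per match can at
least be capped with mlt.count; this only limits the count, it does not force a
hit on every mlt.fl field. A sketch based on the query above:

/select/?q=id:11&mlt=true&mlt.count=8&mlt.match.include=true&mlt.fl=make,model,variant&mlt.mindf=1&mlt.mintf=1&fl=id,score,make,model,variant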

--
View this message in context: 
http://lucene.472066.n3.nabble.com/MoreLikeThis-To-many-hits-tp3450632p3450632.html
Sent from the Solr - User mailing list archive at Nabble.com.