Re: Stopwords

2014-06-26 Thread David Stuart
Hi,

Not really, as the words don't exist in the corpus field. The way we have worked 
around it in the past is to have another, non-stopped field that is also 
searched on (in addition to the stopped field), with a boost to the score 
for matches. 
As a slight alternative, you could do the above but choose the stopped or non-stopped 
field depending on whether quotes are present when your application builds the query.
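
A minimal sketch of that two-field setup (field and type names here are made up, 
not from any existing schema):

<!-- schema.xml: one field analyzed with stopwords, one without -->
<field name="body" type="text_stopped" indexed="true" stored="true"/>
<field name="body_unstopped" type="text_unstopped" indexed="true" stored="false"/>
<copyField source="body" dest="body_unstopped"/>

<!-- dismax/edismax request handler: search both, boost the un-stopped field -->
<str name="qf">body body_unstopped^2.0</str>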


Regards

David Stuart
M  +44(0) 778 854 2157
T   +44(0) 845 519 5465
www.axistwelve.com
Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK

AXIS12 - Enterprise Web Solutions

Reg Company No. 7215135
VAT No. 997 4801 60

This e-mail is strictly confidential and intended solely for the ordinary user 
of the e-mail account to which it is addressed. If you have received this 
e-mail in error please inform Axis12 immediately by return e-mail or telephone. 
We advise that in keeping with good computing practice the recipient of this 
e-mail should ensure that it is virus free. We do not accept any responsibility 
for any loss or damage that may arise from the use of this email or its 
contents.



On 26 Jun 2014, at 10:33, Geert Van Huychem ge...@iframeworx.be wrote:

 Hello
  
 We have the default Dutch stopwords implemented in our Solr instance, so 
 words like ‘de’, ‘het’, ‘ben’ are filtered at index time.
  
 Is there a way to trick Solr into ignoring those stopwords at query time, 
 when a user puts the search terms between quotes?
  
 Best
  
 Geert Van Huychem
 IT Services & Applications Manager
 T. +32 2 741 60 22
 M. +32 497 27 69 03
 ge...@iframeworx.be
 Media ID CVBA
 Rue Barastraat 175
 1070 Bruxelles - Brussel (BE)
 www.media-id.be
 



Solr CoreAdmin RELOAD + Properties

2014-03-28 Thread David Stuart
Hey,

In the Solr CoreAdmin CREATE action you have the ability to define arbitrary 
properties by defining property.[name]=value; this works well in both Solr 3.x 
and Solr 4.x. To change a property value on a core in Solr 3.x you could run 
the CREATE command again and it would overwrite the value. In Solr 4.x you 
get an error saying the core exists (makes sense), but I can't see a way of updating the 
property values via a URL without unloading and re-creating the core (which 
is not great, as this could cause an outage on the live system).

I tried adding property.[name]=value as part of the RELOAD action but 
that was ignored. 

Any ideas? If not I will create a patch for RELOAD to support this 
functionality.
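
For reference, the kind of calls involved (core name and property are made up):

# works in both 3.x and 4.x at create time
http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1&property.dataVersion=42

# in 3.x, re-running CREATE overwrites the value; in 4.x it fails because the core exists.
# what I tried, which RELOAD ignores:
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1&property.dataVersion=43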

Regards,



David Stuart
M  +44(0) 778 854 2157
T   +44(0) 845 519 5465
www.axistwelve.com
Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK

AXIS12 - Enterprise Web Solutions

Reg Company No. 7215135
VAT No. 997 4801 60






Re: Elevation and core create

2014-03-03 Thread David Stuart
Hi Erick,

Thanks for the response. 
On the wiki it states

config-file
Path to the file that defines query elevation. This file must exist in 
$instanceDir/conf/config-file or $dataDir/config-file. 

If the file exists in the /conf/ directory it will be loaded once at startup. 
If it exists in the data directory, it will be reloaded for each IndexReader.

Which is the elevate.xml. So it looks like I will go down the custom coding route.
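
For context, the relevant solrconfig.xml piece is the stock QueryElevationComponent 
setup, with config-file resolved against conf/ or the data dir as described above:

<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>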

Regards,


David Stuart
M  +44(0) 778 854 2157
T   +44(0) 845 519 5465
www.axistwelve.com
Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK

AXIS12 - Enterprise Web Solutions

Reg Company No. 7215135
VAT No. 997 4801 60




On 2 Mar 2014, at 18:07, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, you _ought_ to be able to specify a relative path
 in <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml</str>
 
 But there's certainly the chance that this is hard-coded in
 the query elevation component so I can't say that this'll work
 with assurance.
 
 Best,
 Erick
 
 On Sun, Mar 2, 2014 at 6:14 AM, David Stuart d...@axistwelve.com wrote:
 Hi, sorry for the cross post, but I got no response in the dev group so 
 assumed I had posted in the wrong place.
  
 I am using Solr 3.6 and am trying to automate the deployment of cores with a 
 custom elevate file. It is proving to be difficult: while most of the files 
 (schema, stopwords etc.) support absolute paths, elevate seems to need to be 
 in either a conf directory as a sibling to data, or in the data directory 
 itself. I am able to achieve my goal by having a secondary process that 
 places the file, but thought I would ask the group just in case I have missed 
 the obvious. If I move to Solr 4, is it fixed there? I could also go down 
 the route of extending the SolrCore create function to accept additional 
 params and move the file into the defined data directory.
 
 Ideas?
 
 Thanks for your help
 David Stuart
 M  +44(0) 778 854 2157
 T   +44(0) 845 519 5465
 www.axistwelve.com
 Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK
 
 AXIS12 - Enterprise Web Solutions
 
 Reg Company No. 7215135
 VAT No. 997 4801 60
 
 
 
 



Elevation and core create

2014-03-02 Thread David Stuart
Hi, sorry for the cross post, but I got no response in the dev group so assumed I 
had posted in the wrong place.

I am using Solr 3.6 and am trying to automate the deployment of cores with a 
custom elevate file. It is proving to be difficult: while most of the files (schema, 
stopwords etc.) support absolute paths, elevate seems to need to be in either a 
conf directory as a sibling to data, or in the data directory itself. I am able 
to achieve my goal by having a secondary process that places the file, but 
thought I would ask the group just in case I have missed the obvious. If I 
move to Solr 4, is it fixed there? I could also go down the route of extending the 
SolrCore create function to accept additional params and move the file into the 
defined data directory.

Ideas?

Thanks for your help
David Stuart
M  +44(0) 778 854 2157
T   +44(0) 845 519 5465
www.axistwelve.com
Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK

AXIS12 - Enterprise Web Solutions

Reg Company No. 7215135
VAT No. 997 4801 60






Re: Solr DataImportHandler (DIH) and Cassandra

2010-12-01 Thread David Stuart
This is good timing. I am/was just about to embark on a spike, if anyone is keen to 
help out.
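
For anyone picking this up, a bare-bones skeleton of the DataSource subclass route 
Aaron describes below; the class name is made up and the actual Cassandra/Hector 
calls are left as comments rather than real client code:

import java.util.Iterator;
import java.util.Map;
import java.util.Properties;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataSource;

/** Sketch of a DIH data source backed by Cassandra (client wiring omitted). */
public class CassandraDataSource extends DataSource<Iterator<Map<String, Object>>> {

  @Override
  public void init(Context context, Properties initProps) {
    // Read host/keyspace settings from initProps and open a Cassandra (Hector) connection here.
  }

  @Override
  public Iterator<Map<String, Object>> getData(String query) {
    // Interpret 'query' (e.g. a column family or key range), fetch rows via the
    // Cassandra client, and return them as field-name -> value maps for DIH.
    throw new UnsupportedOperationException("Cassandra fetch not implemented in this sketch");
  }

  @Override
  public void close() {
    // Release the Cassandra connection here.
  }
}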


On 30 Nov 2010, at 00:37, Mark wrote:

 The DataSource subclass route is what I will probably be interested in. Are 
 there any working examples of this already out there?
 
 On 11/29/10 12:32 PM, Aaron Morton wrote:
 AFAIK there is nothing pre-written to pull the data out for you.
 
 You should be able to create your DataSource subclass 
 http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/DataSource.html
 using the Hector Java library to pull data from Cassandra.
 
 I'm guessing you will need to consider how to perform delta imports. Perhaps 
 using the secondary indexes in 0.7* , or maintaining your own queues or 
 indexes to know what has changed.
 
 There is also the Lucandra project, not exactly what you're after but it may be 
 of interest anyway: https://github.com/tjake/Lucandra
 
 Hope that helps.
 Aaron
 
 
 On 30 Nov, 2010,at 05:04 AM, Mark static.void@gmail.com wrote:
 
 Is there anyway to use DIH to import from Cassandra? Thanks



Re: Does Solr reload schema.xml dynamically?

2010-10-26 Thread David Stuart
If you are using Solr Multicore http://wiki.apache.org/solr/CoreAdmin you can 
issue a Reload command 
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0

On 26 Oct 2010, at 11:09, Swapnonil Mukherjee wrote:

 Hi Everybody,
 
 If I change my schema.xml to, do I have to restart Solr. Is there some way, I 
 can apply the changes to schema.xml without restarting Solr?
 
 Swapnonil Mukherjee
 
 
 



Re: DataImportHandler dynamic fields clarification

2010-09-30 Thread David Stuart
Two things: one, are your DB columns uppercase, as this would affect the output?

Second, what does your db-data-config.xml look like?

Regards,

Dave

On 30 Sep 2010, at 03:01, harrysmith wrote:

 
 Looking for some clarification on DIH to make sure I am interpreting this
 correctly.
 
 I have a wide DB table, 100 columns. I'd rather not have to add 100 values
 in schema.xml and data-config.xml. I was under the impression that if the
 column name matched a dynamic Field name, it would be added. I am not
 finding this is the case, but only works when the column name is explicitly
 listed as a static field.
 
 Example: 100 column table, columns named 'COLUMN_1, COLUMN_2 ... COLUMN_100'
 
 If I add something like:
 <field name="column_60" type="string" indexed="true" stored="true"/>
 to schema.xml, and don't reference the column in data-config entity/field
 tag, it gets imported, as expected.
 
 However, if I use:
 <dynamicField name="column_*" type="string" indexed="true" stored="true"/>
 It does not get imported into Solr, I would expect it would.
 
 
 Is this the expected behavior?
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/DataImportHandler-dynamic-fields-clarification-tp1606159p1606159.html
 Sent from the Solr - User mailing list archive at Nabble.com.
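
For what it's worth, a minimal db-data-config.xml sketch of the case issue asked 
about above: aliasing uppercase DB columns to lowercase names so they line up with 
the dynamicField pattern (driver, table and column names are made up, and whether 
the alias is needed depends on the DIH/driver version):

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="item"
            query="SELECT ID AS id, COLUMN_1 AS column_1, COLUMN_2 AS column_2 FROM wide_table">
      <!-- no per-column field mappings: the lowercase aliases match column_* in schema.xml -->
    </entity>
  </document>
</dataConfig>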



Re: How to Combine Drupal solrconfig.xml with Nutch solrconfig.xml?

2010-07-27 Thread David Stuart
I would use the string version, as Drupal will probably populate it with a URL-like 
thing, something that may not validate as type url.


On 27 Jul 2010, at 04:00, Savannah Beckett wrote:

 
 I am trying to merge the schema.xml that is in the solr/nutch setup with the one 
 from the Drupal Apache Solr module.  I encounter a field that is not mergeable.
 From the Drupal module:
  <field name="url" type="string" indexed="true" stored="true"/>
 From the solr/nutch setup:
 <field name="url" type="url" stored="true" indexed="true"
 required="true"/>
 I am not sure if there is any more stuff like this that is not mergeable.
  
 Is there an easy way to deal with schema.xml?
 Thanks.
 From: David Stuart david.stu...@progressivealliance.co.uk
 To: solr-user@lucene.apache.org
 Sent: Mon, July 26, 2010 1:46:58 PM
 Subject: Re: How to Combine Drupal solrconfig.xml with Nutch solrconfig.xml?
 
 Hi Savannah,
 
 I have just answered this question over on drupal.org. 
 http://drupal.org/node/811062
 
 Response number 5 and 11 will help you. On the solrconfig.xml side of things 
 you will only really need Drupal's version.
 
 Although still in alpha my Nutch module will help you out with integration 
 http://drupal.org/project/nutch
 
 Regards,
 
 David Stuart
 
 On 26 Jul 2010, at 21:37, Savannah Beckett wrote:
 
  I am using Drupal ApacheSolr module to integrate solr with drupal.  I 
  already 
  integrated solr with nutch.  I already moved nutch's solrconfig.xml and 
  schema.xml to solr's example directory, and it works.  I tried to append 
  Drupal's 
  ApacheSolr module's own solrconfig.xml and schema.xml into the same xml 
  files, 
  but I got the following error when I java -jar start.jar:
   
  Jul 26, 2010 1:18:31 PM org.apache.solr.common.SolrException log
  SEVERE: Exception during parsing file: 
  solrconfig.xml:org.xml.sax.SAXParseException: The markup in the document 
  following the root element must be well-formed.
 at 
  com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
 at 
  com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
  
 at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
 at org.apache.solr.core.Config.init(Config.java:110)
 at org.apache.solr.core.SolrConfig.init(SolrConfig.java:130)
 at 
  org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:134)
  
 at 
  org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
  
  Why?  does solrconfig.xml allow to have 2 config sections?  does 
  schema.xml 
  allow to have 2 schema sections?  
  
  Thanks.
  
  
 
 
 



Re: How to Combine Drupal solrconfig.xml with Nutch solrconfig.xml?

2010-07-26 Thread David Stuart
Hi Savannah,

I have just answered this question over on drupal.org. 
http://drupal.org/node/811062

Response number 5 and 11 will help you. On the solrconfig.xml side of things 
you will only really need Drupal's version.

Although still in alpha my Nutch module will help you out with integration 
http://drupal.org/project/nutch

Regards,

David Stuart

On 26 Jul 2010, at 21:37, Savannah Beckett wrote:

 I am using Drupal ApacheSolr module to integrate solr with drupal.  I already 
 integrated solr with nutch.  I already moved nutch's solrconfig.xml and 
 schema.xml to solr's example directory, and it works.  I tried to append 
 Drupal's 
 ApacheSolr module's own solrconfig.xml and schema.xml into the same xml 
 files, 
 but I got the following error when I java -jar start.jar:
  
 Jul 26, 2010 1:18:31 PM org.apache.solr.common.SolrException log
 SEVERE: Exception during parsing file: 
 solrconfig.xml:org.xml.sax.SAXParseException: The markup in the document 
 following the root element must be well-formed.
 at 
 com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
 at 
 com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
 
 at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
 at org.apache.solr.core.Config.init(Config.java:110)
 at org.apache.solr.core.SolrConfig.init(SolrConfig.java:130)
 at 
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:134)
 
 at 
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
 
 Why?  does solrconfig.xml allow to have 2 config sections?  does schema.xml 
 allow to have 2 schema sections?  
 
 Thanks.
 
 



Re: Importing large datasets

2010-06-03 Thread David Stuart



On 3 Jun 2010, at 02:58, Dennis Gearon gear...@sbcglobal.net wrote:

When adding data continuously, that data is available after 
committing and is indexed, right?

Yes.


If so, how often does reindexing do some good?

You should only need to reindex if the data changes or you change your 
schema. The DIH in Solr 1.4 supports delta imports, so you should only 
really be adding or updating (which is actually deleting and adding) 
items when necessary.
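
A minimal sketch of what a DIH delta setup can look like (table and column names 
are made up):

<entity name="item"
        query="SELECT id, name, description FROM item"
        deltaQuery="SELECT id FROM item WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, name, description FROM item WHERE id = '${dataimporter.delta.id}'"/>

It is then run with /dataimport?command=delta-import instead of a full import.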


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
 otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 6/2/10, Andrzej Bialecki a...@getopt.org wrote:

From: Andrzej Bialecki a...@getopt.org
Subject: Re: Importing large datasets
To: solr-user@lucene.apache.org
Date: Wednesday, June 2, 2010, 4:52 AM

On 2010-06-02 13:12, Grant Ingersoll wrote:

 On Jun 2, 2010, at 6:53 AM, Andrzej Bialecki wrote:

 On 2010-06-02 12:42, Grant Ingersoll wrote:

 On Jun 1, 2010, at 9:54 PM, Blargy wrote:

 We have around 5 million items in our index and each item has a description
 located on a separate physical database. These item descriptions vary in
 size and for the most part are quite large. Currently we are only indexing
 items and not their corresponding description and a full import takes around
 4 hours. Ideally we want to index both our items and their descriptions but
 after some quick profiling I determined that a full import would take in
 excess of 24 hours.

 - How would I profile the indexing process to determine if the bottleneck is
 Solr or our Database.

 As a data point, I routinely see clients index 5M items on normal
 hardware in approx. 1 hour (give or take 30 minutes).

 When you say quite large, what do you mean?  Are we talking books here or
 maybe a couple pages of text or just a couple KB of data?

 How long does it take you to get that data out (and, from the sounds of it,
 merge it with your item) w/o going to Solr?

 - In either case, how would one speed up this process? Is there a way to run
 parallel import processes and then merge them together at the end? Possibly
 use some sort of distributed computing?

 DataImportHandler now supports multiple threads.  The absolute fastest way
 that I know of to index is via multiple threads sending batches of documents
 at a time (at least 100).  Often, from DBs one can split up the table via
 SQL statements that can then be fetched separately.  You may want to write
 your own multithreaded client to index.

 SOLR-1301 is also an option if you are familiar with Hadoop ...

 If the bottleneck is the DB, will that do much?

Nope. But the workflow could be set up so that during night hours a DB
export takes place that results in a CSV or SolrXML file (there you
could measure the time it takes to do this export), and then indexing
can work from this file.

--
Best regards,
Andrzej Bialecki
http://www.sigram.com  Contact: info at sigram dot com




Re: Importing large datasets

2010-06-03 Thread David Stuart



On 3 Jun 2010, at 02:51, Dennis Gearon gear...@sbcglobal.net wrote:

Well, I hope to have around 5 million datasets/documents within 1 
year, so this is good info. BUT if I DO have that many, then the 
market I am aiming at will end up giving me 100 times more than that 
within 2 years.


Are there good references/books on using Solr/Lucen/(linux/nginx)  
for 500 million plus documents?


As far as I'm aware there aren't any books yet that cover this for 
Solr. The wiki, this mailing list, and Nabble are your best sources, and 
there have been some quite in-depth conversations on the matter on this 
list in the past.

The data is easily shardable geographically, as one given.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
 otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 6/2/10, Grant Ingersoll gsing...@apache.org wrote:

From: Grant Ingersoll gsing...@apache.org
Subject: Re: Importing large datasets
To: solr-user@lucene.apache.org
Date: Wednesday, June 2, 2010, 3:42 AM

On Jun 1, 2010, at 9:54 PM, Blargy wrote:

 We have around 5 million items in our index and each item has a description
 located on a separate physical database. These item descriptions vary in
 size and for the most part are quite large. Currently we are only indexing
 items and not their corresponding description and a full import takes around
 4 hours. Ideally we want to index both our items and their descriptions but
 after some quick profiling I determined that a full import would take in
 excess of 24 hours.

 - How would I profile the indexing process to determine if the bottleneck is
 Solr or our Database.

As a data point, I routinely see clients index 5M items on normal
hardware in approx. 1 hour (give or take 30 minutes).

When you say quite large, what do you mean?  Are we talking books here or
maybe a couple pages of text or just a couple KB of data?

How long does it take you to get that data out (and, from the sounds of it,
merge it with your item) w/o going to Solr?

 - In either case, how would one speed up this process? Is there a way to run
 parallel import processes and then merge them together at the end? Possibly
 use some sort of distributed computing?

DataImportHandler now supports multiple threads.  The absolute fastest way
that I know of to index is via multiple threads sending batches of documents
at a time (at least 100).  Often, from DBs one can split up the table via
SQL statements that can then be fetched separately.  You may want to write
your own multithreaded client to index.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search




Re: Importing large datasets

2010-06-03 Thread David Stuart



On 3 Jun 2010, at 03:51, Blargy zman...@hotmail.com wrote:

 Would dumping the databases to a local file help at all?

I would suspect not, especially with the size of your data. But it would 
be good to know how long that takes, i.e. creating a SQL script that 
just pulls that data out, how long does that take?

Also, how many fields are you indexing per document 10, 50, 100?

 --
 View this message in context: http://lucene.472066.n3.nabble.com/Importing-large-datasets-tp863447p866538.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Importing large datasets

2010-06-02 Thread David Stuart
How long does it take to do a grab of all the data via SQL? I found that by 
denormalizing the data into a lookup table I was able to index about 300k rows 
of similar data size, with DIH regex splitting on some fields, in about 8 mins. 
I know it's not quite the same scale, but with 
batching...
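
For what it's worth, the splitting I mention was along these lines (a sketch with 
made-up names, using DIH's RegexTransformer):

<entity name="item" transformer="RegexTransformer"
        query="SELECT id, title, tags_csv AS tags FROM item_lookup">
  <!-- split the comma-separated column into a multiValued field -->
  <field column="tags" splitBy=","/>
</entity>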


David Stuart

On 2 Jun 2010, at 17:58, Blargy zman...@hotmail.com wrote:

 One thing that might help indexing speed - create a *single* SQL query
 to grab all the data you need without using DIH's sub-entities, at
 least the non-cached ones.

 Not sure how much that would help. As I mentioned, without the item
 description import the full process takes 4 hours, which is bearable. However
 once I started to import the item description, which is located on a separate
 machine/database, the import process exploded to over 24 hours.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Importing-large-datasets-tp863447p865324.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Storing different entities in Solr

2010-05-28 Thread David Stuart

Hi,

So for your use case, are you wanting to search for a consultant and then 
look at all of his or her requests, or pull both at the same time? In 
both cases one index should suffice. If you define a primary key field 
and use it for both doc types, it shouldn't be an issue. Unless your 
dataset is very large, it would reduce the overhead of running a 
multicore solution, especially for indexing etc.
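
A sketch of what that single-index setup can look like (field names and type 
values are made up):

<!-- schema.xml: shared key plus a discriminator field -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="type" type="string" indexed="true" stored="true"/>

Give each document a key that can't collide across entities (e.g. consultant-42, 
request-99) and restrict searches with a filter query:

http://localhost:8983/solr/select?q=smith&fq=type:consultant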


David Stuart

On 28 May 2010, at 18:12, Moazzam Khan moazz...@gmail.com wrote:


Thanks for all your answers guys. Requests and consultants have a many
to many relationship so I can't store request info in a document with
advisorID as the primary key.

Bill's solution and multicore solutions might be what I am looking
for. Bill, will I be able to have 2 primary keys (so I can update and
delete documents)? If yes, can you please give me a link or someting
where I can get more info on this?

Thanks,
Moazzam



On Fri, May 28, 2010 at 11:50 AM, Bill Au bill.w...@gmail.com wrote:

You can keep different types of documents in the same index if each
document has a type field. You can restrict your searches to specific
type(s) of document by using a filter query, which is very fast and
efficient.

Bill

On Fri, May 28, 2010 at 12:28 PM, Nagelberg, Kallin 
knagelb...@globeandmail.com wrote:

Multi-core is an option, but keep in mind if you go that route you will
need to do two searches to correlate data between the two.

-Kallin Nagelberg

-----Original Message-----
From: Robert Zotter [mailto:robertzot...@gmail.com]
Sent: Friday, May 28, 2010 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Storing different entities in Solr

Sounds like you'll want to use a multiple core setup. One core for each
type of document.

http://wiki.apache.org/solr/CoreAdmin
--
View this message in context:
http://lucene.472066.n3.nabble.com/Storing-different-entities-in-Solr-tp852299p852346.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: how to patch solr-236 in mac os

2010-05-11 Thread David Stuart

Hey,

In OS X you should be able to patch in the same way as on Linux: 
patch -p[level] < name_of_patch.patch. You can do this from the shell, 
including on the Mac.


David Stuart

On 11 May 2010, at 17:15, Jonty Rhods jonty.rh...@gmail.com wrote:


hi all,

I am very new to solr.
Now I need to patch Solr (patch no 236).
I downloaded the latest src code and the patch, but am unable to find a 
suitable way to apply the patch.
I have eclipse installed.

please guide me..


Re: how to patch solr-236 in mac os

2010-05-11 Thread David Stuart



Hi jonty,

In then root directory of the src run

patch -p0 < name_of_patch.patch


David Stuart

On 11 May 2010, at 17:50, Jonty Rhods jonty.rh...@gmail.com wrote:


hi David,
thanks for the quick reply..
please give me the full command so I can patch. What is the meaning of 
[level]?
As I said, I have downloaded the latest src from trunk.. So please also tell 
me, in the terminal, what the command will be and from where I should run it..
should I try

patch -p[level] < name_of_patch.patch


thanks

On Tue, May 11, 2010 at 10:02 PM, David Stuart 
david.stu...@progressivealliance.co.uk wrote:


Hey,

In OS X you should be able to patch in the same way as on Linux: patch
-p[level] < name_of_patch.patch. You can do this from the shell, including on
the Mac.

David Stuart


On 11 May 2010, at 17:15, Jonty Rhods jonty.rh...@gmail.com wrote:

hi all,


I am very new to solr.
Now I need to patch Solr (patch no 236).
I downloaded the latest src code and the patch, but am unable to find a 
suitable way to apply the patch.
I have eclipse installed.

please guide me..





Re: Switching cores dynamically

2010-03-19 Thread David Stuart
Using a multicore setup should do the trick; see http://wiki.apache.org/solr/CoreAdmin, 
specifically the SWAP option.
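
In your setup that would be something like: index/update on core1, then

http://localhost:8983/solr/admin/cores?action=SWAP&core=core0&other=core1

which atomically swaps which index the two core names point at, so queries keep 
hitting core0 with no downtime.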


Cheers

David Stuart

On 19 Mar 2010, at 10:18, muneeb muneeba...@hotmail.com wrote:



Hi,

I have indexed almost 7 million articles on two separate cores, each with 
their own conf/ and data/ folder, i.e. they have their individual index.

What I normally do is use core0 for querying and core1 for any updates, and 
once updates are finished I copy the index of core1 to core0's data folder.

I know this isn't an efficient way of doing this, since it brings 
downtime on my search service for a couple of minutes.

I was wondering if it's possible to switch between cores dynamically (keeping 
my current setup in mind) in such a way that there is no downtime at all 
during switching.

Thanks very much in advance.
-M
--
View this message in context: 
http://old.nabble.com/Switching-cores-dynamically-tp27950928p27950928.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question on Solr Scalability

2010-02-10 Thread David Stuart

Hi,

I think your needs would be better met by Distributed Search 
http://wiki.apache.org/solr/DistributedSearch, which allows shards to live on 
different servers and will search across all of those shards when a query 
comes in. There are a few patches which will hopefully be available in the 
Solr 1.5 release that will improve this, including distributed tf-idf across 
shards.
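
For example (host names made up), a single request fanned out over two shards:

http://host1:8983/solr/select?q=foo&shards=host1:8983/solr,host2:8983/solr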


Regards,

David
On 11 Feb 2010, at 07:12, abhishes abhis...@gmail.com wrote:



Suppose I am indexing very large data (5 billion rows in a database).

Now I want to use the Solr Core feature to split the index into manageable 
chunks.

However I have two questions:

1. Can cores reside on different physical servers?

2. When a query comes in, will the query be answered by the index in one core, 
or will the query be sent to all the cores?

My desire is to have a system which from outside appears as a single large 
index... but inside it is multiple small indexes running on different 
hardware machines.
--
View this message in context: 
http://old.nabble.com/Question-on-Solr-Scalability-tp27543068p27543068.html
Sent from the Solr - User mailing list archive at Nabble.com.



Default value attribute in RSS DIH

2010-01-24 Thread David Stuart
Hey All,

Can anyone tell me what the attribute name is for defining a default value in 
the field tag of the RSS data import handler??

Basically I want to do something like
<field column="type" value="external_source" commonField="true"/>


Any Ideas?


Regards,


Dave

Re: MoreLikeThis - How to pass in external text?

2010-01-22 Thread David Stuart
The MoreLikeThisHandler allows external text to be streamed to it; see 
http://wiki.apache.org/solr/MoreLikeThisHandler#Using_ContentStreams. The URL 
feature is quite good if you have a lot of text and start hitting the character 
limit in the URL.
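
For example, assuming a MoreLikeThisHandler registered at /mlt in solrconfig.xml:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler"/>

external text can then be posted as a content stream:

http://localhost:8983/solr/mlt?stream.body=Solr+Rocks&mlt.fl=desc&mlt.mintf=1&mlt.mindf=1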

Regards,

Dave


On 22 Jan 2010, at 05:24, Otis Gospodnetic wrote:

 Hi,
 
 Try what I suggested, please.
 
 Or, if you want, go to that (or any other) web page, copy a large chunk of 
 its content, and paste it into Google/Yahoo/Bing.  I just did that.  Google 
 said my query was too long, but Yahoo took it.  Guess what hit #1 was?  The 
 page I copied the text from!  Very much more like this-like.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
 
 
 
 - Original Message 
 From: ldung dung@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Fri, January 22, 2010 12:08:26 AM
 Subject: Re: MoreLikeThis - How to pass in external text?
 
 
 I want to use MoreLikeThis since I want to find text in the Solr data that is
 similar to the input text. I want to see how well this works against just a
 standard keyword search.
 
 I want to do something similar to the article below.
 http://www.bbc.co.uk/blogs/radiolabs/2008/06/wikipedia_plus_lucene_morelikethis.shtml
 
 In the article the author uses MoreLikeThis to classifiy text according into
 pre-existing categories.
 
 
 
 
 Otis Gospodnetic wrote:
 
 Hi,
 
 if you have text to pass in, why do you need MoreLikeThis?  The text you
 speak of can be used as a normal query, so pass it in as a regular
 multi-word query.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
 
 
 
 - Original Message 
 From: ldung 
 To: solr-user@lucene.apache.org
 Sent: Thu, January 21, 2010 8:08:41 PM
 Subject: MoreLikeThis - How to pass in external text?
 
 
 How can I have the MoreLikeThis query process a piece of text that is
 passed
 into the query. Currently I can only get it MoreLikeThis to work only for
 pieces of text that are already indexed by Solr. 
 
 For example here is a query that works for using MoreLikeThis for
 document
 with id:134847893.
 
 
  http://localhost:8983/solr/select?mlt=true&q=id:134847893&mlt.fl=desc&mlt.mindf=1&mlt.mintf=1&debugQuery=on
 
 How can I pass in some external text like 'Solr Rocks'. Below is an
 example
 of how it would look like.
 
  http://localhost:8983/solr/select?mlt=true&external.text=Solr+Rocks&mlt.fl=desc&mlt.mindf=1&mlt.mintf=1&debugQuery=on
 
 
 -- 
 View this message in context: 
 
 http://old.nabble.com/MoreLikeThis---How-to-pass-in-external-text--tp27266316p27266316.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 -- 
 View this message in context: 
 http://old.nabble.com/MoreLikeThis---How-to-pass-in-external-text--tp27266316p27268777.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: Solr Drupal module problem

2010-01-14 Thread David Stuart
Hi,

The Drupal Solr module will work with both Solr 1.3 and 1.4.
I currently have client installations using both these versions with Drupal 
(version 5 and 6).

Regards,

Dave


On 14 Jan 2010, at 23:08, Otis Gospodnetic wrote:

 You may want to ask on Drupal's mailing lists.  I hear about Drupal and Solr 
 constantly, I can't imagine them not having Solr 1.4 support, esp. if you say 
 their configs contain referenes to things that are in Solr 1.4.0.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
 
 
 
 - Original Message 
 From: reallove thereall...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thu, January 14, 2010 5:54:55 PM
 Subject: Re: Solr Drupal module problem
 
 
 Hello,
 Thanks for the answer.
 Unfortunately, in the Debian repositories, even in testing, latest Solr
 version is 1.3.0 . Can I use that for the Drupal module to work ? I highly
 prefer to use the Debian repositories instead the source code.
 Thank you.
 
 
 Otis Gospodnetic wrote:
 
 Hi,
 
 Solr 1.2.0 didn't have TrieIntField.
 Use the latest Solr - Solr 1.4.0
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
 
 
 
 - Original Message 
 From: reallove 
 To: solr-user@lucene.apache.org
 Sent: Thu, January 14, 2010 5:43:23 PM
 Subject: Solr Drupal module problem
 
 
 Hello,
 System : Debian 5.0
 Java, Tomcat & Solr installed from the repositories.
 Java version 1.6_12 , tomcat 5.5 and solr 1.2.0 .
 I am trying to use the schema.xml and the solrconfig.xml from the Drupal
 module, but they fail to work.
 The error I am getting is :
 Error loading class 'solr.TrieIntField' .
 How can I fix this ?
 Thank you !
 -- 
 View this message in context: 
 http://old.nabble.com/Solr-Drupal-module-problem-tp27169365p27169365.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 -- 
 View this message in context: 
 http://old.nabble.com/Solr-Drupal-module-problem-tp27169365p27169511.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: I cant get it to work

2009-12-15 Thread David Stuart

Hi,

The answer is it depends ;)

If your 10 tables represent one entity, e.g. a person, their address etc., 
then one document per entity works.


But if your 10 tables each represent a series of entities that you want 
to surface in your search results separately, then make a document for 
each (i.e. it depends on your data).


What is your use case? Are you wanting a search index that is able to 
search on every field in your 10 tables, or just a few?
Think of it this way: if you were creating SQL to pull the data out of 
the db using joins etc., what fields would you grab, and do you get multiple 
rows back because some of your tables have a one-to-many relationship? 
Once you have formed that query, that is your document, minus the 
duplicate information caused by the extra rows.


Cheers

David

On 15 Dec 2009, at 08:05, Faire Mii faire@gmail.com wrote:


I just can't get it.

If I have 10 tables in MySQL and they are all related to each other 
with foreign keys, should I have 10 documents in Solr?


Or just one document with rows from all tables in it?

I have tried in vain for 2 days now... please help.

regards

fayer


Re: Log of zero result searches

2009-12-15 Thread David Stuart
The result tag in the returned XML has a numFound attribute that will report 
0 if nothing matches your search criteria.


David

On 15 Dec 2009, at 08:16, Roland Villemoes r...@alpha-solutions.dk  
wrote:



Hi

Question: How do you log zero result searches?

It's quite important from a business perspective to know which searches 
return zero/empty results.

Does anybody know a way to get this information?

Roland Villemoes


Re: is it possible to use Xinclude in schema.xml?

2009-11-28 Thread David Stuart
Yeah, I tried it as well; it doesn't seem to implement xpointer properly, 
so you can't add multiple fields or field types.


David

On 28 Nov 2009, at 18:49, Peter Wolanin peter.wola...@acquia.com  
wrote:



Follow-up:  it seems the schema parser doesn't barf if you use
xinclude with a single analyzer element, but so far seems like it's
impossible for a field type.  So this seems to work:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <xi:include href="solr/core2/conf/text-analyzer.xml">
        <xi:fallback>
          <analyzer type="index">
            ...
          </analyzer>
        </xi:fallback>
      </xi:include>
      <analyzer type="query">
        ...
      </analyzer>
    </fieldType>

On Sat, Nov 28, 2009 at 1:40 PM, Peter Wolanin peter.wola...@acquia.com 
 wrote:

I'm trying to determine if it's possible to use Xinclude to (for
example) have a base schema file and then substitute various pieces.

It seems that the schema fieldTypes throw exceptions if there is an
unexpected attribute?

SEVERE: java.lang.RuntimeException: schema fieldtype
text(org.apache.solr.schema.TextField) invalid
arguments:{xml:base=solr/core2/conf/text-analyzer.xml}

This is what I'm trying to do (details of the analyzer chain  
omitted -

nothing unusual) - so the error occurs when the external xml file is
actually included:

<xi:include href="solr/core2/conf/text-analyzer.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        ...
      </analyzer>
      <analyzer type="query">
        ...
      </analyzer>
    </fieldType>
  </xi:fallback>
</xi:include>


Where (for testing) the text-analyzer.xml file just looks like the  
fallback:



<?xml version="1.0" encoding="UTF-8" ?>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    ...
  </analyzer>
  <analyzer type="query">
    ...
  </analyzer>
</fieldType>


--
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com





--
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: memory size

2009-11-11 Thread David Stuart

Hi
This is a PHP problem; you need to increase your per-thread memory 
limit in your php.ini. The setting name is memory_limit.
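
For example, in php.ini (pick a limit that suits your app):

memory_limit = 128M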


Regards

David

On 11 Nov 2009, at 07:56, Jörg Agatz joerg.ag...@googlemail.com  
wrote:



Hallo,

I have a problem with the memory size, but I don't know how I can 
repair it.


Maybe it is a PHP problem, but I don't know.

My Error:

Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to
allocate 16515072 bytes)


I hope you can help me

KinGArtus


Re: apply a patch on solr

2009-11-04 Thread David Stuart
You should be OK with the revision option below. Look for the 
highest revision number in the list of files in the patch; as 
Subversion increments the revision number on a repo basis, not a file basis, 
the highest number will represent the current state of all the 
files when the patch was made, if that makes sense.
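
Concretely, using the revision from the example below (the patch file name is 
whatever you downloaded from the issue):

svn checkout -r 772437 http://svn.apache.org/repos/asf/lucene/solr/trunk solr-trunk
cd solr-trunk
patch -p0 < field-collapse-5.patch
svn up    # check everything merges cleanly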


Regards
Dave

On 4 Nov 2009, at 03:40, michael8 mich...@saracatech.com wrote:



Perfect.  This is what I need to know instead of patching 'in the  
dark'.

Good thing SVN revision cuts across all files like a tag.

Thanks Mike!

Michael


cambridgemike wrote:


You can see what revision the patch was written for at the top of the
patch,
it will look like this:

Index: org/apache/solr/handler/MoreLikeThisHandler.java
===
--- org/apache/solr/handler/MoreLikeThisHandler.java (revision  
772437)

+++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)

now check out revision 772437 using the --revision switch in svn,  
patch
away, and then svn up to make sure everything merges cleanly.  This  
is a

good guide to follow as well:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html

cheers,
-mike

On Mon, Nov 2, 2009 at 3:55 PM, michael8 mich...@saracatech.com  
wrote:




Hi,

First I like to pardon my novice question on patching solr (1.4).   
What I

like to know is, given a patch, like the one for collapse field, how
would
one go about knowing what solr source that patch is meant for  
since this

is
a source level patch?  Wouldn't the exact versions of a set of  
java files

to
be patched critical for the patch to work properly?

So far what I have done is to pull the latest collapse field patch  
down

from
http://issues.apache.org/jira/browse/SOLR-236 (field- 
collapse-5.patch),

and
then svn up the latest trunk from
http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and  
build.

Intuitively I was thinking I should be doing svn up to a specific
revision/tag instead of just latest.  So far everything seems  
fine, but I

just want to make sure I'm doing the right thing and not just being
lucky.

Thanks,
Michael
--
View this message in context:
http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html
Sent from the Solr - User mailing list archive at Nabble.com.







--
View this message in context: 
http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26189573.html
Sent from the Solr - User mailing list archive at Nabble.com.



Conditional copyField

2009-10-12 Thread David Stuart

Hi,
I am pushing data to Solr from two different sources: Nutch and a CMS. 
I have a data clash in that for Nutch a copyField is required to push 
the url field to the id field, as it is used as the primary lookup in 
the Nutch Solr integration update. The CMS also uses the url 
field but populates the id field with a different value. Now I 
can't really change either source definition, so is there a way in 
solrconfig or the schema to check if id is empty and only copy if it is, or 
is there a better way via an update processor?
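
For the record, a rough, untested sketch of the update-processor route (class and 
chain names are made up):

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

/** Copies url into id only when the incoming document carries no id of its own. */
public class ConditionalCopyProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        if (doc.getFieldValue("id") == null && doc.getFieldValue("url") != null) {
          // only fill id from url when the source (e.g. Nutch) did not send one
          doc.addField("id", doc.getFieldValue("url"));
        }
        super.processAdd(cmd);
      }
    };
  }
}

wired into solrconfig.xml via an update processor chain, e.g.:

<updateRequestProcessorChain name="conditional-copy">
  <processor class="ConditionalCopyProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>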


Thanks for your help in advance
Regards

David


xincludes schema help

2009-10-11 Thread David Stuart

Hi,

I am trying to get XIncludes with xpointer working in schema.xml, as 
per this closed issue request: https://issues.apache.org/jira/browse/SOLR-1167.


To make our upgrade path easier I want to be able to include extra custom fields 
in the schema, and am including an extra set of fields inside the fields tags, but 
keep getting an XPointer resolution unsuccessful error. Files below.

<schema>
<types>...</types>
<fields>
   <field name="site" type="string" indexed="true" stored="true"/>
   <field name="hash" type="string" indexed="true" stored="true"/>
   <field name="url" type="string" indexed="true" stored="true"/>

   <xi:include href="/usr/local/solr_home/solr/db/conf/nutch_schema.xml"
       parse="xml" xpointer="./nutch/*"
       xmlns:xi="http://www.w3.org/2001/XInclude"/>
</fields>
</schema>

-- Include file --
<nutch>
<!-- fields for index-basic plugin -->
<field name="host" type="url" stored="false" indexed="true"/>
<field name="content" type="text" stored="true" indexed="true"/>
<copyField source="content" dest="body"/>
<copyField source="content" dest="teaser"/>
</nutch>

I have also tried this to add multiple extra fieldType definitions:

<xi:include href="/usr/local/solr_home/solr/db/conf/nutch_schema.xml"
    parse="xml" xpointer="./extraFieldTypes/*"
    xmlns:xi="http://www.w3.org/2001/XInclude"/>

<extraFieldTypes>
<fieldType name="url" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
<fieldType>...</fieldType>
</extraFieldTypes>

Any thoughts?

Thanks for your help

Regards,

David