Re: Field Collapsing SOLR-236

2010-07-09 Thread Moazzam Khan
Hi Rakhi,

Sorry, I didn't see this email until just now. Did you get it working?


If not, here are some things that might help.


- Download the patch first.
- Check the date on which the patch was released.
- Download the version of the trunk that existed at that date.
- Apply the patch using the patch program on Linux. There is a Windows
program for patching too, but I can't remember its name right now.
- After applying the patch just compile the whole thing


It might be better if you use the example folder first and modify the
config to work for multicore (at least that's what I did). You can
compile the example by doing

ant example

(if I remember correctly)

For config stuff refer to this link :

http://wiki.apache.org/solr/FieldCollapsing


HTH :)

- Moazzam





On Wed, Jun 23, 2010 at 7:23 AM, Rakhi Khatwani  wrote:
> Hi,
>   But there are almost no settings in my config.
> Here's a snapshot of what I have in my solrconfig.xml:
>
> 
> 
>
> 
>  multipartUploadLimitInKB="2048" />
> 
>
>  default="true" />
> 
>  class="org.apache.solr.handler.admin.AdminHandlers" />
>
> 
> 
> *:*
> 
>
> 
>  class="org.apache.solr.handler.component.CollapseComponent" />
> 
>
> Am I going wrong anywhere?
> Regards,
> Raakhi
>
> On Wed, Jun 23, 2010 at 3:28 PM, Govind Kanshi wrote:
>
>> fieldType:analyzer without class or tokenizer & filter list seems to point
>> to the config - you may want to correct.
>>
>>
>> On Wed, Jun 23, 2010 at 3:09 PM, Rakhi Khatwani 
>> wrote:
>>
>> > Hi,
>> >        I checked out modules & lucene from the trunk.
>> > Performed a build using the following commands
>> > ant clean
>> > ant compile
>> > ant example
>> >
>> > Which compiled successfully.
>> >
>> >
>> > I then put my existing index(using schema.xml from solr1.4.0/conf/solr/)
>> in
>> > the multicore folder, configured solr.xml and started the server
>> >
>> > When i type in http://localhost:8983/solr
>> >
>> > i get the following error:
>> > org.apache.solr.common.SolrException: Plugin init failure for [schema.xml]
>> > fieldType:analyzer without class or tokenizer & filter list
>> > at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
>> > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:480)
>> > at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:122)
>> > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:429)
>> > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:286)
>> > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:198)
>> > at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:123)
>> > at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
>> > at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
>> > at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>> > at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
>> > at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
>> > at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
>> > at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
>> > at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
>> > at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>> > at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>> > at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
>> > at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>> > at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>> > at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>> > at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
>> > at org.mortbay.jetty.Server.doStart(Server.java:224)
>> > at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>> > at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597)
>> > at org.mortbay.start.Main.invokeMain(Main.java:194)
>> > at org.mortbay.start.Main.start(Main.java:534)
>> > at org.mortbay.start.Main.start(Main.java:441)
>> > at org.mortbay.start.Main.main(Main.java:119)
>> > Caused by: org.apache.solr.common.SolrException: analyzer without class
>> > or tokenizer & filter list
>> > at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:908)
>> > at org.apache.solr.schema.IndexSchema.access

Re: Realtime + Batch indexing

2010-07-09 Thread Shawn Heisey
 It's possible to get near real-time adds and updates (every two 
minutes in our case) with a multi-shard setup, if you have a shard 
dedicated to new content and have the right combination of unique 
identifiers on your data.  I'll respond off-list with a full description 
of my setup.



On 7/9/2010 4:41 PM, bbarani wrote:

I have a scheduled batch indexing happening in master every 2 days for 3
sources (Ex: s1, s2, s3) Once the batch indexing gets completed I replicate
that to slave instance for user queries.

There is one more app which posts the XML (of s3) to SOLR slave instance (to
perform real time indexing) and the posted XML can add / update document to
the slave index (created by batch indexing). Now since the data posted via
XML is also available for batch indexing, If I do a batch indexing for s3
after 2 days and replicate it in slave users should be able to view all
data. I am posting just to slave first in order to have a kind of real time
indexing where the user can see the results immediately but whenever the XML
post happens to SOLR there is a db entry corresponding to that post..

Now I am afraid that I might run in to an issue when someone kicks off real
time indexing from the app when batch indexing is in progress as the batch
indexing might not pick up the changes made to slave at that time (when the
batch indexing is in progress).

Has anyone faced this kind of scenario..

My ideal solution is that I should be able to do real time (XML post) /
batch indexing at same time and also I cant use shards as real time data may
even need to update the existing index (not just add a new document)..My
assumption is that I can use shards if we are going to maintain index
separately for real time / batch indexing but if I need to update an
existing document using XML post I don't think Shards would work...

I also thought of doing this.. I will always write both XML post / batch
indexing to Master and do a replication to slave every 15 seconds.. even in
this case if I am doing a batch indexing I suppose SOLR will lock the index
files and I wont be able to do a XML push to the same index at that time..
please correct me if I am wrong..




Problem with linux

2010-07-09 Thread sarfaraz masood
I have a problem when I execute my program on Linux containing the following
piece of code:
{
    Document d;
    Analyzer analyzer = new PorterStemAnalyzer();
    System.out.println("1");

    Directory index = FSDirectory.open(new File("index1"));
    System.out.println("2");

    IndexWriter w = new IndexWriter(index, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED); // MY PROG HANGS UP HERE
    System.out.println("3");
    ...
}


Strangely, this exact program runs fine on Windows. It simply hangs
(doesn't halt) while creating the IndexWriter object on Linux. The account
I'm logged in with has sufficient rights to the folder concerned.


-Sarfaraz





Re: PDF remote streaming extract with lots of multiValues

2010-07-09 Thread David Thompson
POSTing the individual parameters (literal.id, literal.mycategory, 
literal.mycategory) as name value pairs to 1.4's /update/extract does work.  I 
just realized the POST's content type hadn't been set to 
'application/x-www-form-urlencoded'.  Set it to that and it accepts all the 
parameters.
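For anyone hitting the same GET-length limit: a rough JDK-only sketch of building that form-urlencoded body with repeated literal.mycategory parameters. The field names come from this thread; the helper name and everything else is illustrative, not Solr API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ExtractPost {
    // Build an application/x-www-form-urlencoded body with one
    // literal.mycategory parameter per value (hypothetical helper).
    static String buildBody(String id, List<String> categories, String streamUrl) {
        StringBuilder sb = new StringBuilder("literal.id=")
                .append(URLEncoder.encode(id, StandardCharsets.UTF_8));
        for (String c : categories) {
            sb.append("&literal.mycategory=")
              .append(URLEncoder.encode(c, StandardCharsets.UTF_8));
        }
        sb.append("&stream.url=")
          .append(URLEncoder.encode(streamUrl, StandardCharsets.UTF_8));
        return sb.toString();
    }

    public static void main(String[] args) {
        // POST this body to /update/extract with the header
        //   Content-Type: application/x-www-form-urlencoded
        // -- that header is what makes Solr read the body parameters.
        String body = buildBody("abc", List.of("blah", "foo", "bar"),
                "http://otherhost/some/file.pdf");
        System.out.println(body);
    }
}
```

Since the parameters travel in the body instead of the query string, the container's URL-length limit no longer applies, however many categories you send.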

 -dKt





From: David Thompson 
To: solr-user@lucene.apache.org
Sent: Fri, July 9, 2010 12:17:59 PM
Subject: PDF remote streaming extract with lots of multiValues


How would I go about setting a large number of literal values in a call to 
index 
a remote PDF?  I'm currently calling:

http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&stream.url=http://otherhost/some/file.pdf


And that works great, except now I'm coming across use cases where I need to
send in hundreds, up to thousands, of different values for 'mycategory'.  So
with mycategory defined as a multiValued string, I can call:

 
http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&literal.mycategory=foo&literal.mycategory=bar&stream.url=http://otherhost/some/file.pdf


and that works as expected.  But when I try to embed thousands of 
literal.mycategory parameters in the call, eventually my container says 'look, 
I've been forgiving about letting you GET URLs far longer than 1500 characters, 
but this is ridiculous' and barfs on it.  


I've tried POSTing a ... command, but it only pays 
attention to parameters in the URL query string, ignoring everything in the 
document.  I've seen some other threads that seem related, but now I'm just 
confused.  


What's the best way to tackle  this?

-dKt


  

Re: Realtime + Batch indexing

2010-07-09 Thread bbarani

Hi,

Thanks a lot for your replies

Here is the exact problem I am facing right now..

I have a scheduled batch indexing job in the master that runs every 2 days for
3 sources (e.g. s1, s2, s3). Once the batch indexing completes, I replicate
the index to the slave instance for user queries.

There is one more app which posts XML (for s3) to the SOLR slave instance (to
perform real-time indexing); the posted XML can add / update documents in the
slave index (created by batch indexing). Since the data posted via XML is also
available to batch indexing, if I do a batch index for s3 after 2 days and
replicate it to the slave, users should still be able to view all the data. I
post to the slave first in order to have a kind of real-time indexing where
the user can see the results immediately; whenever an XML post happens to SOLR
there is a DB entry corresponding to that post.

Now I am afraid that I might run into an issue when someone kicks off
real-time indexing from the app while batch indexing is in progress, as the
batch indexing might not pick up the changes made to the slave at that time.

Has anyone faced this kind of scenario?

My ideal solution is to be able to do real-time (XML post) and batch indexing
at the same time. I can't use shards, as the real-time data may even need to
update the existing index (not just add a new document). My assumption is
that I could use shards if we maintained separate indexes for real-time and
batch indexing, but if I need to update an existing document via XML post I
don't think shards would work.

I also thought of doing this: always write both XML posts and batch indexing
to the master, and replicate to the slave every 15 seconds. Even in this
case, if I am doing a batch index I suppose SOLR will lock the index files
and I won't be able to do an XML push to the same index at that time.
Please correct me if I am wrong.

Any suggestion / thoughts would be greatly appreciated.


Thanks,
BB

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Realtime-Batch-indexing-tp952293p955442.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Function Query Sorting vs 'Sort' parameter?

2010-07-09 Thread Koji Sekiguchi

(10/07/10 7:15), Saïd Radhouani wrote:

Yes, indeed, you understood my question. Looking forward to the next version 
then.

To your reply, I'd add that _val_ is used for standard request handler, and bf 
is used for dismax, right?

-S

   

Right.

Koji

--
http://www.rondhuit.com/en/



Re: Function Query Sorting vs 'Sort' parameter?

2010-07-09 Thread Saïd Radhouani
Yes, indeed, you understood my question. Looking forward to the next version 
then.

To your reply, I'd add that _val_ is used for standard request handler, and bf 
is used for dismax, right?

-S 


On Jul 10, 2010, at 12:05 AM, Koji Sekiguchi wrote:

> (10/07/10 0:54), Saïd Radhouani wrote:
>> Hi,
>> 
>> I'm making some basic sorting (date, price, etc.) using the "sort" parameter 
>> (sort=field+asc), and it's working fine. I'm wondering whether there's a 
>> significant argument to use function query sorting instead of the "sort" 
>> parameter?
>> 
>> Thanks,
>> -S
>>   
> I'm not sure if I understand your question correctly,
> but sort by function will be available in next version of Solr:
> 
> https://issues.apache.org/jira/browse/SOLR-1297
> 
> q=ipod&sort=func(price) asc
> 
> Or you can sort by function via _val_ in Solr 1.4:
> 
> q=ipod^0 _val_:"func(price)"&sort=score asc
> 
> Koji
> 
> -- 
> http://www.rondhuit.com/en/
> 



Re: Sort by Day - Use of DateMathParser in Function Query?

2010-07-09 Thread Chris Hostetter

: In https://issues.apache.org/jira/browse/SOLR-1297,
: Grant writes:
: """
: Note, there is a temporary workaround for this: (main query)^0
: func(...) 
: """
: 
: Is that workaround an option for my use case?

that would in fact be a workaround for sorting by function where the 
function uses "ms" to get the milliseconds of a rounded date field -- 
however...

: > I am using 1.4.1, the date field is configured like this:
: >  omitNorms="true"/>
: > 
: > (The schema has been created using the schema file from 1.4.0, and I
: > haven't changed anything when upgrading to 1.4.1. TrieDate is said to be
: > the default in 1.4, so I would expect this date field to have that
: > type?)

...somewhere you got confused, or misunderstood something.  There is no 
"default" date field in Solr, there are only recommendations and examples 
provided in the example schema.xml -- in Solr 1.4.1 *and* in Solr 1.4 the 
recommended field type for dealing with dates is "solr.TrieDateField"

As noted in the FunctionQuery wiki page you mentioned, the ms() function 
does not work with "solr.DateField".  

(most likely your schema.xml originally started from the example in Solr 
1.3 or earlier ... *OR* ... you needed the 
sortMissingLast/sortMissingFirst functionality that DateField supports but 
TrieDateField does not.  the 1.4 example schema.xml explains the 
differences)


-Hoss



Re: Function Query Sorting vs 'Sort' parameter?

2010-07-09 Thread Koji Sekiguchi

(10/07/10 0:54), Saïd Radhouani wrote:

Hi,

I'm making some basic sorting (date, price, etc.) using the "sort" parameter 
(sort=field+asc), and it's working fine. I'm wondering whether there's a significant argument to 
use function query sorting instead of the "sort" parameter?

Thanks,
-S
   

I'm not sure if I understand your question correctly,
but sort by function will be available in next version of Solr:

https://issues.apache.org/jira/browse/SOLR-1297

q=ipod&sort=func(price) asc

Or you can sort by function via _val_ in Solr 1.4:

q=ipod^0 _val_:"func(price)"&sort=score asc

Koji

--
http://www.rondhuit.com/en/



Re: Delta Import by ID

2010-07-09 Thread Chris Hostetter

I'm not certain but i think what you want is something like this...

deltaQuery="select '${dataimporter.request.do_this_id}'"
deltaImportQuery="select ... from destinations 
  where DestID='${dataimporter.delta.id}'
   "
...and then hit the handler with a URL like..

   /dataimport?config=data-config.xml&command=delta-import&do_this_id=XYZ&

Normally, the job of deltaQuery is to pick a list of IDs based on 
${dataimporter.last_index_time}, and then deltaImportQuery fetches all 
the data for those IDs -- but in your case you don't care about the 
last index time, you just want to force it to index a specific id, so you 
just need to select that id as-is from your request params.
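As a toy illustration of how the request parameter reaches those queries, here is a sketch of ${...} placeholder substitution. This is not DIH's real variable resolver, just the idea: the do_this_id value from the URL ends up substituted into the deltaQuery string.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DihPlaceholders {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Toy stand-in for DIH's variable resolver: replace every ${...}
    // token from a map of available values (illustration only).
    static String resolve(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb,
                    Matcher.quoteReplacement(vars.getOrDefault(m.group(1), "")));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // As if the handler was hit with ...&command=delta-import&do_this_id=XYZ
        Map<String, String> vars = Map.of("dataimporter.request.do_this_id", "XYZ");
        System.out.println(resolve("select '${dataimporter.request.do_this_id}'", vars));
        // -> select 'XYZ'
    }
}
```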


: 
: 
: However I really dont want to use CreationDate, but rather just pass in the
: id (as done in the deltaImportQuery) - Can I do that directly - if so how do
: I specify the value for dataimporter.delta.id?
: 
: (P.S. sorry for a new thread, I kept getting my mail bounced back when I did
: a reply, so I'm trying a new thread.)
: 



-Hoss



Re: making rotating timestamped logs from solr output

2010-07-09 Thread Chris Hostetter

The entire wording/phrasing of your email leads me to suspect that you are 
using the example jetty server provided with solr (ie: java -jar 
start.jar) and that you aren't clear on the distinction between the logs 
generated by jetty and the logs generated by solr.

the simple instance of Jetty that you get when running java -jar start.jar 
does request logging into the example/logs directory -- while the various 
debug/info/warn/error messages from the java code are all configured 
to be logged to the console, specifically because it's an example: we want 
you to see what types of things are logged.

For a "real" installation of Solr, I would recommend you look into 
something like init.d, or "services" on Windows (I think that's what they 
are called), to ensure that the servlet container is started as a daemon 
(independent of your user session).  You can then configure your servlet 
container to log any way you want it to...
   http://wiki.apache.org/solr/SolrLogging

That said: "request" logging from your servlet container only knows about 
the HTTP-level request/response information -- it has no way of knowing 
about things like the number of hits.  Those things are logged by Solr, and 
there is a single log message per request that does include this 
information, so you can configure LogHandlers to direct copies of these 
specific messages to a special file (I can't remember the pattern off the 
top of my head)
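The last point -- directing copies of one logger's messages to a separate file -- can be sketched with plain java.util.logging. The logger name and the message text below are illustrative (check your own log output for the real ones); Solr 1.4 logs through SLF4J, which is usually backed by JUL.

```java
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class RequestLogDemo {
    public static void main(String[] args) throws Exception {
        // Attach a FileHandler to one specific logger so only its
        // messages land in the file.  Solr's per-request log line is the
        // interesting one here, since it includes hits= and QTime=.
        Logger solrCore = Logger.getLogger("org.apache.solr.core.SolrCore");
        FileHandler fh = new FileHandler("solr-requests.log", true);
        fh.setFormatter(new SimpleFormatter());
        solrCore.addHandler(fh);
        solrCore.setLevel(Level.INFO);

        // Simulated request log line, in the spirit of what Solr emits:
        solrCore.info("[core0] webapp=/solr path=/select params={q=*:*} hits=42 status=0 QTime=3");
        fh.close();
    }
}
```

The same wiring can also be done declaratively in a logging.properties file instead of code.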

: Hello,
: 
: I would like to log the solr console. although solr logs requests in
: timestamped format, this only logs the requests, i.e. does not log
: number of hits for a given query, etc.
: 
: is there any easy way to do this other then reverting to methods for
: capturing solr output. I usually run solr on my server using screen
: command first, running solr, then detaching from console.
: 
: but it would be nice to have output logging instead of request logging.
: 
: best regards,
: c.b.
: 



-Hoss



Re: Custom PhraseQuery

2010-07-09 Thread Chris Hostetter

: It sounds like all I need to do is actually override tf(float) in the
: SweetSpotSimilarity class to delegate to baselineTF just like tf(int) does.
: Is this correct?

you have to decide how you want to map the float->int (ie: round, 
truncate, etc...) but otherwise: yes that should work fine.



-Hoss



RE: solr connection question

2010-07-09 Thread Chris Hostetter

: Yes I mean  HTTP-requests 
: How can I log them?

it's entirely dependent on your Servlet Container (ie: jetty, tomcat, 
resin, weblogic, etc...)

If you are using the example jetty provided in the Solr releases (ie: java 
-jar start.jar) they show up in example/logs


-Hoss



Re: ClassCastException SOLR

2010-07-09 Thread Chris Hostetter

: If you look at the Lucene factories, they all subclass from
: BaseTokenFilterFactory which then subclasses from
: BaseTokenStreamFactory. That last one does various things for the
: child factories (I don't know what they are).

Note also that if you really did copy the body of SynonymFilterFactory 
exactly (so that it already subclasses BaseTokenFilterFactory), the other 
possible cause of this problem is a classloader issue .. if the only thing 
in your plugin jar is your new factory, and this jar is in a lib dir that 
is either in your solr home dir, or configured in your solrconfig.xml, 
then you shouldn't have a problem.  BUT! ... this wording here jumps out 
at me...

: > I'm using the same dependencies as SOLR 1.4.1, because it caused problems
: > with newer versions of lucene-core.

...you should be *compiling* against the same lucene/solr jars that come 
with Solr, but you should not be trying to include any of those 
classes/jars in your classpath yourself -- having multiple instances of a 
class in the classloader can cause problems like the one you are seeing.



-Hoss



Re: SolrQueryResponse - Solr Documents

2010-07-09 Thread Chris Hostetter

: How can I view solr docs in response writers before the response is sent
: to the client ? What I get is only DocSlice with int values having size
: equal the docs requested. All this while debugging on the
: SolrQueryResponse Object.

if you are writing a custom ResponseWriter you can get the Documents 
corresponding to a DocList (or DocSlice) by fetching them from 
the SolrIndexSearcher (which is associated with the SolrQueryRequest); the 
ints in the DocSlice are the Lucene internal docIds that the 
IndexSearcher.document(int) method expects.

Note: if you subclass the BaseResponseWriter class this is a lot easier -- 
it takes care of the hard work and converts the Document into a 
SolrDocument -- all you have to do is implement writeDoc(SolrDocument)


-Hoss



Re: Custom PhraseQuery

2010-07-09 Thread Blargy

Oh.. I didn't know about the different signatures of tf. Thanks for that
clarification.

It sounds like all I need to do is actually override tf(float) in the
SweetSpotSimilarity class to delegate to baselineTF just like tf(int) does.
Is this correct?

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-PhraseQuery-tp932414p955257.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom PhraseQuery

2010-07-09 Thread Chris Hostetter

: Query: "foo bar"
: Doc1: "foo bar baz"
: Doc2: "foo bar foo bar"
: 
: These two documents should be scored exactly the same. I accomplished the
: above in the "normal" query use-case by using the SweetSpotSimilarity class.

You can change this by subclassing SweetSpotSimilarity (or any Similarity 
class) and overriding the tf(float) function.  

tf(int) is called for terms, while tf(float) is called for phrases 
-- the float value is lower for phrases with a lot of slop, and higher for 
exact matches.

Unfortunately, the input to tf(float) is lossy in accounting for docs 
that match the phrase multiple times ... the value of "1.0f" 
might mean it matches the phrase once exactly, or it might mean that it 
matches many times in a sloppy manner.

In your case, it sounds like you just want it to return "1" for any input 
except "0.0f"
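A minimal standalone sketch of that idea. In a real plugin this method body would live in an override of tf(float) in a SweetSpotSimilarity subclass (which needs the Lucene jars); here it is a plain static method so the flattening logic can be shown and run on its own.

```java
public class FlatPhraseTf {
    // Sketch of the "flatten phrase tf" idea from this thread: return
    // the same score contribution for any matching phrase, no matter how
    // often (or how sloppily) it matched.  Not Lucene code -- just the
    // logic that would go into an overridden tf(float).
    public static float tf(float freq) {
        return freq > 0.0f ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        // "foo bar baz" vs "foo bar foo bar": both phrase matches end up
        // contributing the same tf value.
        System.out.println(tf(1.0f) == tf(2.0f)); // true
    }
}
```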



-Hoss



Re: Polish language support?

2010-07-09 Thread Robert Muir
Hi Peter,

this stemmer is integrated into trunk and 3x.

http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/stempel/
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/contrib/analyzers/stempel/


On Fri, Jul 9, 2010 at 2:38 PM, Peter Wolanin wrote:

> In IRC trying to help someone find Polish-language support for Solr.
>
> Seems lucene has nothing to offer?  Found one stemmer that looks to be
> compatibly licensed in case someone wants to take a shot at
> incorporating it:  http://www.getopt.org/stempel/
>
> -Peter
>
> --
> Peter M. Wolanin, Ph.D.
> Momentum Specialist,  Acquia. Inc.
> peter.wola...@acquia.com
>



-- 
Robert Muir
rcm...@gmail.com


Polish language support?

2010-07-09 Thread Peter Wolanin
In IRC trying to help someone find Polish-language support for Solr.

Seems lucene has nothing to offer?  Found one stemmer that looks to be
compatibly licensed in case someone wants to take a shot at
incorporating it:  http://www.getopt.org/stempel/

-Peter

-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


PDF remote streaming extract with lots of multiValues

2010-07-09 Thread David Thompson
How would I go about setting a large number of literal values in a call to 
index 
a remote PDF?  I'm currently calling:

http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&stream.url=http://otherhost/some/file.pdf


And that works great, except now I'm coming across use cases where I need to 
send in hundreds, up to thousands, of different values for 'mycategory'.  So 
with mycategory defined as a multiValued string, I can call:

 
http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&literal.mycategory=foo&literal.mycategory=bar&stream.url=http://otherhost/some/file.pdf


and that works as expected.  But when I try to embed thousands of 
literal.mycategory parameters in the call, eventually my container says 'look, 
I've been forgiving about letting you GET URLs far longer than 1500 characters, 
but this is ridiculous' and barfs on it.  


I've tried POSTing a ... command, but it only pays 
attention to parameters in the URL query string, ignoring everything in the 
document.  I've seen some other threads that seem related, but now I'm just 
confused.  


What's the best way to tackle this?

-dKt



  

Re: Realtime + Batch indexing

2010-07-09 Thread Shawn Heisey
Replication does not transfer files that already exist on the slave and 
have the same metadata (size, last modified, etc) as the master.  As far 
as deleting files, it will only do so if they do not exist on the master.


In most cases, the only way that it would delete and copy the entire 
index is if the slave index were optimized after updating, which would 
result in different filenames with entirely different sizes and 
modification times.
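The transfer decision Shawn describes can be sketched as a simple metadata comparison. This is an illustration of the rule only, not Solr's actual replication code (SnapPuller does the real work), and the file names below are made up.

```java
import java.util.Map;

public class ReplicationSketch {
    // A slave skips a file when it already has one with the same name,
    // size and last-modified time; otherwise the file is fetched.
    record FileMeta(long size, long lastModified) {}

    static boolean needsTransfer(String name, FileMeta master,
                                 Map<String, FileMeta> slaveFiles) {
        FileMeta local = slaveFiles.get(name);
        return local == null || !local.equals(master);
    }

    public static void main(String[] args) {
        Map<String, FileMeta> slave = Map.of("_1.cfs", new FileMeta(1024L, 1000L));
        // Same name + same metadata: skipped.
        System.out.println(needsTransfer("_1.cfs", new FileMeta(1024L, 1000L), slave)); // false
        // Unknown file (e.g. after an optimize renamed every segment): fetched.
        System.out.println(needsTransfer("_2.cfs", new FileMeta(2048L, 2000L), slave)); // true
    }
}
```

This also shows why an optimize forces a near-full copy: every segment file gets a new name, so nothing on the slave matches.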


The wiki has more detail:

http://wiki.apache.org/solr/SolrReplication#How_does_it_work.3F

My build scripts use DIH full-import for a reindex, DIH delta-import for 
adding new content, and the XML update handler for deletes.  Replication 
is very fast after an update on the master.  I've got my replication 
interval set to 15 seconds, and once it's triggered, it typically only 
takes a second or two.  I optimize one of my shards every day, and when 
that happens, replicating that shard (12GB) does take a little while.



On 7/8/2010 10:48 PM, bbarani wrote:

One final question about replication.. When I initiate replication I thought
SOLR would delete the existing index in slave and just transfers the master
index in to Slave. If thats the case there wont be any sync up issues right?

I am asking this because everytime I initiate replication the index size of
both slave and master becomes the same  (even if for some reason if index
size of slave is bigger than master it gets reduced to the same size as
master after replication) so thought that SOLR just deletes the slave index
and then moves all the files from master..
   




Function Query Sorting vs 'Sort' parameter?

2010-07-09 Thread Saïd Radhouani
Hi,

I'm making some basic sorting (date, price, etc.) using the "sort" parameter 
(sort=field+asc), and it's working fine. I'm wondering whether there's a 
significant argument to use function query sorting instead of the "sort" 
parameter?

Thanks,
-S

MLT with boost capability

2010-07-09 Thread Blargy

I've asked this question in the past without too much success. I figured I
would try to revive it.

Is there a way I can incorporate boost functions with a MoreLikeThis search?
Can it be accomplished at the MLT request handler level or would I need to
create a custom request handler which in turn delegates the majority of the
search to a specialized instance of MLT? Can someone point me in the right
direction?

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/MLT-with-boost-capability-tp954650p954650.html
Sent from the Solr - User mailing list archive at Nabble.com.


Last day to submit your Surge 2010 CFP!

2010-07-09 Thread Jason Dixon
Today is your last chance to submit a CFP abstract for the 2010 Surge
Scalability Conference.  The event is taking place on Sept 30 and Oct 1,
2010 in Baltimore, MD.  Surge focuses on case studies that address
production failures and the re-engineering efforts that led to victory
in Web Applications or Internet Architectures.

You can find more information, including suggested topics and our
current list of speakers, online:

http://omniti.com/surge/2010

The final lineup should be available on the conference website next
week.  If you have questions about the CFP, attending Surge, or having
your business sponsor/exhibit at Surge 2010, please contact us at
su...@omniti.com.

Thanks!

-- 
Jason Dixon
OmniTI Computer Consulting, Inc.
jdi...@omniti.com
443.325.1357 x.241


Re: AW: Sort by Day - Use of DateMathParser in Function Query?

2010-07-09 Thread Chantal Ackermann
Hi Bastian,

that is an option but it would be more flexible to sort using a function
query.
It looks like I'll have to add that field, however. At least for as
long as I'm using 1.4.

Thanks,
Chantal

On Fri, 2010-07-09 at 12:08 +0200, Bastian Spitzer wrote:
> Hi Chantal,
> 
> why don't you just add another field to your index where you put the day
> only? You can sort by this field then in your queries.
> 
> cheers.
> 
> -----Original Message-----
> From: Chantal Ackermann [mailto:chantal.ackerm...@btelligent.de] 
> Sent: Friday, July 9, 2010 11:45
> To: solr-user@lucene.apache.org
> Subject: Sort by Day - Use of DateMathParser in Function Query?
> 
> Dear all,
> 
> this is not a new problem, I just wanted to check whether with 1.4 there 
> might have been changes that allow a different approach.
> 
> In my query, I retrieve results that have a date field. I have to sort the 
> result by day only, then by a different string field. The time of that date 
> shall not be used for sorting.
> I cannot filter the results on a certain date (day).
> 
> This thread confirms my first thought that I need another field in the
> index:
> http://search.lucidimagination.com/search/document/422dc30e0a222c28/sorting_dates_with_reduced_precision#46566037750d7b5
> 
> However, is it possible to use the DateMathParser somehow in the function 
> queries?
> If it's not yet possible - why not:
> (a) is there a great risk that the performance would be bad? Or some other 
> reason that discourages this solution.
> (b) it's simply not implemented
> 
> In case of (b), I might try to implement it.
> 
> Thanks!
> Chantal
> 





Re: Sort by Day - Use of DateMathParser in Function Query?

2010-07-09 Thread Chantal Ackermann
Sorry for the pollution. Sorting by function will only be possible with
1.5.

In https://issues.apache.org/jira/browse/SOLR-1297,
Grant writes:
"""
Note, there is a temporary workaround for this: (main query)^0
func(...) 
"""

Is that workaround an option for my use case?

Thanks,
Chantal

On Fri, 2010-07-09 at 12:08 +0200, Chantal Ackermann wrote:
> [P.S. to my first post]
> 
> Further contemplating http://wiki.apache.org/solr/FunctionQuery.
> 
> I am using 1.4.1, the date field is configured like this:
>  omitNorms="true"/>
> 
> (The schema has been created using the schema file from 1.4.0, and I
> haven't changed anything when upgrading to 1.4.1. TrieDate is said to be
> the default in 1.4, so I would expect this date field to have that
> type?)
> 
> On the wiki page, the following example is listed:
> Example: ms(NOW/DAY)
> Could I do that same thing with my own date?
> ms(start_date/DAY)
> 
> I tried that query:
> http://192.168.2.40:8080/solr/epg/select?qt=dismax&fl=start_date,title&sort=ms%28start_date/DAY%29%20asc,title%20asc
> 
> (search for all *:* configured in solrconfig.xml for dismax)
> 
> I get the following error message back:
> """
> message can not sort on undefined field: ms(start_date/DAY)
> 
> description The request sent by the client was syntactically incorrect
> (can not sort on undefined field: ms(start_date/DAY)).
> """
> 
> I am a complete newbie when it comes to function queries.
> 
> Thanks for any suggestions!
> Chantal
> 
> On Fri, 2010-07-09 at 11:44 +0200, Chantal Ackermann wrote:
> > Dear all,
> > 
> > this is not a new problem, I just wanted to check whether with 1.4 there
> > might have been changes that allow a different approach.
> > 
> > In my query, I retrieve results that have a date field. I have to sort
> > the result by day only, then by a different string field. The time of
> > that date shall not be used for sorting.
> > I cannot filter the results on a certain date (day).
> > 
> > This thread confirms my first thought that I need another field in the
> > index:
> > http://search.lucidimagination.com/search/document/422dc30e0a222c28/sorting_dates_with_reduced_precision#46566037750d7b5
> > 
> > However, is it possible to use the DateMathParser somehow in the
> > function queries?
> > If it's not yet possible - why not:
> > (a) is there a great risk that the performance would be bad? Or some
> > other reason that discourages this solution.
> > (b) it is simply not implemented
> > 
> > In case of (b), I might try to implement it.
> > 
> > Thanks!
> > Chantal
> > 
> 




Re: Sort by Day - Use of DateMathParser in Function Query?

2010-07-09 Thread Bastian Spitzer
Hi Chantal,

why don't you just add another field to your index where you put the day only? You
can then sort by this field
in your queries.

cheers.
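[Editor's note] Bastian's suggestion can be sketched as a small client-side step before indexing: derive a day-only value from the full timestamp and store it in an extra field. The field name start_day is hypothetical, and this is plain Python, not Solr API code:

```python
from datetime import datetime

def day_only(ts: str) -> str:
    """Truncate a Solr-style UTC timestamp to midnight, producing the
    value for a separate day-only field used purely for sorting."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    return dt.strftime("%Y-%m-%dT00:00:00Z")

doc = {"start_date": "2010-07-09T11:45:00Z", "title": "example"}
doc["start_day"] = day_only(doc["start_date"])
print(doc["start_day"])  # 2010-07-09T00:00:00Z
```

With such a field in place, the query would simply use something like sort=start_day asc,title asc.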

-----Original Message-----
From: Chantal Ackermann [mailto:chantal.ackerm...@btelligent.de] 
Sent: Friday, July 9, 2010 11:45
To: solr-user@lucene.apache.org
Subject: Sort by Day - Use of DateMathParser in Function Query?

Dear all,

this is not a new problem, I just wanted to check whether with 1.4 there might 
have been changes that allow a different approach.

In my query, I retrieve results that have a date field. I have to sort the 
result by day only, then by a different string field. The time of that date 
shall not be used for sorting.
I cannot filter the results on a certain date (day).

This thread confirms my first thought that I need another field in the
index:
http://search.lucidimagination.com/search/document/422dc30e0a222c28/sorting_dates_with_reduced_precision#46566037750d7b5

However, is it possible to use the DateMathParser somehow in the function 
queries?
If it's not yet possible - why not:
(a) is there a great risk that the performance would be bad? Or some other 
reason that discourages this solution.
(b) it is simply not implemented

In case of (b), I might try to implement it.

Thanks!
Chantal




Sort by Day - Use of DateMathParser in Function Query?

2010-07-09 Thread Chantal Ackermann
[P.S. to my first post]

Further contemplating http://wiki.apache.org/solr/FunctionQuery.

I am using 1.4.1, the date field is configured like this:


(The schema has been created using the schema file from 1.4.0, and I
haven't changed anything when upgrading to 1.4.1. TrieDate is said to be
the default in 1.4, so I would expect this date field to have that
type?)

On the wiki page, the following example is listed:
Example: ms(NOW/DAY)
Could I do that same thing with my own date?
ms(start_date/DAY)

I tried that query:
http://192.168.2.40:8080/solr/epg/select?qt=dismax&fl=start_date,title&sort=ms%28start_date/DAY%29%20asc,title%20asc

(search for all *:* configured in solrconfig.xml for dismax)

I get the following error message back:
"""
message can not sort on undefined field: ms(start_date/DAY)

description The request sent by the client was syntactically incorrect
(can not sort on undefined field: ms(start_date/DAY)).
"""

I am a complete newbie when it comes to function queries.

Thanks for any suggestions!
Chantal

On Fri, 2010-07-09 at 11:44 +0200, Chantal Ackermann wrote:
> Dear all,
> 
> this is not a new problem, I just wanted to check whether with 1.4 there
> might have been changes that allow a different approach.
> 
> In my query, I retrieve results that have a date field. I have to sort
> the result by day only, then by a different string field. The time of
> that date shall not be used for sorting.
> I cannot filter the results on a certain date (day).
> 
> This thread confirms my first thought that I need another field in the
> index:
> http://search.lucidimagination.com/search/document/422dc30e0a222c28/sorting_dates_with_reduced_precision#46566037750d7b5
> 
> However, is it possible to use the DateMathParser somehow in the
> function queries?
> If it's not yet possible - why not:
> (a) is there a great risk that the performance would be bad? Or some
> other reason that discourages this solution.
> (b) it is simply not implemented
> 
> In case of (b), I might try to implement it.
> 
> Thanks!
> Chantal
> 





Sort by Day - Use of DateMathParser in Function Query?

2010-07-09 Thread Chantal Ackermann
Dear all,

this is not a new problem, I just wanted to check whether with 1.4 there
might have been changes that allow a different approach.

In my query, I retrieve results that have a date field. I have to sort
the result by day only, then by a different string field. The time of
that date shall not be used for sorting.
I cannot filter the results on a certain date (day).

This thread confirms my first thought that I need another field in the
index:
http://search.lucidimagination.com/search/document/422dc30e0a222c28/sorting_dates_with_reduced_precision#46566037750d7b5

However, is it possible to use the DateMathParser somehow in the
function queries?
If it's not yet possible - why not:
(a) is there a great risk that the performance would be bad? Or some
other reason that discourages this solution.
(b) it is simply not implemented

In case of (b), I might try to implement it.

Thanks!
Chantal




Re: index format error because disk full

2010-07-09 Thread Michael McCandless
Disk full should never lead to index corruption (except for very old
versions of Lucene).

Lucene always writes (and closes) all files associated with the
segment, then fsync's them, before writing & fsync'ing the segments_N
file that refers to these files.

Can you describe in more detail the events that led up to the
zero-bytes del file?  What OS/filesystem?

Is there any external process that could have truncated the file?  Or
possibly filesystem corruption?

Mike
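[Editor's note] The ordering Mike describes — every segment file written and fsync'd before the segments_N file that references them — is the standard crash-safe publish pattern. A simplified sketch using plain files (this is illustrative, not Lucene's actual commit code):

```python
import os
import tempfile

def fsync_write(path: str, data: bytes) -> None:
    """Write a file and force it to stable storage before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

def commit_segment(dirpath: str, seg_files: dict, gen: int) -> None:
    # 1. Write and fsync every segment data file first.
    for name, data in seg_files.items():
        fsync_write(os.path.join(dirpath, name), data)
    # 2. Only then write and fsync the segments_N file that refers to them.
    #    A crash (or full disk) before this point leaves the previous
    #    commit fully intact, so the index cannot be corrupted.
    manifest = "\n".join(sorted(seg_files)).encode()
    fsync_write(os.path.join(dirpath, f"segments_{gen}"), manifest)

d = tempfile.mkdtemp()
commit_segment(d, {"_0.fdt": b"docs", "_0.del": b"dels"}, 2)
print(sorted(os.listdir(d)))  # ['_0.del', '_0.fdt', 'segments_2']
```

If the disk fills while writing the data files, segments_N is never written, so readers keep seeing the old commit — which is why a zero-byte .del file referenced by a live segments_N points to something outside this protocol (truncation, filesystem damage).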

On Wed, Jul 7, 2010 at 10:12 PM, Li Li  wrote:
> I used SegmentInfos to read the segment_N file and found the error is
> that it tries to load deletedDocs but the .del file's size is 0 (because
> of the disk error). So I used SegmentInfos to set delGen=-1 to ignore
> deleted docs.
> But I think there is some bug. The write logic may be: it first
> writes the .del file, then writes the segment_N file. But it only writes
> to a buffer and doesn't flush to disk immediately. So when the disk is full,
> it may happen that the segment_N file is flushed but the .del file fails.
>
> 2010/7/8 Lance Norskog :
>> If autocommit does not do an automatic rollback, that is a serious bug.
>>
>> There should be a way to detect that an automatic rollback has
>> happened, but I don't know what it is. Maybe something in the Solr
>> MBeans?
>>
>> On Wed, Jul 7, 2010 at 5:41 AM, osocurious2  
>> wrote:
>>>
>>> I haven't used this myself, but Solr supports a
>>> http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22 rollback
>>> function. It is supposed to rollback to the state at the previous commit. So
>>> you may want to turn off auto-commit on the index you are updating if you
>>> want to control what that last commit level is.
>>>
>>> However, in your case if the index gets corrupted due to a disk full
>>> situation, I don't know what rollback would do, if anything, to help. You
>>> may need to play with the scenario to see what would happen.
>>>
>>> If you are using the DataImportHandler it may handle the rollback for
>>> you...again, however, it may not deal with disk full situations gracefully
>>> either.
>>> --
>>> View this message in context: 
>>> http://lucene.472066.n3.nabble.com/index-format-error-because-disk-full-tp948249p948968.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>
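[Editor's note] For reference, the rollback linked above is issued as a plain XML update message posted to the update handler; host and path below are placeholders:

```xml
<!-- POST this body to the update handler, e.g.:
     curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
          --data-binary "<rollback/>"
     It discards all uncommitted changes since the last commit. -->
<rollback/>
```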


Re: solr connection question

2010-07-09 Thread Leonardo Menezes
jarrlll

On Fri, Jul 9, 2010 at 10:20 AM, Óscar Marín Miró
wrote:

> xD
>
> On Thu, Jul 8, 2010 at 2:58 PM, Alejandro Gonzalez
>  wrote:
> > ok please don't forget it :)
> >
> > 2010/7/8 Ruben Abad 
> >
> >> Jorl, ok, I'll have to modify my vacation request :(
> >> Rubén Abad 
> >>
> >>
> >> On Thu, Jul 8, 2010 at 2:46 PM, ZAROGKIKAS,GIORGOS <
> >> g.zarogki...@multirama.gr> wrote:
> >>
> >> > Hi Solr users,
> >> >
> >> > I need to know how Solr manages connections when we make a
> >> > request (select, update, commit).
> >> > Is there any connection pooling, or an article to learn about its
> >> > connection management?
> >> > How can I log the Solr server's connections to a file?
> >> >
> >> > I have set up my Solr 1.4 with Tomcat.
> >> >
> >> > Thanks in advance
> >> >
> >> >
> >> >
> >> >
> >>
> >
>
>
>
> --
> Whether it's science, technology, personal experience, true love,
> astrology, or gut feelings, each of us has confidence in something
> that we will never fully comprehend.
>  --Roy H. William
>


Re: solr connection question

2010-07-09 Thread Óscar Marín Miró
xD

On Thu, Jul 8, 2010 at 2:58 PM, Alejandro Gonzalez
 wrote:
> ok please don't forget it :)
>
> 2010/7/8 Ruben Abad 
>
>> Jorl, ok, I'll have to modify my vacation request :(
>> Rubén Abad 
>>
>>
>> On Thu, Jul 8, 2010 at 2:46 PM, ZAROGKIKAS,GIORGOS <
>> g.zarogki...@multirama.gr> wrote:
>>
>> > Hi Solr users,
>> >
>> > I need to know how Solr manages connections when we make a
>> > request (select, update, commit).
>> > Is there any connection pooling, or an article to learn about its
>> > connection management?
>> > How can I log the Solr server's connections to a file?
>> >
>> > I have set up my Solr 1.4 with Tomcat.
>> >
>> > Thanks in advance
>> >
>> >
>> >
>> >
>>
>



-- 
Whether it's science, technology, personal experience, true love,
astrology, or gut feelings, each of us has confidence in something
that we will never fully comprehend.
 --Roy H. William


Job offer / Oferta de trabajo - Madrid, Spain

2010-07-09 Thread Leonardo Menezes
Hello,
 I'm not sure if I should really send this kind of thing to the list, but
since I guess it's only positive and someone might be interested... The
company I work at is looking for people with experience with Solr/Lucene.
Below, the offer:

http://www.infojobs.net/pozuelo-de-alarcon/programador-solr/of-icbca57230549aab73e4b484023657f


cheers,



Hello,
 I'm not sure I should send this kind of thing to the list, but
since I imagine it might be useful to someone... The company I
work at is looking for people with experience in Solr/Lucene. The offer
follows below:

http://www.infojobs.net/pozuelo-de-alarcon/programador-solr/of-icbca57230549aab73e4b484023657f


regards,


Leonardo Menezes


Re: Using hl.regex.pattern to print complete lines

2010-07-09 Thread Peter Spam
Ah, this makes sense.  I've changed my regex to "(?m)^.*$", and it works 
better, but I still get fragments before and after some returns.
Thanks for the hint!


-Pete

On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote:

> 
> : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
> : is available that is for getting entire field contents with search terms
> : highlighted. To use it, set hl.useFastVectorHighlighter to true.
> 
> He doesn't want the entire field -- his stored field values contain 
> multi-line strings (using newline characters) and he wants to make 
> fragments per "line" (ie: bounded by newline characters, or the start/end 
> of the entire field value)
> 
> Peter: i haven't looked at the code, but i expect that the problem is that 
> the java regex engine isn't being used in a way that makes ^ and $ match 
> any line boundary -- they are probably only matching the start/end of the 
> field (and . is probably only matching non-newline characters)
> 
> java regexes support embedded flags (ie: "(?xyz)your regex") so you might 
> try that (i don't remember what the correct modifier flag is for the 
> multiline mode off the top of my head)
> 
> -Hoss
>
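[Editor's note] The embedded flag in question is (?m) (MULTILINE), whose semantics in Java and Python regexes match here. A small demonstration of why it makes ^.*$ yield one fragment per line — illustrative Python, not Solr's highlighter code:

```python
import re

text = "first line\nsecond line\nthird"

# Without (?m), ^ and $ only anchor at the start and end of the whole
# string, and . never crosses a newline -- so ^.*$ cannot match at all
# in a multi-line string (no line reaches from ^ to $).
print(re.findall(r"^.*$", text))      # []

# With (?m), ^ and $ match at every line boundary, producing exactly
# one match per line -- the per-line fragments wanted for highlighting.
print(re.findall(r"(?m)^.*$", text))  # ['first line', 'second line', 'third']
```

This matches Peter's observation that switching to "(?m)^.*$" made the regex fragmenter split on lines.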