DIH stopping an action

2009-02-03 Thread Marc Sturlese

Hey there,
I would like to know if there is any way to stop a delta-import or a
full-import in the middle of the execution and free Tomcat's memory.
If not... is there any way to tell Solr to stop all actions and free
all the memory it is using?
Is it possible to do either of these without restarting Tomcat?
Thanks in advance.
-- 
View this message in context: 
http://www.nabble.com/DIH-stopping-an-action-tp21805669p21805669.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH stopping an action

2009-02-03 Thread Shalin Shekhar Mangar
On Tue, Feb 3, 2009 at 2:01 PM, Marc Sturlese marc.sturl...@gmail.com wrote:


 Hey there,
 I would like to know if there is any way to stop a delta-import or a
 full-import in the middle of the execution and free Tomcat's memory.


There is an 'abort' command for DIH which should do what you want. Most of
the DIH-related objects should go out of scope once the import is aborted.
Then it is up to the garbage collector to free the memory.
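As a minimal sketch (the host, port, and handler path /dataimport here are assumptions; adjust them to however DIH is registered in your solrconfig.xml), the abort is just another command parameter on the handler URL:

```python
from urllib.parse import urlencode

# Hypothetical DIH endpoint; adjust host/port/handler to your deployment.
SOLR_DIH = "http://localhost:8983/solr/dataimport"

def dih_url(command):
    # Builds a DataImportHandler command URL, e.g. for
    # full-import, delta-import, status, or abort.
    return SOLR_DIH + "?" + urlencode({"command": command})

print(dih_url("abort"))
# http://localhost:8983/solr/dataimport?command=abort
```

Fetching that URL (with urllib.request.urlopen, curl, or a browser) is what triggers the abort.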

-- 
Regards,
Shalin Shekhar Mangar.


Re: DIH stopping an action

2009-02-03 Thread Marc Sturlese

Thanks, that's exactly what I need.

Shalin Shekhar Mangar wrote:
 
 On Tue, Feb 3, 2009 at 2:01 PM, Marc Sturlese
 marc.sturl...@gmail.com wrote:
 

 Hey there,
 I would like to know if there is any way to stop a delta-import or a
 full-import in the middle of the execution and free Tomcat's memory.
 
 
 There is an 'abort' command for DIH which should do what you want. Most of
 the DIH-related objects should go out of scope once the import is aborted.
 Then it is up to the garbage collector to free the memory.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/DIH-stopping-an-action-tp21805669p21805823.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH stopping an action

2009-02-03 Thread Marc Sturlese

I just opened an issue explaining my solution:
https://issues.apache.org/jira/browse/SOLR-1004


Shalin Shekhar Mangar wrote:
 
 On Tue, Feb 3, 2009 at 4:06 PM, Marc Sturlese
 marc.sturl...@gmail.com wrote:
 

 Doing that, once a doc is aborted in DocBuilder, it will not keep checking
 all the other docs and the abort will finish sooner.
 I think it could be done in the function deleteAll(deletedKeys) as well, in
 case the amount of docs to delete is huge.

 Does doing that have any bad consequence? If not... do you think it would
 be useful to add it to DataImportHandler for other use cases?

 
 You are right, Marc. We should be getting out of that loop (in buildDocument
 as well as in collectDelta) if abort is called. Can you please raise an
 issue in JIRA?
 
 --
 View this message in context:
 http://www.nabble.com/DIH-stopping-an-action-tp21805669p21807365.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/DIH-stopping-an-action---possible-improvement-tp21805669p21808760.html
Sent from the Solr - User mailing list archive at Nabble.com.



Problem with setting solr.solr.home property

2009-02-03 Thread Manupriya

Hi,

Until now I was working with the Jetty server bundled with the Solr
distribution. But I want to deploy solr.war to another Jetty server. Here I
am facing some problems with solr/home. Whenever I start the Jetty server, I
get the following error - 

2009-02-03 17:45:48.900::INFO:  Extract
jar:file:/C:/jetty-6.1.3/jetty-6.1.3/webapps/solr.war!/ to C:\DOCUME~1\MANUP
0_8080_solr.war__solr__7k9npr\webapp
Feb 3, 2009 5:45:53 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Feb 3, 2009 5:45:53 PM org.apache.solr.core.SolrResourceLoader
locateInstanceDir
INFO: No /solr/home in JNDI
Feb 3, 2009 5:45:53 PM org.apache.solr.core.SolrResourceLoader
locateInstanceDir
INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)
Feb 3, 2009 5:45:53 PM org.apache.solr.core.CoreContainer$Initializer
initialize
INFO: looking for solr.xml: C:\jetty-6.1.3\jetty-6.1.3\solr\solr.xml
Feb 3, 2009 5:45:53 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr/'
Feb 3, 2009 5:45:53 PM org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Reusing parent classloader
Feb 3, 2009 5:45:53 PM org.apache.solr.core.SolrResourceLoader
locateInstanceDir
INFO: No /solr/home in JNDI
Feb 3, 2009 5:45:53 PM org.apache.solr.core.SolrResourceLoader
locateInstanceDir
INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)
Feb 3, 2009 5:45:53 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr/'
Feb 3, 2009 5:45:53 PM org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Reusing parent classloader
Feb 3, 2009 5:45:53 PM org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start SOLR. Check solr/home property
java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in
classpath or 'solr/conf/', cwd=C:\jetty-6.1.3\je
at
org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:194)
at
org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:162)
at org.apache.solr.core.Config.init(Config.java:100)
at org.apache.solr.core.SolrConfig.init(SolrConfig.java:113)
at org.apache.solr.core.SolrConfig.init(SolrConfig.java:70)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
at
org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
at
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
at
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
at
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
at org.mortbay.jetty.Server.doStart(Server.java:210)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.mortbay.start.Main.invokeMain(Main.java:183)
at org.mortbay.start.Main.start(Main.java:497)
at org.mortbay.start.Main.main(Main.java:115)
Feb 3, 2009 5:45:53 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in
classpath or 'solr/conf/', cwd=C:\jetty-
at
org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:194)
at
org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:162)
at org.apache.solr.core.Config.init(Config.java:100)
at org.apache.solr.core.SolrConfig.init(SolrConfig.java:113)
at org.apache.solr.core.SolrConfig.init(SolrConfig.java:70)
at

Re: Problem with setting solr.solr.home property

2009-02-03 Thread Kraus, Ralf | pixelhouse GmbH

Manupriya wrote:

Hi,

Until now I was working with the Jetty server bundled with the Solr
distribution. But I want to deploy solr.war to another Jetty server. Here I
am facing some problems with solr/home. Whenever I start the Jetty server, I

Try extracting the solr.war and editing its web.xml!

Greets -Ralf-



Re: Problem with setting solr.solr.home property

2009-02-03 Thread Manupriya

Thanks Ralf,

Yeah... I can add the system property through web.xml. But as I am deploying
my application to a production environment, I don't want to make changes to
web.xml. :confused:


Kraus, Ralf | pixelhouse GmbH wrote:
 
 Manupriya wrote:
 Hi,

 Till now I was working with the jetty server bundled with the SOLR
 distribution. But I want to deploy solr.war to another jetty server. Here
 I
 am facing some problem with solr/home. Whenever I start the jetty server,
 I
 try to extract the solr.war and edit the web.xml !
 
 Greets -Ralf-
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Problem-with-setting-solr.solr.home-property-tp21808987p21809093.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH stopping an action

2009-02-03 Thread Shalin Shekhar Mangar
On Tue, Feb 3, 2009 at 4:06 PM, Marc Sturlese marc.sturl...@gmail.com wrote:


 Doing that, once a doc is aborted in DocBuilder, it will not keep checking
 all the other docs and the abort will finish sooner.
 I think it could be done in the function deleteAll(deletedKeys) as well, in
 case the amount of docs to delete is huge.

 Does doing that have any bad consequence? If not... do you think it would be
 useful to add it to DataImportHandler for other use cases?


You are right, Marc. We should be getting out of that loop (in buildDocument
as well as in collectDelta) if abort is called. Can you please raise an
issue in JIRA?

--
 View this message in context:
 http://www.nabble.com/DIH-stopping-an-action-tp21805669p21807365.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: field range (min and max term)

2009-02-03 Thread Otis Gospodnetic
Hi Ben,

Look at this: http://wiki.apache.org/solr/StatsComponent
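A StatsComponent request might look like this sketch (the host/port and the field name date_field are made-up examples; the stats section of the response reports min and max for the field, among other values):

```python
from urllib.parse import urlencode

# Hypothetical endpoint and field; stats.field asks Solr to compute
# min/max (and other statistics) over that field's values.
params = {"q": "*:*", "rows": 0, "stats": "true", "stats.field": "date_field"}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```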


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Ben Incani ben.inc...@datacomit.com.au
 To: solr-user@lucene.apache.org
 Sent: Tuesday, February 3, 2009 1:52:05 AM
 Subject: field range (min and max term)
 
 Hi Solr users,
 
 Is there a method of retrieving a field range, i.e. the min and max
 values of that field's term enum?
 
 For example I would like to know the first and last date entry of N
 documents.
 
 Regards,
 
 -Ben



Total count of facets

2009-02-03 Thread Bruno Aranda
Hi,

I would like to know if there is a way to get the total number of different
facets returned by a faceted search? I see that I can paginate through the
facets with facet.offset and facet.limit, but is there a way to know how
many facets are found in total?

For instance,

Name   Surname

Peter  Smith
John   Smith
Anne   Baker
Mary   York
... 1 million records more with 100.000 distinct surnames

For instance, now I search for people with names starting with A, and I
retrieve 5000 results. I would like to know the distinct number of surnames
(facets) for the result set if possible, so I could show in my app something
like this:

5000 people found with 1440 distinct surnames.

Any ideas? Is this possible to implement? Any pointers would be greatly
appreciated,

Thanks!

Bruno


Re: Total count of facets

2009-02-03 Thread Markus Jelsma - Buyways B.V.
Hello,


Searching for ?q=*:* with faceting turned on gives me the total number
of available constraints, if that is what you mean.
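For reference, such a request might be composed like this (a sketch with made-up host/port and field name; in Solr, facet.limit=-1 means no cap on the number of returned constraints, which can be expensive on high-cardinality fields):

```python
from urllib.parse import urlencode

# Hypothetical endpoint and field; facet.limit=-1 lifts the default
# cap (100) so every constraint for the field is returned.
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "surname",
    "facet.limit": -1,
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```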


Cheers,



On Tue, 2009-02-03 at 16:03 +, Bruno Aranda wrote:

 Hi,
 
 I would like to know if there is a way to get the total number of different
 facets returned by a faceted search? I see that I can paginate through the
 facets with facet.offset and facet.limit, but is there a way to know how
 many facets are found in total?
 
 For instance,
 
 Name   Surname

 Peter  Smith
 John   Smith
 Anne   Baker
 Mary   York
 ... 1 million records more with 100.000 distinct surnames
 
 For instance, now I search for people with names starting with A, and I
 retrieve 5000 results. I would like to know the distinct number of surnames
 (facets) for the result set if possible, so I could show in my app something
 like this:
 
 5000 people found with 1440 distinct surnames.
 
 Any ideas? Is this possible to implement? Any pointers would be greatly
 appreciated,
 
 Thanks!
 
 Bruno


Re: Performance dead-zone due to garbage collection

2009-02-03 Thread wojtekpia

I noticed your wiki post about sorting with a function query instead of the
Lucene sort mechanism. Did you see a significantly reduced memory footprint
by doing this? Did you reduce the number of fields you allowed users to sort
by?


Lance Norskog-2 wrote:
 
 Sorting creates a large array with roughly an entry for every document in
 the index. If it is not on an 'integer' field it takes even more memory. If
 you do a sorted request and then don't sort for a while, that will drop the
 sort structures and trigger a giant GC.

 We went through some serious craziness with sorting.
 

-- 
View this message in context: 
http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21814038.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Performance dead-zone due to garbage collection

2009-02-03 Thread Yonik Seeley
On Tue, Feb 3, 2009 at 11:58 AM, wojtekpia wojte...@hotmail.com wrote:
 I noticed your wiki post about sorting with a function query instead of the
 Lucene sort mechanism. Did you see a significantly reduced memory footprint
 by doing this?

FunctionQuery derives field values from the FieldCache... so it would
use the same amount of memory as sorting.

-Yonik


Re: Total count of facets

2009-02-03 Thread Bruno Aranda
But as far as I understand, the total number of constraints is limited (there
is a default value), so I cannot know the total unless I set the facet.limit
to a really big number, and then the request takes a long time. I was
wondering if there is a way to get the total number (e.g. 100.000
constraints) to show it to the user, and then paginate using facet.offset
and facet.limit until I reach that total.
Does this make sense?

Thanks!

Bruno

2009/2/3 Markus Jelsma - Buyways B.V. mar...@buyways.nl

 Hello,


 Searching for ?q=*:* with faceting turned on gives me the total number
 of available constraints, if that is what you mean.


 Cheers,



 On Tue, 2009-02-03 at 16:03 +, Bruno Aranda wrote:

  Hi,
 
  I would like to know if there is a way to get the total number of
  different facets returned by a faceted search? I see that I can paginate
  through the facets with facet.offset and facet.limit, but is there a way
  to know how many facets are found in total?
 
  For instance,
 
  Name   Surname

  Peter  Smith
  John   Smith
  Anne   Baker
  Mary   York
  ... 1 million records more with 100.000 distinct surnames
 
  For instance, now I search for people with names starting with A, and I
  retrieve 5000 results. I would like to know the distinct number of
 surnames
  (facets) for the result set if possible, so I could show in my app
 something
  like this:
 
  5000 people found with 1440 distinct surnames.
 
  Any ideas? Is this possible to implement? Any pointers would be greatly
  appreciated,
 
  Thanks!
 
  Bruno



Re: DIH stopping an action

2009-02-03 Thread Marc Sturlese

Hey Shalin,
I have been testing the abort command and for full-import there's no problem.
For delta-import, in DocBuilder.java I have seen it checks for
 if (stop.get())
before executing deleteAll and inside collectDelta (in the doDelta function).
The problem is that once you have the Set<Map<String, Object>> with all the
data to modify, it will only check for if (stop.get()) inside the function
buildDocument. In my case, I have 300.000 docs to modify, so, as
buildDocument in doDelta is called inside a while loop, it will pass through
all 300.000 of them while aborting. What I have done is check for abortion
inside the while loop:

while (pkIter.hasNext()) {
  Map<String, Object> map = pkIter.next();
  vri.addNamespace(DataConfig.IMPORTER_NS + ".delta", map);
  buildDocument(vri, null, map, root, true, null, true);
  pkIter.remove();

  // #patch: check for abortion
  if (stop.get()) { return; }
}
This part of the code is from doDelta in DocBuilder.
Doing that, once a doc is aborted in DocBuilder, it will not keep checking
all the other docs and the abort will finish sooner.
I think it could be done in the function deleteAll(deletedKeys) as well, in
case the amount of docs to delete is huge.

Does doing that have any bad consequence? If not... do you think it would be
useful to add it to DataImportHandler for other use cases?





Marc Sturlese wrote:
 
 Thanks, that's exactly what I need.
 
 Shalin Shekhar Mangar wrote:
 
 On Tue, Feb 3, 2009 at 2:01 PM, Marc Sturlese
 marc.sturl...@gmail.com wrote:
 

 Hey there,
 I would like to know if there is any way to stop a delta-import or a
 full-import in the middle of the execution and free Tomcat's memory.
 
 
 There is an 'abort' command for DIH which should do what you want. Most of
 the DIH-related objects should go out of scope once the import is aborted.
 Then it is up to the garbage collector to free the memory.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/DIH-stopping-an-action-tp21805669p21807365.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: newbie question --- multiple schemas

2009-02-03 Thread Chris Hostetter

: Is it possible to define more than one schema? I'm reading the example 
: schema.xml. It seems that we can only define one schema? What about if I 
: want to define one schema for document type A and another schema for 
: document type B?

there are lots of ways to tackle a problem like this, depending on your 
specific needs, some starting points are...

http://wiki.apache.org/solr/MultipleIndexes



-Hoss



Re: DIH, assigning multiple xpaths to the same solr field

2009-02-03 Thread Shalin Shekhar Mangar
On Wed, Feb 4, 2009 at 1:35 AM, Fergus McMenemie fer...@twig.me.uk wrote:

   <entity name="x"
           dataSource="myfilereader"
           processor="XPathEntityProcessor"
           url="${jc.fileAbsolutePath}"
           stream="false"
           forEach="/record">
     <field column="para" xpath="/record/sect1/para" />
     <field column="para" xpath="/record/list/listitem/para" />
     <field column="para" xpath="/a/b/c/para" />
     <field column="para" xpath="/d/e/f/g/para" />

 Below is the line from my schema.xml

   <field name="para" type="text" indexed="true" stored="true"
          multiValued="true"/>

 Now a given document will only have one style of layout, and of course
 the /a/b/c and /d/e/f/g stuff is made up. For a document that has a single
 <para>Hello world</para> element I see search results as follows; the
 one para string seems to have been entered into the index four times.
 I only saw duplicate results before adding the extra made-up stuff.


I think there is something fishy with the XPathEntityProcessor. For now, I
think you can work around it by giving each field a different 'column' and
the attribute name="para" on each of them.
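In data-config.xml, that workaround might look roughly like this (a sketch based on the suggestion above; the distinct column values are arbitrary):

```xml
<!-- Each field gets a unique column, but all map to the same
     Solr field via name="para" (sketch; untested). -->
<field column="para1" name="para" xpath="/record/sect1/para" />
<field column="para2" name="para" xpath="/record/list/listitem/para" />
<field column="para3" name="para" xpath="/a/b/c/para" />
<field column="para4" name="para" xpath="/d/e/f/g/para" />
```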

-- 
Regards,
Shalin Shekhar Mangar.


Re: Query Performance while updating the index

2009-02-03 Thread Chris Hostetter

: Just to clarify - we do not optimize on the slaves at all. We only optimize
: on the master.

that doesn't change anything about the comments that i made before.  it 
*really* wouldn't make sense to optimize on a slave right before pulling a 
new snapshot, but it still doesn't make any more sense to optimize on a 
master right before doing some updates and then pulling a new snapshot.  
my second comment also still applies: a snappull after an optimize is 
always going to involve more churn on the disk...

:  : We do optimize the index before updates but we get these performance
:  issues
:  : even when we pull an empty snapshot. Thus even when our update is tiny,
:  the
:  : performance issues still happen.
:  
:  FWIW: this behavior doesn't make a lot of sense -- optimizing just 
:  before you are about to make updates/additions to your data is a complete 
:  waste.  the main value in optimizing your index is that you have one 
:  segment; as soon as you add a document that changes.
:  
:  the other thing to keep in mind is that an optimized index is a completely 
:  new segment as a new file with a new name, so there is going to be added 
:  overhead on the slave machines as the OS purges the old index files and 
:  replaces them with the new optimized index files -- more overhead than if 
:  you had just done your additions w/o optimizing first.



-Hoss



Re: Recent document boosting with dismax

2009-02-03 Thread Chris Hostetter

: Hi, no the data_added field was one per document.

i would suggest adding multiValued="false" to your date fieldType so 
that Solr can enforce that for you -- otherwise we can't be 100% sure.

if it really is only a single valued field, then i suspect you're right 
about the index corruption being the source of your problem, but it's 
not necessarily a permanent problem.  try optimizing your index; that 
should merge all the segments and purge any terms that aren't actually 
part of live documents (i think) ... if that doesn't work, rebuilding will 
be your best bet (and with that, multiValued="false" will error if you 
are inadvertently sending multiple values per document)
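Triggering that optimize over HTTP might look like this sketch (host/port are assumptions; an <optimize/> message is POSTed to the update handler):

```python
import urllib.request

# Hypothetical endpoint; posting <optimize/> asks Solr to merge the
# index down to one segment, purging deleted documents' terms.
req = urllib.request.Request(
    "http://localhost:8983/solr/update",
    data=b"<optimize/>",
    headers={"Content-Type": "text/xml"},
)
# urllib.request.urlopen(req)  # uncomment to run against a live Solr
print(req.get_method(), req.full_url)
```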

:  I'm having lots of other problems (un-related) with corrupt indices -
:  could
:  it be that in running the org.apache.lucene.index.CheckIndex utility, and
:  losing some documents in the process, the ordinal part of my boost
:  function
:  is permanently broken?



-Hoss



Re: Should I extend DIH to handle POST too?

2009-02-03 Thread Chris Hostetter

: I guess I got the wrong impression initially.  These classes extend the
: RequestHandlerBase.

your confusion is totally understandable, and stems from a confusing 
legacy naming convention.  there is an UpdateHandler API which 
is a low-level API for dictating how changes are made to the underlying 
IndexWriter -- there is *NO* reason for anyone to ever do anything 
with this API (in my opinion)

there is also a SolrRequestHandler which dictates how Solr deals with 
external requests, and what kind of input parsing it does.  Some of these 
Request Handlers are designed for making Updates and many people (who 
aren't even aware of the UpdateHandler API mentioned above) informally 
refer to them as Update Handlers ... hence a lot of confusion.

http://wiki.apache.org/solr/SolrPlugins



-Hoss



RE: Unsubscribing

2009-02-03 Thread Ross MacKinnon
Nothing in the Junk folder, but that reminded me that our company is using a 
3rd party spam filter (i.e., Lanlogic)... which sure enough had snagged the 
confirmation emails.  Since the list emails were going through I never thought 
to check the filtering systems.  Thanks for jogging my memory. :-)
 
Ross



From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Tue 2/3/2009 2:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Unsubscribing




: Subject: Unsubscribing
:
: I've tried multiple times to unsubscribe from this list using the proper
: method (mailto:solr-user-unsubscr...@lucene.apache.org), but it's not
: working!  Can anyone help with that?

Did you get a confirmation email from the mailing list software asking you
to verify that you really wanted to unsubscribe?  (is it in a Junk Mail or
Spam folder that you didn't think to check?) did you reply to it
according to the instructions?

see also...

http://www.nabble.com/Re%3A-PLEASE-REMOVE-ME-FROM-THIS-EMAIL-LIST!-p10879673.html



-Hoss




Re: Unsubscribing

2009-02-03 Thread Chris Hostetter

: Subject: Unsubscribing
: 
: I've tried multiple times to unsubscribe from this list using the proper 
: method (mailto:solr-user-unsubscr...@lucene.apache.org), but it's not 
: working!  Can anyone help with that?

Did you get a confirmation email from the mailing list software asking you 
to verify that you really wanted to unsubscribe?  (is it in a Junk Mail or 
Spam folder that you didn't think to check?) did you reply to it 
according to the instructions?

see also...

http://www.nabble.com/Re%3A-PLEASE-REMOVE-ME-FROM-THIS-EMAIL-LIST!-p10879673.html



-Hoss



Re: ranged query on multivalued field doesnt seem to work

2009-02-03 Thread Chris Hostetter

: I am still struggling with this... but I guess would it be because for some
: data there are maximum integer values for the fields start_year and
: end_year, like 2.14748365E9, which solr does not recognise as sfloat,
: because there is an E letter? 

when you say you are using sfloat, that fieldType is using the 
SortableFloatField class, correct?

SortableFloatField uses Float.parseFloat to get the float value from your 
input string; if that fails it will throw an exception -- so you should 
have gotten an error if the value was unparsable ... i'm not sure what 
might be causing your problem.
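As a quick sanity check, scientific notation like "2.14748365E9" is valid input to Java's Float.parseFloat, so the 'E' by itself should not make the value unparsable (Python's float() accepts the same syntax):

```python
# "2.14748365E9" is ordinary scientific notation; both Java's
# Float.parseFloat and Python's float() parse it without error.
value = float("2.14748365E9")
print(value)  # 2147483650.0
```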




-Hoss



Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-03 Thread Chris Hostetter

:  The solr data field is populated properly. So I guess that bit works.
:  I really wish I could use xpath=//para

: The limitation comes from streaming the XML instead of creating a DOM.
: XPathRecordReader is a custom streaming XPath parser implementation and
: streaming is easy only because we limit the syntax. You can use
: PlainTextEntityProcessor which gives the XML as a string to a  custom
: Transformer. This Transformer can create a DOM, run your XPath query and
: populate the fields. It's more expensive but it is an option.

Maybe it's just me, but it seems like i'm noticing that as DIH gets used 
more, many people are noting that the XPath processing in DIH doesn't work 
the way they expect because it's a custom XPath parser/engine designed for 
streaming.  

It seems like it would be helpful to have an alternate processor for 
people who don't need the streaming support (ie: are dealing with small 
enough docs that they can load the full DOM tree into memory) that would 
use the default Java XPath engine (and have fewer caveats/surprises) ... i 
would think it would probably even make sense for this new XPath processor 
to be the one we suggest for new users, and only suggest the existing 
(stream-based) processor if they have really big xml docs to deal with.

(In hindsight XPathEntityProcessor and XPathRecordReader should probably 
have been named StreamingXPathEntityProcessor and 
StreamingXPathRecordReader)

thoughts?


-Hoss



Re: MASTER / SLAVES numdoc

2009-02-03 Thread Chris Hostetter

: I've one server and several slaves and I would like to know if I go to the
: host.name/solr/admin/stat.jsp if there is a way to know the difference of
: the numDoc per server? 

i don't really understand your question -- sure you can go to that page 
on each server and compare the number of docs ... ok, now what?

what is your goal?

if i had to guess, i would suspect that this URL (on your master) might be 
of use to you...
http://localhost:8983/solr/admin/distributiondump.jsp
...but that's just a guess, and it only works if you are using the 
replication scripts (i'm not sure if DIH has a similar feature)


http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss



Re: Problem with setting solr.solr.home property

2009-02-03 Thread Chris Hostetter
: Till now I was working with the jetty server bundled with the SOLR
: distribution. But I want to deploy solr.war to another jetty server. Here I
: am facing some problem with solr/home. Whenever I start the jetty server, I
: get the following error - 
...
: INFO: solr home defaulted to 'solr/' (could not find system property or
: JNDI)
...
: SEVERE: Could not start SOLR. Check solr/home property
: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in
: classpath or 'solr/conf/', cwd=C:\jetty-6.1.3\je
...

: I have tried following options - 
: 
: 1. Added system property on windows as 'solr.solr.home'. I am able to get
: its value when I check through command prompt.
: http://www.nabble.com/file/p21808987/cmd.gif 

i don't know much about windows, but i don't think that's the same thing 
as a java system property (that looks like an environment variable to me)

: 2. I also tried adding vm argument through command prompt as follows - 
: set
: 
JAVA_OPTS=-Dsolr.solr.home=C:\SOLR\apache-solr-1.3.0\apache-solr-1.3.0\example\solr
: 
: But in all the case, I am getting the above exception.

what about the INFO line i quoted above (solr home defaulted to 
'solr/'...) -- are you seeing that line even when you modify the JAVA_OPTS 
this way?  (i'm wondering if perhaps you are setting the system property 
but maybe the quotes or formatting or something is confusing it when trying 
to find that directory) ... it would be helpful to see the *exact* logs 
and error messages you get when trying the JAVA_OPTS method ... i'm 
suspicious that maybe it's a slightly different error.

: 3. I tried to retrieve the System property through java code (It is the
: similar code that is triggered by Solr, SolrResourceLoader.java
: locateInstanceDir() method). I get the value of system property in the code.

your code looks right, but i don't understand exactly what you're saying 
-- do you in fact see the path in your logging output?  if so then i'm 
more confident it's a problem with formatting the path correctly so java 
understands it.

FYI: in my opinion the best way to set solr home is using JNDI, but you 
didn't mention trying that...
http://wiki.apache.org/solr/SolrJetty
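For Jetty 6, the JNDI route might look roughly like this in the webapp's context configuration (a sketch following the SolrJetty wiki pattern; the path is a placeholder and jetty-plus must be available):

```xml
<Configure class="org.mortbay.jetty.webapp.WebAppContext">
  <!-- Binds java:comp/env/solr/home for the webapp; the boolean arg
       says this value overrides any web.xml env-entry. -->
  <New id="solrHome" class="org.mortbay.jetty.plus.naming.EnvEntry">
    <Arg>solr/home</Arg>
    <Arg type="java.lang.String">C:/solr/home</Arg>
    <Arg type="boolean">true</Arg>
  </New>
</Configure>
```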


-Hoss



RE: Problem with setting solr.solr.home property

2009-02-03 Thread Nicholas Piasecki
For what it's worth, I bumped into
http://jira.codehaus.org/browse/JETTY-854 on a recent Jetty installation
when trying to set up Solr for a test run, so setting via JNDI may end
up causing even more heartburn. I ended up just using Tomcat.

V/R,
Nicholas Piasecki

Software Developer
Skiviez, Inc.
n...@skiviez.com
804-550-9406

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Tuesday, February 03, 2009 8:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with setting solr.solr.home property

: Till now I was working with the jetty server bundled with the SOLR
: distribution. But I want to deploy solr.war to another jetty server.
Here I
: am facing some problem with solr/home. Whenever I start the jetty
server, I
: get the following error - 
...
: INFO: solr home defaulted to 'solr/' (could not find system property
or
: JNDI)
...
: SEVERE: Could not start SOLR. Check solr/home property
: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in
: classpath or 'solr/conf/', cwd=C:\jetty-6.1.3\je
...

: I have tried following options - 
: 
: 1. Added system property on windows as 'solr.solr.home'. I am able to
get
: its value when I check through command prompt.
: http://www.nabble.com/file/p21808987/cmd.gif 

i don't know much about windows, but i don't  think that's the same
thing 
as a java system property (that looks like an enviornment variable to
me)

: 2. I also tried adding vm argument through command prompt as follows -

: set
:
JAVA_OPTS=-Dsolr.solr.home=C:\SOLR\apache-solr-1.3.0\apache-solr-1.3.0\
example\solr
: 
: But in all the case, I am getting the above exception.

what about the INFO line i quoted above (solr home defaulted to
'solr/'...) are you seeing that line even when you modify the JAVA_OPTS
this way?  (i'm wondering if perhaps you are setting the system property
but maybe the quotes or formatting or something is confusing it when
trying to find that directory) ... it would be helpful to see the *exact* logs
and error messages you get when trying the JAVA_OPTS method ... i'm
suspicious that maybe it's a slightly different error.

: 3. I tried to retrieve the System property through java code (It is
the
: similar code that is triggered by Solr, SolrResourceLoader.java
: locateInstanceDir() method). I get the value of system property in the
code.

your code looks right, but i don't understand exactly what you're saying

-- do you in fact see the path in your logging output?  if so then i'm 
more confident it's a problem with formatting the path correctly so java
understands it.

FYI: in my opinion the best way to set solr home is using JNDI, but you 
didn't mention trying that...
http://wiki.apache.org/solr/SolrJetty


-Hoss
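For reference, the JNDI approach Hoss recommends amounts to a small fragment in Jetty's context configuration. The following is a sketch in the spirit of the SolrJetty wiki page; the `org.mortbay` class name is Jetty 6 specific and the path is an example placeholder:

```xml
<!-- Jetty 6 context XML fragment: bind solr/home via JNDI.
     /opt/solr/home is an example path, not a required location. -->
<Configure class="org.mortbay.jetty.webapp.WebAppContext">
  <New id="solrHome" class="org.mortbay.jetty.plus.naming.EnvEntry">
    <Arg>solr/home</Arg>
    <Arg type="java.lang.String">/opt/solr/home</Arg>
  </New>
</Configure>
```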



Re: Recent document boosting with dismax

2009-02-03 Thread James Brady
Great, thanks for that, Chris!

2009/2/3 Chris Hostetter hossman_luc...@fucit.org


 : Hi, no the data_added field was one per document.

 i would suggest adding multiValued="false" to your date fieldType so
 that Solr can enforce that for you -- otherwise we can't be 100% sure.

 if it really is only a single valued field, then i suspect you're right
 about the index corruption being the source of your problem, but it's
 not necessarily a permanent problem.  try optimizing your index, that
 should merge all the segments and purge any terms that aren't actually
 part of live documents (i think) ... if that doesn't work, rebuilding will
 be your best bet (and with that multiValued="false" will error if you
 are inadvertently sending multiple values per document)

 :  I'm having lots of other problems (un-related) with corrupt indices -
 :  could
 :  it be that in running the org.apache.lucene.index.CheckIndex utility,
 and
 :  losing some documents in the process, the ordinal part of my boost
 :  function
 :  is permanently broken?



 -Hoss




Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Feb 4, 2009 at 6:13 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 :  The solr data field is populated properly. So I guess that bit works.
 :  I really wish I could use xpath=//para

 : The limitation comes from streaming the XML instead of creating a DOM.
 : XPathRecordReader is a custom streaming XPath parser implementation and
 : streaming is easy only because we limit the syntax. You can use
 : PlainTextEntityProcessor which gives the XML as a string to a  custom
 : Transformer. This Transformer can create a DOM, run your XPath query and
 : populate the fields. It's more expensive but it is an option.

 Maybe it's just me, but it seems like i'm noticing that as DIH gets used
 more, many people are noting that the XPath processing in DIH doesn't work
 the way they expect because it's a custom XPath parser/engine designed for
 streaming.

 It seems like it would be helpful to have an alternate processor for
 people who don't need the streaming support (ie: are dealing with small
 enough docs that they can load the full DOM tree into memory) that would
 use the default Java XPath engine (and have fewer caveats/surprises) ... i
 would think it would probably even make sense for this new XPath processor
 to be the one we suggest for new users, and only suggest the existing
 (stream based) processor if they have really big xml docs to deal with.

I guess the current XPathEntityProcessor must be able to switch
between the streaming xpath (XPathRecordReader) and the default java
XPath engine.

I am just hoping that all the current syntax and semantics will be
applicable for the Java XPath engine. If not, we will need a new
EntityProcessor.

I also would like to explore if the current XPathRecordReader can
implement more XPath syntax with streaming.

The java XPath engine is not at all efficient for large-scale data processing.


 (In hindsight XPathEntityProcessor and XPathRecordReader should probably
 have been named StreamingXPathEntityProcessor and
 StreamingXPathRecordReader)


 thoughts?


 -Hoss





-- 
--Noble Paul
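As a sketch of the DOM-based alternative discussed above — this is not DIH code, just the stock javax.xml.xpath engine that such a non-streaming processor could delegate to, with hypothetical class and method names:

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathDomExample {
    /** Parse the whole document into a DOM, then evaluate an unrestricted
     *  XPath like //para -- exactly what the streaming reader cannot do. */
    static List<String> extractParas(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//para", doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractParas(
            "<record><sect1><para>hello</para></sect1></record>"));
    }
}
```

The trade-off is the one Noble names: building the full DOM is far more expensive than streaming, so this only suits documents small enough to hold in memory.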


Re: DIH, assigning multiple xpaths to the same solr field

2009-02-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
it is safe to use different column names as Shalin suggested. After
all, a row is a map with the column name as the key. If you map
multiple values to the same column they may overwrite each other. Use
explicit 'name' attributes.

On Wed, Feb 4, 2009 at 2:17 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Wed, Feb 4, 2009 at 1:35 AM, Fergus McMenemie fer...@twig.me.uk wrote:

   <entity name="x"
           dataSource="myfilereader"
           processor="XPathEntityProcessor"
           url="${jc.fileAbsolutePath}"
           stream="false"
           forEach="/record">
     <field column="para" xpath="/record/sect1/para" />
     <field column="para" xpath="/record/list/listitem/para" />
     <field column="para" xpath="/a/b/c/para" />
     <field column="para" xpath="/d/e/f/g/para" />

 Below is the line from my schema.xml

   <field name="para" type="text" indexed="true" stored="true"
          multiValued="true"/>

 Now a given document will only have one style of layout, and of course
 the /a/b/c and /d/e/f/g stuff is made up. For a document that has a single
 <para>Hello world</para> element I see search results as follows: the
 one para string seems to have been entered into the index four times.
 I only saw duplicate results before adding the extra made-up stuff.


 I think there is something fishy with the XPathEntityProcessor. For now, I
 think you can work around by giving each field a different 'column' and
 attribute 'name=para' on each of them.

 --
 Regards,
 Shalin Shekhar Mangar.




-- 
--Noble Paul
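Concretely, the workaround both replies describe looks like this — a hypothetical sketch based on Fergus's config, with distinct `column` names and an explicit `name="para"` mapping each of them onto the same multiValued schema field:

```xml
<entity name="x"
        dataSource="myfilereader"
        processor="XPathEntityProcessor"
        url="${jc.fileAbsolutePath}"
        stream="false"
        forEach="/record">
  <!-- distinct columns so values cannot overwrite each other;
       name="para" maps them all to the same Solr field -->
  <field column="para1" name="para" xpath="/record/sect1/para" />
  <field column="para2" name="para" xpath="/record/list/listitem/para" />
</entity>
```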


RE: Problem with setting solr.solr.home property

2009-02-03 Thread Manupriya

Thanks everyone!! Finally got a solution for this problem on Jetty Server. 

Instead of setting an environment variable like
JAVA_OPTS=-Dsolr.solr.home=C:\SOLR\apache-solr-1.3.0\apache-solr-1.3.0\example\solr,
we can pass the VM argument directly while starting the jetty server.

I am running jetty as follows - 
java -Dsolr.solr.home=PATH_TO_SOLR_HOME -jar start.jar

After this I am not getting any error. :-D

Thanks,
Manu


Nicholas Piasecki-2 wrote:
 
 For what it's worth, I bumped into
 http://jira.codehaus.org/browse/JETTY-854 on a recent Jetty installation
 when trying to set up Solr for a test run, so setting via JNDI may end
 up causing even more heartburn. I ended up just using Tomcat.
 
 V/R,
 Nicholas Piasecki
 
 Software Developer
 Skiviez, Inc.
 n...@skiviez.com
 804-550-9406
 
 -Original Message-
 From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
 Sent: Tuesday, February 03, 2009 8:31 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Problem with setting solr.solr.home property
 
 : Till now I was working with the jetty server bundled with the SOLR
 : distribution. But I want to deploy solr.war to another jetty server.
 Here I
 : am facing some problem with solr/home. Whenever I start the jetty
 server, I
 : get the following error - 
   ...
 : INFO: solr home defaulted to 'solr/' (could not find system property
 or
 : JNDI)
   ...
 : SEVERE: Could not start SOLR. Check solr/home property
 : java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in
 : classpath or 'solr/conf/', cwd=C:\jetty-6.1.3\je
   ...
 
 : I have tried following options - 
 : 
 : 1. Added system property on windows as 'solr.solr.home'. I am able to
 get
 : its value when I check through command prompt.
 : http://www.nabble.com/file/p21808987/cmd.gif 
 
 i don't know much about windows, but i don't think that's the same thing
 as a java system property (that looks like an environment variable to me)
 
 : 2. I also tried adding vm argument through command prompt as follows -
 
 : set
 :
 JAVA_OPTS=-Dsolr.solr.home=C:\SOLR\apache-solr-1.3.0\apache-solr-1.3.0\
 example\solr
 : 
 : But in all the case, I am getting the above exception.
 
 what about the INFO line i quoted above (solr home defaulted to
 'solr/'...) are you seeing that line even when you modify the JAVA_OPTS
 this way?  (i'm wondering if perhaps you are setting the system property
 but maybe the quotes or formatting or something is confusing it when
 trying to find that directory) ... it would be helpful to see the *exact* logs
 and error messages you get when trying the JAVA_OPTS method ... i'm
 suspicious that maybe it's a slightly different error.
 
 : 3. I tried to retrieve the System property through java code (It is
 the
 : similar code that is triggered by Solr, SolrResourceLoader.java
 : locateInstanceDir() method). I get the value of system property in the
 code.
 
 your code looks right, but i don't understand exactly what you're saying
 
 -- do you in fact see the path in your logging output?  if so then i'm 
 more confident it's a problem with formatting the path correctly so java
 understands it.
 
 FYI: in my opinion the best way to set solr home is using JNDI, but you 
 didn't mention trying that...
   http://wiki.apache.org/solr/SolrJetty
 
 
 -Hoss
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Problem-with-setting-solr.solr.home-property-tp21808987p21825052.html
Sent from the Solr - User mailing list archive at Nabble.com.
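The lookup discussed in this thread (SolrResourceLoader's locateInstanceDir) boils down to roughly the following — a simplified sketch, not the actual Solr code; the real version also checks JNDI before falling back to the default:

```java
public class SolrHomeLookup {
    /** Simplified sketch of Solr's solr-home resolution (JNDI lookup omitted). */
    static String locateSolrHome() {
        String home = System.getProperty("solr.solr.home");
        if (home == null) {
            // this fallback is what produces the
            // "solr home defaulted to 'solr/'" log line
            home = "solr/";
        }
        return home;
    }

    public static void main(String[] args) {
        System.out.println("solr home: " + locateSolrHome());
    }
}
```

This is why `java -Dsolr.solr.home=... -jar start.jar` works: the `-D` flag sets the system property in the exact JVM that runs Jetty, whereas an exported JAVA_OPTS string is only honored if something on the command line actually expands it.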



Re: WebLogic 10 Compatibility Issue - StackOverflowError

2009-02-03 Thread Ilan Rabinovitch
We believe that the filters/forward issue is likely something specific
to weblogic.  Specifically, other containers have filters disabled
on forward by default, whereas weblogic has them enabled.

We don't think the small modification we had to make to header.jsp is
weblogic specific.






On 1/30/09 8:15 AM, Feak, Todd wrote:

Are the issues you ran into due to non-standard code in Solr, or is there
some WebLogic inconsistency?

-Todd Feak

-Original Message-
From: news [mailto:n...@ger.gmane.org] On Behalf Of Ilan Rabinovitch
Sent: Friday, January 30, 2009 1:11 AM
To: solr-user@lucene.apache.org
Subject: Re: WebLogic 10 Compatibility Issue - StackOverflowError

I created a wiki page shortly after posting to the list:

http://wiki.apache.org/solr/SolrWeblogic

  From what we could tell Solr itself was fully functional, it was only
the admin tools that were failing.

Regards,
Ilan Rabinovitch

---
SCALE 7x: 2009 Southern California Linux Expo
Los Angeles, CA
http://www.socallinuxexpo.org


On 1/29/09 4:34 AM, Mark Miller wrote:

We should get this on the wiki.

- Mark


Ilan Rabinovitch wrote:

We were able to deploy Solr 1.3 on Weblogic 10.0 earlier today. Doing
so required two changes:

1) Creating a weblogic.xml file in solr.war's WEB-INF directory. The
weblogic.xml file is required to disable Solr's filter on FORWARD.

The contents of weblogic.xml should be:

<?xml version='1.0' encoding='UTF-8'?>
<weblogic-web-app
    xmlns="http://www.bea.com/ns/weblogic/90"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.bea.com/ns/weblogic/90
        http://www.bea.com/ns/weblogic/90/weblogic-web-app.xsd">

  <container-descriptor>
    <filter-dispatched-requests-enabled>false</filter-dispatched-requests-enabled>
  </container-descriptor>

</weblogic-web-app>


2) Remove the pageEncoding attribute from line 1 of solr/admin/header.jsp




On 1/17/09 2:02 PM, KSY wrote:

I hit a major roadblock while trying to get Solr 1.3 running on WebLogic
10.0.

A similar message was posted before
(http://www.nabble.com/Solr-1.3-stack-overflow-when-accessing-solr-admin-page-td20157873.html)
but it seems like it hasn't been resolved yet, so I'm re-posting here.

I am sure I configured everything correctly because it's working fine on
Resin.

Has anyone successfully run Solr 1.3 on WebLogic 10.0 or higher?

Thanks.


SUMMARY:

When accessing /solr/admin page, StackOverflowError occurs due to an
infinite recursion in SolrDispatchFilter


ENVIRONMENT SETTING:

Solr 1.3.0
WebLogic 10.0
JRockit JVM 1.5


ERROR MESSAGE:

SEVERE: javax.servlet.ServletException: java.lang.StackOverflowError
at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:276)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)















--
Ilan Rabinovitch
i...@fonz.net

---
SCALE 7x: 2009 Southern California Linux Expo
Los Angeles, CA
http://www.socallinuxexpo.org



Re: Custom Sorting

2009-02-03 Thread psyron

Thanks Erik, that helped me a lot ...

but I still have something I am not sure about:

If I am using a custom sort - like the DistanceComparator example described
in your book - I debugged the code and seem to understand that
the distances array is created for all indexed documents - not only
for the search result. The compare function is then called only for the
docs of the search result, right?
My problem now is that I wonder if it is possible to compute the
distances only for the documents of the search result (that should help
performance if there are a lot of documents but the search result is
mostly very small, right?)

Another point:
Of course it could also be interesting to compute all distances for all
documents the first time a new start location is given, in case you want
to do a lot of queries from the same location. But this would then
only make sense, if all distances are cached together with the location
value.
I am not sure how things are actually handled in lucene/solr. What is
cached, and at which time?

To compute distances only for the search result, I could
- store the reader instance in a variable
- for every doc id the compare function sees for the first time, compute
  the distance at that moment
- and then compare
Would this work? Or is there a better way to compute the distances
only on the search result?

A lot of questions, i know,

Thanks for the good book,
Markus
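The lazy scheme sketched above - compute a document's distance the first time the comparator sees it, then reuse it - can be illustrated outside of any Lucene API. The names below are hypothetical (the real hook in that Lucene version would be a custom sort comparator); this only demonstrates the memoization idea:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

/** Sketch: lazily compute and memoize per-document distances so that only
 *  documents that actually take part in a sort comparison pay the cost. */
class LazyDistanceComparator implements Comparator<Integer> {
    private final Map<Integer, Double> cache = new HashMap<>();
    // e.g. reads the lat/lon for a docId from the index and computes the distance
    private final IntToDoubleFunction distanceFn;

    LazyDistanceComparator(IntToDoubleFunction distanceFn) {
        this.distanceFn = distanceFn;
    }

    private double distance(int docId) {
        // computed at most once per docId, on first use
        return cache.computeIfAbsent(docId, id -> distanceFn.applyAsDouble(id));
    }

    @Override
    public int compare(Integer a, Integer b) {
        return Double.compare(distance(a), distance(b));
    }
}
```

For the caching question: keying such a cache on the start location (and invalidating it when the index reader changes) would cover the repeated-queries-from-one-location case.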


Erik Hatcher wrote:
 
 * QueryComponent - this is where results are generated; it uses a
   SortSpec from the QParser.

 * QParser#getSort - by creating a custom QParser you'll be able to
   wire in your own custom sort
 
 You can write your own QParserPlugin and QParser, and configure it  
 into solrconfig.xml and should be good to go.  Subclassing existing  
 classes, this should only be a handful of lines of code to do.
 

-- 
View this message in context: 
http://www.nabble.com/Custom-Sorting-tp1659p21825900.html
Sent from the Solr - User mailing list archive at Nabble.com.
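Wiring the custom QParser into solrconfig.xml, per Erik's suggestion, is a one-line registration; the class name below is hypothetical:

```xml
<!-- solrconfig.xml: register the custom parser plugin.
     Queries can then select it with defType=distanceSort.
     com.example.DistanceSortQParserPlugin is an illustrative name. -->
<queryParser name="distanceSort" class="com.example.DistanceSortQParserPlugin"/>
```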