Re: Solr & JVM performance issue after 2 days

2010-12-12 Thread Hamid Vahedi
Hi 

Thanks for the suggestion.
I made the following changes in solrconfig.xml:

<ramBufferSizeMB>256</ramBufferSizeMB>

<useColdSearcher>false</useColdSearcher>

<maxWarmingSearchers>1</maxWarmingSearchers>

<autoCommit>
  <maxDocs>2000</maxDocs>
  <maxTime>30</maxTime>
</autoCommit>
<lockType>simple</lockType>

<documentCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

<filterCache
  class="solr.FastLRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

<queryResultCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

After that, one server works fine (it hosts 3 cores for 3 languages),
but the other server (3 cores for 3 other languages) still has the problem after 52 hours.


I plan to follow your suggestion; I hope it helps.

Any better ideas would be appreciated.

Kind Regards
Hamid




From: Peter Karich peat...@yahoo.de
To: solr-user@lucene.apache.org
Sent: Tue, December 7, 2010 8:26:01 PM
Subject: Re: Solr & JVM performance issue after 2 days

  On 07.12.2010 13:01, Hamid Vahedi wrote:
 Hi Peter

 Thanks a lot for the reply. Actually I need real-time indexing and querying
 at the same time.

 Here it is told:
 "You can run multiple Solr instances in separate JVMs, with both having their
 solr.xml configured to use the same index folder."

 Now
 Q1: I'm using Tomcat now. Could you please tell me how to have separate JVMs
 with Tomcat?

Are you sure you don't want two servers, and that you really want real time?
Slowing down indexing + less cache should do the trick, I think.

I wouldn't recommend indexing AND querying on the same machine unless
you have a lot of RAM and CPU.

You could even deploy two indices into one Tomcat... the read-only index
refers to the data dir via:
<dataDir>/path/to/index/data</dataDir>
then issue an empty (!!) commit to the read-only index every minute, so
that the read-only index sees the changes from the feeding index.
(Again: see the wiki page!)
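That per-minute empty commit can be scripted; a minimal sketch (Python is used only for illustration, the read-only core URL is hypothetical, and it assumes Solr's XML update handler):

```python
import urllib.request

def empty_commit_request(core_url):
    # Build a POST of an empty <commit/> to the core's update handler;
    # this makes the read-only core reopen its searcher on the shared index.
    return urllib.request.Request(
        core_url.rstrip("/") + "/update",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml"},
        method="POST",
    )

# Run once a minute (e.g. from cron) against the read-only core:
req = empty_commit_request("http://localhost:8080/solr/readonly")
print(req.full_url)  # http://localhost:8080/solr/readonly/update
```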

Setting up two Tomcats on one server I wouldn't recommend either, but it's 
possible by copying tomcat into, say, tomcat2
and changing the shutdown and 8080 ports in tomcat2/conf/server.xml.

 Q2: What should I set for lockType?

I'm using simple, but native should also be ok.

 Thanks in advance




 
 From: Peter Karichpeat...@yahoo.de
 To: solr-user@lucene.apache.org
 Sent: Tue, December 7, 2010 2:06:49 PM
 Subject: Re: Solr & JVM performance issue after 2 days

Hi Hamid,

 try to avoid autowarming when indexing (see solrconfig.xml:
 the caches' autowarm settings + newSearcher + maxWarmingSearchers).
 If you need to query and index at the same time,
 then you'll probably need one read-only core and one core for writing with no
 autowarming configured.
 See: http://wiki.apache.org/solr/NearRealtimeSearchTuning

 Or replicate from the indexing-core to a different core with different
 settings.

 Regards,
 Peter.


 Hi,

 I am using multi-core Tomcat on 2 servers, 3 languages per server.

 I am adding documents to Solr at up to 200 docs/sec. When the update process
 starts, everything is fine (update time is at most 200 ms/doc, with about
 800 MB of memory used and minimal CPU usage).

 After 15-17 hours it becomes very slow (more than 900 sec per update), used heap
 memory is about 15 GB, and GC time grows to more than one hour.


 I don't know what's wrong with it. Can anyone explain what the problem is?
 Does it come from Solr or the JVM?

 Note: when I stop updating, the CPU stays busy for 15-20 min., and when I start
 updating again I have the same issue. But when I stop the Tomcat service and
 start it again, everything is OK.

 I am using Tomcat 6 with 18 GB memory on Windows 2008 Server x64, Solr 1.4.1.

 Thanks in advance,
 Hamid



-- 
http://jetwick.com twitter search prototype


  

Re: Solr & JVM performance issue after 2 days

2010-12-12 Thread Erick Erickson
Several things:
1) Your ramBufferSizeMB is probably too large. 128M is often the
   point of diminishing returns. Your situation may be different...
2) Your logs will show you what is happening with your autocommit
   properties. If you're really sending 200 docs/second to your index,
   your commits are happening every 10 seconds. Still too fast...
3) I'd really, really, really recommend that you use a master/slave
   configuration where the slaves are your searchers and your
   master is the indexer. Really. You're really hammering your machine.
   If you separate the machines, you can turn off all of the autowarming
   etc. on the indexer and control the frequency of slave updates. Really
   consider this.
4) You haven't given us any idea of the total index size.
5) I doubt separate JVMs are useful here. You're still operating on the
   same underlying hardware. Multiple cores are preferable to
   multiple JVMs almost always.
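The commit-frequency arithmetic in point 2 can be made explicit; a quick sketch (assuming the maxDocs threshold is the trigger that fires first):

```python
def autocommit_interval_secs(ingest_rate_docs_per_sec, max_docs):
    # Seconds between commits when the maxDocs threshold is what triggers them.
    return max_docs / ingest_rate_docs_per_sec

# Hamid's settings: maxDocs=2000 at ~200 docs/sec
print(autocommit_interval_secs(200, 2000))  # -> 10.0 (a commit every 10 seconds)
```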

Best
Erick






Re: Solr & JVM performance issue after 2 days

2010-12-12 Thread Hamid Vahedi
Dear Erick,


Thanks for the advice.

The index size across all cores is 35 GB for 35 million docs (3 weeks of indexed data).

Kind Regards,
Hamid




Re: SOLR geospatial

2010-12-12 Thread Adam Estrada
I am particularly interested in storing and querying polygons. That sort of
thing looks like it's on their roadmap, so does anyone know what the status is
on that? Also, integration with JTS would make this a core component of any
GIS. Again, does anyone know what the status is on that?

*What’s on the roadmap of future features?*

Here are some of the features and enhancements we're planning for SSP:

   - Performance improvements for larger data sets
   - Fixing of known bugs
   - Distance facets: allowing Solr users to filter their results
     based on the calculated distances
   - Search with regular polygons, and groups of shapes
   - Integration with JTS
   - Highly optimized distance calculation algorithms
   - Ranking results by distance
   - 3D dimension search


Adam

On Sun, Dec 12, 2010 at 12:01 AM, Markus Jelsma
markus.jel...@openindex.io wrote:

 That smells like: http://www.jteam.nl/news/spatialsolr.html

  My partner is using a publicly available plugin for GeoSpatial. It is
 used
  both during indexing and during search. It forms some kind of gridding
  system and puts 10 fields per row related to that. Doing a Radius search
  (vs a bounding box search which is faster in almost all cases in all
  GeoSpatial query systems) seems pretty fast. GeoSpatial was our project's
  constraint. We've moved past that now.
 
  Did I mention that it returns distance from the center of the radius
 based
  on units supplied in the query?
 
  I would tell you what the plugin is, but in our division of labor, I have
  kept that out of my short term memory. You can contact him at:
  Danilo Unite danilo.un...@gmail.com;
 
  Dennis Gearon
 
 
  Signature Warning
  
  It is always a good idea to learn from your own mistakes. It is usually a
  better idea to learn from others’ mistakes, so you do not have to make
  them yourself. from
  'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
 
 
  EARTH has a Right To Life,
  otherwise we all die.
 
 
 
  - Original Message 
  From: George Anthony pa...@rogers.com
  To: solr-user@lucene.apache.org
  Sent: Fri, December 10, 2010 9:23:18 AM
  Subject: SOLR geospatial
 
  In looking at some of the docs' support for geospatial search,
  I see this functionality is mostly scheduled for upcoming release 4.0
  (with some playing around with backported code).
 
 
  I note the support for the bounding box filter, but will bounding box be
  one of the supported *data* types for use with this filter? For example,
  if my lat/long data describes the footprint of a map, I'm curious whether
  that type of coordinate data can be used by the bounding box filter (or in
  any other way for similar limiting/filtering capability). I see it can
  work with point-type data but am curious about functionality with bounding
  box-type data (in contrast to simple point lat/long data).
 
  Thanks,
  George



Re: SOLR geospatial

2010-12-12 Thread Erick Erickson
By and large, spatial solr is being replaced by geospatial, see:
http://wiki.apache.org/solr/SpatialSearch. I don't think the old
spatial contrib is still included in the trunk or 3.x code bases, but
I could be wrong

That said, I don't know whether what you want is on the roadmap
there either. Here's a place to start if you want to see the JIRA
discussions: https://issues.apache.org/jira/browse/SOLR-1568

Best
Erick
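
The filters described on that SpatialSearch wiki page are driven by a handful of query parameters; a sketch of assembling such a query (the parameter names come from the wiki; the field name `store` and the point are purely illustrative):

```python
from urllib.parse import urlencode

def bbox_filter_params(field, lat, lon, dist_km):
    # Parameters for Solr's {!bbox} filter: match documents whose indexed
    # point falls inside a box around (lat, lon) with radius dist_km.
    return urlencode({
        "q": "*:*",
        "fq": "{!bbox}",
        "sfield": field,
        "pt": f"{lat},{lon}",
        "d": dist_km,
    })

print(bbox_filter_params("store", 45.15, -93.85, 5))
```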





Re: SOLR geospatial

2010-12-12 Thread Dennis Gearon
We're in Alpha, heading to Alpha 2. Our requirements are simple: radius 
searching, and distance from center. Solr Spatial works and is current. 
GeoSpatial is almost there, but we're going to wait until it's released to 
spend 
time with it. We have other tasks to work on and don't want to be part of the 
debugging process of any project right now.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.







boosting, both query time and other

2010-12-12 Thread Dennis Gearon
So, our main search results have some very common fields:

'title'
'tags'
'description'

What kind of boosting has everybody been using that makes them and their 
customers happy with these kinds of fields?

What are the pros and cons of query-time boosting versus configured boosting?

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



Very high load after replicating

2010-12-12 Thread Mark
After replicating an index of around 20 GB, my slaves experience very high 
load (50+!!).


Is there anything I can do to alleviate this problem? Would SolrCloud 
be of any help?


thanks


Re: Very high load after replicating

2010-12-12 Thread Markus Jelsma
There can be numerous explanations, such as your configuration (cache-warming 
queries, merge factor, replication events, etc.), but also I/O having trouble 
flushing everything to disk. It could also be a memory problem: the OS might 
start swapping if you allocate too much RAM to the JVM, leaving little for the 
OS to work with.

You need to provide more details.



Re: Using synonyms in combination with facets

2010-12-12 Thread kirchheimer

Thanks,

this is exactly the type of solution I need.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-synonyms-in-combination-with-facets-tp1968584p2074692.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: full text search in multiple fields

2010-12-12 Thread PeterKerk

I went for the * operator, and it works now! Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2075140.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: boosting, both query time and other

2010-12-12 Thread Erick Erickson
Basically that's unanswerable; you have to try
various choices with your corpus. Take a look at the defaults
in the dismax request handler in the example schema for a
place to start... And do be aware that the correct values may
change as your corpus acquires more data.

I'm not sure what you're really asking when you say query-time
boosting versus configured boosting. Could you give an example?

Best
Erick





Re: SOLR geospatial

2010-12-12 Thread Adam Estrada
I would be more than happy to help with any of the spatial testing you are
working on.

adam





Re: [Multiple] RSS Feeds at a time...

2010-12-12 Thread Adam Estrada
Hi Ahmet,

This is a great idea but still does not appear to be working correctly. The
idea is that I want to be able to add an RSS feed and then index that feed
on a schedule. My C# method looks something like this.

public ActionResult Index()
{
    try {
        HTTPGet req = new HTTPGet();
        string solrStr =
            System.Configuration.ConfigurationManager.AppSettings["solrUrl"].ToString();
        req.Request(solrStr +
            "/select?clean=true&commit=true&qt=/dataimport&command=reload-config");
        req.Request(solrStr +
            "/select?clean=false&commit=true&qt=/dataimport&command=full-import");
        Response.Write(req.StatusLine);
        Response.Write(req.ResponseTime);
        Response.Write(req.StatusCode);
        return RedirectToAction("../Import/Feeds");
        //return View();
    } catch (SolrConnectionException) {
        throw new Exception(string.Format("Couldn't Import RSS Feeds"));
    }
}

My XML configuration file looks something like this...

<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="filedatasource"
            processor="FileListEntityProcessor"
            baseDir="./solr/conf/dataimporthandler"
            fileName="^.*xml$"
            recursive="true"
            rootEntity="false"
            dataSource="null">

      <entity name="cnn"
              pk="link"
              datasource="filedatasource"
              url="http://rss.cnn.com/rss/cnn_topstories.rss"
              processor="XPathEntityProcessor"
              forEach="/rss/channel | /rss/channel/item"
              transformer="DateFormatTransformer,HTMLStripTransformer">

        <field column="source"       xpath="/rss/channel/title"       commonField="true" />
        <field column="source-link"  xpath="/rss/channel/link"        commonField="true" />
        <field column="subject"      xpath="/rss/channel/description" commonField="true" />
        <field column="title"        xpath="/rss/channel/item/title" />
        <field column="link"         xpath="/rss/channel/item/link" />
        <field column="description"  xpath="/rss/channel/item/description" stripHTML="true" />
        <field column="creator"      xpath="/rss/channel/item/creator" />
        <field column="item-subject" xpath="/rss/channel/item/subject" />
        <field column="author"       xpath="/rss/channel/item/author" />
        <field column="comments"     xpath="/rss/channel/item/comments" />
        <field column="pubdate"      xpath="/rss/channel/item/pubDate"
               dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
      </entity>

      <entity name="newsweek"
              pk="link"
              datasource="filedatasource"
              url="http://feeds.newsweek.com/newsweek/nation"
              processor="XPathEntityProcessor"
              forEach="/rss/channel | /rss/channel/item"
              transformer="DateFormatTransformer,HTMLStripTransformer">

        <field column="source"       xpath="/rss/channel/title"       commonField="true" />
        <field column="source-link"  xpath="/rss/channel/link"        commonField="true" />
        <field column="subject"      xpath="/rss/channel/description" commonField="true" />
        <field column="title"        xpath="/rss/channel/item/title" />
        <field column="link"         xpath="/rss/channel/item/link" />
        <field column="description"  xpath="/rss/channel/item/description" stripHTML="true" />
        <field column="creator"      xpath="/rss/channel/item/creator" />
        <field column="item-subject" xpath="/rss/channel/item/subject" />
        <field column="author"       xpath="/rss/channel/item/author" />
        <field column="comments"     xpath="/rss/channel/item/comments" />
        <field column="pubdate"      xpath="/rss/channel/item/pubDate"
               dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
      </entity>
    </entity>
  </document>
</dataConfig>

As you can see, I can add what appears to be as many sub-entities as I want.
The idea was to reload the XML file after each entity is added. What else am
I missing here? The reload-config command does not seem to be working. Any
ideas would be great!

Thanks,
Adam Estrada

On Sat, Dec 11, 2010 at 4:48 PM, Ahmet Arslan iori...@yahoo.com wrote:

  I found that you can have a single config file that can have several
  entities in it. My question now is how can I add entities without
  restarting the Solr service?

 You mean changing and re-loading the XML config file?

 dataimport?command=reload-config
 http://wiki.apache.org/solr/DataImportHandler#Commands






[pubDate] is not converting correctly

2010-12-12 Thread Adam Estrada
All,

I am having some difficulties parsing the pubDate field that is part of the
RSS spec (I believe). I get the warning that states, Dec 12, 2010 6:45:26
PM org.apache.solr.handler.dataimport.DateFormatTransformer
 transformRow
WARNING: Could not parse a Date field
java.text.ParseException: Unparseable date: Thu, 30 Jul 2009 14:41:43
+
at java.text.DateFormat.parse(Unknown Source)

Does anyone know how to fix this? I would eventually like to do a date query
but without the ability to properly parse them I don't know if it's going to
work.

Thanks,
Adam


Re: [pubDate] is not converting correctly

2010-12-12 Thread Koji Sekiguchi

(10/12/13 8:49), Adam Estrada wrote:

All,

I am having some difficulties parsing the pubDate field that is part of the
RSS spec (I believe). I get the warning that states, Dec 12, 2010 6:45:26
PM org.apache.solr.handler.dataimport.DateFormatTransformer
  transformRow
WARNING: Could not parse a Date field
java.text.ParseException: Unparseable date: Thu, 30 Jul 2009 14:41:43
+
 at java.text.DateFormat.parse(Unknown Source)

Does anyone know how to fix this? I would eventually like to do a date query
but without the ability to properly parse them I don't know if it's going to
work.

Thanks,
Adam


Adam,

What does your data-config.xml look like for that field?
Have you looked at rss-data-config.xml file
under example/example-DIH/solr/rss/conf directory?

Koji
--
http://www.rondhuit.com/en/


Which query parser and how to do full text on multiple fields

2010-12-12 Thread Dennis Gearon
Which query parser did my partner set up below, and how do I parse three fields 
in the index for scoring and returning results?



/solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft


 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



Re: full text search in multiple fields

2010-12-12 Thread Dennis Gearon
For those of us who come late to a thread, having at least the last post that 
you're replying to would help. Me at least ;-)

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: PeterKerk vettepa...@hotmail.com
To: solr-user@lucene.apache.org
Sent: Sun, December 12, 2010 1:47:35 PM
Subject: Re: full text search in multiple fields


I went for the * operator, and it works now! Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2075140.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: Which query parser and how to do full text on multiple fields

2010-12-12 Thread Pradeep Singh
You said you were using a third party plugin. What do you expect people
here to know? Solr plugins don't have parameters lat, long, radius and
threadCount (they have pt and dist).

On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon gear...@sbcglobal.net wrote:

 Which query parser did my partner set up below, and how do I parse three
 fields
 in the index for scoring and returning results?




 /solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft


  Dennis Gearon


 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better
 idea to learn from others’ mistakes, so you do not have to make them
 yourself.
 from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


 EARTH has a Right To Life,
 otherwise we all die.




Rebuild Spellchecker based on cron expression

2010-12-12 Thread Martin Grotzke
Hi,

the spellchecker component already provides a buildOnCommit and
buildOnOptimize option.

Since we have several spellchecker indices, building on each commit is
not really what we want to do.
Building on optimize is not possible as index optimization is done on
the master and the slaves don't even run an optimize but only fetch
the optimized index.

Therefore I'm thinking about an extension of the spellchecker that
allows you to rebuild the spellchecker based on a cron-expression
(e.g. rebuild each night at 1 am).

What do you think about this, is there anybody else interested in this?

Regarding the lifecycle, is there already some executor framework or
any regularly running process in place, or would I have to pull up my
own thread? If so, how can I stop my thread when solr/tomcat is
shutdown (I couldn't see any shutdown or destroy method in
SearchComponent)?

Thanx for your feedback,
cheers,
Martin
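[For what it's worth, absent a shutdown hook in SearchComponent, a cron-like rebuild can be sketched with a plain ScheduledExecutorService on a daemon thread, so it cannot keep tomcat from shutting down. This is illustrative only: the class name is made up, and the rebuild Runnable stands in for whatever issues the spellcheck.build request against each core.

```java
import java.util.Calendar;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SpellcheckerRebuildScheduler {

    // Millis from 'now' until the next time it is hourOfDay:00:00 (e.g. 1 = 1 am).
    static long millisUntil(int hourOfDay, Calendar now) {
        Calendar next = (Calendar) now.clone();
        next.set(Calendar.HOUR_OF_DAY, hourOfDay);
        next.set(Calendar.MINUTE, 0);
        next.set(Calendar.SECOND, 0);
        next.set(Calendar.MILLISECOND, 0);
        if (!next.after(now)) {
            next.add(Calendar.DAY_OF_MONTH, 1); // that hour already passed today
        }
        return next.getTimeInMillis() - now.getTimeInMillis();
    }

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "spellchecker-rebuild");
                t.setDaemon(true); // a daemon thread will not block tomcat shutdown
                return t;
            });

    // 'rebuild' would trigger the spellchecker build for each core.
    public void start(Runnable rebuild) {
        executor.scheduleAtFixedRate(rebuild,
                millisUntil(1, Calendar.getInstance()), // first run at next 1 am
                TimeUnit.DAYS.toMillis(1),              // then every 24 hours
                TimeUnit.MILLISECONDS);
    }

    public void stop() {
        executor.shutdownNow(); // call from whatever shutdown path is available
    }
}
```

The daemon flag sidesteps the missing destroy hook; an explicit stop() from a ServletContextListener would still be cleaner.]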


Re: Which query parser and how to do full text on multiple fields

2010-12-12 Thread Markus Jelsma
Pradeep is right, but, check the solrconfig, the query parser is defined there. 
Look for the basedOn attribute in the queryParser element.



 You said you were using a third party plugin. What do you expect people
 here to know? Solr plugins don't have parameters lat, long, radius and
 threadCount (they have pt and dist).
 
 On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon gear...@sbcglobal.net wrote:
  Which query parser did my partner set up below, and how do I parse three
  fields
  in the index for scoring and returning results?
  
  
  
  
  /solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft
  
   Dennis Gearon
  
  Signature Warning
  
  It is always a good idea to learn from your own mistakes. It is usually a
  better
  idea to learn from others’ mistakes, so you do not have to make them
  yourself.
  from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
  
  
  EARTH has a Right To Life,
  otherwise we all die.


Re: Which query parser and how to do full text on multiple fields

2010-12-12 Thread Dennis Gearon
Well, I didn't think the plugin would be an issue. I thought the rest of the
query was handled by the main query parser, and the plugin processes after
that. So I thought the rest of the query AFTER the plugin/filter part was
like normal, without the filter/plugin. Is that so?

Does using the plugin make me do everything according to its requirements, or
just what's in the braces {}?

I believe the plugin is Spatial Solr, anyway.

I'm really new to using this, guys.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Pradeep Singh pksing...@gmail.com
To: solr-user@lucene.apache.org
Sent: Sun, December 12, 2010 5:02:54 PM
Subject: Re: Which query parser and how to do full text on multiple fields

You said you were using a third party plugin. What do you expect people
here to know? Solr plugins don't have parameters lat, long, radius and
threadCount (they have pt and dist).

On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon gear...@sbcglobal.net wrote:

 Which query parser did my partner set up below, and how do I parse three
 fields
 in the index for scoring and returning results?




/solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft


  Dennis Gearon


 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better
 idea to learn from others’ mistakes, so you do not have to make them
 yourself.
 from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


 EARTH has a Right To Life,
 otherwise we all die.





Re: Which query parser and how to do full text on multiple fields

2010-12-12 Thread Dennis Gearon
And to be more specific, the fields I want to combine for *full text* are just 
three text fields, they're not geospatial.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Pradeep Singh pksing...@gmail.com
To: solr-user@lucene.apache.org
Sent: Sun, December 12, 2010 5:02:54 PM
Subject: Re: Which query parser and how to do full text on multiple fields

You said you were using a third party plugin. What do you expect people
here to know? Solr plugins don't have parameters lat, long, radius and
threadCount (they have pt and dist).

On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon gear...@sbcglobal.net wrote:

 Which query parser did my partner set up below, and how do I parse three
 fields
 in the index for scoring and returning results?




/solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft


  Dennis Gearon


 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better
 idea to learn from others’ mistakes, so you do not have to make them
 yourself.
 from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


 EARTH has a Right To Life,
 otherwise we all die.
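[For the multi-field full-text part of this question, one common approach in Solr (independent of any spatial plugin) is the dismax parser's qf parameter. A sketch only; the field names here are hypothetical, not taken from Dennis's schema:

    /solr/select?defType=dismax&qf=title+description+subject&q=Art+Loft

The q string is then matched against all listed fields, and per-field boosts can be added, e.g. qf=title^2+description.]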





Re: Rebuild Spellchecker based on cron expression

2010-12-12 Thread Markus Jelsma
Maybe you've overlooked the build parameter?
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.build

 Hi,
 
 the spellchecker component already provides a buildOnCommit and
 buildOnOptimize option.
 
 Since we have several spellchecker indices, building on each commit is
 not really what we want to do.
 Building on optimize is not possible as index optimization is done on
 the master and the slaves don't even run an optimize but only fetch
 the optimized index.
 
 Therefore I'm thinking about an extension of the spellchecker that
 allows you to rebuild the spellchecker based on a cron-expression
 (e.g. rebuild each night at 1 am).
 
 What do you think about this, is there anybody else interested in this?
 
 Regarding the lifecycle, is there already some executor framework or
 any regularly running process in place, or would I have to pull up my
 own thread? If so, how can I stop my thread when solr/tomcat is
 shutdown (I couldn't see any shutdown or destroy method in
 SearchComponent)?
 
 Thanx for your feedback,
 cheers,
 Martin


Re: Which query parser and how to do full text on multiple fields

2010-12-12 Thread Dennis Gearon
Oh, I didn't know that the syntax didn't show the parser used, that it was set 
in the config file.

I'll talk to my partner, thanks.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Markus Jelsma markus.jel...@openindex.io
To: solr-user@lucene.apache.org
Cc: Pradeep Singh pksing...@gmail.com
Sent: Sun, December 12, 2010 5:08:11 PM
Subject: Re: Which query parser and how to do full text on multiple fields

Pradeep is right, but, check the solrconfig, the query parser is defined there. 
Look for the basedOn attribute in the queryParser element.



 You said you were using a third party plugin. What do you expect people
 here to know? Solr plugins don't have parameters lat, long, radius and
 threadCount (they have pt and dist).
 
  On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon gear...@sbcglobal.net wrote:
  Which query parser did my partner set up below, and how do I parse three
  fields
  in the index for scoring and returning results?
  
  
  
  
  /solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft
  
   Dennis Gearon
  
  Signature Warning
  
  It is always a good idea to learn from your own mistakes. It is usually a
  better
  idea to learn from others’ mistakes, so you do not have to make them
  yourself.
  from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
  
  
  EARTH has a Right To Life,
  otherwise we all die.



Re: Which query parser and how to do full text on multiple fields

2010-12-12 Thread Markus Jelsma
The manual answers most questions.

 Oh, I didn't know that the syntax didn't show the parser used, that it was
 set in the config file.
 
 I'll talk to my partner, thanks.
 
  Dennis Gearon
 
 
 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better idea to learn from others’ mistakes, so you do not have to make
 them yourself. from
 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
 
 
 EARTH has a Right To Life,
 otherwise we all die.
 
 
 
 - Original Message 
 From: Markus Jelsma markus.jel...@openindex.io
 To: solr-user@lucene.apache.org
 Cc: Pradeep Singh pksing...@gmail.com
 Sent: Sun, December 12, 2010 5:08:11 PM
Subject: Re: Which query parser and how to do full text on multiple fields
 
 Pradeep is right, but, check the solrconfig, the query parser is defined
 there. Look for the basedOn attribute in the queryParser element.
 
  You said you were using a third party plugin. What do you expect people
  here to know? Solr plugins don't have parameters lat, long, radius and
  threadCount (they have pt and dist).
  
  On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon 
gear...@sbcglobal.net wrote:
   Which query parser did my partner set up below, and how do I parse
   three fields
   in the index for scoring and returning results?
   
   
   
   
   /solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft
   
Dennis Gearon
   
   Signature Warning
   
   It is always a good idea to learn from your own mistakes. It is usually
   a better
   idea to learn from others’ mistakes, so you do not have to make them
   yourself.
   from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
   
   
   EARTH has a Right To Life,
   otherwise we all die.


Re: Rebuild Spellchecker based on cron expression

2010-12-12 Thread Martin Grotzke
On Mon, Dec 13, 2010 at 2:12 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
 Maybe you've overlooked the build parameter?
 http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.build
I'm aware of this, but we don't want to maintain cron-jobs on all
slaves for all spellcheckers for all cores.
That's why I'm thinking about a more integrated solution. Or did I
really overlook something?

Cheers,
Martin



 Hi,

 the spellchecker component already provides a buildOnCommit and
 buildOnOptimize option.

 Since we have several spellchecker indices, building on each commit is
 not really what we want to do.
 Building on optimize is not possible as index optimization is done on
 the master and the slaves don't even run an optimize but only fetch
 the optimized index.

 Therefore I'm thinking about an extension of the spellchecker that
 allows you to rebuild the spellchecker based on a cron-expression
 (e.g. rebuild each night at 1 am).

 What do you think about this, is there anybody else interested in this?

 Regarding the lifecycle, is there already some executor framework or
 any regularly running process in place, or would I have to pull up my
 own thread? If so, how can I stop my thread when solr/tomcat is
 shutdown (I couldn't see any shutdown or destroy method in
 SearchComponent)?

 Thanx for your feedback,
 cheers,
 Martin




-- 
Martin Grotzke
http://twitter.com/martin_grotzke


Re: [pubDate] is not converting correctly

2010-12-12 Thread Adam Estrada
Thanks for the feedback! There are quite a few formats that can be used. I
am experiencing at least 5 of them. Would something like this work? Note
that there are 2 different formats separated by a comma.

<field column="pubdate" xpath="/rss/channel/item/pubDate"
       dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss zzz, yyyy-MM-dd'T'HH:mm:ss'Z'" />

I don't suppose it will, because there is already a comma in the first
pattern. I guess I am really looking for an all-purpose date/time parser, but
even if I have that, would I still be able to query *all* fields in the
index?

Good article:
http://www.java2s.com/Open-Source/Java-Document/RSS-RDF/Rome/com/sun/syndication/io/impl/DateParser.java.htm

Adam
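[Since DateFormatTransformer takes a single pattern, one way around mixed feeds is to cascade several SimpleDateFormat patterns outside of DIH. A sketch only; the class name is made up and the pattern list is an assumption about which feed formats are involved:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class LenientDateParser {

    // Common pubDate flavours: RFC 822 numeric zone, general time zone, ISO 8601.
    private static final String[] PATTERNS = {
        "EEE, dd MMM yyyy HH:mm:ss Z",     // Thu, 30 Jul 2009 14:41:43 +0000
        "EEE, dd MMM yyyy HH:mm:ss zzz",   // Thu, 30 Jul 2009 14:41:43 GMT
        "yyyy-MM-dd'T'HH:mm:ss'Z'"         // 2009-07-30T14:41:43Z
    };

    public static Date parse(String value) {
        for (String pattern : PATTERNS) {
            try {
                // Locale.US: English month/day names regardless of server locale
                return new SimpleDateFormat(pattern, Locale.US).parse(value.trim());
            } catch (ParseException ignored) {
                // fall through and try the next pattern
            }
        }
        return null; // caller logs a warning and skips the field
    }
}
```

The parsed Date can then be reformatted into Solr's canonical yyyy-MM-dd'T'HH:mm:ss'Z' form before indexing, so date range queries keep working across feeds.]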

On Sun, Dec 12, 2010 at 7:31 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 (10/12/13 8:49), Adam Estrada wrote:

 All,

 I am having some difficulties parsing the pubDate field that is part of
  the
 RSS spec (I believe). I get the warning that states, Dec 12, 2010
 6:45:26
 PM org.apache.solr.handler.dataimport.DateFormatTransformer
  transformRow
 WARNING: Could not parse a Date field
 java.text.ParseException: Unparseable date: Thu, 30 Jul 2009 14:41:43
 +
 at java.text.DateFormat.parse(Unknown Source)

 Does anyone know how to fix this? I would eventually like to do a date
 query
 but without the ability to properly parse them I don't know if it's going
 to
 work.

 Thanks,
 Adam


 Adam,

  What does your data-config.xml look like for that field?
 Have you looked at rss-data-config.xml file
 under example/example-DIH/solr/rss/conf directory?

 Koji
 --
 http://www.rondhuit.com/en/



Re: [Multiple] RSS Feeds at a time...

2010-12-12 Thread Ahmet Arslan
 What else am I missing here? The reload-config command does not seem
 to be working. Any ideas would be great!

solr/dataimport?command=reload-config should return the message 
<str name="importResponse">Configuration Re-loaded sucessfully</str>
if everything went well. Maybe you can check that after each reload. Maybe it 
is not valid XML?

By the way, can't you use variable resolver in your case?

http://wiki.apache.org/solr/DataImportHandler#A_VariableResolver

Passing different RSS URLs using a custom request parameter
like ${dataimporter.request.myrssurl}: 

/dataimport?command=full-import&clean=false&myrssurl=http://rss.cnn.com/rss/cnn_topstories.rss

Similar discussion http://search-lucene.com/m/xILqvbY6h91/
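[Ahmet's variable-resolver route might look roughly like this in data-config.xml. A sketch; the myrssurl name follows his example, and the field list is trimmed down from Adam's config:

```xml
<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="feed"
            pk="link"
            url="${dataimporter.request.myrssurl}"
            processor="XPathEntityProcessor"
            forEach="/rss/channel | /rss/channel/item"
            transformer="DateFormatTransformer,HTMLStripTransformer">
      <field column="title" xpath="/rss/channel/item/title" />
      <field column="link"  xpath="/rss/channel/item/link" />
    </entity>
  </document>
</dataConfig>
```

One config then serves every feed: each full-import call supplies a different &myrssurl=... value instead of one hard-coded entity per feed, and no reload-config is needed when a feed is added.]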


  


Re: Rebuild Spellchecker based on cron expression

2010-12-12 Thread Erick Erickson
I'm shooting in the dark here, but according to this:
http://wiki.apache.org/solr/SolrReplication
after the slave pulls the index down, it issues a commit. So if your
slave is configured to generate the dictionary on commit, will it
just happen?

But according to this, it is still an open issue:
https://issues.apache.org/jira/browse/SOLR-866

Best
Erick

On Sun, Dec 12, 2010 at 8:30 PM, Martin Grotzke 
martin.grot...@googlemail.com wrote:

 On Mon, Dec 13, 2010 at 2:12 AM, Markus Jelsma
 markus.jel...@openindex.io wrote:
  Maybe you've overlooked the build parameter?
  http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.build
 I'm aware of this, but we don't want to maintain cron-jobs on all
 slaves for all spellcheckers for all cores.
 That's why I'm thinking about a more integrated solution. Or did I
  really overlook something?

 Cheers,
 Martin


 
  Hi,
 
  the spellchecker component already provides a buildOnCommit and
  buildOnOptimize option.
 
   Since we have several spellchecker indices, building on each commit is
  not really what we want to do.
  Building on optimize is not possible as index optimization is done on
  the master and the slaves don't even run an optimize but only fetch
  the optimized index.
 
  Therefore I'm thinking about an extension of the spellchecker that
  allows you to rebuild the spellchecker based on a cron-expression
  (e.g. rebuild each night at 1 am).
 
  What do you think about this, is there anybody else interested in this?
 
  Regarding the lifecycle, is there already some executor framework or
  any regularly running process in place, or would I have to pull up my
  own thread? If so, how can I stop my thread when solr/tomcat is
  shutdown (I couldn't see any shutdown or destroy method in
  SearchComponent)?
 
  Thanx for your feedback,
  cheers,
  Martin
 



 --
 Martin Grotzke
 http://twitter.com/martin_grotzke
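[If the slave's post-replication commit does fire the build, the piece that would make it rebuild is the spellchecker's buildOnCommit flag, roughly as below. A sketch; the dictionary name and source field are placeholders, not taken from Martin's setup:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

Whether the replication-triggered commit actually reaches this code path is exactly the open question tracked in SOLR-866.]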



PDFBOX 1.3.1 Parsing Error

2010-12-12 Thread pankaj bhatt
Hi all,
While using PDFBox 1.3.1 in Apache Tika 1.7 I am getting the
following error when parsing a PDF document:
Error: Expected an integer type, actual='' at
org.apache.pdfbox.pdfparser.BaseParser.readInt

This error occurs because of the SHA-256 encryption used by Adobe Acrobat 9.
Is there any solution to this problem? I am stuck because of this.

In Jira, issue PDFBOX-697 has been created against this:
https://issues.apache.org/jira/browse/PDFBOX-697

Please help!!

/ Pankaj Bhatt.


Re: PDFBOX 1.3.1 Parsing Error

2010-12-12 Thread Pradeep Singh
If the document is encrypted maybe it isn't meant to be indexed and publicly
visible after all?

On Sun, Dec 12, 2010 at 10:22 PM, pankaj bhatt panbh...@gmail.com wrote:

 hi All,
While using PDFBOX 1.3.1 in APACHE TIKA 1.7 i am getting the
 following error to parse an PDF Document.
  Error: Expected an integer type, actual='' at
  org.apache.pdfbox.pdfparser.BaseParser.readInt
  This error occurs because of the SHA-256 encryption used by Adobe Acrobat 9.
  Is there any solution to this problem? I am stuck because of this.

  In Jira, issue PDFBOX-697 has been created against this:
 https://issues.apache.org/jira/browse/PDFBOX-697

 Please help!!

 / Pankaj Bhatt.



Re: Rebuild Spellchecker based on cron expression

2010-12-12 Thread Martin Grotzke
Hi,

when thinking further about it, it's clear that
  https://issues.apache.org/jira/browse/SOLR-433
would be even better - we could generate the spellchecker indices on
commit/optimize on the master and replicate them to all slaves.

Just wondering what's the reason that this patch receives so little
interest. Anything wrong with it?

Cheers,
Martin


On Mon, Dec 13, 2010 at 2:04 AM, Martin Grotzke
martin.grot...@googlemail.com wrote:
 Hi,

 the spellchecker component already provides a buildOnCommit and
 buildOnOptimize option.

  Since we have several spellchecker indices, building on each commit is
 not really what we want to do.
 Building on optimize is not possible as index optimization is done on
 the master and the slaves don't even run an optimize but only fetch
 the optimized index.

 Therefore I'm thinking about an extension of the spellchecker that
 allows you to rebuild the spellchecker based on a cron-expression
 (e.g. rebuild each night at 1 am).

 What do you think about this, is there anybody else interested in this?

 Regarding the lifecycle, is there already some executor framework or
 any regularly running process in place, or would I have to pull up my
 own thread? If so, how can I stop my thread when solr/tomcat is
 shutdown (I couldn't see any shutdown or destroy method in
 SearchComponent)?

 Thanx for your feedback,
 cheers,
 Martin




-- 
Martin Grotzke
http://www.javakaffee.de/blog/