Re: multiple attribute

2012-12-10 Thread Mikhail Khludnev
Michael,

Your question is a little bit confusing. Business entities have attributes.
We model entities as documents, and attributes as fields. That's why adding
attributes to a field is contradictory. By the way, there are a few related
concepts in Lucene: Payloads and TermPositions.
About the problem itself I can suggest:

   - http://wiki.apache.org/solr/Join (sketch below this list)
   - http://wiki.apache.org/solr/FieldCollapsing - it supports faceting
   - block join
   http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
   https://issues.apache.org/jira/browse/SOLR-3076 - patch only but really
   performant
   - http://wiki.apache.org/solr/SurroundQueryParser available in 4.0 can
   be used by the way proposed here http://goo.gl/R2bxc video
   http://vimeo.com/album/2012142/video/33817062
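
To make the join option concrete, a minimal sketch of the join query parser
syntax (the field names parent_id and id here are hypothetical, not from the
discussion):

q={!join from=parent_id to=id}location:earth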

Have a good dive!


On Mon, Dec 10, 2012 at 12:27 PM, Michael Jones michaelj...@gmail.com wrote:

 Hi,

 I know that solr doesn't provide support for nested documents, but can I
 add multiple attributes to a field?

 <add>
 <document>
 <field name="test" foo="one" bar="two" index="true">

 And specify an index on those attributes?

 I have a nested document that needs to be saved and searched.

 If the above can not be achieved what would be a suitable alternative?

 Would I have to do something like:

 <add>
 <document>
 <field name="test" foo="one" bar="two">
 <field name="foo" index="true">one</field>
 <field name="bar" index="true">two</field>

 And just return name="test" ?

 Thanks




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: multiple attribute

2012-12-10 Thread Michael Jones
Hi,

Sorry if anyone found my question confusing.

I have an XML document that is nested

<file>
 <foo>
  <bar/>
  <thunk/>
 </foo>
</file>

And I know that with Solr you have to flatten your data, so I was just
trying to work out the best way to do a search on a nested document. I was
looking to see if, instead of having multiple nodes, I could have those
nodes as attributes on one single node and still be able to search.
So if my node looked like <field name="person" date="2000-01-01"
location="earth" username="bob" job="test" index="true" />

I would be able to search: date > 2002 && location = earth && job = test

But I'm not sure if that is the best way to do it? I know I would have to
specify a type for each attribute in the config.
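
One flattened layout that would support that query, sketched with assumed
field types (the names follow the example above; the exact types are a guess):

<field name="date" type="tdate" indexed="true" stored="true" />
<field name="location" type="string" indexed="true" stored="true" />
<field name="job" type="string" indexed="true" stored="true" />

and the query would then be something like:

q=date:[2002-01-01T00:00:00Z TO *] AND location:earth AND job:test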



On Mon, Dec 10, 2012 at 9:36 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Michael,

 Your question is a little bit confusing. Business entities have attributes.
 We model entities as documents, and attributes as fields. That's why adding
 attributes to a field is contradictory. By the way, there are a few related
 concepts in Lucene: Payloads and TermPositions.
 About the problem itself I can suggest:

- http://wiki.apache.org/solr/Join
- http://wiki.apache.org/solr/FieldCollapsing - it supports faceting
- block join

 http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
https://issues.apache.org/jira/browse/SOLR-3076 - patch only but really
performant
- http://wiki.apache.org/solr/SurroundQueryParser available in 4.0 can
be used by the way proposed here http://goo.gl/R2bxc video
http://vimeo.com/album/2012142/video/33817062

 Have a good dive!


 On Mon, Dec 10, 2012 at 12:27 PM, Michael Jones michaelj...@gmail.com
 wrote:

  Hi,
 
  I know that solr doesn't provide support for nested documents, but can I
  add multiple attributes to a field?
 
  <add>
  <document>
  <field name="test" foo="one" bar="two" index="true">
 
  And specify an index on those attributes?
 
  I have a nested document that needs to be saved and searched.
 
  If the above can not be achieved what would be a suitable alternative?
 
  Would I have to do something like:
 
  <add>
  <document>
  <field name="test" foo="one" bar="two">
  <field name="foo" index="true">one</field>
  <field name="bar" index="true">two</field>
 
  And just return name="test" ?
 
  Thanks
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



Re: Wildcards and fuzzy/phonetic query

2012-12-10 Thread Haagen Hasle

It's been two months since I asked about wildcards and phonetic filters, and 
finally the task of upgrading Solr to version 4.0 was prioritized in our 
project.  So the last couple of days I've been working on it.  Another team 
member upgraded Solr from 3.4 to 4.0, and I've been making changes to 
schema.xml to accommodate the new multiterm functionality.

However, it doesn't seem to work...  Lowercasing is still not done when I do a 
fuzzy search, not through the regular index analyzer and its support of 
MultiTermAwareComponents, and not when I try to define a special multiterm 
analyzer.

Do I have to do anything special to enable the multiterm functionality in Solr 
4.0?


Regards, 

Hågen

On 8 Oct 2012 at 18:09, Erick Erickson wrote:

 whether phonetic filters can be multiterm aware:
 
 I'd be leery of this, as I basically don't quite know how that would
 behave. You'd have to ensure that the algorithms changed the
 first parts of the words uniformly, regardless of what followed. I'm
 pretty sure that _some_ phonetic algorithms do not follow this
 pattern, i.e. eric wouldn't necessarily have the same beginning
 as erickson. That said, some of the algorithms _may_ follow this
 rule and might be OK candidates for being MultiTermAware
 
 But, you don't need this in order to try it out. See the Expert Level
 Schema Possibilities
 at:
 http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
 
 You can define your own analysis chain for wildcards as part of your 
 fieldType
 definition and include whatever you want, whether or not it's
 MultiTermAware and it
 will be applied at query time. Use the <analyzer type="query"> entry
 as a basis. _But_ you shouldn't include anything in this section that
 produces more than one output per input token. Note, token, not
 field. I.e. a really bad candidate for this section is
 WordDelimiterFilterFactory
 if you use the admin/analysis page (which you'll get to know intimately) and
 look at a type that has WordDelimiterFilterFactory in its chain and
 put something
 like erickErickson1234, you'll see what I mean. Make sure to check the
 verbose box.
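 
 A minimal sketch of such a field type, assuming stock Solr 4.x multiterm
 syntax (the type name text_mt is made up):
 
 <fieldType name="text_mt" class="solr.TextField">
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="multiterm">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>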
 
 If you can determine that some of the phonetic algorithms _should_ be
 MultiTermAware, please feel free to raise a JIRA and we can discuss... I 
 suspect
 it'll be on a case-by-case basis.
 
 Best
 Erick
 
 On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle
 haagenha...@gmail.com wrote:
 Hi!
 
 I'm quite new to Solr, I was recently asked to help out on a project where 
 the previous Solr-person quit quite suddenly.  I've noticed that some of 
 our searches don't return the expected result, and I'm hoping you guys can 
 help me out.
 
 We've indexed a lot of names, and would like to search for a person in our 
 system using these names.  We previously used Oracle Text for this, and we 
 experience that Solr is much faster.  So far so good! :)  But when we try to 
 use wildcards things start to go wrong.
 
 We're using Solr 3.4, and I see that some of our problems are solved in 3.6. 
  Ref SOLR-2438:
 https://issues.apache.org/jira/browse/SOLR-2438
 
 But we would also like to be able to combine wildcards with fuzzy searches, 
 and wildcards with a phonetic filter.  I don't see anything about phonetic 
 filters in SOLR-2438 or SOLR-2921.  
 (https://issues.apache.org/jira/browse/SOLR-2921)
 Is it possible to make the phonetic filters MultiTermAware?
 
 Regarding fuzzy queries, in Oracle Text I can search for chr% (chr* in 
 Solr..) and find both christian and kristian.  As far as I understand, this 
 is not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined.  
 Is this correct, or have I misunderstood anything?  Are there any 
 workarounds or filter-combinations I can use to achieve the same result?  
 I've seen people suggest using a boolean query to combine the two, but I 
 don't really see how that would solve my chr*-problem.
 
 As I mentioned earlier I'm quite new to this, so I apologize if what I'm 
 asking about only shows my ignorance..
 
 
 Regards, Hågen



about NRTCachingDirectory

2012-12-10 Thread Marc Sturlese
I have a doubt about how NRTCachingDirectory works.
As far as I've seen, it receives a delegate Directory and caches newly
created segments. So, given that MMapDirectory used to be the default:

1.- Does NRTCachingDirectory work by acting sort of as a wrapper of MMap,
caching the new segments?

2.- If I have a master/slave setup and deploy a fully optimized index with a
single segment, and the slave is configured with NRTCachingDirectory, will it
try to cache that segment (I suppose not)?
And let's say I remove the replication and start adding docs to that slave,
creating small segments every 10 minutes; will the
NRTCachingDirectory by default start caching these new small segments?
And finally, if I set up the replication again, when a full new index with a
single segment is deployed, how would NRTCachingDirectory behave?

I know it's not a typical use case, but I would like to know how it behaves in
those different situations.
Thanks in advance.
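
For reference, the directory factory under discussion is typically enabled in
solrconfig.xml along these lines (a sketch; the delegate Directory underneath
is chosen by Lucene, e.g. MMapDirectory on 64-bit platforms):

<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory"/>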
 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/about-NRTCachingDirectory-tp4025665.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Versioning

2012-12-10 Thread Per Steffensen
Depends on exactly what you mean by versioning. But if you mean that 
every document in Solr gets a version number which is increased every 
time the document is updated, all you need to do is add a _version_ 
field in your schema: http://wiki.apache.org/solr/SolrCloud#Required_Config
I believe you will get optimistic locking out-of-the-box if you do this 
(you will also need the updateLog configured in solrconfig.xml). Or else 
you can take my patch for SOLR-3178 and have optimistic locking work as 
described on: 
http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics
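
A minimal sketch of the two pieces mentioned, following the stock Solr 4
example config:

<!-- schema.xml -->
<field name="_version_" type="long" indexed="true" stored="true"/>

<!-- solrconfig.xml, inside updateHandler -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>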


Regards, Per Steffensen

Sushil jain wrote:

Hello Everyone,

I am a Solr beginner.

I just want to know if versioning of data is possible in Solr, if yes then
please share the procedure.

Thanks & Regards,
Sushil Jain

  




Re: stress testing Solr 4.x

2012-12-10 Thread Alain Rogister
Hi Mark,

Usually I was stopping them with ctrl-c but several times, one of the
servers was hung and had to be stopped with kill -9.

Thanks,

Alain

On Mon, Dec 10, 2012 at 5:09 AM, Mark Miller markrmil...@gmail.com wrote:

 Hmmm...EOF on the segments file is odd...

 How were you killing the nodes? Just stopping them or kill -9 or what?

 - Mark

 On Sun, Dec 9, 2012 at 1:37 PM, Alain Rogister alain.rogis...@gmail.com
 wrote:
  Hi,
 
  I have re-run my tests today after I updated Solr 4.1 to apply the patch.
 
  First, the good news: it works, i.e. if I stop all three Solr servers and
  then restart one, it will try to find the other two for a while (about 3
  minutes I think) then give up, become the leader and start processing
  requests.
 
  Now, the not-so-good: I encountered several exceptions that seem to
  indicate 2 other issues. Here are the relevant bits.
 
  1) The ZK session expiry problem : not sure what caused it but I did a
 few
  Solr or ZK node restarts while the system was under load.
 
  SEVERE: There was a problem finding the leader in
  zk:org.apache.solr.common.SolrException: Could not get leader props
  at
 org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:732)
  at
 org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:696)
  at
 
 org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1095)
  at
 
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:265)
  at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
  at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
  at
 
 org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
  at
 
 org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
  at
 
 org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
  at
 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
  Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
  KeeperErrorCode = Session expired for
 /collections/adressage/leaders/shard1
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
  at
 
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
  at
 org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
  at
 org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:710)
  ... 10 more
  SEVERE: :org.apache.zookeeper.KeeperException$SessionExpiredException:
  KeeperErrorCode = Session expired for /overseer/queue/qn-
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:210)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:207)
  at
 
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
  at
 org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:207)
  at
 org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:229)
  at org.apache.solr.cloud.ZkController.publish(ZkController.java:824)
  at org.apache.solr.cloud.ZkController.publish(ZkController.java:797)
  at
 
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:258)
  at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
  at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
  at
 
 org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
  at
 
 org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
  at
 
 org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
  at
 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 
  2) Data corruption of 1 core on 2 out of 3 Solr servers. This core failed
  to start due to the exceptions below and both servers went into a
 seemingly
  endless loop of exponential retries. The fix was to stop both faulty
  servers, remove the data directory of this core and restart: replication
  then took place correctly. As above, not sure what exactly caused this to
  happen; no updates were taking place, only searches.
 
  On server 1 :
 
  INFO: Closing
 
 

Re: Wildcards and fuzzy/phonetic query

2012-12-10 Thread Haagen Hasle

Lowercasing actually seems to work with Wildcard queries, but not with fuzzy 
queries.  Are there any reasons why I should experience such a difference?


Regards, Haagen


On 10 Dec 2012 at 13:24, Haagen Hasle wrote:

 
 It's been two months since I asked about wildcards and phonetic filters, and 
 finally the task of upgrading Solr to version 4.0 was prioritized in our 
 project.  So the last couple of days I've been working on it.  Another team 
 member upgraded Solr from 3.4 to 4.0, and I've been making changes to 
 schema.xml to accommodate the new multiterm functionality.
 
  However, it doesn't seem to work...  Lowercasing is still not done when I do a 
  fuzzy search, not through the regular index analyzer and its support of 
  MultiTermAwareComponents, and not when I try to define a special multiterm 
 analyzer.
 
 Do I have to do anything special to enable the multiterm functionality in 
 Solr 4.0?
 
 
 Regards, 
 
 Hågen
 
  On 8 Oct 2012 at 18:09, Erick Erickson wrote:
 
 whether phonetic filters can be multiterm aware:
 
 I'd be leery of this, as I basically don't quite know how that would
  behave. You'd have to ensure that the algorithms changed the
 first parts of the words uniformly, regardless of what followed. I'm
 pretty sure that _some_ phonetic algorithms do not follow this
 pattern, i.e. eric wouldn't necessarily have the same beginning
 as erickson. That said, some of the algorithms _may_ follow this
 rule and might be OK candidates for being MultiTermAware
 
 But, you don't need this in order to try it out. See the Expert Level
 Schema Possibilities
 at:
 http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
 
 You can define your own analysis chain for wildcards as part of your 
 fieldType
 definition and include whatever you want, whether or not it's
 MultiTermAware and it
  will be applied at query time. Use the <analyzer type="query"> entry
 as a basis. _But_ you shouldn't include anything in this section that
 produces more than one output per input token. Note, token, not
 field. I.e. a really bad candidate for this section is
 WordDelimiterFilterFactory
 if you use the admin/analysis page (which you'll get to know intimately) and
 look at a type that has WordDelimiterFilterFactory in its chain and
 put something
  like erickErickson1234, you'll see what I mean. Make sure to check the
  verbose box.
 
 If you can determine that some of the phonetic algorithms _should_ be
 MultiTermAware, please feel free to raise a JIRA and we can discuss... I 
 suspect
 it'll be on a case-by-case basis.
 
 Best
 Erick
 
 On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle
 haagenha...@gmail.com wrote:
 Hi!
 
 I'm quite new to Solr, I was recently asked to help out on a project where 
 the previous Solr-person quit quite suddenly.  I've noticed that some of 
 our searches don't return the expected result, and I'm hoping you guys can 
 help me out.
 
 We've indexed a lot of names, and would like to search for a person in our 
 system using these names.  We previously used Oracle Text for this, and we 
  experience that Solr is much faster.  So far so good! :)  But when we try 
  to use wildcards things start to go wrong.
 
 We're using Solr 3.4, and I see that some of our problems are solved in 
 3.6.  Ref SOLR-2438:
 https://issues.apache.org/jira/browse/SOLR-2438
 
 But we would also like to be able to combine wildcards with fuzzy searches, 
 and wildcards with a phonetic filter.  I don't see anything about phonetic 
 filters in SOLR-2438 or SOLR-2921.  
 (https://issues.apache.org/jira/browse/SOLR-2921)
 Is it possible to make the phonetic filters MultiTermAware?
 
 Regarding fuzzy queries, in Oracle Text I can search for chr% (chr* in 
 Solr..) and find both christian and kristian.  As far as I understand, this 
 is not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined.  
 Is this correct, or have I misunderstood anything?  Are there any 
 workarounds or filter-combinations I can use to achieve the same result?  
 I've seen people suggest using a boolean query to combine the two, but I 
 don't really see how that would solve my chr*-problem.
 
 As I mentioned earlier I'm quite new to this, so I apologize if what I'm 
 asking about only shows my ignorance..
 
 
 Regards, Hågen
 



RE: Modeling openinghours using multipoints

2012-12-10 Thread David Smiley (@MITRE.org)
Maybe it would? I don't completely get your drift.  But you're talking about a 
user writing a bunch of custom code to build, save, and query the bitmap 
whereas working on top of existing functionality seems to me a lot more 
maintainable on the user's part.
~ David


From: Lance Norskog-2 [via Lucene] [ml-node+s472066n4025579...@n3.nabble.com]
Sent: Sunday, December 09, 2012 6:35 PM
To: Smiley, David W.
Subject: Re: Modeling openinghours using multipoints

If these are not raw times, but quantized on-the-hour, would it be
faster to create a bit map of hours and then query across the bit
maps?

On Sun, Dec 9, 2012 at 8:06 AM, Erick Erickson [hidden 
email] wrote:

 Thanks for the discussion, I've added this to my bag of tricks, way cool!

 Erick


 On Sat, Dec 8, 2012 at 10:52 PM, britske [hidden 
  email] wrote:

 Brilliant! Got some great ideas for this. Indeed all sorts of use cases
 which use multiple temporal ranges could benefit..

 Eg: another guy on Stack Overflow asked me about this some days ago.. He
 wants to model multiple temporary offers per product (free shipping for
 Christmas, 20% discount for Black Friday, etc.) .. All possible with this
 out of the box. Factor in 'offer category' in x and y as well for some
 extra powerful querying.

 Yup, I'm enthusiastic about it, which I'm sure you can tell :)

 Thanks a lot David,

 Cheers,
 Geert-Jan



 Sent from my iPhone

 On 9 dec. 2012, at 05:35, David Smiley (@MITRE.org) [via Lucene] 
  [hidden email] wrote:

  britske wrote
  That's seriously awesome!
 
  Some change in the query though:
  You described: To query for a business that is open during at least some
  part of a given time duration
  I want To query for a business that is open during at least the entire
  given time duration.
 
  Feels like a small difference but probably isn't (I'm still wrapping my
  head on the intersect query I must admit)
  So this would be a slightly different rectangle query.  Interestingly,
 you simply swap the location in the rectangle where you put the start and
 end time.  In summary:
 
   Indexed span CONTAINS query span:
   minX minY maxX maxY -> 0 end start *
  
   Indexed span INTERSECTS (i.e. OVERLAPS) query span:
   minX minY maxX maxY -> 0 start end *
  
   Indexed span WITHIN query span:
   minX minY maxX maxY -> start 0 * end
 
  I'm using '*' here to denote the max possible value.  At some point I
 may add that as a feature.
 
  That was a fun exercise!  I give you credit in prodding me in this
 direction as I'm not sure if this use of spatial would have occurred to me
 otherwise.
 
  britske wrote
  Moreover, any indication on performance? Should, say, 50.000 docs with
   about 100-200 points each (1 to 2 open-close spans per day) be ok? ( I
  know
   'your mileage may vary' etc. but just a guesstimate :)
  You should have absolutely no problem.  The real clincher in your favor
 is the fact that you only need 9600 discrete time values (so you said), not
 Long.MAX_VALUE.  Using Long.MAX_VALUE would simply not be possible with the
 current implementation because it's using Doubles which has 52 bits of
 precision not the 64 that would be required to be a complete substitute for
 any time/date.  Even given the 52 bits, a quad SpatialPrefixTree with
 maxLevels=52 would probably not perform well or might fail; not sure.
  Eventually when I have time to work on an implementation that can be based
 on a configurable number of grid cells (not unlike how you can configure
 precisionStep on the Trie numeric fields), 52 should be no problem.
 
  I'll have to remember to refer back to this email on the approach if I
 create a field type that wraps this functionality.
 
  ~ David
 
  britske wrote
  Again, this looks good!
  Geert-Jan
 
  2012/12/8 David Smiley (@MITRE.org) [via Lucene] 
  [hidden email]
 
   Hello again Geert-Jan!
  
   What you're trying to do is indeed possible with Solr 4 out of the box.
Other terminology people use for this is multi-value time duration.
  This
   creative solution is a pure application of spatial without the
 geospatial
   notion -- we're not using an earth or other sphere model -- it's a flat
    plane.  So no need to make reference to longitude & latitude, it's x &
  y.
  
   I would put opening time into x, and closing time into y.  To express a
   point, use x y (x space y), and supply this as a string to your
   SpatialRecursivePrefixTreeFieldType based field for indexing.  You can
 give
   it multiple values and it will work correctly; this is one of RPT's
 main
   features that set it apart from Solr 3 spatial.  To query for a
 business
   that is open during at least some part of a given time duration, say
 6-8
   o'clock, the query would look like openDuration:Intersects(minX minY
 maxX
   maxY)  and put 0 or minX (always), 6 for minY (start time), 8 for maxX
   (end time), and the largest possible value for maxY.  You wouldn't
 

RE: Modeling openinghours using multipoints

2012-12-10 Thread David Smiley (@MITRE.org)
Mikhail,
Join of any nature should be chosen as a last resort compared to using a single 
index (when that's possible), especially if there is minimal to no denormalization of 
data.  In this specific case, if the average document had 200 temporal ranges 
to index (100 days out, 2 per day), a Join based solution would have 200+1 
documents in the index.  That's an explosion of the document count by 200x!  
Yoyzah!  Obviously what we're discussing here, modeling numeric ranges as x-y 
points has its limits -- namely that the spatial module is limited to 2 
dimensions currently.  It's plausible to see it generalized, but I don't think 
it'll scale well beyond 4-5 dimensions.  I recall a research paper talking 
about multi-dimensional numeric indexes seriously breaking down at about 6.

~ David
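
Putting the thread's recipe together, a minimal sketch assuming Solr 4 RPT
syntax and the 9600-value time grid mentioned above (the type and field names
are made up):

<fieldType name="timeRange" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="false" worldBounds="0 0 9600 9600" distErrPct="0" maxDistErr="1"/>
<field name="openDuration" type="timeRange" indexed="true" stored="false" multiValued="true"/>

An "open at 6, close at 8" span is indexed as the point "6 8", and the overlap
query from the thread becomes:

fq=openDuration:"Intersects(0 6 8 9600)"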


From: Mikhail Khludnev [via Lucene] [ml-node+s472066n4025602...@n3.nabble.com]
Sent: Monday, December 10, 2012 12:15 AM
To: Smiley, David W.
Subject: Re: Modeling openinghours using multipoints

Colleagues,
What are the benefits of this approach in contrast to block join?

Thanks
On 10.12.2012 at 3:35, Lance Norskog [hidden 
email] wrote:

 If these are not raw times, but quantized on-the-hour, would it be
 faster to create a bit map of hours and then query across the bit
 maps?

 On Sun, Dec 9, 2012 at 8:06 AM, Erick Erickson [hidden 
  email]
 wrote:
  Thanks for the discussion, I've added this to my bag of tricks, way cool!
 
  Erick
 
 
  On Sat, Dec 8, 2012 at 10:52 PM, britske [hidden 
   email] wrote:
 
   Brilliant! Got some great ideas for this. Indeed all sorts of use cases
   which use multiple temporal ranges could benefit..
  
   Eg: another guy on Stack Overflow asked me about this some days ago.. He
   wants to model multiple temporary offers per product (free shipping for
   Christmas, 20% discount for Black Friday, etc.) .. All possible with
  this
   out of the box. Factor in 'offer category' in x and y as well for some
   extra powerful querying.
  
   Yup, I'm enthusiastic about it, which I'm sure you can tell :)
 
  Thanks a lot David,
 
  Cheers,
  Geert-Jan
 
 
 
  Sent from my iPhone
 
  On 9 dec. 2012, at 05:35, David Smiley (@MITRE.org) [via Lucene] 
   [hidden email] wrote:
 
   britske wrote
   That's seriously awesome!
  
   Some change in the query though:
   You described: To query for a business that is open during at least
 some
   part of a given time duration
   I want To query for a business that is open during at least the
 entire
   given time duration.
  
   Feels like a small difference but probably isn't (I'm still wrapping
 my
   head on the intersect query I must admit)
   So this would be a slightly different rectangle query.  Interestingly,
  you simply swap the location in the rectangle where you put the start
 and
  end time.  In summary:
  
    Indexed span CONTAINS query span:
    minX minY maxX maxY -> 0 end start *
   
    Indexed span INTERSECTS (i.e. OVERLAPS) query span:
    minX minY maxX maxY -> 0 start end *
   
    Indexed span WITHIN query span:
    minX minY maxX maxY -> start 0 * end
  
   I'm using '*' here to denote the max possible value.  At some point I
  may add that as a feature.
  
   That was a fun exercise!  I give you credit in prodding me in this
  direction as I'm not sure if this use of spatial would have occurred to
 me
  otherwise.
  
   britske wrote
   Moreover, any indication on performance? Should, say, 50.000 docs with
    about 100-200 points each (1 to 2 open-close spans per day) be ok? ( I
   know
    'your mileage may vary' etc. but just a guesstimate :)
   You should have absolutely no problem.  The real clincher in your
 favor
  is the fact that you only need 9600 discrete time values (so you said),
 not
  Long.MAX_VALUE.  Using Long.MAX_VALUE would simply not be possible with
 the
  current implementation because it's using Doubles which has 52 bits of
  precision not the 64 that would be required to be a complete substitute
 for
  any time/date.  Even given the 52 bits, a quad SpatialPrefixTree with
  maxLevels=52 would probably not perform well or might fail; not sure.
   Eventually when I have time to work on an implementation that can be
 based
  on a configurable number of grid cells (not unlike how you can configure
  precisionStep on the Trie numeric fields), 52 should be no problem.
  
   I'll have to remember to refer back to this email on the approach if I
  create a field type that wraps this functionality.
  
   ~ David
  
   britske wrote
   Again, this looks good!
   Geert-Jan
  
   2012/12/8 David Smiley (@MITRE.org) [via Lucene] 
   [hidden email]
  
Hello again Geert-Jan!
   
What you're trying to do is indeed possible with Solr 4 out of the
 box.
 Other terminology people use for this is multi-value time duration.
   This
creative solution is a pure application of spatial without the
  geospatial
notion -- we're not 

Re: setting hostPort for SolrCloud

2012-12-10 Thread Bill Au
Thanks for the information.

Bill


On Fri, Dec 7, 2012 at 3:04 PM, Mark Miller markrmil...@gmail.com wrote:

 Yup, solr.xml is pretty much required - especially if you want to use
 solrcloud.

 The only reason anything works without is for back compat.

  We are working towards removing the need for it, but it's considered required
 these days.

 - Mark

 On Dec 7, 2012, at 11:04 AM, Bill Au bill.w...@gmail.com wrote:

  I actually was not using a solr.xml.  I am only using a single core.  I
 am
  using the default core name collection1.  I know for sure I will not be
  using more than a single core so I did not bother with having a solr.xml.
  Is that a bad thing?
 
   Everything works when I had tomcat configured to run on port 8983.  But once
 I
  configure tomcat to use a different port, I notice that SolrCloud is
 still
  using port 8983 so it wasn't working.  I then tried adding
  -Djetty.port=8000 and -DhostPort=8000 to the environment variable
  JAVA_OPTS before running the tomcat start script bin/startup.sh.  But
  SolrCloud was still using 8983.  I ended up setting hostPort in solr.xml
  and got things working.
 
   If solr.xml is required, then I can just set the port for SolrCloud in
  there.  But I was hoping I did not have to bother with solr.xml at all.
  One less configuration file, one less thing that can go wrong.
 
  Bill
 
 
  On Wed, Dec 5, 2012 at 4:40 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Be aware that you still have to setup tomcat to run Solr on the right
 port
  - and you also have to provide the port to Solr on startup. With jetty
 we
  do both with -Djetty.port - with Tomcat you have to setup Tomcat to run
 on
  the right port *and* tell Solr what that port is. By default that means
  also passing -Djetty.port - but you can change that to whatever you
 want in
  solr.xml (to hostPort or solr.port or whatever).
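  
   A sketch of that wiring, based on the stock Solr 4 example solr.xml (the
   8983 fallback here is an assumption; hostPort reads whichever system
   property you choose):
  
   <cores adminPath="/admin/cores" defaultCoreName="collection1"
          host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:}">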
 
  The problem is that it's difficult for a webapp to find what ports it's
  running on - you can only do it when a request actually comes in to my
  knowledge.
 
  - Mark
 
  On Dec 5, 2012, at 1:05 PM, Bill Au bill.w...@gmail.com wrote:
 
  I am using tomcat.  In my tomcat start script I have tried setting
 system
  properties with both
 
  -Djetty.port=8080
 
  and
 
  -DhostPort=8080
 
  but neither changed the host port for SolrCloud.  It still uses the
  default
  8983.
 
  Bill
 
 
  On Wed, Dec 5, 2012 at 12:11 PM, Jack Krupansky 
 j...@basetechnology.com
  wrote:
 
  Solr runs in a container and the container controls the port. So, you
  need
  to tell the container which port to use.
 
  For example,
 
  java -Djetty.port=8180 -jar start.jar
 
  -- Jack Krupansky
 
  -Original Message- From: Bill Au
  Sent: Wednesday, December 05, 2012 10:30 AM
  To: solr-user@lucene.apache.org
  Subject: setting hostPort for SolrCloud
 
 
  Can hostPort for SolrCloud only be set in solr.xml?  I tried setting
 the
  system property hostPort and jetty.port on the Java command line but
  neither of them work.
 
  Bill
 
 
 




RE: Need help with delta import

2012-12-10 Thread Dyer, James
It's surprising that your full import is working for you.  Both your query and 
your deltaImportQuery have:

SELECT ID FROM...

...So both your full-import (query attr) and your delta-import 
(deltaImportQuery attr) are only getting the ID field from your db.  
Shouldn't you at least be getting email and fname to index also?  So by 
changing both these queries to something like:

SELECT ID, EMAIL, FNAME FROM...

...You should see these 3 fields come through after your full-import.  Then, 
after changing data in your rdbms and doing a delta you should see the data 
update.
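
Put together, the corrected entity would look roughly like this (same table
and fields as in your config below):

<entity name="person" pk="ID"
        query="select id, email, fname from uma_test"
        deltaImportQuery="select id, email, fname from uma_test where ID='${dataimport.delta.id}'"
        deltaQuery="select ID from uma_test where upd_ts > '${dataimport.last_index_time}'">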

Besides this, your log looks right:
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder collectDelta
O: Completed ModifiedRowKey for Entity: person rows obtained : 8

...so it looks like it was going to update 8 rows.  But seeing that your 
deltaImportQuery is only pulling back the ID, it couldn't possibly change the 
values for fields like email and fname.

Make sense?

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: umajava [mailto:umaj...@gmail.com] 
Sent: Thursday, December 06, 2012 8:59 PM
To: solr-user@lucene.apache.org
Subject: Need help with delta import

Hi,

I am trying to do delta import and I am not able to get it to work. How ever
full import does work. Could you please help me figure out what I am
missing?

data-config.xml file

<document name="persons">
<entity name="person" pk="ID" query="select id from uma_test"
deltaImportQuery="select id from uma_test where
ID='${dataimport.delta.id}'"
deltaQuery="select ID from uma_test where upd_ts >
'${dataimport.last_index_time}'">
<field column="ID" name="id" indexed="true" stored="true" />
<field column="email" name="email" indexed="true" stored="true" />
<field column="fname" name="fname" indexed="true" stored="true" />
</entity>
</document>

dataimport.properties file

metadataObject.last_index_time=2012-09-20 11\:12\:47
person.last_index_time=2012-11-18 13\:54\:29
interval=10
port=8080
server=localhost
params=/select?qt\=/dataimport&command\=delta-import&clean\=false&commit\=true
webapp=solr
syncEnabled=1
last_index_time=2012-11-18 13\:54\:29
syncCores=coreHr,coreEn


log output

Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DataImporter
maybeReloadConfiguration
O: Loading DIH Configuration:
C://Software//apache-solr-4.0.0//apache-solr-4.0.0//Uma//db//db-data-config.xml
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DataImporter
loadDataConfig
O: Data Configuration loaded successfully
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
O: Starting Delta Import
Dec-2012 02:49:24 org.apache.solr.core.SolrCore execute
O: [collection1] webapp=/solr path=/dataimport
params={commit=false&command=delta-import} status=0 QTime=16
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.SimplePropertiesWriter
readIndexerProperties
O: Read dataimport.properties
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder doDelta
O: Starting delta collection.
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder collectDelta
O: Running ModifiedRowKey() for Entity: person
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.JdbcDataSource$1 call
O: Creating a connection for entity person with URL:
jdbc:mysql://localhost/test
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.JdbcDataSource$1 call
O: Time taken for getConnection(): 125
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder collectDelta
O: Completed ModifiedRowKey for Entity: person rows obtained : 8
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder collectDelta
O: Completed DeletedRowKey for Entity: person rows obtained : 0
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder collectDelta
O: Completed parentDeltaQuery for Entity: person
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder doDelta
O: Delta Import completed successfully
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder execute
O: Time taken = 0:0:0.156
Dec-2012 02:49:24 org.apache.solr.update.processor.LogUpdateProcessor finish
O: [collection1] webapp=/solr path=/dataimport
params={commit=false&command=delta-import} status=0 QTime=16 {} 0


Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-delta-import-tp4025003.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Intersect Circle is matching points way outside the radius ( Solr 4 Spatial)

2012-12-10 Thread David Smiley (@MITRE.org)
Javi,
  The distance between the center point of your query circle and the indexed point
is just under 49.9 km (just under your query radius); this is why it matched.  I plugged in
your numbers here:
http://www.movable-type.co.uk/scripts/latlong.html
Perhaps you are misled by the projection you are using to view the map, on
how far away the points are.

FYI The default distErrPct of 0.025 should be fine in general and wasn't the
issue.  You should (almost) never use 0.0 on the field type because that
means your indexed non-point shapes (rectangles you said) will use a ton of
indexed terms unless they are very small rectangles (relative to your grid
resolution -- 1 meter in your case).  Using distErrPct=0 in the query is
safe, on the other hand.

Cheers,
  David



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Intersect-Circle-is-matching-points-way-outside-the-radius-Solr-4-Spatial-tp4025609p4025704.html
Sent from the Solr - User mailing list archive at Nabble.com.


highlighting multiple occurrences

2012-12-10 Thread Rafael Ribeiro
Hi all,

 I have a solr instance with one field configured for highlighting as
follows:
 <str name="hl">on</str>
 <str name="hl.fl">conteudo</str>
 <str name="hl.fragsize">500</str>
 <str name="hl.maxAnalyzedChars">9</str>
 <str name="hl.simple.pre">&lt;font style="background-color:
yellow"&gt;</str>
 but I would like the highlighter to display multiple occurrences of
the query instead of just the first one... is it possible? I tried searching this
mailing list but I couldn't find anyone mentioning this...

best regards,
Rafael



--
View this message in context: 
http://lucene.472066.n3.nabble.com/highlighting-multiple-occurrences-tp4025715.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: highlighting multiple occurrences

2012-12-10 Thread Swati Swoboda
Did you mean that you want multiple snippets? 

http://wiki.apache.org/solr/HighlightingParameters#hl.snippets
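
If so, a minimal sketch is to add the parameter alongside your existing
defaults (the value 5 is arbitrary):

<str name="hl.snippets">5</str>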



-Original Message-
From: Rafael Ribeiro [mailto:rafae...@gmail.com] 
Sent: Monday, December 10, 2012 11:20 AM
To: solr-user@lucene.apache.org
Subject: highlighting multiple occurrences

Hi all,

 I have a solr instance with one field configured for highlighting as
follows:
 <str name="hl">on</str>
 <str name="hl.fl">conteudo</str>
 <str name="hl.fragsize">500</str>
 <str name="hl.maxAnalyzedChars">9</str>
 <str name="hl.simple.pre">&lt;font style="background-color:
yellow"&gt;</str>
 but I would like the highlighter to display multiple occurrences of the 
query instead of just the first one... is it possible? I tried searching this 
mailing list but I couldn't find anyone mentioning this...

best regards,
Rafael



--
View this message in context: 
http://lucene.472066.n3.nabble.com/highlighting-multiple-occurrences-tp4025715.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Is there a way to round data when index, but still able to return original content?

2012-12-10 Thread Swati Swoboda
When you apply your analyzers/filters/tokenizers, the resulting value is what 
gets indexed; the original input value is what gets stored. For example, from the 
schema.xml file:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

This particular field type will strip out the HTML. So if the input is:

<b>Hello</b>

It's being tokenized in the index as 

Hello

It's being stored (and hence returned to you) as

<b>Hello</b>

So you can create your own charFilter or filter class which converts your date 
for the indexer, but the original data will automatically be stored.

I hope this makes sense.

-Original Message-
From: jefferyyuan [mailto:yuanyun...@gmail.com] 
Sent: Monday, December 10, 2012 10:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Is there a way to round data when index, but still able to return 
original content?

Erick, Thanks for your reply.

I know how to implement the solution 1.

But no idea how yo implement the solution 2 you mentioned:
===
If you put some sort of (perhaps custom) filter in place, then the original 
value would go in as stored and the altered value would get in the index and 
you could do both in the same field. 

Can you please describe more about how to store the original data and index the 
altered value in the same field?

Thanks :)







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-round-data-when-index-but-still-able-to-return-original-content-tp4025405p4025695.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Modeling openinghours using multipoints

2012-12-10 Thread Lance Norskog
Bit maps can be done with a separate term for each bit. You search for 
all of the terms in the bit range you want.


On 12/10/2012 06:34 AM, David Smiley (@MITRE.org) wrote:

Maybe it would? I don't completely get your drift.  But you're talking about a 
user writing a bunch of custom code to build, save, and query the bitmap 
whereas working on top of existing functionality seems to me a lot more 
maintainable on the user's part.
~ David


From: Lance Norskog-2 [via Lucene] [ml-node+s472066n4025579...@n3.nabble.com]
Sent: Sunday, December 09, 2012 6:35 PM
To: Smiley, David W.
Subject: Re: Modeling openinghours using multipoints

If these are not raw times, but quantized on-the-hour, would it be
faster to create a bit map of hours and then query across the bit
maps?

On Sun, Dec 9, 2012 at 8:06 AM, Erick Erickson [hidden 
email] wrote:


Thanks for the discussion, I've added this to my bag of tricks, way cool!

Erick


On Sat, Dec 8, 2012 at 10:52 PM, britske [hidden email] 
wrote:


Brilliant! Got some great ideas for this. Indeed all sorts of use cases
which use multiple temporal ranges could benefit..

Eg: another guy on Stack Overflow asked me about this some days ago.. He
wants to model multiple temporary offers per product (free shipping for
Christmas, 20% discount for Black Friday, etc.) .. All possible with this
out of the box. Factor in 'offer category' in x and y as well for some
extra powerful querying.

Yup, I'm enthusiastic about it, which I'm sure you can tell :)

Thanks a lot David,

Cheers,
Geert-Jan



Sent from my iPhone

On 9 dec. 2012, at 05:35, David Smiley (@MITRE.org) [via Lucene] 
[hidden email] wrote:


britske wrote
That's seriously awesome!

Some change in the query though:
You described: To query for a business that is open during at least some
part of a given time duration
I want To query for a business that is open during at least the entire
given time duration.

Feels like a small difference but probably isn't (I'm still wrapping my
head on the intersect query I must admit)
So this would be a slightly different rectangle query.  Interestingly,

you simply swap the location in the rectangle where you put the start and
end time.  In summary:

Indexed span CONTAINS query span:
minX minY maxX maxY -> 0 end start *

Indexed span INTERSECTS (i.e. OVERLAPS) query span:
minX minY maxX maxY -> 0 start end *

Indexed span WITHIN query span:
minX minY maxX maxY -> start 0 * end

I'm using '*' here to denote the max possible value.  At some point I

may add that as a feature.

That was a fun exercise!  I give you credit in prodding me in this

direction as I'm not sure if this use of spatial would have occurred to me
otherwise.

britske wrote
Moreover, any indication on performance? Should, say, 50.000 docs with
about 100-200 points each (1 to 2 open-close spans per day) be ok? ( I

know

'your mileage may vary' etc. but just a guesstimate :)
You should have absolutely no problem.  The real clincher in your favor

is the fact that you only need 9600 discrete time values (so you said), not
Long.MAX_VALUE.  Using Long.MAX_VALUE would simply not be possible with the
current implementation because it's using Doubles which has 52 bits of
precision not the 64 that would be required to be a complete substitute for
any time/date.  Even given the 52 bits, a quad SpatialPrefixTree with
maxLevels=52 would probably not perform well or might fail; not sure.
  Eventually when I have time to work on an implementation that can be based
on a configurable number of grid cells (not unlike how you can configure
precisionStep on the Trie numeric fields), 52 should be no problem.

I'll have to remember to refer back to this email on the approach if I

create a field type that wraps this functionality.

~ David

britske wrote
Again, this looks good!
Geert-Jan

2012/12/8 David Smiley (@MITRE.org) [via Lucene] 
[hidden email]


Hello again Geert-Jan!

What you're trying to do is indeed possible with Solr 4 out of the box.
  Other terminology people use for this is multi-value time duration.

  This

creative solution is a pure application of spatial without the

geospatial

notion -- we're not using an earth or other sphere model -- it's a flat
plane.  So no need to make reference to longitude & latitude, it's x &

y.

I would put opening time into x, and closing time into y.  To express a
point, use x y (x space y), and supply this as a string to your
SpatialRecursivePrefixTreeFieldType based field for indexing.  You can

give

it multiple values and it will work correctly; this is one of RPT's

main

features that set it apart from Solr 3 spatial.  To query for a

business

that is open during at least some part of a given time duration, say

6-8

o'clock, the query would look like openDuration:Intersects(minX minY

maxX

maxY)  and put 0 or minX (always), 6 for minY (start time), 8 for maxX
(end time), and the largest 

RE: highlighting multiple occurrences

2012-12-10 Thread Rafael Ribeiro
yep!

 I tried enabling this and setting various values but no success... still
it only shows the first fragment found for the search...
 I also saw this
http://lucene.472066.n3.nabble.com/hl-snippets-in-solr-3-1-td2445178.html
but increasing maxAnalyzedChars (that was already huge) produced no
difference at all.
 Do I have to change anything else? For example, something on the velocity
template???

best regards,
Rafael



--
View this message in context: 
http://lucene.472066.n3.nabble.com/highlighting-multiple-occurrences-tp4025715p4025771.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problems with UUIDUpdateProcessorFactory on SolrCloud

2012-12-10 Thread Chris Hostetter

: In logs I can see some UUID is being generated when adding new document:
: INFO: [selekta] webapp=/solr path=/update params={}
: {add=[504a4ea8-7b82-48b6-a2fa-b8dd56376fd7]} 0 27

: but when I query Solr I got:
: Dec 07, 2012 1:52:10 PM org.apache.solr.common.SolrException log
: SEVERE: java.lang.NullPointerException
: at
: 
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:879)

Hmm...

1) exactly which version of solr are you using?
2) please show us the uniqueKey declaration from your schema.xml 
along with the field and fieldType declarations for that field.
3) what exactly does the config for your update chain look like?
4) are you certain every document was indexed using that chain, with that 
processor? you didn't have any old documents in your index?

...because that error seems to be suggesting that you have documents in 
your index w/o a stored value for your uniqueKey field (so the bug is 
happening when the results get merged) but solrcloud shouldn't be letting 
you do that at all.
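
(For reference, a typical chain of the kind asked about in #3 looks roughly
like the sketch below; the actual config may of course differ.)

<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>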


-Hoss


RE: highlighting multiple occurrences

2012-12-10 Thread Swati Swoboda
Rafael,

Can you share more on how you are rendering the results in your velocity 
template? The data is probably being sent to you, but you have to loop through 
and actually access the data.

-Original Message-
From: Rafael Ribeiro [mailto:rafae...@gmail.com] 
Sent: Monday, December 10, 2012 2:26 PM
To: solr-user@lucene.apache.org
Subject: RE: highlighting multiple occurrences

yep!

 I tried enabling this and setting various values but no success... still it 
only shows the first fragment found for the search...
 I also saw this
http://lucene.472066.n3.nabble.com/hl-snippets-in-solr-3-1-td2445178.html
but increasing maxAnalyzedChars (that was already huge) produced no difference 
at all.
 Do I have to change anything else? For example, something on the velocity 
template???

best regards,
Rafael



--
View this message in context: 
http://lucene.472066.n3.nabble.com/highlighting-multiple-occurrences-tp4025715p4025771.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Is there a way to round data when index, but still able to return original content?

2012-12-10 Thread jefferyyuan
Sorry to ask a question again, but I want to round TrieDateField and
TrieLongField values, and it seems they don't support configuring an analyzer:
charFilter, tokenizer or filter.

What should I do? Now I am thinking of writing my own custom date or long field;
is there any other way? :)

Thanks :)
 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-round-data-when-index-but-still-able-to-return-original-content-tp4025405p4025793.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Nested document workaround?

2012-12-10 Thread Otis Gospodnetic
Would http://search-lucene.com/?q=solr+join do it for you?

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Mon, Dec 10, 2012 at 1:17 PM, Michael Jones michaelj...@gmail.com wrote:

 Hi,

 I realise that you can't get nested document to search in solr.

 But if I did this:

   <doc>
 <field name="source">Test</field>
 <field name="type">bar</field>
 <field name="label">Map</field>
 <field name="date_long"/>
 <field name="date_short"/>
 <field name="persons-0446_name">Graham</field>
 <field name="persons-0446_link">foo</field>
 <field name="persons-0446_location">Crosby</field>
 <field name="persons-0446_office"/>
 <field name="persons-0188_name">Bob</field>
 <field name="persons-0188_link">foo</field>
 <field name="persons-0188_location">test</field>
 <field name="persons-0188_office"/>
 <field name="persons-0183_name">Denzil</field>
 <field name="persons-0183_link">foo</field>
 <field name="persons-0183_location">test</field>
 <field name="persons-0183_office"/>
   </doc>

 Could I still search for location with *_location ?

 Or is there another way to get relational data into solr?

 Thanks



RE: Is there a way to round data when index, but still able to return original content?

2012-12-10 Thread Swati Swoboda
Hi,

Nope...they don't. Generally, I am not sure if I'd bother rounding this 
information to reduce the index size. Have you determined how much index 
space you'll actually be saving? I am not confident that it'd be worth your 
time; i.e. I'd just go with indexing/storing the time information as well. 

Regardless, if you do want to go this route, the only way I can think of that 
wouldn't be a complicated solution is to have one field that is 
indexed/rounded (and not stored) and another field that is just stored (and not 
indexed).
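
A sketch of that two-field layout (the field names are made up; the rounding
itself would happen in your indexing code before the document is sent):

<field name="created_rounded" type="tdate" indexed="true" stored="false"/>
<field name="created" type="tdate" indexed="false" stored="true"/>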

Hope this helps.

-Original Message-
From: jefferyyuan [mailto:yuanyun...@gmail.com] 
Sent: Monday, December 10, 2012 3:14 PM
To: solr-user@lucene.apache.org
Subject: RE: Is there a way to round data when index, but still able to return 
original content?

Sorry to ask a question again, but I want to round date(TireDate) and 
TrieLongField, seems they don't support configuring analyzer: charFilter , 
tokenizer or filter.

What I should do? Now I am thinking to write my custom date or long field, is 
there any other way? :)

Thanks :)
 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-round-data-when-index-but-still-able-to-return-original-content-tp4025405p4025793.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Nested document workaround?

2012-12-10 Thread Alexandre Rafalovitch
How about aggregating all location fields into one searchable multi-valued
field using copyField? It could be index-only (not stored). Then, you just
say all_locations:Crosby
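
A minimal sketch of that, assuming the *_location naming from the document
quoted below:

<dynamicField name="*_location" type="string" indexed="true" stored="true"/>
<field name="all_locations" type="string" indexed="true" stored="false" multiValued="true"/>
<copyField source="*_location" dest="all_locations"/>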

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



On Tue, Dec 11, 2012 at 5:17 AM, Michael Jones michaelj...@gmail.com wrote:

 Hi,

 I realise that you can't get nested document to search in solr.

 But if I did this:

    <doc>
  <field name="source">Test</field>
  <field name="type">bar</field>
  <field name="label">Map</field>
  <field name="date_long"/>
  <field name="date_short"/>
  <field name="persons-0446_name">Graham</field>
  <field name="persons-0446_link">foo</field>
  <field name="persons-0446_location">Crosby</field>
  <field name="persons-0446_office"/>
  <field name="persons-0188_name">Bob</field>
  <field name="persons-0188_link">foo</field>
  <field name="persons-0188_location">test</field>
  <field name="persons-0188_office"/>
  <field name="persons-0183_name">Denzil</field>
  <field name="persons-0183_link">foo</field>
  <field name="persons-0183_location">test</field>
  <field name="persons-0183_office"/>
    </doc>

 Could I still search for location with *_location ?

 Or is there another way to get relational data into solr?

 Thanks



Retrieving one object

2012-12-10 Thread Drone42
I have stored multiple objects with the values:

uniqueUri
name
timestamp.

There can be multiple objects with the same name, but they will have
different timestamps (and a different uniqueUri).

I want to retrieve the object of a given name with the latest timestamp. As
an example I might have

1. uniqueUri=99661, name=FOO, timestamp=1355174089270
2. uniqueUri=98765, name=FOO, timestamp=1355174089870

I want to retrieve only object 2.

I have tried retrieving 1 row and sorting DESCENDING on timestamp, but this
still returns the first object.
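
In query form, that attempt presumably looked something like this (field names
from the example above):

q=name:FOO&sort=timestamp desc&rows=1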

How can I do this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Retrieving-one-object-tp4025810.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud OOM heap space

2012-12-10 Thread shreejay
Hi All, 

I am getting constant OOM errors on a SolrCloud instance (3 shards, 2 Solr
instances per shard, each server with 22 GB of memory, Xmx = 12 GB for Java).

Here is an error log:
http://pastie.org/private/dcga3kfatvvamslmtvrp0g


As of now I am not indexing any more documents. The total size of the index on
each server is around 36-40 GB.
-Xmx12288m -DSTOP.PORT=8079 -DSTOP.KEY=ABC
-XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log
-Djetty.port=8983 -DzkHost=ZooKeeperServer001:2181 -jar start.jar

If anyone has faced similar issues please let me know.

--Shreejay




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-OOM-heap-space-tp4025821.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Documentation issue: apache-solr-XXX.jar?

2012-12-10 Thread Alexandre Rafalovitch
Thanks Shawn,

I am looking at README.txt file and jars/wars that came with Solr 4 binary
distribution. So, if it is out of date, should I do Jira request? Or are
documentation fixes handled differently?

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



On Tue, Dec 11, 2012 at 9:43 AM, Shawn Heisey s...@elyograg.org wrote:

 On 12/10/2012 3:08 PM, Alexandre Rafalovitch wrote:

 In README.txt, it says:
 dist/apache-solr-XX.jar
The Apache Solr Libraries.  This JAR file is needed to compile
    Apache Solr Plugins (see http://wiki.apache.org/solr/SolrPlugins for
    more information).

 But I cannot see that in my 4.0 distribution. Has that changed (and doc
 needs to be updated) or am I missing something?


 If you have the dist directory at all, then you probably have the binary
 distribution, which is what you'll need.  If your downloaded file contains
 -src or the unpacked file does not contain a dist directory, then you
 likely have the source distribution.  With the source distribution, you
 would have to compile the dist target.  Better to just get the binary
 distribution.

 Looking at the dist directory on what I just downloaded, it appears that
 most of the functionality required for writing code related to Solr would
 actually be in apache-solr-core-4.0.0.jar, and depending on what you are
 doing, you may need one or more of the other jars there.  It looks like
 whatever documentation you are reading is definitely out of date.

 Thanks,
 Shawn




RE: SolrCloud OOM heap space

2012-12-10 Thread Markus Jelsma
Hi - the stack trace and preceding log entries look similar to what I've seen 
and reported on. A patch has just been attached to the issue; perhaps you can 
try it if the description matches your scenario and report back on Jira.

https://issues.apache.org/jira/browse/SOLR-4144 
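
(If you haven't applied a project patch before, a typical sketch, assuming an
svn checkout of the matching branch and the .patch file downloaded from the
Jira attachment:

  cd lucene_solr_4x            # hypothetical checkout root
  patch -p0 -i SOLR-4144.patch # apply from the checkout root
  cd solr && ant example       # rebuild the example server
)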
 
-Original message-
 From:shreejay shreej...@gmail.com
 Sent: Mon 10-Dec-2012 23:22
 To: solr-user@lucene.apache.org
 Subject: SolrCloud OOM heap space
 
 Hi All, 
 
 I am getting constant OOM errors on a SolrCloud instance (3 shards, 2 Solr
 instances in each shard, each server with 22 GB of memory, Xmx = 12 GB for
 Java).
 
 Here is an error log:
 http://pastie.org/private/dcga3kfatvvamslmtvrp0g
 
 
 As of now I am not indexing any more documents. The total size of the index on
 each server is around 36-40 GB. The JVM options are:
 -Xmx12288m -DSTOP.PORT=8079 -DSTOP.KEY=ABC
 -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled
 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log
 -Djetty.port=8983 -DzkHost=ZooKeeperServer001:2181 -jar start.jar
 
 If anyone has faced similar issues please let me know.
 
 --Shreejay
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-OOM-heap-space-tp4025821.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: Documentation issue: apache-solr-XXX.jar?

2012-12-10 Thread Shawn Heisey

On 12/10/2012 3:51 PM, Alexandre Rafalovitch wrote:

Thanks Shawn,

I am looking at the README.txt file and the jars/wars that came with the Solr 4
binary distribution. So, if it is out of date, should I file a Jira request? Or
are documentation fixes handled differently?


Yes, filing a jira issue is an excellent idea. The README for trunk (5.x) 
has the same out of date info, so put both 4.1 and 5.0 in for the 'fix' 
versions.


I'm not terribly surprised that something like this slipped through the 
cracks.  The number of incremental and major changes in Solr/Lucene that 
made up the 4.0 version is CRAZY.


Thanks,
Shawn



RE: SolrCloud OOM heap space

2012-12-10 Thread shreejay
Thanks Markus. Is this issue only on the 4.x and 5.x branches? I am currently
running a very recent build of the 4.x branch with an applied patch. 

I just want to make sure that this is not an issue with 4.0, in which case I
can think of applying my patch to 4.0 instead of 4.x or 5.x. 



--Shreejay



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-OOM-heap-space-tp4025821p4025839.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Documentation issue: apache-solr-XXX.jar?

2012-12-10 Thread Chris Hostetter

: Looking at the dist directory on what I just downloaded, it appears that most
: of the functionality required for writing code related to Solr would actually
: be in apache-solr-core-4.0.0.jar, and depending on what you are doing, you may
: need one or more of the other jars there.  It looks like whatever
: documentation you are reading is definitely out of date.

I'm confused ... how is this info out of date?

1) it's in a section labeled "Files included in an Apache Solr binary 
distribution" to try and make it clear that the dist/ dir only exists in 
the binary distributions.

the dist/apache-solr-*.jar files are in fact what you need to compile 
against if you are building a plugin (admittedly: you don't need to compile 
against *all* of them for a simple plugin).

Don't get me wrong: I'm not saying there is no problem -- the README.txt 
should be targeting new users/developers, not old farts like me who know 
all of this stuff in my sleep.  If as a new user you are confused by 
README.txt then I am eager to change it to be less confusing, I'm just not 
sure I understand what the confusion is.



-Hoss


RE: SolrCloud OOM heap space

2012-12-10 Thread Markus Jelsma
Hi - We're using trunk (5x) but we don't see it on trunk builds from a few 
months ago. In the case of the linked issue the oom occurs some time after 
start up but i'm not sure this applies to you. You can test the patch if you 
think it applies to you, we will test it tomorrow.

If the patch does not help you or you experience a different issue you might 
want to open a new issue. 
 
-Original message-
 From:shreejay shreej...@gmail.com
 Sent: Tue 11-Dec-2012 00:31
 To: solr-user@lucene.apache.org
 Subject: RE: SolrCloud OOM heap space
 
 Thanks Markus. Is this issue only on 4.x and 5.x branches? I am currently
 running a v recent build of 4.x branch with an applied patch. 
 
 I just want to make sure that this is not an issue with 4.0. In which case I
 can think of applying my patch to 4.0 instead of 4x or 5x. 
 
 
 
 --Shreejay
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-OOM-heap-space-tp4025821p4025839.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: Documentation issue: apache-solr-XXX.jar?

2012-12-10 Thread Alexandre Rafalovitch
Hi Chris (Hoss?),

The issue is that the README refers to a specific file, apache-solr-XXX.jar,
which does not exist. There is an apache-solr-4.0.0.war, which is referred to
in a paragraph before, but not this one. So maybe the fix is just to say that
there is a bunch of jars now (apache-solr-component-XXX.jar?).

Anyway, I created https://issues.apache.org/jira/browse/SOLR-4163.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



On Tue, Dec 11, 2012 at 10:25 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : Looking at the dist directory on what I just downloaded, it appears that
 most
 : of the functionality required for writing code related to Solr would
 actually
 : be in apache-solr-core-4.0.0.jar, and depending on what you are doing,
 you may
 : need one or more of the other jars there.  It looks like whatever
 : documentation you are reading is definitely out of date.

 I'm confused ... how is this info out of date?

 1) it's in a section labeled "Files included in an Apache Solr binary
 distribution" to try and make it clear that the dist/ dir only exists in
 the binary distributions.

 the dist/apache-solr-*.jar files are in fact what you need to compile
 against if you are building a plugin (admittedly: you don't need to compile
 against *all* of them for a simple plugin).

 Don't get me wrong: I'm not saying there is no problem -- the README.txt
 should be targeting new users/developers, not old farts like me who know
 all of this stuff in my sleep.  If as a new user you are confused by
 README.txt then I am eager to change it to be less confusing, I'm just not
 sure I understand what the confusion is.



 -Hoss



RE: SolrCloud OOM heap space

2012-12-10 Thread shreejay
Thanks Markus. I will apply the patch to the 4.x branch I have, and report
back. 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-OOM-heap-space-tp4025821p4025858.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Intersect Circle is matching points way outside the radius ( Solr 4 Spatial)

2012-12-10 Thread Javier Molina
Hi David,

As it happens the points are using the right projection, I can see them in
the same position using the page you just provided.

There is something wrong with the radius of the circle, though; I need to
investigate that. But it is a relief to know that there is nothing wrong
with Solr and that I didn't mix up the concepts. As in many cases, the
problem is somewhere you would never imagine.

Thanks for the hint.

Cheers,
Javier





On 11 December 2012 02:47, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:

 Javi,
   The center point of your query circle and the indexed point are just under
 49.9km apart (just under your query radius); this is why it matched.  I plugged
 in
 your numbers here:
 http://www.movable-type.co.uk/scripts/latlong.html
 Perhaps you are misled by the projection you are using to view the map, on
 how far away the points are.

 FYI The default distErrPct of 0.025 should be fine in general and wasn't
 the
 issue.  You should (almost) never use 0.0 on the field type because that
 means your indexed non-point shapes (rectangles you said) will use a ton of
 indexed terms unless they are very small rectangles (relative to your grid
 resolution -- 1 meter in your case).  Using distErrPct=0 in the query is
 safe, on the other hand.

 Cheers,
   David



 -
  Author:
 http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Intersect-Circle-is-matching-points-way-outside-the-radius-Solr-4-Spatial-tp4025609p4025704.html
 Sent from the Solr - User mailing list archive at Nabble.com.



How to parse XML attributes with prefix using DIH?

2012-12-10 Thread zhk011
Hi there,

I'm new to Solr and DIH. Recently I've been planning to use Solr/DIH to
index some local XML files. Following the DIH example page on the Solr wiki,
most things work fine, but I found that XML attributes with prefixes cannot be
parsed. Take the following XML file to be indexed, for instance:
---
<book xmlns:bk='urn:samples' bk:genre='novel' self='test1'>
  <id>test</id>
  <title>Pride And Prejudice</title>
</book>
---

The data-config.xml is like:
---
<field column="tsip.action" xpath="/book/@xmlns:bk"/>
<field column="tsip.cc" xpath="/book/@bk:genre"/>
<field column="tsip.se" xpath="/book/@self"/>
<field column="tsip.ki" xpath="/book/id"/>

---

And all the columns have corresponding field definitions in schema.xml.

But in the index result, only the following fields contain values:
---
<doc>
  <str name="tsip.se">test</str>
  <str name="tsip.ki">test</str>
  <date name="timestamp">2012-12-11T09:26:42.716Z</date>
</doc>
---

This means I cannot get the values for the attributes with prefixes (the
tsip.action and tsip.cc columns).

What configuration do I need so that DIH parses these prefixed attributes?
Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-parse-XML-attributes-with-prefix-using-DIH-tp4025888.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCell takes InputStream

2012-12-10 Thread Chris Hostetter

: However my raw files are stored on some remote storage devices. I am able to
: get an InputStream object for the file to be indexed. To me it may seem
: awkward to have the file temporarily stored locally. Is there a way of
: directly passing the InputStream in (e.g. constructing ContentStream using
: the InputStream)?

Sure, go right ahead.

ContentStream is a really simple abstraction designed to make it easy to 
add some common pieces of information to either an InputStream or a 
Reader.  Take a look at ContentStreamBase as a starting point for creating 
your own subclass that can point to whatever InputStream you want.
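
For example, a minimal sketch (class name mine, untested; getStream() is the
only abstract method you must implement):

  import java.io.InputStream;
  import org.apache.solr.common.util.ContentStreamBase;

  /**
   * A ContentStream backed by an existing InputStream, so nothing
   * has to be written to a local file first.
   */
  public class InputStreamContentStream extends ContentStreamBase {
    private final InputStream input;

    public InputStreamContentStream(InputStream input, String contentType) {
      this.input = input;
      setContentType(contentType); // e.g. "application/pdf"
    }

    @Override
    public InputStream getStream() {
      return input; // hand Solr the remote stream directly
    }
  }

You can then pass an instance to SolrJ, e.g. via
ContentStreamUpdateRequest.addContentStream(...).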


-Hoss


Re: Different schema.xml versions in the binary distribution

2012-12-10 Thread Mark Miller
Seems like a good idea. Could you open a JIRA issue for this task?

Mark

Sent from my iPhone

On Dec 10, 2012, at 6:44 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 Hello,
 
 I lost good several hours on this, so wanted to check whether this is
 fixable.
 
 In the (binary) distribution of Solr 4, there is a large number of
 schema.xml files. But their version numbers are all over the place. Some
 are 1.1, others 1.2, and - I think - only one is on 1.5. I am talking about:
 <schema name="rss" version="1.1">
 
 Given that this tiny number strongly affects defaults and I think in some
 cases flips them, getting it wrong (e.g. by copying the DIH example instead of the
 main one) could really get a newbie confused.
 
 Would it be possible to make them all use the latest version? What would the
 testing process be?
 
 Would it be sufficient to index once with the existing value, change the value,
 build a second index and do a binary file compare? Or is Lucene not as
 predictable as that (e.g. due to internal timestamps)?
 
 Or is there a way to export a normative encoding of the field definition
 and compare what changed and which fields/properties need to be set
 explicitly after a version change?
 
 Regards,
   Alex.
 
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: - Solr 4.0 - How do I enable JSP support ? ...

2012-12-10 Thread vj
For anyone else looking to run JSPs on Solr 4.0, note that supplying
OPTIONS=jsp to the server etc. doesn't work (check out the startup config in
start.jar and you'll see why) - don't bother with all that. Instead, do the
following:
create a directory ext under: $SOLR_HOME\example\lib
copy the following jar files to this new folder
($SOLR_HOME\example\lib\ext):

ant-1.8.2.jar
ant-launcher.jar
jsp-2.1-glassfish-2.1.v20091210.jar
jsp-api-2.1-glassfish-2.1.v20091210.jar
tools.jar

the ant and glassfish jars can be downloaded from http://search.maven.org.
copy tools.jar from your JDK 1.6+ installation.
Restart Solr.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-How-do-I-enable-JSP-support-tp3983763p4025897.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud - Query performance degrades with multiple servers

2012-12-10 Thread Mark Miller
I missed this bug report! https://issues.apache.org/jira/browse/SOLR-3912

Will fix this very shortly. It's a problem with numShards=1.
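
(Until it's fixed, the workaround from the thread, passing numShards
explicitly at startup, looks like this as a sketch with illustrative values:

  java -DnumShards=1 -DzkHost=zkhost:2181 -Djetty.port=8983 -jar start.jar
)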

- Mark

On Sun, Dec 9, 2012 at 4:21 PM, sausarkar sausar...@ebay.com wrote:
 Thank you very much will wait for the results from your tests.

 From: Mark Miller-3 [via Lucene] ml-node+s472066n4025457...@n3.nabble.com
 Date: Saturday, December 8, 2012 11:08 PM
 To: Sarkar, Sauvik sausar...@ebay.com
 Subject: Re: SolrCloud - Query performance degrades with multiple servers

 If that's true, we will fix it for 4.1. I can look closer tomorrow.

 Mark

 Sent from my iPhone

 On Dec 9, 2012, at 2:04 AM, sausarkar [hidden email] wrote:

 Spoke too early: it seems that SolrCloud is still distributing queries to all
 the servers even if numShards=1. We are seeing POST requests to all servers in
 the cluster; please let me know what the solution is. Here is an example
 (the variable isShard should be false in our case, as there is a single shard;
 please help):

 POST /solr/core0/select HTTP/1.1
 Content-Charset: UTF-8
 Content-Type: application/x-www-form-urlencoded; charset=UTF-8
 User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
 Content-Length: 991
 Host: server1
 Connection: Keep-Alive

 lowercaseOperators=true&mm=70%&fl=EntityId&df=EntityId&q.op=AND&q.alt=*:*&qs=10&stopwords=true&defType=edismax&rows=3000&q=*:*&start=0&fsv=true&distrib=false&*isShard=true*&shard.url=*server1*:9090/solr/core0/|*server2*:9090/solr/core0/|*server3*:9090/solr/core0/&NOW=1354918880447&wt=javabin&version=2


 Re: SolrCloud - Query performance degrades with multiple servers
 Dec 06, 2012; 6:29pm — by   Mark Miller-3

 On Dec 6, 2012, at 5:08 PM, sausarkar [hidden email] wrote:

 We solved the issue by explicitly adding the numShards=1 argument to the Solr
 startup script. Is this a bug?

 Sounds like it…perhaps related to SOLR-3971…not sure though.

 - Mark




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-Query-performance-degrades-with-multiple-servers-tp4024660p4025455.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-Query-performance-degrades-with-multiple-servers-tp4024660p4025573.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
- Mark


Re: difference these two queries

2012-12-10 Thread Otis Gospodnetic
Hi,

The fq one is a FilterQuery that only does matching, but not scoring. Its
results are stored in the filter cache, while the q uses the query cache.
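
For example (field names illustrative), keep the clause you want scored in q
and move the pure filter into fq:

  q=fieldA:value&fq=fieldB:value

A later request with a different q but the same fq can then reuse the cached
filter.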

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html





On Mon, Dec 10, 2012 at 10:11 PM, Floyd Wu floyd...@gmail.com wrote:

 Hi There,
 Sorry for spamming if this question has already been asked.

 What's the main difference between

 q=fieldA:value AND fieldB:value

 q=fieldA:value&fq=fieldB:value

 Both queries will give me the same result; I wonder what the main
 difference is, and in practice which is the better way?

 Thanks in advance

 Floyd



Re: difference these two queries

2012-12-10 Thread Floyd Wu
Thanks Otis.

When talking about query performance (ignoring scoring), is it better to use fq?

Floyd


2012/12/11 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

  The fq one is a FilterQuery that only does matching, but not scoring. Its
  results are stored in the filter cache, while the q uses the query cache.

 Otis
 --
 SOLR Performance Monitoring - http://sematext.com/spm/index.html





 On Mon, Dec 10, 2012 at 10:11 PM, Floyd Wu floyd...@gmail.com wrote:

  Hi There,
  Sorry for spamming if this question has already been asked.

  What's the main difference between

  q=fieldA:value AND fieldB:value

  q=fieldA:value&fq=fieldB:value

  Both queries will give me the same result; I wonder what the main
  difference is, and in practice which is the better way?
 
  Thanks in advance
 
  Floyd
 



Re: difference these two queries

2012-12-10 Thread Otis Gospodnetic
If you don't need scoring on it then yes, just use fq.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html





On Mon, Dec 10, 2012 at 10:34 PM, Floyd Wu floyd...@gmail.com wrote:

 Thanks Otis.

 When talking about query performance (ignoring scoring), is it better to use fq?

 Floyd


 2012/12/11 Otis Gospodnetic otis.gospodne...@gmail.com

  Hi,
 
  The fq one is a FilterQuery that only does matching, but not scoring. Its
  results are stored in the filter cache, while the q uses the query cache.
 
  Otis
  --
  SOLR Performance Monitoring - http://sematext.com/spm/index.html
 
 
 
 
 
  On Mon, Dec 10, 2012 at 10:11 PM, Floyd Wu floyd...@gmail.com wrote:
 
   Hi There,
   Sorry for spamming if this question has already been asked.

   What's the main difference between

   q=fieldA:value AND fieldB:value

   q=fieldA:value&fq=fieldB:value

   Both queries will give me the same result; I wonder what the main
   difference is, and in practice which is the better way?
  
   Thanks in advance
  
   Floyd
  
 



Re: Intersect Circle is matching points way outside the radius ( Solr 4 Spatial)

2012-12-10 Thread David Smiley (@MITRE.org)
Javier,

I want to expand upon what I said; you might already get this point but
others may come along and read this and might not.

Naturally you are using a 2D map, as most applications do (Google Earth is
the stand-out exception), and fundamentally this means the map is projected
-- it has to be.  There isn't a right (correct) projection, generally
speaking.  Most/all web based map APIs are strictly web mercator.  If you
have a map GUI selection tool in which a circle is drawn, a perfect looking
round circle, then it's a lie unless you're looking directly at the equator.
If the intent is for the user to draw a distance-based circle, then ideally
your map tool should draw an elliptical looking circle if it's to be
accurate.  This is why you got confused; you saw a circle yet the point
wasn't drawn in the circle because that circle *should have been* stretched
vertically to barely pass it.  If on the other hand you intend for the query
shape to be exactly what it displays to be (what appears to be a perfect
circle), even though this means the true geodetic shape is not a perfect
circle, then you could use geo=false (and configure some other attributes)
such that you are using standard planar math, not geodetic.  Then your query
shape would appear to work correctly, but IMO it's misleading compared to
the first option (draw an ellipse, not a circle).  The circle misleads the
user; it misled you.
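
As a sketch, the two setups side by side (attribute values are illustrative;
maxDistErr is in degrees, where 0.000009 is roughly 1 meter):

  <!-- geodetic: query circles are true distance circles on the sphere -->
  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
             geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>

  <!-- planar: the query circle is exactly the circle drawn on the flat map -->
  <fieldType name="location_flat" class="solr.SpatialRecursivePrefixTreeFieldType"
             geo="false" worldBounds="-180 -90 180 90"
             distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>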

~ David


Javier Molina wrote
 Hi David,
 
 As it happens the points are using the right projection, I can see them in
 the same position using the page you just provided.
 
  There is something wrong with the radius of the circle, though; I need to
  investigate that. But it is a relief to know that there is nothing wrong
  with Solr and that I didn't mix up the concepts. As in many cases, the
  problem is somewhere you would never imagine.
 
 Thanks for the hint.
 
 Cheers,
 Javier
 
 
 
 
 
  On 11 December 2012 02:47, David Smiley (@MITRE.org) <DSMILEY@...> wrote:
 
 Javi,
    The center point of your query circle and the indexed point are just
  under 49.9km apart (just under your query radius); this is why it matched.
  I plugged in
 your numbers here:
 http://www.movable-type.co.uk/scripts/latlong.html
 Perhaps you are misled by the projection you are using to view the map,
 on
 how far away the points are.

 FYI The default distErrPct of 0.025 should be fine in general and wasn't
 the
 issue.  You should (almost) never use 0.0 on the field type because that
 means your indexed non-point shapes (rectangles you said) will use a ton
 of
 indexed terms unless they are very small rectangles (relative to your
 grid
 resolution -- 1 meter in your case).  Using distErrPct=0 in the query is
 safe, on the other hand.

 Cheers,
   David



 -
  Author:
 http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Intersect-Circle-is-matching-points-way-outside-the-radius-Solr-4-Spatial-tp4025609p4025704.html
 Sent from the Solr - User mailing list archive at Nabble.com.






-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Intersect-Circle-is-matching-points-way-outside-the-radius-Solr-4-Spatial-tp4025609p4025924.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Update / replication of offline indexes

2012-12-10 Thread Walter Underwood
You do not need to manage online and offline indexes. Commit when you are done 
with your updates and Solr will take care of it for you. The changes are not 
live until you commit.
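
(E.g., as a sketch, assuming the stock example URL:

  curl 'http://localhost:8983/solr/update?commit=true'
)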

wunder

On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:

 Hi,
 
 How can we do a delta update of offline indexes?
 
 We have the master index on which data import will be done. The index
 directory will be copied to the slave machine in case of a full update, on
 CD, as the slave/client machine is offline.
 So, what should be the approach for getting the delta to the slave? I can
 think of two approaches.
 
 1. Create separate indexes of the delta on the master machine, copy them to
 the slave machine and merge. Before merging the indexes on the client
 machine, delete all the updated and deleted documents on the client machine,
 or else the merge will add duplicates. So along with the index, we need to
 transfer the list of documents which have been updated/deleted.
 
 2. Extract all the documents which have changed since a particular time in
 XML/JSON and index them on the client machine.
 
 The indexes are huge, so we cannot roll over the index every time.
 
 Please help me with your take on, and the challenges you see in, the above
 approaches. Please suggest if you can think of any other better approach.
 
 Thanks a ton!
 
 Regards,
 Dikchant

--
Walter Underwood
wun...@wunderwood.org





Re: Update / replication of offline indexes

2012-12-10 Thread Dikchant Sahi
Hi Walter,

Thanks for the response.

Commit will help to reflect changes on Box1. We are able to achieve this.
We want the changes to be reflected on Box2.

We have two indexes. Say:
Box1: master. The DB has been set up; data import runs on this.
Box2: slave, running.

We want all the updates on Box1 to be merged into / present in the index on
Box2. The two boxes are not connected over a network. How can we achieve this?

Please let me know if I am not clear.

Thanks again!

Regards,
Dikchant

On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wun...@wunderwood.org wrote:

 You do not need to manage online and offline indexes. Commit when you are
 done with your updates and Solr will take care of it for you. The changes
 are not live until you commit.

 wunder

 On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:

  Hi,
 
  How can we do a delta update of offline indexes?
 
  We have the master index on which data import will be done. The index
  directory will be copied to the slave machine in case of a full update, on
  CD, as the slave/client machine is offline.
  So, what should be the approach for getting the delta to the slave? I can
  think of two approaches.
 
  1. Create separate indexes of the delta on the master machine, copy them to
  the slave machine and merge. Before merging the indexes on the client
  machine, delete all the updated and deleted documents on the client machine,
  or else the merge will add duplicates. So along with the index, we need to
  transfer the list of documents which have been updated/deleted.
 
  2. Extract all the documents which have changed since a particular time in
  XML/JSON and index them on the client machine.
 
  The indexes are huge, so we cannot roll over the index every time.
 
  Please help me with your take on, and the challenges you see in, the above
  approaches. Please suggest if you can think of any other better approach.
 
  Thanks a ton!
 
  Regards,
  Dikchant

 --
 Walter Underwood
 wun...@wunderwood.org